CLASSIFICATION OF IMAGES AS A PRE-PROCESSING STEP FOR IMAGE RETRIEVAL



Tat Loong Chan* & Ling Guan**


*Crux Cybernetics Pty Ltd, GPO Box 464, Sydney, NSW 2001, Australia

**Ryerson Polytechnic University, Toronto, Ont M5B 2K3, Canada


ABSTRACT


The indexing and retrieval of images in a multimedia database can be a very time-consuming process, considering the amount of visual data stored in digital video libraries (DVLs). This paper proposes a semi-automated method to categorize images into groups as a pre-processing step in order to enable faster retrieval from a DVL. The wavelet transform is used to extract low-level features such as colour, texture and shape from images. The extracted features are employed to categorize the images using a novel neural network, the self-organizing tree map. The classification results are then used to train a feed-forward neural network to identify the image classes an incoming query image belongs to. Initial experiments show promising results.





1. INTRODUCTION


The recent growth in multimedia technology has presented researchers with new challenges. One of the most noticeable is the large volume of digital visual data and the distributed nature of image/video databases over the Internet. Therefore, execution time and accuracy in locating images/videos of interest have become critical issues in indexing and retrieval in DVLs over the Internet. Intuitively, one solution to this problem is to apply a pattern classification method as a pre-processing step in order to reduce the computational complexity of retrieval.



The first attempt to apply pattern classification as a pre-processing step in image retrieval was introduced in [1]. It was proposed that knowledge of human perception be explicitly incorporated into the pattern classification model. Experimental results showed the feasibility of the method.

This paper proposes an alternative approach, which implicitly incorporates human perception, to semi-automatically categorize images into groups as a pre-processing step in order to enable faster retrieval from a DVL. It follows the popular content-based image retrieval model in [2], [3], [4], and extracts image features such as colour, texture and shape in the wavelet domain to represent the images. During the construction of the DVL, a novel neural network, the self-organizing tree map (SOTM), is employed to cluster the images into categories based on the wavelet features. This procedure divides the images in a DVL into different classes. Then a feed-forward neural network (FFNN) is trained to recognize the image classes. In retrieval, the same wavelet features of a user-defined query will first be analysed by the FFNN to determine the subset of the image collection in the designated DVL this query may belong to. The classical retrieval method will then be carried out only in the subset that the query has a high probability of belonging to.


2. THE PROPOSED SYSTEM


The proposed system has three components: the feature extraction module, the image clustering module and the image class identifier.


The feature extraction module, shown in Figure 1, is responsible for performing low-level feature extraction from images to obtain objective descriptions of images using colour, texture and shape. Features are extracted in the wavelet domain.
















Figure 1: The Feature Extraction Module (collection of JPEG images → decompression, conversion into PPM format and an OPP-like colour model → wavelet transform → file of extracted features)


Figure 2: The Clustering Module (extracted features file → SOTM classifier → Class 1, Class 2, …, Class N)

The clustering module, shown in Figure 2 above, groups the images into classes based on the extracted features obtained from the wavelet transform. Clustering is performed using the SOTM.


The result of clustering is used in training the identification module, depicted in Figure 3.















Figure 3: The Image Identification Module (components: training feature vectors, FFNN, FFNN outputs, comparator, cluster file)



The FFNN is used to identify the class a query image belongs to, as shown in Figure 4.













Figure 4: The image class identification process (wavelet transform → multi-layer perceptron → Class X)



3. FEATURE EXTRACTION


In feature extraction, a three-level wavelet transform is performed on each image, resulting in one low pass sub-image and nine high pass and band pass directional sub-images.



Since an image is a two-dimensional array of pixels, the one-dimensional (1-D) wavelet transform must be extended to two dimensions (2-D) to work on images. The 2-D wavelet transform is normally performed using separable products of a 1-D scaling function and a 1-D wavelet function.


Four 2-D wavelet functions are constructed by multiplying 1-D wavelet functions (ψ) with 1-D scaling functions (φ) as below:

    Ψ1(x, y) = φ(x) φ(y)    (1)
    Ψ2(x, y) = φ(x) ψ(y)    (2)
    Ψ3(x, y) = ψ(x) φ(y)    (3)
    Ψ4(x, y) = ψ(x) ψ(y)    (4)

Ψ1 corresponds to a low pass filter, Ψ2 corresponds to variations in the vertical direction, Ψ3 corresponds to variations in the horizontal direction and Ψ4 corresponds to variations along the diagonal directions. In fact, Ψ2, Ψ3 and Ψ4 are directional band pass filters.
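As an illustration, the separable products in equations (1)-(4) can be formed as outer products of the 1-D filter taps. The sketch below uses Haar taps purely for brevity (the paper uses Daubechies-8); the names `phi` and `psi` are illustrative.

```python
import numpy as np

# 1-D scaling (low pass) and wavelet (high pass) filter taps.
# Haar taps are used here for brevity; the paper uses Daubechies-8.
phi = np.array([1.0, 1.0]) / np.sqrt(2.0)
psi = np.array([1.0, -1.0]) / np.sqrt(2.0)

# Separable 2-D functions corresponding to equations (1)-(4)
Psi1 = np.outer(phi, phi)  # low pass
Psi2 = np.outer(phi, psi)  # vertical variations
Psi3 = np.outer(psi, phi)  # horizontal variations
Psi4 = np.outer(psi, psi)  # diagonal variations
```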


In digital filter terminology, the wavelet transform can be implemented using the following equations:

    L1 = Hx * Hy * I    (5)
    D1 = Hx * Gy * I    (6)
    D2 = Gx * Hy * I    (7)
    D3 = Gx * Gy * I    (8)

* is the convolution operator, H is the low pass filter, G is the band-pass filter and I is the 2-D image. L1 is the low pass sub-image while D1, D2 and D3 are the directional high pass sub-images.


Application of equations (5) to (8) to an image results in the sub-images mentioned in the previous paragraph. By applying the equations to the image three times, we end up with 1 low pass sub-image and 9 high pass sub-images (the output low pass sub-image of each transform is used as the input image for the next transform).
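A minimal sketch of equations (5)-(8) and the three-level cascade follows, again with Haar taps standing in for Daubechies-8; the function names and the downsampling convention are illustrative choices, not the paper's code.

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low pass filter H
g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # band pass filter G

def filter_cols(x, f):
    # convolve each column with f, then keep every second row
    return np.array([np.convolve(x[:, j], f)[1::2] for j in range(x.shape[1])]).T

def filter_rows(x, f):
    return filter_cols(x.T, f).T

def dwt2_level(img):
    # equations (5)-(8): separable filtering along both axes
    L1 = filter_rows(filter_cols(img, h), h)
    D1 = filter_rows(filter_cols(img, g), h)
    D2 = filter_rows(filter_cols(img, h), g)
    D3 = filter_rows(filter_cols(img, g), g)
    return L1, (D1, D2, D3)

# three-level transform: the low pass output feeds the next level
img = np.random.rand(64, 64)
subimages, low = [], img
for _ in range(3):
    low, bands = dwt2_level(low)
    subimages.extend(bands)
subimages.append(low)  # 9 directional sub-images + 1 low pass = 10
```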


The mean and variance for each sub-image are then calculated. At each level, the wavelet transform is performed independently on the three colour planes of the image. Hence, the feature vector for each image has 3 × 10 × 2 = 60 dimensions. Clustering of images is performed in this 60-dimensional feature space. The feature vector will also be used to train the FFNN in the pre-processing of image retrieval. In addition, the feature vector could be used as one of the keys to identify each image in a DVL.
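The 60-dimensional feature vector (mean and variance of each of the 10 sub-images on each of the 3 colour planes) can be assembled as below; the sub-images here are random stand-ins and the function name is illustrative.

```python
import numpy as np

def feature_vector(planes):
    # planes: 3 colour planes, each a list of 10 wavelet sub-images
    feats = []
    for plane in planes:
        for sub in plane:
            feats.append(sub.mean())  # mean of the sub-image
            feats.append(sub.var())   # variance of the sub-image
    return np.array(feats)

# three colour planes with ten sub-images each (random stand-ins)
planes = [[np.random.rand(8, 8) for _ in range(10)] for _ in range(3)]
v = feature_vector(planes)  # 3 planes x 10 sub-images x 2 statistics = 60
```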


The Daubechies wavelet filter coefficients were used in the filters for feature extraction. The Daubechies wavelets consist of classes that range from the highly localised in space to classes that are very smooth. However, the purpose of classification in the system is to act as a pre-processing step for retrieval. Hence there was the need to avoid the creation of over-specialised classes. The Daubechies-8 class offers a good compromise between the high localisation of the Daubechies-2 class and the highly smooth Daubechies-10 class. The filter coefficients were obtained from [5].



4. CLUSTERING USING THE SOTM


The division of an image collection into subsets via clustering allows the search engine to look into a smaller but relevant subspace of images, instead of the whole DVL. The task of clustering is to group images that are closely related to one another based on generic characteristics.


The task is performed by a neural network called the Self-Organizing Tree Map (SOTM) [6], an extension of the well-known Self-Organizing Map (SOM). The SOTM is a tree-structured SOM which combines the best features of two popular classification (clustering) algorithms, the k-means method and the SOM. The SOTM has the ability to locate cluster centres like the k-means method, while preserving topological relations like the SOM. In the SOTM, a tree-like structure is mapped onto the two-dimensional neuron lattice of the SOM.


The extracted features from an image collection
obtained via the wavelet transform are stored in a file
and are used as input feature vectors for the SOTM.
User intervention is required in the clustering process
because the
SOTM is unable to dynamically determine
how many categories exist in an image collection. The
SOTM needs to know the number of clusters to be
formed before the clustering process begins.
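The SOTM implementation itself is not given in the paper; as a stand-in, the sketch below groups feature vectors into a preset number of classes with plain k-means, which the SOTM extends, and which shares the need to fix the number of clusters in advance.

```python
import numpy as np

def kmeans(X, k, iters=20):
    # stand-in for the SOTM: groups feature vectors into k classes;
    # k must be chosen beforehand, mirroring the SOTM's requirement
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centres = X[idx].astype(float)
    for _ in range(iters):
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)          # nearest-centre assignment
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return labels, centres

# two well-separated blobs of 60-D "feature vectors"
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (10, 60)),
               rng.normal(5.0, 0.1, (10, 60))])
labels, _ = kmeans(X, 2)
```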



For each image collection, clustering is performed several times using different numbers of classes. The clustering results were then viewed in a web browser to determine the number of classes that best suits a particular image collection. The number of classes to be formed should not be too large, because the creation of over-specialised classes would result in poorer performance by the MLP. Clustering an image collection into a large number of classes would produce classes with fewer images in each class compared to classifying the same image collection into a smaller number of classes. There would then be fewer images per class available to train the MLP to identify each class, resulting in poorer class identification performance by the MLP.


The final result of the clustering process is the formation of groups of images that are similar in terms of their feature vectors. Each image in an image collection will be assigned to a class. Hence, the whole image set has been divided into smaller subsets, and an image retrieval module can be directed to search for images in particular classes.



5. CLASS IDENTIFICATION USING THE MLP


The purpose of the MLP in the system is to identify the class (or classes) a user-defined query image belongs to. A search engine could then search the images that belong to the identified class instead of searching the entire image collection, resulting in faster image retrieval, since only a smaller sub-space will be searched by the standard retrieval process. The training and testing data sets for the MLP are obtained from the clustering results of the SOTM. Each image in the collection has been assigned to a class by the SOTM. Five sixths of the images were randomly set aside as training images and the remaining one sixth was used to test the class identification abilities of the MLP.
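The five-sixths / one-sixth split can be sketched as below. The function name and seed are illustrative, and simple integer division gives a 303/60 split for 363 images, slightly different from the 302/61 split reported in Table 3.

```python
import random

def split_train_test(items, seed=0):
    # five sixths for training, one sixth held out for testing
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = len(items) // 6
    return items[n_test:], items[:n_test]

# e.g. a collection of 363 images, identified by index
train, test = split_train_test(range(363))
```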


After the training is complete, the testing consists of presenting sample query images to the MLP. The output of the MLP is the class (or classes) to which the sample query image belongs. The matching class consists of images that are similar to the query image from the perspective of the feature vectors. A suitable image retriever to use with this system would be one that performs a search based on a sample query image presented to it (for example, a “query by image content” retriever).
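The paper does not specify the MLP architecture; the forward pass below is a generic single-hidden-layer sketch with assumed layer sizes (60 inputs from the feature vector, 12 outputs for a 12-class collection) and random, untrained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_classes = 60, 30, 12  # assumed sizes, not from the paper

W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_classes, n_hidden))
b2 = np.zeros(n_classes)

def mlp_forward(x):
    h = np.tanh(W1 @ x + b1)  # hidden layer activations
    z = W2 @ h + b2           # class scores
    e = np.exp(z - z.max())
    return e / e.sum()        # normalised class outputs

probs = mlp_forward(rng.normal(size=n_in))
best_class = probs.argmax()   # the single most likely class
```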



6. EXPERIMENTAL PROCEDURES AND RESULTS


The experiment was performed on a Pentium II-233 system with 64 MB of RAM running Red Hat Linux 6.2 with kernel 2.2.14-5.0. All code was written in ANSI C and compiled using GNU's gcc 2.91.66.


Images from the Mediagraphics Picture CD were used in this project. Before the wavelet transform could be applied to the images for feature extraction, the images were converted from the compressed JPEG format to the uncompressed Portable PixMap (PPM) format. The PPM format was chosen because the wavelet transform could be applied directly to the colour planes of PPM images. After the images were converted to the PPM format, the image colour space was then transformed from RGB to the opponent colour axes.


The Opponent Colour Space has a luminance channel and two chrominance channels. The conversion from the RGB colour space to OPP is given in equation (9).
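The matrix of equation (9) did not survive the source conversion, so the transform below uses a commonly used opponent-colour matrix purely as an assumed illustration (one luminance axis and two chrominance axes); the paper's exact coefficients may differ.

```python
import numpy as np

# Assumed opponent-like transform; NOT necessarily the paper's equation (9).
M = np.array([[1.0,  1.0,  1.0],   # C1: luminance   (R + G + B)
              [1.0, -1.0,  0.0],   # C2: red-green   (R - G)
              [1.0,  1.0, -2.0]])  # C3: yellow-blue (R + G - 2B)

def rgb_to_opp(rgb):
    # rgb: array with the colour channels along the last axis
    return rgb @ M.T

c = rgb_to_opp(np.array([0.5, 0.5, 0.5]))  # a grey pixel
```
For a grey pixel both chrominance channels come out zero, which is the defining property of an opponent-colour decomposition.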




The results of clustering by the SOTM show that similar images were grouped into the same class. Among the image collections used were the outdoor images, sunset images, texture images, nautical images and architectural images from the Mediagraphics CD. To increase the likelihood of retrieving an image from the proper image sub-space, the three closest matching classes, corresponding to the three largest outputs of the MLP, are passed on to the classical retrieval unit. Hence, the MLP identifies the correct sub-space a query image belongs to.
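Selecting the three closest matching classes from the MLP outputs amounts to a simple argsort; the output values below are hypothetical.

```python
import numpy as np

outputs = np.array([0.05, 0.40, 0.10, 0.25, 0.20])  # hypothetical MLP outputs
top3 = np.argsort(outputs)[-3:][::-1]  # three largest, most likely first
# only the classes in top3 are searched by the classical retrieval unit
```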



Image Collection    Lowest percentage of correct predictions made by MLP
                    (number of classes used: 1 / 2 / 3)

Background          57.4    80.3    82.0
Textures            42.9    65.4    71.4
Mountain            48.5    63.6    72.7

Table 1: Worst results obtained by the MLP in image identification performance




Image Collection    Highest percentage of correct predictions made by MLP
                    (number of classes used: 1 / 2 / 3)

Background          73.8    86.9    95.1
Textures            54.3    82.9    91.4
Mountain            63.6    78.8    87.9

Table 2: Best results obtained by the MLP in image identification performance



Quantity of            Image Collection
Image Type             Background    Textures    Mountains

Test Images                61            34           33
Training Images           302           170          164
Total Images              363           204          197
Number of classes          12            10           12

Table 3: The number of images and classes in the image collections



The results in Tables 1 and 2 show that the best correct identification rate that can be achieved using the largest output (one class) of the MLP is 73.8% (the Background image collection). If the two and three largest outputs of the MLP are used, the best results increase to 86.9% and 95.1%, respectively. Note that when the three most likely classes are included in the sub-space, the overall identification performance is

    (95.1 × 363 + 91.4 × 204 + 87.9 × 197) / (363 + 204 + 197) = 92.3%

The results show that including the three classes with the highest probability as the designated search sub-space ensures that over 90% of the relevant images will go through the further retrieval process, while the search space is reduced to about 25% of the whole image collection in the DVL. A reasonable compromise between retrieval accuracy and speed can be made in an image retrieval system by selecting the number of classes that are to be presented to the retriever for searching. Hence, including more classes in the search sub-space will further increase the likelihood of a correct retrieval result at the expense of a more computationally intensive process.
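The overall identification figure can be checked directly from the per-collection image counts (Table 3) and the best top-3 rates (Table 2):

```python
# image counts (Table 3) and best top-3 identification rates (Table 2)
counts = [363, 204, 197]    # Background, Textures, Mountain
rates = [95.1, 91.4, 87.9]  # percent correct with three classes

# count-weighted average over the three collections
overall = sum(c * r for c, r in zip(counts, rates)) / sum(counts)
```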




7. CONCLUSION AND FUTURE WORK


We have introduced a pre-processing method which is able to keep a significant number of relevant images for retrieval in a DVL, while substantially reducing the computational time required to retrieve the images. The wavelet transform has been shown to be effective as a method of feature extraction from colour images for image clustering using the SOTM. By using image classification as a pre-processing step, the retrieval process requires less computational time because only a smaller but relevant image subset of the whole image collection needs to be searched after the pre-processing step. Further work using an actual image retrieval module needs to be done to verify this conclusion, but the results from Tables 1 and 2 indicate that the proposed pre-processing step shows promise in reducing image retrieval time.


Future work would include using a classifier that can dynamically determine the required number of classes to be formed. To improve the correct class identification rate, modular neural networks or a committee machine should replace the MLP in classification. The adoption of the wavelet transform as the compression scheme in the JPEG2000 standard provides the interesting possibility of directly extracting the wavelet coefficients from JPEG2000 images, without the need to uncompress the images and perform the wavelet transform, which could further reduce retrieval time.





8. REFERENCES



[1] Zijun Yang and C.-C. Jay Kuo, “A Semantic Classification and Composite Indexing Approach to Robust Image Retrieval”, International Conference on Image Processing, Kobe, Japan, Oct. 25-28, 1999.

[2] M.K. Mandal, T. Aboulnasr and S. Panchanathan, “Image Indexing Using Moments and Wavelets”, IEEE Transactions on Consumer Electronics, Vol. 42, No. 3, August 1996.

[3] Elif Albuz, Erturk Kocala and Ashfaq A. Khokhar, “Scalable Image Indexing and Retrieval using Wavelets”, IEEE Transactions on Knowledge and Data Engineering, to appear 1999.

[4] James Ze Wang, Gio Wiederhold, Oscar Firschein and Sha Xin Wei, “Wavelet-based Image Indexing Techniques with Partial Sketch Retrieval Capability”, Proceedings of the Fourth Forum on Research and Technology Advances in Digital Libraries, 1997.

[5] I. Daubechies, “Orthonormal bases of compactly supported wavelets”, Communications on Pure and Applied Mathematics, Vol. 41, pp. 909-996, 1988.

[6] Haosong Kong and Ling Guan, “Self-organising tree map for eliminating impulse noise with random intensity distributions”, Journal of Electronic Imaging, Vol. 7(1), January 1998.