Object Recognition Using a Neural Network and Invariant Zernike Moments



Kumar Srijan (200602015)

Syed Ahsan (200601096)




Problem Statement



To create a neural-network-based multiclass object classifier which can perform rotation-, scale-, and translation-invariant object recognition.




Translation Invariance

Scale Invariance

Rotation Invariance

Translation, Rotation and Scale Invariance

Solution


Normalize the image so that scaled and translated
images look the same.


Extract rotation-invariant features from the images.


Create a classifier based on these features.








All these images are the same if we consider scale and translation invariance: each one normalizes to the same canonical image.


This is accomplished by normalization with respect to the first two orders of geometric moments.


The geometric moments of an image f(x, y) are defined as:

M_pq = Σ_x Σ_y x^p · y^q · f(x, y)

where p + q gives the order of the moment.


The zeroth-order moment, M00, represents the total mass of the image.


The two first-order moments, (M10, M01), provide the position of the center of mass.
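As a minimal sketch of how these moments are computed (assuming the image is a NumPy array; the toy image and the pixel-coordinate convention are illustrative, not from the slides):

    import numpy as np

    def geometric_moment(img, p, q):
        # M_pq = sum over all pixels of x^p * y^q * f(x, y)
        ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
        return np.sum((xs ** p) * (ys ** q) * img)

    img = np.zeros((64, 64)); img[20:40, 10:30] = 1.0   # toy binary image
    M00 = geometric_moment(img, 0, 0)                   # total mass
    xc = geometric_moment(img, 1, 0) / M00              # centroid x = M10/M00
    yc = geometric_moment(img, 0, 1) / M00              # centroid y = M01/M00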



Translation Invariance


Translation invariance is achieved by transforming the image into a new one whose first-order moments, M10 and M01, are both equal to zero.


So, we transform the original image f(x, y) into:

f(x + x', y + y')

where x' = M10/M00 and y' = M01/M00.
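One way to realize this in practice is sketched below: rather than resampling f(x + x', y + y') about the origin, the sketch shifts the center of mass to the middle of the pixel grid (the choice of canonical position, and the use of scipy.ndimage.shift with linear interpolation, are assumptions for illustration):

    import numpy as np
    from scipy.ndimage import shift

    def center_image(img):
        # Move the center of mass (x' = M10/M00, y' = M01/M00) to the
        # image center, zeroing the first-order moments about that point.
        ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
        M00 = img.sum()
        xc, yc = (xs * img).sum() / M00, (ys * img).sum() / M00
        return shift(img, ((img.shape[0] - 1) / 2 - yc,
                           (img.shape[1] - 1) / 2 - xc), order=1)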

Scale Invariance


Enlarge or reduce the object such that its zeroth-order moment, M00, is set equal to a predetermined value β.


This is done after making the image translation invariant.


Done by changing the image to a new function:

f(x/a, y/a)

where a = sqrt(β/M00).
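A minimal sketch of this step (the target mass β = 400 is an arbitrary illustrative value; the slides leave β unspecified):

    import numpy as np
    from scipy.ndimage import zoom

    def scale_normalize(img, beta=400.0):
        # Resample to f(x/a, y/a) with a = sqrt(beta / M00); for a binary
        # image this multiplies the mass M00 by roughly a^2, making it beta.
        a = np.sqrt(beta / img.sum())
        return zoom(img, a, order=1)    # enlarges if a > 1, shrinks if a < 1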



Rotation Invariance


The Zernike moments of an image f(x, y) are defined over the unit disk (x^2 + y^2 <= 1) as:

Z_nm = ((n + 1)/π) · Σ_x Σ_y f(x, y) · V*_nm(ρ, θ)

where V_nm(ρ, θ) = R_nm(ρ) · exp(jmθ) is the Zernike basis function, R_nm is the radial polynomial, and * denotes complex conjugation.


Here, n represents the order and m represents the repetition; valid repetitions satisfy |m| <= n with n - |m| even.


For n = 5, the valid values of m are: -5, -3, -1, 1, 3, and 5.
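The rule behind that list can be enumerated directly; a small sketch:

    def valid_repetitions(n):
        # repetitions m for order n: |m| <= n and n - |m| even
        return [m for m in range(-n, n + 1) if (n - abs(m)) % 2 == 0]

    print(valid_repetitions(5))    # [-5, -3, -1, 1, 3, 5], as on the slide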



Rotation Invariance


Now suppose the image is rotated by an angle φ. The Zernike moment of the rotated image is then:

Z'_nm = Z_nm · exp(-jmφ)

so its magnitude is unchanged: |Z'_nm| = |Z_nm|.


Thus, |Z_nm| can be taken as a rotation-invariant feature of the underlying image function.
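A self-contained sketch of computing Z_nm on a square image mapped onto the unit disk, followed by a rotation check; the toy image, the test order (n, m) = (4, 2), and the 40-degree angle are illustrative assumptions:

    import numpy as np
    from math import factorial
    from scipy.ndimage import rotate

    def radial_poly(n, m, rho):
        # Zernike radial polynomial R_nm(rho)
        m = abs(m)
        R = np.zeros_like(rho)
        for s in range((n - m) // 2 + 1):
            c = ((-1) ** s * factorial(n - s)
                 / (factorial(s) * factorial((n + m) // 2 - s)
                                 * factorial((n - m) // 2 - s)))
            R += c * rho ** (n - 2 * s)
        return R

    def zernike_moment(img, n, m):
        # Z_nm = ((n + 1)/pi) * sum over the unit disk of f * conj(V_nm)
        N = img.shape[0]
        ys, xs = np.mgrid[0:N, 0:N]
        x = 2.0 * xs / (N - 1) - 1.0     # map the pixel grid to [-1, 1]^2
        y = 2.0 * ys / (N - 1) - 1.0
        rho, theta = np.hypot(x, y), np.arctan2(y, x)
        disk = rho <= 1.0                # the basis is defined on the disk
        V = radial_poly(n, m, rho) * np.exp(1j * m * theta)
        return (n + 1) / np.pi * np.sum(img[disk] * np.conj(V[disk]))

    img = np.zeros((65, 65)); img[22:42, 18:30] = 1.0    # toy binary image
    z0 = abs(zernike_moment(img, 4, 2))
    z1 = abs(zernike_moment(rotate(img, 40, reshape=False, order=1), 4, 2))
    # z0 and z1 agree up to resampling error, illustrating the invariance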



Feature Extraction


Binarize the image first according to some threshold.


Normalize it to make it translation invariant and scale invariant; call the normalized image g(x, y).


Calculate the Zernike moments of g(x, y) from 2nd order to nth order (since the 0th-order moment equals β/π and the first-order moments equal 0 for all images after they are made scale and translation invariant, those moments carry no discriminative information).
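Putting the steps together, the sketch below keeps one |Z_nm| feature per non-negative repetition (|Z_n,-m| = |Z_nm|, so negative repetitions add nothing new) and reuses the center_image, scale_normalize, and zernike_moment helpers sketched earlier; the 0.5 binarization threshold is an assumption:

    import numpy as np

    def feature_indices(max_order):
        # (n, m) pairs with 2 <= n <= max_order, m >= 0, n - m even;
        # max_order = 12 yields the 47 features used in Experiment I below
        return [(n, m) for n in range(2, max_order + 1)
                       for m in range(0, n + 1) if (n - m) % 2 == 0]

    def extract_features(img, max_order=12, beta=400.0):
        g = (img > 0.5).astype(float)                # 1. binarize
        g = scale_normalize(center_image(g), beta)   # 2. translate, then scale
        return np.array([abs(zernike_moment(g, n, m))
                         for n, m in feature_indices(max_order)])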

Classification


It is done using a multilayer neural network.



In this case, we used only one hidden layer.



Backpropagation of error is used for learning.

Classifier Details


The activation function used at both the output layer and the hidden layer is the symmetric sigmoid function.


(Figure: the standard sigmoid function.)

Classifier Details


The symmetric sigmoid is simply the standard sigmoid stretched so that the y range is 2 and then shifted down by 1, so that it ranges between -1 and 1.


If f(x) is the standard sigmoid, then the symmetric sigmoid is:

g(x) = 2*f(x) - 1

which makes it symmetric about the origin.
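In code, with the algebraic identity g(x) = 2*f(x) - 1 = tanh(x/2) included as a sanity check:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def symmetric_sigmoid(x):
        # g(x) = 2*f(x) - 1: ranges over (-1, 1), odd about the origin
        return 2.0 * sigmoid(x) - 1.0

    xs = np.linspace(-5, 5, 11)
    assert np.allclose(symmetric_sigmoid(xs), np.tanh(xs / 2))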

Classifier Details


26 nodes in the output layer, one for each letter of the alphabet.


Number of hidden layer nodes can be varied.


Number of input layer nodes is equal to the length of the feature vector.

Training of the Classifier


We have used the Back Propagation Algorithm.


Initialize all weights W_ij to small random values.


Present an input from class m and specify the desired output. The desired output is -1 for all the output nodes except the m-th node, which is 1.


Calculate the actual outputs of all the nodes using the present values of W_ij.


This is done by mapping the total input at each node through the symmetric sigmoid function.


Training of the Classifier


Find an error term, δ_j, for all the nodes.


If d_j and y_j stand for the desired and actual values of a node respectively, then for an output node:

δ_j = (d_j - y_j) · g'(net_j)


and for a hidden layer node:

δ_j = g'(net_j) · Σ_k δ_k · W_kj

where g' is the derivative of the activation function, net_j is the total input to node j, and k runs over all nodes in the layer above node j.


Training of the Classifier


Now, we adjust the weights by:

W_ij(n+1) = W_ij(n) + α · δ_j · y_i + ζ · (W_ij(n) - W_ij(n-1))

where (n+1), (n), and (n-1) index the next, present, and previous steps respectively, and y_i is the output of node i feeding node j. α is a learning rate similar to the step size in gradient search algorithms.


ζ is a constant between 0 and 1 which determines the effect of past weight changes on the current direction of movement in weight space.


All the training inputs are presented cyclically until
weights stabilize.
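The whole procedure fits in a short sketch. The layer sizes follow the slides (47 input features, 50 hidden nodes, 26 outputs); the values α = 0.1 and ζ = 0.5, the initial weight range, the stand-in data, and the omission of bias terms are illustrative assumptions:

    import numpy as np

    def g(x):                     # symmetric sigmoid activation
        return 2.0 / (1.0 + np.exp(-x)) - 1.0

    def g_prime_from_output(y):   # g'(x) expressed via y = g(x): (1 - y^2)/2
        return 0.5 * (1.0 - y * y)

    def train_step(x, d, W1, W2, dW1, dW2, alpha=0.1, zeta=0.5):
        h = g(W1 @ x)                                   # hidden outputs
        y = g(W2 @ h)                                   # network outputs
        delta_out = (d - y) * g_prime_from_output(y)    # output error terms
        delta_hid = g_prime_from_output(h) * (W2.T @ delta_out)
        # W(n+1) = W(n) + alpha*delta_j*y_i + zeta*(W(n) - W(n-1))
        dW2 = alpha * np.outer(delta_out, h) + zeta * dW2
        dW1 = alpha * np.outer(delta_hid, x) + zeta * dW1
        return W1 + dW1, W2 + dW2, dW1, dW2

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 47, 50, 26                 # sizes from the slides
    W1 = rng.uniform(-0.1, 0.1, (n_hid, n_in))      # small random weights
    W2 = rng.uniform(-0.1, 0.1, (n_out, n_hid))
    dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)

    x = rng.random(n_in)                  # one stand-in feature vector
    d = -np.ones(n_out); d[3] = 1.0       # desired: 1 at class m, -1 elsewhere
    for _ in range(100):                  # present cyclically until stable
        W1, W2, dW1, dW2 = train_step(x, d, W1, W2, dW1, dW2)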


Experimentation


We trained the classifier using 4 images of each of the letters of the
alphabet.


A sample of training data:




Zernike moments from 2nd to nth order were calculated for all images and treated as feature vectors.

Testing


Similar kinds of images, with translation, scale, and rotation variations, were used for testing.


Some images used in testing:





A total of 104 images, 4 for each letter of the alphabet, with various scale, translation, and rotation variations, were used as testing data.


Results


EXPERIMENT I

Keeping the number of features fixed at 47 (2nd to 12th order), results for varying the number of hidden layer nodes.


Results

Results


EXPERIMENT II

Keeping the number of hidden nodes fixed at 50 and varying the length of the feature vector.


Results

Inferences


Good classification can be achieved even with a single hidden layer.


Use of very few hidden layer nodes may lead to a very bad classifier.


Beyond a limit, the performance gained by increasing the number of hidden layer nodes saturates.


For very good classification, one must have a sufficient number of features.


Use of too many features does not guarantee a better classifier.