A NEURAL NETWORK FOR CALCULATING ADAPTIVE SHIFT AND ROTATION INVARIANT IMAGE FEATURES

Sabine Kröner
Technische Informatik I
Technische Universität Hamburg-Harburg
Hamburg, Germany
ABSTRACT

Shift and rotation invariant pattern recognition is usually performed by first extracting invariant features from the images and second classifying them. This poses the problem of finding not only suitable features but also a suitable classifier.
Here a structured invariant neural network architecture (SINN) is presented that performs adaptive invariant feature extraction and classification simultaneously. The network is sparsely connected and uses shared weight vectors. As a result, features especially well suited for a given application are calculated with a computational complexity of O(N) for N = 2^n input elements. Experiments show the recognition ability of the invariant neural network on synthetic and real data.
1 Introduction

Most image processing systems for shift and rotation invariant pattern recognition achieve their invariance property by first calculating invariant features and second classifying them with either standard classifiers or neural networks. For the feature extraction, usually general methods like moment invariants or geometric invariants are used.
However, the computation of the invariants with these methods turns out to be very costly, since the calculated features are not adapted to the patterns in the present application. Often the separation quality of each feature for the underlying pattern set is not known in advance; this may lead to the calculation of numerous unsuitable features.
The problem can be overcome by calculating adaptive features which are especially appropriate for a given application. Here a neural network architecture is presented that performs this task. From the patterns, adaptive shift and rotation invariant features are extracted and classified simultaneously. The invariance property is built into the network architecture by the use of shared weight vectors and a sparse connection structure.
The paper is organized as follows. First, the architecture of the structured invariant neural network (SINN) for the calculation of shift and rotation invariant image features is presented. Then the realization of the node functions is explained, together with a learning algorithm for the weights. Experiments show the recognition ability of the invariant neural network compared to a standard feature extraction method using geometric invariants on synthetic and real data. Finally, the results are summarized in a conclusion.
2 The Network Architecture

The architecture of the shift and rotation invariant neural network is a feedforward architecture resembling a binary tree. Its size is determined by the number of input elements: it has ld N layers for an input vector of N = 2^n elements. The realization for shift invariance only corresponds to one branch in the connection structure of the signal flow graph of the shift invariant transforms of the class C_T. Fig. 1 shows the one-dimensional shift invariant network architecture.
Figure 1: Architecture of the structured shift invariant neural network for 8 input elements (input pattern m_1, ..., m_8; layers of shared node functions f_1, f_2, f_3; output F(m)).
The sparsely connected network architecture, with an in-degree of two for all nodes, is necessary for the invariance property of the network output. Another prerequisite for the invariance is the symmetry of the node functions f_i. Moreover, all nodes of the same layer are coupled: the nodes of one layer share all their weights and also their transfer functions. Therefore the action of layer i in this architecture can be described completely by the node function f_i. Due to this definition of the architecture, the network is called structured invariant neural network (SINN).

Figure 2: Connection of four input elements on layer i in a structured shift and 90-degree rotation invariant network.
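To make the layer-wise reduction concrete, the following minimal Python sketch propagates a pattern through such a binary tree of shared node functions. It illustrates only the structure described above, not the original implementation: the exact wiring of Fig. 1 is not reproduced here, so the sketch simply pairs neighbouring elements on every layer (an assumption), and the node functions f_i are passed in as callables; a concrete parametric form follows in Section 3.

def sinn_forward(pattern, node_functions):
    """Reduce a pattern of N = 2**n elements to a single output F(m).

    pattern        : list of N = 2**n input elements m_1, ..., m_N
    node_functions : list of n symmetric functions f_i(a, b), one per layer;
                     all nodes of one layer share the same f_i (weight coupling)
    """
    values = list(pattern)
    assert len(values) & (len(values) - 1) == 0, "N must be a power of two"
    for f_i in node_functions:                     # one shared node function per layer
        values = [f_i(values[j], values[j + 1])    # in-degree two for every node
                  for j in range(0, len(values), 2)]
    return values[0]                               # the network output F(m)

# Toy example: a symmetric (commutative) node function and N = 8 inputs.
f = lambda a, b: abs(a - b) + a * b                # symmetric in its two arguments
m = [0.1, 0.4, 0.3, 0.2, 0.9, 0.5, 0.7, 0.6]       # N = 8, hence n = 3 layers
print(sinn_forward(m, [f, f, f]))

Since the tree contains N - 1 nodes, one forward pass needs O(N) node evaluations, in line with the complexity stated in the abstract.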
The network architecture for one-dimensional patterns can be extended to an architecture for the recognition of two-dimensional patterns. With the connection structure shown in Fig. 2, additional invariance with respect to rotations by multiples of 90 degrees is integrated into the same architecture without any extra cost. For invariance with respect to shifts and general rotations, the network has to be implemented in several different angular orientations, so that the networks cover the full circle of 360 degrees regularly. Because of the good generalization ability of the network, it is sufficient for most applications to implement the two-dimensional shift and 90-degree rotation invariant network in the four angular orientations of 0, 22.5, 45, and 67.5 degrees. After the input has been processed in parallel by the four networks, the results are combined to the final output, again using the connection structure of Fig. 2.
The invariance property of the architecture is achieved stepwise: it can be shown that it increases layer by layer up to the final shift and rotation invariance of the image. The proof is based on results from the theory of groups.
3 The Node Functions

The main requirement for the node functions in the SINN is symmetry with respect to the two inputs. It can be met by the use of twin nodes performing commutative calculations with respect to the weights and the input arguments. A possible realization of the node function f_i in the neural nodes of layer i is shown in Fig. 3, with the equation
  f_i(m_1, m_2) = u_i( w_i3 * t_i( w_i1 * m_1 + w_i2 * m_2 + w_i0 )
                     + w_i3 * t_i( w_i1 * m_2 + w_i2 * m_1 + w_i0 ) ),

where the w_ij are the weights on the connection links, w_i0 is the threshold, and t_i and u_i are hard limiter or sigmoid transfer functions.

Figure 3: Node function f_i: the inputs m_1 and m_2 are processed by two coupled twin nodes with transfer function t_i, whose outputs are combined by the transfer function u_i.
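As an illustration, the following Python sketch implements the node function of the equation above. The subscript labelling (w_i0 as threshold, w_i1 and w_i2 as input weights, w_i3 as the outer weight) follows the reconstruction given there, and the hard limiter and sigmoid below are generic textbook choices rather than necessarily the exact functions used in the experiments.

import math

def hard_limiter(x):
    """Binary threshold function, one common choice for t_i and u_i."""
    return 1.0 if x >= 0.0 else 0.0

def sigmoid(x):
    """Differentiable alternative transfer function."""
    return 1.0 / (1.0 + math.exp(-x))

def make_node_function(w_i0, w_i1, w_i2, w_i3, t_i=hard_limiter, u_i=hard_limiter):
    """Build f_i(m_1, m_2) from two weight-sharing twin nodes.

    Swapping m_1 and m_2 only exchanges the two t_i branches, so the sum fed
    into u_i, and hence f_i itself, is symmetric in the two inputs.
    """
    def f_i(m_1, m_2):
        branch_1 = t_i(w_i1 * m_1 + w_i2 * m_2 + w_i0)
        branch_2 = t_i(w_i1 * m_2 + w_i2 * m_1 + w_i0)
        return u_i(w_i3 * branch_1 + w_i3 * branch_2)
    return f_i

# Example: the output does not change when the two inputs are exchanged.
f_1 = make_node_function(w_i0=-0.5, w_i1=1.0, w_i2=-1.0, w_i3=0.5)
print(f_1(0.2, 0.8), f_1(0.8, 0.2))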
When using hard limiter transfer functions, only two pattern classes can be distinguished by the network output. So for separating k classes, at least k networks are needed, one for the recognition of each class. The result is a binary output vector with a one in the entry that corresponds to the recognized pattern and zeros in all other entries.
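A minimal sketch of this one-network-per-class scheme, assuming each trained SINN is available as a callable that returns its binary output (all names below are illustrative):

def classify(pattern, class_networks):
    """One-vs-rest decision with k trained SINNs and hard limiter outputs.

    class_networks : list of k callables, each mapping a pattern to 0 or 1.
    Returns the k-dimensional binary output vector; for a well separated
    pattern exactly one entry is 1.
    """
    return [int(network(pattern)) for network in class_networks]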
4 The Learning Algorithm

Assume two pattern classes A and B. The weights have to be chosen so that the percentage of misclassified patterns of both classes A and B is minimized in the last layer n. So the error function E can be formulated as

  E = min over all weights of
      ( (number of misclassified patterns of class A) / (number of patterns of class A)
      + (number of misclassified patterns of class B) / (number of patterns of class B) ).
For the training of the weights, a global learning algorithm like a backpropagation algorithm modified for the training of structured architectures can be used if differentiable transfer functions are given.
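For illustration, the quantity minimized in E can be written as a small Python helper for a fixed weight setting; training is then a search over weight choices. The convention that network output 1 stands for class A and 0 for class B is an assumption, as are the names used below.

def error_E(network, patterns_A, patterns_B):
    """Sum of the misclassification rates of classes A and B for fixed weights.

    network : callable mapping a pattern to the binary network output,
              where 1 is read as "class A" and 0 as "class B"
    """
    wrong_A = sum(1 for p in patterns_A if network(p) != 1)
    wrong_B = sum(1 for p in patterns_B if network(p) != 0)
    return wrong_A / len(patterns_A) + wrong_B / len(patterns_B)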
Another possibility are local optimization techniques. Such methods are applicable since the strong weight coupling greatly reduces the dimension of the input space. In fact, the dimension of the input space is determined by the in-degree of the nodes in each layer. So instead of representing an N-dimensional pattern as a point in an N-dimensional input space, it is equivalent, with the architecture of a SINN, to represent it as N/2 points in a two-dimensional input space. With the symmetric node functions, all cyclic permutations of the input elements can be represented in parallel in the same input space. Different patterns are marked as different sets of points in this space. The weights associated with the node function of a layer then determine the form of the decision line between the points of the two pattern classes. This means that only the few parameters of one node function have to be adjusted on every layer.
The network training consists in successively determining the weights so that as many points of patterns of class A as possible are separated from those of class B by the decision line of the current layer. Binary transfer functions limit the number of weight combinations of the w_ij which result in different outputs to six on all layers i with 2 <= i <= n. It can be shown that the weight combinations giving the minimum of the local error functions also lead to a minimum of the global error function on layer n. On the first layer, where the input elements are taken from the original grey value interval, the evaluation of more than six weight combinations is suitable; here also quadratic decision lines can be applied.
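The following Python sketch illustrates this local, layer-by-layer search under explicit assumptions: the specific six weight combinations are not reproduced here, so the candidate set below is purely illustrative; neighbouring elements are paired as in the earlier forward-pass sketch; and the local error simply asks the node output to be 1 for points of class A and 0 for points of class B.

from itertools import product

def train_sinn_greedy(patterns_A, patterns_B, n_layers, candidates=None):
    """Greedy layer-wise weight selection (illustrative sketch).

    Every pattern of N elements is viewed as N/2 points (m_1, m_2) in a 2-D
    space.  For each layer, a small candidate set of weight tuples
    (w0, w1, w2, w3) for the shared node function is evaluated, the tuple with
    the smallest local error is kept, and the chosen node function maps each
    pattern to the next layer's (half as long) pattern.
    """
    if candidates is None:
        # Illustrative candidate set, not the six combinations of the paper.
        candidates = [(w0, w1, w2, 1.0)
                      for w0 in (-0.5, 0.0, 0.5)
                      for w1, w2 in product((-1.0, 1.0), repeat=2)]

    step = lambda x: 1.0 if x >= 0.0 else 0.0        # hard limiter

    def node_fn(w):
        w0, w1, w2, w3 = w
        return lambda a, b: step(w3 * (step(w1 * a + w2 * b + w0)
                                       + step(w1 * b + w2 * a + w0)))

    def halve(pattern, f):                           # apply one layer
        return [f(pattern[j], pattern[j + 1]) for j in range(0, len(pattern), 2)]

    def local_error(f, A, B):                        # fraction of misplaced points
        pts_A = [(p[j], p[j + 1]) for p in A for j in range(0, len(p), 2)]
        pts_B = [(p[j], p[j + 1]) for p in B for j in range(0, len(p), 2)]
        return (sum(f(a, b) != 1.0 for a, b in pts_A) / len(pts_A)
                + sum(f(a, b) != 0.0 for a, b in pts_B) / len(pts_B))

    chosen = []
    for _ in range(n_layers):
        best = min(candidates,
                   key=lambda w: local_error(node_fn(w), patterns_A, patterns_B))
        chosen.append(best)
        f = node_fn(best)
        patterns_A = [halve(p, f) for p in patterns_A]
        patterns_B = [halve(p, f) for p in patterns_B]
    return chosen                                    # one weight tuple per layer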
Once the weights are trained, the computational complexity for recognizing unknown patterns is of order O(N).
5 Experiments

First, the robustness of the adaptive features calculated with the neural network with respect to statistical pattern distortions is investigated on synthetic grey scale images. Four grey value images are used (Fig. 4): one with constant grey value, the others showing rectangles of different size; they are all equal with respect to the sum of their grey values. Noisy images are generated by adding Gaussian noise with zero mean and a variance leading to a fixed signal-to-noise ratio. The recognition is performed with four shift and 90-degree rotation invariant SINNs. The weights are determined by training on the original patterns only.
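As a side note, the noise variance belonging to a prescribed SNR can be derived as in the sketch below. The SNR definition used (mean signal power over noise variance, expressed in dB) and the use of NumPy are assumptions; the paper does not state its exact convention here.

import numpy as np

def add_noise_for_snr(image, snr_db, rng=None):
    """Add zero-mean Gaussian noise so that the image reaches a target SNR.

    Assumes SNR = 10 * log10(mean signal power / noise variance); a different
    convention only changes the first line of the computation.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(np.asarray(image, dtype=float) ** 2)
    noise_var = signal_power / (10.0 ** (snr_db / 10.0))
    return image + rng.normal(0.0, np.sqrt(noise_var), size=np.shape(image))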
The results are compared with the standard feature extraction method of invariant integration followed by classification. Here, twelve different shift and rotation invariant grey scale features based on monomials of up to order three are used. They are evaluated with the Bayes classifier (under the assumption that the features are class-wise normally distributed), with the weighted nearest neighbour classifier (WNC), i.e. the Euclidean distance weighted with the variance of the features, and with the nearest neighbour classifier (NC), i.e. the simple Euclidean distance, respectively. The training set for the three classifiers consisted of several images per class, including noisy images with different SNR. It has to be emphasized that no preprocessing (e.g. smoothing, segmentation) is performed on the images, neither for the SINN nor for the method of invariant integration.
Figure 4: Synthetic images Q1, Q2, Q3, and Q4 with equal sums of grey values.

The upper part of Table 1 shows the recognition rates for the different signal-to-noise ratios (SNR) achieved by using the SINN, whereas the lower part shows the recognition rates for the three different classifiers based on the invariant grey scale features. Table 1 shows that, due to the adaptivity of the SINNs, the classification of the four patterns from Fig. 4 is very robust with respect to statistical pattern distortions. Using the grey scale features together with the nearest neighbour classifier (NC) or the weighted nearest neighbour classifier (WNC) does not allow a reliable separation of the classes. However, the Bayes classifier together with the grey scale features also permits a robust separation of the classes.
In the second experiment, the invariance property of the SINN is shown with respect to arbitrary shifts and rotations. Moreover, the separation ability is demonstrated in comparison to the standard feature extraction method. For this experiment, subimages from scanned grey scale images belonging to four different classes are used. Each class consists of arbitrarily shifted and rotated versions of the same object; Fig. 5 shows one example image of each class.
The background is modified with Gaussian noise so that the mean value of the images is approximately the same; otherwise the separation problem could be solved just by comparing the mean grey values. Again, only one image of each class is used for training the four different structured shift and rotation invariant neural networks. In the first layer a quadratic decision line is applied, and in all other layers a linear one. Internally, a representation with four different rotation angles is used, as described in Section 2. Although these representations do not coincide with all the occurring rotation angles and shifted positions, the recognition rates on the test set in Table 2 show that the network has a good generalization ability with respect to general rotations and shifts.
Table 1: Recognition rates in percent for the patterns shown in Fig. 4, for the structured shift and 90-degree rotation invariant neural network (SINN, upper part) and for the invariant grey scale features evaluated with the NC, WNC, and Bayes classifiers (lower part), at different signal-to-noise ratios.

These results are compared with the same standard feature extraction method of invariant integration as in the first experiment.
Two sets of features are used: one based on monomials of up to order two, and one based on monomials of up to order three. Since the number of images is too small to train a Bayes classifier, recognition rates are only presented for the nearest neighbour (NN) classifiers (Table 2); their training set consisted of three images per class.
The recognition rates show that a complete separation of the images can only be achieved with the monomials of order three when the method of invariant integration is used. The computational costs for the calculation of these features are rather high, especially because the selection of features needed for separation is not known in advance.
Figure 5: Scanned grey value images of apple, pear, mushroom, and tomato.
Table 2: Recognition rates in percent for the patterns shown in Fig. 5 (apple, pear, mushroom, tomato), for the structured shift and rotation invariant neural network and for the two sets of invariant grey scale features evaluated with the NC classifier.
If a SINN is used instead, good separation results are achieved at much lower cost. Moreover, there is no need to find a suitable classifier.
6 Conclusion

A neural network has been presented for the calculation and simultaneous classification of cyclic shift and rotation invariant image features. The invariance property is achieved by a sparsely connected, structured invariant neural network architecture with layer-wise coupled node functions. During the learning process the whole network is adapted to the characteristics of the pattern set in a given application. This reduces the total number of invariant features necessary for the separation of the patterns and leads to better robustness with respect to global disturbances. Moreover, through the simultaneous adaptive feature extraction and classification, the usual problem of suitably choosing the feature extraction method and designing the classifier to obtain good recognition results is avoided. Present investigations focus on the development of enhanced node functions and the incorporation of higher order characteristics into the network architecture.
References

[1] H. Burkhardt, Transformationen zur lageinvarianten Merkmalgewinnung. VDI-Fortschrittbericht, Reihe 10, Nr. 7, Oct. 1979.
[2] H. Schulz-Mirbach, "Invariant features for gray scale images," in Mustererkennung (DAGM-Symposium), G. Sagerer, S. Posch, F. Kummert (Eds.), Springer-Verlag, Bielefeld, Sept. 1995.
[3] S. Kröner, H. Schulz-Mirbach, "Adaptive features for position invariant pattern recognition," in Mustererkennung (DAGM-Symposium), G. Sagerer, S. Posch, F. Kummert (Eds.), Springer-Verlag, Bielefeld, Sept. 1995.
[4] D. E. Rumelhart et al., "Learning internal representations by error propagation," in Parallel Distributed Processing, Vol. 1, Ch. 8, D. E. Rumelhart, J. L. McClelland (Eds.), Cambridge, MA: MIT Press, 1986.