A NEURAL NETWORK FOR CALCULATING ADAPTIVE SHIFT AND ROTATION INVARIANT IMAGE FEATURES

Sabine Kröner

Technische Informatik I

Technische Universität Hamburg-Harburg

Hamburg, Germany

Tel./Fax

email kroenertu ha rb urg d de

ABSTRACT

Shift and rotation invariant pattern recognition is usually performed by first extracting invariant features from the images and second classifying them. This poses the problem of finding not only suitable features but also a suitable classifier.

Here a structured invariant neural network architecture (SINN) is presented that performs adaptive invariant feature extraction and classification simultaneously. The network is sparsely connected and uses shared weight vectors. As a result, features especially well suited for a given application are calculated with a computational complexity of O(N) for N = 2^n input elements. Experiments show the recognition ability of the invariant neural network on synthetic and real data.

Introduction

Most image processing systems for shift and rotation invariant pattern recognition achieve their invariance property by first calculating invariant features and second classifying them with either standard classifiers or neural networks. For the feature extraction, general methods like moment invariants or geometric invariants are usually used.

However, the computation of the invariants with these methods turns out to be very costly, since the calculated features are not adapted to the patterns in the present application. Often the separation quality of each feature for the underlying pattern set is not known in advance; this may lead to the calculation of numerous unsuitable features.

The problem can be overcome by calculating adaptive features which are especially appropriate for a given application. Here a neural network architecture is presented that performs this task. From the patterns, adaptive shift and rotation invariant features are extracted and classified simultaneously. The invariance property is built into the network architecture by use of shared weight vectors and a sparse connection structure.

The paper is organized as follows. First the architecture of the structured invariant neural network (SINN) for the calculation of shift and rotation invariant image features is presented. Then the realization of the node functions is explained together with a learning algorithm for the weights. Experiments show the recognition ability of the invariant neural network compared to a standard feature extraction method using geometric invariants on synthetic and real data. Finally the results are summarized in a conclusion.

The Network Architecture

The architecture of the shift and rotation invariant neural network is a feedforward architecture resembling a binary tree. Its size is determined by the number of input elements: it has ld N layers for an input vector of N = 2^n elements. The realization for shift invariance only corresponds to one branch in the connection structure of the signal flow graph of the shift invariant transforms of the class CT. Fig. 1 shows the one-dimensional shift invariant network architecture. The sparse connection structure, in which every node has in-degree two, is necessary for the invariance property of the network output. Another prerequisite for the invariance is the symmetry of the node functions f_i. Moreover, all nodes of the same layer are coupled. This means that the nodes of one layer share all their weights and also their transfer functions. Therefore the action of layer i in this architecture can be completely described by the node function f_i. Due to this definition of the architecture the network is called structured invariant neural network (SINN).

Figure 1: Architecture of the structured shift invariant neural network for 8 input elements (input pattern m, node functions f on the successive layers, output F(m)).

Figure 2: Connection of four input elements on layer i in a structured shift and 90-degree rotation invariant network.
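The one-dimensional architecture described above can be illustrated with a minimal sketch. It assumes that each layer pairs element k with element k + N/2 (the pairing used in fast shift invariant transforms; the paper's figure is not recoverable here, so this wiring is an assumption), and the three node functions are arbitrary symmetric stand-ins, not trained SINN nodes. The name `sinn_forward` is hypothetical.

```python
from typing import Callable, Sequence

NodeFn = Callable[[int, int], int]


def sinn_forward(pattern: Sequence[int], node_fns: Sequence[NodeFn]) -> int:
    """Evaluate a binary-tree network with one shared node function per layer.

    Assumed wiring: on each layer, element k is combined with element
    k + half, so every layer halves the number of values.
    """
    values = list(pattern)
    if len(values) != 2 ** len(node_fns):
        raise ValueError("need N = 2**n input elements for n layers")
    for f in node_fns:
        half = len(values) // 2
        values = [f(values[k], values[k + half]) for k in range(half)]
    return values[0]


# Illustrative symmetric node functions (f(a, b) == f(b, a)), one per layer.
layers = [
    lambda a, b: abs(a - b),
    lambda a, b: a + b,
    lambda a, b: a * b,
]

pattern = [3, 1, 4, 1, 5, 9, 2, 6]
# Outputs for all eight cyclic shifts of the input pattern.
outputs = {sinn_forward(pattern[s:] + pattern[:s], layers) for s in range(8)}
print(outputs)  # a single value: the output is invariant to cyclic shifts
```

Because every node function is symmetric and shared within its layer, a cyclic shift of the input only shifts the values of each layer cyclically, so the single output value does not change.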

The network architecture for one-dimensional patterns can be extended to an architecture for the recognition of two-dimensional patterns. With the connection structure shown in Fig. 2, additional invariance with respect to rotations by multiples of 90 degrees is integrated into the same architecture without any extra costs. For invariance with respect to shifts and general rotations the network has to be implemented in several different angular orientations, so that the networks regularly cover the full circle of 360 degrees. Owing to the good generalization ability of the network, it is sufficient for most applications to implement the two-dimensional shift and 90-degree rotation invariant network in four angular orientations. After the input has been processed in parallel by the four networks, the results are combined to the final output, again using the connection structure of Fig. 2.

The invariance property of the architecture is achieved stepwise. It can be shown that it increases layer by layer up to the final shift and rotation invariance of the image. The proof is based on results from the theory of groups.

The Node Functions

The main requirement for the node functions in the SINN is symmetry with respect to the two inputs. It can be met by use of twin nodes performing commutative calculations with respect to the weights and the input arguments. A possible realization of the node function f_i in the neural nodes of layer i is shown in Fig. 3 with the equation

$$f_i(m_1, m_2) = u_i\bigl( w_{i3}\, t_i(w_{i1} m_1 + w_{i2} m_2 + w_{i0}) + w_{i3}\, t_i(w_{i2} m_1 + w_{i1} m_2 + w_{i0}) \bigr)$$

where w_{ij} are the weights on the connectional links, w_{i0} is the threshold, and t_i and u_i are hard limiter or sigmoid transfer functions.

Figure 3: Node function f_i (two inputs m, twin nodes with transfer function t_i, output node with transfer function u_i, weights w_i on the links).
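A twin-node realization of this kind can be sketched as follows. The weight names `w0` to `w3`, the sign convention of the threshold and the choice of transfer functions are assumptions made for illustration, since the indices of the original equation are not fully recoverable. The point of the sketch is only that swapping the roles of the two link weights in the second twin makes the node symmetric in its inputs.

```python
import math


def sigmoid(z: float) -> float:
    """Differentiable transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def hard_limiter(z: float) -> float:
    """Binary transfer function (the threshold at zero is an assumption)."""
    return 1.0 if z > 0.0 else 0.0


def node_function(m1, m2, w0, w1, w2, w3, t=sigmoid, u=sigmoid):
    """Symmetric node built from two twin nodes with swapped link weights.

    w0: threshold, w1 and w2: weights on the input links,
    w3: weight on both links into the output node (hypothetical naming).
    """
    first_twin = t(w1 * m1 + w2 * m2 + w0)
    second_twin = t(w2 * m1 + w1 * m2 + w0)  # same weights, roles swapped
    return u(w3 * first_twin + w3 * second_twin)


weights = dict(w0=-0.5, w1=1.5, w2=-0.7, w3=2.0)
a = node_function(0.2, 0.9, **weights)
b = node_function(0.9, 0.2, **weights)
print(a == b)  # True: exchanging the inputs exchanges the twins only
```

The same symmetry holds with `t=hard_limiter, u=hard_limiter`, because it follows from the wiring and not from the particular transfer function.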

When using hard limiter transfer functions, only two pattern classes can be distinguished by the network output. So for separating k classes at least k networks are needed, one for the recognition of each class. The result is a binary output vector with a one in the entry that corresponds to the recognized pattern and zeros in all others.

The Learning Algorithm

Assume two pattern classes A and B. The weights have to be chosen so that the percentage of misclassified patterns of both classes A and B is minimized in the last layer. So the error function E can be formulated as

$$E = \min_{\text{weights}} \left( \frac{\#\,\text{incorrect patt. class } A}{\#\,\text{patterns class } A} + \frac{\#\,\text{incorrect patt. class } B}{\#\,\text{patterns class } B} \right)$$
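For a fixed set of weights, the quantity being minimized can be computed as in the following sketch. It assumes that the two class-wise error fractions are added and that both classes occur in the data; the function name `sinn_error` and the label strings are hypothetical.

```python
from typing import Sequence


def sinn_error(labels: Sequence[str], predictions: Sequence[str]) -> float:
    """Fraction of misclassified class-A patterns plus that of class B.

    Assumes every label is "A" or "B" and each class occurs at least once.
    """
    total = {"A": 0, "B": 0}
    wrong = {"A": 0, "B": 0}
    for truth, guess in zip(labels, predictions):
        total[truth] += 1
        if guess != truth:
            wrong[truth] += 1
    return sum(wrong[c] / total[c] for c in ("A", "B"))


labels = ["A", "A", "A", "A", "B", "B"]
predictions = ["A", "B", "A", "A", "B", "A"]
print(sinn_error(labels, predictions))  # 1/4 + 1/2 = 0.75
```

Normalizing each term by the size of its own class keeps a small class from being ignored when the two classes contain different numbers of patterns.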

For the training of the weights, a global learning algorithm like a backpropagation algorithm modified for the training of structured architectures can be used if differentiable transfer functions are given.

Another possibility is the use of local optimization techniques. Such methods are applicable since the strong weight coupling greatly reduces the dimension of the input space. In fact the dimension of the input space is determined by the in-degree of the nodes in each layer. So instead of representing an N-dimensional pattern as a point in an N-dimensional input space, it is equivalent with the architecture of a SINN to represent it as N points in a 2-D input space. With the symmetric node functions all cyclic permutations of the input elements can be represented in parallel in the same input space. Different patterns are marked as different sets of points in this space. The weights associated with the node function of a layer then determine the form of the decision line between the points of two pattern classes. This means that only the few parameters of one node function have to be adjusted on every layer.
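The point-set view of a pattern can be made concrete with a small sketch. It assumes the one-dimensional case and the pairing of element k with element k + N/2 (taken cyclically), which is an assumption about the connection structure; `pattern_to_points` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def pattern_to_points(pattern: Sequence[int]) -> List[Tuple[int, int]]:
    """Represent an N-element pattern as N points in a 2-D input space.

    Point k is (m[k], m[(k + N/2) mod N]); the list is sorted so that two
    patterns can be compared as sets of points.
    """
    n = len(pattern)
    half = n // 2
    return sorted((pattern[k], pattern[(k + half) % n]) for k in range(n))


pattern = [3, 1, 4, 1, 5, 9, 2, 6]
reference = pattern_to_points(pattern)
# Every cyclic shift of the pattern produces the same set of points.
same_for_all_shifts = all(
    pattern_to_points(pattern[s:] + pattern[:s]) == reference
    for s in range(len(pattern))
)
print(len(reference), same_for_all_shifts)  # 8 True
```

A decision line drawn in this 2-D space therefore treats all cyclic shifts of a pattern alike, which is why only the few parameters of one node function need to be fitted per layer.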

The network training consists in successively determining the weights so that as many points of patterns of class A as possible are separated from those of class B by the decision line of the current layer. Binary transfer functions limit the number of weight combinations of the w_{ij} which result in different outputs to six on all layers i after the first (up to layer n). It can be shown that the weight combinations giving the minimum of the local error functions also lead to a minimum of the global error function on the last layer. On the first layer, where the input elements are taken from a continuous interval, the evaluation of more than six weight combinations is suitable. Here also quadratic decision lines can be applied.
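The count of six is consistent with a simple combinatorial fact: a symmetric function of two binary inputs depends only on how many inputs are one (0, 1 or 2), which gives eight truth tables, two of which are constant. The sketch below enumerates the six non-constant tables and picks the one with the smallest local error on a toy layer. This is an illustration under that assumption, not the paper's training procedure; all names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Table = Tuple[int, int, int]  # output for 0, 1 or 2 active inputs
Point = Tuple[int, int]


def symmetric_binary_functions() -> List[Table]:
    """All non-constant symmetric functions of two binary inputs."""
    return [t for t in product((0, 1), repeat=3) if len(set(t)) > 1]


def local_error(table: Table, points_a: Sequence[Point],
                points_b: Sequence[Point]) -> float:
    """Fraction of class-A points not mapped to 1 plus class-B not mapped to 0."""
    wrong_a = sum(1 for p in points_a if table[sum(p)] != 1)
    wrong_b = sum(1 for p in points_b if table[sum(p)] != 0)
    return wrong_a / len(points_a) + wrong_b / len(points_b)


def best_table(points_a: Sequence[Point], points_b: Sequence[Point]) -> Table:
    """Exhaustive search over the six candidate node behaviours."""
    return min(symmetric_binary_functions(),
               key=lambda t: local_error(t, points_a, points_b))


points_a = [(1, 1), (1, 1), (0, 1)]
points_b = [(0, 0), (0, 1)]
print(len(symmetric_binary_functions()))  # 6
print(best_table(points_a, points_b))     # (0, 0, 1), i.e. logical AND
```

With so few candidates per layer, an exhaustive search is cheap, which fits the remark that local optimization becomes practical once the weights are coupled.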

Once the weights are trained, the computational complexity to recognize unknown patterns is of order O(N).
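The linear order can be checked by counting node evaluations. The following short derivation assumes the one-dimensional shift invariant architecture, in which each of the n = ld N layers halves the number of values, and a constant cost per node evaluation:

```latex
\[
\sum_{i=1}^{n} \frac{N}{2^{i}}
  \;=\; N\left(1 - \frac{1}{2^{n}}\right)
  \;=\; N - 1
  \;=\; O(N),
\qquad N = 2^{n}.
\]
```

For example, a network for 8 input elements evaluates 4 + 2 + 1 = 7 nodes.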

Experiments

First the robustness of the adaptive features calculated with the neural network with respect to statistical pattern distortions is investigated on synthetic grey scale images. Four grey value images are used (Fig. 4), one with constant grey value, the others showing rectangles of different size. They are all equal with respect to the grey value sum. Noisy images are generated by adding Gaussian noise, with the variance chosen to give a fixed signal-to-noise ratio. The recognition is performed with four shift and 90-degree rotation invariant SINNs with n = ld N layers. The weights are determined by training on the original patterns only.

The results are compared with the standard feature extraction method of invariant integration followed by classification. Here twelve different shift and rotation invariant grey scale features based on monomials of up to order three are used. They are evaluated with the Bayes classifier (under the assumption that the features are classwise normally distributed), the weighted nearest neighbour classifier (WNC), i.e. the Euclidean distance weighted with the variance of the features, and the nearest neighbour classifier (NC), i.e. the simple Euclidean distance, respectively. The training set for the three classifiers consisted of images of each class, including noisy images with different SNR. It has to be emphasized that no preprocessing (e.g. smoothing, segmentation) is performed on the images, neither for the SINN nor for the method of invariant integration.
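The two distance-based classifiers can be sketched as follows. The sketch assumes that the WNC divides each squared feature difference by that feature's variance over the training set and that all variances are non-zero; the exact weighting used in the experiments is not given, and the function names and toy data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def feature_variances(samples: Sequence[Sequence[float]]) -> List[float]:
    """Per-feature variance over all training vectors."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    return [sum((s[d] - means[d]) ** 2 for s in samples) / n
            for d in range(dims)]


def nearest_neighbour(x: Sequence[float], training: Sequence[Sample],
                      variances: Optional[Sequence[float]] = None) -> str:
    """NC if variances is None, otherwise variance-weighted WNC."""
    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        if variances is None:
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return sum((p - q) ** 2 / v for p, q, v in zip(a, b, variances))

    return min(training, key=lambda item: squared_distance(x, item[0]))[1]


training = [([0.0, 0.0], "A"), ([1.0, 50.0], "B")]
variances = feature_variances([features for features, _ in training])
x = [0.1, 40.0]
print(nearest_neighbour(x, training))             # "B": large-scale feature dominates
print(nearest_neighbour(x, training, variances))  # "A": features are rescaled
```

The toy example shows how the two classifiers can disagree when the features have very different scales, which is one reason the choice of classifier matters for fixed, non-adaptive features.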

The upper part of Table 1 shows the recognition rates for the different signal-to-noise ratios (SNR) achieved by using the SINN, whereas the lower part shows the recognition rates for the three different classifiers based on the invariant grey scale features. Table 1 shows that, due to the adaptivity of the SINNs, the classification of the four patterns from Fig. 4 is very robust with respect to statistical pattern distortions. Using the grey scale features together with the nearest neighbour classifier (NC) or the weighted nearest neighbour classifier (WNC) does not allow a reliable separation of the classes. However, the Bayes classifier together with the grey scale features also permits the robust separation of the classes.

Figure 4: The four synthetic images Q, all of the same size and with equal sum of grey values.

In the second experiment the invariance property of the SINN is shown with respect to arbitrary shifts and rotations. Moreover, the separation ability is demonstrated in comparison to the standard feature extraction method. For this experiment subimages from scanned grey scale images belonging to four different classes are used. Each class consists of arbitrarily shifted and rotated versions of the same object. Fig. 5 shows one example image of each class.

The background is modified with Gaussian noise so that the mean value of the images is approximately the same; otherwise the separation problem could be solved just by comparing the mean grey values. Again only one image of each class is used for training the four different structured shift and rotation invariant neural networks. In the first layer a quadratic decision line is applied, and in all other layers a linear one. Internally a representation with four different rotation angles is used, as described in the section on the network architecture. Although these representations do not coincide with all the occurring rotation angles and shifted positions, it can be seen from the recognition rates of the patterns of the test set in Table 2 that the network has a good generalization ability with respect to general rotations and shifts.

These results are compared with the same standard feature extraction method of invariant integration as in the first experiment. Two sets of features are used, one consisting of features based on monomials of up to order two and one of features based on monomials of up to order three. Since the number of images is too small to train a Bayes classifier, recognition rates are only presented for the nearest neighbour (NN) classifiers (Table 2). The training set consisted of three images per class.

Table 1: Recognition rates in percent for the patterns shown in Fig. 4 for a structured shift and 90-degree rotation invariant neural network and for invariant grey scale features and three different classifiers. Rows: SINN, NC, WNC, Bayes, each for the patterns Q; columns: SNR.

The recognition rates show that a complete separation of the images can only be achieved by use of monomials of order three with the method of invariant integration. The computational costs for the calculation of these features are rather high, especially because the selection of features needed for separation is not known in advance.

Figure 5: Scanned grey value images of apple, pear, mushroom and tomato.

Table 2: Recognition rates in percent for the patterns shown in Figure 5 for a structured shift and rotation invariant neural network and for the two sets of invariant grey scale features. Columns: apple, pear, mushroom, tomato; rows: SINN, NC on the first feature set, NC on the second feature set.

If a SINN is used instead, good separation results are achieved with much lower costs. Moreover, there is no need to find a suitable classifier.

Conclusion

A neural network has been presented for the calculation and simultaneous classification of cyclic shift and rotation invariant image features. The invariance property is achieved by a sparsely connected structured invariant neural network architecture with layerwise coupled node functions. During the learning process the whole network is adapted to the characteristics of the pattern set in a given application. This reduces the total number of invariant features necessary for the separation of the patterns and leads to better robustness with respect to global disturbances. Moreover, by the simultaneous adaptive feature extraction and classification, the usual problem of suitably choosing the feature extraction method and the design of the classifier to obtain good recognition results is avoided. Present investigations focus on the development of enhanced node functions and the implementation of higher order characteristics into the network architecture.

References

[1] H. Burkhardt, Transformationen zur lageinvarianten Merkmalgewinnung, VDI-Fortschrittbericht, Reihe, Nr., Oct.

[2] H. Schulz-Mirbach, Invariant features for gray scale images, DAGM Symposium Mustererkennung, G. Sagerer, S. Posch, F. Kummert (Eds.), S., Springer Verlag, Bielefeld, Sept.

[3] S. Kröner, H. Schulz-Mirbach, Adaptive features for position invariant pattern recognition, DAGM Symposium Mustererkennung, G. Sagerer, S. Posch, F. Kummert (Eds.), S., Springer Verlag, Bielefeld, Sept.

[4] D.E. Rumelhart et al., Learning internal representations by error propagation, in Parallel Distributed Processing, vol., ch., D.E. Rumelhart, J.L. McClelland (Eds.), Cambridge, MA: MIT Press.
