# w 1

Artificial Intelligence and Robotics

19 Oct 2013

Project reminder: Monday 16. 5. 11:00

Prepare a 10-minute presentation (in Czech/Slovak), which you’ll present on Wednesday 18. 5. 2011 during the Data mining lecture/exercise.

Self-Organizing Map (SOM)

Unsupervised neural networks, equivalent to clustering.

Two layers: input and output.

The input layer represents the input variables.

The output layer: neurons arranged in a single line (one-dimensional) or a two-dimensional grid.

Main feature: weights

Learning

Each output neuron is connected to the inputs through the weights.

The weight vector has the same dimensionality as the input vector.

The output of each neuron is its activation: the weighted sum of inputs (i.e. a linear activation function).

[Figure: network with weights $w_{11}$ and $w_{21}$.]

$u = x_1 w_{11} + x_2 w_{21}$

The objective of learning: project high-dimensional data onto 1D or 2D output neurons.

Each neuron incrementally learns to represent a cluster of data.

The weights of neurons are called codebook vectors (codebooks).

Competitive learning

The so-called competitive learning (winner-takes-all).

Competitive learning will be demonstrated on a simple 1D network with two inputs.

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

First, the number of output neurons (i.e. clusters) must be selected.

It is not always known, so make a reasonable estimate; it is better to use more, since unused neurons can be eliminated later.

Then initialize the weights,

e.g. with small random values.

Or randomly choose some input vectors and use their values for the weights.

Then competitive learning can begin.

The activation of each output neuron is calculated as the weighted sum of inputs.

E.g. for output neuron 1, the activation is $u_1 = w_{11} x_1 + w_{21} x_2$. Generally, $u_j = \sum_i w_{ij} x_i$.

The activation is the dot product between the input vector $\mathbf{x}$ and the weight vector $\mathbf{w}_j$.
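The activation can be sketched in a few lines of plain Python. The function name `activation`, the list layout (one weight list per output neuron) and the sample numbers are illustrative assumptions, not taken from the slides.

```python
def activation(x, w_j):
    """Dot product of input vector x and weight vector w_j."""
    return sum(x_i * w_ij for x_i, w_ij in zip(x, w_j))

# Hypothetical 1D network with two inputs and two output neurons.
x = [0.6, 0.8]
weights = [
    [1.0, 0.0],  # w_1 = (w_11, w_21)
    [0.6, 0.8],  # w_2 = (w_12, w_22)
]

# One activation per output neuron.
u = [activation(x, w_j) for w_j in weights]
```

The second neuron, whose weights coincide with the input, gets the larger activation.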


The dot product is not only $\sum_i w_{ij} x_i$, but also $|\mathbf{x}|\,|\mathbf{w}_j| \cos\theta$.

If $|\mathbf{x}| = |\mathbf{w}_j| = 1$, then $u_j = \cos\theta$.

The closer these two vectors are (i.e. the smaller $\theta$ is), the bigger $u_j$ is ($\cos 0 = 1$).
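A small sketch of this relationship, assuming non-zero vectors: after scaling both vectors to unit length, the dot product equals the cosine of the angle between them. The helper names `unit` and `dot` and the sample vectors are made up for illustration.

```python
import math

def unit(v):
    """Scale v to unit length (v must not be the zero vector)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x = unit([3.0, 4.0])
w_near = unit([3.0, 3.9])   # points almost the same way as x
w_far = unit([-4.0, 3.0])   # perpendicular to x

u_near = dot(x, w_near)     # cosine of a small angle, close to 1
u_far = dot(x, w_far)       # cosine of 90 degrees, approximately 0
```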

[Figure: input vector $\mathbf{x}$ and weight vector $\mathbf{w}$ separated by angle $\theta$.]

Say it again, and loudly:

The closer the weight and input vectors are, the bigger the neuron activation is.

A simple measure of closeness: the Euclidean distance between $\mathbf{x}$ and $\mathbf{w}_j$, $d_j = |\mathbf{x} - \mathbf{w}_j| = \sqrt{\sum_i (x_i - w_{ij})^2}$.

An input is presented to the network.

Scale the input vector so that its length is equal to one: $|\mathbf{x}| = 1$.

Scale the weight vectors of the individual output neurons to unit length: $|\mathbf{w}_j| = 1$.

Calculate how close the input vector $\mathbf{x}$ is to each weight vector $\mathbf{w}_j$ ($j$ = 1 … number of output neurons).

The neuron whose codebook is closest to the input vector becomes the winner (BMU, Best Matching Unit).

Its weights will be updated.
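The winner selection can be sketched as follows, assuming Euclidean distance as the closeness measure and non-zero vectors. The function name `find_bmu` and the sample codebooks are hypothetical.

```python
import math

def unit(v):
    """Scale v to unit length (v must not be the zero vector)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def find_bmu(x, codebooks):
    """Return the index of the codebook vector closest to x.

    Both the input and the codebooks are scaled to unit length first.
    """
    x = unit(x)
    dists = [distance(x, unit(w)) for w in codebooks]
    return dists.index(min(dists))

# Three hypothetical output neurons with two inputs each.
codebooks = [[1.0, 0.1], [0.1, 1.0], [1.0, 1.0]]
winner = find_bmu([0.2, 0.9], codebooks)
```

Ties are resolved in favor of the lowest index, which is an arbitrary choice of this sketch.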

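Weight update

The weight vector $\mathbf{w}$ is updated so that it moves closer to the input $\mathbf{x}$:

$\Delta\mathbf{w} = \beta\,\mathbf{d} = \beta(\mathbf{x} - \mathbf{w})$

$\mathbf{d}$ … difference between $\mathbf{x}$ and $\mathbf{w}$

$\beta$ … learning rate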
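A minimal sketch of one update step, assuming the usual form in which the winner moves a fraction β of the way toward the input. The function name `update` is illustrative.

```python
def update(w, x, beta):
    """Return w moved a fraction beta of the way toward x: w + beta * (x - w)."""
    return [w_i + beta * (x_i - w_i) for w_i, x_i in zip(w, x)]

w = [0.0, 1.0]
x = [1.0, 0.0]

# With beta = 0.5 the weight vector lands halfway between w and x.
w_new = update(w, x, beta=0.5)
```

With β = 0 the weights do not move at all, and with β = 1 they jump straight onto the input, so β controls the step length.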

Recursive vs. batch learning

Conceptually similar to online/batch learning.

Recursive learning:

the weights of the winning neuron are updated after each presentation of an input vector

Batch learning:

the weight update for each input vector is noted

the average weight adjustment for each output neuron is applied after the whole epoch

When to terminate learning?

the mean distance between neurons and the inputs they represent is at a minimum

the distance stops changing
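The batch variant and the stopping rule can be sketched together. This is an illustration under assumptions: the tolerance `tol`, the cap `max_epochs`, the function names and the toy data are all invented here, and inputs are not rescaled to unit length to keep the example short.

```python
import math

def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def batch_epoch(data, codebooks, beta):
    """One batch epoch: note every update, apply the averages at the end.

    Returns the mean distance between inputs and their winners,
    measured before the codebooks are moved.
    """
    dim = len(data[0])
    sums = [[0.0] * dim for _ in codebooks]
    counts = [0] * len(codebooks)
    total = 0.0
    for x in data:
        dists = [distance(x, w) for w in codebooks]
        j = dists.index(min(dists))          # winner for this input
        total += dists[j]
        counts[j] += 1
        for i in range(dim):
            sums[j][i] += beta * (x[i] - codebooks[j][i])
    for j, w in enumerate(codebooks):
        if counts[j]:                        # neurons that never won stay put
            for i in range(dim):
                w[i] += sums[j][i] / counts[j]
    return total / len(data)

def train(data, codebooks, beta=0.5, tol=1e-6, max_epochs=100):
    """Repeat batch epochs until the mean distance stops changing."""
    previous = float("inf")
    for epoch in range(1, max_epochs + 1):
        mean_dist = batch_epoch(data, codebooks, beta)
        if abs(previous - mean_dist) < tol:
            break
        previous = mean_dist
    return epoch, mean_dist

data = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
codebooks = [[1.0, 1.0], [4.0, 4.0]]
epochs, final_dist = train(data, codebooks)
```

Recursive learning would instead move the winner inside the loop over `data`, immediately after each input is presented.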

Example

[Figures omitted: competitive learning example from Samarasinghe (2006), including a plot over epochs.]

Topology is not preserved.

Meet today’s hero

Teuvo Kohonen

Self-Organizing Maps

SOM, also Self-Organizing Feature Map (SOFM), Kohonen neural network.

Inspired by the function of the brain:

Different brain regions correspond to specific aspects of human activities.

These regions are organized such that tasks of similar nature (e.g. speech and vision) are controlled by regions that are in spatial proximity to each other.

This is called topology preservation.

In SOM learning, not only the winner but also the neighboring neurons adapt their weights.

Neurons closer to the winner adjust their weights more than neurons farther away.

Thus we need

1. to define the size of the neighborhood

2. to define how much the neighboring neurons adapt their weights

Neighborhood definition

[Figure: neighborhoods of radius r = 1, 2, 3 around the winning neuron.]

Training in SOM

Training proceeds in a similar manner to standard winner-takes-all competitive training.

However, a new rule is used for weight changes.

Suppose that the BMU is at position $\{i_{win}, j_{win}\}$ on the 2D map.

Then the codebook vectors $\mathbf{w}_j$ of the BMU and its neighbors are adjusted to $\mathbf{w}_j'$ according to

$\mathbf{w}_j' = \mathbf{w}_j + \beta \cdot NS \cdot (\mathbf{x} - \mathbf{w}_j)$

where NS is the neighbor strength, varying with the distance to the BMU, and $\beta$ is the learning rate.
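One SOM update step on a 2D map might look like the sketch below. The dictionary layout (grid position to weight vector), the function name `som_update` and the step-shaped neighbor strength are assumptions made for illustration; grid distance is taken as the Euclidean distance between map positions.

```python
import math

def som_update(codebooks, bmu, x, beta, ns):
    """Shift every codebook toward x, scaled by beta and neighbor strength.

    codebooks maps a grid position (i, j) to a weight vector (list of floats).
    ns(d) returns the neighbor strength at grid distance d from the BMU.
    """
    for (i, j), w in codebooks.items():
        d = math.hypot(i - bmu[0], j - bmu[1])
        strength = ns(d)
        for k in range(len(w)):
            w[k] += beta * strength * (x[k] - w[k])

def step_ns(d):
    """Hypothetical NS: 1 at the BMU, 0.5 for direct neighbors, 0 beyond."""
    if d == 0:
        return 1.0
    return 0.5 if d <= 1 else 0.0

# Hypothetical 1 x 3 map with all weights at the origin.
codebooks = {(0, 0): [0.0, 0.0], (0, 1): [0.0, 0.0], (0, 2): [0.0, 0.0]}
som_update(codebooks, bmu=(0, 0), x=[1.0, 1.0], beta=0.5, ns=step_ns)
```

The BMU moves halfway to the input, its direct neighbor a quarter of the way, and the neuron two steps away stays where it was.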

Neighbor strength

When using neighbor features, all neighbor codebooks are shifted towards the input vector.

However, the BMU updates most, and the farther away a neighbor neuron is, the less its weights update.

The NS function tells us how the weight adjustment decays with distance from the winner.

Slide by Johan Everts

[Figure: three neighbor strength functions: linear, Gaussian, exponential.]
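The three shapes can be written down as small functions. The exact formulas on the slide are not recoverable, so the forms and parameter names below (`radius`, `sigma`, `k`) are common choices and should be read as assumptions.

```python
import math

def ns_linear(d, radius):
    """Falls linearly from 1 at the winner to 0 at distance `radius`."""
    return max(0.0, 1.0 - d / radius)

def ns_gaussian(d, sigma):
    """Bell-shaped decay; sigma sets the neighborhood width."""
    return math.exp(-d * d / (2.0 * sigma * sigma))

def ns_exponential(d, k):
    """Exponential decay; larger k means a narrower neighborhood."""
    return math.exp(-k * d)

# Neighbor strength at grid distances 0..3 for each shape.
table = [
    (d, ns_linear(d, 3.0), ns_gaussian(d, 1.0), ns_exponential(d, 1.0))
    for d in range(4)
]
```

All three return 1 at the winner and decrease with distance; only the linear form reaches exactly zero.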


2D Side Effects

Shrinking neighborhood size

Large neighborhood: proper placement of neurons in the initial stage to broadly represent the spatial organization of the input data.

Further refinement: subsequent shrinking of the neighborhood.

The size of the large starting neighborhood is reduced with iterations.

$\sigma_0$ … initial neighborhood size

$\sigma_t$ … neighborhood width at iteration $t$

$T$ … total number of iterations bringing the neighborhood to zero (i.e. only the winner)

linear decay

exponential decay
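The decay formulas themselves did not survive on the slide. The sketch below uses two common forms, $\sigma_t = \sigma_0 (1 - t/T)$ and $\sigma_t = \sigma_0 e^{-t/T}$; both are assumptions, as are the function names.

```python
import math

def sigma_linear(sigma0, t, T):
    """Width shrinks linearly and reaches zero at iteration T."""
    return sigma0 * max(0.0, 1.0 - t / T)

def sigma_exponential(sigma0, t, T):
    """Width shrinks by a factor of e every T iterations."""
    return sigma0 * math.exp(-t / T)

# Neighborhood width over a hypothetical run with sigma0 = 3 and T = 10.
linear_widths = [sigma_linear(3.0, t, 10) for t in range(0, 11, 5)]
exp_widths = [sigma_exponential(3.0, t, 10) for t in range(0, 11, 5)]
```

Note that the exponential form only approaches zero, so in that form T acts as a time constant rather than as the iteration at which the neighborhood vanishes.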

Learning rate decay

The step length (learning rate $\beta$) is also reduced with iterations.

Two common forms: linear or exponential decay.

$T$ … constant bringing $\beta$ to zero (or a small value)

Weight update incorporating learning rate and neighborhood decay
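Putting the pieces together, a recursive training loop for a 1D map might look like this. It is a sketch under stated assumptions: exponential decay for both the learning rate and the neighborhood width, a Gaussian neighbor strength, and invented parameter names and defaults (`beta0`, `sigma0`, `T`, `iterations`, `seed`). It needs Python 3.8 or later for `math.dist`.

```python
import math
import random

def train_som(data, n_neurons, beta0=0.5, sigma0=2.0, T=200,
              iterations=1000, seed=0):
    """Recursive training of a 1D SOM (a line of n_neurons output neurons)."""
    rng = random.Random(seed)
    # Initialize the codebooks with randomly chosen input vectors.
    codebooks = [list(rng.choice(data)) for _ in range(n_neurons)]
    for t in range(iterations):
        beta = beta0 * math.exp(-t / T)     # learning rate decay
        sigma = sigma0 * math.exp(-t / T)   # neighborhood decay
        x = rng.choice(data)
        dists = [math.dist(x, w) for w in codebooks]
        win = dists.index(min(dists))       # position of the BMU on the line
        for j, w in enumerate(codebooks):
            # Gaussian neighbor strength based on map distance to the BMU.
            ns = math.exp(-((j - win) ** 2) / (2.0 * sigma * sigma))
            for k in range(len(w)):
                w[k] += beta * ns * (x[k] - w[k])
    return codebooks

data = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
codebooks = train_som(data, n_neurons=4)
```

Because every step moves a codebook part of the way toward an input, the codebooks stay inside the region spanned by the data.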

Recursive/Batch Learning

Batch mode, no neighborhood: equivalent to K-means.

Neighbor incorporating: topology preservation.

Regions closer in input space are represented by neurons closer in the map.

Two Phases of SOM Training

Two phases

1. ordering

2. convergence

Ordering

neighborhood and learning rate are reduced to small values

topological ordering

start with $\beta$ high, gradually decrease it, but keep it above 0.01

neighborhood: covers the whole output layer

Convergence

fine tuning with the shrunk neighborhood

small non-zero (~0.01) learning rate, NS no more than the 1st neighborhood

Example contd.

neighborhood drops to 0 after 3 iterations

[Figure: map after 3 iterations.]

topology preservation takes effect very quickly

Complete training

Converged after 40 epochs.

[Figure: convergence plot; horizontal axis: epochs.]

Complete training

All vectors have found cluster centers, except one.

[Figure: map with six output neurons, numbered 1 to 6.]

More neurons:

[Figure: map with seven output neurons, numbered 1 to 7.]

2D output

Play with

http://www.neuroinformatik.ruhr-uni-bochum.de/VDM/research/gsn/DemoGNG/GNG.html

Self-organizing map

neighborhood size

learning rate

A self-organizing feature map from a square source space to a square (grid) target space.

Duda, Hart, Stork, Pattern Classification, 2000

Some initial (random) weights and the particular sequence of patterns (randomly chosen) lead to kinks in the map; even extensive further training does not eliminate the kink. In such cases, learning should be restarted with randomized weights and possibly a wider window function and slower decay in learning.

2D maps on Multidimensional data

Iris data set

150 patterns, 4 attributes, 3 classes (Set: 1, Vers: 2, Virg: 3)

more than 2 dimensions, so the data cannot all be visualized in a meaningful way

SOM can be used not only to cluster input data, but also to explore the relationships between different attributes.

SOM structure

8x8, hexagonal, exponential decay of learning rate $\beta$ ($\beta_{init} = 0.5$, $T_{max} = 20 \times 150 = 3000$), NS: Gaussian

What can be learned?

petal length and width have a similar structure to the class panel: low length correlates with low width, and these relate to class Versicolor

sepal width: very different pattern

class panel: boundary between Virginica and Setosa, classes overlap

[Figure: class panel with legend: setosa, versicolor, virginica.]

Since we have class labels, we can assess the classification accuracy of the map.

So first we train the map using all 150 patterns.

And then we present the input patterns individually again and note the winning neuron.

The class to which the input belongs is the class associated with this BMU codebook vector (see previous slide, Class panel).

Only the winner decides classification.
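The procedure can be sketched as below. The slides do not say how a class is attached to each codebook vector, so the majority vote used in `label_neurons` is an assumption, as are all function names and the toy data. It needs Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter

def bmu_index(x, codebooks):
    """Index of the codebook vector closest to x."""
    dists = [math.dist(x, w) for w in codebooks]
    return dists.index(min(dists))

def label_neurons(data, labels, codebooks):
    """Give each neuron the majority class of the inputs it wins (assumed rule)."""
    votes = {}
    for x, y in zip(data, labels):
        votes.setdefault(bmu_index(x, codebooks), Counter())[y] += 1
    return {j: c.most_common(1)[0][0] for j, c in votes.items()}

def accuracy(data, labels, codebooks, neuron_labels):
    """Fraction of inputs whose BMU carries the input's own class."""
    hits = sum(neuron_labels.get(bmu_index(x, codebooks)) == y
               for x, y in zip(data, labels))
    return hits / len(data)

# Toy map with two trained codebooks and five labeled inputs.
codebooks = [[0.0, 0.0], [1.0, 1.0]]
data = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]]
labels = ["a", "a", "b", "b", "b"]

neuron_labels = label_neurons(data, labels, codebooks)
acc = accuracy(data, labels, codebooks, neuron_labels)
```

In the toy data the last input belongs to class "b" but is won by the neuron labeled "a", so it counts as a misclassification.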

Vers: 100% accuracy, Set: 86%, Virg: 88%

Overall accuracy = 91.3%

Vers: 100% accuracy, Set: 90%, Virg: 94%

Overall accuracy = 94.7%

U-matrix

The distance between neighboring codebook vectors can highlight different cluster regions in the map and can be a useful visualization tool.

Two neurons: $\mathbf{w}_1 = \{w_{11}, w_{21}, \dots, w_{n1}\}$, $\mathbf{w}_2 = \{w_{12}, w_{22}, \dots, w_{n2}\}$

Euclidean distance between them: $d = \sqrt{\sum_{i=1}^{n} (w_{i1} - w_{i2})^2}$

The average of the distances to the nearest neighbors: unified distance, U-matrix
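A sketch of the computation, assuming a rectangular grid in which each neuron has up to four nearest neighbors (up, down, left, right) and the map holds at least two neurons. A hexagonal map would use six neighbors instead. The function name `u_matrix` and the toy map are illustrative; `math.dist` needs Python 3.8 or later.

```python
import math

def u_matrix(grid):
    """Average distance from each codebook to its nearest grid neighbors.

    grid[r][c] is the codebook vector of the neuron at row r, column c.
    """
    rows, cols = len(grid), len(grid[0])
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            dists = [math.dist(grid[r][c], grid[nr][nc])
                     for nr, nc in neighbors
                     if 0 <= nr < rows and 0 <= nc < cols]
            u[r][c] = sum(dists) / len(dists)
    return u

# Toy 1 x 3 map: two identical codebooks and one far away.
grid = [[[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]]
u = u_matrix(grid)
```

The U value grows toward the right of the toy map, where the distant codebook marks a boundary between clusters.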

The larger the distance between neurons, the larger the U value and the more separated the clusters. The lighter the color, the larger the U value.

Large distance between this cluster (Iris versicolor) and the middle cluster (Iris setosa). Large distances between codebook vectors indicate a sharp boundary between the clusters.

Surface graph

The height represents the distance.

3rd row: large height = separation

Other two clusters are not separated.

Quantization error

Measure of the distance between codebook vectors and inputs.

If for input vector $\mathbf{x}$ the winner is $\mathbf{w}_c$, then the distortion error $e$ can be calculated from the distance between $\mathbf{x}$ and $\mathbf{w}_c$.

Compute $e$ for all input vectors and take the average: the quantization error (average map distortion error) $E$.
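A minimal sketch of the average, assuming the distortion error of one input is the plain Euclidean distance to its winning codebook. The slide's exact formula is not recoverable, and the squared distance is another common convention. The function name and the toy data are illustrative; `math.dist` needs Python 3.8 or later.

```python
import math

def quantization_error(data, codebooks):
    """Average over all inputs of the distance to the winning codebook."""
    total = 0.0
    for x in data:
        # The winner is the closest codebook, so its distance is the minimum.
        total += min(math.dist(x, w) for w in codebooks)
    return total / len(data)

data = [[0.0, 0.0], [3.0, 4.0]]
codebooks = [[0.0, 0.0]]
E = quantization_error(data, codebooks)
```

Adding a second codebook at the position of the second input would drive the error of this toy example to zero.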

Iris quantization error

High distortion error indicates areas where the codebook vector is relatively far from the inputs. Such information can be used to refine the map to obtain a more uniform distortion error measure if a more faithful reproduction of the input distribution from the map is desired.