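$$\frac{1}{|L_{\text{fixed}}|} \sum_{l \in L_{\text{fixed}}} \sum_{v \in U_{\text{out}}} \frac{\partial \text{out}_v^{(l)}}{\partial \text{ex}_u^{(l)}}.$$
Formal derivation: Apply the chain rule.
$$\frac{\partial \text{out}_v}{\partial \text{ex}_u} = \frac{\partial \text{out}_v}{\partial \text{out}_u} \frac{\partial \text{out}_u}{\partial \text{ex}_u} = \frac{\partial \text{out}_v}{\partial \text{net}_v} \frac{\partial \text{net}_v}{\partial \text{out}_u} \frac{\partial \text{out}_u}{\partial \text{ex}_u}.$$
Simplification: Assume that the output function is the identity.
$$\frac{\partial \text{out}_u}{\partial \text{ex}_u} = 1.$$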
Christian Borgelt Introduction to Neural Networks 113
Sensitivity Analysis
For the second factor we get the general result:
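$$\frac{\partial \text{net}_v}{\partial \text{out}_u} = \frac{\partial}{\partial \text{out}_u} \sum_{p \in \text{pred}(v)} w_{vp} \text{out}_p = \sum_{p \in \text{pred}(v)} w_{vp} \frac{\partial \text{out}_p}{\partial \text{out}_u}.$$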
This leads to the recursion formula
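$$\frac{\partial \text{out}_v}{\partial \text{out}_u} = \frac{\partial \text{out}_v}{\partial \text{net}_v} \frac{\partial \text{net}_v}{\partial \text{out}_u} = \frac{\partial \text{out}_v}{\partial \text{net}_v} \sum_{p \in \text{pred}(v)} w_{vp} \frac{\partial \text{out}_p}{\partial \text{out}_u}.$$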
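However, for the first hidden layer we get
$$\frac{\partial \text{net}_v}{\partial \text{out}_u} = w_{vu}, \qquad \text{therefore} \qquad \frac{\partial \text{out}_v}{\partial \text{out}_u} = \frac{\partial \text{out}_v}{\partial \text{net}_v} w_{vu}.$$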
This formula marks the start of the recursion.
Sensitivity Analysis
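Consider as usual the special case in which
• the output function is the identity,
• the activation function is logistic.
The recursion formula is in this case
$$\frac{\partial \text{out}_v}{\partial \text{out}_u} = \text{out}_v (1 - \text{out}_v) \sum_{p \in \text{pred}(v)} w_{vp} \frac{\partial \text{out}_p}{\partial \text{out}_u}$$
and the anchor of the recursion is
$$\frac{\partial \text{out}_v}{\partial \text{out}_u} = \text{out}_v (1 - \text{out}_v)\, w_{vu}.$$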
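The recursion can be evaluated layer by layer, starting from the anchor at the first hidden layer. The following is a minimal sketch, assuming a strictly layered perceptron with logistic activations and identity output functions. The function name `sensitivities`, the list-of-matrices weight layout, and the bias vectors (which play the role of negated thresholds) are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def sensitivities(weights, biases, patterns):
    """Average over all patterns of sum_v d out_v / d out_u, for every input u.

    weights[k] has shape (n_k, n_{k-1}); biases[k] has shape (n_k,).
    """
    total = np.zeros(weights[0].shape[1])
    for x in patterns:
        out = np.asarray(x, dtype=float)
        jac = None  # d out_(current layer) / d out_(input layer)
        for W, b in zip(weights, biases):
            out = logistic(W @ out + b)
            # out_v (1 - out_v) w_vp for every neuron v of this layer
            local = (out * (1.0 - out))[:, None] * W
            # first layer: anchor of the recursion; later layers: recursion step
            jac = local if jac is None else local @ jac
        total += jac.sum(axis=0)  # sum over the output neurons v
    return total / len(patterns)
```

A finite-difference comparison on a small random network is a simple way to check such an implementation.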
Demonstration Software: xmlp/wmlp
Demonstration of multilayer perceptron training:
• Visualization of the training process
• Biimplication and Exclusive Or, two continuous functions
• http://www.borgelt.net/mlpd.html
Multilayer Perceptron Software: mlp/mlpgui
Software for training general multilayer perceptrons:
• Command line version written in C, fast training
• Graphical user interface in Java, easy to use
• http://www.borgelt.net/mlp.html, http://www.borgelt.net/mlpgui.html
Radial Basis Function Networks
Radial Basis Function Networks
A radial basis function network (RBFN) is a neural network
with a graph $G = (U, C)$ that satisfies the following conditions:
(i) $U_{\text{in}} \cap U_{\text{out}} = \emptyset$,
(ii) $C = (U_{\text{in}} \times U_{\text{hidden}}) \cup C'$, $\quad C' \subseteq (U_{\text{hidden}} \times U_{\text{out}})$.
The network input function of each hidden neuron is a distance function
of the input vector and the weight vector, i.e.
$$\forall u \in U_{\text{hidden}}: \quad f_{\text{net}}^{(u)}(\vec{w}_u, \vec{\text{in}}_u) = d(\vec{w}_u, \vec{\text{in}}_u),$$
where $d: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_0^+$ is a function satisfying, for all $\vec{x}, \vec{y}, \vec{z} \in \mathbb{R}^n$:
(i) $d(\vec{x}, \vec{y}) = 0 \Leftrightarrow \vec{x} = \vec{y}$,
(ii) $d(\vec{x}, \vec{y}) = d(\vec{y}, \vec{x})$ (symmetry),
(iii) $d(\vec{x}, \vec{z}) \le d(\vec{x}, \vec{y}) + d(\vec{y}, \vec{z})$ (triangle inequality).
Distance Functions
Illustration of distance functions
$$d_k(\vec{x}, \vec{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^k \right)^{\frac{1}{k}}$$
Well-known special cases from this family are:
k = 1: Manhattan or city block distance,
k = 2: Euclidean distance,
k → ∞: maximum distance, i.e. $d_\infty(\vec{x}, \vec{y}) = \max_{i=1}^{n} |x_i - y_i|$.
[Figure: illustration of the distance functions for k = 1, k = 2, and k → ∞.]
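The three special cases of the distance family can be compared numerically. This is a small sketch; the function name `minkowski` and the use of `float("inf")` for the limit case are choices made for the illustration.

```python
import numpy as np

def minkowski(x, y, k):
    """Distance d_k(x, y); k = inf yields the maximum distance."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(k):
        return float(diff.max())
    return float((diff ** k).sum() ** (1.0 / k))

x, y = (0.0, 0.0), (3.0, 4.0)
for k in (1, 2, float("inf")):
    print(k, minkowski(x, y, k))
```

For the points (0, 0) and (3, 4) the city block distance is 7, the Euclidean distance is 5, and the maximum distance is 4.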
Radial Basis Function Networks
The network input function of the output neurons is the weighted sum of their
inputs, i.e.
$$\forall u \in U_{\text{out}}: \quad f_{\text{net}}^{(u)}(\vec{w}_u, \vec{\text{in}}_u) = \vec{w}_u \vec{\text{in}}_u = \sum_{v \in \text{pred}(u)} w_{uv} \text{out}_v.$$
The activation function of each hidden neuron is a so-called radial function, i.e.
a monotonically decreasing function
$$f: \mathbb{R}_0^+ \to [0, 1] \quad \text{with} \quad f(0) = 1 \quad \text{and} \quad \lim_{x \to \infty} f(x) = 0.$$
The activation function of each output neuron is a linear function, namely
$$f_{\text{act}}^{(u)}(\text{net}_u, \theta_u) = \text{net}_u - \theta_u.$$
(The linear activation function is important for the initialization.)
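Radial Activation Functions
rectangle function:
$$f_{\text{act}}(\text{net}, \sigma) = \begin{cases} 0, & \text{if } \text{net} > \sigma, \\ 1, & \text{otherwise.} \end{cases}$$
triangle function:
$$f_{\text{act}}(\text{net}, \sigma) = \begin{cases} 0, & \text{if } \text{net} > \sigma, \\ 1 - \frac{\text{net}}{\sigma}, & \text{otherwise.} \end{cases}$$
cosine until zero:
$$f_{\text{act}}(\text{net}, \sigma) = \begin{cases} 0, & \text{if } \text{net} > 2\sigma, \\ \frac{\cos\left(\frac{\pi}{2\sigma} \text{net}\right) + 1}{2}, & \text{otherwise.} \end{cases}$$
Gaussian function:
$$f_{\text{act}}(\text{net}, \sigma) = e^{-\frac{\text{net}^2}{2\sigma^2}}$$
[Figures: plot of each function over net. The cosine function takes the value 1/2 at net = σ; the Gaussian function takes the values $e^{-1/2}$ at net = σ and $e^{-2}$ at net = 2σ.]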
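The four radial functions translate directly into code. The sketch below assumes non-negative network inputs (distances) and a positive radius `sigma`; the function names are chosen for this illustration only.

```python
import numpy as np

def rectangle(net, sigma):
    return np.where(net > sigma, 0.0, 1.0)

def triangle(net, sigma):
    return np.where(net > sigma, 0.0, 1.0 - net / sigma)

def cosine_until_zero(net, sigma):
    return np.where(net > 2 * sigma, 0.0,
                    (np.cos(np.pi / (2 * sigma) * net) + 1) / 2)

def gaussian(net, sigma):
    return np.exp(-net**2 / (2 * sigma**2))

net = np.linspace(0.0, 3.0, 7)  # sample distances 0, 0.5, ..., 3
for f in (rectangle, triangle, cosine_until_zero, gaussian):
    print(f.__name__, np.round(f(net, 1.0), 3))
```

All four functions return 1 at net = 0 and decrease monotonically, as required of a radial function.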
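Radial Basis Function Networks: Examples
Radial basis function networks for the conjunction $x_1 \wedge x_2$
[Figure: two radial basis function networks with inputs $x_1$, $x_2$ and output $y$, each shown next to its decision region in the unit square. The first network uses a hidden neuron with center (1, 1) and radius 1/2, the second a hidden neuron with center (0, 0) and radius 6/5.]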
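Radial Basis Function Networks: Examples
Radial basis function networks for the biimplication $x_1 \leftrightarrow x_2$
Idea: logical decomposition
$$x_1 \leftrightarrow x_2 \;\equiv\; (x_1 \wedge x_2) \vee \neg (x_1 \vee x_2)$$
[Figure: radial basis function network with inputs $x_1$, $x_2$, two hidden neurons with centers (1, 1) and (0, 0) and radius 1/2 each, and output $y$, shown next to its decision regions in the unit square.]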
Radial Basis Function Networks: Function Approximation
[Figure: a function y over x, sampled at $x_1, \ldots, x_4$ with values $y_1, \ldots, y_4$, and its approximation by four basis functions (each ranging from 0 to 1) weighted with $y_1, \ldots, y_4$.]
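Radial Basis Function Networks: Function Approximation
[Figure: radial basis function network with input $x$, hidden neurons with centers $x_1, x_2, x_3, x_4, \ldots$ and radius $\sigma$ each, output weights $y_1, y_2, y_3, y_4, \ldots$, and output $y$.]
$$\sigma = \frac{1}{2} \Delta x = \frac{1}{2} (x_{i+1} - x_i)$$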
Radial Basis Function Networks: Function Approximation
[Figure: the same function y over x, sampled at $x_1, \ldots, x_4$ with values $y_1, \ldots, y_4$, and its approximation by four overlapping basis functions (each ranging from 0 to 1) weighted with $y_1, \ldots, y_4$.]
Radial Basis Function Networks: Function Approximation
[Figure: a function y over x (x axis from 0 to 8) and its decomposition into three basis functions (each ranging from 0 to 1) with weights $w_1$, $w_2$, $w_3$.]
Radial Basis Function Networks: Function Approximation
Radial basis function network for a sum of three Gaussian functions
[Figure: network with one input $x$, three hidden neurons, and one output $y$.]
Training Radial Basis Function Networks
Radial Basis Function Networks: Initialization
Let $L_{\text{fixed}} = \{l_1, \ldots, l_m\}$ be a fixed learning task,
consisting of m training patterns $l = (\vec{\imath}^{\,(l)}, \vec{o}^{\,(l)})$.
Simple radial basis function network:
One hidden neuron $v_k$, $k = 1, \ldots, m$, for each training pattern:
$$\forall k \in \{1, \ldots, m\}: \quad \vec{w}_{v_k} = \vec{\imath}^{\,(l_k)}.$$
If the activation function is the Gaussian function,
the radii $\sigma_k$ are chosen heuristically:
$$\forall k \in \{1, \ldots, m\}: \quad \sigma_k = \frac{d_{\max}}{\sqrt{2m}},$$
where
$$d_{\max} = \max_{l_j, l_k \in L_{\text{fixed}}} d\left(\vec{\imath}^{\,(l_j)}, \vec{\imath}^{\,(l_k)}\right).$$
Radial Basis Function Networks: Initialization
Initializing the connections from the hidden to the output neurons
$$\forall u: \quad \sum_{k=1}^{m} w_{u v_k} \text{out}_{v_k}^{(l)} - \theta_u = o_u^{(l)} \qquad \text{or abbreviated} \qquad \mathbf{A} \cdot \vec{w}_u = \vec{o}_u,$$
where $\vec{o}_u = (o_u^{(l_1)}, \ldots, o_u^{(l_m)})^\top$ is the vector of desired outputs, $\theta_u = 0$, and
$$\mathbf{A} = \begin{pmatrix}
\text{out}_{v_1}^{(l_1)} & \text{out}_{v_2}^{(l_1)} & \ldots & \text{out}_{v_m}^{(l_1)} \\
\text{out}_{v_1}^{(l_2)} & \text{out}_{v_2}^{(l_2)} & \ldots & \text{out}_{v_m}^{(l_2)} \\
\vdots & \vdots & & \vdots \\
\text{out}_{v_1}^{(l_m)} & \text{out}_{v_2}^{(l_m)} & \ldots & \text{out}_{v_m}^{(l_m)}
\end{pmatrix}.$$
This is a linear equation system that can be solved by inverting the matrix $\mathbf{A}$:
$$\vec{w}_u = \mathbf{A}^{-1} \cdot \vec{o}_u.$$
RBFN Initialization: Example
Simple radial basis function network for the biimplication $x_1 \leftrightarrow x_2$

x_1  x_2  y
0    0    1
1    0    0
0    1    0
1    1    1

[Figure: network with inputs $x_1$, $x_2$, four hidden neurons with centers (0, 0), (1, 0), (0, 1), (1, 1) and radius 1/2 each, output weights $w_1, \ldots, w_4$, and output $y$ with threshold 0.]
RBFN Initialization: Example
Simple radial basis function network for the biimplication $x_1 \leftrightarrow x_2$
$$\mathbf{A} = \begin{pmatrix}
1 & e^{-2} & e^{-2} & e^{-4} \\
e^{-2} & 1 & e^{-4} & e^{-2} \\
e^{-2} & e^{-4} & 1 & e^{-2} \\
e^{-4} & e^{-2} & e^{-2} & 1
\end{pmatrix}
\qquad
\mathbf{A}^{-1} = \frac{1}{D} \begin{pmatrix}
a & b & b & c \\
b & a & c & b \\
b & c & a & b \\
c & b & b & a
\end{pmatrix}$$
where
$$\begin{aligned}
D &= 1 - 4e^{-4} + 6e^{-8} - 4e^{-12} + e^{-16} \approx 0.9287, \\
a &= 1 - 2e^{-4} + e^{-8} \approx 0.9637, \\
b &= -e^{-2} + 2e^{-6} - e^{-10} \approx -0.1304, \\
c &= e^{-4} - 2e^{-8} + e^{-12} \approx 0.0177.
\end{aligned}$$
$$\vec{w}_u = \mathbf{A}^{-1} \cdot \vec{o}_u = \frac{1}{D} \begin{pmatrix} a + c \\ 2b \\ 2b \\ a + c \end{pmatrix} \approx \begin{pmatrix} 1.0567 \\ -0.2809 \\ -0.2809 \\ 1.0567 \end{pmatrix}$$
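The initialization of the simple radial basis function network for the biimplication can be retraced numerically. The sketch below assumes the Euclidean distance and Gaussian activations; it solves the linear system directly instead of forming the inverse explicitly, which is a substitution made here for numerical convenience.

```python
import numpy as np

# Biimplication learning task: inputs and desired outputs.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
o = np.array([1, 0, 0, 1], dtype=float)

m = len(X)
# Pairwise Euclidean distances between the training inputs.
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
sigma = dist.max() / np.sqrt(2 * m)  # heuristic radius d_max / sqrt(2m)

# A[i, k] = output of hidden neuron k (centered at X[k]) for pattern i.
A = np.exp(-dist**2 / (2 * sigma**2))

# Solve A w = o with theta_u = 0.
w = np.linalg.solve(A, o)
print("sigma =", sigma)
print("w =", np.round(w, 4))
```

The heuristic radius evaluates to 1/2 for this task, and the resulting weights agree with the values 1.0567 and -0.2809 stated above.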
RBFN Initialization: Example
Simple radial basis function network for the biimplication $x_1 \leftrightarrow x_2$
[Figure: three surface plots over $(x_1, x_2)$: a single basis function, all basis functions, and the network output $y$.]
• Initialization already leads to a perfect solution of the learning task.
• Subsequent training is not necessary.
Radial Basis Function Networks: Initialization
Normal radial basis function networks:
Select a subset of k training patterns as centers.
$$\mathbf{A} = \begin{pmatrix}
1 & \text{out}_{v_1}^{(l_1)} & \text{out}_{v_2}^{(l_1)} & \ldots & \text{out}_{v_k}^{(l_1)} \\
1 & \text{out}_{v_1}^{(l_2)} & \text{out}_{v_2}^{(l_2)} & \ldots & \text{out}_{v_k}^{(l_2)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \text{out}_{v_1}^{(l_m)} & \text{out}_{v_2}^{(l_m)} & \ldots & \text{out}_{v_k}^{(l_m)}
\end{pmatrix}
\qquad
\mathbf{A} \cdot \vec{w}_u = \vec{o}_u$$
Compute the (Moore-Penrose) pseudo inverse:
$$\mathbf{A}^+ = (\mathbf{A}^\top \mathbf{A})^{-1} \mathbf{A}^\top.$$
The weights can then be computed by
$$\vec{w}_u = \mathbf{A}^+ \cdot \vec{o}_u = (\mathbf{A}^\top \mathbf{A})^{-1} \mathbf{A}^\top \cdot \vec{o}_u.$$
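RBFN Initialization: Example
Normal radial basis function network for the biimplication $x_1 \leftrightarrow x_2$
Select two training patterns:
• $l_1 = (\vec{\imath}^{\,(l_1)}, \vec{o}^{\,(l_1)}) = ((0, 0), (1))$
• $l_4 = (\vec{\imath}^{\,(l_4)}, \vec{o}^{\,(l_4)}) = ((1, 1), (1))$
[Figure: network with inputs $x_1$, $x_2$, two hidden neurons with centers (1, 1) and (0, 0) and radius 1/2 each, output weights $w_1$, $w_2$, and output $y$ with threshold $\theta$.]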
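RBFN Initialization: Example
Normal radial basis function network for the biimplication $x_1 \leftrightarrow x_2$
$$\mathbf{A} = \begin{pmatrix}
1 & 1 & e^{-4} \\
1 & e^{-2} & e^{-2} \\
1 & e^{-2} & e^{-2} \\
1 & e^{-4} & 1
\end{pmatrix}
\qquad
\mathbf{A}^+ = (\mathbf{A}^\top \mathbf{A})^{-1} \mathbf{A}^\top = \begin{pmatrix}
a & b & b & a \\
c & d & d & e \\
e & d & d & c
\end{pmatrix}$$
where
$$a \approx -0.1810, \quad b \approx 0.6810, \quad c \approx 1.1781, \quad d \approx -0.6688, \quad e \approx 0.1594.$$
Resulting weights:
$$\vec{w}_u = \begin{pmatrix} -\theta \\ w_1 \\ w_2 \end{pmatrix} = \mathbf{A}^+ \cdot \vec{o}_u \approx \begin{pmatrix} -0.3620 \\ 1.3375 \\ 1.3375 \end{pmatrix}.$$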
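The pseudo inverse computation for the normal radial basis function network can be retraced as follows. The sketch assumes the Euclidean distance, Gaussian activations with radius 1/2, and the ordering of the hidden neurons implied by the matrix A (first center (0, 0), second center (1, 1)).

```python
import numpy as np

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
o = np.array([1, 0, 0, 1], dtype=float)
centers = np.array([[0, 0], [1, 1]], dtype=float)  # patterns l_1 and l_4
sigma = 0.5

# Hidden neuron outputs for every pattern, preceded by a column of ones
# that belongs to the (negated) threshold of the output neuron.
dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
A = np.hstack([np.ones((len(X), 1)), np.exp(-dist**2 / (2 * sigma**2))])

A_plus = np.linalg.inv(A.T @ A) @ A.T  # Moore-Penrose pseudo inverse
w = A_plus @ o                         # (-theta, w_1, w_2)
print(np.round(A_plus, 4))
print(np.round(w, 4))
```

In practice `np.linalg.pinv(A)` or `np.linalg.lstsq` would usually be preferred over the explicit formula, since they also handle the case in which $\mathbf{A}^\top \mathbf{A}$ is singular.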
RBFN Initialization: Example
Normal radial basis function network for the biimplication $x_1 \leftrightarrow x_2$
[Figure: three surface plots over $(x_1, x_2)$: the basis function centered at (0, 0), the basis function centered at (1, 1), and the network output $y$.]
• Initialization already leads to a perfect solution of the learning task.
• This is an accident: because of linearly dependent equations, the linear equation system is not over-determined.
Radial Basis Function Networks: Initialization
Finding appropriate centers for the radial basis functions
One approach: k-means clustering
• Select randomly k training patterns as centers.
• Assign to each center those training patterns that are closest to it.
• Compute new centers as the center of gravity of the assigned training patterns.
• Repeat the previous two steps until convergence, i.e., until the centers do not change anymore.
• Use the resulting centers for the weight vectors of the hidden neurons.
Alternative approach: learning vector quantization
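The k-means steps listed above can be sketched as follows. The function name `kmeans_centers`, the iteration cap `max_iter`, and the handling of a center that loses all its patterns (it is left in place) are assumptions made for this illustration.

```python
import numpy as np

def kmeans_centers(patterns, k, rng, max_iter=100):
    patterns = np.asarray(patterns, dtype=float)
    # Select k distinct training patterns at random as initial centers.
    centers = patterns[rng.choice(len(patterns), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each pattern to its closest center.
        dist = np.linalg.norm(patterns[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        # New center = center of gravity of the assigned patterns.
        new_centers = np.array([
            patterns[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # centers no longer change
            break
        centers = new_centers
    return centers

data = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(kmeans_centers(data, k=2, rng=np.random.default_rng(0)))
```

The returned centers would then be used as the weight vectors of the hidden neurons.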
Radial Basis Function Networks: Training
Training radial basis function networks:
The derivation of the update rules is analogous to that of multilayer perceptrons.
Weights from the hidden to the output neurons.
Gradient:
$$\vec{\nabla}_{\vec{w}_u} e_u^{(l)} = \frac{\partial e_u^{(l)}}{\partial \vec{w}_u} = -2 \left(o_u^{(l)} - \text{out}_u^{(l)}\right) \vec{\text{in}}_u^{(l)},$$
Weight update rule:
$$\Delta \vec{w}_u^{(l)} = -\frac{\eta_3}{2} \vec{\nabla}_{\vec{w}_u} e_u^{(l)} = \eta_3 \left(o_u^{(l)} - \text{out}_u^{(l)}\right) \vec{\text{in}}_u^{(l)}$$
(Two more learning rates are needed for the center coordinates and the radii.)
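Radial Basis Function Networks: Training
Training radial basis function networks:
Center coordinates (weights from the input to the hidden neurons).
Gradient:
$$\vec{\nabla}_{\vec{w}_v} e^{(l)} = \frac{\partial e^{(l)}}{\partial \vec{w}_v} = -2 \sum_{s \in \text{succ}(v)} \left(o_s^{(l)} - \text{out}_s^{(l)}\right) w_{sv} \frac{\partial \text{out}_v^{(l)}}{\partial \text{net}_v^{(l)}} \frac{\partial \text{net}_v^{(l)}}{\partial \vec{w}_v}$$
Weight update rule:
$$\Delta \vec{w}_v^{(l)} = -\frac{\eta_1}{2} \vec{\nabla}_{\vec{w}_v} e^{(l)} = \eta_1 \sum_{s \in \text{succ}(v)} \left(o_s^{(l)} - \text{out}_s^{(l)}\right) w_{sv} \frac{\partial \text{out}_v^{(l)}}{\partial \text{net}_v^{(l)}} \frac{\partial \text{net}_v^{(l)}}{\partial \vec{w}_v}$$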
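Radial Basis Function Networks: Training
Training radial basis function networks:
Center coordinates (weights from the input to the hidden neurons).
Special case: Euclidean distance
$$\frac{\partial \text{net}_v^{(l)}}{\partial \vec{w}_v} = \left( \sum_{i=1}^{n} \left(w_{v p_i} - \text{out}_{p_i}^{(l)}\right)^2 \right)^{-\frac{1}{2}} \left(\vec{w}_v - \vec{\text{in}}_v^{(l)}\right).$$
Special case: Gaussian activation function
$$\frac{\partial \text{out}_v^{(l)}}{\partial \text{net}_v^{(l)}} = \frac{\partial f_{\text{act}}(\text{net}_v^{(l)}, \sigma_v)}{\partial \text{net}_v^{(l)}} = \frac{\partial}{\partial \text{net}_v^{(l)}} e^{-\frac{\left(\text{net}_v^{(l)}\right)^2}{2\sigma_v^2}} = -\frac{\text{net}_v^{(l)}}{\sigma_v^2} e^{-\frac{\left(\text{net}_v^{(l)}\right)^2}{2\sigma_v^2}}.$$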
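Radial Basis Function Networks: Training
Training radial basis function networks:
Radii of radial basis functions.
Gradient:
$$\frac{\partial e^{(l)}}{\partial \sigma_v} = -2 \sum_{s \in \text{succ}(v)} \left(o_s^{(l)} - \text{out}_s^{(l)}\right) w_{sv} \frac{\partial \text{out}_v^{(l)}}{\partial \sigma_v}.$$
Weight update rule:
$$\Delta \sigma_v^{(l)} = -\frac{\eta_2}{2} \frac{\partial e^{(l)}}{\partial \sigma_v} = \eta_2 \sum_{s \in \text{succ}(v)} \left(o_s^{(l)} - \text{out}_s^{(l)}\right) w_{sv} \frac{\partial \text{out}_v^{(l)}}{\partial \sigma_v}.$$
Special case: Gaussian activation function
$$\frac{\partial \text{out}_v^{(l)}}{\partial \sigma_v} = \frac{\partial}{\partial \sigma_v} e^{-\frac{\left(\text{net}_v^{(l)}\right)^2}{2\sigma_v^2}} = \frac{\left(\text{net}_v^{(l)}\right)^2}{\sigma_v^3} e^{-\frac{\left(\text{net}_v^{(l)}\right)^2}{2\sigma_v^2}}.$$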
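The partial derivatives for the Euclidean distance and the Gaussian activation function can be checked against finite differences. This is only a sketch; the function names are chosen here, and the distance derivative is undefined when the weight vector coincides with the input vector.

```python
import numpy as np

def gaussian(net, sigma):
    return np.exp(-net**2 / (2 * sigma**2))

def d_out_d_net(net, sigma):
    """Derivative of the Gaussian activation w.r.t. the network input."""
    return -net / sigma**2 * gaussian(net, sigma)

def d_out_d_sigma(net, sigma):
    """Derivative of the Gaussian activation w.r.t. the radius."""
    return net**2 / sigma**3 * gaussian(net, sigma)

def d_net_d_w(w, inp):
    """Derivative of the Euclidean distance ||w - in|| w.r.t. w (w != in)."""
    return (w - inp) / np.linalg.norm(w - inp)

eps = 1e-6
net, sigma = 0.7, 0.5
numeric = (gaussian(net + eps, sigma) - gaussian(net - eps, sigma)) / (2 * eps)
print(d_out_d_net(net, sigma), numeric)
numeric = (gaussian(net, sigma + eps) - gaussian(net, sigma - eps)) / (2 * eps)
print(d_out_d_sigma(net, sigma), numeric)
```

Each printed pair should agree up to the accuracy of the central difference.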
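Radial Basis Function Networks: Generalization
Generalization of the distance function
Idea: Use an anisotropic distance function.
Example: Mahalanobis distance
$$d(\vec{x}, \vec{y}) = \sqrt{(\vec{x} - \vec{y})^\top \mathbf{\Sigma}^{-1} (\vec{x} - \vec{y})}.$$
Example: biimplication
$$\mathbf{\Sigma} = \begin{pmatrix} 9 & 8 \\ 8 & 9 \end{pmatrix}$$
[Figure: network with inputs $x_1$, $x_2$, a single hidden neuron with center (1/2, 1/2) and radius 1/3, and output $y$, shown next to its decision region in the unit square.]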
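The biimplication example can be retraced numerically. The sketch below assumes, as read off the figure, a single hidden neuron with center (1/2, 1/2), radius 1/3, and a rectangle activation function; these choices and the function name `mahalanobis` are assumptions of the illustration.

```python
import numpy as np

def mahalanobis(x, y, cov):
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov * z = diff instead of inverting cov explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

cov = np.array([[9.0, 8.0], [8.0, 9.0]])
center = np.array([0.5, 0.5])
radius = 1.0 / 3.0

outputs = []
for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    d = mahalanobis(x, center, cov)
    outputs.append(int(d <= radius))  # rectangle activation
    print(x, round(d, 4), outputs[-1])
```

Because the covariance matrix stretches distances along the diagonal from (0, 0) to (1, 1) much less than across it, the points (0, 0) and (1, 1) fall inside the radius while (1, 0) and (0, 1) fall outside.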
Learning Vector Quantization
Vector Quantization
Voronoi diagram of a vector quantization
• Dots represent vectors that are used for quantizing the area.
• Lines are the boundaries of the regions of points that are closest to the enclosed vector.
Learning Vector Quantization
Finding clusters in a given set of data points
• Data points are represented by empty circles (◦).
• Cluster centers are represented by full circles (•).
Learning Vector Quantization Networks
A learning vector quantization network (LVQ) is a neural network
with a graph $G = (U, C)$ that satisfies the following conditions:
(i) $U_{\text{in}} \cap U_{\text{out}} = \emptyset$, $\quad U_{\text{hidden}} = \emptyset$,
(ii) $C = U_{\text{in}} \times U_{\text{out}}$.
The network input function of each output neuron is a distance function
of the input vector and the weight vector, i.e.
$$\forall u \in U_{\text{out}}: \quad f_{\text{net}}^{(u)}(\vec{w}_u, \vec{\text{in}}_u) = d(\vec{w}_u, \vec{\text{in}}_u),$$
where $d: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_0^+$ is a function satisfying, for all $\vec{x}, \vec{y}, \vec{z} \in \mathbb{R}^n$:
(i) $d(\vec{x}, \vec{y}) = 0 \Leftrightarrow \vec{x} = \vec{y}$,
(ii) $d(\vec{x}, \vec{y}) = d(\vec{y}, \vec{x})$ (symmetry),
(iii) $d(\vec{x}, \vec{z}) \le d(\vec{x}, \vec{y}) + d(\vec{y}, \vec{z})$ (triangle inequality).
Learning Vector Quantization
The activation function of each output neuron is a so-called radial function, i.e.
a monotonically decreasing function
$$f: \mathbb{R}_0^+ \to [0, \infty) \quad \text{with} \quad f(0) = 1 \quad \text{and} \quad \lim_{x \to \infty} f(x) = 0.$$
Sometimes the range of values is restricted to the interval [0, 1].
However, due to the special output function this restriction is irrelevant.
The output function of each output neuron is not a simple function of the activation
of the neuron. Rather it takes into account the activations of all output neurons:
$$f_{\text{out}}^{(u)}(\text{act}_u) = \begin{cases} 1, & \text{if } \text{act}_u = \max_{v \in U_{\text{out}}} \text{act}_v, \\ 0, & \text{otherwise.} \end{cases}$$
If more than one unit has the maximal activation, one is selected at random to have
an output of 1, all others are set to output 0: winner-takes-all principle.
Learning Vector Quantization
Adaptation of reference vectors/codebook vectors
• For each training pattern find the closest reference vector.
• Adapt only this reference vector (winner neuron).
• For classified data the class may be taken into account:
Each reference vector is assigned to a class.
Attraction rule (data point and reference vector have the same class)
$$\vec{r}^{\,(\text{new})} = \vec{r}^{\,(\text{old})} + \eta (\vec{x} - \vec{r}^{\,(\text{old})}),$$
Repulsion rule (data point and reference vector have different classes)
$$\vec{r}^{\,(\text{new})} = \vec{r}^{\,(\text{old})} - \eta (\vec{x} - \vec{r}^{\,(\text{old})}).$$
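The attraction and repulsion rules can be combined into a single online update step. The sketch below assumes the Euclidean distance for finding the winner; the function name `lvq_step` and the array layout are chosen for this illustration.

```python
import numpy as np

def lvq_step(refs, ref_classes, x, c, eta):
    """Adapt only the reference vector closest to x (winner neuron)."""
    refs = np.array(refs, dtype=float)  # work on a copy
    x = np.asarray(x, dtype=float)
    winner = np.linalg.norm(refs - x, axis=1).argmin()
    # Same class: attraction (+), different class: repulsion (-).
    sign = 1.0 if ref_classes[winner] == c else -1.0
    refs[winner] += sign * eta * (x - refs[winner])
    return refs

refs = [[0.0, 0.0], [4.0, 0.0]]
classes = [0, 1]
print(lvq_step(refs, classes, x=[1.0, 0.0], c=0, eta=0.4))  # attraction
print(lvq_step(refs, classes, x=[1.0, 0.0], c=1, eta=0.4))  # repulsion
```

With η = 0.4 the winner moves 40% of the way toward the data point in the first call and the same distance away from it in the second.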
Learning Vector Quantization
Adaptation of reference vectors/codebook vectors
[Figure: two panels, attraction rule and repulsion rule, each showing the reference vectors $\vec{r}_1$, $\vec{r}_2$, $\vec{r}_3$, the data point $\vec{x}$, the distance $d$ to the closest reference vector, and the step $\eta d$.]
• $\vec{x}$: data point, $\vec{r}_i$: reference vector
• $\eta = 0.4$ (learning rate)
Learning Vector Quantization: Example
Adaptation of reference vectors/codebook vectors
• Left: Online training with learning rate $\eta = 0.1$,
• Right: Batch training with learning rate $\eta = 0.05$.
Learning Vector Quantization: Learning Rate Decay
Problem: a fixed learning rate can lead to oscillations
Solution: time dependent learning rate
$$\eta(t) = \eta_0 \alpha^t, \quad 0 < \alpha < 1, \qquad \text{or} \qquad \eta(t) = \eta_0 t^{-\kappa}, \quad \kappa > 0.$$
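Both decay schedules are one-liners. The sketch assumes the power-law form with a negative exponent, $\eta_0 t^{-\kappa}$ with $\kappa > 0$, and time steps $t \ge 1$ for that schedule; the function names are chosen here.

```python
def eta_exponential(t, eta0, alpha):
    """Exponential decay eta0 * alpha**t with 0 < alpha < 1."""
    return eta0 * alpha**t

def eta_power(t, eta0, kappa):
    """Power-law decay eta0 * t**(-kappa) with kappa > 0 and t >= 1."""
    return eta0 * t**(-kappa)

for t in (1, 10, 100):
    print(t, eta_exponential(t, 0.1, 0.95), eta_power(t, 0.1, 0.5))
```

The exponential schedule shrinks much faster for large t, so the power-law schedule keeps adapting the reference vectors for longer.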
Learning Vector Quantization: Classified Data
Improved update rule for classified data
• Idea: Update not only the one reference vector that is closest to the data point
(the winner neuron), but update the two closest reference vectors.
• Let $\vec{x}$ be the currently processed data point and $c$ its class.
Let $\vec{r}_j$ and $\vec{r}_k$ be the two closest reference vectors and $z_j$ and $z_k$ their classes.
• Reference vectors are updated only if $z_j \ne z_k$ and either $c = z_j$ or $c = z_k$.
(Without loss of generality we assume $c = z_j$.)
The update rules for the two closest reference vectors are:
$$\vec{r}_j^{\,(\text{new})} = \vec{r}_j^{\,(\text{old})} + \eta (\vec{x} - \vec{r}_j^{\,(\text{old})}) \qquad \text{and} \qquad \vec{r}_k^{\,(\text{new})} = \vec{r}_k^{\,(\text{old})} - \eta (\vec{x} - \vec{r}_k^{\,(\text{old})}),$$
while all other reference vectors remain unchanged.
Learning Vector Quantization: Window Rule
• It was observed in practical tests that standard learning vector quantization
may drive the reference vectors further and further apart.
• To counteract this undesired behavior a window rule was introduced:
update only if the data point $\vec{x}$ is close to the classification boundary.
• "Close to the boundary" is made formally precise by requiring
$$\min\left( \frac{d(\vec{x}, \vec{r}_j)}{d(\vec{x}, \vec{r}_k)}, \frac{d(\vec{x}, \vec{r}_k)}{d(\vec{x}, \vec{r}_j)} \right) > \theta, \qquad \text{where} \qquad \theta = \frac{1 - \xi}{1 + \xi}.$$
$\xi$ is a parameter that has to be specified by a user.
• Intuitively, $\xi$ describes the "width" of the window around the classification
boundary, in which the data point has to lie in order to lead to an update.
• Using it prevents divergence, because the update ceases for a data point once
the classification boundary has been moved far enough away.
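The window condition reduces to a comparison of two distance ratios. The sketch below assumes strictly positive distances; the function name `in_window` is chosen for this illustration.

```python
def in_window(d_j, d_k, xi):
    """True if the data point lies in the window around the boundary.

    d_j, d_k: distances of the data point to the two closest reference
    vectors (both assumed > 0); xi: window width parameter.
    """
    theta = (1.0 - xi) / (1.0 + xi)
    return min(d_j / d_k, d_k / d_j) > theta

print(in_window(1.0, 1.1, xi=0.2))  # nearly equidistant: update
print(in_window(1.0, 3.0, xi=0.2))  # far from the boundary: no update
```

For ξ = 0.2 the threshold is θ = 2/3, so a point is only used for an update if its two distances differ by less than a factor of 1.5.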
Soft Learning Vector Quantization
Idea: Use soft assignments instead of winner-takes-all.
Assumption: The given data was sampled from a mixture of normal distributions.
Each reference vector describes one normal distribution.
Objective: Maximize the log-likelihood ratio of the data, that is, maximize
$$\ln L_{\text{ratio}} = \sum_{j=1}^{n} \ln \sum_{\vec{r} \in R(c_j)} \exp\left( -\frac{(\vec{x}_j - \vec{r})^\top (\vec{x}_j - \vec{r})}{2\sigma^2} \right) - \sum_{j=1}^{n} \ln \sum_{\vec{r} \in Q(c_j)} \exp\left( -\frac{(\vec{x}_j - \vec{r})^\top (\vec{x}_j - \vec{r})}{2\sigma^2} \right).$$
Here $\sigma$ is a parameter specifying the "size" of each normal distribution.
$R(c)$ is the set of reference vectors assigned to class $c$ and $Q(c)$ its complement.
Intuitively: at each data point the probability density for its class should be as large
as possible while the density for all other classes should be as small as possible.
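Soft Learning Vector Quantization
Update rule derived from a maximum log-likelihood approach:
$$\vec{r}_i^{\,(\text{new})} = \vec{r}_i^{\,(\text{old})} + \eta \cdot \begin{cases} u_{ij}^{\oplus} \cdot (\vec{x}_j - \vec{r}_i^{\,(\text{old})}), & \text{if } c_j = z_i, \\ -u_{ij}^{\ominus} \cdot (\vec{x}_j - \vec{r}_i^{\,(\text{old})}), & \text{if } c_j \ne z_i, \end{cases}$$
where $z_i$ is the class associated with the reference vector $\vec{r}_i$ and
$$u_{ij}^{\oplus} = \frac{\exp\left(-\frac{1}{2\sigma^2} (\vec{x}_j - \vec{r}_i^{\,(\text{old})})^\top (\vec{x}_j - \vec{r}_i^{\,(\text{old})})\right)}{\sum_{\vec{r} \in R(c_j)} \exp\left(-\frac{1}{2\sigma^2} (\vec{x}_j - \vec{r}^{\,(\text{old})})^\top (\vec{x}_j - \vec{r}^{\,(\text{old})})\right)}$$
and
$$u_{ij}^{\ominus} = \frac{\exp\left(-\frac{1}{2\sigma^2} (\vec{x}_j - \vec{r}_i^{\,(\text{old})})^\top (\vec{x}_j - \vec{r}_i^{\,(\text{old})})\right)}{\sum_{\vec{r} \in Q(c_j)} \exp\left(-\frac{1}{2\sigma^2} (\vec{x}_j - \vec{r}^{\,(\text{old})})^\top (\vec{x}_j - \vec{r}^{\,(\text{old})})\right)}.$$
$R(c)$ is the set of reference vectors assigned to class $c$ and $Q(c)$ its complement.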
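One soft update step for a single data point can be sketched as follows. The function name `soft_lvq_step` and the array layout are assumptions of the illustration, and the sketch requires at least one reference vector of the data point's class and at least one of a different class.

```python
import numpy as np

def soft_lvq_step(refs, ref_classes, x, c, eta, sigma):
    refs = np.asarray(refs, dtype=float)
    ref_classes = np.asarray(ref_classes)
    x = np.asarray(x, dtype=float)
    # Unnormalized Gaussian membership of x to every reference vector.
    g = np.exp(-np.sum((x - refs) ** 2, axis=1) / (2 * sigma**2))
    same = ref_classes == c
    # u_plus for vectors of the same class, -u_minus for all others.
    u = np.where(same, g / g[same].sum(), -g / g[~same].sum())
    return refs + eta * u[:, None] * (x - refs)

refs = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
classes = [0, 0, 1]
print(soft_lvq_step(refs, classes, x=[1.0, 0.0], c=0, eta=0.1, sigma=1.0))
```

In this example the two reference vectors of class 0 are equally far from the data point, so each is attracted with weight 1/2, while the single vector of class 1 is repelled with weight 1.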
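Hard Learning Vector Quantization
Idea: Derive a scheme with hard assignments from the soft version.
Approach: Let the size parameter $\sigma$ of the Gaussian function go to zero.
The resulting update rule is in this case:
$$\vec{r}_i^{\,(\text{new})} = \vec{r}_i^{\,(\text{old})} + \eta \cdot \begin{cases} u_{ij}^{\oplus} \cdot (\vec{x}_j - \vec{r}_i^{\,(\text{old})}), & \text{if } c_j = z_i, \\ -u_{ij}^{\ominus} \cdot (\vec{x}_j - \vec{r}_i^{\,(\text{old})}), & \text{if } c_j \ne z_i, \end{cases}$$
where
$$u_{ij}^{\oplus} = \begin{cases} 1, & \text{if } \vec{r}_i = \operatorname{argmin}_{\vec{r} \in R(c_j)} d(\vec{x}_j, \vec{r}), \\ 0, & \text{otherwise,} \end{cases}
\qquad
u_{ij}^{\ominus} = \begin{cases} 1, & \text{if } \vec{r}_i = \operatorname{argmin}_{\vec{r} \in Q(c_j)} d(\vec{x}_j, \vec{r}), \\ 0, & \text{otherwise.} \end{cases}$$
In the first case $\vec{r}_i$ is the closest vector of the same class, in the second case $\vec{r}_i$ is the closest vector of a different class.
This update rule is stable without a window rule restricting the update.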
Learning Vector Quantization: Extensions
• Frequency Sensitive Competitive Learning
  ◦ The distance to a reference vector is modified according to
    the number of data points that are assigned to this reference vector.
• Fuzzy Learning Vector Quantization
  ◦ Exploits the close relationship to fuzzy clustering.
  ◦ Can be seen as an online version of fuzzy clustering.
  ◦ Leads to faster clustering.
• Size and Shape Parameters
  ◦ Associate each reference vector with a cluster radius.
    Update this radius depending on how close the data points are.
  ◦ Associate each reference vector with a covariance matrix.
    Update this matrix depending on the distribution of the data points.
Demonstration Software: xlvq/wlvq
Demonstration of learning vector quantization:
• Visualization of the training process
• Arbitrary datasets, but training only in two dimensions
• http://www.borgelt.net/lvqd.html
Self-Organizing Maps
Self-Organizing Maps
A self-organizing map or Kohonen feature map is a neural network with
a graph $G = (U, C)$ that satisfies the following conditions:
(i) $U_{\text{hidden}} = \emptyset$, $\quad U_{\text{in}} \cap U_{\text{out}} = \emptyset$,
(ii) $C = U_{\text{in}} \times U_{\text{out}}$.
The network input function of each output neuron is a distance function of
input and weight vector. The activation function of each output neuron is a radial
function, i.e. a monotonically decreasing function
$$f: \mathbb{R}_0^+ \to [0, 1] \quad \text{with} \quad f(0) = 1 \quad \text{and} \quad \lim_{x \to \infty} f(x) = 0.$$
The output function of each output neuron is the identity.
The output is often discretized according to the "winner takes all" principle.
On the output neurons a neighborhood relationship is defined:
$$d_{\text{neurons}}: U_{\text{out}} \times U_{\text{out}} \to \mathbb{R}_0^+.$$
Self-Organizing Maps: Neighborhood
Neighborhood of the output neurons: neurons form a grid
[Figure: two panels, a quadratic grid and a hexagonal grid.]
• Thin black lines: Indicate nearest neighbors of a neuron.
• Thick gray lines: Indicate regions assigned to a neuron for visualization.
Topology Preserving Mapping
Images of points close to each other in the original space
should be close to each other in the image space.
Example: Robinson projection of the surface of a sphere
• The Robinson projection is frequently used for world maps.
Self-Organizing Maps: Neighborhood
Find a topology preserving mapping by respecting the neighborhood
Reference vector update rule:
$$\vec{r}_u^{\,(\text{new})} = \vec{r}_u^{\,(\text{old})} + \eta(t) \cdot f_{\text{nb}}(d_{\text{neurons}}(u, u_*), \varrho(t)) \cdot (\vec{x} - \vec{r}_u^{\,(\text{old})}),$$
• $u_*$ is the winner neuron (reference vector closest to data point).
• The function $f_{\text{nb}}$ is a radial function.
Time dependent learning rate
$$\eta(t) = \eta_0 \alpha_\eta^t, \quad 0 < \alpha_\eta < 1, \qquad \text{or} \qquad \eta(t) = \eta_0 t^{-\kappa_\eta}, \quad \kappa_\eta > 0.$$
Time dependent neighborhood radius
$$\varrho(t) = \varrho_0 \alpha_\varrho^t, \quad 0 < \alpha_\varrho < 1, \qquad \text{or} \qquad \varrho(t) = \varrho_0 t^{-\kappa_\varrho}, \quad \kappa_\varrho > 0.$$
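A single update step of a self-organizing map can be sketched as follows. The sketch assumes the Euclidean distance both in the data space and on the neuron grid, and a Gaussian neighborhood function; the function name `som_step` and the array layout are choices made for this illustration.

```python
import numpy as np

def som_step(refs, grid, x, eta, rho):
    """Update all reference vectors for one data point x.

    refs: (n_neurons, dim) reference vectors,
    grid: (n_neurons, g) positions of the neurons on the grid,
    eta:  current learning rate, rho: current neighborhood radius.
    """
    refs = np.asarray(refs, dtype=float)
    x = np.asarray(x, dtype=float)
    winner = np.linalg.norm(refs - x, axis=1).argmin()
    d_neurons = np.linalg.norm(grid - grid[winner], axis=1)
    f_nb = np.exp(-d_neurons**2 / (2 * rho**2))  # radial neighborhood function
    return refs + eta * f_nb[:, None] * (x - refs)

refs = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
grid = np.array([[0.0], [1.0], [2.0]])  # three neurons on a line
print(som_step(refs, grid, x=[0.0, 1.0], eta=0.5, rho=1.0))
```

The winner moves the full fraction η toward the data point, while its grid neighbors move by smaller fractions, which is what pulls neighboring neurons toward neighboring regions of the data space.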
Self-Organizing Maps: Examples
Example: Unfolding of a two-dimensional self-organizing map.
Training a self-organizing map may fail if
• the (initial) learning rate is chosen too small or
• the (initial) neighborhood radius is chosen too small.
Self-Organizing Maps: Examples
Example: Unfolding of a two-dimensional self-organizing map.
[Figure: three panels (a), (b), (c).]
Self-organizing maps that have been trained with random points from
(a) a rotation parabola, (b) a simple cubic function, (c) the surface of a sphere.
• In this case original space and image space have different dimensionality.
• Self-organizing maps can be used for dimensionality reduction.
Demonstration Software: xsom/wsom
Demonstration of self-organizing map training:
• Visualization of the training process
• Two-dimensional areas and three-dimensional surfaces
• http://www.borgelt.net/somd.html
Hopfield Networks
Hopfield Networks
A Hopfield network is a neural network with a graph $G = (U, C)$ that satisfies
the following conditions:
(i) $U_{\text{hidden}} = \emptyset$, $\quad U_{\text{in}} = U_{\text{out}} = U$,
(ii) $C = U \times U - \{(u, u) \mid u \in U\}$.
• In a Hopfield network all neurons are input as well as output neurons.
• There are no hidden neurons.
• Each neuron receives input from all other neurons.
• A neuron is not connected to itself.
The connection weights are symmetric, i.e.
$$\forall u, v \in U, u \ne v: \quad w_{uv} = w_{vu}.$$
Hopfield Networks
The network input function of each neuron is the weighted sum of the outputs of
all other neurons, i.e.
$$\forall u \in U: \quad f_{\text{net}}^{(u)}(\vec{w}_u, \vec{\text{in}}_u) = \vec{w}_u \vec{\text{in}}_u = \sum_{v \in U - \{u\}} w_{uv} \text{out}_v.$$
The activation function of each neuron is a threshold function, i.e.
$$\forall u \in U: \quad f_{\text{act}}^{(u)}(\text{net}_u, \theta_u) = \begin{cases} 1, & \text{if } \text{net}_u \ge \theta_u, \\ -1, & \text{otherwise.} \end{cases}$$
The output function of each neuron is the identity, i.e.
$$\forall u \in U: \quad f_{\text{out}}^{(u)}(\text{act}_u) = \text{act}_u.$$
Hopfield Networks
Alternative activation function
$$\forall u \in U: \quad f_{\text{act}}^{(u)}(\text{net}_u, \theta_u, \text{act}_u) = \begin{cases} 1, & \text{if } \text{net}_u > \theta_u, \\ -1, & \text{if } \text{net}_u < \theta_u, \\ \text{act}_u, & \text{if } \text{net}_u = \theta_u. \end{cases}$$
This activation function has advantages w.r.t. the physical interpretation
of a Hopfield network.
General weight matrix of a Hopfield network
$$\mathbf{W} = \begin{pmatrix}
0 & w_{u_1 u_2} & \ldots & w_{u_1 u_n} \\
w_{u_1 u_2} & 0 & \ldots & w_{u_2 u_n} \\
\vdots & \vdots & & \vdots \\
w_{u_1 u_n} & w_{u_2 u_n} & \ldots & 0
\end{pmatrix}$$
Hopfield Networks: Examples
Very simple Hopfield network
[Figure: two neurons $u_1$ and $u_2$ with thresholds 0, inputs $x_1$, $x_2$, outputs $y_1$, $y_2$, connected to each other with weight 1.]
$$\mathbf{W} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
The behavior of a Hopfield network can depend on the update order.
• Computations can oscillate if neurons are updated in parallel.
• Computations always converge if neurons are updated sequentially.
Hopfield Networks: Examples
Parallel update of neuron activations

              u_1   u_2
input phase    -1     1
work phase      1    -1
               -1     1
                1    -1
               -1     1
                1    -1
               -1     1

• The computations oscillate, no stable state is reached.
• The output depends on when the computations are terminated.
Hopfield Networks: Examples
Sequential update of neuron activations

Update order u_1, u_2:            Update order u_2, u_1:
              u_1   u_2                         u_1   u_2
input phase    -1     1           input phase    -1     1
work phase      1     1           work phase     -1    -1
                1     1                          -1    -1
                1     1                          -1    -1
                1     1                          -1    -1

• Regardless of the update order a stable state is reached.
• Which state is reached depends on the update order.
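The dependence on the update scheme can be reproduced with a few lines of code. The sketch uses the two-neuron network with weight 1 and thresholds 0 and starts from the activations (-1, 1); the function names `parallel_run` and `sequential_run` are chosen for this illustration.

```python
import numpy as np

W = np.array([[0, 1], [1, 0]])
theta = np.zeros(2)

def parallel_run(act, steps):
    """Update all neurons at once; return the list of visited states."""
    act = np.array(act)
    states = [tuple(int(a) for a in act)]
    for _ in range(steps):
        act = np.where(W @ act >= theta, 1, -1)
        states.append(tuple(int(a) for a in act))
    return states

def sequential_run(act, order, sweeps):
    """Update one neuron at a time in the given order; return final state."""
    act = np.array(act)
    for _ in range(sweeps):
        for u in order:
            act[u] = 1 if W[u] @ act >= theta[u] else -1
    return tuple(int(a) for a in act)

print(parallel_run([-1, 1], steps=4))               # oscillates
print(sequential_run([-1, 1], order=[0, 1], sweeps=2))
print(sequential_run([-1, 1], order=[1, 0], sweeps=2))
```

The parallel run alternates between (-1, 1) and (1, -1), whereas the sequential runs settle in (1, 1) or (-1, -1), depending on which neuron is updated first.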
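Hopfield Networks: Examples
Simplified representation of a Hopfield network
[Figure: a Hopfield network with three neurons $u_1$, $u_2$, $u_3$ (thresholds 0), drawn once with explicit inputs $x_1, x_2, x_3$ and outputs $y_1, y_2, y_3$ and once in simplified form with one undirected edge per pair of neurons.]
$$\mathbf{W} = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix}$$
• Symmetric connections between neurons are combined.
• Inputs and outputs are not explicitly represented.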
Hopfield Networks: State Graph
Graph of activation states and transitions
[Figure: state graph over the eight activation states from +++ to −−−; each arrow is labeled with the neuron $u_1$, $u_2$, or $u_3$ whose update causes the transition.]
Hopfield Networks: Convergence
Convergence Theorem: If the activations of the neurons of a Hopfield network
are updated sequentially (asynchronously), then a stable state is reached in a finite
number of steps.
If the neurons are traversed cyclically in an arbitrary, but fixed order, at most $n \cdot 2^n$
steps (updates of individual neurons) are needed, where n is the number of neurons
of the Hopfield network.
The proof is carried out with the help of an energy function.
The energy function of a Hopfield network with n neurons $u_1, \ldots, u_n$ is
$$E = -\frac{1}{2} \vec{\text{act}}^\top \mathbf{W} \vec{\text{act}} + \vec{\theta}^\top \vec{\text{act}} = -\frac{1}{2} \sum_{u, v \in U, u \ne v} w_{uv} \text{act}_u \text{act}_v + \sum_{u \in U} \theta_u \text{act}_u.$$
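Hopfield Networks: Convergence
Consider the energy change resulting from an update that changes an activation:
$$\begin{aligned}
\Delta E = E^{(\text{new})} - E^{(\text{old})} &= \Big( -\sum_{v \in U - \{u\}} w_{uv} \text{act}_u^{(\text{new})} \text{act}_v + \theta_u \text{act}_u^{(\text{new})} \Big) \\
&\quad - \Big( -\sum_{v \in U - \{u\}} w_{uv} \text{act}_u^{(\text{old})} \text{act}_v + \theta_u \text{act}_u^{(\text{old})} \Big) \\
&= \left( \text{act}_u^{(\text{old})} - \text{act}_u^{(\text{new})} \right) \Big( \underbrace{\sum_{v \in U - \{u\}} w_{uv} \text{act}_v}_{= \text{net}_u} - \theta_u \Big).
\end{aligned}$$
• $\text{net}_u < \theta_u$: The second factor is less than 0.
$\text{act}_u^{(\text{new})} = -1$ and $\text{act}_u^{(\text{old})} = 1$, therefore the first factor is greater than 0.
Result: $\Delta E < 0$.
• $\text{net}_u \ge \theta_u$: The second factor is greater than or equal to 0.
$\text{act}_u^{(\text{new})} = 1$ and $\text{act}_u^{(\text{old})} = -1$, therefore the first factor is less than 0.
Result: $\Delta E \le 0$.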
Hopfield Networks: Examples
Arrange states in state graph according to their energy
[Figure: the state graph of the example network with the states arranged on an energy axis E ranging from 2 down to −4.]
Energy function for the example Hopfield network:
$$E = -\text{act}_{u_1} \text{act}_{u_2} - 2\, \text{act}_{u_1} \text{act}_{u_3} - \text{act}_{u_2} \text{act}_{u_3}.$$
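The energy levels of the example network can be enumerated directly from the energy function. The sketch uses the weight matrix with entries 1, 2, 1 and thresholds 0 from the three-neuron example; the function name `energy` is chosen for this illustration.

```python
import itertools
import numpy as np

W = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
theta = np.zeros(3)

def energy(act):
    """E = -1/2 act^T W act + theta^T act."""
    act = np.asarray(act, dtype=float)
    return float(-0.5 * act @ W @ act + theta @ act)

for state in itertools.product([1, -1], repeat=3):
    label = "".join("+" if a > 0 else "-" for a in state)
    print(label, energy(state))
```

The states +++ and −−− have the lowest energy and are the stable states of this network.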
Hopfield Networks: Examples
The state graph need not be symmetric
[Figure: a Hopfield network with three neurons $u_1$, $u_2$, $u_3$ and its state graph, with the states arranged on an energy axis E.]
Hopfield Networks: Physical Interpretation
Physical interpretation: Magnetism
A Hopfield network can be seen as a (microscopic) model of magnetism
(so-called Ising model, [Ising 1925]).

physical                                    neural
atom                                        neuron
magnetic moment (spin)                      activation state
strength of outer magnetic field            threshold value
magnetic coupling of the atoms              connection weights
Hamilton operator of the magnetic field     energy function
Hopfield Networks: Associative Memory
Idea: Use stable states to store patterns
First: Store only one pattern $\vec{x} = (\text{act}_{u_1}^{(l)}, \ldots, \text{act}_{u_n}^{(l)})^\top \in \{-1, 1\}^n$, $n \ge 2$,
i.e., find weights, so that the pattern is a stable state.
Necessary and sufficient condition:
$$S(\mathbf{W} \vec{x} - \vec{\theta}) = \vec{x},$$
where
$$S: \mathbb{R}^n \to \{-1, 1\}^n, \qquad \vec{x} \mapsto \vec{y}$$
with
$$\forall i \in \{1, \ldots, n\}: \quad y_i = \begin{cases} 1, & \text{if } x_i \ge 0, \\ -1, & \text{otherwise.} \end{cases}$$
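Hopfield Networks: Associative Memory
If $\vec{\theta} = \vec{0}$, an appropriate matrix $\mathbf{W}$ can easily be found. It suffices that
$$\mathbf{W} \vec{x} = c \vec{x} \quad \text{with} \quad c \in \mathbb{R}^+.$$
Algebraically: Find a matrix $\mathbf{W}$ that has a positive eigenvalue w.r.t. $\vec{x}$.
Choose
$$\mathbf{W} = \vec{x} \vec{x}^\top - \mathbf{E},$$
where $\vec{x} \vec{x}^\top$ is the so-called outer product and $\mathbf{E}$ is the unit matrix.
With this matrix we have
$$\mathbf{W} \vec{x} = (\vec{x} \vec{x}^\top) \vec{x} - \underbrace{\mathbf{E} \vec{x}}_{= \vec{x}} \overset{(*)}{=} \vec{x} \underbrace{(\vec{x}^\top \vec{x})}_{= |\vec{x}|^2 = n} - \vec{x} = n \vec{x} - \vec{x} = (n - 1) \vec{x}.$$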
Hopfield Networks: Associative Memory
Hebbian learning rule [Hebb 1949]
Written in individual weights the computation of the weight matrix reads:
$$w_{uv} = \begin{cases} 0, & \text{if } u = v, \\ 1, & \text{if } u \ne v, \ \text{act}_u^{(p)} = \text{act}_v^{(p)}, \\ -1, & \text{otherwise.} \end{cases}$$
• Originally derived from a biological analogy.
• Strengthen the connection between neurons that are active at the same time.
Note that this learning rule also stores the complement of the pattern:
With $\mathbf{W} \vec{x} = (n - 1) \vec{x}$ it is also $\mathbf{W} (-\vec{x}) = (n - 1)(-\vec{x})$.
Hopeld Networks:Associative Memory
Storing several patterns
Choose
W~x
j
=
m
X
i=1
W
i
~x
j
=
0
@
m
X
i=1
(~x
i
~x
T
i
)~x
j
1
A
mE~x
j
|
{z
}
=~x
j
=
0
@
m
X
i=1
~x
i
(~x
T
i
~x
j
)
1
A
m~x
j
If patterns are orthogonal,we have
~x
T
i
~x
j
=
(
0;if i 6= j,
n;if i = j,
and therefore
W~x
j
= (n m)~x
j
:
Hopeld Networks:Associative Memory
Storing several patterns
Result:As long as m< n,~x is a stable state of the Hopeld network.
Note that the complements of the patterns are also stored.
With W~x
j
= (n m)~x
j
it is also W(~x
j
) = (n m)(~x
j
):
But:Capacity is very small compared to the number of possible states (2
n
).
Non-orthogonal patterns:
W~x
j
= (n m)~x
j
+
m
X
i=1
i6=j
~x
i
(~x
T
i
~x
j
)
|
{z
}
\disturbance term"
:
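Associative Memory: Example
Example: Store patterns $\vec{x}_1 = (+1,+1,-1,-1)^\top$ and $\vec{x}_2 = (-1,+1,-1,+1)^\top$.
\[ \mathbf{W} = \mathbf{W}_1 + \mathbf{W}_2 = \vec{x}_1\vec{x}_1^\top + \vec{x}_2\vec{x}_2^\top - 2\mathbf{E}, \]
where
\[
\mathbf{W}_1 = \begin{pmatrix} 0 & 1 & -1 & -1 \\ 1 & 0 & -1 & -1 \\ -1 & -1 & 0 & 1 \\ -1 & -1 & 1 & 0 \end{pmatrix},
\qquad
\mathbf{W}_2 = \begin{pmatrix} 0 & -1 & 1 & -1 \\ -1 & 0 & -1 & 1 \\ 1 & -1 & 0 & -1 \\ -1 & 1 & -1 & 0 \end{pmatrix}.
\]
The full weight matrix is:
\[
\mathbf{W} = \begin{pmatrix} 0 & 0 & 0 & -2 \\ 0 & 0 & -2 & 0 \\ 0 & -2 & 0 & 0 \\ -2 & 0 & 0 & 0 \end{pmatrix}.
\]
Therefore it is
\[ \mathbf{W}\vec{x}_1 = (+2,+2,-2,-2)^\top \quad\text{and}\quad \mathbf{W}\vec{x}_2 = (-2,+2,-2,+2)^\top. \]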
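The example can be checked numerically. The following is a minimal sketch in plain Python, not part of the original slides; the helper names (`hebbian_weights`, `add_matrices`, `mat_vec`, `threshold`) are made up for this illustration. It builds the weight matrix from the two patterns with the outer-product rule and checks that both patterns and their complements are stable states for a zero threshold vector.

```python
def hebbian_weights(x):
    """Return x x^T - E as a nested list (weights for a single pattern)."""
    n = len(x)
    return [[0 if u == v else x[u] * x[v] for v in range(n)] for u in range(n)]


def add_matrices(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mat_vec(w, x):
    """Matrix-vector product W x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def threshold(v):
    """Component-wise threshold function S: +1 if the entry is >= 0, else -1."""
    return [1 if c >= 0 else -1 for c in v]


x1 = [+1, +1, -1, -1]
x2 = [-1, +1, -1, +1]
W = add_matrices(hebbian_weights(x1), hebbian_weights(x2))

# Patterns and their complements should reproduce themselves.
for x in (x1, x2, [-c for c in x1], [-c for c in x2]):
    assert threshold(mat_vec(W, x)) == x
```

Because the two patterns are orthogonal, each product W x_j equals (n - m) x_j = 2 x_j here, so the threshold function returns the pattern unchanged.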
Associative Memory: Examples
Example: Storing bit maps of numbers
• Left: Bit maps stored in a Hopfield network.
• Right: Reconstruction of a pattern from a random input.
Hopeld Networks:Associative Memory
Training a Hopeld network with the Delta rule
Necessary condition for pattern ~x being a stable state:
s(0 +w
u
1
u
2
act
(p)
u
2
+:::+w
u
1
u
n
act
(p)
u
n

u
1
) =act
(p)
u
1
;
s(w
u
2
u
1
act
(p)
u
1
+0 +:::+w
u
2
u
n
act
(p)
u
n

u
2
) =act
(p)
u
2
;
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
s(w
u
n
u
1
act
(p)
u
1
+w
u
n
u
2
act
(p)
u
2
+:::+0 
u
n
) =act
(p)
u
n
:
with the standard threshold function
s(x) =
(
1;if x  0,
1;otherwise.
Hopeld Networks:Associative Memory
Training a Hopeld network with the Delta rule
Turn weight matrix into a weight vector:
~w = ( w
u
1
u
2
;w
u
1
u
3
;:::;w
u
1
u
n
;
w
u
2
u
3
;:::;w
u
2
u
n
;
.
.
.
.
.
.
w
u
n1
u
n
;

u
1
;
u
2
;:::;
u
n
):
Construct input vectors for a threshold logic unit
~z
2
= (act
(p)
u
1
;0;:::;0;
|
{z
}
n 2 zeros
act
(p)
u
3
;:::;act
(p)
u
n
;:::0;1;0;:::;0
|
{z
}
n 2 zeros
):
Apply Delta rule training until convergence.
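The construction can be sketched in plain Python as follows. This is an illustrative sketch under assumptions, not the implementation behind the slides: the function names, the learning rate `eta` and the epoch limit are hypothetical. Each unordered neuron pair gets one entry of the weight vector (so that w_uv = w_vu is shared), followed by one entry per neuron that holds the negated threshold.

```python
from itertools import combinations


def train_hopfield_delta(patterns, eta=0.5, max_epochs=100):
    """Delta-rule training of one shared weight vector (hypothetical helper)."""
    n = len(patterns[0])
    pairs = list(combinations(range(n), 2))      # one slot per w_uv with u < v
    pos = {p: k for k, p in enumerate(pairs)}
    w = [0.0] * (len(pairs) + n)                 # weights first, then -theta_u

    def input_vector(x, u):
        """Input vector z_u: activations of the other neurons plus a 1 for -theta_u."""
        z = [0.0] * len(w)
        for v in range(n):
            if v != u:
                z[pos[(min(u, v), max(u, v))]] = x[v]
        z[len(pairs) + u] = 1.0
        return z

    for _ in range(max_epochs):
        errors = 0
        for x in patterns:
            for u in range(n):
                z = input_vector(x, u)
                out = 1 if sum(wi * zi for wi, zi in zip(w, z)) >= 0 else -1
                if out != x[u]:
                    errors += 1
                    # Delta rule: move the weights towards the desired output.
                    w = [wi + eta * (x[u] - out) * zi for wi, zi in zip(w, z)]
        if errors == 0:
            break
    return w, pairs


def is_stable(w, pairs, x):
    """Check s(net_u - theta_u) = x_u for every neuron u."""
    n = len(x)
    pos = {p: k for k, p in enumerate(pairs)}
    for u in range(n):
        net = sum(w[pos[(min(u, v), max(u, v))]] * x[v] for v in range(n) if v != u)
        net += w[len(pairs) + u]                 # this entry stores -theta_u
        if (1 if net >= 0 else -1) != x[u]:
            return False
    return True


patterns = [[1, 1, -1, -1], [-1, 1, -1, 1]]
w, pairs = train_hopfield_delta(patterns)
```

For the two example patterns a separating weight vector exists (the Hebbian solution), so the loop stops after a few epochs; for patterns that cannot all be stored, the epoch limit ends the training.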
Demonstration Software: xhfn/whfn
Demonstration of Hopfield networks as associative memory:
• Visualization of the association/recognition process
• Two-dimensional networks of arbitrary size
• http://www.borgelt.net/hfnd.html
Hopeld Networks:Solving Optimization Problems
Use energy minimization to solve optimization problems
General procedure:
 Transform function to optimize into a function to minimize.
 Transform function into the form of an energy function of a Hopeld network.
 Read the weights and threshold values from the energy function.
 Construct the corresponding Hopeld network.
 Initialize Hopeld network randomly and update until convergence.
 Read solution from the stable state reached.
 Repeat several times and use best solution found.
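The last three steps of this procedure can be sketched as follows. This is a minimal illustration in plain Python with activations -1 and 1; the function names, the fixed update order and the number of restarts are assumptions made for the sketch. The weight matrix is assumed to be symmetric with a zero diagonal.

```python
import random


def energy(w, theta, act):
    """E = -1/2 sum_{u != v} w_uv act_u act_v + sum_u theta_u act_u."""
    n = len(act)
    pair_sum = sum(w[u][v] * act[u] * act[v]
                   for u in range(n) for v in range(n) if u != v)
    return -0.5 * pair_sum + sum(t * a for t, a in zip(theta, act))


def run_until_stable(w, theta, act, max_sweeps=1000):
    """Asynchronous updates in a fixed neuron order until nothing changes."""
    act = list(act)
    n = len(act)
    for _ in range(max_sweeps):
        changed = False
        for u in range(n):
            net = sum(w[u][v] * act[v] for v in range(n) if v != u)
            new = 1 if net >= theta[u] else -1
            if new != act[u]:
                act[u], changed = new, True
        if not changed:
            break
    return act


def best_of_restarts(w, theta, restarts=10, seed=0):
    """Random initialization, update until convergence, keep the lowest energy."""
    rng = random.Random(seed)
    n = len(theta)
    results = []
    for _ in range(restarts):
        start = [rng.choice((-1, 1)) for _ in range(n)]
        final = run_until_stable(w, theta, start)
        results.append((energy(w, theta, final), final))
    return min(results)
```

With the weight matrix of the associative memory example and zero thresholds, every run ends in a state of energy -4, which is the minimum for that network.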
Hopeld Networks:Activation Transformation
A Hopeld network may be dened either with activations 1 and 1 or with acti-
vations 0 and 1.The networks can be transformed into each other.
From act
u
2 f1;1g to act
u
2 f0;1g:
w
0
uv
= 2w

uv
and

0
u
= 

u
+
X
v2Ufug
w

uv
From act
u
2 f0;1g to act
u
2 f1;1g:
w

uv
=
1
2
w
0
uv
and


u
= 
0
u

1
2
X
v2Ufug
w
0
uv
:
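The first transformation can be checked by brute force on a small network. The sketch below is an illustration only; the function names are hypothetical, and a synchronous update step is used merely as a convenient way to compare the decisions of all neurons at once.

```python
from itertools import product


def to_zero_one(w_pm, theta_pm):
    """Map a network with activations -1/1 to one with activations 0/1."""
    n = len(theta_pm)
    w01 = [[2 * w_pm[u][v] for v in range(n)] for u in range(n)]
    theta01 = [theta_pm[u] + sum(w_pm[u][v] for v in range(n) if v != u)
               for u in range(n)]
    return w01, theta01


def next_state(w, theta, act, low):
    """One threshold step for all neurons; `low` is the inactive value (-1 or 0)."""
    n = len(act)
    return [1 if sum(w[u][v] * act[v] for v in range(n) if v != u) >= theta[u]
            else low for u in range(n)]


def equivalent(w_pm, theta_pm):
    """True if both networks make the same decision in every possible state."""
    n = len(theta_pm)
    w01, theta01 = to_zero_one(w_pm, theta_pm)
    for act in product((-1, 1), repeat=n):
        act01 = [(a + 1) // 2 for a in act]          # -1 -> 0, 1 -> 1
        expected = [(a + 1) // 2 for a in next_state(w_pm, theta_pm, list(act), -1)]
        if next_state(w01, theta01, act01, 0) != expected:
            return False
    return True
```

The check works because substituting act = 2 act' - 1 (with act' in {0, 1}) into net_u - theta_u gives exactly the transformed weights and thresholds.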
Hopeld Networks:Solving Optimization Problems
Combination lemma:Let two Hopeld networks on the same set U of neurons
with weights w
(i)
uv
,threshold values 
(i)
u
and energy functions
E
i
= 
1
2
X
u2U
X
v2Ufug
w
(i)
uv
act
u
act
v
+
X
u2U

(i)
u
act
u
;
i = 1;2,be given.Furthermore let a;b 2 IR.Then E = aE
1
+bE
2
is the energy
function of the Hopeld network on the neurons in U that has the weights w
uv
=
aw
(1)
uv
+bw
(2)
uv
and the threshold values 
u
= a
(1)
u
+b
(2)
u
.
Proof:Just do the computations.
Idea:Additional conditions can be formalized separately and incorporated later.
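Written out, the computation behind the proof of the combination lemma is a regrouping of the two sums (a sketch of the step that is left to the reader):

```latex
\begin{align*}
aE_1 + bE_2
 &= -\frac12 \sum_{u \in U} \sum_{v \in U-\{u\}}
      \bigl(a w^{(1)}_{uv} + b w^{(2)}_{uv}\bigr)\,\mathrm{act}_u\,\mathrm{act}_v
    + \sum_{u \in U} \bigl(a\theta^{(1)}_u + b\theta^{(2)}_u\bigr)\,\mathrm{act}_u \\
 &= -\frac12 \sum_{u \in U} \sum_{v \in U-\{u\}} w_{uv}\,\mathrm{act}_u\,\mathrm{act}_v
    + \sum_{u \in U} \theta_u\,\mathrm{act}_u .
\end{align*}
```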
Hopeld Networks:Solving Optimization Problems
Example:Traveling salesman problem
Idea:Represent tour by a matrix.
1
3 4
2
city
1 2 3 4
0
B
B
B
B
@
1 0 0 0
0 0 1 0
0 0 0 1
0 1 0 0
1
C
C
C
C
A
1:
2:
3:
4:
step
An element a
ij
of the matrix is 1 if the i-th city is visited in the j-th step and 0
otherwise.
Each matrix element will be represented by a neuron.
Hopeld Networks:Solving Optimization Problems
Minimization of the tour length
E
1
=
n
X
j
1
=1
n
X
j
2
=1
n
X
i=1
d
j
1
j
2
 m
ij
1
 m
(i mod n)+1;j
2
:
Double summation over steps (index i) needed:
E
1
=
X
(i
1
;j
1
)2f1;:::;ng
2
X
(i
2
;j
2
)2f1;:::;ng
2
d
j
1
j
2
 
(i
1
mod n)+1;i
2
 m
i
1
j
1
 m
i
2
j
2
;
where

ab
=
(
1;if a = b,
0;otherwise.
Symmetric version of the energy function:
E
1
= 
1
2
X
(i
1
;j
1
)2f1;:::;ng
2
(i
2
;j
2
)2f1;:::;ng
2
d
j
1
j
2
 (
(i
1
mod n)+1;i
2
+
i
1
;(i
2
mod n)+1
)  m
i
1
j
1
 m
i
2
j
2
:
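For a matrix that represents a valid tour, the first form of E_1 is simply the length of the closed tour. The following sketch checks this for the tour matrix of the example (rows are steps, columns are cities). The distance matrix `d` is invented for the illustration, and the function names are hypothetical; indices are 0-based, so (i mod n) + 1 becomes (i + 1) % n.

```python
def tour_energy(m, d):
    """E_1 = sum over j1, j2, i of d[j1][j2] * m[i][j1] * m[(i + 1) % n][j2]."""
    n = len(m)
    return sum(d[j1][j2] * m[i][j1] * m[(i + 1) % n][j2]
               for j1 in range(n) for j2 in range(n) for i in range(n))


def tour_length(order, d):
    """Length of the closed tour that visits the cities in the given order."""
    return sum(d[a][b] for a, b in zip(order, order[1:] + order[:1]))


# Tour matrix from the example: step i visits the city j with m[i][j] == 1.
m = [[1, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 1, 0, 0]]

# Hypothetical symmetric distances between the four cities.
d = [[0, 2, 3, 4],
     [2, 0, 5, 6],
     [3, 5, 0, 7],
     [4, 6, 7, 0]]

order = [row.index(1) for row in m]   # city visited in each step (0-based)
```

With these made-up distances both functions give the same value, since only the products of consecutive steps contribute to the sum.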
Hopeld Networks:Solving Optimization Problems
Additional conditions that have to be satised:
 Each city is visited on exactly one step of the tour:
8j 2 f1;:::;ng:
n
X
i=1
m
ij
= 1;
i.e.,each column of the matrix contains exactly one 1.
 On each step of the tour exactly one city is visited:
8i 2 f1;:::;ng:
n
X
j=1
m
ij
= 1;
i.e.,each row of the matrix contains exactly one 1.
These conditions are incorporated by nding additional functions to optimize.
Hopeld Networks:Solving Optimization Problems
Formalization of rst condition as a minimization problem:
E

2
=
n
X
j=1
0
B
@
0
@
n
X
i=1
m
ij
1
A
2
2
n
X
i=1
m
ij
+1
1
C
A
=
n
X
j=1
0
@
0
@
n
X
i
1
=1
m
i
1
j
1
A
0
@
n
X
i
2
=1
m
i
2
j
1
A
2
n
X
i=1
m
ij
+1
1
A
=
n
X
j=1
n
X
i
1
=1
n
X
i
2
=1
m
i
1
j
m
i
2
j
2
n
X
j=1
n
X
i=1
m
ij
+n:
Double summation over cities (index i) needed:
E
2
=
X
(i
1
;j
1
)2f1;:::;ng
2
X
(i
2
;j
2
)2f1;:::;ng
2

j
1
j
2
 m
i
1
j
1
 m
i
2
j
2
2
X
(i;j)2f1;:::;ng
2
m
ij
:
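The penalty for the first condition can be evaluated directly. The sketch below (hypothetical function names, plain Python) computes it once as a sum of squared deviations of the column sums from 1 and once in the expanded triple-sum form, so that the two forms can be compared on small matrices.

```python
def column_penalty(m):
    """Sum over columns j of (sum_i m[i][j] - 1)^2; zero iff each column has one 1."""
    n = len(m)
    return sum((sum(m[i][j] for i in range(n)) - 1) ** 2 for j in range(n))


def column_penalty_expanded(m):
    """Expanded form: sum_j sum_i1 sum_i2 m m  -  2 sum_j sum_i m  +  n."""
    n = len(m)
    quadratic = sum(m[i1][j] * m[i2][j]
                    for j in range(n) for i1 in range(n) for i2 in range(n))
    linear = sum(m[i][j] for i in range(n) for j in range(n))
    return quadratic - 2 * linear + n


valid = [[1, 0, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [0, 1, 0, 0]]

# City 1 is visited twice and city 3 never: two columns violate the condition.
invalid = [[1, 0, 0, 0],
           [1, 0, 0, 0],
           [0, 0, 0, 1],
           [0, 1, 0, 0]]
```

The penalty is zero for the valid tour matrix and grows with every column whose sum differs from 1, which is why minimizing it pushes the network towards matrices that satisfy the condition.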
Hopeld Networks:Solving Optimization Problems
Resulting energy function:
E
2
= 
1
2
X
(i
1
;j
1
)2f1;:::;ng
2
(i
2
;j
2
)2f1;:::;ng
2
2
j
1
j
2
 m
i
1
j
1
 m
i
2
j
2
+
X
(i;j)2f1;:::;ng
2
2m
ij
Second additional condition is handled in a completely analogous way:
E
3
= 
1
2
X
(i
1
;j
1
)2f1;:::;ng
2
(i
2
;j
2
)2f1;:::;ng
2
2
i
1
i
2
 m
i
1
j
1
 m
i
2
j
2
+
X
(i;j)2f1;:::;ng
2
2m
ij
:
Combining the energy functions:
E = aE
1
+bE
2
+cE
3
where
b
a
=
c
a
> 2 max
(j
1
;j
2
)2f1;:::;ng
2
d
j
1
j
2
:
Hopeld Networks:Solving Optimization Problems
From the resulting energy function we can read the weights
w
(i
1
;j
1
)(i
2
;j
2
)
= ad
j
1
j
2
 (
(i
1
mod n)+1;i
2
+
i
1
;(i
2
mod n)+1
)
|
{z
}
from E
1
2b
j
1
j
2
|
{z
}
from E
2
2c
i
1
i
2
|
{z
}
from E
3
and the threshold values:

(i;j)
= 0a
|{z}
from E
1
2b
|
{z
}
from E
2
2c
|
{z
}
from E
3
= 2(b +c):
Problem:Random initialization and update until convergence not always leads to
a matrix that represents a tour,leave alone an optimal one.
Recurrent Neural Networks
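Recurrent Networks: Cooling Law
A body of temperature $\vartheta_0$ is placed into an environment with temperature $\vartheta_A$.
The cooling/heating of the body can be described by Newton's cooling law:
\[ \frac{d\vartheta}{dt} = \dot{\vartheta} = -k(\vartheta - \vartheta_A). \]
Exact analytical solution:
\[ \vartheta(t) = \vartheta_A + (\vartheta_0 - \vartheta_A)\,e^{-k(t-t_0)} \]
Approximate solution with Euler-Cauchy polygon courses:
\begin{align*}
\vartheta_1 &= \vartheta(t_1) = \vartheta(t_0) + \dot{\vartheta}(t_0)\Delta t = \vartheta_0 - k(\vartheta_0 - \vartheta_A)\Delta t, \\
\vartheta_2 &= \vartheta(t_2) = \vartheta(t_1) + \dot{\vartheta}(t_1)\Delta t = \vartheta_1 - k(\vartheta_1 - \vartheta_A)\Delta t.
\end{align*}
General recursive formula:
\[ \vartheta_i = \vartheta(t_i) = \vartheta(t_{i-1}) + \dot{\vartheta}(t_{i-1})\Delta t = \vartheta_{i-1} - k(\vartheta_{i-1} - \vartheta_A)\Delta t \]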
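The recursive formula can be compared with the exact solution in a few lines of Python. The sketch below is an illustration only: the constant k, the two temperatures and the end time are hypothetical values, and the function names are made up. The step widths 4, 2 and 1 are the ones used for the polygon courses in the slides.

```python
import math


def cooling_exact(t, theta0, theta_a, k, t0=0.0):
    """Exact solution theta_A + (theta_0 - theta_A) * exp(-k (t - t0))."""
    return theta_a + (theta0 - theta_a) * math.exp(-k * (t - t0))


def cooling_euler(t_end, dt, theta0, theta_a, k):
    """Apply theta_i = theta_{i-1} - k (theta_{i-1} - theta_A) dt repeatedly."""
    theta = theta0
    for _ in range(round(t_end / dt)):
        theta = theta - k * (theta - theta_a) * dt
    return theta


# Hypothetical values, chosen only for this illustration.
K, THETA_0, THETA_A, T_END = 0.2, 100.0, 20.0, 20.0

errors = {dt: abs(cooling_euler(T_END, dt, THETA_0, THETA_A, K)
                  - cooling_exact(T_END, THETA_0, THETA_A, K))
          for dt in (4.0, 2.0, 1.0)}
```

For these values the deviation from the exact solution at the end time shrinks as the step width is reduced from 4 to 2 to 1.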
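Recurrent Networks: Cooling Law
Euler-Cauchy polygon courses for different step widths:
[Figure: three plots of $\vartheta$ over $t$ from 0 to 20, starting at $\vartheta_0$ and approaching $\vartheta_A$, for the step widths $\Delta t = 4$, $\Delta t = 2$ and $\Delta t = 1$.]
The thin curve is the exact analytical solution.
Recurrent neural network:
[Figure: recurrent network with input $\vartheta(t_0)$, output $\vartheta(t)$ and the labels $-k\vartheta_A\Delta t$ and $-k\Delta t$.]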
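Recurrent Networks: Cooling Law
More formal derivation of the recursive formula:
Replace differential quotient by forward difference
\[ \frac{d\vartheta(t)}{dt} \approx \frac{\Delta\vartheta(t)}{\Delta t} = \frac{\vartheta(t + \Delta t) - \vartheta(t)}{\Delta t} \]
with sufficiently small $\Delta t$. Then it is
\begin{align*}
\vartheta(t + \Delta t) - \vartheta(t) &= \Delta\vartheta(t) \approx -k(\vartheta(t) - \vartheta_A)\Delta t, \\
\vartheta(t + \Delta t) - \vartheta(t) &= \Delta\vartheta(t) \approx -k\Delta t\,\vartheta(t) + k\vartheta_A\Delta t
\end{align*}
and therefore
\[ \vartheta_i \approx \vartheta_{i-1} - k\Delta t\,\vartheta_{i-1} + k\vartheta_A\Delta t. \]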
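Recurrent Networks: Mass on a Spring
[Figure: a mass $m$ on a spring, with position axis $x$ and origin 0.]
Governing physical laws:
• Hooke's law: $F = c\Delta l = -cx$ ($c$ is a spring dependent constant)
• Newton's second law: $F = ma = m\ddot{x}$ (force causes an acceleration)
Resulting differential equation:
\[ m\ddot{x} = -cx \quad\text{or}\quad \ddot{x} = -\frac{c}{m}x. \]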
Recurrent Networks: Mass on a Spring
General analytical solution of the differential equation:
\[ x(t) = a\sin(\omega t) + b\cos(\omega t) \]
with the parameters
\begin{align*}
\omega &= \sqrt{\frac{c}{m}}, \\
a &= x(t_0)\sin(\omega t_0) + \frac{v(t_0)}{\omega}\cos(\omega t_0), \\
b &= x(t_0)\cos(\omega t_0) - \frac{v(t_0)}{\omega}\sin(\omega t_0).
\end{align*}
With given initial values $x(t_0) = x_0$ and $v(t_0) = 0$ and
the additional assumption $t_0 = 0$ we get the simple expression
\[ x(t) = x_0\cos\Bigl(\sqrt{\frac{c}{m}}\;t\Bigr). \]
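Recurrent Networks: Mass on a Spring
Turn differential equation into two coupled equations:
\[ \dot{x} = v \quad\text{and}\quad \dot{v} = -\frac{c}{m}x. \]
Approximate differential quotient by forward difference:
\[ \frac{\Delta x}{\Delta t} = \frac{x(t + \Delta t) - x(t)}{\Delta t} = v \quad\text{and}\quad \frac{\Delta v}{\Delta t} = \frac{v(t + \Delta t) - v(t)}{\Delta t} = -\frac{c}{m}x \]
Resulting recursive equations:
\begin{align*}
x(t_i) &= x(t_{i-1}) + \Delta x(t_{i-1}) = x(t_{i-1}) + \Delta t \cdot v(t_{i-1}) \quad\text{and} \\
v(t_i) &= v(t_{i-1}) + \Delta v(t_{i-1}) = v(t_{i-1}) - \frac{c}{m}\Delta t \cdot x(t_{i-1}).
\end{align*}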
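Recurrent Networks: Mass on a Spring
[Figure: recurrent network of two neurons $u_1$ and $u_2$, both with threshold 0, with inputs $x(t_0)$ and $v(t_0)$, outputs $x(t)$ and $v(t)$, and connection weights $\Delta t$ and $-\frac{c}{m}\Delta t$.]
Neuron $u_1$:
\[ f^{(u_1)}_{\mathrm{net}}(v, w_{u_1u_2}) = w_{u_1u_2}\,v = \Delta t\,v \quad\text{and}\quad f^{(u_1)}_{\mathrm{act}}(\mathrm{act}_{u_1}, \mathrm{net}_{u_1}, \theta_{u_1}) = \mathrm{act}_{u_1} + \mathrm{net}_{u_1} - \theta_{u_1}, \]
Neuron $u_2$:
\[ f^{(u_2)}_{\mathrm{net}}(x, w_{u_2u_1}) = w_{u_2u_1}\,x = -\frac{c}{m}\Delta t\,x \quad\text{and}\quad f^{(u_2)}_{\mathrm{act}}(\mathrm{act}_{u_2}, \mathrm{net}_{u_2}, \theta_{u_2}) = \mathrm{act}_{u_2} + \mathrm{net}_{u_2} - \theta_{u_2}. \]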
Recurrent Networks: Mass on a Spring
Some computation steps of the neural network:

  t        v         x
 0.0    0.0000    1.0000
 0.1   -0.5000    0.9500
 0.2   -0.9750    0.8525
 0.3   -1.4012    0.7124
 0.4   -1.7574    0.5366
 0.5   -2.0258    0.3341
 0.6   -2.1928    0.1148

[Figure: plot of $x$ over $t$ for $t$ from 0 to 4.]
• The resulting curve is close to the analytical solution.
• The approximation gets better with smaller step width.
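The tabulated computation steps can be reproduced with a short loop. The sketch below rests on two inferences from the table rather than on anything stated in the slides: the values fit c/m = 5 and a step width of 0.1, and they fit an update order in which v is updated first and x then uses the new value of v (a strictly simultaneous update would give x = 1.0 at t = 0.1, whereas the table lists 0.95). The function name is hypothetical.

```python
def spring_steps(x0, v0, c_over_m, dt, steps):
    """Return rows (t, v, x); v is updated first, then x uses the new v."""
    x, v = x0, v0
    rows = [(0.0, v, x)]
    for i in range(1, steps + 1):
        v = v - c_over_m * dt * x      # v(t_i) = v(t_{i-1}) - (c/m) dt x(t_{i-1})
        x = x + dt * v                 # x(t_i) = x(t_{i-1}) + dt v(t_i)
        rows.append((i * dt, v, x))
    return rows


# c/m = 5 and dt = 0.1 are inferred from the table, not given explicitly.
rows = spring_steps(x0=1.0, v0=0.0, c_over_m=5.0, dt=0.1, steps=6)

for t, v, x in rows:
    print(f"{t:.1f}  {v:8.4f}  {x:8.4f}")
```

Rounded to four decimal places, the printed rows agree with the table to within one unit in the last digit.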
Recurrent Networks: Differential Equations
General representation of explicit $n$-th order differential equation:
\[ x^{(n)} = f(t, x, \dot{x}, \ddot{x}, \ldots, x^{(n-1)}) \]
Introduce $n-1$ intermediary quantities
\[ y_1 = \dot{x}, \quad y_2 = \ddot{x}, \quad \ldots, \quad y_{n-1} = x^{(n-1)} \]
to obtain the system
\begin{align*}
\dot{x} &= y_1, \\
\dot{y}_1 &= y_2, \\
&\ \,\vdots \\
\dot{y}_{n-2} &= y_{n-1}, \\
\dot{y}_{n-1} &= f(t, x, y_1, y_2, \ldots, y_{n-1})
\end{align*}
of $n$ coupled first order differential equations.
Recurrent Networks: Differential Equations
Replace differential quotient by forward difference to obtain the recursive equations
\begin{align*}
x(t_i) &= x(t_{i-1}) + \Delta t \cdot y_1(t_{i-1}), \\
y_1(t_i) &= y_1(t_{i-1}) + \Delta t \cdot y_2(t_{i-1}), \\
&\ \,\vdots \\
y_{n-2}(t_i) &= y_{n-2}(t_{i-1}) + \Delta t \cdot y_{n-1}(t_{i-1}), \\
y_{n-1}(t_i) &= y_{n-1}(t_{i-1}) + \Delta t \cdot f(t_{i-1}, x(t_{i-1}), y_1(t_{i-1}), \ldots, y_{n-1}(t_{i-1}))
\end{align*}
• Each of these equations describes the update of one neuron.
• The last neuron needs a special activation function.
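The recursive equations translate into a generic update loop. The following is a minimal sketch, not the network construction itself: the function name and its signature are assumptions, the step for the highest derivative is taken as the step width times f (the standard forward difference), and all quantities are updated simultaneously from the values of the previous time step.

```python
def euler_nth_order(f, initial, t0, dt, steps):
    """Forward-difference steps for x^(n) = f(t, x, y_1, ..., y_{n-1}).

    `initial` holds [x, y_1, ..., y_{n-1}] at time t0. Each entry plays the
    role of one neuron; the last entry is the one that has to evaluate f.
    """
    state, t = list(initial), t0
    for _ in range(steps):
        highest = f(t, *state)                       # value of x^(n) at t_{i-1}
        derivatives = state[1:] + [highest]          # y_1, ..., y_{n-1}, f(...)
        state = [s + dt * d for s, d in zip(state, derivatives)]
        t += dt
    return state


# Usage: the spring equation x'' = -(c/m) x with a hypothetical c/m = 5.
x_and_v = euler_nth_order(lambda t, x, v: -5.0 * x, [1.0, 0.0], 0.0, 0.1, 1)
```

A first order equation such as the cooling law is the special case in which `initial` has a single entry and f supplies the only derivative.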
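Recurrent Networks: Differential Equations
[Figure: recurrent network for the general case, a chain of neurons with the inputs $x_0, \dot{x}_0, \ddot{x}_0, \ldots, x^{(n-1)}_0$ and $t_0$, connections labeled $\Delta t$, and the output $x(t)$.]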
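Recurrent Networks: Diagonal Throw
[Figure: Diagonal throw of a body, starting at $(x_0, y_0)$ under the angle $\varphi$ with the velocity components $v_0\cos\varphi$ and $v_0\sin\varphi$.]
Two differential equations (one for each coordinate):
\[ \ddot{x} = 0 \quad\text{and}\quad \ddot{y} = -g, \]
where $g = 9.81\,\mathrm{ms}^{-2}$.
Initial conditions $x(t_0) = x_0$, $y(t_0) = y_0$, $\dot{x}(t_0) = v_0\cos\varphi$ and $\dot{y}(t_0) = v_0\sin\varphi$.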
Recurrent Networks: Diagonal Throw
Introduce intermediary quantities
\[ v_x = \dot{x} \quad\text{and}\quad v_y = \dot{y} \]
to reach the system of differential equations:
\begin{align*}
\dot{x} &= v_x, & \dot{v}_x &= 0, \\
\dot{y} &= v_y, & \dot{v}_y &= -g,
\end{align*}
from which we get the system of recursive update formulae
\begin{align*}
x(t_i) &= x(t_{i-1}) + \Delta t\,v_x(t_{i-1}), & v_x(t_i) &= v_x(t_{i-1}), \\
y(t_i) &= y(t_{i-1}) + \Delta t\,v_y(t_{i-1}), & v_y(t_i) &= v_y(t_{i-1}) - \Delta t\,g.
\end{align*}
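Recurrent Networks: Diagonal Throw
Better description: Use vectors as inputs and outputs
\[ \ddot{\vec{r}} = -g\,\vec{e}_y, \]
where $\vec{e}_y = (0, 1)$.
Initial conditions are $\vec{r}(t_0) = \vec{r}_0 = (x_0, y_0)$ and $\dot{\vec{r}}(t_0) = \vec{v}_0 = (v_0\cos\varphi, v_0\sin\varphi)$.
Introduce one vector-valued intermediary quantity $\vec{v} = \dot{\vec{r}}$ to obtain
\[ \dot{\vec{r}} = \vec{v}, \qquad \dot{\vec{v}} = -g\,\vec{e}_y \]
This leads to the recursive update rules
\begin{align*}
\vec{r}(t_i) &= \vec{r}(t_{i-1}) + \Delta t\,\vec{v}(t_{i-1}), \\
\vec{v}(t_i) &= \vec{v}(t_{i-1}) - \Delta t\,g\,\vec{e}_y
\end{align*}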
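Recurrent Networks: Diagonal Throw
Advantage of vector networks becomes obvious if friction is taken into account:
\[ \vec{a} = -\beta\,\vec{v} = -\beta\,\dot{\vec{r}} \]
$\beta$ is a constant that depends on the size and the shape of the body.
This leads to the differential equation
\[ \ddot{\vec{r}} = -\beta\,\dot{\vec{r}} - g\,\vec{e}_y. \]
Introduce the intermediary quantity $\vec{v} = \dot{\vec{r}}$ to obtain
\[ \dot{\vec{r}} = \vec{v}, \qquad \dot{\vec{v}} = -\beta\,\vec{v} - g\,\vec{e}_y, \]
from which we obtain the recursive update formulae
\begin{align*}
\vec{r}(t_i) &= \vec{r}(t_{i-1}) + \Delta t\,\vec{v}(t_{i-1}), \\
\vec{v}(t_i) &= \vec{v}(t_{i-1}) - \Delta t\,\beta\,\vec{v}(t_{i-1}) - \Delta t\,g\,\vec{e}_y.
\end{align*}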
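The vector update formulae can be tried out directly. The sketch below is an illustration under assumptions: the launch speed, angle, friction constant and step width are hypothetical values, and the function names are made up. Positions and velocities are 2D tuples, and the simulation stops once the body drops below y = 0.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2, as given for the throw


def throw_step(r, v, dt, beta):
    """One step of r' = v, v' = -beta v - g e_y with forward differences."""
    r_new = (r[0] + dt * v[0], r[1] + dt * v[1])
    v_new = (v[0] - dt * beta * v[0], v[1] - dt * beta * v[1] - dt * G)
    return r_new, v_new


def simulate_throw(r0, v0, dt, beta):
    """Iterate until the body is below y = 0; returns the list of positions."""
    r, v, path = r0, v0, [r0]
    while r[1] >= 0.0:
        r, v = throw_step(r, v, dt, beta)
        path.append(r)
    return path


# Hypothetical launch: 10 m/s at 45 degrees from the origin.
v0 = (10.0 * math.cos(math.pi / 4), 10.0 * math.sin(math.pi / 4))
no_friction = simulate_throw((0.0, 0.0), v0, dt=0.01, beta=0.0)
with_friction = simulate_throw((0.0, 0.0), v0, dt=0.01, beta=0.5)
```

Setting `beta` to zero recovers the frictionless update rules; with a positive `beta` the body lands earlier and the trajectory is no longer a parabola.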
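Recurrent Networks: Diagonal Throw
Resulting recurrent neural network:
[Figure: vector network with the inputs $\vec{r}_0$ and $\vec{v}_0$, the output $\vec{r}(t)$ and the labels $\vec{0}$, $\Delta t\,g\,\vec{e}_y$ and $\Delta t$; next to it a plot of the trajectory, $y$ over $x$ for $x$ from 1 to 3.]
• There are no strange couplings as there would be in a non-vector network.
• Note the deviation from a parabola that is due to the friction.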
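Recurrent Networks: Planet Orbit
\[ \ddot{\vec{r}} = -\gamma m \frac{\vec{r}}{|\vec{r}\,|^3}, \qquad\Rightarrow\qquad \dot{\vec{r}} = \vec{v}, \quad \dot{\vec{v}} = -\gamma m \frac{\vec{r}}{|\vec{r}\,|^3}. \]
Recursive update rules:
\begin{align*}
\vec{r}(t_i) &= \vec{r}(t_{i-1}) + \Delta t\,\vec{v}(t_{i-1}), \\
\vec{v}(t_i) &= \vec{v}(t_{i-1}) - \Delta t\,\gamma m \frac{\vec{r}(t_{i-1})}{|\vec{r}(t_{i-1})|^3},
\end{align*}
[Figure: vector network with the inputs $\vec{r}_0$ and $\vec{v}_0$, thresholds $\vec{0}$, the outputs $\vec{x}(t)$ and $\vec{v}(t)$ and the connection weights $\Delta t$ and $-\gamma m\Delta t$; next to it a plot of the computed orbit in the $x$-$y$ plane.]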
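A single step of the recursive update rules for the planet orbit can be written as follows. This is a sketch with hypothetical names; the product of the gravitational constant and the central mass is passed as one parameter `gamma_m`, and the start values in the usage lines are invented for the illustration.

```python
import math


def orbit_step(r, v, dt, gamma_m):
    """One step of r' = v, v' = -gamma*m * r / |r|^3 with forward differences."""
    dist3 = math.hypot(r[0], r[1]) ** 3
    r_new = (r[0] + dt * v[0], r[1] + dt * v[1])
    v_new = (v[0] - dt * gamma_m * r[0] / dist3,
             v[1] - dt * gamma_m * r[1] / dist3)
    return r_new, v_new


def orbit(r0, v0, dt, gamma_m, steps):
    """Positions of the body after each of `steps` update steps."""
    r, v, path = r0, v0, [r0]
    for _ in range(steps):
        r, v = orbit_step(r, v, dt, gamma_m)
        path.append(r)
    return path


# Hypothetical start: distance 0.5 from the center, velocity perpendicular to r.
path = orbit((0.5, 0.0), (0.0, 1.0), dt=0.01, gamma_m=1.0, steps=100)
```

Both the new position and the new velocity are computed from the values of the previous time step, as in the update rules; the position neuron and the velocity neuron each handle a whole vector.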
Recurrent Networks: Backpropagation through Time
Idea: Unfold the network between training patterns,
i.e., create one neuron for each point in time.
Example: Newton's cooling law
[Figure: unfolded network, a chain of four neurons from the input $\vartheta(t_0)$ to the output $\vartheta(t)$, each with threshold $\theta$ and each connection with weight $1 - k\Delta t$.]
Unfolding into four steps. It is $\theta = -k\vartheta_A\Delta t$.
• Training is standard backpropagation on the unfolded network.
• All updates refer to the same weight.
• Updates are carried out after the first neuron is reached.
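For the cooling law the unfolded network is small enough to write down by hand. The sketch below is one possible reading of the procedure, not the method of the slides: it assumes a squared error at the last neuron and differentiates with respect to the constant k, on which both the shared weight 1 - k dt and the threshold depend. All function names and numerical values are hypothetical. The contributions of all time steps are summed and would be applied as a single update once the first neuron is reached.

```python
def unfolded_forward(theta0, theta_a, k, dt, steps):
    """Forward pass through the unfolded network; returns all activations."""
    acts = [theta0]
    for _ in range(steps):
        # weight (1 - k dt) and threshold -k theta_A dt are shared by all copies
        acts.append((1.0 - k * dt) * acts[-1] + k * theta_a * dt)
    return acts


def bptt_gradient(theta0, theta_a, k, dt, steps, target):
    """d/dk of (acts[-1] - target)^2, accumulated over all unfolded steps."""
    acts = unfolded_forward(theta0, theta_a, k, dt, steps)
    delta = 2.0 * (acts[-1] - target)            # error signal at the last neuron
    grad = 0.0
    for i in range(steps, 0, -1):                # walk back to the first neuron
        grad += delta * (-dt * (acts[i - 1] - theta_a))
        delta *= (1.0 - k * dt)                  # pass the error one step back
    return grad


def squared_error(k):
    """Error for hypothetical data: start 100, environment 20, target 50."""
    return (unfolded_forward(100.0, 20.0, k, 1.0, 4)[-1] - 50.0) ** 2


analytic = bptt_gradient(100.0, 20.0, 0.2, 1.0, 4, 50.0)
h = 1e-6
numeric = (squared_error(0.2 + h) - squared_error(0.2 - h)) / (2 * h)
```

The accumulated gradient can be compared with a central difference quotient of the error, which is a common way to check a hand-written backward pass.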