LTM and STM: constrained optimization to preserve previous knowledge during neural network training

AI and Robotics

Oct 20, 2013

Preserving previous knowledge is a crucial problem in the use of neural networks. Consider a pre-trained NN: if we have to continue training the network under new conditions, it will tend to forget the previously acquired information. Here we formulate the new training as a constrained optimization problem, in order to avoid having to re-present the old training set to the NN while preserving the acquired knowledge.

For the sake of simplicity we will consider a smooth, single-input scalar function to be approximated using a feedforward NN.

Let us assume that the long-term memory (LTM) training set is a precise set of input/output samples and that, at the same p points, derivative information is also available; the LTM training set will then be {(x_k, y_k, y'_k), k = 1, …, p}, where y_k = f(x_k) and y'_k = f'(x_k). First we obtain a NN which satisfies the LTM training set exactly, determining the LTM network weights. The adjustable weights of this NN are therefore chosen in order to satisfy the following equations:

[1]   Σ_{i=1}^{s} v_i σ(w_i x_k + c_i) + b = y_k,    k = 1, …, p

[2]   Σ_{i=1}^{s} v_i w_i σ'(w_i x_k + c_i) = y'_k,    k = 1, …, p

where the vectors w = (w_1, …, w_s) and v = (v_1, …, v_s) contain the input and output weights, respectively, and c = (c_1, …, c_s) and b, the input and output biases, complete the adjustable network parameters; σ is the sigmoid function, s is the number of sigmoids present in the NN and p is the number of sample points.

Training the NN means finding vectors w_LTM, v_LTM, c_LTM and a scalar b_LTM such that equations [1] and [2] are exactly satisfied.
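As a concrete illustration (not the original author's code), the exact matching of equations [1] and [2] can be sketched numerically. In this sketch the input weights and biases are fixed at random values, so [1] and [2] become a linear system in the output weights and bias: with p sample points there are 2p equations, and s = 2p − 1 sigmoids (plus the bias b) make the system square. The target function sin(x) and all names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))
dsig = lambda z: sig(z) * (1.0 - sig(z))

# LTM training set: values and derivatives of f(x) = sin(x) at p = 2 points
x = np.array([0.0, 1.0])
y, dy = np.sin(x), np.cos(x)
p, s = len(x), 3                      # s + 1 unknowns = 2p equations

# Input weights and biases fixed at random values (an assumption of this
# sketch; the post only states that the weights are chosen to satisfy [1]-[2])
w, c = rng.normal(size=s), rng.normal(size=s)

Phi = sig(np.outer(x, w) + c)         # p x s terms of equation [1]
dPhi = w * dsig(np.outer(x, w) + c)   # p x s terms of equation [2]

# Linear system in the output weights v and output bias b
A = np.block([[Phi, np.ones((p, 1))],      # value equations include b
              [dPhi, np.zeros((p, 1))]])   # derivative equations do not
sol = np.linalg.solve(A, np.concatenate([y, dy]))
v, b = sol[:s], sol[s]

assert np.allclose(Phi @ v + b, y)    # [1] exactly satisfied
assert np.allclose(dPhi @ v, dy)      # [2] exactly satisfied
```

Solving the square system gives an exact interpolant of both values and derivatives, which is what "satisfying the LTM training set exactly" requires.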

Then, in order to have enough degrees of freedom (plasticity) to satisfy a new training set (the so-called short-term memory, STM, training set), we will have to augment the NN trained for the LTM purposes by introducing new sigmoids. Suppose that the total number of sigmoids of the augmented NN is t; the first s sigmoids will belong to the original NN and the remaining t − s to the augmented part. Suppose that the STM training set is constituted by a set of input/output samples {(x̃_j, ỹ_j), j = 1, …, q}: our objective is to train the whole network in order to minimize its mean square error on the STM training set while preserving previous knowledge; that implies that the NN will have to keep matching exactly the input/output and derivative values on the LTM training set.

First we will set the adjustable network parameters of the augmented part equal to zero; partitioning the weight vectors into an LTM and an STM part, their representation, before having trained them to match the STM samples, will be w = [w_LTM, w_STM], c = [c_LTM, c_STM], v = [v_LTM, v_STM] (with v_STM = 0 initially). According to this partition we can do the same for the output of the NN: it will be composed of the sum of two parts, the output of the original NN, made by the LTM connections (sigmoids 1 to s), and the output of the augmented part, made by the STM connections (sigmoids s+1 to t). With the same methodology we can distinguish between LTM equations (for both function values and derivatives) and STM equations. Therefore equations [1] and [2] become, respectively:

[3]   Σ_{i=1}^{s} v_i σ(w_i x_k + c_i) + Σ_{i=s+1}^{t} v_i σ(w_i x_k + c_i) + b = y_k,    k = 1, …, p

[4]   Σ_{i=1}^{s} v_i w_i σ'(w_i x_k + c_i) + Σ_{i=s+1}^{t} v_i w_i σ'(w_i x_k + c_i) = y'_k,    k = 1, …, p
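A quick numerical check (toy values, hypothetical names) of the role of the STM output weights in [3]: when they are initialized to zero, the second sum vanishes and the augmented network coincides with the original LTM network on every input, so the augmentation itself destroys no knowledge:

```python
import numpy as np

rng = np.random.default_rng(2)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

s, t = 3, 5                                    # s LTM sigmoids, t - s STM sigmoids
w, c = rng.normal(size=t), rng.normal(size=t)  # input weights and biases
vL = rng.normal(size=s)                        # trained LTM output weights
vS = np.zeros(t - s)                           # STM output weights start at zero
b = 0.1                                        # output bias

x = np.linspace(-2.0, 2.0, 50)
ltm_out = sig(np.outer(x, w[:s]) + c[:s]) @ vL + b
aug_out = (sig(np.outer(x, w[:s]) + c[:s]) @ vL
           + sig(np.outer(x, w[s:]) + c[s:]) @ vS + b)
assert np.allclose(aug_out, ltm_out)           # augmented net == original net
```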

For simplicity of notation, let us define, on the LTM sample points x_k (k = 1, …, p):

[5]   (Φ_L)_{ki} = σ(w_i x_k + c_i),    i = 1, …, s

[6]   (Φ_S)_{ki} = σ(w_i x_k + c_i),    i = s+1, …, t

and

[7]   (Φ'_L)_{ki} = w_i σ'(w_i x_k + c_i),    i = 1, …, s

[8]   (Φ'_S)_{ki} = w_i σ'(w_i x_k + c_i),    i = s+1, …, t

Let us group the LTM and STM output weights in the vectors L and S, respectively. The LTM equations then take the form:

[9]   Φ_L L + Φ_S S + b·1 = y

[10]   Φ'_L L + Φ'_S S = y'

where y = (y_1, …, y_p) and y' = (y'_1, …, y'_p) collect the LTM targets and 1 is the all-ones vector.

Introducing the same definitions on the STM training set inputs x̃_j (j = 1, …, q), we will have:

[11]   (Ψ_L)_{ji} = σ(w_i x̃_j + c_i),    i = 1, …, s

[12]   (Ψ_S)_{ji} = σ(w_i x̃_j + c_i),    i = s+1, …, t

Hence the output of the NN, computed on the STM inputs, is:

[13]   ŷ = Ψ_L L + Ψ_S S + b·1

Assuming we will train the NN in batch mode, we can define the mean square error function on the STM training set as:

[14]   E = (1/q) Σ_{j=1}^{q} (ŷ_j − ỹ_j)²
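As a sanity check on [14] with toy numbers (not taken from the post):

```python
import numpy as np

yhat = np.array([0.9, 2.1, 2.9])  # network output on the STM inputs, eq. [13]
ytil = np.array([1.0, 2.0, 3.0])  # STM target outputs
E = np.mean((yhat - ytil) ** 2)   # eq. [14] with q = 3
assert np.isclose(E, 0.01)        # residuals of 0.1 each -> mean square 0.01
```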

Finally, the constraint equations [3] and [4] are expressed in the usual form:

[15]   g(L, S) = 0

Therefore we can regard the NN training as a constrained optimization problem, where the function to minimize is the mean square error of the NN on the STM training set, the adjustable parameters are the STM weights, and previous LTM knowledge is preserved through the constraint equations [3] and [4], which will also be used to determine the LTM weights as a function of the STM ones. A solution may be found through the method of Lagrange multipliers, augmenting the function to be minimized by adjoining the constraint equations, so that a new cost function is defined as:

F(L, S, λ) = E + λᵀ g(L, S)

then we have to impose ∇F = 0 for optimality.
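The stationarity conditions split by block of variables. A sketch of that split, under the notation assumed above (not a verbatim derivation from the post):

```latex
F(L, S, \lambda) = E(L, S) + \lambda^{\top} g(L, S)

\frac{\partial F}{\partial \lambda} = 0 \;\Rightarrow\; g(L, S) = 0
  % the LTM constraints [9]--[10] hold at the optimum

\frac{\partial F}{\partial L} = 0 \;\Rightarrow\;
  \frac{\partial E}{\partial L} + \Big(\frac{\partial g}{\partial L}\Big)^{\top} \lambda = 0

\frac{\partial F}{\partial S} = 0 \;\Rightarrow\;
  \frac{\partial E}{\partial S} + \Big(\frac{\partial g}{\partial S}\Big)^{\top} \lambda = 0
```

Note that, for fixed STM input weights, g is linear in L, so the first condition alone already determines L from S; this is how the algorithm uses equations [9] and [10] to recover the LTM weights at each epoch.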

Sketch of algorithm

1. Compute L(0) by unconstrained training (equations [3] and [4] are exactly matched).

2. Impose S(0) = 0 (the augmented network initially reproduces the LTM network exactly).

3. Compute E (at epoch zero it will be the error of the LTM weights on the STM training set).

4. Impose the optimality conditions ∇F = 0.

5. Update the STM weights (RPROP or Levenberg-Marquardt).

6. Use equations [9] and [10] to compute the LTM weights (S, Φ_S and Φ'_S are known at this point: compute L such that [9] and [10] hold).

7. Compute E again and repeat until the error tolerance is met.
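The sketch above can be exercised end-to-end with a small numerical experiment. This is a simplification, not the post's implementation: instead of carrying explicit Lagrange multipliers, the linear constraints [9] and [10] are solved for L at every evaluation (which enforces them exactly), and plain finite-difference gradient descent replaces RPROP/LM; the target functions and all names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))
dsig = lambda z: sig(z) * (1.0 - sig(z))

# LTM training set: exact values and derivatives of f(x) = sin(x) at p = 2 points
xL = np.array([0.0, 1.0])
yL, dyL = np.sin(xL), np.cos(xL)
s = 3                                            # LTM sigmoids (s + 1 = 2p unknowns)
wL, cL = rng.normal(size=s), rng.normal(size=s)  # frozen LTM input weights/biases

# Constraint matrices [5]-[8] on the LTM points (fixed, since wL, cL are frozen)
PhiL = sig(np.outer(xL, wL) + cL)
dPhiL = wL * dsig(np.outer(xL, wL) + cL)
A = np.block([[PhiL, np.ones((2, 1))],
              [dPhiL, np.zeros((2, 1))]])        # square system for (L, b)

# STM training set: input/output samples of a changed target
xS = np.linspace(-1.0, 2.0, 8)
yS = np.sin(xS) + 0.3 * xS
m = 2                                            # STM sigmoids (t = s + m)

def solve_L(theta):
    """Solve the linear LTM constraints [9]-[10] for L and b, given the STM
    weights theta = (w_STM, c_STM, v_STM)."""
    w, c, v = np.split(theta, 3)
    PhiS = sig(np.outer(xL, w) + c)
    dPhiS = w * dsig(np.outer(xL, w) + c)
    rhs = np.concatenate([yL - PhiS @ v, dyL - dPhiS @ v])
    sol = np.linalg.solve(A, rhs)
    return sol[:s], sol[s]

def net(x, theta, L, b):
    w, c, v = np.split(theta, 3)
    return sig(np.outer(x, wL) + cL) @ L + sig(np.outer(x, w) + c) @ v + b

def mse(theta):
    """Mean square error [14] on the STM set, with L eliminated."""
    L, b = solve_L(theta)
    return np.mean((net(xS, theta, L, b) - yS) ** 2)

# S(0): zero output weights (augmented net == LTM net); small random input
# weights so that the STM sigmoids are not all identical
theta = np.concatenate([0.5 * rng.normal(size=m),
                        0.5 * rng.normal(size=m),
                        np.zeros(m)])
e0 = mse(theta)                     # epoch-zero error of LTM weights on STM set

lr, eps = 0.02, 1e-6
for epoch in range(1000):           # plain gradient descent on S only
    g = np.zeros_like(theta)
    for i in range(theta.size):     # finite-difference gradient
        d = np.zeros_like(theta); d[i] = eps
        g[i] = (mse(theta + d) - mse(theta - d)) / (2 * eps)
    theta -= lr * g

L, b = solve_L(theta)
# LTM knowledge is preserved exactly at every epoch:
assert np.allclose(net(xL, theta, L, b), yL)
print("STM mse:", e0, "->", mse(theta))
```

The key design point matches the post's algorithm: the STM weights are the free optimization variables, while the LTM output weights are recomputed from [9] and [10] after every update, so the network never stops matching the LTM values and derivatives exactly.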