LTM and STM: constrained optimization to preserve previous knowledge during neural network training

science discussion · Artificial Intelligence and Robotics

20 Oct 2013

Preserving previously acquired knowledge is a crucial problem when using neural networks. Consider a pre-trained NN: if we have to continue training the network under new conditions, it will tend to forget the previous information. Here we formulate the problem of the new training as a constrained optimization, in order to avoid the need to re-present the old training set to the NN while preserving the acquired knowledge.

For the sake of simplicity we will consider a smooth, single-input scalar function $f(x)$ to be approximated using a feedforward NN.

Let us assume that the long-term memory (LTM) training set is a precise set of input/output samples and that, at the same points, derivative information is also available; the LTM training set will then be $\{(x_i, y_i, y'_i)\}_{i=1}^{p}$, where $y_i = f(x_i)$ and $y'_i = f'(x_i)$. First we obtain a NN which satisfies the LTM training set exactly, determining the LTM network weights. The adjustable weights of this NN are therefore chosen in order to satisfy the following equations:

\sum_{j=1}^{n} c_j \,\sigma(w_j x_i + b_j) + d = y_i, \qquad i = 1, \ldots, p \qquad [1]

\sum_{j=1}^{n} c_j w_j \,\sigma'(w_j x_i + b_j) = y'_i, \qquad i = 1, \ldots, p \qquad [2]

where the vectors $w$ and $c$ contain the input and output weights, respectively, and, together with the input biases $b$ and the output bias $d$, constitute the adjustable network parameters; $n$ is the number of sigmoids present in the NN and $p$ is the number of sample points.
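Once the input-side parameters are fixed, equations [1] and [2] are linear in the output weights and bias. A minimal numerical sketch (assuming a tanh sigmoid, randomly fixed input weights and biases, and a least-squares solve for the output layer; one of several ways to satisfy [1] and [2] exactly):

```python
import numpy as np

rng = np.random.default_rng(0)

# LTM training set: samples of f(x) = sin(x) and its derivative.
p = 3
x = np.linspace(-1.0, 1.0, p)
y, dy = np.sin(x), np.cos(x)

# Fix n hidden sigmoids (tanh) with random input weights/biases;
# with n + 1 >= 2p free output parameters, the 2p linear equations
# [1] and [2] can generically be satisfied exactly.
n = 8
w = rng.normal(size=n)
b = rng.normal(size=n)

S = np.tanh(np.outer(x, w) + b)                 # sigma(w_j x_i + b_j)
D = w * (1.0 - np.tanh(np.outer(x, w) + b)**2)  # w_j sigma'(w_j x_i + b_j)

# Stack function rows and derivative rows; unknowns are (c, d).
A = np.block([[S, np.ones((p, 1))],
              [D, np.zeros((p, 1))]])
rhs = np.concatenate([y, dy])
sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
c, d = sol[:n], sol[n]

residual = np.max(np.abs(A @ sol - rhs))
print(residual)  # essentially zero: the LTM set is matched exactly
```

The hidden-layer weights, the sine target, and the least-squares route are made-up illustration choices; any procedure that satisfies [1] and [2] exactly serves the same purpose.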

Training the NN implies having found vectors $w$, $b$, $c$ and a scalar $d$ such that equations [1] and [2] are exactly satisfied.

Then, in order to have enough degrees of freedom (plasticity) to satisfy a new training set (the so-called short-term memory, STM, training set), we will have to augment the NN trained for the LTM purposes by introducing new sigmoids. Suppose that the total number of sigmoids of the augmented NN is $N$; the first $n$ sigmoids will refer to the original NN and the remaining $N - n$ to the augmented part. Suppose that the STM training set is constituted by a set of input/output samples $\{(\tilde{x}_k, \tilde{y}_k)\}_{k=1}^{q}$; our objective is to train the whole network in order to minimize its mean square error on the STM training set while preserving previous knowledge: that implies that the NN will have to keep on matching input/output and derivative values exactly on the LTM training set.
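To make the augmentation concrete, here is a small sketch (tanh sigmoid and made-up sizes $n = 4$, $N = 6$) of the augmented forward pass; with the STM parameters initialised to zero, the augmented network reproduces the original LTM output exactly:

```python
import numpy as np

def nn_out(x, w, b, c, d):
    """Feedforward NN output: sum_j c_j * tanh(w_j x + b_j) + d."""
    return np.tanh(np.outer(x, w) + b) @ c + d

rng = np.random.default_rng(1)
n, N = 4, 6                       # original and augmented sigmoid counts

# LTM (already trained) parameters for the first n sigmoids.
wL, bL, cL = rng.normal(size=(3, n))
d = 0.5

# STM parameters for the N - n new sigmoids, initialised to zero.
wS, bS, cS = np.zeros((3, N - n))

x = np.linspace(-1, 1, 5)
full = nn_out(x, np.concatenate([wL, wS]), np.concatenate([bL, bS]),
              np.concatenate([cL, cS]), d)

# The output splits into an LTM part plus an STM part; with cS = 0
# the STM part vanishes, so previous knowledge is untouched at start.
assert np.allclose(full, nn_out(x, wL, bL, cL, d))
```

The specific parameter values are illustrative only; the point is that zero STM weights give the augmented network exactly the original LTM input/output map.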

First we will set the adjustable network parameters of the augmented part equal to zero; partitioning the weight vectors into an LTM part and an STM part (the latter being the part trained to match the STM samples), their representation will be:

w = [w_L^T \; w_S^T]^T, \qquad b = [b_L^T \; b_S^T]^T, \qquad c = [c_L^T \; c_S^T]^T.

According to this partition, we can do the same also for the output of the NN: it will be composed of the sum of two parts, the output of the original NN, made by the LTM connections (sigmoids $1$ to $n$), and the output of the augmented part, made by the STM connections (sigmoids $n+1$ to $N$). With the same methodology, we can distinguish between LTM equations (for both function and derivatives) and STM equations. Therefore equations [1] and [2] become, respectively:

\sum_{j=1}^{n} c_j \,\sigma(w_j x_i + b_j) + \sum_{j=n+1}^{N} c_j \,\sigma(w_j x_i + b_j) + d = y_i \qquad [3]

\sum_{j=1}^{n} c_j w_j \,\sigma'(w_j x_i + b_j) + \sum_{j=n+1}^{N} c_j w_j \,\sigma'(w_j x_i + b_j) = y'_i \qquad [4]

for $i = 1, \ldots, p$.
For simplicity of notation, let us define the following matrices, all evaluated at the LTM points $x_i$, $i = 1, \ldots, p$:

[\Sigma_L]_{ij} = \sigma(w_j x_i + b_j), \; j = 1, \ldots, n; \qquad [\Sigma_S]_{ij} = \sigma(w_j x_i + b_j), \; j = n+1, \ldots, N;

[\Delta_L]_{ij} = w_j \,\sigma'(w_j x_i + b_j), \; j = 1, \ldots, n; \qquad [\Delta_S]_{ij} = w_j \,\sigma'(w_j x_i + b_j), \; j = n+1, \ldots, N.

Let us also group the LTM and STM output weights in the vectors $c_L = (c_1, \ldots, c_n)^T$ and $c_S = (c_{n+1}, \ldots, c_N)^T$, and the targets in $y = (y_1, \ldots, y_p)^T$, $y' = (y'_1, \ldots, y'_p)^T$. Therefore the LTM equations take the form:

\Sigma_L c_L + \Sigma_S c_S + d \,\mathbf{1} = y \qquad [5]

\Delta_L c_L + \Delta_S c_S = y' \qquad [6]
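A quick numerical check of this notation (a sketch with a tanh sigmoid and made-up sizes): the partitioned matrix form reproduces the full scalar sums of [3] and [4].

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, N = 3, 4, 6
x = np.linspace(-1, 1, p)          # LTM input points
w, b, c = rng.normal(size=(3, N))  # all input weights, biases, output weights
d = 0.1

Z = np.outer(x, w) + b             # p x N pre-activations
Sig = np.tanh(Z)                   # sigma(w_j x_i + b_j)
Del = w * (1.0 - Sig**2)           # w_j sigma'(w_j x_i + b_j)

# Partition the columns into LTM (first n) and STM (last N - n) blocks.
Sig_L, Sig_S = Sig[:, :n], Sig[:, n:]
Del_L, Del_S = Del[:, :n], Del[:, n:]
c_L, c_S = c[:n], c[n:]

# Matrix form [5]-[6] versus the direct scalar sums of [3]-[4].
lhs3 = Sig_L @ c_L + Sig_S @ c_S + d
lhs4 = Del_L @ c_L + Del_S @ c_S
assert np.allclose(lhs3, Sig @ c + d)
assert np.allclose(lhs4, Del @ c)
```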

Making the same construction on the STM training set, we will have the matrices $\tilde{\Sigma}_L$ and $\tilde{\Sigma}_S$, with entries

[\tilde{\Sigma}_L]_{kj} = \sigma(w_j \tilde{x}_k + b_j), \; j = 1, \ldots, n; \qquad [\tilde{\Sigma}_S]_{kj} = \sigma(w_j \tilde{x}_k + b_j), \; j = n+1, \ldots, N,

for $k = 1, \ldots, q$. Hence the output of the NN, computed on the STM inputs, is:

\hat{y} = \tilde{\Sigma}_L c_L + \tilde{\Sigma}_S c_S + d \,\mathbf{1} \qquad [7]
Assuming we will train the NN in batch mode, we can define the mean square error function on the STM training set as:

E = \frac{1}{q} \sum_{k=1}^{q} (\hat{y}_k - \tilde{y}_k)^2 \qquad [8]
Finally, the constraint equations [3] and [4] are expressed in the usual form:

g_1(\theta) \equiv \Sigma_L c_L + \Sigma_S c_S + d \,\mathbf{1} - y = 0 \qquad [9]

g_2(\theta) \equiv \Delta_L c_L + \Delta_S c_S - y' = 0 \qquad [10]

where $\theta$ collects all the adjustable network parameters.
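Note that, for fixed input weights and STM output weights, the constraints [9] and [10] are linear in $c_L$ and $d$, so the LTM output parameters can be recovered by a linear solve. A sketch under that assumption (tanh sigmoid, $n + 1 = 2p$ so the system is square; all sizes and targets are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 3                 # LTM sample points -> 2p constraints
n = 2 * p - 1         # n + 1 = 2p LTM output unknowns (c_L, d): square system
N = n + 3             # total sigmoids after augmentation

x = np.linspace(-1.0, 1.0, p)
y, dy = np.sin(x), np.cos(x)       # LTM targets (function and derivative)

w, b = rng.normal(size=(2, N))     # all input weights/biases (held fixed)
c_S = rng.normal(size=N - n)       # current STM output weights

Sig = np.tanh(np.outer(x, w) + b)
Del = w * (1.0 - Sig**2)

# Move the known STM contribution to the right-hand side of [9]-[10]
# and solve the remaining square linear system for (c_L, d).
A = np.block([[Sig[:, :n], np.ones((p, 1))],
              [Del[:, :n], np.zeros((p, 1))]])
rhs = np.concatenate([y - Sig[:, n:] @ c_S, dy - Del[:, n:] @ c_S])
sol = np.linalg.solve(A, rhs)
c_L, d = sol[:n], sol[n]

# The augmented NN now matches the LTM set exactly, whatever c_S is.
assert np.allclose(Sig @ np.concatenate([c_L, c_S]) + d, y)
assert np.allclose(Del @ np.concatenate([c_L, c_S]), dy)
```

The square ($n + 1 = 2p$) choice is only one convenient case; with more LTM sigmoids the same solve can be done in the least-squares or minimum-norm sense.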
Therefore we can regard the NN training as a constrained optimization problem, where the function to minimize is the mean square error of the NN on the STM training set, the adjustable parameters are the STM weights, and previous LTM knowledge is preserved through the constraint equations [3] and [4], which will also be used to determine the LTM weights as a function of the STM ones. A solution may be found through the method of Lagrange multipliers, augmenting the function to be minimized by adjoining the constraint equations, such that a new cost function is defined as:

L(\theta, \lambda) = E + \lambda_1^T g_1(\theta) + \lambda_2^T g_2(\theta) \qquad [11]

then we have to impose $\nabla L = 0$ for optimal conditions.
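Spelled out (a sketch, writing $g(\theta) = 0$ for the stacked constraints and $\theta_S$ for the STM weights), the optimality conditions are the usual stationarity system:

```latex
\frac{\partial L}{\partial \theta_S}
  = \frac{\partial E}{\partial \theta_S}
  + \left(\frac{\partial g}{\partial \theta_S}\right)^{T} \lambda = 0,
\qquad
\frac{\partial L}{\partial \lambda} = g(\theta) = 0 .
```

The second set of equations simply restates that the LTM training set remains exactly matched at the optimum.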

Sketch of algorithm

1. Determine the LTM weights $w_L$, $b_L$, $c_L$ and $d$ by unconstrained training [equations [3] and [4] are exactly matched].
2. Set the STM weights to zero ($w_S = b_S = 0$, $c_S = 0$), so that the augmented part initially contributes nothing.
3. Compute $E$ (at epoch zero it will be the error of the LTM weights on the STM training set).
4. Update the STM weights (RPROP or LM).
5. Utilize equations [9] and [10] to compute the LTM weights [$w_S$, $b_S$ and $c_S$ are known at this point: compute $c_L$ and $d$, such that the LTM constraints are exactly satisfied].
6. Compute $E$ again and repeat from step 4 until the tolerance is met.
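Putting the pieces together, a runnable sketch of the whole loop (my own simplifications: plain gradient descent with backtracking instead of RPROP/LM, only the STM output weights adapted, tanh sigmoid, made-up sizes and targets). At every epoch the LTM output weights are re-solved from the constraints [9]-[10], so the LTM set stays exactly matched while the STM error drops:

```python
import numpy as np

rng = np.random.default_rng(4)

# LTM set: function and derivative samples of f(x) = sin(x).
p = 3
xL = np.linspace(-1.0, 1.0, p)
yL, dyL = np.sin(xL), np.cos(xL)

# STM set: new input/output samples from an adjacent input region.
q = 10
xS = np.linspace(1.0, 2.5, q)
yS = np.sin(xS)

n = 2 * p - 1            # LTM sigmoids: n + 1 = 2p, square constraint system
N = n + 6                # total sigmoids after augmentation
w = rng.normal(size=N)   # input weights/biases, held fixed in this sketch
b = rng.normal(size=N)

def solve_ltm(c_S):
    """Solve the linear LTM constraints [9]-[10] for (c_L, d), given c_S."""
    Sig = np.tanh(np.outer(xL, w) + b)
    Del = w * (1.0 - Sig**2)
    A = np.block([[Sig[:, :n], np.ones((p, 1))],
                  [Del[:, :n], np.zeros((p, 1))]])
    rhs = np.concatenate([yL - Sig[:, n:] @ c_S, dyL - Del[:, n:] @ c_S])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n]

def stm_mse(c_S):
    """STM mean square error [8], with (c_L, d) eliminated via constraints."""
    c_L, d = solve_ltm(c_S)
    out = np.tanh(np.outer(xS, w) + b) @ np.concatenate([c_L, c_S]) + d
    return np.mean((out - yS) ** 2)

c_S = np.zeros(N - n)    # STM output weights start at zero
E0 = stm_mse(c_S)        # epoch-zero error: LTM weights alone on the STM set

eps = 1e-6
for epoch in range(200):
    E = stm_mse(c_S)
    # Forward-difference gradient of the reduced error w.r.t. c_S only.
    grad = np.array([(stm_mse(c_S + eps * np.eye(c_S.size)[k]) - E) / eps
                     for k in range(c_S.size)])
    step = 0.1           # crude backtracking line search, monotone by design
    for _ in range(30):
        trial = c_S - step * grad
        if stm_mse(trial) < E:
            c_S = trial
            break
        step *= 0.5

E1 = stm_mse(c_S)
c_L, d = solve_ltm(c_S)
Sig = np.tanh(np.outer(xL, w) + b)
Del = w * (1.0 - Sig**2)
cc = np.concatenate([c_L, c_S])
print(E1 < E0)                          # STM error decreased
print(np.allclose(Sig @ cc + d, yL),    # LTM function values still matched
      np.allclose(Del @ cc, dyL))       # LTM derivatives still matched
```

Eliminating $(c_L, d)$ through the constraints, as done here, is the same idea as using [9] and [10] in step 5 of the sketch; a full implementation would also adapt $w_S$ and $b_S$ and use RPROP or Levenberg-Marquardt for the update.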