
An ANALYSIS of MULTI-LAYER PERCEPTRON with BACKPROPAGATION



Abstract


Artificial neural networks are tools mostly used in the machine learning and pattern recognition branches of computer science. They use a computing technique inspired by the basic elements of the brain: neurons. This technique is a parametric one, i.e. its parameters exist prior to learning; hence learning in artificial neural networks consists of adjusting these parameters by certain methods (or algorithms). "Artificial neural networks can be most adequately characterized as `computational models` with particular properties such as the ability to adapt or learn, to generalize, or to cluster or organize data, and which operation is based on parallel processing" [1].


I simulated an ANN on two different major tasks and obtained results according to different initial parameters. Deriving relations between the parameters and the results was difficult since in most of the test cases the training ended with immediate success, i.e. a rapid convergence.


The rest of this analysis report continues as follows: the first part describes the criteria (the parameters) which can be adjusted from the user interface and whose effects can therefore be tested. The second part describes the inner structure of the network. The third and fourth chapters include the selected test results and the comments on them. The appendix consists of the program's GUI and an extra list of test results.



1. PROGRAM CAPABILITIES (TESTED CRITERIA)


Neuro-Trainer (NEUTRA) v1.0 is an ANN interface in which some initial parameters can be set and the specified network can be trained. The NEUTRA user interface is explained in full detail in Appendix A. NEUTRA has four major tasks: it can create a network, train the network, test the network and display the test results.

The parameters set from the user interface are listed below; they will be introduced in more detail later.


The parameters which are set during the creation process are:


Output Layer: The number code of the output layer.

Input Layer: The number code of the input layer. Together with the "Output Layer" value it gives the number of layers.

Layer Sizes: The neuron counts in the layers, given as comma-separated numbers.

Bias: The bias used in all of the neurons of the system.

Alpha: The coefficient used in the redefined error calculations.

Disable: The neurons to be disabled in the upper hidden layer.

Error: The type of the error, either standard or redefined.

Direct Input Conn: If checked, the input neurons are directly connected to the output layer neurons.


Training parameters are:


Data Size: The number of training loops.

Learning Rate: The coefficient used in backpropagation.

Momentum: The coefficient that represents the probability of path-changing in training.

Training/Data: The number that defines the percentage of the data to be used in training versus testing. If the number is 100 then all the data is also used in testing. This parameter is not active in the first task since that task has only 4 different input patterns.
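To make the two parameter groups above more concrete, here is a small sketch of a possible configuration for the first (XOR) task, written in C# (the language the program is implemented in, see Appendix A). It is not taken from NEUTRA's source; the field names and all values are illustrative assumptions only.

    // Hypothetical configuration object; names and values are assumptions,
    // the comments restate the parameter descriptions from the report.
    var config = new
    {
        OutputLayer     = 3,            // number code of the output layer
        InputLayer      = 1,            // number code of the input layer
        LayerSizes      = "2,2,1",      // neuron counts: the [2*2*1] XOR network
        Bias            = 1.0,          // bias used in all neurons of the system
        Alpha           = 0.01,         // coefficient of the redefined error
        Disable         = new int[0],   // no hidden-layer neurons disabled
        Error           = "Standard",   // "Standard" or "Redefined"
        DirectInputConn = false,        // no direct input-to-output connections

        DataSize        = 50,           // number of training loops
        LearningRate    = 0.1,          // coefficient used in backpropagation
        Momentum        = 0.1,          // probability of path-changing in training
        TrainingData    = 100           // Training/Data percentage (inactive for XOR)
    };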



NEUTRA's evaluation functions and data generators are implemented for two major tasks. The first is a [2*2*1] network which stands for the logic operator XOR; it is also capable of simulating the AND, OR, NAND and NOR operators. The second is a [5*5*1] network which stands for ((O1 AND O2) OR (O3 AND O5)). The test and display functions, and all methods requiring a heuristic function, are specifically designed for an output layer of count 1.
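As a concrete illustration of the two target functions, here is a minimal sketch using the bipolar convention (+1 for true, -1 for false) that the report adopts for inputs and outputs (see section 2). It is not NEUTRA's actual data-generator code; the class and method names are assumptions.

    static class TargetFunctions
    {
        // First task, realized by the [2*2*1] network: the XOR operator.
        // NEUTRA can also simulate AND, OR, NAND and NOR in the same setting.
        public static int Xor(int a, int b) => (a != b) ? 1 : -1;

        // Second task, realized by the [5*5*1] network:
        // ((O1 AND O2) OR (O3 AND O5)); the input O4 does not affect the target.
        public static int SecondTask(int[] o)      // o[0]..o[4] correspond to O1..O5
        {
            bool left  = o[0] == 1 && o[1] == 1;   // (O1 AND O2)
            bool right = o[2] == 1 && o[4] == 1;   // (O3 AND O5)
            return (left || right) ? 1 : -1;
        }
    }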


Although NEUTRA is capable of doing more (specifying the layer count, momentum, etc.), I mostly focused on the differences in error types, biases and disabling hidden layer neurons.


2. NETWORK STRUCTURE


The network is a generic fully-connected multi-layer perceptron with backpropagation. Its neurons are basically McCulloch-Pitts neurons with evaluation functions, and the network itself is the same network introduced by Frank Rosenblatt [2]. The characteristics of such a network can be listed as its evaluation function, its weight updating rule, its error function and so on. In NEUTRA these characteristics are defined as follows:


For the i-th neuron of the j-th layer, the firing function applies the tanh evaluation function to the neuron's net input, i.e. the weighted sum of the values fired by the layer below plus the bias.

The standard error of the i-th neuron of the j-th layer in a multi-layer perceptron is computed from the difference between the desired output and the fired value if it is the output layer; otherwise it is obtained by propagating the errors of the layer above back through the connection weights. Here V is the value fired by the neuron and W are the connection weights.


The weight updating rule adds to each weight a change composed of the backpropagated error, scaled by the learning rate and the value fired from the layer below, plus a momentum term carrying over part of the previous weight change. Here μ represents the momentum coefficient whereas β represents the beta parameter.
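The report gives these three ingredients as equations that are not reproduced in this text, so the following sketch only shows the textbook forms the description above is consistent with. It is not NEUTRA's source code; in particular, the role of the beta parameter (taken here as the slope of tanh) and the placement of the bias are assumptions.

    using System;

    static class Backprop
    {
        // Firing function: tanh of the weighted sum of the inputs plus the bias.
        public static double Fire(double[] inputs, double[] weights, double bias, double beta)
        {
            double net = bias;
            for (int k = 0; k < inputs.Length; k++)
                net += weights[k] * inputs[k];
            return Math.Tanh(beta * net);              // range [-1, 1]
        }

        // Standard error (delta) of an output-layer neuron: desired minus fired
        // value, scaled by the derivative of tanh.
        public static double OutputDelta(double desired, double fired, double beta)
            => (desired - fired) * beta * (1.0 - fired * fired);

        // Standard error (delta) of a hidden-layer neuron: the deltas of the layer
        // above, weighted by the outgoing connections, scaled by the tanh derivative.
        public static double HiddenDelta(double fired, double[] upperDeltas,
                                         double[] outgoingWeights, double beta)
        {
            double sum = 0.0;
            for (int k = 0; k < upperDeltas.Length; k++)
                sum += upperDeltas[k] * outgoingWeights[k];
            return sum * beta * (1.0 - fired * fired);
        }

        // Weight updating rule: learning-rate-scaled gradient term plus a fraction
        // (the momentum coefficient) of the previous weight change.
        public static double WeightChange(double learningRate, double delta, double firedBelow,
                                          double momentum, double previousChange)
            => learningRate * delta * firedBelow + momentum * previousChange;
    }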


The redefined error is a modified error calculation that incorporates the α (Alpha) coefficient introduced in section 1.

There is no stopping criterion defined: the training ends when the maximum number of loops is reached. I did not implement a stopping criterion for practical reasons. In almost every case the training made successful adjustments within a very small number of loops, which made a stopping criterion unnecessary.


When a neuron is disabled, its connections to the upper layers are disabled. When 'Direct Input Connection' is selected, new connections from the input neurons to the output neurons are added.


For both of the tasks the inputs are either 1 or -1, and so are the output values. This is basically a design decision made in order to establish a correspondence between the neurons and the evaluation function tanh, which has a range of [-1, 1].



Initial weights are randomized in [0, 1]. The data separation specified with the 'Training/Data' parameter for the training and test phases is randomized too. This means that in each training run the data given to the training and test phases will be different.
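A minimal sketch of this preparation step, assuming a uniform random draw for the initial weights and a random shuffle for the Training/Data separation; it is not NEUTRA's code and the names are illustrative only.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class DataPreparation
    {
        static readonly Random Rng = new Random();

        // Initial connection weights drawn uniformly from [0, 1].
        public static double[] InitializeWeights(int count)
        {
            var weights = new double[count];
            for (int i = 0; i < count; i++)
                weights[i] = Rng.NextDouble();
            return weights;
        }

        // Randomly separates the patterns so that 'trainingPercentage' of them are
        // used for training; with 100 the whole data set is reused for testing too.
        public static (List<int[]> Train, List<int[]> Test) Split(List<int[]> data, int trainingPercentage)
        {
            var shuffled = data.OrderBy(_ => Rng.Next()).ToList();
            if (trainingPercentage >= 100)
                return (shuffled, shuffled);
            int cut = shuffled.Count * trainingPercentage / 100;
            return (shuffled.Take(cut).ToList(), shuffled.Skip(cut).ToList());
        }
    }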



3. RESULTS & COMMENTS


Tests are done in order to see the effects of changes in some specific parameters, as mentioned before. These parameters are the error type in the first task and the Training/Data percentage in the second task. However, during the test procedures I observed extraordinary effects of some other parameters. I also tested the first task's network structure on other similar tasks, such as the AND and OR problems; those tests were meant to test the structure only. The results of these tests are fully presented in Appendix B.


3.1 First Task


The first class of tests is performed with 10 different amounts of training data (Figure 3.1): the data counts are 5, 10, 15, 25, 50, 100, 250, 1000, 5000 and 10000 respectively. I used such small values since the network converges to a successful limit immediately. The Epochs/Tests and Epochs/Errors graphics are below (Figure 3.2, Figure 3.3). Since there is no significant difference among the other tests of the class considering these results, these snapshots can also be considered as mean values.







Figure 3.1: Configurations of the first class tests.




Figure 3.2: Epochs / Tests graphics for data size of 50.



Figure 3.3: Epochs / Errors graphics for data size of 50.


The 'jumps' in Figure 3.3 point out the oscillation due to the value of β. As seen in Figure 3.2, the network gives the exact solution only after the third loop.

The second class of tests is performed similarly. The data size sets are the same as in the first class. The only difference is that the 'Error Type' parameter is set to
"Redefined". The figures of the test results, again corresponding to the fifth data size, are below (Figure 3.4, Figure 3.5).



Figure 3.4: Epochs / Tests graphics for data size of 50.



Figure 3.5: Epochs / Errors graphics for data size of 50.





Figure 3.6: Output / Input graphics for the output layer of the network trained with data size of 50.





As seen in the figures, the redefinition of the error parameter has no significant effect on the network, or the difference is impossible to observe since the training converges immediately, in the third loop. The decision surfaces do not differ either (Figure 3.6). This situation may have several reasons. Firstly, the simplicity of the task and the structure of the network do not allow us to differentiate the changes in this parameter: there is one neuron in the output layer and there is no specific heuristic function. Secondly, the input patterns are bipolarized, i.e. they can take only two values, which leads to a network responding only to such patterns. However, I have to mention that increasing the α coefficient leads the network to different convergence states (Figure 3.7). High values of α (> 0.1) make the network unsuccessful altogether.


3.2 Second Task


In the second task the main target is the 'Training/Data' parameter. The class tests are performed with three different values for the training data percentage: 70, 90, and 100 (Figure 3.8).


In the first phase, their differences are figured separately (Figure 3.9, Figure 3.10, and Figure 3.11). Since there is no difference among the Epochs / Tests graphics, they are not figured. Similar to the first task, the network converges in the second or the third loop (Figure B.1).





Figure 3.7: The convergence state of a large α value (0.1).



Figure 3.8: Configurations of the second class tests.




Figure 3.9: Epochs / Errors graphic for 70% training percentage with data size of 10000.



Figure 3.10: Epochs / Errors graphic for 90% training percentage with data size of 10000.






Figure 3.11: Epochs / Errors graphic for 100% training percentage with data size of 10000.






The main idea which can be drawn from these figures is that the network converges rapidly when it is tested with the training data (the third case). Another observation is that it converges more rapidly in the first case than in the second case. This is because the training data is smaller and the network is able to adapt itself to the data more easily.



4. FURTHER COMMENTS


Throughout the implementation and testing processes some of the characteristics of the tasks made it quite impossible to reach successful observations. Most importantly, the effect of the redefinition of the error criterion cannot be observed due to the simplicity of the problem; the network, as mentioned before, converges rapidly. As the training data size increases, the red area in the Input/Output graphic shrinks to some degree, i.e. the network becomes more and more stable. However, it is not sensible to hope for better results since the input patterns are bipolarized.


I used relatively small values for the data sizes in the first task compared to the second one. The reason was that the convergence of the first task had been established in the early steps of training. Choosing small values gave me the chance to observe small differences between the cases (if any existed).


Independently, I found out that this rapid convergence of the network highly depends on the choice of the bias. Setting the bias to 0 makes the network divergent. Choosing the bias too large delays the convergence, whereas choosing a small value leads to different convergence states (Figure B.2).




The polarized character of the input patterns shows itself in the testing phase. All 9 types of tests performed give the same result in this phase (Figure 4.1). This is ironically a deceptive situation which impedes the convergence of the network and the observation of the effects of the different configurations of parameters. I believe that more heuristic problems would be more suitable for seeing the effects of changes in these parameters.


Figure 4.1: Test results.

Kemal Taşkın
M.S. in Cognitive Sciences
METU - ANKARA
mail@kemaltaskin.net
http://www.kemaltaskin.net
18.03.2004




APPENDIX


A. NEUTRA 1.0 PROGRAM USER INTERFACE


Neuro-Trainer v1.0 is a multi-layer perceptron simulator (see section 1) written in C#. Its user interface contains fields for adjusting network parameters and for starting simple network actions (create, train, test, display) (Figure A.1).



Figure A.1: Standard interface of NEUTRA v1.0.



The configurations panel was described before (see section 1).


Tasks Panel: There are two radio buttons in this panel, corresponding to the XOR problem and the second generic problem. The combo box, which is active only when the first task is selected, includes the additional tasks mentioned before. When the second task is selected, the 'Training/Data' parameter becomes active; it is not enabled in the first task.


Actions: This panel involves three major actions concerning the network. The <Create> button creates the network, the <Train> button begins the training, and the <Delete> button deletes the network in order to allow creating new networks for more tests.


Display: This panel involves four buttons. The <Weights> button displays the connection weights of the network at that moment, the <Tests> button displays the graphics window for the test results (Correct/All), the <Errors> button displays the graphics window for the drifts of the results, and the <Decision Surfaces> button displays a window corresponding to the decision surfaces of each upper hidden layer.


Test: This panel has two buttons on it. The <Test> button tests the system with the specified number of test data and displays the results. It is also possible to test a single input pattern with the <Ind. Test> button by writing the pattern into the text box on the left.



The source code and the executable are available at http://thesis.kemaltaskin.net/.



B. FURTHER TEST RESULTS & FIGURES



Figure B.1: Epochs / Tests graphics (common) for the second task.



Figure B.2: Epochs / Tests graphics for the first task with bias 0.1.




Figure B.3: Epochs / Tests graphics for the first task with the AND problem.


C. REFERENCES


1: Smagt, P. van der & Kröse, B. (1996), An Introduction to Neural Networks [p. 13]

2: Rosenblatt, F. (1958), The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain [pp. 386-408]

3: Kawaguchi, K. (2000), A Multithreaded Software Model for Backpropagation Neural Network Applications [Chp. 2.4.4]

4: Bodén, M. (2001), A Guide to Recurrent Neural Networks and Backpropagation