CAP 6615 Neural Networks


DEPARTMENT OF ELECTRICAL & COMPUTER ENGINEERING






CAP 6615

Neural Networks


Rajesh Pydipati





















Introduction





The objective of taking this course was to get a clear understanding of the concepts involved in neural network computing so that the technology can be tailored to solve a plethora of real-world problems with wide-ranging applications in various fields, which are mentioned below. Emphasis was mainly on programming to observe the working of the algorithms.




Course Description



Objectives:

Understand the concepts and learn the techniques of neural network computing.



Prerequisites:

A familiarity with basic concepts in calculus, linear algebra, and probability theory. Calculus requirements include differentiation, the chain rule, and integration. Linear algebra requirements include matrix multiplication, the inverse, and the pseudo-inverse.



Main topics:

Introduction to neural computational models including classification, association, optimization, and self-organization. Learning and discovery. Knowledge-based neural network design and algorithms.



Applications include: pattern recognition, expert systems, control, signal analysis, and computer vision.

Syllabus



Basic neural computational models



Feedforward networks



Learning / back propagation



Association networks



Classification



Self-Organization



Radial Basis Function networks



Support Vector Machines



Networks based on lattice computation



Applications



Projects:



A set of four projects was done as part of this course. A detailed description of each project, with an approach to the solution, is presented next.











Project 1:


Problem statement:



Project 1a:

Implement the SLP learning algorithm. Implement the algorithm yourselves; do not use any ANN package. Train your SLP to classify the capital letter patterns A, B, C, and D in two classes, C1 and -C1, as follows: A belongs to C1; B, C, and D all belong to -C1. After training, test whether your SLP correctly classifies the same four patterns. You may use either the unipolar or the bipolar version of the patterns.


Approach:

This problem was to make us understand the basic working of a neural network circuit, the 'perceptron'. The problem was to identify four different letter patterns, which were fed as a stream of '0's and '1's to the network. After constructing the network architecture, it was trained on some data. After training was complete, some patterns were tested to evaluate the efficacy of the algorithm. The algorithms were written in MATLAB.
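The report does not include the source listing; the following is a minimal MATLAB sketch of the kind of single-layer perceptron training loop described above. The function name train_slp, the learning-rate argument eta, and the use of bipolar patterns are illustrative assumptions, not the original implementation.

% Minimal single-layer perceptron sketch (illustrative, not the original code).
% Each row of X is one letter pattern (bipolar -1/+1 pixels); t holds the
% class labels: +1 for class C1 (letter A), -1 for class -C1 (B, C, D).
function w = train_slp(X, t, eta, maxEpochs)
    [nPatterns, nInputs] = size(X);
    w  = zeros(nInputs + 1, 1);           % weights plus a bias term
    Xb = [X, ones(nPatterns, 1)];         % append a constant bias input of 1
    for epoch = 1:maxEpochs
        errors = 0;
        for p = 1:nPatterns
            y = sign(Xb(p, :) * w);       % perceptron output (-1, 0, +1)
            if y ~= t(p)                  % misclassified: apply the perceptron rule
                w = w + eta * t(p) * Xb(p, :)';
                errors = errors + 1;
            end
        end
        if errors == 0                    % converged: all patterns correct
            break;
        end
    end
end

After training, a test pattern x is classified by checking sign([x, 1] * w).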


Results:



The network was able to classify the four letter patterns perfectly.

Project 2a:

Implement an SLP to solve the following problem:

1. Randomly choose 1000 points on either side of the line y = 0.5x + 2. Do not choose points exactly on the line. Also, pick the points between fixed bounds b1 <= x <= b2 such that b2 - b1 < 100, as shown in the figure below.

2. Train the SLP to discriminate between the two classes of points. Use a sequential application of the points. Then pick 5 test points (not from the training set!) from either side of the line to test your SLP on. Do they classify correctly? (The difficulty will be when the test points are close to the line.) Print out the equation of the line obtained from the final weights. If the 5 points were not correctly classified, use them as additional training points and retrain the network. Then pick another 5 points and test again.





Approach:

This problem was to make us understand the working of a single-layer perceptron in forming decision surfaces to distinguish various clusters of data. The problem involved classifying the data points as belonging to two clusters by forming a hyperplane as a decision boundary. After constructing the network architecture, it was trained on some data. After training was complete, some data points were tested to evaluate the efficacy of the algorithm. The algorithms were written in MATLAB.
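As a companion to the description above, here is a hedged MATLAB sketch of how the training points could be generated and how the decision line is read off the final weights. The bounds b1 and b2, the noise offsets, and the reuse of the hypothetical train_slp function from the Project 1a sketch are assumptions for illustration.

% Illustrative data generation and decision-line read-out for Project 2a.
% Points above y = 0.5*x + 2 are labelled +1, points below are labelled -1.
b1 = -50; b2 = 40;                        % assumed bounds with b2 - b1 < 100
n  = 1000;
x  = b1 + (b2 - b1) * rand(n, 1);         % x coordinates within [b1, b2]
y  = 0.5 * x + 2 + (20 * rand(n, 1) + 0.1) .* sign(randn(n, 1));  % never on the line
t  = sign(y - (0.5 * x + 2));             % class label from the side of the line

w  = train_slp([x, y], t, 0.01, 1000);    % reuse the SLP sketch from Project 1a

% The learned boundary w(1)*x + w(2)*y + w(3) = 0 rewritten as y = m*x + c:
m = -w(1) / w(2);
c = -w(3) / w(2);
fprintf('Estimated line: y = %.3f*x + %.3f\n', m, c);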


Results:



The network was able to perfectly classify the data points according to the decision boundary.



Project 2

Problem statement:

Project 2 is more of a research project and consists of implementing the backpropagation training algorithm for multilayer perceptrons. Use Fisher's Iris dataset to train and test a multiclass MLP using the backpropagation method. The dataset comprises 3 groups (classes) of 50 patterns each. One group corresponds to one species of Iris flower. Every pattern has 4 real-valued features. The number of input and output neurons is known; the number of hidden layers and hidden neurons is your choice.

Train your network using 13 exemplar patterns from each class (roughly 25% of the patterns) picked at random. Then use the remaining patterns in the dataset to test the network and report the results.

Next, train your network using 25 exemplar patterns from each class (i.e., 50% of the patterns) picked at random. Use the remaining patterns in the set to test the network and report the results.

Next, train your network using 38 exemplar patterns from each class (roughly 75% of the patterns) picked at random. Use the remaining patterns to test the network and report the results.

Finally, train your network on the entire pattern set. Then use the same patterns to test the network and report the results. Note that this last experiment may pose serious convergence problems.

Explore techniques such as momentum to increase convergence speed, try various network architectures (number of hidden layers and neurons in the hidden layers), and investigate various stopping criteria and ways to adjust the learning rate and other parameters you might have.



Fisher's Iris Dataset

R.A. Fisher's Iris dataset is often referenced in the field of pattern recognition. It consists of 3 groups (classes) of 50 patterns each. One group corresponds to one species of Iris flower: Iris Setosa (class C1), Iris Versicolor (class C2), and Iris Virginica (class C3). Every pattern has 4 features (attributes), representing petal width, petal length, sepal width, and sepal length (expressed in centimeters). The dataset file contains one pattern per line, starting with the class number, followed by the 4 features. Lines are terminated with single LF characters. Patterns are grouped by class.





Irises by Vincent van Gogh (oil on canvas, 1889)

Approach:



A network based on the backpropagation principle was constructed to classify the various classes mentioned above. The material below explains the architecture in detail.

Basic functions implemented in the neural networks algorithm: a one-hidden-layer feed-forward MLP network, trained and tested on different, randomly selected training and test data sets.

Software used: MATLAB (run on the Windows platform)

Important considerations: number of layers, number of processing elements in each layer, randomizing the training and test data sets, expressive power, training error, activation function, scaling input, target values, initializing weights, learning rate, momentum learning, stopping criterion, criterion function.



1. Number of layers:

Multilayer neural networks implement linear discriminants in a space where the inputs have been mapped non-linearly. Non-linear multilayer networks have greater computational or expressive power than simple 2-layer networks (input & output layers) and can implement more functions. Given a sufficient number of hidden units, any function can be represented. For this project, a one-hidden-layer MLP has been chosen in order to reduce the complexity of the decision hyperplane.



2. Number of processing elements in each layer:

The number of PEs in the input and output layers can be easily understood based on the key features in the input and output space. A clear observation of each input feature set reveals the principal components that can be used as distinguishing features between the three classes of Iris flowers that we plan to classify. Every pattern has 4 features (attributes), representing petal width, petal length, sepal width, and sepal length (expressed in centimeters). An attempt at reducing the input space, in order to reduce the overall complexity of the classifier, has been made. However, it should not be forgotten that neglecting any of the features without proper reasoning might amount to losing key features and hence reduce the accuracy of our classifier. Thus the input space has been analyzed for the principal components using the PCA algorithm, which has also been implemented in the source code used for arriving at the neural-networks-based solution in this project. The number of output PEs is 3, due to the need to classify each input pattern into one of the three different classes mentioned in the problem definition. Choosing the number of PEs in the hidden layer is, however, a more intuitive task. It was found that 3 PEs in the hidden layer give the best results; however, varying the number of PEs within an acceptable range did not alter the accuracies much.
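The PCA routine itself is not reproduced in the report, so the following is only a generic MATLAB sketch of principal-component analysis via the eigendecomposition of the sample covariance matrix; it is an assumption about the analysis, not the original code.

% Generic PCA sketch (an assumption, not the report's implementation).
% X is an N-by-4 matrix of Iris patterns, one pattern per row.
Xc = X - repmat(mean(X, 1), size(X, 1), 1);   % centre each feature
C  = (Xc' * Xc) / (size(X, 1) - 1);           % sample covariance matrix (4x4)
[V, D] = eig(C);                              % eigenvectors and eigenvalues
[lambda, order] = sort(diag(D), 'descend');   % sort components by variance
V = V(:, order);
explained = lambda / sum(lambda);             % fraction of variance per component
scores = Xc * V;                              % patterns projected onto the PCs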

3. Randomizing the training and test data sets:

For most practical purposes, randomizing the training and test data sets is important. A total of 13, 25, 38, and 50 training patterns per class, respectively, were used in each part of the project. Correspondingly, a total of 37, 25, 12, and 50 test patterns were used. This data has been randomly permuted before being fed forward in the network.



4. Expressive power:

Although one could use networks with a different activation function for each layer, or for each unit of each layer, identical non-linear activation functions were used in order to simplify the mathematical analysis.



5. Training error:

The training error on a pattern is the sum over output units of the squared difference between the desired output d_k and the actual output y_k:

J(w) = (1/2) * || d - y ||^2

The training error for the hidden layer is calculated by the back propagation of the output layer errors.



6. Activation function:

The important constraint is that the activation function should be continuous and differentiable. The sigmoid is a smooth, differentiable, non-linear, and saturating function. A minor benefit is that its derivative can be easily expressed in terms of the function itself. That is why this function was chosen.
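For concreteness, the property mentioned above (the derivative expressed in terms of the function itself) looks as follows in MATLAB; the logistic (unipolar) form is an assumption, since the report does not state which sigmoid variant was used.

% Logistic sigmoid and its derivative, written in terms of the output itself.
sigmoid  = @(v) 1 ./ (1 + exp(-v));
dsigmoid = @(y) y .* (1 - y);    % derivative evaluated from y = sigmoid(v)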



7. Scaling input:

In order to avoid difficulty due to the difference in scale of each input, the input patterns should be shifted so that the average of each feature over the training set is zero. Since online protocols do not have the full data set available at any one time, scaling of the inputs was not found necessary here.




8. Target values:

For a finite value of net_k, the output can never reach the saturation value, and thus there would always be some error. Full training would never terminate, because the weights would become extremely large as net_k is driven to plus or minus infinity. Thus target values corresponding to 2*(desired - 1) were used here.



9. Initializing weights:

For uniform learning, i.e., for all weights to reach their equilibrium values at the same time, initializing the weights is very crucial. In the case of non-uniform learning, one category is learned well before the others, and so the overall error rate is typically higher than necessary, due to redistribution of error. To ensure uniform learning, the weights have been randomly initialized for each layer.



10. Learning rate:

The optimal step size is given by step_opt = (d^2 J / dw^2)^(-1). The proper setting of this parameter greatly affects the convergence as well as the classification accuracy. After a lot of trials, it was found that this parameter should be very small (say, a value in the range 1/10 to 1/1000) to get close to accurate results.

11. Momentum learning:

Error surfaces often have multiple minima in which dJ(w)/dw is very small. These arise when there are too many weights and thus the error depends only weakly upon any one of them. Momentum allows the network to learn more quickly. The effect of the momentum term in the narrow, steep regions of the weight space is to focus the movement in a downhill direction by averaging out the components of the gradient that alternate in sign [Gupta, Homma et al.]. After a lot of trials, it was found that the momentum learning parameter should be less than 1 for this particular application. Varying this parameter between 0 and 1 did not adversely affect the performance. However, it should be noted that increasing this parameter value significantly reduces the convergence speed.
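A minimal sketch of the weight update with a momentum term, as described in this item; the function name momentum_update and the variable names are placeholders rather than the report's actual code.

% Gradient-descent weight update with a momentum term (illustrative sketch).
% W:      current weight matrix        gradJ:  backpropagated gradient dJ/dW
% deltaW: previous weight change       eta: learning rate, alpha: momentum factor
function [W, deltaW] = momentum_update(W, gradJ, deltaW, eta, alpha)
    deltaW = -eta * gradJ + alpha * deltaW;   % averages out sign-alternating gradient terms
    W      = W + deltaW;
end

With alpha = 0 this reduces to plain gradient descent; the sample result later in this section uses a step size of 0.04 and a momentum factor of 0.9.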

12. Stopping criterion:

Usually the stopping criterion used is that the error falls below the error achieved on a separate validation set (in which no pattern from the test set has also been in the training set). But here, after training and testing individually for each class, it was found that the error is typically 0.4 when it converges. So this was used as the stopping criterion, in order to avoid overfitting and because of its simplicity of implementation.



13. Criterion function:

The squared error has been used as the criterion function for this project.







Results:

The results of the neural-networks-based classifier are presented in the form of a confusion matrix, the columns of which represent the classification results and the rows of which represent the class numbers of our classification. The diagonal terms illustrate the correct results, and the off-diagonal terms in each column illustrate the wrong classification results for that particular set of features (corresponding to each class).

Various experiments were conducted on maximizing the circuit performance. A sample result is shown below.

trainConf : classification results on the training data
testConf  : classification results on the test data

Step size change

a) Initial step size = 0.0400
   Number of processing elements in the hidden layer = 3
   Momentum factor = 0.9000

   trainConf = 13  0  0
                0 13  0
                0  0 13

   testConf  = 37  0  0
                0 33  0
                0  4 37











Project 3

Problem statement:

Project 3 consists of implementing a training algorithm for a morphological perceptron with dendritic structures (MPDS). Write a program that trains a two-input, two-output MPDS to solve the embedded spirals problem.

The parametric equations of the two spirals are:

x1(theta) = theta * cos(theta) * 2 / pi
y1(theta) = theta * sin(theta) * 2 / pi

and

x2(theta) = -x1(theta)
y2(theta) = -y1(theta)

where theta = 0*pi/16 + pi/2, 1*pi/16 + pi/2, 2*pi/16 + pi/2, ..., 64*pi/16 + pi/2.


The spirals are initially sampled in 65 points, at angles ranging from pi/2 to 4*pi + pi/2 in uniform increments of pi/16. These 2*65 points are provided for your convenience as a dataset file in the Datasets section and will represent the first training set. The program will run in stages. At each stage, it will train the SLMP, then double the number of points by subsampling the spirals (substituting theta), and then test the SLMP on the entire set (consisting of the original training points together with the intermediate test points). The stages are repeated until either correct classification occurs for all points, or the number of points per spiral reaches 1025. The figure below illustrates the two spirals, each with the initial 65 training points depicted as solid dots and the first test set of 64 intermediate points as empty circles.

To summarize, your implementation must perform the following tasks:

o constructs an SLMP and trains it on the initial training set;

o generates 64 intermediate points per spiral, each point being on the spiral (and not on the edge connecting two points), dividing an arc piece in two halves, resulting in 129 points per spiral (as in the above figure);

o tests the SLMP on the entire, 2*129 point, set and reports the results;

o if recognition is 100% correct, then the program stops; otherwise, it retrains the SLMP on the new set of data and continues;

o doubles the number of points by generating a new set of intermediate points on each spiral; thus, the new set will consist of 2*257 points;

o tests the SLMP on the entire, 2*257 point, set and reports the results;

o repeats this procedure (retraining, doubling the number of points, and then testing) until either recognition is 100% accurate, or the total number of points per spiral has reached 1025; reports the classification results on the last test set and then stops.




Approach:

As explained in the problem, an algorithm was written in MATLAB to implement the morphological perceptron with dendritic structures. This particular algorithm mimics the way the brain grows and shrinks dendrites as it progresses in its learning, hence its name.
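The MPDS training routine itself is not reproduced here; the MATLAB sketch below only generates and subsamples the two spirals from the parametric equations in the problem statement, under the assumption that the second spiral is the point-wise negation of the first. The function name make_spirals is illustrative.

% Generate the two embedded spirals at a given sampling density (illustrative).
% nPerSpiral = 65, 129, 257, ... as the point count is doubled at each stage.
function [P, labels] = make_spirals(nPerSpiral)
    theta = linspace(pi/2, 4*pi + pi/2, nPerSpiral)';  % uniform increments in angle
    x1 =  theta .* cos(theta) * 2 / pi;
    y1 =  theta .* sin(theta) * 2 / pi;
    x2 = -x1;                                          % assumed negated second spiral
    y2 = -y1;
    P      = [x1, y1; x2, y2];                         % 2*nPerSpiral points
    labels = [ones(nPerSpiral, 1); 2 * ones(nPerSpiral, 1)];
end

The training stages described above then correspond to calling make_spirals with 65, 129, 257, 513, and 1025 points per spiral.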

Results:


The algorithm was able to classify the spirals accurately.





Project 4

Project 4 consists of implementing several types of associative memories.

Project 4a:

Implement the algorithm to create a Hopfield auto-associative memory that stores the capital letter patterns A, B, C, E, and X. Test the memory on all of the following patterns:

o perfect (undistorted) A, B, C, E, and X;

o corrupted A5%, B5%, C5%, E5%, and X5% with 5% dilative, erosive, and random noise;

o corrupted A10%, B10%, C10%, E10%, and X10% with 10% dilative, erosive, and random noise.

Remember, however, that the Hopfield network requires bipolar data, so be sure to make the necessary conversions.



Project 4b:

Implement the algorithm to create a pair of morphological auto-associative memories M and W that store the same capital letter patterns A, B, C, E, and X. As before, test the memory on all of the following patterns:

o perfect (undistorted) A, B, C, E, and X;

o corrupted A5%, B5%, C5%, E5%, and X5% with 5% dilative, erosive, and random noise;

o corrupted A10%, B10%, C10%, E10%, and X10% with 10% dilative, erosive, and random noise.



Approach:

Hopfield associative memory

The code for implementing the Hopfield associative memory was written in MATLAB.
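The original listing is not included in the report; the sketch below shows one common way a Hopfield auto-associative memory for bipolar letter patterns can be built (outer-product weights with a zeroed diagonal) and recalled (synchronous updates). The function names, the tie-breaking rule, and the iteration cap are assumptions.

% Hopfield auto-associative memory sketch (illustrative, bipolar patterns).
% Each row of X is one stored pattern with entries in {-1, +1}.
function W = hopfield_store(X)
    [~, n] = size(X);
    W = X' * X;                         % sum of outer products of the stored patterns
    W(logical(eye(n))) = 0;             % no self-connections
end

function x = hopfield_recall(W, x, maxIter)
    for it = 1:maxIter
        xNew = sign(W * x);             % synchronous update of all units
        xNew(xNew == 0) = 1;            % break ties consistently
        if isequal(xNew, x)             % reached a fixed point
            break;
        end
        x = xNew;
    end
end

A corrupted bipolar letter is presented as a column vector, iterated until it settles, and the result is compared with the stored prototypes.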



Observations:

1) In all the cases (undistorted, 5% dilative, erosive, and random noise, as well as 10% dilative, erosive, and random noise), the letter patterns 'A' and 'X' were classified correctly. The main reason for this may be that these patterns are largely unassociated, as there is a wide disparity between their letter forms. The other patterns were not classified correctly. The main reason for this may be that the letter patterns 'B', 'C', and 'E' are somewhat similar to one another.




The results are plotted using the 'imshow' function in MATLAB. Make sure that the Image Processing Toolbox is available in your version of MATLAB; otherwise the results cannot be observed. The code was written in MATLAB on the Windows platform using version 6.12. While executing the code, no significant problems were encountered. The results are evident from the figures that pop up after executing the code; appropriate insights regarding those results are mentioned above.











Results:

Some results are shown here for the case of patterns with 5% dilative noise added:

Pattern 'A' recalled        Pattern 'B' not recalled
Pattern 'C' not recalled    Pattern 'E' not recalled
Pattern 'X' recalled



Final Observation:

In the case of the Hopfield associative memory, we observe that recall is successful only when the patterns themselves are not very similar. When the patterns are similar, the memory is confused and produces garbage results.






Morphological associative memories using matrices 'W' and 'M'

Observations:

In these memories, all the letter patterns (undistorted, 5% dilative, erosive, and random noise cases) were classified correctly by both the W and M associative memories.

In the 10% noise case we observe that:

1) W is robust in the presence of erosive noise.

2) M is robust in the presence of dilative noise.

Even this is subtle to observe, as the recalls are perfect for all patterns for both the W and M associative memories, except for one pattern, 'E' (10% erosive noise case), where W performs better than the M associative memory, reinforcing the observations mentioned above.

If more noise were added, we would probably be able to better appreciate the facts mentioned in (1) and (2) above.
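For reference, the min/max construction behind the W and M auto-associative memories can be sketched as follows in MATLAB. This follows the standard morphological associative memory formulation of Ritter et al. and is an assumption about the report's implementation, not a copy of it.

% Morphological auto-associative memories W and M (illustrative sketch).
% Each column of X is one stored pattern x^k; W(i,j) = min_k(x_i - x_j),
% M(i,j) = max_k(x_i - x_j). Recall uses max-plus (for W) / min-plus (for M) products.
function [W, M] = morph_store(X)
    [n, K] = size(X);
    W =  inf(n, n);
    M = -inf(n, n);
    for k = 1:K
        D = X(:, k) * ones(1, n) - ones(n, 1) * X(:, k)';  % D(i,j) = x_i - x_j
        W = min(W, D);
        M = max(M, D);
    end
end

function y = recall_W(W, x)             % tolerant of erosive noise
    y = max(W + repmat(x', size(W, 1), 1), [], 2);   % y_i = max_j (W_ij + x_j)
end

function y = recall_M(M, x)             % tolerant of dilative noise
    y = min(M + repmat(x', size(M, 1), 1), [], 2);   % y_i = min_j (M_ij + x_j)
end

As noted above, recall through W tolerates erosive corruption of the input, while recall through M tolerates dilative corruption.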


Results:

For the case of patterns with 10% erosive noise added, the following recalls were obtained.

























This is Pattern ‘E’ (not clear on the white background)










Final Observation:

In general, the morphological memories performed better than the Hopfield associative memories.






Additional patterns that were generated and their recalls using the W and M memories:





















Results of additional patterns with Hopfield associative memory


























References:

1) Static and Dynamic Neural Networks: From Fundamentals to Advanced Theory, by Madan M. Gupta, Liang Jin, and Noriyasu Homma.

2) Neural Networks: A Comprehensive Foundation, by Simon S. Haykin.

3) Pattern Classification, by Richard O. Duda, Peter E. Hart, and David G. Stork.

4) Class notes of Prof. Gerhard Ritter for CAP 6615.