DEPARTMENT OF ELECTRICAL & COMPUTER ENGINEERING
CAP 6615
Neural Networks
Rajesh Pydipati
CAP 6615: Neural Networks
© 2003 Rajesh Pydipati
Fall 2003
Introduction
The objective of taking this course was to gain a clear understanding of the concepts involved in neural network computing, so that the technology can be tailored to solve a wide range of real-world problems in the fields listed below. The emphasis was mainly on programming, in order to observe how the algorithms work.
Course Description
Objectives:
Understand the concepts and learn the techniques of neural network computing.
Prerequisites:
A familiarity with basic concepts in calculus, linear algebra, and probability theory.
Calculus requirements include: differentiation, the chain rule, and integration. Linear algebra requirements include: matrix multiplication, inverse, and pseudo-inverse.
Main topics:
Introduction to neural computational models including classification, association, optimization, and self-organization. Learning and discovery. Knowledge-based neural network design and algorithms.
Applications include: pattern recognition, expert systems, control, signal analysis, and computer vision.
Syllabus
Basic neural computational models
Feedforward networks
Learning / backpropagation
Association networks
Classification
Self-organization
Radial Basis Function networks
Support Vector Machines
Networks based on lattice computation
Applications
Projects:
A set of four projects was completed as part of this course. A detailed description of each project, with an approach to its solution, is presented next.
Project 1:
Problem statement:
Project 1a:
Implement the SLP learning algorithm. Implement the algorithm yourselves; do not use any ANN package. Train your SLP to classify the capital letter patterns A, B, C, and D into two classes, C1 and ~C1 (the complement of C1), as follows: A belongs to C1; B, C, and D all belong to ~C1. After training, test whether your SLP correctly classifies the same four patterns. You may use either the unipolar or the bipolar version of the patterns.
Approach:
This problem was meant to illustrate the basic working of the simplest neural network, the perceptron. The task was to recognize four different letter patterns, each fed to the network as a stream of 0s and 1s. After the network architecture was constructed, the network was trained on the patterns. Once training was complete, the patterns were presented again to test the efficacy of the algorithm. The algorithms were written in MATLAB.
Results:
The network was able to classify all four letter patterns correctly.
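The perceptron learning rule used in Project 1a can be sketched in Python as follows. This is a minimal illustration, not the MATLAB code used for the project. The 5x5 letter bitmaps are hypothetical stand-ins, because the actual course patterns are not reproduced in this report. The sketch uses the bipolar version of the patterns; the learning rate and epoch limit are arbitrary choices.

```python
# Hypothetical 5x5 bitmaps (1 = ink); the course's actual letter patterns are not reproduced here.
LETTERS = {
    "A": ["01110", "10001", "11111", "10001", "10001"],
    "B": ["11110", "10001", "11110", "10001", "11110"],
    "C": ["01111", "10000", "10000", "10000", "01111"],
    "D": ["11110", "10001", "10001", "10001", "11110"],
}


def to_bipolar(rows):
    """Flatten a bitmap row by row, map 0/1 to -1/+1, and append a constant bias input of +1."""
    return [1 if ch == "1" else -1 for row in rows for ch in row] + [1]


def output(w, x):
    """Hard-limiting (sign) activation of the single-layer perceptron."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def train_slp(patterns, targets, lr=0.5, max_epochs=100):
    """Perceptron rule w <- w + lr * (t - y) * x, applied to one pattern at a time."""
    w = [0.0] * len(patterns[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(patterns, targets):
            y = output(w, x)
            if y != t:
                w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # one full pass without errors: training has converged
            break
    return w


names = ["A", "B", "C", "D"]
X = [to_bipolar(LETTERS[n]) for n in names]
T = [1 if n == "A" else -1 for n in names]  # +1: class C1 (A); -1: its complement (B, C, D)
w = train_slp(X, T)
print({n: output(w, x) for n, x in zip(names, X)})  # {'A': 1, 'B': -1, 'C': -1, 'D': -1}
```

The four training patterns are linearly separable, so the perceptron convergence theorem guarantees that the loop stops after a finite number of corrections.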
Project 1b:
Implement an SLP to solve the following problem:
1. Randomly choose 1000 points on either side of the line $y = 0.5x + 2$. Do not choose points exactly on the line. Also, pick the points between fixed bounds $b_1 \le x \le b_2$ such that $b_2 - b_1 < 100$, as shown in the figure below.
2. Train the SLP to discriminate between the two classes of points. Use a sequential application of the points. Then pick 5 test points (not from the training set!) from either side of the line to test your SLP on. Do they classify correctly? (The difficulty will be when the test points are close to the line.) Print out the equation of the line obtained from the final weights. If the 5 points were not correctly classified, use them as additional training points and retrain the network. Then pick another 5 points and test again.
Approach:
This problem was meant to illustrate how a single-layer perceptron forms decision surfaces that separate clusters of data. The task was to classify the data points as belonging to one of two clusters by forming a hyperplane (here, a line) as the decision boundary. After the network architecture was constructed, the network was trained on the training points. Once training was complete, test points were presented to check the efficacy of the algorithm. The algorithms were written in MATLAB.
Results:
The network was able to classify the data points perfectly according to the decision boundary.
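A minimal Python sketch of Project 1b follows. It assumes hypothetical bounds b1 = −10 and b2 = 10. It also keeps every point at least 1 unit vertically away from the line, which is one way of honoring "do not choose points exactly on the line". The sketch trains sequentially with the perceptron rule, retrains with any misclassified test points, and prints the line recovered from the final weights. It is an illustration, not the project's MATLAB code.

```python
import random


def sample_points(n, rng, b1=-10.0, b2=10.0, min_gap=1.0, max_gap=10.0):
    """n labelled points with b1 <= x <= b2, kept at least min_gap (vertically) off y = 0.5x + 2."""
    points = []
    for _ in range(n):
        x = rng.uniform(b1, b2)
        side = rng.choice((1, -1))  # +1 = above the line, -1 = below it
        y = 0.5 * x + 2 + side * rng.uniform(min_gap, max_gap)
        points.append(((x, y), side))
    return points


def predict(w, point):
    x, y = point
    return 1 if w[0] * x + w[1] * y + w[2] >= 0 else -1


def train(w, data, lr=0.1, max_epochs=5000):
    """Sequential perceptron updates on inputs (x, y, 1); True once an epoch makes no mistakes."""
    for _ in range(max_epochs):
        mistakes = 0
        for (x, y), t in data:
            if predict(w, (x, y)) != t:
                # On a mistake (t - output) / 2 == t, so this is the usual perceptron step.
                w[0] += lr * t * x
                w[1] += lr * t * y
                w[2] += lr * t
                mistakes += 1
        if mistakes == 0:
            return True
    return False


rng = random.Random(0)
data = sample_points(1000, rng)
w = [0.0, 0.0, 0.0]
converged = train(w, data)

# Test on 5 fresh points; add any misclassified ones to the training set and retrain.
test = sample_points(5, rng)
wrong = [p for p in test if predict(w, p[0]) != p[1]]
if wrong:
    data += wrong
    converged = train(w, data)

# The decision line w0*x + w1*y + w2 = 0, rewritten as y = m*x + c.
print(f"learned line: y = {-w[0] / w[1]:.3f} x + {-w[2] / w[1]:.3f}")
```

Keeping the points away from the line guarantees a positive margin, so the sequential rule must converge. The bound on the number of corrections grows as the margin shrinks, which is why test points close to the line are the hard case.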
Project 2
Problem statement:
Project 2 is more of a research project and consists of implementing the backpropagation training algorithm for multilayer perceptrons. Use Fisher's Iris dataset to train and test a multiclass MLP using the backpropagation method. The dataset comprises 3 groups (classes) of 50 patterns each. One group corresponds to one species of Iris flower. Every pattern has 4 real-valued features. The number of input and output neurons is known; the number of hidden layers and hidden neurons are your choice.
Train your network using 13 exemplar patterns from each class (roughly 25% of the patterns) picked at random. Then use the remaining patterns in the dataset to test the network and report the results.
Next, train your network using 25 exemplar patterns from each class (i.e., 50% of the patterns) picked at random. Use the remaining patterns in the set to test the network and report the results.
Next, train your network using 38 exemplar patterns from each class (roughly 75% of the patterns) picked at random. Use the remaining patterns to test the network and report the results.
Finally, train your network on the entire pattern set. Then use the same patterns to test the network and report the results. Note that this last experiment may pose serious convergence problems.
Explore techniques such as momentum to increase convergence speed, try various network architectures (number of hidden layers and neurons in the hidden layers), investigate various stopping criteria and ways to adjust the learning rate and other parameters you might have.
Fisher's Iris Dataset
R.A. Fisher's Iris dataset is often referenced in the field of pattern recognition. It consists of 3 groups (classes) of 50 patterns each. One group corresponds to one species of Iris flower: Iris Setosa (class C1), Iris Versicolor (class C2), and Iris Virginica (class C3). Every pattern has 4 features (attributes), representing petal width, petal length, sepal width, and sepal length (expressed in centimeters). The dataset file contains one pattern per line, starting with the class number, followed by the 4 features. Lines are terminated with single LF characters. Patterns are grouped by class.
Irises by Vincent van Gogh (oil on canvas, 1889)
Approach:
A multilayer perceptron trained with the backpropagation algorithm was constructed to classify the three classes described above. The material below explains the architecture in detail.
Basic functions implemented in the neural network algorithm:
A one-hidden-layer feedforward MLP network, trained and tested on different, randomly selected training and test data sets.
Software used: MATLAB (run on the Windows platform)
Important considerations: number of layers, number of processing elements in each layer, randomizing the training and test data sets, expressive power, training error, activation function, scaling input, target values, initializing weights, learning rate, momentum learning, stopping criterion, criterion function.
1. Number of layers:
Multilayer neural networks implement linear discriminants in a space where the inputs have been mapped nonlinearly. Nonlinear multilayer networks have greater computational or expressive power than simple two-layer networks (input and output layers only), and can implement more functions. Given a sufficient number of hidden units, a network with a single hidden layer can approximate any continuous function on a bounded input region to arbitrary accuracy. For this project, an MLP with one hidden layer was chosen in order to limit the complexity of the decision surface.
2. Number of processing elements in each layer:
The number of PEs in the input and output layers follows directly from the key features of the input and output spaces. Examining each input feature reveals the principal components that can serve as distinguishing features between the three classes of Iris flowers to be classified. Every pattern has 4 features (attributes), representing petal width, petal length, sepal width, and sepal length (expressed in centimeters). An attempt was made to reduce the input space in order to reduce the overall complexity of the classifier. However, neglecting any of the features without proper reasoning might amount to losing key information and hence reduce the accuracy of the classifier. The input space was therefore analyzed for its principal components using the PCA algorithm, which is also implemented in the source code used to arrive at the neural-network solution in this project. The number of output PEs is 3, because each input pattern must be classified into one of the three classes given in the problem definition. Choosing the number of PEs in the hidden layer, however, relies more on intuition. Three hidden PEs were found to give the best results, although varying the number of hidden PEs within an acceptable range did not change the accuracy much.
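The principal-component analysis mentioned above can be sketched with power iteration and deflation on the feature covariance matrix. This Python sketch is only an illustration, since the project's PCA was implemented in MATLAB. The toy four-feature rows are hypothetical and were chosen so that the answer is easy to check by hand.

```python
import random


def covariance(rows):
    """Sample covariance matrix of equal-length feature vectors (one row per pattern)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    c = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(v[i] * v[j] for v in c) / (n - 1) for j in range(d)] for i in range(d)]


def top_components(cov, k, iters=200, seed=0):
    """Top-k (eigenvalue, unit eigenvector) pairs of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # random start vector
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T A v is the eigenvalue for the converged unit vector v.
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((lam, v))
        # Deflation: subtract this component so the next pass finds the next-largest one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


# Hypothetical 4-feature rows; in the project these would be the Iris training patterns.
rows = [(2.0, 2.0, 0.0, 0.0), (-2.0, -2.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0), (0.0, 0.0, -1.0, 0.0)]
cov = covariance(rows)
total = sum(cov[i][i] for i in range(len(cov)))
for lam, v in top_components(cov, 2):
    print(f"share of variance {lam / total:.2f}, direction {[round(x, 3) for x in v]}")
```

For these rows the first component lies along (1, 1, 0, 0)/√2 and carries 16/18 ≈ 89% of the total variance. The second lies along (0, 0, 1, 0) and carries the remaining 11%. The sign of each direction depends on the random start.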
3. Randomizing the training and test data sets:
For most practical problems, randomizing the training and test data sets is important. In the four parts of the project, 13, 25, 38, and 50 training patterns, respectively, were used for each class. Correspondingly, 37, 25, 12, and 50 test patterns per class were used (in the last part, the test set is the full training set). The data were randomly permuted before being fed forward through the network.
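The per-class random split described in item 3 can be sketched as below. The function name and the (label, features) tuple layout are assumptions for illustration. Following the problem statement, when all 50 patterns of each class are used for training, the training set also serves as the test set.

```python
import random


def split_per_class(patterns, n_train, seed=None):
    """Randomly choose n_train patterns of every class for training; the rest are for testing.

    patterns: list of (class_label, features) tuples (an assumed layout).
    If no patterns are left over, the training set doubles as the test set,
    as in the final experiment of Project 2.
    """
    rng = random.Random(seed)
    by_class = {}
    for label, features in patterns:
        by_class.setdefault(label, []).append((label, features))
    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        train += group[:n_train]
        test += group[n_train:]
    rng.shuffle(train)  # present the training patterns in random order
    return train, (test if test else list(train))


# Hypothetical stand-in for the Iris file: 3 classes x 50 patterns, 4 features each.
data = [(c, [float(c)] * 4) for c in (1, 2, 3) for _ in range(50)]
for n in (13, 25, 38, 50):
    train, test = split_per_class(data, n, seed=0)
    print(n, len(train), len(test))  # training/test sizes: 39/111, 75/75, 114/36, 150/150
```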
4. Expressive power:
Although networks may use a different activation function for each layer, or even for each unit of a layer, identical nonlinear activation functions were used here to simplify the mathematical analysis.
5. Training error:
The training error on a pattern is the sum over the output units of the squared difference between the desired output $d_k$ and the actual output $y_k$:
$J(\mathbf{w}) = \frac{1}{2}\sum_{k}(d_k - y_k)^2 = \frac{1}{2}\lVert \mathbf{d} - \mathbf{y} \rVert^2$
The error terms for the hidden layer are computed by backpropagating the output-layer errors.
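6. Activation function:
The important constraint is that the activation function should be continuous and differentiable. The sigmoid is a smooth, differentiable, nonlinear, and saturating function. A minor benefit is that its derivative can easily be expressed in terms of the function itself. For example, the logistic sigmoid $f(\mathit{net}) = 1/(1 + e^{-\mathit{net}})$ has $f'(\mathit{net}) = f(\mathit{net})\,[1 - f(\mathit{net})]$. That is why this function was chosen.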
7. Scaling input:
To avoid difficulties caused by inputs with different scales, the input patterns would normally be shifted so that the average of each feature over the training set is zero. Since online protocols do not have the full data set available at any one time, input scaling was not considered necessary here.
8. Target values:
For a finite value of $\mathit{net}_k$, the output of a sigmoid can never reach its saturation value, so there would always be some error. Full training would never terminate, because the weights would become extremely large as $\mathit{net}_k$ was driven to plus or minus infinity. Thus, target values corresponding to 2*(desired − 1) were used here.
9. Initializing weights:
Initializing the weights is crucial for uniform learning, i.e., for all weights to reach their equilibrium values at about the same time. With nonuniform learning, one category is learned well before the others, so the overall error rate is typically higher than necessary because of the redistribution of error. To promote uniform learning, the weights of each layer were initialized randomly.
10. Learning rate:
The optimal step size is given by
$\mathrm{step}_{\mathrm{opt}} = \left(\frac{\partial^2 J}{\partial w^2}\right)^{-1}$
The proper setting of this parameter greatly affects both convergence and classification accuracy. After many trials, it was found that this parameter should be small (say, between 0.001 and 0.1) to obtain accurate results.
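The role of step_opt can be seen on a one-dimensional quadratic criterion, where the second derivative is constant. The sketch below assumes a hypothetical J(w) = a (w − w*)² with a = 3 and w* = 1.5, so step_opt = 1/(2a). It illustrates the formula; it does not reproduce the project's setting.

```python
def descend(step, w0=0.0, n_steps=5, a=3.0, w_star=1.5):
    """Gradient descent on the hypothetical criterion J(w) = a * (w - w_star)**2."""
    w, path = w0, [w0]
    for _ in range(n_steps):
        w -= step * 2 * a * (w - w_star)  # dJ/dw = 2a (w - w_star)
        path.append(w)
    return path


step_opt = 1 / (2 * 3.0)  # (d^2J/dw^2)^-1, since d^2J/dw^2 = 2a = 6 here
for label, step in [("step_opt/4", step_opt / 4), ("step_opt", step_opt),
                    ("2*step_opt", 2 * step_opt), ("3*step_opt", 3 * step_opt)]:
    print(f"{label:>11}: {[round(w, 3) for w in descend(step)]}")
```

With step_opt, the first update lands exactly on the minimum. At twice that value the iterate oscillates between 0 and 3 without settling, and larger steps diverge. Smaller steps converge monotonically but more slowly. In a multilayer network the second derivative differs from weight to weight and across the error surface, which is one reason a small, empirically tuned learning rate was used instead.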
11. Momentum learning:
Error surfaces often have plateaus, regions in which $dJ(\mathbf{w})/d\mathbf{w}$ is very small. These arise when there are too many weights, so that the error depends only weakly on any one of them. Momentum allows the network to learn more quickly. In narrow, steep regions of the weight space, the momentum term focuses the movement in a downhill direction by averaging out the components of the gradient that alternate in sign [Gupta, Homma et al.]. After many trials, it was found that the momentum parameter should be less than 1 for this particular application. Varying this parameter between 0 and 1 did not adversely affect performance. However, increasing its value was observed to reduce the convergence speed significantly.
12. Stopping criterion:
A common stopping criterion is to halt training when the error on a separate validation set, containing no patterns from the training set, reaches its minimum. Here, however, training and testing separately for each class showed that the error typically settles at about 0.4 at convergence. Training was therefore stopped once the error reached 0.4. This criterion was chosen to avoid overfitting and because it is simple to implement.
13. Criterion function:
The squared error has been used as the criterion function for this project.
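Items 1 to 13 can be tied together in a compact backpropagation sketch. It uses one hidden layer of logistic units, online (pattern-by-pattern) updates of the squared-error criterion with momentum, random weight initialization, a randomly permuted presentation order, and an error threshold as the stopping criterion. This Python sketch is not the project's MATLAB code, and it makes several assumptions:
- the 0.4 threshold is applied to the squared error summed over one epoch;
- targets are 0.1/0.9 one-hot codes, so that they stay inside the sigmoid's range in the spirit of item 8;
- initial weights are drawn from [−0.5, 0.5];
- the three 2-D clusters are hypothetical toy data, not the Iris patterns.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """One-hidden-layer perceptron of logistic units, trained online by backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Small random initial weights; the last entry of every row is a bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        # Previous weight changes, kept for the momentum term.
        self.dw1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_pattern(self, x, d, eta, alpha):
        """One online update; returns J = 1/2 * sum_k (d_k - y_k)^2 measured before it."""
        xb, hb, y = self.forward(x)
        # Output deltas (d_k - y_k) f'(net_k), with f' = y (1 - y) for the logistic sigmoid.
        d_out = [(dk - yk) * yk * (1.0 - yk) for dk, yk in zip(d, y)]
        # Hidden deltas: output deltas propagated back through w2 (bias column skipped).
        d_hid = [hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(y)))
                 for j in range(len(hb) - 1)]
        # Momentum update: dw(t) = eta * delta * input + alpha * dw(t - 1).
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                self.dw2[k][j] = eta * d_out[k] * hb[j] + alpha * self.dw2[k][j]
                row[j] += self.dw2[k][j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                self.dw1[j][i] = eta * d_hid[j] * xb[i] + alpha * self.dw1[j][i]
                row[i] += self.dw1[j][i]
        return 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))

    def fit(self, data, eta=0.1, alpha=0.9, stop_error=0.4, max_epochs=2000, seed=0):
        """Train until the error summed over an epoch falls below stop_error."""
        rng = random.Random(seed)
        data = list(data)
        history = []
        for _ in range(max_epochs):
            rng.shuffle(data)  # random presentation order in every epoch
            history.append(sum(self.train_pattern(x, d, eta, alpha) for x, d in data))
            if history[-1] < stop_error:
                break
        return history

    def classify(self, x):
        y = self.forward(x)[2]
        return max(range(len(y)), key=y.__getitem__)


# Hypothetical toy data: three 2-D clusters with 0.1/0.9 one-hot targets (not the Iris data).
rng = random.Random(1)
data = []
for c, (cx, cy) in enumerate([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]):
    for _ in range(10):
        x = (cx + rng.uniform(-0.1, 0.1), cy + rng.uniform(-0.1, 0.1))
        data.append((x, [0.9 if k == c else 0.1 for k in range(3)]))
net = MLP(2, 3, 3)
history = net.fit(data)
print(f"epochs: {len(history)}, final epoch error: {history[-1]:.3f}")
```

For the Iris experiments the network would be MLP(4, 3, 3), matching the four features, three hidden PEs, and three classes described above.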
Results:
The results of the neural-network classifier are presented in the form of confusion matrices. Each column corresponds to the test patterns of one class, and each row to the class the network assigned them to. The diagonal terms count the correct classifications. The off-diagonal terms in each column count the patterns of that class that were misclassified.
Various experiments were conducted to maximize the network's performance. A sample result is shown below.
trainConf: classification results on the training data
testConf: classification results on the test data
Step size change
a) Initial step size = 0.0400
Number of processing elements in the hidden layer = 3
Momentum factor = 0.9000
trainConf =
   13    0    0
    0   13    0
    0    0   13
testConf =
   37    0    0
    0   33    0
    0    4   37
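The confusion matrices above can be computed as sketched below. The sketch uses the convention that the reported numbers imply: each column is a true class, since every column of testConf sums to the 37 test patterns per class, and each row is the class assigned by the network. The label lists in the example are constructed only to reproduce the reported testConf.

```python
def confusion_matrix(true_labels, assigned_labels, n_classes):
    """conf[r][c] counts patterns of true class c that the network assigned to class r."""
    conf = [[0] * n_classes for _ in range(n_classes)]
    for t, a in zip(true_labels, assigned_labels):
        conf[a][t] += 1
    return conf


# Labels (0-based classes) constructed to reproduce the testConf reported above.
true_labels = [0] * 37 + [1] * 37 + [2] * 37
assigned = [0] * 37 + [1] * 33 + [2] * 4 + [2] * 37
conf = confusion_matrix(true_labels, assigned, 3)
for row in conf:
    print(" ".join(f"{v:4d}" for v in row))
accuracy = sum(conf[i][i] for i in range(3)) / len(true_labels)
print(f"accuracy = {accuracy:.3f}")  # 107/111 -> 0.964
```

The overall test accuracy for this run is therefore 107/111, about 96.4%.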
Project 3
Problem statement:
Project 3 consists of implementing a training algorithm for a morphological perceptron with dendritic structures (MPDS). Write a program that trains a two-input, two-output MPDS to solve the embedded spirals problem.
The parametric equations of the two spirals are:
$x_1(\theta) = \frac{2}{\pi}\,\theta\cos\theta, \qquad y_1(\theta) = \frac{2}{\pi}\,\theta\sin\theta$
and
$x_2(\theta) = -x_1(\theta), \qquad y_2(\theta) = -y_1(\theta)$
where $\theta = k\pi/16 + \pi/2$ for $k = 0, 1, 2, \ldots, 64$.
The spirals are initially sampled in 65 points, at angles ranging from $\pi/2$ to $4\pi + \pi/2$ in uniform increments of $\pi/16$. These 2*65 points are provided for your convenience as a dataset file in the Datasets section and will represent the first training set. The program will run in stages. At each stage, it will train the SLMP, then double the number of points by subsampling the spirals (substituting $\theta$), and then test the SLMP on the entire set (consisting of the original training points together with the intermediate test points). The stages are repeated until either correct classification occurs for all points, or the number of points per spiral reaches 1025. The figure below illustrates the two spirals, each with the initial 65 training points depicted as solid dots and the first test set of 64 intermediate points as empty circles.
To summarize, your implementation must perform the following tasks:
o constructs an SLMP and trains it on the initial training set;
o generates 64 intermediate points per spiral, each point being on the spiral (and not on the edge connecting two points) dividing an arc piece in two halves, resulting in 129 points per spiral (as in the above figure);
o tests the SLMP on the entire, 2*129 point, set and reports the results;
o if recognition is 100% correct, then the program stops; otherwise, it retrains the SLMP on the new set of data and continues;
o doubles the number of points by generating a new set of intermediate points on each spiral; thus, the new set will consist of 2*257 points;
o tests the SLMP on the entire, 2*257 point, set and reports the results;
o repeats this procedure (retraining, doubling the number of points, and then testing) until either recognition is 100% accurate, or the total number of points per spiral has reached 1025;
o reports the classification results on the last test set and then stops.
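Generating the spiral points for each stage can be sketched as follows. Stage 0 reproduces the 65 points per spiral given by the parametric equations. Each later stage halves the angular step, so the points of the previous stage are kept and one new point is inserted on the arc between each pair. The function name and the 0/1 labels are assumptions for illustration.

```python
import math


def spiral_points(stage):
    """Points of both spirals at a refinement stage, as a list of ((x, y), label).

    Stage 0 uses theta = k*pi/16 + pi/2 for k = 0..64 (65 points per spiral); each later
    stage halves the angular step. Label 0 marks spiral 1 and label 1 marks spiral 2.
    """
    steps = 64 * 2 ** stage
    points = []
    for k in range(steps + 1):
        theta = k * math.pi / (16 * 2 ** stage) + math.pi / 2
        x1 = theta * math.cos(theta) * 2 / math.pi
        y1 = theta * math.sin(theta) * 2 / math.pi
        points.append(((x1, y1), 0))
        points.append(((-x1, -y1), 1))  # spiral 2 is spiral 1 reflected through the origin
    return points


for stage in range(5):
    print(stage, len(spiral_points(stage)) // 2)  # points per spiral: 65, 129, 257, 513, 1025
```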
Approach:
As explained in the problem, an algorithm was written in MATLAB to implement the morphological perceptron with dendritic structures. The algorithm is loosely inspired by the way biological neurons grow and prune dendrites as learning progresses, hence its name.
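The dendritic computation itself can be sketched as follows. In the lattice-based model, a dendrite k with weights $w_{ik}^{l}$ and sign $p_k$ (+1 excitatory, −1 inhibitory) is commonly written as $\tau_k(\mathbf{x}) = p_k \bigwedge_i \bigwedge_{l} (-1)^{1-l}(x_i + w_{ik}^{l})$. With $w_{ik}^1 = -\mathrm{lo}_i$ and $w_{ik}^0 = -\mathrm{hi}_i$, each dendrite tests membership in an axis-aligned box. This Python sketch shows only the response of an already-trained neuron, not the training procedure that grows and prunes dendrites. Combining the dendrites with a minimum, and the example boxes, are assumptions for illustration.

```python
def dendrite_response(x, box, excitatory=True):
    """Response of one dendrite whose weights encode the box [lo_i, hi_i] in each coordinate.

    With w_i^1 = -lo_i and w_i^0 = -hi_i, the lattice expression
    p * min_i min(x_i + w_i^1, -(x_i + w_i^0)) equals p * min_i min(x_i - lo_i, hi_i - x_i),
    which is >= 0 exactly when x lies inside the box (for p = +1).
    """
    p = 1 if excitatory else -1
    return p * min(min(xi - lo, hi - xi) for xi, (lo, hi) in zip(x, box))


def neuron_output(x, dendrites):
    """Hard-limited output: 1 if the minimum over all dendrite responses is >= 0 (assumed combination)."""
    tau = min(dendrite_response(x, box, exc) for box, exc in dendrites)
    return 1 if tau >= 0 else 0


# Hypothetical neuron: an excitatory square with an inhibitory hole carved out of its centre.
dendrites = [
    ([(-1.0, 1.0), (-1.0, 1.0)], True),
    ([(-0.5, 0.5), (-0.5, 0.5)], False),
]
for point in [(0.8, 0.0), (0.0, 0.0), (2.0, 0.0)]:
    print(point, neuron_output(point, dendrites))  # outputs 1, 0 and 0 respectively
```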
Results:
The algorithm was able to classify the spirals accurately.
Project 4
consists of implementing several types of associative memories.
Project 4a:
Implement the algorithm to create a Hopfield auto-associative memory that stores the capital letter patterns A, B, C, E, and X. Test the memory on all of the following patterns:
o perfect (undistorted) A, B, C, E, and X;
o corrupted A5%, B5%, C5%, E5%, and X5% with 5% dilative, erosive, and random noise;
o corrupted A10%, B10%, C10%, E10%, and X10% with 10% dilative, erosive, and random noise.
Remember, however, that the Hopfield network requires bipolar data, so be sure to make the necessary conversions.
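The noisy test patterns can be generated as sketched below. This report does not state the exact noise model used in the course, so the Python sketch assumes that a p% noise level changes p% of all pixels. Dilative noise turns background pixels (0) into foreground (1), erosive noise does the opposite, and random noise flips either. The sketch also includes the bipolar conversion required by the Hopfield network.

```python
import random


def add_noise(pattern, fraction, kind, rng):
    """Return a copy of a 0/1 pattern with about fraction * len(pattern) pixels changed.

    "dilative": background 0s become 1s; "erosive": 1s become 0s; "random": either.
    If fewer eligible pixels exist than requested, all of them are changed.
    """
    if kind == "dilative":
        eligible = [i for i, v in enumerate(pattern) if v == 0]
    elif kind == "erosive":
        eligible = [i for i, v in enumerate(pattern) if v == 1]
    elif kind == "random":
        eligible = list(range(len(pattern)))
    else:
        raise ValueError(f"unknown noise kind: {kind!r}")
    noisy = list(pattern)
    for i in rng.sample(eligible, min(len(eligible), round(fraction * len(pattern)))):
        noisy[i] = 1 - noisy[i]
    return noisy


def to_bipolar(pattern):
    """Map 0/1 pixels to -1/+1, as the Hopfield network requires."""
    return [2 * v - 1 for v in pattern]


rng = random.Random(0)
pattern = [1] * 50 + [0] * 50  # hypothetical 10x10 pattern, flattened
for kind in ("dilative", "erosive", "random"):
    noisy = add_noise(pattern, 0.10, kind, rng)
    print(kind, sum(a != b for a, b in zip(pattern, noisy)), "pixels changed")  # 10 each
```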
Project 4b:
Implement the algorithm to create a pair of morphological auto-associative memories M and W that store the same capital letter patterns A, B, C, E, and X. As before, test the memory on all of the following patterns:
o perfect (undistorted) A, B, C, E, and X;
o corrupted A5%, B5%, C5%, E5%, and X5% with 5% dilative, erosive, and random noise;
o corrupted A10%, B10%, C10%, E10%, and X10% with 10% dilative, erosive, and random noise.
Approach:
Hopfield associative memory
The code implementing the Hopfield associative memory was written in MATLAB.
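A Hopfield auto-associative memory can be sketched with the Hebbian outer-product rule and asynchronous sign updates. This Python sketch is not the project's MATLAB code. It assumes bipolar patterns and a zero diagonal, and it assumes that a unit keeps its state when its input field is exactly zero.

```python
def hopfield_store(patterns):
    """Hebbian outer-product rule: w_ij = sum over patterns of x_i * x_j, with w_ii = 0."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += x[i] * x[j]
    return w


def hopfield_recall(w, x, max_sweeps=100):
    """Asynchronous recall: update x_i <- sign(sum_j w_ij x_j) until no unit changes."""
    x = list(x)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(x)):
            h = sum(wij * xj for wij, xj in zip(w[i], x))
            new = x[i] if h == 0 else (1 if h > 0 else -1)  # keep the state when h == 0
            if new != x[i]:
                x[i], changed = new, True
        if not changed:
            break
    return x


# Two hypothetical orthogonal bipolar patterns; both are stable states of the memory.
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, -1, 1, -1, 1, -1, 1, -1]
w = hopfield_store([a, b])
noisy_a = [-a[0]] + a[1:]  # first pixel flipped
print(hopfield_recall(w, a) == a, hopfield_recall(w, b) == b, hopfield_recall(w, noisy_a) == a)
```

Strongly overlapping stored patterns add cross-talk terms to the weight matrix. This is consistent with the difficulty with 'B', 'C', and 'E' noted below.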
Observations:
1) In all cases (undistorted patterns, patterns with 5% dilative, erosive, and random noise, and patterns with 10% dilative, erosive, and random noise), the letter patterns 'A' and 'X' were recalled correctly. The main reason may be that these patterns are largely uncorrelated with the others, since their letter shapes differ widely. The other patterns were not recalled correctly, probably because the letter patterns 'B', 'C', and 'E' are similar to one another in one way or another.
The results are plotted using the 'imshow' function in MATLAB, so make sure that the Image Processing Toolbox is available in your version of MATLAB; otherwise the results cannot be displayed. The code was written in MATLAB on the Windows platform using version 6.12, and no significant problems were encountered while executing it. The results are evident from the figures that pop up after executing the code; interpretations of those results are given above.
Results:
Some results are shown here for the case of patterns with 5% dilative noise added:
Pattern 'A': recalled
Pattern 'B': not recalled
Pattern 'C': not recalled
Pattern 'E': not recalled
Pattern 'X': recalled
Final Observation:
In the case of the Hopfield associative memory, we observe that recall is successful only when the patterns themselves are not very similar. When the patterns are similar, the memory becomes confused and produces garbage outputs.
Morphological associative memories using matrices 'W' and 'M'
Observations:
In these memories, all the letter patterns (undistorted, and with 5% dilative, erosive, and random noise) were recalled correctly by both the W and M associative memories.
In the 10% noise case we observe that:
1) W is robust in the presence of erosive noise.
2) M is robust in the presence of dilative noise.
Even this is subtle to observe, because recall is perfect for all patterns in both the W and M memories. The one exception is pattern 'E' (10% erosive noise case), where W performs better than M, reinforcing the observations above.
If more noise were added, the facts stated in (1) and (2) would probably be easier to appreciate.
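The morphological memories can be sketched in Python as below. The sketch uses the lattice-algebra definitions $w_{ij} = \bigwedge_\xi (x_i^\xi - x_j^\xi)$ and $m_{ij} = \bigvee_\xi (x_i^\xi - x_j^\xi)$. Recall uses the max-plus product for W and the min-plus product for M. The function names and the toy patterns are hypothetical.

```python
def morph_memories(patterns):
    """W[i][j] = min over patterns of (x_i - x_j); M[i][j] = max over patterns of (x_i - x_j)."""
    n = len(patterns[0])
    W = [[min(x[i] - x[j] for x in patterns) for j in range(n)] for i in range(n)]
    M = [[max(x[i] - x[j] for x in patterns) for j in range(n)] for i in range(n)]
    return W, M


def recall_W(W, x):
    """Max-plus product: output_i = max_j (w_ij + x_j)."""
    return [max(wij + xj for wij, xj in zip(row, x)) for row in W]


def recall_M(M, x):
    """Min-plus product: output_i = min_j (m_ij + x_j)."""
    return [min(mij + xj for mij, xj in zip(row, x)) for row in M]


patterns = [[1, 0, 1, 0, 1], [0, 1, 1, 1, 0], [1, 1, 0, 0, 1]]  # hypothetical binary patterns
W, M = morph_memories(patterns)
print(all(recall_W(W, p) == p for p in patterns), all(recall_M(M, p) == p for p in patterns))  # True True
```

Both memories recall every stored pattern perfectly, for any pattern set. The reason is that $w_{ij} + x_j^\xi \le x_i^\xi$ for all j, with equality at j = i, and dually for M. This agrees with the perfect recalls of undistorted patterns reported above.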
Results:
For patterns with 10% erosive noise added, the following recalls were obtained.
This is pattern 'E' (not clearly visible against the white background).
Final Observation:
In general, morphological memories performed better than Hopfield associative memories.
Additional patterns that were generated and their recalls using the W and M memories:
Results of additional patterns with Hopfield associative memory
References:
1) Madan M. Gupta, Liang Jin, Noriyasu Homma. Static and Dynamic Neural Networks: From Fundamentals to Advanced Theory.
2) Simon S. Haykin. Neural Networks: A Comprehensive Foundation.
3) Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification.
4) Prof. Gerhard Ritter. Class notes for CAP 6615.