Reading Assignment
• Chapter 1 in text
• Chapter 2, Sections 2.1, 2.2
• Other References:
  – “Emergent Neural Computational Architectures Based on Neuroscience”, Wermter, Austin and Willshaw (Eds.)
  – “Computational Explorations in Cognitive Neuroscience”, O’Reilly and Munakata
Neural Networks: Classification
Classification
• Classification is a common application for neural networks; other applications include:
  – Prediction ( stock market )
  – Control ( chemical plants, airplanes )
• Before a classifier can be designed, several options must be decided:
  – Neural network architecture
  – Features
  – Training method
Apples, Oranges and Pears
• We have a digital image of a collection of fruit.
• Objective: classify each piece of fruit.
• Which features could we use?
Features
• Color
• Shape: roundness
• The feature measurements form the feature vector, f(i) = { c(i), s(i) }
• The objects form the class vector, v(i) = { n(i) }, where n(i) = apple, orange or pear
Procedure
• To simplify the example, consider only apples and pears.
• From a collection of apples and pears, measure color and roundness for each piece of fruit.
• The measurements will represent ‘typical’ fruits.
• Plot the data on a two-dimensional plane ( one dimension for each measurement ).
Feature Space and Decision Boundary
[Figure: scatter plot of apples ( A ) and pears ( P ) in the feature plane; vertical axis Color ( GREEN, YELLOW, RED ), horizontal axis Roundness ( NOT ROUND, ROUND ), with a decision boundary separating the two clusters.]
Two Dimensional Feature Space and Decision Boundary
[Figure: two clusters, CLASS 1 and CLASS 2, plotted against Feature 1 and Feature 2, separated by a decision boundary.]
Two Dimensional Feature Space and Decision Boundary
[Figure: the same feature space with an unknown object X plotted on the Class 2 side of the boundary.]
Measure the features of the unknown object.
Unknown object X is classified as a member of Class 2.
Three Dimensional Feature Space
Feature vector F = { f1, f2, f3 }
[Figure: points of class X and class Y scattered in a three-dimensional feature space.]
Three Dimensional Feature Space
Feature vector F = { f1, f2, f3 }
[Figure: the same points, now separated by a decision boundary, which is a plane.]
More Features, Higher Dimensional Feature Space
• Consider an Optical Character Recognition system.
• Each character is divided into a 16x20 matrix of pixel values.
• The matrix is transformed into a vector ( the gray-level values will actually be the features ).
• The dimension of the feature space is 320.
• Each character will be represented by a point in a 320-dimensional space ( recall: an apple was a point in a 2-dimensional space ).
Optical Character Recognition System
[Figure: a character bitmap unrolled row by row into a single long vector.]
This is a point in a 320-dimensional space.
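As a quick sketch of this matrix-to-vector step ( Python with NumPy; the bitmap below is a random placeholder, not real character data ):

```python
import numpy as np

# Placeholder 16x20 character bitmap; entries stand in for gray-level features.
char_matrix = np.random.rand(16, 20)

# Unroll the matrix row by row into a single feature vector.
feature_vector = char_matrix.flatten()

print(feature_vector.shape)  # (320,) -- one point in a 320-dimensional feature space
```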
How Do We Find the Decision Boundary?
Apple / Pear Problem
• Is the object an apple?
  – Does it or doesn’t it belong to the apple category?
  – Response:
    • +1 if it does belong
    • -1 if it does not belong
Linear Classifier Architecture
[Figure: input neurons X1, X2, …, Xn connected to a single output neuron Y through weights w1, w2, …, wn, plus a bias neuron ( constant input 1 ) with weight b.]
• Input neurons: the number of neurons depends on the input vector length.
• Output neuron: yes/no, -1/+1.
• Bias neuron: allows classification of an n-tuple vector in one category only.
Activation Function
• net is defined as the input to the Y neuron:
  net = b + Σi xi * wi
• Activation function:
  f(net) = +1 if net >= 0
           -1 if net < 0
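A minimal sketch of this net input and bipolar activation in Python ( the sample inputs, weights and bias below are illustrative, not from the slides ):

```python
import numpy as np

def net_input(x, w, b):
    """net = b + sum_i x_i * w_i"""
    return b + np.dot(x, w)

def f(net):
    """Bipolar step activation: +1 if net >= 0, else -1."""
    return 1 if net >= 0 else -1

# Illustrative values only.
x = np.array([0.5, -0.2])
w = np.array([1.0, 1.0])
b = -1.0
print(f(net_input(x, w, b)))  # -1, since net = -1 + 0.5 - 0.2 = -0.7 < 0
```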
Boundary Decision
• Consider two inputs with bias:
  net = b + Σi xi * wi = 0   ( zero is the boundary )
  b + x1 w1 + x2 w2 = 0
  Solve for x2:  x2 = -(w1/w2) x1 - b/w2
• The values of w1, w2 and b are determined during training.
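A small helper that evaluates this boundary line ( using the weights from the example on the next slide ):

```python
def boundary_x2(x1, w1, w2, b):
    """Decision boundary solved for x2: x2 = -(w1/w2)*x1 - b/w2 (assumes w2 != 0)."""
    return -(w1 / w2) * x1 - b / w2

# With w1 = 1, w2 = 1, b = -1 (the values used in the example that follows):
print(boundary_x2(0.0, 1, 1, -1))  # 1.0 -> the point (0, 1) lies on the boundary
print(boundary_x2(1.0, 1, 1, -1))  # 0.0 -> the point (1, 0) lies on the boundary
```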
Example
• Assume w1 = 1, w2 = 1, b = -1
• From x2 = -(w1/w2) x1 - b/w2:
  x2 = -x1 + 1
• Therefore:
  when x1 = 0, x2 = 1
  when x1 = 1, x2 = 0
[Figure: the boundary line x2 = -x1 + 1 in the ( x1, x2 ) plane; points above the line belong to category +1, points below to category -1.]
net = -1 + x1 + x2
Feature vector: { x1, x2 }
  {1,1}: f(net) = +1
  {.5,0}: f(net) = -1
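These two classifications can be checked directly; a minimal sketch with the slide's weights w1 = 1, w2 = 1, b = -1:

```python
def f(net):
    return 1 if net >= 0 else -1

def classify(x1, x2, w1=1.0, w2=1.0, b=-1.0):
    """net = -1 + x1 + x2 with the slide's weights."""
    return f(b + x1 * w1 + x2 * w2)

print(classify(1.0, 1.0))  # +1: {1,1} falls on the positive side of x2 = -x1 + 1
print(classify(0.5, 0.0))  # -1: {.5,0} falls on the negative side
```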
Consider Bias Term b
• What happens if we eliminate the bias term b?
• The net equation reduces to:
  net = Σi xi wi = 0
  x1 w1 + x2 w2 = 0
  x2 = -(w1/w2) x1
No Bias Term
x2 = -(w1/w2) x1
  w1 = 1, w2 = 1:  x2 = -x1
  w1 = -1, w2 = 1:  x2 = x1
[Figure: both boundary lines pass through the origin of the ( x1, x2 ) plane; without a bias term the boundary cannot be shifted away from the origin.]
Linear Separability
• If a set of weights can be obtained from the training vectors so that the correct response of +1 lies on one side of the boundary, and the correct response of -1 lies on the other side, the problem is “linearly separable”.
Recall Apple / Pear
[Figure: the Color ( GREEN, YELLOW, RED ) vs. Roundness ( NOT ROUND, ROUND ) feature plane, with apples ( A ) and pears ( P ) now intermixed.]
Cannot separate pears from apples with a straight ( linear ) line.
Ex-OR Problem
Feature vector: { x1, x2 }
  {1,1}: f(net) = -1
  {1,-1}: f(net) = +1
  {-1,1}: f(net) = +1
  {-1,-1}: f(net) = -1
[Figure: the four points at the corners of the ( x1, x2 ) plane, with the +1 points on one diagonal and the -1 points on the other.]
Although there are only two classes, we cannot separate them with only one linear boundary.
Ex-OR Problem
Feature vector: { x1, x2 }
  {1,1}: f(net) = -1
  {1,-1}: f(net) = +1
  {-1,1}: f(net) = +1
  {-1,-1}: f(net) = -1
[Figure: the same four points, separated by a non-linear ( curved ) boundary.]
McCulloch-Pitts
Basis for most neurons used today:
• Activation is binary ( output either 1 or 0 ).
• Each neuron has a fixed threshold.
• Positive weights excite the neuron; negative weights inhibit it.
• It takes one ‘time step’ to pass a signal over one connection link.
• How does this model compare to the biological neurons we have previously studied?
Biological Neurons
• Neurotransmitters can either inhibit or excite a neuron.
• Output is a train of pulses – can a train of pulses be modeled by a 0/1 level?
General McCulloch-Pitts Neuron
[Figure: output neuron Y with n excitatory inputs X1 … Xn, each with weight +w, and m inhibitory inputs Xn+1 … Xn+m, each with weight -p, where p > 0. Weight +w excites; weight -p inhibits.]
Activation Function
f(y_in) = 1 if y_in >= θ
        = 0 if y_in < θ
where θ is the threshold.
AND – uses analysis instead of learning to determine the weights
x1   x2   y
 0    0   0
 0    1   0
 1    0   0
 1    1   1
[Figure: neuron Y with inputs X1 and X2, both weights = 1, threshold θ = 2.]
y_in = x1*1 + x2*1 = x1 + x2
y = f(y_in) = 1 if y_in >= θ, with θ = 2
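A sketch of a McCulloch-Pitts neuron in Python; the same function can be reused to verify the OR and AND-NOT gates on the following slides by swapping in their weights:

```python
def mp_neuron(inputs, weights, theta):
    """McCulloch-Pitts neuron: output 1 if the weighted sum reaches threshold theta."""
    y_in = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_in >= theta else 0

# AND gate: weights (1, 1), threshold 2 -- only input (1, 1) reaches y_in = 2.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron((x1, x2), (1, 1), theta=2))
```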
OR – uses analysis instead of learning to determine the weights
x1   x2   y
 0    0   0
 0    1   1
 1    0   1
 1    1   1
[Figure: neuron Y with inputs X1 and X2, both weights = 2, threshold θ = 2.]
y_in = x1*2 + x2*2
y = f(y_in) = 1 if y_in >= θ, in this example θ = 2
AND-NOT – a non-symmetric function
x1   x2   y
 0    0   0
 0    1   0
 1    0   1
 1    1   0
[Figure: neuron Y with input X1 ( weight 2 ) and input X2 ( weight -1 ), threshold θ = 2.]
y_in = x1*2 - x2*1
y = f(y_in) = 1 if y_in >= θ, in this example θ = 2
XOR Function
x1 xor x2 = ( x1 AND NOT x2 ) OR ( x2 AND NOT x1 )
          = Z1 OR Z2
[Figure: two-layer network. Z1 receives X1 with weight 2 and X2 with weight -1; Z2 receives X2 with weight 2 and X1 with weight -1; Z1 and Z2 each feed Y with weight 2. All thresholds = 2.]
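A sketch of this two-layer network, composing the AND-NOT and OR neurons from the previous slides ( all thresholds fixed at 2 ):

```python
def mp_neuron(inputs, weights, theta=2):
    """McCulloch-Pitts neuron with threshold 2, as in the XOR network."""
    y_in = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_in >= theta else 0

def xor(x1, x2):
    z1 = mp_neuron((x1, x2), (2, -1))   # z1 = x1 AND NOT x2
    z2 = mp_neuron((x1, x2), (-1, 2))   # z2 = x2 AND NOT x1
    return mp_neuron((z1, z2), (2, 2))  # y = z1 OR z2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor(x1, x2))  # outputs 0, 1, 1, 0
```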
Training Algorithms for Single-Layer Neural Networks
• Hebb rule – most fundamental
• Perceptron learning
• Delta rule
HEBB Net
• Learning occurs by modifying the weights so that the weight between two neurons that are both ‘on’ is increased.
• Modified Hebb learning increases the strength of the weight when both neurons are either on or off. This is more powerful than the original Hebb rule.
Hebb Learning
[Figure: inputs X1 and X2 with weights w1 and w2, plus a bias input of 1 with weight b, feeding output neuron Y with output y.]
• Bipolar data: +1 or -1
• We need training data for learning ( s : t ):
  – Training vector s
  – Target vector t
Hebb Learning Algorithm
• Initialize weights to 0: wi = 0
• For each training vector and target pair, si : ti ( i = 1, n ):
  – Set activations for the input neurons: xi = si
  – Set the activation for the output neuron: y = ti
  – Adjust the weights: wi(new) = wi(old) + xi y
  – Adjust the bias: b(new) = b(old) + y
• Use only one pass through the training data.
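A minimal sketch of this single-pass algorithm in Python ( NumPy assumed ), run on the bipolar AND data used in the worked example that follows:

```python
import numpy as np

def hebb_train(samples, targets):
    """One pass of Hebb learning: w_i(new) = w_i(old) + x_i*y, b(new) = b(old) + y."""
    w = np.zeros(samples.shape[1])
    b = 0.0
    for x, t in zip(samples, targets):
        w += x * t  # delta w_i = x_i * y
        b += t      # delta b = y
    return w, b

# Bipolar AND training data (s : t).
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
T = np.array([1, -1, -1, -1])
print(hebb_train(X, T))  # (array([2., 2.]), -2.0), matching the worked example
```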
Hebb Learning Example ( AND Logic Gate ):
Input x1   Input x2   Bias b   Target y
    1          1         1         1
    1         -1         1        -1
   -1          1         1        -1
   -1         -1         1        -1
Initialize weights to zero, then calculate the change in weights and bias.
Recall: wi(new) = wi(old) + xi y,  b(new) = b(old) + y
So, define: Δw1 = x1 y,  Δw2 = x2 y,  Δb = y

First data set:
x1   x2   b    y  |  Δw1  Δw2  Δb  |  w1   w2   b  (NEW)
 1    1   1    1  |   1    1    1  |   1    1   1

Since the initial weights = 0, wi(new) = wi(old) + xi y reduces to wi(new) = xi y.
Current Decision Boundary
net = b + Σi xi * wi = 0   ( recall: zero is the boundary )
0 = b + x1 w1 + x2 w2
Solve for x2:  x2 = -(w1/w2) x1 - b/w2
With the current weights ( w1 = 1, w2 = 1, b = 1 ):
  x2 = -x1 - 1
  when x2 = 0, x1 = -1;  when x2 = -1, x1 = 0
[Figure: the boundary line x2 = -x1 - 1 plotted with the four training points ( one +, three - ).]
Using: wi(new) = wi(old) + xi y,  b(new) = b(old) + y
And: Δw1 = x1 y,  Δw2 = x2 y,  Δb = y

Next data set ( since the previous weights are no longer 0, wi(new) = wi(old) + xi y ):
x1   x2   b    y  |  Δw1  Δw2  Δb  |  w1   w2   b  (NEW)
 1   -1   1   -1  |  -1    1   -1  |   0    2   0
Current Decision Boundary
x2 = -(w1/w2) x1 - b/w2
With the current weights ( w1 = 0, w2 = 2, b = 0 ):
  x2 = 0
[Figure: the horizontal boundary line x2 = 0 plotted with the four training points ( one +, three - ).]
Using: wi(new) = wi(old) + xi y,  b(new) = b(old) + y
And: Δw1 = x1 y,  Δw2 = x2 y,  Δb = y

Next data set ( the previous weights are no longer 0 ):
x1   x2   b    y  |  Δw1  Δw2  Δb  |  w1   w2   b  (NEW)
-1    1   1   -1  |   1   -1   -1  |   1    1  -1
Current Decision Boundary
x2 = -(w1/w2) x1 - b/w2
With the current weights ( w1 = 1, w2 = 1, b = -1 ):
  x2 = -x1 + 1
[Figure: the line x2 = -x1 + 1 with the + point on one side and the three - points on the other.]
The boundary is now in the correct position, but there is one more data set to process.
Using: wi(new) = wi(old) + xi y,  b(new) = b(old) + y
And: Δw1 = x1 y,  Δw2 = x2 y,  Δb = y

Last data set ( the previous weights are no longer 0 ):
x1   x2   b    y  |  Δw1  Δw2  Δb  |  w1   w2   b  (NEW)
-1   -1   1   -1  |   1    1   -1  |   2    2  -2
Final Decision Boundary
x2 = -(w1/w2) x1 - b/w2
With the final weights ( w1 = 2, w2 = 2, b = -2 ):
  x2 = -x1 + 1
[Figure: the final boundary x2 = -x1 + 1 separating the + point from the three - points.]
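The final weights can be checked against all four training pairs; a short verification sketch:

```python
def f(net):
    return 1 if net >= 0 else -1

w1, w2, b = 2, 2, -2  # final weights from the Hebb example
for x1, x2, t in [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, -1)]:
    y = f(b + x1 * w1 + x2 * w2)
    print((x1, x2), y == t)  # all True: the boundary x2 = -x1 + 1 separates the classes
```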
Observations for Hebb Learning
• Weights only change for active input neurons, xi ≠ 0.
• Hebb learning will not always find the correct weights, even if they exist.
Perceptron Learning Algorithm
Developed by Frank Rosenblatt.
• Will always converge to the correct weights if they exist.
• Incorporates the concept of a learning rate – you can control how fast the neuron learns.
Perceptron Learning Algorithm
• Initialize weights and bias to 0.
• Set the learning rate α ( 0 < α <= 1 ).
• Continue the process as long as the weights change:
  – For each training pair, set xi = si
  – Compute the response of the output neuron with input y_in:
    y_in = b + Σ xi wi
    y =  1 if y_in > θ
         0 if -θ <= y_in <= θ
        -1 if y_in < -θ
• ( continued )
Perceptron Learning Algorithm
• If y ≠ t ( there is an error ), update the weights and bias:
  wi(new) = wi(old) + α t xi
  b(new) = b(old) + α t
  ( note: if xi = 0, that weight does not change )
• ELSE:
  wi(new) = wi(old),  b(new) = b(old)
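A sketch of the full perceptron loop ( NumPy assumed; alpha = 1 and theta = 0 are illustrative choices ), again on the bipolar AND data:

```python
import numpy as np

def perceptron_train(samples, targets, alpha=1.0, theta=0.0, max_epochs=100):
    """Perceptron learning: update w and b only when the response y differs from t."""
    w = np.zeros(samples.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        changed = False
        for x, t in zip(samples, targets):
            y_in = b + np.dot(x, w)
            y = 1 if y_in > theta else (-1 if y_in < -theta else 0)
            if y != t:               # error: apply the update rule
                w += alpha * t * x
                b += alpha * t
                changed = True
        if not changed:              # stop: a full pass with no weight change
            break
    return w, b

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
T = np.array([1, -1, -1, -1])
print(perceptron_train(X, T))  # converges to a separating boundary, e.g. w = (1, 1), b = -1
```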
Perceptron Learning Algorithm
• As long as a weight or the bias changes at least once while processing the complete set of data, continue repeating the algorithm.
• STOP when no weight or bias changes during a complete pass through the data.