# Neural Networks: Week 3

AI and Robotics

Oct 19, 2013


Chapter 1 in text

Chapter 2, Sections 2.1, 2.2

Other References:

“Emergent Neural Computational
Architectures Based on Neuroscience”,
Wermter, Austin and Willshaw (Eds.)

“Computational Explorations in Cognitive
Neuroscience”, O’Reilly and Munakata

Neural Networks

Classification


Classification is a common application for
neural networks; other applications include:

Prediction ( stock markets )

Control ( chemical plants, airplanes )

Before a classifier can be designed, several
options must be decided:

Neural network architecture

Features

Training method

Apples, Oranges and Pears

We have a digital image of a collection of
fruit.

Objective: classify each piece of fruit.

Which features could we use?

Features

Color

Shape: Roundness

The feature measurements form the feature vector, f(i) = { c(i), s(i) }

The objects form the class vector, v(i) = { n(i) }, where n(i) = apple, orange, or pear

Procedure

To simplify the example, consider only apples and
pears.

From a collection of apples and pears,
measure color and roundness for each piece
of fruit.

The measurements will represent 'typical'
fruits.

Plot the data on a 2-dimensional plane ( one
dimension for each measurement )

Feature Space and Decision
Boundary

[Scatter plot: Roundness on the x-axis (NOT ROUND to ROUND), Color on the y-axis (GREEN, YELLOW, RED). Apples (A) cluster in the round, red region; pears (P) cluster in the less-round, yellow-green region. A linear decision boundary separates the two clusters.]

Two Dimensional Feature Space and
Decision Boundary

[Plot: Feature 1 vs. Feature 2, with Class 1 and Class 2 point clusters separated by a linear decision boundary.]

Two Dimensional Feature Space and
Decision Boundary

[Plot: the same Feature 1 vs. Feature 2 space, with an unknown object X plotted on the Class 2 side of the boundary.]
Measure features of unknown object

Unknown object X is classified as a member of Class 2

Three Dimensional Feature Space

Feature vector F = { f1, f2, f3 }

[3-D plot: class X and class Y points scattered in the three-dimensional feature space.]

Three Dimensional Feature Space

Feature vector F = { f1, f2, f3 }

[3-D plot: the same X and Y points; the decision boundary separating them is a plane.]

More Features,

Higher Dimensional Feature Space

Consider an Optical Character Recognition
system.

Each character is divided into a 16x20 matrix
of pixel values.

The matrix is transformed into a vector ( the
gray-level values will actually be the features )

The dimension of the feature space is 320.

Each character will be represented by a point
in a 320-dimensional space ( recall: the apple was a
point in a 2-dimensional space )
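This flattening step can be sketched in a few lines of NumPy. The random bitmap below is a stand-in assumption for a real scanned character:

```python
import numpy as np

# Hypothetical 16x20 character bitmap; random gray-level values
# stand in for a real scanned character.
char_image = np.random.rand(16, 20)

# Flatten the matrix into a feature vector: the character becomes
# a single point in a 320-dimensional feature space.
feature_vector = char_image.flatten()

print(feature_vector.shape)  # (320,)
```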

Optical Character Recognition System

This is a point in a 320 dimensional space

How Do We Find the
Decision Boundary??

Apple / Pear Problem

Is the object an Apple?

Does it or doesn't it belong to the apple
category?

Response:

+1 if it does belong

-1 if it does not belong

Linear Classifier Architecture

[Diagram: input neurons X1 ... Xn with weights w1 ... wn, plus a bias input fixed at 1 with weight b, all feeding a single output neuron Y.]

Input neurons: the number of neurons depends on the input vector length.

Output neuron: yes/no, -1/+1.

Bias neuron: allows classification of an n-tuple vector in one category only.

Activation Function

net is defined as the input to the Y
neuron:

net = b + Σi xi * wi

Activation function:

f(net) = +1 if net >= 0
         -1 if net < 0
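A minimal sketch of this unit in Python (plain lists, no libraries; the function names here are my own):

```python
def f(net):
    """Bipolar step activation: +1 if net >= 0, else -1."""
    return 1 if net >= 0 else -1

def classify(x, w, b):
    """Single linear neuron: net = b + sum_i x_i * w_i, then threshold."""
    net = b + sum(xi * wi for xi, wi in zip(x, w))
    return f(net)
```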

Boundary Decision

Consider two inputs with a bias:

net = b + Σi xi * wi = 0   ( zero is the boundary )

b + x1 w1 + x2 w2 = 0

Solve for x2:  x2 = -(w1/w2) x1 - b/w2

The values of w1, w2 and b are determined during
training.

Example

Assume w1 = 1, w2 = 1, b = -1

From:  x2 = -(w1/w2) x1 - b/w2

x2 = -x1 + 1

Therefore:

when x1 = 0, x2 = 1
when x1 = 1, x2 = 0

[Plot: x1-x2 plane with the boundary line x2 = -x1 + 1; Category +1 lies above the line, Category -1 below.]

net = -1 + x1 + x2

Feature vector: { x1, x2 }

{1, 1}:  f(net) = +1

{0.5, 0}:  f(net) = -1
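The worked example above can be checked numerically (a small sketch; variable names are mine):

```python
# Worked example: w1 = w2 = 1, b = -1,
# so net = -1 + x1 + x2 and the boundary is x2 = -x1 + 1.
w1, w2, b = 1.0, 1.0, -1.0

def f(net):
    return 1 if net >= 0 else -1

for x1, x2 in [(1.0, 1.0), (0.5, 0.0)]:
    net = b + x1 * w1 + x2 * w2
    print((x1, x2), "->", f(net))   # {1,1} -> +1, {.5,0} -> -1

# Boundary points from x2 = -(w1/w2) x1 - b/w2:
print(-(w1 / w2) * 0.0 - b / w2)   # x1 = 0 -> x2 = 1.0
print(-(w1 / w2) * 1.0 - b / w2)   # x1 = 1 -> x2 = 0.0
```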

Consider Bias Term b

What happens if we eliminate the bias
term b?

The net equation reduces to:

net = Σi xi * wi = 0

x1 w1 + x2 w2 = 0

x2 = -(w1/w2) x1

No Bias Term

x2 = -(w1/w2) x1

w1 = 1, w2 = 1:  x2 = -x1

w1 = -1, w2 = 1:  x2 = x1

[Plot: the two boundary lines in the x1-x2 plane; without a bias term, the boundary always passes through the origin.]

Linear Separability

If a set of weights can be obtained from
the training vectors so that the correct
response of +1 lies on one side of the
boundary, and the correct response of -1
lies on the other side, the problem is
"linearly separable".

Recall Apple / Pear

[Scatter plot: Roundness vs. Color again, but with the apples (A) and pears (P) intermixed.]

Cannot separate the pears from the apples with a straight ( linear ) line

[Plot: the four XOR points in the x1-x2 plane.]

Feature vector: { x1, x2 }

{1, 1}:  f(net) = -1

{1, -1}:  f(net) = +1

{-1, 1}:  f(net) = +1

{-1, -1}:  f(net) = -1

Although there are only two classes, they
cannot be separated with only one linear
boundary.

Ex-OR Problem

[Plot: the same four XOR points, now separated by a non-linear boundary.]

Non-linear boundary

Ex-OR Problem

McCulloch-Pitts

Basis for most neurons used today

Activation is binary ( output either 1 or 0 )

Each neuron has a fixed threshold

Positive weights excite the neuron, negative
weights inhibit it

It takes one 'time step' to pass a signal over
a connection

How does this model compare to the biological
neurons we have previously studied?

Biological Neurons

Neurotransmitters can either inhibit or
excite a neuron

Output is a train of pulses: can a train
of pulses be modeled by a 0/1 level?

General McCulloch Pitts Neuron

[Diagram: inputs X1 ... Xn connect to output neuron Y with excitatory weight +w each; inputs Xn+1 ... Xn+m connect with inhibitory weight -p each, where p > 0.]

+w excites, -p inhibits ( p > 0 )

Activation Function

f(y_in) = 1 if y_in >= θ
          0 if y_in < θ

where θ is the threshold

AND

Uses analysis instead of learning to determine
the weights.

x1  x2  |  y
 0   0  |  0
 0   1  |  0
 1   0  |  0
 1   1  |  1

[Diagram: X1 and X2 each connect to Y with weight 1; threshold θ = 2.]

y_in = x1*1 + x2*1 = x1 + x2

y = f(y_in) = 1 if y_in >= θ;  here θ = 2
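The AND analysis above can be sketched as a tiny McCulloch-Pitts neuron (the helper name `mp_neuron` is mine):

```python
def mp_neuron(inputs, weights, theta):
    """McCulloch-Pitts neuron: binary output, fixed threshold theta."""
    y_in = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_in >= theta else 0

# AND: both weights 1, threshold 2, as derived above.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron((x1, x2), (1, 1), 2))
```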

OR

Uses analysis instead of learning to determine
the weights.

x1  x2  |  y
 0   0  |  0
 0   1  |  1
 1   0  |  1
 1   1  |  1

[Diagram: X1 and X2 each connect to Y with weight 2; threshold θ = 2.]

y_in = x1*2 + x2*2

y = f(y_in) = 1 if y_in >= θ;  in this example θ = 2

AND-NOT

A non-symmetric function.

x1  x2  |  y
 0   0  |  0
 0   1  |  0
 1   0  |  1
 1   1  |  0

[Diagram: X1 connects to Y with weight 2, X2 with weight -1; threshold θ = 2.]

y_in = x1*2 - x2

y = f(y_in) = 1 if y_in >= θ;  in this example θ = 2

XOR Function

x1 xor x2 = ( x1 AND NOT x2 ) OR ( x2 AND NOT x1)

= Z1 OR Z2

[Diagram: X1 connects to Z1 with weight 2 and to Z2 with weight -1; X2 connects to Z1 with weight -1 and to Z2 with weight 2; Z1 and Z2 each connect to Y with weight 2. All thresholds = 2.]
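This two-layer construction can be composed directly from McCulloch-Pitts neurons (a sketch; function names are mine, weights and thresholds from the diagram above):

```python
def mp_neuron(inputs, weights, theta=2):
    """McCulloch-Pitts neuron; all thresholds are 2 in this network."""
    y_in = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_in >= theta else 0

def xor(x1, x2):
    z1 = mp_neuron((x1, x2), (2, -1))    # z1 = x1 AND NOT x2
    z2 = mp_neuron((x1, x2), (-1, 2))    # z2 = x2 AND NOT x1
    return mp_neuron((z1, z2), (2, 2))   # y = z1 OR z2
```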

Training Algorithms for Single Layer
Neuron Networks

Hebb's rule ( the most fundamental )

Perceptron Learning

Delta Rule

HEBB Net

Learning occurs by modifying the
weights so that the weight between two
neurons that are both 'on' is increased.

Modified Hebb learning increases the
strength of the weight when both
neurons are either on or off. This is
more powerful than the original Hebb rule.

Hebb Learning

[Diagram: inputs X1 and X2 with weights w1 and w2, a bias input fixed at 1 with weight b, and output neuron Y.]

Bipolar data: +1 or -1

We need training data for learning ( s:t )

Training vector s

Target vector t

Hebb Learning Algorithm

Initialize the weights to 0: wi = 0

For each training vector and target pair,
si : ti ( i = 1, n ):

Set the activations of the input neurons: xi = si

Set the activation of the output neuron: y = ti

Adjust the weights: wi(new) = wi(old) + xi * y

Adjust the bias: b(new) = b(old) + y

Use only one pass through the training data.
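The steps above can be sketched as a short function (the name `hebb_train` and the (s, t) pair list format are my own):

```python
def hebb_train(pairs, n_inputs):
    """One pass of Hebb learning over bipolar (s, t) training pairs."""
    w = [0.0] * n_inputs   # weights initialized to 0
    b = 0.0                # bias initialized to 0
    for s, t in pairs:     # a single pass through the data
        for i in range(n_inputs):
            w[i] += s[i] * t    # wi(new) = wi(old) + xi * y
        b += t                  # b(new) = b(old) + y
    return w, b
```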

Hebb Learning Example ( AND Logic Gate ):

Input x1  Input x2  Bias b  Target y
    1         1        1        1
    1        -1        1       -1
   -1         1        1       -1
   -1        -1        1       -1
Initialize weights to zero; calculate the change in weights and bias.

Recall:

wi(new) = wi(old) + xi * y
b(new) = b(old) + y

So, define:

Δw1 = x1 * y
Δw2 = x2 * y
Δb = y

x1  x2  b  y  |  Δw1  Δw2  Δb  |  new w1  w2  b
 1   1  1  1  |   1    1    1  |      1   1  1

Since the initial weights = 0,

wi(new) = wi(old) + xi * y
wi(new) = xi * y

Current Decision Boundary

y = b + Σi xi * wi = 0   ( recall zero is the boundary )

0 = b + x1 w1 + x2 w2

Solve for x2:  x2 = -(w1/w2) x1 - b/w2

With the current weights ( w1 = 1, w2 = 1, b = 1 ):  x2 = -x1 - 1

when x2 = 0, x1 = -1
when x2 = -1, x1 = 0

Current Decision Boundary

[Plot: the four training points ( + at (1,1); - at (1,-1), (-1,1), (-1,-1) ) with the current boundary line x2 = -x1 - 1.]

Using:

wi(new) = wi(old) + xi * y
b(new) = b(old) + y

And:

Δw1 = x1 * y
Δw2 = x2 * y
Δb = y

x1  x2  b   y  |  Δw1  Δw2  Δb  |  new w1  w2  b
 1  -1  1  -1  |  -1    1   -1  |      0   2  0

Since the previous weights are no longer 0,

wi(new) = wi(old) + xi * y

Next Data Set

Current Decision Boundary

[Plot: the training points with the current boundary line x2 = 0.]

x2 = -(w1/w2) x1 - b/w2

With the current weights ( w1 = 0, w2 = 2, b = 0 ):  x2 = 0

Using:

wi(new) = wi(old) + xi * y
b(new) = b(old) + y

And:

Δw1 = x1 * y
Δw2 = x2 * y
Δb = y

x1  x2  b   y  |  Δw1  Δw2  Δb  |  new w1  w2  b
-1   1  1  -1  |   1   -1   -1  |      1   1 -1

Since the previous weights are no longer 0,

wi(new) = wi(old) + xi * y

Next Data Set

Current Decision Boundary

[Plot: the training points with the current boundary line x2 = -x1 + 1.]

x2 = -(w1/w2) x1 - b/w2

With the current weights ( w1 = 1, w2 = 1, b = -1 ):  x2 = -x1 + 1

The boundary is now in the correct position, but there is one more data set to process.

Using:

wi(new) = wi(old) + xi * y
b(new) = b(old) + y

And:

Δw1 = x1 * y
Δw2 = x2 * y
Δb = y

x1  x2  b   y  |  Δw1  Δw2  Δb  |  new w1  w2  b
-1  -1  1  -1  |   1    1   -1  |      2   2 -2

Since the previous weights are no longer 0,

wi(new) = wi(old) + xi * y

Last Data Set

Final Decision Boundary

[Plot: the training points with the final boundary line x2 = -x1 + 1.]

x2 = -(w1/w2) x1 - b/w2

With the final weights ( w1 = 2, w2 = 2, b = -2 ):  x2 = -x1 + 1
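The four training steps above can be reproduced in a few lines (a sketch; the printed trace follows the same order as the tables):

```python
# Bipolar AND training data: ((x1, x2), target)
data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

w1 = w2 = b = 0
for (x1, x2), y in data:
    w1 += x1 * y   # Hebb update for each weight
    w2 += x2 * y
    b += y
    print(f"after ({x1:2d},{x2:2d}): w1={w1}, w2={w2}, b={b}")

# Final weights w1=2, w2=2, b=-2 give the boundary x2 = -x1 + 1.
```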

Observations for Hebb’s Learning

Weights only change for active input
neurons, xi ≠ 0

Hebb learning will not always find the
correct weights even if they exist

Perceptron Learning Algorithm

Developed by Frank Rosenblatt

Will always converge to correct weights
if they exist

Incorporates the concept of a learning
rate α: you can control how fast the
neuron learns.

Perceptron Learning Algorithm

Initialize the weights and bias to 0

Set the learning rate α ( 0 < α <= 1 )

Continue the process as long as any weight changes:

For each training pair, set xi = si

Compute the response of the output neuron:

y_in = b + Σ xi wi

Output of the neuron with input y_in:

y =  1 if y_in > θ
     0 if -θ <= y_in <= θ
    -1 if y_in < -θ

continued

Perceptron Learning Algorithm

If y ≠ t ( there is an error ), update the weights:

wi(new) = wi(old) + α t xi

b(new) = b(old) + α t

( note: if xi = 0, the weight does not change )

ELSE:

wi(new) = wi(old), b(new) = b(old)

Perceptron Learning Algorithm

As long as the weights or bias change
at least once while processing the
complete set of data, continue repeating
the algorithm.

STOP when no weight or bias changes
during a complete pass through the data.
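The full algorithm can be sketched as follows (assuming bipolar targets in {-1, +1}; the function name and the `max_epochs` safety cap are my own additions):

```python
def perceptron_train(samples, lr=1.0, theta=0.0, max_epochs=100):
    """Perceptron learning for a single output neuron.

    samples: list of (input tuple, target) pairs, targets in {-1, +1}.
    lr is the learning rate alpha; theta is the activation threshold.
    """
    n = len(samples[0][0])
    w = [0.0] * n   # weights initialized to 0
    b = 0.0         # bias initialized to 0

    def activate(y_in):
        if y_in > theta:
            return 1
        if y_in < -theta:
            return -1
        return 0

    for _ in range(max_epochs):
        changed = False
        for x, t in samples:
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))
            if activate(y_in) != t:          # error: update weights and bias
                for i in range(n):
                    w[i] += lr * t * x[i]
                b += lr * t
                changed = True
        if not changed:                      # no change in a full pass: STOP
            break
    return w, b
```

On the bipolar AND data this converges after the weights stop changing for a complete pass, as the stopping rule above requires.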