# Neural Nets Using Backpropagation

AI and Robotics

Oct 19, 2013

Chris Marriott

Ryan Shirley

CJ Baker

Thomas Tannahill

Agenda

Review of Neural Nets and Backpropagation

Backpropagation: The Math

Gradient descent and other algorithms

Other ways of minimizing error

Review

Approach that developed from an analysis of the
human brain

Nodes created as an analog to neurons

Mainly used for classification problems (e.g., character
recognition, voice recognition, medical applications)

Review

Neurons have weighted inputs, threshold values,
activation function, and an output

[Diagram: a neuron with weighted inputs and a single output]

Activation function = f(Σ(inputs * weights))
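As a minimal sketch (function names are illustrative, not from the slides), a threshold neuron computes a weighted sum and fires when it reaches the threshold:

```python
def neuron_output(inputs, weights, threshold):
    # Weighted sum of inputs, passed through a step activation function
    s = sum(i * w for i, w in zip(inputs, weights))
    return 1 if s >= threshold else 0
```

With unit weights and threshold 1.5, this behaves as a 2-input AND gate, matching the example on the next slide.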

Review

4 Input AND

[Diagram: a 4-input AND built from three 2-input threshold units. Two first-layer units take the four inputs in pairs; their outputs feed a third unit.]

Threshold = 1.5 for each unit

All weights = 1 and all outputs = 1 if active, 0 otherwise
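The cascade of threshold units can be sketched as (helper names are made up for illustration):

```python
def and2(a, b):
    # 2-input threshold unit: both weights 1, threshold 1.5
    return 1 if a + b >= 1.5 else 0

def and4(x1, x2, x3, x4):
    # Two first-layer units feed a third unit, giving a 4-input AND
    return and2(and2(x1, x2), and2(x3, x4))
```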

Review

Output space for AND gate

[Plot: the input points (0,0), (0,1), (1,0), (1,1) in the Input 1 / Input 2 plane; the decision line 1.5 = w1*I1 + w2*I2 separates (1,1) from the other three points.]

Review

Output space for XOR gate

Demonstrates need for hidden layer

[Plot: the input points (0,0), (0,1), (1,0), (1,1) in the Input 1 / Input 2 plane; no single straight line separates the XOR outputs.]
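One way to see the fix: a hidden layer with an OR unit and an AND unit, and an output that fires when OR is on but AND is off. A sketch (the weights and thresholds here are chosen by hand, not taken from the slides):

```python
def unit(inputs, weights, threshold):
    # Generic threshold unit
    return 1 if sum(i * w for i, w in zip(inputs, weights)) >= threshold else 0

def xor(a, b):
    h_or = unit([a, b], [1, 1], 0.5)    # hidden OR unit
    h_and = unit([a, b], [1, 1], 1.5)   # hidden AND unit
    # Output fires when OR is active and AND is not
    return unit([h_or, h_and], [1, -1], 0.5)
```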

Backpropagation: The Math

General multi-layered neural network

[Diagram: a multi-layered network with an input layer (nodes 0-9), a hidden layer (nodes 0, 1, ..., i), and an output layer (nodes 0, 1). Xj,k denote activations and Wi,j the weights into the output layer (W0,0, W1,0, ..., Wi,0).]

Backpropagation: The Math

Backpropagation

Calculation of hidden layer activation values

Backpropagation: The Math

Backpropagation

Calculation of output layer activation values
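The activation formulas on the original slides did not survive extraction; a standard sigmoid forward pass, which is what decks like this typically show, can be sketched as (weights here are illustrative):

```python
import math

def sigmoid(x):
    # Common activation function f for backpropagation networks
    return 1.0 / (1.0 + math.exp(-x))

def layer_activations(inputs, weights):
    # weights[j][i] connects input i to unit j; each unit applies f(weighted sum)
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

# Forward pass through a 2-3-1 network:
hidden = layer_activations([1.0, 0.0], [[0.5, -0.5], [0.3, 0.8], [-0.2, 0.1]])
output = layer_activations(hidden, [[1.0, -1.0, 0.5]])
```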

Backpropagation: The Math

Backpropagation

Calculation of error

dk = f(Dk) - f(Ok)
Backpropagation: The Math

Backpropagation

Output layer weight recalculation

Learning Rate

(e.g., 0.25)

Error at k

Backpropagation: The Math

Backpropagation

Hidden Layer weight recalculation
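For the hidden layer, the error is first propagated back through the output weights, then the same delta rule applies. A sketch for sigmoid units (the slide's own formula did not survive extraction):

```python
def hidden_deltas(hidden_acts, out_weights, out_deltas):
    # delta_j = y_j * (1 - y_j) * sum_k(w[k][j] * delta_k)
    # where y_j * (1 - y_j) is the sigmoid derivative at unit j
    return [y * (1 - y) * sum(out_weights[k][j] * out_deltas[k]
                              for k in range(len(out_deltas)))
            for j, y in enumerate(hidden_acts)]
```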

Gradient Descent

Relatively simple implementation

Standard method and generally works well

Slow and inefficient

Can get stuck in local minima, resulting in sub-optimal
solutions

Local Minima

Local
Minimum

Global Minimum

Simulated Annealing

Can guarantee the optimal solution (global minimum)

May be slower than gradient descent

Much more complicated implementation

Genetic Algorithms/Evolutionary Strategies

Faster than simulated annealing

Less likely to get stuck in local minima

Memory intensive for large nets

Simplex Algorithm

Similar to gradient descent but faster

Easy to implement

Does not guarantee a global minimum

Momentum

Adds a percentage of the last movement to the current
movement

Momentum

Useful to get over small bumps in the error function

Often finds a minimum in fewer steps

Δw(t) = -n*d*y + a*Δw(t-1)

Δw is the change in weight

n is the learning rate

d is the error

y is the input feeding the weight, so it differs depending on which layer we are updating

a is the momentum parameter
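Putting the formula into code (the n and a values below are illustrative):

```python
def momentum_update(weights, grad_terms, prev_deltas, n=0.25, a=0.9):
    # dw(t) = -n * d * y + a * dw(t-1); grad_terms holds the d*y products
    new_deltas = [-n * g + a * p for g, p in zip(grad_terms, prev_deltas)]
    new_weights = [w + dw for w, dw in zip(weights, new_deltas)]
    return new_weights, new_deltas
```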

Adaptive Backpropagation

Assigns each weight its own learning rate

That learning rate is determined by the sign of the gradient of the
error function from the last iteration

If the signs are equal, the slope is likely shallow, so the
learning rate is increased

The signs are more likely to differ on a steep slope, so the
learning rate is decreased

Possible Problems:

Since we minimize the error for each weight separately the
overall error may increase

Solution:

Calculate the total output error after each adaptation; if it is
greater than the previous error, reject that adaptation and
calculate new learning rates
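The per-weight rate adjustment can be sketched as (the growth and shrink factors below are illustrative choices, not from the slides):

```python
def adapt_rates(rates, grads, prev_grads, grow=1.2, shrink=0.5):
    # Same gradient sign -> likely a shallow slope -> increase the rate;
    # opposite signs -> likely a steep slope or overshoot -> decrease it
    new_rates = []
    for r, g, pg in zip(rates, grads, prev_grads):
        if g * pg > 0:
            new_rates.append(r * grow)
        elif g * pg < 0:
            new_rates.append(r * shrink)
        else:
            new_rates.append(r)
    return new_rates
```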

SuperSAB (Super Self-Adapting Backpropagation)
Combines the momentum and adaptive methods.

Uses the adaptive method and momentum so long as the sign of the
gradient does not change
This is an additive effect of both methods, resulting in a faster traversal
of the error surface

When the sign of the gradient does change the momentum will
cancel the drastic drop in learning rate

This allows the search to roll up the other side of the minimum,
possibly escaping local minima

SuperSAB

Experiments show that SuperSAB converges faster than standard
backpropagation
Overall this algorithm is less sensitive to its parameter settings (and
so is less likely to get caught in local minima)
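A single SuperSAB-style weight step, combining the adaptive and momentum ideas, might look like this (constants are illustrative, and this is simplified relative to the published algorithm):

```python
def supersab_step(w, grad, prev_grad, rate, prev_dw,
                  grow=1.05, shrink=0.5, momentum=0.9):
    if grad * prev_grad >= 0:
        rate *= grow      # sign unchanged: adaptive speed-up
    else:
        rate *= shrink    # sign flipped: cut the rate; the momentum
                          # term below softens the drastic drop
    dw = -rate * grad + momentum * prev_dw
    return w + dw, rate, dw
```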

Other Ways To Minimize Error

Varying training data

Cycle through input classes

Randomly select from input classes

Randomly change value of input node (with low probability)

Retrain with expected inputs after initial training

E.g. Speech recognition

Other Ways To Minimize Error

Adding and removing neurons from layers

Adding neurons speeds up learning but may cause loss in
generalization

Removing neurons has the opposite effect

Resources

Artificial Neural Networks, Backpropagation, J.
Henseler

Artificial Intelligence: A Modern Approach, S. Russell
& P. Norvig

501 notes, J.R. Parker

www.dontveter.com/bpr/bpr.html

www.dse.doc.ic.ac.uk/~nd/surprise_96/journal/vl4/cs
11/report.html