Example of Backpropagation

Machine Learning, Chapter 4 (CSE 574, Spring 2004)


ANN Illustrative Example: Face Recognition

- Many target functions can be learned from the image data:
  - Identity of the person
  - Direction in which the person is facing: left, right, straight ahead, or upward
  - Gender of the person
  - Whether or not the person is wearing sunglasses
- Specific task considered: learning the direction in which the person is facing (to their left, right, straight ahead, or upward)


- Practical design choices in applying Backpropagation
- The learning task: classifying camera images of the faces of various people in various poses
- Image database:
  - 624 grayscale images: 20 different people, approximately 32 images per person
  - Various expressions (happy, sad, angry, neutral)
  - Different directions (left, right, straight ahead, up)
  - Resolution of 120 x 128 pixels



- Without optimizing the design choices, the design described here learns the target function quite well
- After training on a set of 260 images, classification accuracy over a separate test set is 90%
- By contrast, the default accuracy obtained by randomly guessing one of the four face directions is 25%


Design Choices

1. Input Encoding
   - How to encode the image: raw image vs extracted features
2. Output Encoding
   - Number of output units; target values for the output units
3. Network Graph Structure
   - Number of units and their interconnection
4. Other Learning Algorithm Parameters
   - Learning rate eta
   - Momentum alpha


1. Input Encoding

Design choices:

- Preprocess the image to extract edges, regions of uniform intensity, or other local image features
  - Difficulty: the number of edges varies from image to image, whereas the ANN has a fixed number of input units
- Instead, encode the image as a fixed set of 30 x 32 pixel intensity values (a coarse-resolution summary of the original), each ranging from 0 to 255
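The coarse encoding above can be sketched as block averaging: each of the 30 x 32 coarse values summarizes a 4 x 4 block of the original 120 x 128 image. This is an illustrative assumption about the summarization step (the slides do not specify it); the function name and the scaling to [0, 1] are also assumptions:

```python
import numpy as np

def coarse_encode(img, out_h=30, out_w=32):
    # Average-pool a 120x128 grayscale image down to 30x32 coarse
    # intensity values (one per 4x4 block), then scale [0, 255] -> [0, 1]
    # so the values are suitable as sigmoid-network inputs.
    h, w = img.shape
    fh, fw = h // out_h, w // out_w
    pooled = img[:out_h * fh, :out_w * fw] \
        .reshape(out_h, fh, out_w, fw) \
        .mean(axis=(1, 3))
    return pooled.ravel() / 255.0

img = np.random.randint(0, 256, size=(120, 128)).astype(float)
x = coarse_encode(img)
print(x.shape)  # (960,)
```

The 960-element vector matches the 960 input units of the network described later.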


2. Output Encoding

Design choices:

- 1-of-n output encoding: four output units, one per direction in which the person is looking (left, right, up, straight)
- Single unit: classification using a single output unit, assigning the values 0.2, 0.4, 0.6, and 0.8 to the four directions

Reasons for choosing the 1-of-n output encoding:

- It provides more degrees of freedom for representing the target function (n times as many weights available in the output layer)
- The difference between the highest and second-highest outputs can be used as a measure of confidence
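The confidence measure mentioned above can be sketched directly; the function name is an illustrative assumption:

```python
import numpy as np

def confidence(outputs):
    # Difference between the highest and second-highest output unit
    # values: a large gap means the network clearly prefers one class.
    top2 = np.sort(outputs)[-2:]
    return top2[1] - top2[0]

print(confidence(np.array([0.1, 0.2, 0.9, 0.3])))
```

With a single-output encoding no such gap exists, which is one practical advantage of 1-of-n.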


Network graph structure

[Figure: a 960 x 3 x 4 network. The 30 x 32 image pixels feed 960 input units, which connect to 3 hidden units and then to 4 output units: left, straight, right, up.]


2. Output Encoding (2)

Target values for the output units:

- Obvious choice: (1, 0, 0, 0) to encode a face looking to the left, (0, 1, 0, 0) to encode a face looking straight ahead, etc.
- Instead of 0 and 1, the values 0.1 and 0.9 are used, since sigmoid units cannot produce outputs of exactly 0 or 1 given finite weights
  - Gradient descent would otherwise force the weights to grow without bound
  - 0.1 and 0.9 are achievable by sigmoid units with finite weights
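The 0.1/0.9 target encoding described above can be sketched as follows; the function and constant names are illustrative assumptions:

```python
import numpy as np

DIRECTIONS = ["left", "straight", "right", "up"]

def target_vector(direction, low=0.1, high=0.9):
    # 1-of-n target encoding using 0.1/0.9 instead of 0/1, since a
    # sigmoid unit can only approach 0 and 1 asymptotically and exact
    # 0/1 targets would drive the weights to grow without bound.
    t = np.full(len(DIRECTIONS), low)
    t[DIRECTIONS.index(direction)] = high
    return t

print(target_vector("left"))  # [0.9 0.1 0.1 0.1]
```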


Input-to-Hidden Network Weights

[Figure: the weights from the image pixels into each hidden unit, with each weight plotted in the position of the corresponding pixel. The weights are sensitive to the pixels in which the face and body appear.]


Hidden-to-Output Network Weights

[Figure: the 16 weights on the hidden-to-output connections for the four output units (left, straight, right, up), with w0 leftmost in each rectangle; white is high.]


3. Network Graph Structure

- Backpropagation can be applied to any acyclic directed graph of sigmoid units
- The standard structure uses two layers of sigmoid units (one hidden layer and one output layer)
- Since training times grow with the number of layers and units:
  - Only 3 hidden units were used, yielding 90% test-set accuracy
  - With 30 hidden units, test-set accuracy increased by only 1 to 2 percent
  - Training time on a Sparc 5 was 1 hour for 30 hidden units and only 5 minutes for 3 hidden units
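A forward pass through the 960 x 3 x 4 sigmoid network described above can be sketched as follows. The small random initialization and variable names are assumptions for illustration; each unit's bias is folded in as a weight w0 on a constant input of 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 960 inputs -> 3 hidden sigmoid units -> 4 output sigmoid units.
W_hidden = rng.normal(scale=0.05, size=(3, 961))  # 960 pixels + bias w0
W_output = rng.normal(scale=0.05, size=(4, 4))    # 3 hidden units + bias w0

def forward(x):
    h = sigmoid(W_hidden @ np.append(1.0, x))  # prepend constant bias input
    o = sigmoid(W_output @ np.append(1.0, h))
    return o  # one sigmoid output per direction: left, straight, right, up

x = rng.random(960)  # coarse pixel intensities scaled to [0, 1]
print(forward(x).shape)  # (4,)
```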


4. Other Learning Algorithm Parameters

- The learning rate eta was set to 0.3 and the momentum alpha to 0.3
- Lower values yielded equivalent generalization accuracy but longer training times
- With higher values, training failed to converge to an acceptable error over the training set
- Full gradient descent was used instead of the stochastic approximation
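The momentum update implied by these parameters can be sketched as a weight step that blends the current gradient with the previous step; the function name is an illustrative assumption:

```python
import numpy as np

def momentum_update(w, grad, velocity, eta=0.3, alpha=0.3):
    # Gradient descent with momentum:
    #   delta_w(n) = -eta * grad + alpha * delta_w(n-1)
    # alpha carries part of the previous step forward, smoothing the descent.
    velocity = -eta * grad + alpha * velocity
    return w + velocity, velocity

w, v = np.array([1.0, -2.0]), np.zeros(2)
g = np.array([0.5, -0.5])
w, v = momentum_update(w, g, v)
print(w)  # [ 0.85 -1.85]
```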



4. Other Learning Algorithm Parameters (2)

- Input unit weights were initialized to zero (which yields more intelligible visualizations of the learned weights)
- After every 50 gradient descent steps, performance was evaluated over a validation set
- The final selected network was the one with the highest accuracy over the validation set
- The final reported accuracy was measured over a third set of test examples
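The validation-based selection described above can be sketched as a loop that snapshots the best-so-far weights every 50 steps. The `gradient_step` and `accuracy` functions here are hypothetical stand-ins, not the actual training code:

```python
import copy
import random

random.seed(0)

def gradient_step(w):
    # Hypothetical stand-in for one full gradient descent update.
    return [wi - 0.01 * random.uniform(-1, 1) for wi in w]

def accuracy(w):
    # Hypothetical stand-in for classification accuracy on the validation set.
    return random.random()

weights = [0.0] * 4
best_acc, best_weights = -1.0, None
for step in range(1, 501):
    weights = gradient_step(weights)
    if step % 50 == 0:                 # evaluate every 50 gradient steps
        acc = accuracy(weights)
        if acc > best_acc:             # keep the best validation snapshot
            best_acc, best_weights = acc, copy.deepcopy(weights)
# best_weights would then be evaluated once on the held-out test set
```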



Learned Hidden Representations

- It is useful to examine the learned values of the 2899 weights in the network (961 x 3 input-to-hidden weights plus 4 x 4 hidden-to-output weights, counting each unit's bias weight w0)


Network Weights after 100 iterations

[Figure: the hidden-to-output weights (16 per diagram, w0 leftmost in each rectangle, white is high) and the weights from the image pixels into each hidden unit, each plotted in the position of the corresponding pixel, after 100 gradient descent iterations. The weights are sensitive to the features in which the face and body appear.]


Network Behavior for "right" input

[Figure: the 960 x 3 x 4 network processing an image of a face looking to the right.]

- The input-to-hidden weights match the input image for the middle hidden unit
- The "right" output unit also has a high weight w2 on that middle hidden unit
- Therefore the "right" output unit fires


Character Recognition


http://yann.lecun.com/exdb/lenet/index.html