Machine Learning, Chapter 4
CSE 574, Spring 2004
Example of Backpropagation
ANN Illustrative Example: Face Recognition
• Many target functions can be learned from the image data:
  • identity of the person
  • direction in which the person is facing: left, right, straight ahead, upward
  • gender of the person
  • whether or not the person is wearing sunglasses
• Specific task considered: learning the direction in which the person is facing (to their left, right, straight ahead, or upward)
ANN Illustrative Example: Face Recognition
• Practical design choices in applying Backpropagation
• The learning task: classifying camera images of the faces of various people in various poses
• Image database:
  • 624 grayscale images: 20 different people, approximately 32 images per person
  • various expressions (happy, sad, angry, neutral)
  • different directions (left, right, straight ahead, up)
  • resolution of 120 × 128 pixels
ANN Illustrative Example: Face Recognition
• Specific task considered: learning the direction in which the person is facing (to their left, right, straight ahead, or upward)
• Even without optimizing the design choices, the design described here learns the target function quite well
• After training on a set of 260 images, classification accuracy over a separate test set is 90%
• In contrast, the default accuracy obtained by randomly guessing one of the four face directions is 25%
Design Choices
1. Input Encoding
  • how to encode the image: raw image vs. extracted features
2. Output Encoding
  • number of output units; target values for the output units
3. Network Graph Structure
  • number of units and their interconnection
4. Other Learning Algorithm Parameters
  • learning rate η
  • momentum α
1. Input Encoding
• Design choices:
  • preprocess the image to extract edges, regions of uniform intensity, or other local image features
    • difficulty: the number of edges varies from image to image, whereas the ANN has a fixed number of input units
  • encode the image as a fixed set of 30 × 32 pixel intensity values (a coarse-resolution summary of the original) ranging from 0 to 255
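The second option, a coarse fixed-size encoding, can be sketched as a mean-pooling step: each 30 × 32 coarse pixel averages a 4 × 4 block of the 120 × 128 original. Scaling intensities from [0, 255] into [0, 1] is an assumption here (a common choice for sigmoid-unit inputs), not something the slides specify.

```python
import numpy as np

def encode_image(img):
    """Coarse 30 x 32 encoding of a 120 x 128 grayscale image.

    Each coarse pixel is the mean intensity of the corresponding
    4 x 4 block; intensities are scaled from [0, 255] to [0, 1]
    (scaling is an assumption, not stated on the slide).
    """
    assert img.shape == (120, 128)
    # Split into 30 x 32 blocks of 4 x 4 pixels and average each block.
    coarse = img.reshape(30, 4, 32, 4).mean(axis=(1, 3))
    return (coarse / 255.0).ravel()  # 960-element input vector

x = encode_image(np.random.randint(0, 256, size=(120, 128)))
print(x.shape)  # (960,)
```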
2. Output Encoding
• Design choices:
  • 1-of-n output encoding: four output units indicating the direction in which the person is looking (left, right, up, straight)
  • single unit: classification using a single output unit, assigning 0.2, 0.4, 0.6, and 0.8 to the four values
• Reasons for choosing the 1-of-n output encoding:
  • it provides more degrees of freedom for representing the target function (n times as many weights available in the output layer)
  • the difference between the highest and second-highest outputs can be used as a measure of confidence
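A minimal sketch of the 1-of-n encoding and the confidence measure described above; the ordering of the four directions in `DIRECTIONS` is an arbitrary illustrative choice, not one fixed by the slides.

```python
import numpy as np

DIRECTIONS = ["left", "right", "straight", "up"]  # illustrative ordering

def one_of_n_target(direction):
    """1-of-n target vector: one output unit per face direction."""
    t = np.zeros(len(DIRECTIONS))
    t[DIRECTIONS.index(direction)] = 1.0
    return t

def classify_with_confidence(outputs):
    """Predicted direction, plus the gap between the highest and
    second-highest outputs as a confidence measure."""
    order = np.argsort(outputs)[::-1]          # indices, highest first
    confidence = outputs[order[0]] - outputs[order[1]]
    return DIRECTIONS[order[0]], confidence

label, conf = classify_with_confidence(np.array([0.1, 0.8, 0.3, 0.2]))
print(label, conf)  # right 0.5
```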
Network Graph Structure
[Figure: 30 × 32 inputs feeding a hidden layer, which feeds four output units (left, strt, rght, up): a 960 × 3 × 4 network]
2. Output Encoding (2)
• Target values for the output units:
  • obvious choice: (1, 0, 0, 0) to encode a face looking to the left, (0, 1, 0, 0) to encode a face looking straight, etc.
  • instead of 0 and 1, use the values 0.1 and 0.9, since sigmoid units cannot produce outputs of exactly 0 and 1 given finite weights
    • gradient descent would force the weights to grow without bound
    • 0.1 and 0.9 are achievable by sigmoid units with finite weights
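A quick numeric check of this point: a sigmoid output of exactly 1 requires an infinite net input, whereas 0.9 is reached at the modest finite value ln(9) ≈ 2.197.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Even a large weighted sum only approaches 1 asymptotically, so a
# target of exactly 1 keeps pushing the weights to grow without bound.
for net in (2.2, 5.0, 20.0):
    print(net, sigmoid(net))

# A target of 0.9, by contrast, is hit exactly at net = ln(9):
print(sigmoid(np.log(9)))  # 0.9

# Resulting target vector for "left" with 0.1/0.9 values:
target = np.where(np.eye(4)[0] == 1, 0.9, 0.1)
print(target)  # [0.9 0.1 0.1 0.1]
```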
Input-to-Hidden Network Weights
[Figure: the weights from the image pixels into each hidden unit, each weight plotted in the position of the corresponding pixel; the weights are sensitive to the pixels in which the face and body appear]
Hidden-to-Output Network Weights
[Figure: the 16 weights corresponding to the hidden-to-output connections for the four output units (left, strt, rght, up), with w0 leftmost in each rectangle; white is high]
3. Network Graph Structure
• Backpropagation can be applied to any acyclic directed graph of sigmoid units
• The standard structure of two layers of sigmoid units was used (one hidden layer and one output layer), since training times grow with the number of layers
• Only 3 hidden units were used, yielding 90% test accuracy
• With 30 hidden units, test-set accuracy increased only 1 to 2 percent
• Training time on a Sparc 5 was 1 hour for 30 hidden units versus only 5 minutes for 3 hidden units
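The forward pass of this 960 × 3 × 4 structure can be sketched in a few lines of NumPy. The zero initialization of the input-to-hidden weights follows the initialization mentioned later in these slides; the small uniform range for the hidden-to-output weights is an assumed illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 960 inputs -> 3 hidden sigmoid units -> 4 output sigmoid units,
# with a bias input appended at each layer.
W_hidden = np.zeros((3, 961))                  # input-to-hidden (zero init)
W_output = rng.uniform(-0.05, 0.05, (4, 4))    # hidden-to-output (assumed range)

def forward(x):
    h = sigmoid(W_hidden @ np.append(x, 1.0))      # 3 hidden activations
    return sigmoid(W_output @ np.append(h, 1.0))   # 4 output activations

y = forward(rng.random(960))
print(y.shape)  # (4,)
```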
4. Other Learning Algorithm Parameters
• Learning rate η was set to 0.3; momentum α was set to 0.3
• Lower values yielded equivalent generalization accuracy but longer training times
• With higher values, training fails to converge to acceptable error over the training set
• Full gradient descent was used (instead of the stochastic approximation)
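The weight update these parameters feed into is the standard momentum rule, Δw(n) = -η ∇E(w) + α Δw(n-1); a minimal sketch with the slide's values η = α = 0.3 (the gradient vector here is purely illustrative):

```python
import numpy as np

def momentum_step(w, grad, velocity, eta=0.3, alpha=0.3):
    """One full-gradient descent step with momentum:
    delta_w(n) = -eta * grad E(w) + alpha * delta_w(n-1)."""
    velocity = -eta * grad + alpha * velocity
    return w + velocity, velocity

w = np.zeros(4)
v = np.zeros(4)                           # previous weight change, starts at 0
grad = np.array([1.0, -2.0, 0.5, 0.0])    # illustrative error gradient
w, v = momentum_step(w, grad, v)
print(w)  # first step is just -eta * grad, since the velocity starts at zero
```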
4. Other Learning Algorithm Parameters (2)
• Input-unit weights were initialized to zero (this yields more intelligible visualizations of the learned weights)
• After every 50 gradient steps, performance was evaluated over the validation set
• The final selected network was the one with the highest accuracy over the validation set
• The final reported accuracy was measured over a third set of test examples
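The validation-based selection above amounts to checkpointing every 50 steps and keeping the best checkpoint; a hypothetical sketch, where `step_fn` and `accuracy_fn` are placeholder names standing in for one gradient step and validation-set accuracy:

```python
def train_with_validation(step_fn, accuracy_fn, weights, n_steps=1000):
    """Evaluate on the validation set every 50 gradient steps and keep
    the weights with the highest validation accuracy. The returned
    network would then be scored once on a separate test set.
    step_fn and accuracy_fn are hypothetical placeholders."""
    best_weights, best_acc = weights, accuracy_fn(weights)
    for step in range(1, n_steps + 1):
        weights = step_fn(weights)
        if step % 50 == 0:
            acc = accuracy_fn(weights)
            if acc > best_acc:
                best_weights, best_acc = weights, acc
    return best_weights
```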
Learned Hidden Representations
• It is useful to examine the learned values of the 2899 weights in the network (961 × 3 input-to-hidden weights plus 4 × 4 hidden-to-output weights, counting the bias weights)
Network Weights after 100 Iterations
[Figure: the weights from the image pixels into each hidden unit, each plotted in the position of the corresponding pixel, together with the 16 hidden-to-output weights for the four output units (left, strt, rght, up), w0 leftmost in each rectangle; white is high. The weights are sensitive to the features in which the face and body appear]
Network Behavior for a “right” Input
[Figure: the 960 × 3 × 4 network with its activations for an image of a face looking right]
• The input-to-hidden weights match the input pattern for the middle hidden unit
• The weight w2 from the middle hidden unit into the “right” output unit is also high
• Therefore the “right” output unit fires
Character Recognition
•
http://yann.lecun.com/exdb/lenet/index.html