Classification III
Lecturer: Dr. Bo Yuan
Email: yuanb@sz.tsinghua.edu.cn
Overview
Artificial Neural Networks
Biological Motivation
10^11: the number of neurons in the human brain
10^4: the average number of connections of each neuron
10^-3 s: the fastest switching time of neurons
10^-10 s: the switching speed of computers
10^-1 s: the time required to visually recognize your mother
Biological Motivation
The power of parallelism
The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons.
The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations.

Sequential machines vs. parallel machines
Group A: using ANN to study and model biological learning processes.
Group B: obtaining highly effective machine learning algorithms, regardless of how closely these algorithms mimic biological processes.
Neural Network Representations
When does ANN work?
Instances are represented by attribute-value pairs. Input values can be any real values.
The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
The training samples may contain errors.
Long training times are acceptable: training can range from a few seconds to several hours.
Fast evaluation of the learned target function may be required.
The ability of humans to understand the learned target function is not important: weights are difficult for humans to interpret.
Perceptrons
[Diagram: a perceptron. Inputs x1, x2, …, xn with weights w1, w2, …, wn, plus a bias weight w0 on a fixed input x0 = 1, feed a summation unit ∑ followed by a hard threshold.]
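In symbols, the unit thresholds a weighted sum of its inputs; the standard formulation (Mitchell, Chapter 4, listed in the reading materials) is:

$$o(x_1,\dots,x_n)=\begin{cases}1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0\\ -1 & \text{otherwise}\end{cases}\qquad (x_0 = 1)$$

Sign conventions vary; the NAND example later in this deck uses outputs in {0, 1} with a 0.5 threshold instead.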
Power of Perceptrons
AND (w0 = -0.8, w1 = w2 = 0.5)

  Input  |  ∑   | Output
  0  0   | -0.8 |   0
  0  1   | -0.3 |   0
  1  0   | -0.3 |   0
  1  1   |  0.2 |   1

OR (w0 = -0.3, w1 = w2 = 0.5)

  Input  |  ∑   | Output
  0  0   | -0.3 |   0
  0  1   |  0.2 |   1
  1  0   |  0.2 |   1
  1  1   |  0.7 |   1
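The two tables can be verified in a few lines of code. A minimal sketch, assuming 0/1 outputs with the threshold folded into the bias weight w0:

```python
def perceptron(w, x):
    """Threshold unit: output 1 if w0 + w1*x1 + w2*x2 > 0, else 0."""
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s > 0 else 0

AND_W = (-0.8, 0.5, 0.5)   # weights from the slide
OR_W  = (-0.3, 0.5, 0.5)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(AND_W, x), perceptron(OR_W, x))
# (0, 0) 0 0 / (0, 1) 0 1 / (1, 0) 0 1 / (1, 1) 1 1
```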
Error Surface
[Figure: the error surface plotted over the weight space (w1, w2), with error on the vertical axis.]
Gradient Descent
Learning Rate
Batch Learning
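In the standard form (Mitchell, Chapter 4), a linear unit's training error over dataset D and the downhill (batch) update are:

$$E(\vec w)=\frac{1}{2}\sum_{d\in D}(t_d-o_d)^2,\qquad \Delta w_i=-\eta\,\frac{\partial E}{\partial w_i}$$

where η is the learning rate; batch learning computes the gradient over the whole training set before each weight update.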
Delta Rule
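Differentiating E for a linear unit o = w · x yields the delta rule, which the pseudocode on the next slide implements:

$$\Delta w_i = \eta \sum_{d\in D}(t_d - o_d)\,x_{id}$$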
Perceptron Learning
GRADIENT-DESCENT(training_examples, η)
Initialize each w_i to some small random value.
Until the termination condition is met, Do
  Initialize each Δw_i to zero.
  For each <x, t> in training_examples, Do
    – Input the instance x to the unit and compute the output o.
    – For each linear unit weight w_i, Do
      » Δw_i ← Δw_i + η(t − o)x_i
  For each linear unit weight w_i, Do
    – w_i ← w_i + Δw_i
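A runnable sketch of GRADIENT-DESCENT for a linear unit; the fixed epoch count is an assumption, since the pseudocode leaves the termination condition open:

```python
import random

def gradient_descent(training_examples, eta=0.05, epochs=1000):
    """Batch gradient descent for a linear unit o = w . x.

    training_examples: list of (x, t) pairs, where x includes x0 = 1.
    """
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]  # small random init
    for _ in range(epochs):                              # termination condition (assumed)
        delta = [0.0] * n                                # initialize each dw_i to zero
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))     # linear unit output
            for i in range(n):
                delta[i] += eta * (t - o) * x[i]         # dw_i <- dw_i + eta(t - o)x_i
        w = [wi + di for wi, di in zip(w, delta)]        # w_i <- w_i + dw_i
    return w

# Example: learn t = 2*x1 - 1, with x = [x0=1, x1]:
data = [([1, 0], -1), ([1, 1], 1), ([1, 2], 3)]
print(gradient_descent(data))  # approaches [-1.0, 2.0]
```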
Stochastic Gradient Descent
t: target output for the current training sample
o: the output generated by the perceptron
η: learning rate

For example, if x_i = 0.8, η = 0.1, t = 1 and o = 0:
Δw_i = η(t − o)x_i = 0.1 × (1 − 0) × 0.8 = 0.08
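In the stochastic (incremental) variant, the weights are updated after every sample rather than once per pass:

$$w_i \leftarrow w_i + \eta\,(t - o)\,x_i$$

so in the example above, w_i would immediately be increased by 0.08.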
[Figure: a scatter of positive (+) and negative (−) examples in the input plane.]
Learning NAND
threshold = 0.5; learning rate (LR) = 0.1

Columns: input (x0, x1, x2), target t, initial weights (w0, w1, w2), calculated terms C_i = x_i · w_i, sum S = C0 + C1 + C2, output o, error E = t − o, correction R = LR × E, final weights (w0, w1, w2).

 x0 x1 x2 | t | w0   w1   w2   | C0   C1   C2  |  S  | o | E  |  R   | w0   w1   w2
  1  0  0 | 1 | 0    0    0    | 0    0    0   | 0   | 0 |  1 | +0.1 | 0.1  0    0
  1  0  1 | 1 | 0.1  0    0    | 0.1  0    0   | 0.1 | 0 |  1 | +0.1 | 0.2  0    0.1
  1  1  0 | 1 | 0.2  0    0.1  | 0.2  0    0   | 0.2 | 0 |  1 | +0.1 | 0.3  0.1  0.1
  1  1  1 | 0 | 0.3  0.1  0.1  | 0.3  0.1  0.1 | 0.5 | 0 |  0 |  0   | 0.3  0.1  0.1
  1  0  0 | 1 | 0.3  0.1  0.1  | 0.3  0    0   | 0.3 | 0 |  1 | +0.1 | 0.4  0.1  0.1
  1  0  1 | 1 | 0.4  0.1  0.1  | 0.4  0    0.1 | 0.5 | 0 |  1 | +0.1 | 0.5  0.1  0.2
  1  1  0 | 1 | 0.5  0.1  0.2  | 0.5  0.1  0   | 0.6 | 1 |  0 |  0   | 0.5  0.1  0.2
  1  1  1 | 0 | 0.5  0.1  0.2  | 0.5  0.1  0.2 | 0.8 | 1 | −1 | −0.1 | 0.4  0    0.1
  1  0  0 | 1 | 0.4  0    0.1  | 0.4  0    0   | 0.4 | 0 |  1 | +0.1 | 0.5  0    0.1
  …
  1  1  0 | 1 | 0.8 −0.2 −0.1  | 0.8 −0.2  0   | 0.6 | 1 |  0 |  0   | 0.8 −0.2 −0.1
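A sketch reproducing the table above: weights start at zero, the four samples are presented cyclically, and the unit fires when the sum exceeds the 0.5 threshold:

```python
SAMPLES = [((1, 0, 0), 1),  # x0 (bias input) is always 1; t = NAND(x1, x2)
           ((1, 0, 1), 1),
           ((1, 1, 0), 1),
           ((1, 1, 1), 0)]

THRESHOLD, LR = 0.5, 0.1
w = [0.0, 0.0, 0.0]

for epoch in range(20):                                 # enough passes to converge
    errors = 0
    for x, t in SAMPLES:
        s = sum(wi * xi for wi, xi in zip(w, x))        # S = x0*w0 + x1*w1 + x2*w2
        o = 1 if round(s, 6) > THRESHOLD else 0         # round so S = 0.5 is not tipped over by float noise
        e = t - o                                       # E = t - o
        errors += abs(e)
        w = [wi + LR * e * xi for wi, xi in zip(w, x)]  # w_i <- w_i + LR * E * x_i
    if errors == 0:                                     # stop once a full pass is error-free
        break

print(w)  # converges to weights implementing NAND: [0.8, -0.2, -0.1]
```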
Multilayer Networks
XOR
 Input  | Output
 0  0   |   0
 0  1   |   1
 1  0   |   1
 1  1   |   0

Cannot be separated by a single line.

[Figure: the four XOR instances plotted in the (p, q) plane; the positive (+) and negative (−) points cannot be separated by one linear boundary.]
XOR
XOR can be built from linearly separable parts: XOR(p, q) = AND(OR(p, q), NAND(p, q)).

[Figure: the same four points, with the OR and NAND decision lines jointly separating the positive (+) from the negative (−) instances.]
XOR
Hidden layer representations:

 Input (p, q) | Hidden (OR, NAND) | Output (AND)
 0  0         | 0  1              | 0
 0  1         | 1  1              | 1
 1  0         | 1  1              | 1
 1  1         | 1  0              | 0

[Diagram: a two-layer network; inputs p and q feed OR and NAND hidden units, whose outputs feed an AND output unit.]
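A sketch of this construction using the threshold units from earlier slides; the NAND weights are an assumption (any weights implementing NAND work):

```python
def unit(w, x):
    """Threshold unit with bias w[0]: output 1 if w . (1, x) > 0, else 0."""
    return 1 if w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) > 0 else 0

OR_W   = (-0.3,  0.5,  0.5)   # weights from the "Power of Perceptrons" slide
NAND_W = ( 0.8, -0.5, -0.5)   # negated AND weights (assumed)
AND_W  = (-0.8,  0.5,  0.5)

def xor(p, q):
    hidden = (unit(OR_W, (p, q)), unit(NAND_W, (p, q)))  # hidden layer representation
    return unit(AND_W, hidden)                           # output = AND(OR, NAND)

for p, q in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, q, "->", xor(p, q))   # 0, 1, 1, 0
```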
The Sigmoid Threshold Unit
[Diagram: same structure as the perceptron (inputs x1, …, xn with weights w1, …, wn, bias w0 on x0 = 1, summation unit ∑), but the hard threshold is replaced by a sigmoid.]
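The sigmoid unit replaces the hard threshold with a smooth, differentiable squashing function; in the usual notation (also used in the definitions on the next slide):

$$o = \sigma(net) = \frac{1}{1 + e^{-net}}, \qquad net = \sum_{i=0}^{n} w_i x_i$$

Its derivative has the convenient form dσ/d(net) = σ(net)(1 − σ(net)), which is what makes the backpropagation rules below so compact.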
Backpropagation Rule
• x_{ji} = the i-th input to unit j
• w_{ji} = the weight associated with the i-th input to unit j
• net_j = ∑_i w_{ji} x_{ji} (the weighted sum of inputs for unit j)
• o_j = the output of unit j
• t_j = the target output of unit j
• σ = the sigmoid function
• outputs = the set of units in the final layer
• Downstream(j) = the set of units directly taking the output of unit j as inputs
Training Rule for Output Units
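In Mitchell's Chapter 4 formulation (see the reading materials), with the per-sample error E_d = ½ ∑_{k∈outputs} (t_k − o_k)², the rule for an output unit k works out to:

$$\delta_k = o_k(1-o_k)(t_k-o_k), \qquad \Delta w_{kj} = \eta\,\delta_k\,x_{kj}$$

The factor o_k(1 − o_k) is the sigmoid derivative evaluated at net_k.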
Training Rule for Hidden Units
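A hidden unit has no direct target; in the same formulation, its error term is propagated back from the units in Downstream(h):

$$\delta_h = o_h(1-o_h)\sum_{k\in Downstream(h)} w_{kh}\,\delta_k, \qquad \Delta w_{hi}=\eta\,\delta_h\,x_{hi}$$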
BP Framework
BACKPROPAGATION(training_examples, η, n_in, n_out, n_hidden)
Create a network with n_in inputs, n_hidden hidden units and n_out output units.
Initialize all network weights to small random numbers.
Until the termination condition is met, Do
  For each <x, t> in training_examples, Do
    • Input the instance x to the network and compute the output o of every unit.
    • For each output unit k, calculate its error term δ_k.
    • For each hidden unit h, calculate its error term δ_h.
    • Update each network weight w_{ji}.
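A minimal runnable sketch of this framework for one hidden layer, using the δ rules from the previous two slides; the fixed epoch count and per-sample (stochastic) updates are assumptions, since the framework leaves them open:

```python
import math
import random

def sigmoid(net):
    """sigma(net) = 1 / (1 + e^-net)."""
    return 1.0 / (1.0 + math.exp(-net))

def backpropagation(training_examples, eta, n_in, n_hidden, n_out, epochs=5000):
    """training_examples: list of (x, t) pairs (tuples of length n_in / n_out)."""
    rnd = lambda: random.uniform(-0.05, 0.05)
    # hidden[j] / output[k]: input weights followed by one trailing bias weight.
    hidden = [[rnd() for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output = [[rnd() for _ in range(n_hidden + 1)] for _ in range(n_out)]

    for _ in range(epochs):                      # termination condition (assumed)
        for x, t in training_examples:
            # Forward pass: compute the output of every unit.
            h = [sigmoid(sum(w[i] * xi for i, xi in enumerate(x)) + w[-1])
                 for w in hidden]
            o = [sigmoid(sum(w[j] * hj for j, hj in enumerate(h)) + w[-1])
                 for w in output]
            # Error term of each output unit k: delta_k = o_k(1 - o_k)(t_k - o_k).
            d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
            # Error term of each hidden unit: delta_h = o_h(1 - o_h) sum_k w_kh delta_k.
            d_hid = [hj * (1 - hj) * sum(output[k][j] * d_out[k]
                                         for k in range(n_out))
                     for j, hj in enumerate(h)]
            # Weight updates: w_ji <- w_ji + eta * delta_j * x_ji.
            for k, w in enumerate(output):
                for j, hj in enumerate(h):
                    w[j] += eta * d_out[k] * hj
                w[-1] += eta * d_out[k]          # bias input is fixed at 1
            for j, w in enumerate(hidden):
                for i, xi in enumerate(x):
                    w[i] += eta * d_hid[j] * xi
                w[-1] += eta * d_hid[j]
    return hidden, output

# Example: the XOR task (also Task 1 of the assignment); may need a few
# restarts with different random weights to escape local minima (next slide).
data = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
net = backpropagation(data, eta=0.5, n_in=2, n_hidden=2, n_out=1)
```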
More about BP Networks …
Convergence and Local Minima
The search space is likely to be highly multimodal.
Training may easily get stuck in a local minimum.
Multiple trials with different initial weights are needed.

Evolving Neural Networks
Black-box optimization techniques (e.g., Genetic Algorithms).
Usually better accuracy.
Can do some advanced training (e.g., structure + parameters).
Require much longer training time.
• Xin Yao (1999), "Evolving Artificial Neural Networks", Proceedings of the IEEE, pp. 1423-1447.

Representational Power
Boolean functions: representable with two layers.
Continuous functions: can be approximated with two layers.
Arbitrary functions: can be approximated with three layers.
More about BP Networks …
Overfitting
Tends to occur during later iterations.
Use a validation dataset to terminate training when necessary.

Practical Considerations
Momentum (see the update rule below).
Adaptive learning rate:
• Small: slow convergence, easy to get stuck.
• Large: fast convergence, unstable.
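A common form of the momentum heuristic (Mitchell, Chapter 4): the n-th update keeps a fraction α of the previous one, which smooths oscillations and helps roll through flat regions and small local dips:

$$\Delta w_{ji}(n) = \eta\,\delta_j\,x_{ji} + \alpha\,\Delta w_{ji}(n-1), \qquad 0 \le \alpha < 1$$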
[Figures: training vs. validation error plotted over time; error plotted as a function of a single weight.]
Beyond BP Networks
26
Elman Network
XOR
0
1
1 0
0
0 1
1
0 1
0
1 …
? 1 ? ? 0 ? ? 0 ? ? 1 ? …
Beyond BP Networks
Hopfield Network
Reading Materials
Text Books
Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons Inc.
Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill.
http://page.mi.fu-berlin.de/rojas/neural/index.html.html

Online Demos
http://neuron.eng.wayne.edu/software.html
http://www.cbu.edu/~pong/ai/hopfield/hopfield.html

Online Tutorials
http://www.autonlab.org/tutorials/neural13.pdf
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html

Wikipedia & Google
Review
What is the biological motivation of ANN?
When does ANN work?
What is a perceptron? What can a perceptron do?
How to train a perceptron?
What is the limitation of perceptrons?
What is backpropagation?
What are the main strengths and issues of BP networks?
What are examples of other types of ANN?
Next Week’s Class Talk
Volunteers are required for next week’s class talk.
Topic 1:
Applications of ANN
Topic 2:
Recurrent Neural Networks
Hints:
Robot Driving
Character Recognition
Face Recognition
Hopfield Network
Length: 20 minutes plus question time
Assignment
Topic: Training Feedforward Networks
Technique: BP Algorithm

Task 1: XOR Problem
4 input samples, e.g.:
• 0 0 → 0
• 1 0 → 1

Task 2: Identity Function
8 input samples, e.g.:
• 10000000 → 10000000
• 00010000 → 00010000
Use 3 hidden units.

Deliverables:
Report
Code (any programming language, with detailed comments)

Due: Tuesday, 23 October
Credit: 15%