
Classification III

Lecturer: Dr. Bo Yuan



E-mail: yuanb@sz.tsinghua.edu.cn

Overview


Artificial Neural Networks



2

Biological Motivation

3


10^11: The number of neurons in the human brain

10^4: The average number of connections of each neuron

10^-3 seconds: The fastest switching time of neurons

10^-10 seconds: The switching speed of computers

10^-1 seconds: The time required to visually recognize your mother

Biological Motivation


The power of parallelism



The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons.



The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations.



Sequential machines vs. Parallel machines



Group A


Using ANN to study and model biological learning processes.



Group B


Obtaining highly effective machine learning algorithms, regardless of how closely these algorithms mimic biological processes.


4

Neural Network Representations

5

When does ANN work?


Instances are represented by attribute-value pairs.

Input values can be any real values.



The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.



The training samples may contain errors.



Long training times are acceptable.


Can range from a few seconds to several hours.



Fast evaluation of the learned target function may be required.



The ability of humans to understand the learned target function is not important.

Weights are difficult for humans to interpret.

6

Perceptrons

7



[Figure: a perceptron unit — inputs x1, x2, ..., xn with weights w1, w2, ..., wn, plus a constant input x0 = 1 with bias weight w0, combined into a weighted sum that is thresholded to produce the output.]
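The perceptron computes a thresholded weighted sum of its inputs. A sketch in the usual notation (Mitchell, Chapter 4, uses a ±1 output; the worked examples on the following slides use a 0/1 output instead):

o(x_1, \dots, x_n) =
\begin{cases}
1 & \text{if } w_0 + w_1 x_1 + \cdots + w_n x_n > 0 \\
-1 & \text{otherwise}
\end{cases}

With the constant input x_0 = 1 absorbed into the sum, this is o(\vec{x}) = \operatorname{sgn}(\vec{w} \cdot \vec{x}).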

Power of Perceptrons

8

AND (w0 = -0.8, w1 = 0.5, w2 = 0.5)

x1  x2  Weighted Sum  Output
0   0   -0.8          0
0   1   -0.3          0
1   0   -0.3          0
1   1    0.2          1

OR (w0 = -0.3, w1 = 0.5, w2 = 0.5)

x1  x2  Weighted Sum  Output
0   0   -0.3          0
0   1    0.2          1
1   0    0.2          1
1   1    0.7          1
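A minimal Python sketch (not from the slides) that reproduces the two tables above, assuming the unit outputs 1 when the weighted sum is positive and 0 otherwise:

```python
def perceptron(x1, x2, w0, w1, w2):
    """0/1 threshold unit with bias weight w0 on the constant input x0 = 1."""
    s = w0 * 1 + w1 * x1 + w2 * x2          # weighted sum
    return (1 if s > 0 else 0), s

# AND and OR realized with the weights from the slide
for name, (w0, w1, w2) in [("AND", (-0.8, 0.5, 0.5)), ("OR", (-0.3, 0.5, 0.5))]:
    print(name)
    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        out, s = perceptron(x1, x2, w0, w1, w2)
        print(f"  {x1} {x2} -> sum {s:+.1f}, output {out}")
```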

Error Surface

9

[Figure: the error as a function of the weights w1 and w2 (the error surface).]

Gradient Descent

10

Learning Rate

Batch Learning

Delta Rule

11
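As a reference for these two slides, the standard squared-error gradient descent (delta) rule for a linear unit, in the notation of Mitchell, Chapter 4, can be sketched as:

E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2

\Delta w_i = -\eta \frac{\partial E}{\partial w_i} = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}

In batch learning the sum runs over all training examples before the weights are updated; the learning rate η controls the step size. The pseudocode on the next slide applies exactly this update.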

Perceptron Learning

GRADIENT DESCENT(training_examples, η)

Initialize each w_i to some small random value.

Until the termination condition is met, Do

    Initialize each Δw_i to zero.

    For each <x, t> in training_examples, Do

        Input the instance x to the unit and compute the output o.

        For each linear unit weight w_i, Do

            Δw_i ← Δw_i + η(t - o)x_i

    For each linear unit weight w_i, Do

        w_i ← w_i + Δw_i
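A runnable Python sketch of this procedure for a linear (unthresholded) unit; the data set, learning rate, and stopping rule below are illustrative assumptions, not from the slides:

```python
import random

def gradient_descent(training_examples, eta=0.05, epochs=500):
    """Batch gradient descent (delta rule) for a single linear unit.

    training_examples: list of (x, t) pairs, where x includes the constant x0 = 1.
    """
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]    # small random initial weights
    for _ in range(epochs):                                # termination: fixed epoch count
        delta_w = [0.0] * n
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))       # linear unit output
            for i in range(n):
                delta_w[i] += eta * (t - o) * x[i]         # accumulate the gradient step
        w = [wi + dwi for wi, dwi in zip(w, delta_w)]      # batch weight update
    return w

# Example: fit OR-like targets with inputs (x0 = 1, x1, x2)
data = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
print(gradient_descent(data))
```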


12

Stochastic Gradient Descent

13

t: target output for the current training sample

o: the output generated by the perceptron

η: learning rate

For example, if x_i = 0.8, η = 0.1, t = 1 and o = 0:

Δw_i = η(t - o)x_i = 0.1 × (1 - 0) × 0.8 = 0.08


Learning NAND

14

Threshold = 0.5; learning rate = 0.1. Columns: input (x0 x1 x2), target t, initial weights (w0 w1 w2), calculated terms C_i = x_i · w_i, sum S = C0 + C1 + C2, output o (1 if S > threshold, else 0), error E = t - o, correction R = learning rate × E, final weights (w_i + R · x_i).

x0 x1 x2 | t | w0   w1   w2   | C0   C1   C2   | S   | o | E  | R    | w0   w1   w2
1  0  0  | 1 | 0    0    0    | 0    0    0    | 0   | 0 | 1  | +0.1 | 0.1  0    0
1  0  1  | 1 | 0.1  0    0    | 0.1  0    0    | 0.1 | 0 | 1  | +0.1 | 0.2  0    0.1
1  1  0  | 1 | 0.2  0    0.1  | 0.2  0    0    | 0.2 | 0 | 1  | +0.1 | 0.3  0.1  0.1
1  1  1  | 0 | 0.3  0.1  0.1  | 0.3  0.1  0.1  | 0.5 | 0 | 0  | 0    | 0.3  0.1  0.1
1  0  0  | 1 | 0.3  0.1  0.1  | 0.3  0    0    | 0.3 | 0 | 1  | +0.1 | 0.4  0.1  0.1
1  0  1  | 1 | 0.4  0.1  0.1  | 0.4  0    0.1  | 0.5 | 0 | 1  | +0.1 | 0.5  0.1  0.2
1  1  0  | 1 | 0.5  0.1  0.2  | 0.5  0.1  0    | 0.6 | 1 | 0  | 0    | 0.5  0.1  0.2
1  1  1  | 0 | 0.5  0.1  0.2  | 0.5  0.1  0.2  | 0.8 | 1 | -1 | -0.1 | 0.4  0    0.1
1  0  0  | 1 | 0.4  0    0.1  | 0.4  0    0    | 0.4 | 0 | 1  | +0.1 | 0.5  0    0.1
...
1  1  0  | 1 | 0.8  -0.2 -0.1 | 0.8  -0.2 0    | 0.6 | 1 | 0  | 0    | 0.8  -0.2 -0.1
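A short Python sketch (an illustration, not from the slides) of the same perceptron training rule; with the slide's threshold of 0.5 and learning rate of 0.1 it follows the trajectory of the table above:

```python
def train_nand(epochs=10, threshold=0.5, eta=0.1):
    """Perceptron training rule on NAND; inputs include the constant x0 = 1."""
    samples = [((1, 0, 0), 1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]
    w = [0.0, 0.0, 0.0]                                    # initial weights, as in the table
    for _ in range(epochs):
        for x, t in samples:
            s = sum(wi * xi for wi, xi in zip(w, x))       # weighted sum S
            o = 1 if round(s, 9) > threshold else 0        # output (rounded to avoid float drift)
            r = eta * (t - o)                              # correction R = learning rate * error
            w = [wi + r * xi for wi, xi in zip(w, x)]      # updated weights for this step
    return [round(wi, 2) for wi in w]

print(train_nand())   # converges to [0.8, -0.2, -0.1], matching the last row of the table
```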

Multilayer Networks

15

XOR

16

Input   Output
0 0     0
0 1     1
1 0     1
1 1     0

Cannot be separated by a single line.

[Figure: the four XOR points in the p-q plane; the two positive (+) and two negative (-) points are not linearly separable.]

XOR

17

[Figure: the p-q plane with the OR and NAND decision lines drawn; the + points lie between the two lines, so XOR(p, q) = OR(p, q) AND NAND(p, q).]

XOR

18

[Figure: a two-layer network for XOR — inputs p and q feed an OR unit and a NAND unit in the hidden layer, whose outputs feed an AND output unit.]

Hidden Layer Representations

Input    Hidden       Output
p   q    OR   NAND    AND
0   0    0    1       0
0   1    1    1       1
1   0    1    1       1
1   1    1    0       0
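A tiny Python sketch (illustrative, not from the slides) of this composition, reusing the AND and OR threshold units from the earlier slide; the NAND weights below are an assumed choice (the negation of AND's):

```python
def unit(w0, w1, w2):
    """Return a 0/1 threshold unit with bias w0 (fires when the weighted sum is positive)."""
    return lambda a, b: 1 if w0 + w1 * a + w2 * b > 0 else 0

OR   = unit(-0.3,  0.5,  0.5)
AND  = unit(-0.8,  0.5,  0.5)
NAND = unit( 0.8, -0.5, -0.5)   # assumed weights, not from the slides

def XOR(p, q):
    return AND(OR(p, q), NAND(p, q))   # hidden layer: OR and NAND; output layer: AND

for p, q in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, q, "->", XOR(p, q))
```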

The Sigmoid Threshold Unit

19



[Figure: a sigmoid unit — inputs x1, x2, ..., xn with weights w1, w2, ..., wn, plus a constant input x0 = 1 with bias weight w0; the weighted sum is passed through the sigmoid function.]
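For reference, a sketch of the sigmoid unit's output in the standard form (as in Mitchell, Chapter 4):

net = \sum_{i=0}^{n} w_i x_i, \qquad o = \sigma(net) = \frac{1}{1 + e^{-net}}, \qquad \sigma'(y) = \sigma(y)\,\bigl(1 - \sigma(y)\bigr)

The smooth, easily computed derivative is what makes gradient-based training (backpropagation) possible.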

Backpropagation Rule

20



x_ji = the i-th input to unit j

w_ji = the weight associated with the i-th input to unit j

net_j = Σ_i w_ji x_ji (the weighted sum of inputs for unit j)

o_j = the output of unit j

t_j = the target output of unit j

σ = the sigmoid function

outputs = the set of units in the final layer

Downstream(j) = the set of units directly taking the output of unit j as inputs

Training Rule for Output Units

21

Training Rule for Hidden Units

22

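Using the notation above, the standard backpropagation training rules for a sigmoid network with squared error (as in Mitchell, Chapter 4) can be sketched as:

For an output unit k:  \delta_k = o_k (1 - o_k)(t_k - o_k)

For a hidden unit h:  \delta_h = o_h (1 - o_h) \sum_{k \in Downstream(h)} w_{kh}\, \delta_k

Weight update:  \Delta w_{ji} = \eta\, \delta_j\, x_{ji}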

BP Framework


BACKPROPAGATION(training_examples, η, n_in, n_out, n_hidden)

Create a network with n_in inputs, n_hidden hidden units and n_out output units.

Initialize all network weights to small random numbers.

Until the termination condition is met, Do

    For each <x, t> in training_examples, Do

        Input the instance x to the network and compute the output o of every unit.

        For each output unit k, calculate its error term δ_k.

        For each hidden unit h, calculate its error term δ_h.

        Update each network weight w_ji.
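A compact runnable Python sketch of this framework for one hidden layer of sigmoid units, trained here on XOR; the network size, learning rate, and epoch count are illustrative assumptions, not from the slides:

```python
import math, random

def backpropagation(training_examples, eta=0.5, n_in=2, n_hidden=2, n_out=1, epochs=5000):
    """Stochastic-gradient backpropagation for a two-layer sigmoid network."""
    sigmoid = lambda y: 1.0 / (1.0 + math.exp(-y))
    # weights[j][i]: weight from input i (index 0 is the bias) to unit j
    w_hidden = [[random.uniform(-0.05, 0.05) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [[random.uniform(-0.05, 0.05) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    for _ in range(epochs):
        for x, t in training_examples:
            # forward pass
            h_in = [1.0] + list(x)                                   # prepend bias input
            h = [sigmoid(sum(w * xi for w, xi in zip(ws, h_in))) for ws in w_hidden]
            o_in = [1.0] + h
            o = [sigmoid(sum(w * xi for w, xi in zip(ws, o_in))) for ws in w_out]
            # error terms (output units first, then hidden units)
            delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
            delta_h = [hj * (1 - hj) * sum(w_out[k][j + 1] * delta_o[k] for k in range(n_out))
                       for j, hj in enumerate(h)]
            # weight updates
            for k in range(n_out):
                for i in range(n_hidden + 1):
                    w_out[k][i] += eta * delta_o[k] * o_in[i]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hidden[j][i] += eta * delta_h[j] * h_in[i]
    return w_hidden, w_out

xor_data = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
# Training may need several random restarts to escape local minima (see the next slide).
print(backpropagation(xor_data))
```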



23

More about BP Networks …


Convergence and Local Minima


The search space is likely to be highly multimodal.


May easily get stuck in a local minimum.


Need multiple trials with different initial weights.



Evolving Neural Networks


Black
-
box optimization techniques (e.g., Genetic Algorithms)


Usually better accuracy


Can do some advanced training (e.g., structure + parameter).


Require much longer training time.


Xin Yao (1999). "Evolving Artificial Neural Networks", Proceedings of the IEEE, pp. 1423-1447.




Representational Power


Boolean functions


Two layers


Continuous functions


Two layers


Arbitrary functions


Three layers


24

More about BP Networks …


Overfitting


Tends to occur during later iterations.

Use a validation dataset to terminate training when necessary.



Practical Considerations


Momentum (see the update rule sketched after this list)


Adaptive learning rate


Small: slow convergence, easy to get stuck


Large: fast convergence, unstable
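The slide gives only the keyword; a common way to write the momentum modification of the weight update (the standard form, e.g. Mitchell, Chapter 4) is:

\Delta w_{ji}(n) = \eta\, \delta_j\, x_{ji} + \alpha\, \Delta w_{ji}(n - 1), \qquad 0 \le \alpha < 1

The momentum term α keeps the weight change moving in its previous direction, which can carry the search through small local minima and speed progress along flat regions of the error surface.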



25

[Figures: error versus training time for the training and validation sets; error as a function of a single weight.]

Beyond BP Networks

26

Elman Network

XOR


Input:  0 1 1 0 0 0 1 1 0 1 0 1 …

Target: ? 1 ? ? 0 ? ? 0 ? ? 1 ? …

(Every third bit is the XOR of the two preceding bits, so only those bits are predictable from the history.)

Beyond BP Networks

27

Hopfield Network

Reading Materials


Text Book



Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons Inc.

Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill.


http://page.mi.fu-berlin.de/rojas/neural/index.html.html



Online Demo



http://neuron.eng.wayne.edu/software.html


http://www.cbu.edu/~pong/ai/hopfield/hopfield.html



Online Tutorial



http://www.autonlab.org/tutorials/neural13.pdf


http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html



Wikipedia & Google



28

Review


What is the biological motivation of ANN?

When does ANN work?

What is a perceptron?

What can a perceptron do?

How to train a perceptron?

What is the limitation of perceptrons?

What is Backpropagation?

What are the main strengths and issues of BP networks?

What are the examples of other types of ANN?


29

Next Week’s Class Talk


Volunteers are required for next week’s class talk.



Topic 1:
Applications of ANN



Topic 2:
Recurrent Neural Networks



Hints:


Robot Driving


Character Recognition


Face Recognition


Hopfield Network



Length: 20 minutes plus question time



30

Assignment


Topic: Training Feedforward Networks



Technique: BP Algorithm



Task 1: XOR Problem


4 input samples (e.g., 0 0 → 0, 1 0 → 1)



Task 2: Identity Function


8 input samples (e.g., 10000000 → 10000000, 00010000 → 00010000)

Use 3 hidden units.



Deliverables:


Report


Code (any programming language, with detailed comments)



Due: Tuesday, 23 October



Credit: 15%

31