CSNB234 ARTIFICIAL INTELLIGENCE

UNIVERSITI TENAGA NASIONAL

Chapter 10

Artificial Neural Networks (ANN)

Instructor: Alicia Tang Y. C.

(Chapter 11, pp. 458–471, Textbook)
(Chapter 18, Ref. #1)



What is a Neural Network?

- Neural networks are a different paradigm for computing:
- Neural networks are based on the parallel architecture of animal brains.
- A neural network is a model that simulates a biological neural network.
- Real brains, however, are orders of magnitude more complex than any artificial neural network considered so far.


Artificial Neural Networks

Supervised Learning:
  - The Perceptron
  - Multilayer neural networks that use a backpropagation learning algorithm
  - The Hopfield network
  - Stochastic networks

Unsupervised Learning:
  - Hebbian learning
  - Competitive learning
  - The Kohonen network (SOM)


SUPERVISED LEARNING

[Diagram: INPUT feeds the ANN, which produces OUTPUT; the output is compared with the EXPECTED OUTPUT, and an ERROR HANDLER feeds corrections back to the ANN in a feedback loop.]


UNSUPERVISED LEARNING

[Diagram: INPUT feeds an unsupervised learning program, which produces OUTPUT.]

The learning program adjusts itself to figure out what the output could be. There are no targets to match whatsoever.


A Schematic of a Neuron



Neural network at first glance

Neuron:
  - A cell body (soma) with many dendrites; a single branch called an axon.
  - The neuron is the information processor:
      - dendrites handle inputs (receive signals)
      - the soma does the processing
      - the axon carries the output

Neurons are connected by synapses:
  - a synapse is the point of contact between neurons
  - synapses are modelled by (adjustable) weights


What is in a Neural Network?

- The model consists of artificial neurons (processing elements):
    - they are called nodes
    - their form depends on the hardware or software implementation

- All neurons are connected in some structure that forms a "network", i.e. the neurons are interconnected.

- A neural network usually operates in parallel:
    - parallel computation means doing multiple things at the same time.


What's Special in a Neural Network?

Its computing architecture is based on:
  - a large number of relatively simple processors
  - operating in PARALLEL
  - connected to each other by a system of links

How does the artificial neural network model the brain?

- An artificial neural network consists of a number of interconnected processors.
- These processors are made very simple; they are analogous to biological neurons in the human brain.
- The neurons are connected by weighted links passing signals from one neuron to another.
- Each neuron receives a number of signals, and it produces only one output signal through its outgoing connection.
- The outgoing connection, in turn, splits into a number of branches that transmit the same signal.
- The outgoing branches terminate at the incoming connections of other neurons in the network.


Why Neural Network Computing?

- To model and mimic certain processing capabilities of our brain.
- It imitates the way a human brain works, learns, etc.


A Neural Network Model

Consists of:
  - Input units: x_i
  - Weight from unit i: w_i
  - An activation level: a
  - A threshold: θ
  - A network topology
  - A learning algorithm

(The weights and the threshold are real numbers.)
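As a minimal sketch, these components map onto a small Python structure; the class and attribute names below are our own, not from the textbook:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Unit:
    """One artificial neuron: weights w_i, threshold theta, activation a."""
    weights: List[float]      # w_i, one real-valued weight per input
    threshold: float          # theta, a real number
    activation: float = 0.0   # a, the unit's current activation level

    def net_input(self, inputs: List[float]) -> float:
        # Weighted sum of the incoming signals x_i.
        return sum(w * x for w, x in zip(self.weights, inputs))
```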


Neural Network with Hidden Layer(s)


Perceptrons Learn by Adjusting Weights


An example of the use of ANN


THE PERCEPTRON

(Single Layer Neural Network)



Perceptron

- Developed by Frank Rosenblatt (1958).
- Its learning rule is superior to the Hebb learning rule.
- Rosenblatt proved that the weights can converge for particular applications.
- However, the perceptron does not work for nonlinear applications, as proven by Minsky and Papert (1969).
- The activation function used is the binary step function with an arbitrary, but fixed, threshold.
- Weights are adjusted by the perceptron learning rule.


A Perceptron

Is a simple neural network.

[Diagram: input units 1, 2, …, n, each connected by a weighted link to a single output unit.]

Notation:
  - x_i : input from unit i
  - w_i : weight from unit i
  - a : activation level
  - θ : threshold


Threshold Function used by Perceptron

    a = 1   if  Σ_{i=1}^{n} w_i x_i ≥ θ
    a = 0   otherwise                        ------ (1)

A unit is said to be 'on' or 'active' if its activation level is 1.
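Equation (1) transcribes directly into Python; this is a sketch, and the function name is ours:

```python
def step_activation(inputs, weights, threshold):
    """Equation (1): a = 1 if sum_i w_i * x_i >= theta, else a = 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0
```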


Perceptron Threshold Function


A Perceptron that learns the "AND" and "OR" concepts:

[Diagram: AND-function — two inputs with weights 1 and 1; output threshold 1.5.]

[Diagram: OR-function — two inputs with weights 1 and 1; output threshold 0.5.]

Each has two inputs.
Weights are shown next to the arcs/links.
The threshold is shown next to the output.

UNIVERSITI TENAGA NASIONAL

22

The perceptron will have its output 'on' iff

    x_1·1 + x_2·1 ≥ 1.5        ---- using (1)

The perceptron learns by repeatedly adjusting its 'weights' through repeated presentation of examples.

P  Q | P AND Q
----------------
1  1 |    1
1  0 |    0
0  1 |    0
0  0 |    0

(P and Q are the inputs x_1 and x_2.)
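A quick check of the AND and OR units above against their truth tables (a sketch using our own helper function):

```python
def step(inputs, weights, threshold):
    # Equation (1): output 1 iff the weighted sum reaches the threshold.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        a_and = step((x1, x2), (1, 1), 1.5)  # AND unit: weights 1, 1; theta = 1.5
        a_or  = step((x1, x2), (1, 1), 0.5)  # OR unit:  weights 1, 1; theta = 0.5
        print(x1, x2, "AND:", a_and, "OR:", a_or)
```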


A more abstract characterisation

- We view the inputs x_1, x_2, …, x_n to a perceptron as vectors in n-dimensional space.
- Since activation levels are restricted to 1 or 0, all input vectors lie on the corners of a hypercube in this space.
- We may view the weights and threshold as defining a hyperplane satisfying the equation:

    w_1 x_1 + w_2 x_2 + … + w_n x_n − θ = 0



Geometric Interpretation

- Input vectors are classified according to which side of the hyperplane they fall on.
- This is termed Linear Discrimination.
- E.g. with two inputs, the four possible inputs fall on the vertices of a square, and

    w_1 x_1 + w_2 x_2 − θ = 0

  defines a line in the plane.


Linear Discrimination

E.g.
    a x_1 + b x_2 − c = 0    (a straight line)
    a x_1 + b x_2 − c ≥ 0    (one side of the straight line)
    a x_1 + b x_2 − c ≤ 0    (the other side)
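As a sketch, classifying a point by which side of the line it falls on is one line of arithmetic (the function name and example values are ours):

```python
def side_of_line(x1, x2, a, b, c):
    # The sign of a*x1 + b*x2 - c decides the side of the line.
    return 1 if a * x1 + b * x2 - c >= 0 else 0

# For the AND unit (a = b = 1, c = 1.5), only (1, 1) falls on the '1' side:
for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, side_of_line(*p, a=1, b=1, c=1.5))
```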


Perceptron cannot compute the XOR function (I)

[Graph of the XOR function: the '+' points (0, 1) and (1, 0) and the '−' points (0, 0) and (1, 1) sit on the corners of the unit square.]

No straight line can be drawn to separate the '+' and '−' points. Try it out if you don't believe it.

P  Q | P XOR Q
----------------
1  1 |    0
1  0 |    1
0  1 |    1
0  0 |    0

Hidden layers required!!


Perceptron cannot compute the XOR function (II)

Consider this net:

[Diagram: a two-layer net. Both inputs feed a hidden unit (weights 1 and 1, threshold 1.5) and also feed the output unit directly (weights 1 and 1, threshold 0.5); the hidden unit feeds the output unit with weight −2.]

This suggests that neural nets of threshold units comprising more than one layer can correctly compute the XOR function.
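A sketch that verifies this particular net (weights and thresholds read off the diagram above) really computes XOR:

```python
def step(net, threshold):
    return 1 if net >= threshold else 0

def xor_net(x1, x2):
    # Hidden unit: weights 1, 1; threshold 1.5 (it fires only for 1 AND 1).
    h = step(x1 + x2, 1.5)
    # Output unit: weights 1, 1 on the inputs, -2 on the hidden unit; threshold 0.5.
    return step(x1 + x2 - 2 * h, 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))  # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```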




Perceptron cannot compute the XOR function (III)

- A hidden unit is neither an input nor an output unit; thus we are not directly concerned with its activation level.
- Any function a perceptron can compute, a perceptron can learn.


Description of a Learning Task

Rules:

- We want to teach a perceptron a function f which maps n binary values x_1, x_2, …, x_n to a binary output f(x_1, x_2, …, x_n).
- Think of f as being the AND function:

    { f(1,1) = 1, f(1,0) = 0, f(0,1) = 0, f(0,0) = 0 }

- Starting off with random weights and threshold, for each set of inputs the output will respond with some activation level a, either 1 or 0.



- We then compare the actual output with the desired output f(x_1, x_2, …, x_n) = t ('t' for teaching).
- If the two are the same, then leave the weights/threshold alone.


Perceptron Learning Algorithm



Set w_i (i = 1, 2, …, n) and θ to be real numbers
Set η to be a positive real number

UNTIL a^p = t^p for each input pattern p DO
    FOR each input pattern p = (x_1^p … x_n^p) DO
        let the new weights & threshold be:
            w_i ← w_i + η · (t^p − a^p) · x_i^p
            θ  ← θ − η · (t^p − a^p)
    ENDFOR
END UNTIL
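The same algorithm in Python, as a sketch; initialising the weights to zero and capping the number of epochs are our own choices (the slides allow any real starting values):

```python
def train_perceptron(patterns, targets, eta=0.1, max_epochs=1000):
    """Perceptron learning rule as given above.

    patterns: list of input tuples (x_1 .. x_n), one per pattern p
    targets:  desired outputs t^p, one per pattern
    eta:      the learning rate, a small positive real number
    """
    n = len(patterns[0])
    w, theta = [0.0] * n, 0.0
    for _ in range(max_epochs):                  # stands in for UNTIL ... DO
        all_correct = True
        for x, t in zip(patterns, targets):
            a = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if a != t:
                all_correct = False
                # w_i <- w_i + eta * (t^p - a^p) * x_i^p
                w = [wi + eta * (t - a) * xi for wi, xi in zip(w, x)]
                # theta <- theta - eta * (t^p - a^p)
                theta -= eta * (t - a)
        if all_correct:
            break
    return w, theta

# Teach the AND function from the earlier slide:
w, theta = train_perceptron([(1, 1), (1, 0), (0, 1), (0, 0)], [1, 0, 0, 0])
```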


A few words on η

- η is the learning rate.
- It is the amount by which we adjust w_i and θ for each pattern p.
- It affects the speed of learning.
- A fairly small positive number is suggested:
    - if it is too big  --> we overstep the minima
    - if it is too small --> learning moves very, very slowly



[Figure: with η too big, the minimum is stepped over and skipped; with η too small, the search crawls along too slowly.]




Multi-layer Neural Networks (MLP)

Hidden layers are required…

What are hidden layers?
  - They are layers additional to the input and output layers, not connected externally.
  - They are located in between the input and output layers.

Multi-layer Perceptron (MLP)
  - Builds a nonlinear classifier based on perceptrons.
  - The structure of an MLP is usually found by experimentation.
  - Its parameters can be found using backpropagation.


Multi-layer Perceptron (MLP)

How to learn?

- We cannot simply use the perceptron learning rule, because we have hidden layer(s).
- There is a function that we are trying to minimize: the error.
- We need a different activation function:
    - use the sigmoid function instead of the threshold function.



Formulas needed for the backpropagation learning algorithm


Multi-layer Neural Networks

Modifications made to the "units":

- We still assume input values are either 1 or 0.
- Output values are either 1 or 0.
- But activation levels take on any real number between 0 and 1.

Thus, to obtain the activation level of each unit x_j, we first take the net input to x_j to be the weighted sum:

    net_j = (Σ_i w_ji · x_i) − θ_j        ------ (2)



Here,

- the summation runs over all input units x_i in the layer previous to x_j,
- w_ji denotes the weight on the link from x_i to unit x_j, and
- θ_j is the threshold corresponding to x_j.

A smooth stand-in for the step function is required, and we use the SIGMOID function.


Sigmoid Function

- It is a continuous function.
- It is also called a smooth function.
- Why is this f(x) needed?
    - It is a mathematical function that produces a sigmoid curve (i.e. an S shape). It is a special case of the logistic function. It is used in neural networks to introduce nonlinearity into the learning model.

    f(net_j) = 1 / (1 + e^((−Σ_i w_ji · x_i + θ_j) / T))        --- Sigmoid f(x)

(The summation runs over all i; by equation (2) this is just 1 / (1 + e^(−net_j / T)).)
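A sketch of equation (2) and the sigmoid in Python; T is the steepness parameter from the formula, and the default of 1.0 is our assumption:

```python
import math

def net_input(weights, inputs, theta):
    # Equation (2): net_j = (sum_i w_ji * x_i) - theta_j
    return sum(w * x for w, x in zip(weights, inputs)) - theta

def sigmoid(net, T=1.0):
    # f(net_j) = 1 / (1 + e^(-net_j / T)); smaller T gives a steeper S-curve.
    return 1.0 / (1.0 + math.exp(-net / T))
```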


Learning in Multi-layer NNs via the 'Backpropagation' learning algorithm

- All input patterns p are fed one at a time into the input units.
- The actual response of the output units is compared with the desired output.
- Adjustments are made to the weights in response to discrepancies between the desired and actual outputs.
- After all input patterns have been presented, the whole process is repeated over and over until the actual response of the output is tolerably close to the desired response.


We now examine the procedure for adjusting the weights.

For an output unit j:

    δ_j^p = (t_j − a_j)        -------- (3)

where

    δ_j^p = error at unit j in response to presentation of input pattern p
    t_j = desired response
    a_j = actual response



The weights leading to unit j are modified in much the same way as for the single-layer perceptron. For all units k which feed into unit j, we set:

    w_j,k ← w_j,k + η · a_k^p · δ_j^p · f′(net_j^p)        -------- (4)

where f′(net_j^p) is the rate of change of the function at that point, i.e. the derivative of the function.


What if unit j is a hidden unit?

- The measure δ_j^p of the error at unit j cannot this time be given by the difference (t_j − a_j) [recall formula (3)],
- because we do not know what the response of the hidden units should be!!
- Instead, it is calculated on the basis of the errors of the units in the layer immediately above unit j.




Specifically, the error at unit j is the weighted sum of ALL the errors at the units k such that there is a link from unit j to unit k, with the weighting simply given by the weights on the links:

    δ_j^p = Σ_k w_k,j · δ_k^p        ------ (5)



Equation (3) tells us how to calculate the error for output units, and equation (5) tells us how to calculate the errors for hidden units in terms of the errors in the layer above.

We can construct a "goodness-of-fit" measure, used to determine how close the network is to computing the function we are trying to teach it. A (sensible) measure is:

    E = Σ_p E_p    where    E_p = Σ_j (t_j^p − o_j^p)²
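Putting equations (2)–(5) together, here is a sketch of backpropagation training a 2-2-1 network on XOR. The layer sizes, random seed, learning rate, and epoch count are our own illustrative choices, and the derivative factor f′(net) = a(1 − a) from equation (4) is folded into the deltas, which is a common way of writing the same update:

```python
import math
import random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

random.seed(0)
# Weights and biases (a bias plays the role of -theta) for a 2-2-1 net.
w_h = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(2)]
b_h = [random.uniform(-1, 1) for _ in range(2)]
w_o = [random.uniform(-1, 1), random.uniform(-1, 1)]
b_o = random.uniform(-1, 1)
eta = 0.5
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for _ in range(20000):
    for x, t in data:
        # Forward pass: equation (2) plus the sigmoid.
        h = [sigmoid(w_h[j][0] * x[0] + w_h[j][1] * x[1] + b_h[j]) for j in range(2)]
        o = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + b_o)
        # Equation (3) at the output, with f'(net) = o(1 - o) folded in.
        delta_o = (t - o) * o * (1 - o)
        # Equation (5) at each hidden unit, again with f' folded in.
        delta_h = [w_o[j] * delta_o * h[j] * (1 - h[j]) for j in range(2)]
        # Equation (4): each weight moves by eta * (incoming activation) * delta.
        for j in range(2):
            w_o[j] += eta * h[j] * delta_o
            w_h[j][0] += eta * x[0] * delta_h[j]
            w_h[j][1] += eta * x[1] * delta_h[j]
            b_h[j] += eta * delta_h[j]
        b_o += eta * delta_o

# Outputs should end up close to the 0/1 targets
# (an unlucky initialisation can stall in a local minimum).
for x, t in data:
    h = [sigmoid(w_h[j][0] * x[0] + w_h[j][1] * x[1] + b_h[j]) for j in range(2)]
    print(x, round(sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + b_o), 2), "target", t)
```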




ANN Promises

- A successful implementation area of ANNs is "vision".
- An NN can survive the failure of some nodes.
- It handles noise (missing data) well: once trained, an NN shows an ability to recognize patterns even though part of the data is missing.
- It is a tool for modeling and exploring brain function.
- Parallelism (without much effort).
- A neural network can execute an automatic acquisition task for situations in which historical data are available.


ANN unsolved problems

- It cannot (yet) model high-level cognitive mechanisms such as attention.
- Brains are very large, with on the order of a hundred billion neurons and trillions of synaptic connections.
- There is growing evidence that (human) neurons can learn not merely by adjusting weights but by growing new connections.

Exercises

- State True/False
- Multiple choice questions



Exercise

A neural network for training the recognition of the digits 0–9.



How many bars (hence input bits) are required to represent the digits 0–9?

Only 7 bars are required to represent all 10 digits.
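As a sketch, each digit then becomes a 7-bit input vector, one bit per bar; the segment ordering below (top, top-right, bottom-right, bottom, bottom-left, top-left, middle) is an assumed convention, since the slide does not fix one:

```python
# Seven-segment ("bar") encodings of the digits 0-9, segments ordered
# (top, top-right, bottom-right, bottom, bottom-left, top-left, middle).
DIGITS = {
    0: (1, 1, 1, 1, 1, 1, 0),
    1: (0, 1, 1, 0, 0, 0, 0),
    2: (1, 1, 0, 1, 1, 0, 1),
    3: (1, 1, 1, 1, 0, 0, 1),
    4: (0, 1, 1, 0, 0, 1, 1),
    5: (1, 0, 1, 1, 0, 1, 1),
    6: (1, 0, 1, 1, 1, 1, 1),
    7: (1, 1, 1, 0, 0, 0, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}
# Each 7-bit tuple can serve as one training input pattern for the network.
```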


So, the neural network that could be used for this training will look like this: