LEARNING IN NEURAL NETWORKS


ARTIFICIAL INTELLIGENCE

[INTELLIGENT AGENTS PARADIGM]

Professor Janis Grundspenkis


Riga Technical University

Faculty of Computer Science and Information Technology

Institute of Applied Computer Systems

Department of Systems Theory and Design


E-mail: Janis.Grundspenkis@rtu.lv

LEARNING IN NEURAL NETWORKS

LEARNING IN NEURAL NETWORKS (1)

An artificial neural network is composed of a number of simple arithmetic computing elements, or nodes (units), connected by links.

Synonyms: connectionism, parallel distributed processing, and neural computation.

From a biological viewpoint, a neural network is a mathematical model for the operation of the brain.

The nodes of a neural network correspond to neurons.

A neuron is a cell in the brain whose principal function is the collection, processing, and dissemination of electrical signals.

LEARNING IN NEURAL NETWORKS (2)

The brain's information capacity is thought to emerge primarily from networks of such neurons.

[Figure: a biological neuron - dendrites, cell body (soma), nucleus, axon, axonal arborization, synapses, and an axon from another cell]

COMPARING BRAINS WITH COMPUTERS (1)

                      Computer                      Human Brain
Computational units   1 CPU, 10^8 gates             10^11 neurons
Storage units         10^10 bits RAM,               10^11 neurons,
                      10^11 bits disk               10^14 synapses
Cycle time            10^-9 sec                     10^-3 sec
Bandwidth             10^10 bits/sec                10^14 bits/sec
Memory updates/sec    10^9                          10^14

COMPARING BRAINS WITH COMPUTERS (2)

COMPARISON OF MEMORY

The human brain is evolving very slowly, whereas computer memories are growing rapidly.

COMPARISON OF SWITCHING SPEED AND PARALLELISM

Computer chips can execute an instruction in tens of nanoseconds, whereas neurons require milliseconds to fire.

Most current computers have only one or at most a few CPUs, whereas all neurons and synapses are active simultaneously (parallel processing).

COMPARING BRAINS WITH COMPUTERS (3)

A neural network running on a serial computer requires hundreds of cycles to decide whether a single neuron-like unit will fire, whereas in a human brain all the neurons do this in a single step.

CONCLUSION: Even though a computer is a million times faster in raw switching speed, the brain ends up being a billion times faster at what it does.

COMPARING BRAINS WITH COMPUTERS (4)

Brains are more fault-tolerant than computers.

Brains are constantly faced with novel input, yet manage to do something with it. Computer programs rarely work as well with novel input, unless the programmer has been exceptionally careful.

One attraction of neural networks is graceful degradation: they tend to have a gradual rather than a sharp drop-off in performance as conditions worsen.

Another attraction of neural networks is that they are designed to be trained using an inductive learning algorithm.

A MATHEMATICAL MODEL FOR A NEURON (1)

A simple mathematical model of the neuron was devised by McCulloch and Pitts (1943).

[Figure: a unit. Input links carry activations a_j weighted by W_j,i; a bias weight W_0,i is attached to a fixed input a_0 = -1. The input function computes in_i, the activation function g produces the output a_i = g(in_i), which is sent along the output links.]

A MATHEMATICAL MODEL FOR A NEURON (2)

A neural network is composed of a number of units, connected by links.

Each link has a numeric weight associated with it. Weights are the primary means of long-term storage in neural networks, and learning usually takes place by updating the weights.

Some of the units are connected to the external environment, and can be designated as input or output units.

A MATHEMATICAL MODEL FOR A NEURON (3)

The weights are modified so as to try to bring the network's input/output behaviour more into line with that of the environment providing the inputs.

Each unit has a set of input links from other units, a set of output links to other units, a current activation level, and a means of computing the activation level at the next step in time.

Each unit does a local computation based on the inputs from its neighbours, but without the need for any global control over the set of units as a whole.

SIMPLE COMPUTING ELEMENTS (1)

Each unit performs a simple computation: it receives signals from its input links and computes a new activation level that it sends along each of its output links.

The computation is split into two components.

First is a linear component, called the input function, in_i, that computes the weighted sum of the unit's input values:

in_i = Σ_j W_j,i a_j

SIMPLE COMPUTING ELEMENTS (2)

Second is a nonlinear component, called the activation function, g, that transforms the weighted sum into the final value that serves as the unit's output (activation value), a_i = g(in_i).

The computation of the activation level is based on the values of the input signals received from neighbouring nodes and on the weights of the input links.

A bias weight W_0,i connected to a fixed input a_0 = -1 sets the actual threshold for the unit.
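To make the two components concrete, here is a minimal Python sketch of a single unit; the function names and the sample weights are illustrative assumptions, not part of the slides.

```python
def threshold(in_value):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if in_value > 0 else 0

def unit_output(inputs, weights, bias_weight, g=threshold):
    """Compute a_i = g(in_i), where in_i includes the bias term W_0,i * a_0 with a_0 = -1."""
    in_i = sum(w * a for w, a in zip(weights, inputs)) + bias_weight * (-1)
    return g(in_i)

# Example: a unit with two inputs and a bias weight of 0.5
print(unit_output([1, 0], weights=[1.0, 1.0], bias_weight=0.5))  # -> 1
```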

ACTIVATION FUNCTION (1)

The activation function g is designed to meet two requirements.

First, the unit must be "active" (near +1) when the "right" inputs are given, and "inactive" (near 0) when the "wrong" inputs are given.

Second, the activation needs to be nonlinear; otherwise the entire neural network collapses into a simple linear function and cannot represent more complex functions.

Different models are obtained by using different mathematical functions for g.

ACTIVATION FUNCTION (2)

THREE COMMON CHOICES

The threshold function

The sign function

The sigmoid function

[Plots: g(in_i) versus in_i for each choice - the threshold function steps from 0 to +1, the sign function steps from -1 to +1, and the sigmoid function rises smoothly from 0 to +1]
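A short Python sketch of the three choices, assuming the standard definitions (switching point at 0 for the threshold and sign functions, and the logistic sigmoid):

```python
import math

def threshold(x):
    """Threshold (step) function: 0 below the switching point, +1 above it."""
    return 1 if x >= 0 else 0

def sign(x):
    """Sign function: -1 below the switching point, +1 above it."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid function: a smooth, differentiable transition from 0 to +1."""
    return 1.0 / (1.0 + math.exp(-x))

for g in (threshold, sign, sigmoid):
    print(g.__name__, [round(g(x), 3) for x in (-2, 0, 2)])
```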

REPRESENTATION OF BOOLEAN FUNCTIONS BY THRESHOLD UNITS

Individual units are able to represent basic Boolean functions (logic gates).

AND:  W_0 = 1.5,  W_1 = 1,  W_2 = 1

OR:   W_0 = 0.5,  W_1 = 1,  W_2 = 1

NOT:  W_0 = -0.5, W_1 = -1
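A Python sketch that checks these weight settings with a threshold unit (output 1 iff the weighted sum, with the bias weight W_0 on a fixed input of -1, is positive); the helper names are mine:

```python
def threshold_unit(inputs, weights, w0):
    """Threshold unit: output 1 iff sum_j W_j*I_j + W_0*(-1) > 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) - w0 > 0 else 0

AND = lambda i1, i2: threshold_unit([i1, i2], [1, 1], 1.5)
OR  = lambda i1, i2: threshold_unit([i1, i2], [1, 1], 0.5)
NOT = lambda i1:     threshold_unit([i1],     [-1], -0.5)

for i1, i2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(i1, i2, "AND:", AND(i1, i2), "OR:", OR(i1, i2))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```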

THE FUNCTION REPRESENTED BY THE NETWORK

With a fixed structure and a fixed activation function g, the functions representable by a feed-forward network are restricted to a specific parameterized structure.

The weights chosen for the network determine which of these functions is actually represented.

EXAMPLE

[Figure: a feed-forward network with inputs I_1, I_2, hidden units H_3, H_4, output unit O_5, and weights W_13, W_14, W_23, W_24, W_35, W_45]

a_5 = g(W_35 a_3 + W_45 a_4)
    = g(W_35 g(W_13 a_1 + W_23 a_2) + W_45 g(W_14 a_1 + W_24 a_2))

where g is the activation function and a_i is the output of node i.
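A minimal Python sketch of this nested composition, using the sigmoid for g; the numeric weights are hypothetical and only illustrate the parameterization:

```python
import math

def g(x):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def a5(a1, a2, W13, W23, W14, W24, W35, W45):
    """Output of the two-layer network as the nested composition above."""
    a3 = g(W13 * a1 + W23 * a2)
    a4 = g(W14 * a1 + W24 * a2)
    return g(W35 * a3 + W45 * a4)

# Illustrative weight values only
print(a5(1.0, 0.0, W13=0.4, W23=-0.6, W14=0.7, W24=0.1, W35=1.2, W45=-0.8))
```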

NETWORK REPRESENTATION

The links are determined by three parameters:

a start node of the link

an end node of the link

a numeric weight of the link

Network topology (the structure of the links) can therefore be represented by a weight matrix.
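For illustration, a Python sketch of a weight-matrix representation, using the node numbering of the example network I_1, I_2 -> H_3, H_4 -> O_5; the numeric weights are made up:

```python
# Links as (start node, end node) -> weight; 0.0 in the matrix means "no link".
links = {(1, 3): 0.4, (2, 3): -0.6, (1, 4): 0.7, (2, 4): 0.1, (3, 5): 1.2, (4, 5): -0.8}

n = 5
W = [[0.0] * (n + 1) for _ in range(n + 1)]  # row = start node, column = end node; index 0 unused
for (start, end), weight in links.items():
    W[start][end] = weight

print(W[1][3], W[3][5])  # weights of links 1->3 and 3->5
```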

NETWORK STRUCTURES (1)

TWO MAIN CATEGORIES

FEED-FORWARD NETWORKS

The network structure is acyclic.

A feed-forward network represents a function of its current input (it has no internal state other than the weights themselves).

[Examples of feed-forward networks shown as diagrams]

NETWORK STRUCTURES (2)

RECURRENT NETWORKS

The network structure is cyclic.

[Examples of recurrent networks shown as diagrams]

NETWORK STRUCTURES (3)

A recurrent network feeds its outputs back into its own inputs.

The activation levels of a recurrent network form a dynamical system that may reach a stable state or exhibit oscillations or even chaotic behaviour.

The response of the network to a given input depends on its initial state, which may depend on previous inputs. Hence, recurrent networks can support short-term memory.

NETWORK STRUCTURES (4)

Hopfield networks are the best-understood class of recurrent networks.

They use bidirectional connections with symmetric weights, W_i,j = W_j,i.

All of the units are both input and output units.

The activation function g is the sign function, and the activation levels can only be +1 or -1.

A Hopfield network functions as an associative memory: after training on a set of examples, a new stimulus will cause the network to settle into an activation pattern corresponding to the example in the training set that most closely resembles the new stimulus.
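A small Python sketch of a Hopfield-style associative memory. The slides do not give the training rule, so the Hebbian prescription (outer-product weights with a zero diagonal) and the synchronous sign update used here are assumptions:

```python
import numpy as np

def train_hopfield(patterns):
    """Build a symmetric weight matrix from +/-1 patterns (Hebbian rule, zero diagonal)."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, state, steps=10):
    """Repeatedly apply the sign activation (synchronously) until the state settles."""
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1, -1)
    return state

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = train_hopfield(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1])   # first pattern with one flipped unit
print(recall(W, noisy))                  # settles back to [ 1 -1  1 -1  1 -1]
```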

FEED-FORWARD NEURAL NETWORKS

SINGLE-LAYER FEED-FORWARD NEURAL NETWORK

A network with all the inputs connected directly to the outputs is called a single-layer network or a perceptron network.

Each output unit is independent of the others.

[Figures: a perceptron network and a single perceptron]

PERCEPTRON (1)

With a threshold activation function, the perceptron represents a Boolean function.

A threshold perceptron returns 1 if and only if the weighted sum of its inputs (including the bias) is positive:

Σ_j W_j I_j > 0

The equation Σ_j W_j I_j = 0 defines a hyperplane in the input space, so the perceptron returns 1 if and only if the input is on one side of that hyperplane.

For this reason, the threshold perceptron is called a linear separator.

[Plot (a): Boolean function AND in the I_1-I_2 plane, with the separating line I_1 + I_2 = 1.5]

PERCEPTRON (2)

LINEAR SEPARABILITY IN PERCEPTRONS

[Plot (b): Boolean function OR in the I_1-I_2 plane, with the separating line I_1 + I_2 = 0.5]

Black dots indicate a point in the input space where the value of the function is 1, and white dots indicate a point where the value is 0.

Each function is represented as a two-dimensional plot, based on the values of the two inputs.

PERCEPTRON (3)

In Figure (a) one possible separating "plane" (a line) is defined by the equation

I_1 + I_2 = 1.5,  or  I_1 = -I_2 + 1.5

The region above the line, where the output is 1, is given by

I_1 + I_2 - 1.5 > 0

In Figure (b) one possible separating "plane" (a line) is defined by the equation

I_1 + I_2 = 0.5,  or  I_1 = -I_2 + 0.5

The region above the line, where the output is 1, is given by

I_1 + I_2 - 0.5 > 0

LINEAR CLASSIFICATION (1)

EXAMPLE

Problem: CLASSIFICATION OF AIRPLANES

RULES

IF WEIGHT > 0.80 AND SPEED < 0.55 THEN BOMBER

IF WEIGHT < 0.90 AND SPEED > 0.25 THEN FIGHTER

Suppose we have 10 examples of airplanes.

[Plot: the 10 examples in the WEIGHT-SPEED plane, with the BOMBER and FIGHTER classes in separate regions]

LINEAR CLASSIFICATION (2)

One possible separating line is defined by the equation

I_2 = 1.5 I_1 + 0.5

The equation may be used for the decision-making rule

f(I_1, I_2) = 1.5 I_1 - I_2 + 0.5

IF f(I_1, I_2) ≥ 0 THEN FIGHTER

IF f(I_1, I_2) < 0 THEN BOMBER

THE CORRESPONDING PERCEPTRON

[Figure: a perceptron with a fixed input 1 weighted by 0.5, input I_1 weighted by 1.5, and input I_2 weighted by -1.0]
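A Python sketch of this decision rule, assuming (as in the perceptron learning example later) that I_1 is the speed and I_2 the weight; the test points are hypothetical, since the slide's 10 examples are only shown graphically:

```python
def f(i1, i2):
    """Decision function f(I1, I2) = 1.5*I1 - I2 + 0.5 derived from the separating line."""
    return 1.5 * i1 - i2 + 0.5

def classify(speed, weight):
    """FIGHTER if f >= 0, otherwise BOMBER (I1 = speed, I2 = weight)."""
    return "FIGHTER" if f(speed, weight) >= 0 else "BOMBER"

print(classify(0.4, 0.80))   # fighter-like point -> FIGHTER
print(classify(0.2, 0.95))   # bomber-like point  -> BOMBER
```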

LEARNING LINEARLY SEPARABLE FUNCTIONS (1)

There is a perceptron algorithm that will learn any linearly separable function, given enough training examples.

The idea behind most algorithms for neural network learning is to adjust the weights of the network to minimize some measure of the error on the training set.

The initial network has randomly assigned weights, usually from the range [-0.5, 0.5].

LEARNING LINEARLY SEPARABLE FUNCTIONS (2)

The network is then updated to try to make it consistent with the examples. This is done by making small adjustments in the weights to reduce the difference between the observed and predicted values (an optimization search in weight space).

Typically, the updating process is iterative. Each iteration involves updating all the weights for all the examples.

THE WEIGHT UPDATING PROCESS

THE MEASURE OF ERROR

For a single training example, the error is

Err = T - O

where O is the output of the perceptron on the example, and T is the true output value.

If the error is positive, then O must be increased.

If the error is negative, then O must be decreased.

Each input unit contributes W_j I_j to the total input, so if I_j is positive, an increase in W_j will tend to increase O, and if I_j is negative, an increase in W_j will tend to decrease O.

THE PERCEPTRON LEARNING RULE (1)

W_j = W_j + α × I_j × Err

where α is the learning rate.

EXAMPLE

The initial network

[Figure: a perceptron with a fixed input 1 weighted by 0.2, input I_1 weighted by 0.2, and input I_2 weighted by -0.5]

The training example

A fighter: speed = 0.4, weight = 0.8.

The output

O = 0.2 × 0.4 + (-0.5) × 0.8 + 0.2 = 0.08 - 0.4 + 0.2 = -0.12

f(I_1, I_2) < 0, so the classification is "bomber".

The error

Err = T - O = 1 - 0 = 1

THE PERCEPTRON LEARNING RULE (2)

The updated weights (α = 1)

W_1 = 0.2 + 0.4 × 1 = 0.6

W_2 = -0.5 + 0.8 × 1 = 0.3

W_bias = 0.2 + 1 × 1 = 1.2

[Figure: the updated perceptron with a fixed input 1 weighted by 1.2, input I_1 weighted by 0.6, and input I_2 weighted by 0.3]

The new output

O = 0.6 × 0.4 + 0.3 × 0.8 + 1.2 = 0.24 + 0.24 + 1.2 = 1.68

f(I_1, I_2) > 0, so the classification is "fighter".
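A Python sketch that reproduces this update step; the weights, learning rate and training example are the ones from the slides, while the bias is folded in as a weight on a fixed input of 1:

```python
ALPHA = 1.0  # learning rate

def output(weights, inputs):
    """Threshold perceptron output: 1 if the weighted sum (bias included) is positive."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > 0 else 0

def update(weights, inputs, target):
    """Perceptron learning rule: W_j <- W_j + alpha * I_j * Err."""
    err = target - output(weights, inputs)
    return [w + ALPHA * x * err for w, x in zip(weights, inputs)]

# Initial network: W_1 = 0.2, W_2 = -0.5, W_bias = 0.2; inputs are (speed, weight, bias input 1).
weights = [0.2, -0.5, 0.2]
example = [0.4, 0.8, 1.0]      # the fighter: speed 0.4, weight 0.8
target = 1                     # fighter

print(output(weights, example))           # 0 -> misclassified as bomber
weights = update(weights, example, target)
print(weights)                            # [0.6, 0.3, 1.2]
print(output(weights, example))           # 1 -> fighter
```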

MULTILAYERED FEED-FORWARD NETWORKS (1)

Networks with one or more layers of hidden units are called multilayer networks.

[Figure: a multilayer network with input units (activations a_k), hidden units (activations a_j), and output units (activations a_i); weights W_kj connect inputs to hidden units and weights W_ji connect hidden units to output units]

The advantage of adding hidden layers is that it enlarges the space of hypotheses that the network can represent.

MULTILAYERED FEED-FORWARD NETWORKS (2)

With a single, sufficiently large hidden layer, it is possible to represent any continuous function of the inputs with arbitrary accuracy.

Unfortunately, for any particular network structure, it is harder to characterize exactly which functions can be represented and which ones cannot. The problem of choosing the right number of hidden units in advance is still not well understood.

MULTILAYERED FEED-FORWARD NETWORKS (3)

EXAMPLE: the Boolean XOR function

[Plot: XOR in the I_1-I_2 plane - linear classification is not possible]

The two-layer feed-forward network:

[Figure: inputs I_1 and I_2 feed hidden units a_1 and a_2, which feed the output unit a_3. Each hidden unit receives both inputs with weight -1; the fixed input 1 is weighted by 1.5 for a_1 and by 0.5 for a_2. The output unit a_3 receives a_1 with weight 1, a_2 with weight -1, and the fixed input 1 with weight -0.5.]

MULTILAYERED FEED-FORWARD NETWORKS (4)

First case: Input = (0,0)

a_1 = (1 × 1.5) + (0 × (-1)) + (0 × (-1)) = 1.5     Output = 1

a_2 = (1 × 0.5) + (0 × (-1)) + (0 × (-1)) = 0.5     Output = 1

a_3 = (1 × (-0.5)) + (1 × 1) + (1 × (-1)) = -0.5    Output = 0

Second case: Input = (1,0)

a_1 = (1 × 1.5) + (1 × (-1)) + (0 × (-1)) = 0.5     Output = 1

a_2 = (1 × 0.5) + (1 × (-1)) + (0 × (-1)) = -0.5    Output = 0

a_3 = (1 × (-0.5)) + (1 × 1) + (0 × (-1)) = 0.5     Output = 1

MULTILAYERED FEED-FORWARD NETWORKS (5)

Third case: Input = (0,1)

a_1 = (1 × 1.5) + (0 × (-1)) + (1 × (-1)) = 0.5     Output = 1

a_2 = (1 × 0.5) + (0 × (-1)) + (1 × (-1)) = -0.5    Output = 0

a_3 = (1 × (-0.5)) + (1 × 1) + (0 × (-1)) = 0.5     Output = 1

Fourth case: Input = (1,1)

a_1 = (1 × 1.5) + (1 × (-1)) + (1 × (-1)) = -0.5    Output = 0

a_2 = (1 × 0.5) + (1 × (-1)) + (1 × (-1)) = -1.5    Output = 0

a_3 = (1 × (-0.5)) + (0 × 1) + (0 × (-1)) = -0.5    Output = 0
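A Python sketch that replays these four cases with the XOR network's weights; `step` is an assumed name for the threshold output:

```python
def step(x):
    """Threshold output: 1 if the weighted sum is positive, else 0."""
    return 1 if x > 0 else 0

def xor_net(i1, i2):
    """Two-layer XOR network from the slides: hidden units a1, a2, output unit a3."""
    a1 = step(1 * 1.5 + i1 * (-1) + i2 * (-1))
    a2 = step(1 * 0.5 + i1 * (-1) + i2 * (-1))
    a3 = step(1 * (-0.5) + a1 * 1 + a2 * (-1))
    return a3

for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(i1, i2, "->", xor_net(i1, i2))   # 0, 1, 1, 0
```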

CLASSIFICATION EXAMPLE

Problem: Any pair of persons must be classified into two classes, brothers and friends. There are 3 brothers in each family. Persons from different families are friends.

[Figure: a network that classifies pairs of persons (John, Paul, George in one family; Charles, Dick, Rodger in another) into the classes BROTHERS and FRIENDS]

LEARNING IN MULTILAYER NETWORKS

Learning algorithms for multilayer networks are similar to the perceptron learning algorithm.

Inputs are presented to the network, and if the network computes an output vector that matches the target, nothing is done.

If there is an error (a difference between the output and the target), then the weights are adjusted to reduce this error.

The trick is to assess the blame for an error and divide it among the contributing weights, because in multilayer networks there are many weights connecting each input to each output, and each of these weights contributes to more than one output.

BACK-PROPAGATION LEARNING (1)

The back-propagation algorithm is a sensible approach to dividing the contribution of each weight.

At the output layer, the weight update rule uses the activation of the hidden unit a_j instead of the input value, and it contains a term for the gradient of the activation function:

W_ji = W_ji + α × a_j × Err_i × g'(in_i)

where Err_i is the error (T_i - O_i) at the output node, and g'(in_i) is the derivative of the activation function g (for this reason a differentiable activation function such as the sigmoid must be used in multilayer networks).

BACK-PROPAGATION LEARNING (2)

A new error term Δ_i is defined, which for output nodes is Δ_i = Err_i × g'(in_i).

The update rule now is the following:

W_ji = W_ji + α × a_j × Δ_i

The propagation rule for the Δ values is the following:

Δ_j = g'(in_j) × Σ_i W_ji Δ_i

The weight update rule for the weights between the inputs and the hidden layer is the following:

W_kj = W_kj + α × I_k × Δ_j

SUMMARIZED BACK-PROPAGATION ALGORITHM

Compute the Δ values for the output units using the observed error.

Starting with the output layer, repeat the following for each layer in the network, until the earliest hidden layer is reached:

  Propagate the Δ values back to the previous layer.

  Update the weights between the two layers.
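A compact Python sketch of these rules for a single hidden layer of sigmoid units; the network size, the random initial weights and the training pair are illustrative assumptions:

```python
import math, random

def g(x):            # sigmoid activation
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(in_x):   # derivative of the sigmoid, expressed via its input
    return g(in_x) * (1.0 - g(in_x))

random.seed(0)
n_in, n_hid, n_out, alpha = 2, 2, 1, 0.5
W_kj = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]   # input -> hidden
W_ji = [[random.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hid)]  # hidden -> output

I = [1.0, 0.0]   # a made-up training input
T = [1.0]        # its made-up target

# Forward pass
in_j = [sum(W_kj[k][j] * I[k] for k in range(n_in)) for j in range(n_hid)]
a_j = [g(x) for x in in_j]
in_i = [sum(W_ji[j][i] * a_j[j] for j in range(n_hid)) for i in range(n_out)]
a_i = [g(x) for x in in_i]

# Delta values: output layer first, then propagated back to the hidden layer
delta_i = [(T[i] - a_i[i]) * g_prime(in_i[i]) for i in range(n_out)]
delta_j = [g_prime(in_j[j]) * sum(W_ji[j][i] * delta_i[i] for i in range(n_out))
           for j in range(n_hid)]

# Weight updates: W_ji += alpha * a_j * delta_i and W_kj += alpha * I_k * delta_j
for j in range(n_hid):
    for i in range(n_out):
        W_ji[j][i] += alpha * a_j[j] * delta_i[i]
for k in range(n_in):
    for j in range(n_hid):
        W_kj[k][j] += alpha * I[k] * delta_j[j]

print("output before this update:", a_i)
```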

BUILDING OF NEURAL NETWORKS

To build a neural network to perform some task, one needs:

To decide how many units are to be used.

To decide what kind of units are appropriate.

To decide how the units are connected to form a network.

To initialize the weights of the network.

To decide how to encode the examples in terms of inputs and outputs of the network.

To train the weights using a learning algorithm applied to a set of training examples for the task.