ARTIFICIAL INTELLIGENCE
[INTELLIGENT AGENTS PARADIGM]
Professor Janis Grundspenkis
Riga Technical University
Faculty of Computer Science and Information Technology
Institute of Applied Computer Systems
Department of Systems Theory and Design
E-mail: Janis.Grundspenkis@rtu.lv
LEARNING IN NEURAL NETWORKS
LEARNING IN NEURAL NETWORKS (1)
• An (artificial) neural network is composed of a number of simple arithmetic computing elements, or nodes (units), connected by links.
  – Synonyms: connectionism, parallel distributed processing, and neural computation.
• From a biological viewpoint, a neural network is a mathematical model of the operation of the brain.
• The nodes of a neural network correspond to neurons.
• A neuron is a cell in the brain whose principal function is the collection, processing, and dissemination of electrical signals.
LEARNING IN NEURAL NETWORKS (2)
• The brain's information capacity is thought to emerge primarily from networks of such neurons.
[Figure: a neuron. Labels: axonal arborization, dendrite, cell body (soma), nucleus, synapse, axon from another cell, synapses.]
COMPARING BRAINS WITH COMPUTERS (1)
                       Computer             Human Brain
Computational units    1 CPU, 10^8 gates    10^11 neurons
Storage units          10^10 bits RAM,      10^11 neurons,
                       10^11 bits disk      10^14 synapses
Cycle time             10^-9 sec            10^-3 sec
Bandwidth              10^10 bits/sec       10^14 bits/sec
Memory updates/sec     10^9                 10^14
COMPARING BRAINS WITH COMPUTERS (2)
COMPARISON OF THE MEMORY
• The human brain is evolving very slowly, whereas computer memories are growing rapidly.
COMPARISON OF SWITCHING SPEED AND PARALLELISM
• Computer chips can execute an instruction in tens of nanoseconds, whereas neurons require milliseconds to fire.
• Most current computers have only one or at most a few CPUs, whereas all neurons and synapses are active simultaneously (parallel processing).
COMPARING BRAINS WITH COMPUTERS (3)
• A neural network running on a serial computer requires hundreds of cycles to decide if a single neuron-like unit will fire, whereas in a human brain, all the neurons do this in a single step.
• CONCLUSION: Even though a computer is a million times faster in raw switching speed, the brain ends up being a billion times faster at what it does.
COMPARING BRAINS WITH COMPUTERS (4)
• Brains are more fault-tolerant than computers.
• Brains are constantly faced with novel input, yet manage to do something with it. Computer programs rarely work as well with novel input, unless the programmer has been exceptionally careful.
• The attraction of neural networks is graceful degradation: they tend to have a gradual rather than sharp drop-off in performance as conditions worsen.
• The attraction of neural networks also is that they are designed to be trained using an inductive learning algorithm.
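A MATHEMATICAL MODEL FOR A NEURON (1)
• A simple mathematical model of the neuron was devised by McCulloch and Pitts (1943).
[Figure: a unit i. Input links carry activations a_j with weights W_{j,i}; the bias weight W_{0,i} sits on the fixed input a_0 = −1. The input function computes in_i, the activation function g turns it into the output a_i = g(in_i), and a_i is sent along the output links.]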
A MATHEMATICAL MODEL FOR A NEURON (2)
• A neural network is composed of a number of units, connected by links.
• Each link has a numeric weight associated with it. Weights are the primary means of long-term storage in neural networks, and learning usually takes place by updating the weights.
• Some of the units are connected to the external environment, and can be designated as input or output units.
A MATHEMATICAL MODEL FOR A NEURON (3)
• The weights are modified so as to try to bring the network's input/output behaviour more into line with that of the environment providing the inputs.
• Each unit has a set of input links from other units, a set of output links to other units, a current activation level, and a means of computing the activation level at the next step in time.
• Each unit does a local computation based on inputs of its neighbors, but without the need for any global control over the set of units as a whole.
SIMPLE COMPUTING ELEMENTS (1)
• Each unit performs a simple computation: it receives signals from its input links and computes a new activation level that it sends along each of its output links.
• The computation is split into two components.
• First is a linear component, called the input function, in_i, that computes the weighted sum of the unit's input values:
in_i = Σ_j W_{j,i} · a_j
SIMPLE COMPUTING ELEMENTS (2)
• Second is a nonlinear component, called the activation function, g, that transforms the weighted sum into the final value that serves as the unit's output (activation value), a_i.
• The computation of the activation level is based on the values of each input signal received from a neighbor node, and the weights on each input line.
• A bias weight W_{0,i} connected to a fixed input a_0 = −1 sets the actual threshold for the unit.
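The two-stage computation of a unit can be sketched in Python as follows. This is only an illustration: the function names `input_function`, `sigmoid` and `unit_output` and the sample weights are assumptions, not part of the lecture, and the sigmoid is assumed as the activation function g. The bias is handled as the weight W_{0,i} on the fixed input a_0 = −1.

```python
import math

def input_function(weights, activations):
    """Linear component: in_i = sum over j of W_{j,i} * a_j."""
    return sum(w * a for w, a in zip(weights, activations))

def sigmoid(x):
    """One possible nonlinear activation function g (assumed here)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, inputs, g=sigmoid):
    """Activation a_i = g(in_i); weights[0] is the bias weight W_{0,i}."""
    activations = [-1.0] + list(inputs)  # prepend the fixed input a_0 = -1
    return g(input_function(weights, activations))

# Hypothetical weights: bias weight 0.5, two input weights of 1.0.
print(unit_output([0.5, 1.0, 1.0], [1.0, 0.0]))
```

Passing a different function as `g` gives a different unit model without changing the linear part.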
ACTIVATION FUNCTION (1)
• The activation function g is designed to meet two requirements.
• First, the unit must be "active" (near +1) when the "right" inputs are given, and "inactive" (near 0) when the "wrong" inputs are given.
• Second, the activation needs to be nonlinear, otherwise the entire neural network collapses into a simple linear function and will not be useful for representation of more complex functions.
• Different models are obtained by using different mathematical functions for g.
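The collapse of a purely linear network can be checked with a small sketch. The weight matrices `W1` and `W2` and the helper names below are made up for illustration. Without a nonlinear g, two layers of weights give the same result as a single layer whose weights are the matrix product.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

W1 = [[1, 2], [0, -1], [3, 1]]  # hypothetical weights: 2 inputs -> 3 hidden units
W2 = [[2, -1, 1]]               # hypothetical weights: 3 hidden units -> 1 output

x = [4, 5]
two_layers = matvec(W2, matvec(W1, x))  # linear hidden layer, then linear output
one_layer = matvec(matmul(W2, W1), x)   # single layer with combined weights
print(two_layers, one_layer)            # both are [50]
```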
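ACTIVATION FUNCTION (2)
• THREE COMMON CHOICES
• The threshold function [Plot: g(in_i) against in_i, with output levels 0 and +1.]
• The sign function [Plot: g(in_i) against in_i, with output levels −1 and +1.]
• The sigmoid function [Plot: g(in_i) against in_i, a smooth curve approaching +1.]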
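The three choices can be written down directly. The sketch below uses the usual textbook definitions. The behaviour exactly at in_i = 0 (treated here as "active") is an assumption, since the slide does not specify it.

```python
import math

def threshold(x):
    """Step function: 1 when the input is non-negative, otherwise 0."""
    return 1 if x >= 0 else 0

def sign(x):
    """Sign function: +1 when the input is non-negative, otherwise -1."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Logistic sigmoid: a smooth curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 2.0):
    print(x, threshold(x), sign(x), round(sigmoid(x), 3))
```

Only the sigmoid is differentiable everywhere, which matters for gradient-based learning in multilayer networks.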
REPRESENTATION OF BOOLEAN FUNCTIONS BY THRESHOLD UNITS
• Individual units are able to represent basic Boolean functions (logic gates).
[Figure: three threshold units, each with bias weight W_0 on the fixed input a_0 = −1.]
Function   W_0     W_1    W_2
AND        1.5     1      1
OR         0.5     1      1
NOT        −0.5    −1
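A short check of these gates in Python. Assumptions: the bias weight W_0 acts on the fixed input a_0 = −1, the unit fires when the weighted sum is positive, and the helper name `threshold_unit` is illustrative. For NOT the sketch uses W_0 = −0.5 and W_1 = −1, the values for which a threshold unit actually computes negation.

```python
def threshold_unit(weights, inputs):
    """weights[0] is the bias weight W_0 on the fixed input a_0 = -1."""
    total = -weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0

AND = [1.5, 1, 1]
OR = [0.5, 1, 1]
NOT = [-0.5, -1]

for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, threshold_unit(AND, [i1, i2]), threshold_unit(OR, [i1, i2]))
for i1 in (0, 1):
    print(i1, threshold_unit(NOT, [i1]))
```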
THE FUNCTION REPRESENTED BY THE NETWORK
• With fixed structure and fixed activation function g, the functions representable by a feed-forward network are restricted to have a specific parameterized structure.
• The weights chosen for the network determine which of these functions is actually represented.
EXAMPLE
[Figure: a network with input units I_1 and I_2, hidden units H_3 and H_4, and output unit O_5, connected by weights W_{1,3}, W_{1,4}, W_{2,3}, W_{2,4}, W_{3,5} and W_{4,5}.]
a_5 = g(W_{3,5} a_3 + W_{4,5} a_4)
    = g(W_{3,5} g(W_{1,3} a_1 + W_{2,3} a_2) + W_{4,5} g(W_{1,4} a_1 + W_{2,4} a_2))
where g is the activation function, and a_i is the output of node i.
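The nested expression for a_5 can be evaluated step by step. The sketch below assumes the sigmoid for g and uses made-up weight values. The dictionary keyed by (start node, end node) is only one convenient way to hold the weights.

```python
import math

def g(x):
    """Sigmoid activation function (an assumption for this sketch)."""
    return 1.0 / (1.0 + math.exp(-x))

def network_output(w, a1, a2):
    """a_5 for the 2-2-1 network; w maps (start, end) to the link weight."""
    a3 = g(w[(1, 3)] * a1 + w[(2, 3)] * a2)    # hidden unit H_3
    a4 = g(w[(1, 4)] * a1 + w[(2, 4)] * a2)    # hidden unit H_4
    return g(w[(3, 5)] * a3 + w[(4, 5)] * a4)  # output unit O_5

# Hypothetical weights, chosen only to have something to evaluate.
weights = {(1, 3): 0.5, (2, 3): -0.4, (1, 4): 0.3,
           (2, 4): 0.8, (3, 5): 1.0, (4, 5): -1.0}
print(network_output(weights, 1.0, 0.0))
```

Changing the entries of `weights` changes which function of (a_1, a_2) the fixed structure represents.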
NETWORK REPRESENTATION
• The links are determined by three parameters
  – a start node of the link
  – an end node of the link
  – a numeric weight of the link
• Network topology (structure of links)
  – a weight matrix
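One way to turn a list of links into a weight matrix is sketched below. The conventions are assumptions made for the illustration: nodes are numbered from 0, entry [start][end] holds the weight of the link from start to end, 0.0 means "no link", and the sample links are invented.

```python
def weight_matrix(n_nodes, links):
    """Build an n x n matrix from (start_node, end_node, weight) triples."""
    matrix = [[0.0] * n_nodes for _ in range(n_nodes)]
    for start, end, weight in links:
        matrix[start][end] = weight
    return matrix

# Hypothetical 2-2-1 feed-forward network: nodes 0,1 -> 2,3 -> 4.
links = [(0, 2, 0.5), (0, 3, -1.0), (1, 2, 0.25),
         (1, 3, 2.0), (2, 4, 1.5), (3, 4, -0.75)]
W = weight_matrix(5, links)
for row in W:
    print(row)
```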
NETWORK STRUCTURES (1)
TWO MAIN CATEGORIES
• FEED FORWARD NETWORKS
  – The network structure is acyclic.
  – A feed-forward network represents a function of its current input (it has no internal state other than the weights themselves).
EXAMPLES
[Figure: examples of feed-forward networks.]
NETWORK STRUCTURES (2)
• RECURRENT NETWORKS
  – The network structure is cyclic.
EXAMPLES
[Figure: examples of recurrent networks.]
NETWORK STRUCTURES (3)
  – A recurrent network feeds its outputs back into its own inputs.
  – The activation levels of recurrent networks form a dynamic system that may reach a stable state or exhibit oscillations or even chaotic behaviour.
  – The response of the network to a given input depends on its initial state, which may depend on previous inputs. Hence, recurrent networks can support short-term memory.
NETWORK STRUCTURES (4)
• Hopfield networks are the best-understood class of recurrent networks
• They use bidirectional connections with symmetric weights W_{i,j} = W_{j,i}
• All of the units are both input and output units
• The activation function g is the sign function, and the activation levels can only be ±1
• A Hopfield network functions as an associative memory
  – after training on a set of examples, a new stimulus will cause the network to settle into an activation pattern corresponding to the example in the training set that most closely resembles the new stimulus
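A minimal associative-memory sketch in Python. The slide does not say how a Hopfield network is trained. The sketch assumes the common Hebbian outer-product rule with a zero diagonal and asynchronous sign updates, and the two stored patterns are invented for the illustration.

```python
def sign(x):
    return 1 if x >= 0 else -1

def train(patterns):
    """Hebbian rule (assumed): W[i][j] = sum over patterns of p[i] * p[j]."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, max_sweeps=10):
    """Update units one at a time until no unit changes."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            new = sign(sum(w[i][j] * state[j] for j in range(len(state))))
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = train([p1, p2])

noisy = [-1] + p1[1:]    # p1 with its first unit flipped
print(recall(W, noisy))  # settles back into p1
```

The weights come out symmetric by construction, matching W_{i,j} = W_{j,i}.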
FEED-FORWARD NEURAL NETWORKS
• SINGLE LAYER FEED-FORWARD NEURAL NETWORK
  – A network with all the inputs connected directly to the outputs is called a single layer network or a perceptron network.
  – Each output unit is independent of the others.
EXAMPLES
[Figure: a perceptron network and a single perceptron.]
PERCEPTRON (1)
• With a threshold activation function, the perceptron represents a Boolean function.
• A threshold perceptron returns 1 if and only if the weighted sum of its inputs (including the bias) is positive:
Σ_j W_j · I_j > 0
• The equation Σ_j W_j · I_j = 0 defines a hyperplane in the input space, so the perceptron returns 1 if and only if the input is on one side of that hyperplane.
• For this reason, the threshold perceptron is called a linear separator.
[Figure (a): Boolean function AND plotted in the (I_1, I_2) plane; axis marks at 0, 1 and 1.5.]
PERCEPTRON (2)
• LINEAR SEPARABILITY IN PERCEPTRONS
[Figure (b): Boolean function OR plotted in the (I_1, I_2) plane; axis marks at 0, 0.5 and 1.]
• Black dots indicate a point in the input space where the value of the function is 1, and white dots indicate a point where the value is 0.
• Each function is represented as a two-dimensional plot, based on the values of the two inputs.
PERCEPTRON (3)
• In Figure (a) one possible separating "plane" (a line) is defined by the equation
I_1 + I_2 = 1.5 or I_1 = −I_2 + 1.5
The region above the line, where the output is 1, is given by
I_1 + I_2 − 1.5 > 0
• In Figure (b) one possible separating "plane" (a line) is defined by the equation
I_1 + I_2 = 0.5 or I_1 = −I_2 + 0.5
The region above the line, where the output is 1, is given by
I_1 + I_2 − 0.5 > 0
LINEAR CLASSIFICATION (1)
EXAMPLE
Problem: CLASSIFICATION OF AIRPLANES
• RULES
IF WEIGHT > 0.80 AND SPEED < 0.55 THEN BOMBER
IF WEIGHT < 0.90 AND SPEED > 0.25 THEN FIGHTER
Suppose we have 10 examples of airplanes.
[Figure: scatter plot of the 10 examples; axes WEIGHT and SPEED; legend BOMBER, FIGHTER.]
LINEAR CLASSIFICATION (2)
• One possible separating line is defined by the equation
I_2 = 1.5 I_1 + 0.5
• The equation may be used for the decision-making rule
f(I_1, I_2) = 1.5 I_1 − I_2 + 0.5
IF f(I_1, I_2) ≥ 0 THEN FIGHTER
IF f(I_1, I_2) < 0 THEN BOMBER
• THE CORRESPONDING PERCEPTRON
[Figure: a perceptron with bias input 1 (weight 0.5), input I_1 (weight 1.5) and input I_2 (weight −1.0).]
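The decision rule and the corresponding perceptron can be compared with a short sketch. The function names are illustrative. Reading I_1 as speed and I_2 as weight is an assumption taken from the worked training example later in the lecture (speed = 0.4, weight = 0.8), and the second sample point is made up.

```python
def f(i1, i2):
    """Decision function from the separating line I_2 = 1.5 * I_1 + 0.5."""
    return 1.5 * i1 - i2 + 0.5

def classify(i1, i2):
    return "FIGHTER" if f(i1, i2) >= 0 else "BOMBER"

def perceptron_sum(i1, i2, weights=(0.5, 1.5, -1.0)):
    """Same rule as a perceptron: weights for (bias input 1, I_1, I_2)."""
    return sum(w * x for w, x in zip(weights, (1.0, i1, i2)))

print(classify(0.4, 0.8))  # f = 0.3, so FIGHTER
print(classify(0.2, 1.5))  # f = -0.7, so BOMBER
```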
LEARNING LINEARLY SEPARABLE FUNCTIONS (1)
• There is a perceptron algorithm that will learn any linearly separable function, given enough training examples.
• The idea behind most algorithms for neural network learning is to adjust the weights of the network to minimize some measure of the error on the training set.
• The initial network has randomly assigned weights, usually from the range [−0.5, 0.5].
LEARNING LINEARLY SEPARABLE FUNCTIONS (2)
• The network is then updated to try to make it consistent with the examples. This is done by making small adjustments in the weights to reduce the difference between the observed and predicted values (optimization search in weight space).
• Typically, the updating process is iterative. Each iteration involves updating all the weights for all the examples.
THE WEIGHT UPDATING PROCESS
• THE MEASURE OF ERROR
For a single training example
First case: ERR = T − O
where O is the output of the perceptron on the example, and T is the true output value.
Second case: [formula missing]
• If the error is positive, then O must be increased.
• If the error is negative, then O must be decreased.
• Each input unit contributes W_j · I_j to the total input, so if I_j is positive, an increase in W_j will tend to increase O, and if I_j is negative, an increase in W_j will tend to decrease O.
THE PERCEPTRON LEARNING RULE (1)
W_j = W_j + α · I_j · Err
where α is the learning rate.
EXAMPLE
[Figure: the initial network, with bias input 1 (weight 0.2), input I_1 (weight 0.2) and input I_2 (weight −0.5).]
The training example: the fighter, speed = 0.4, weight = 0.8.
The output
O = 0.2 · 0.4 + (−0.5) · 0.8 + 0.2 = 0.08 − 0.4 + 0.2 = −0.12
f(I_1, I_2) < 0, the classification is bomber
The error
ERR = 1 − 0 = 1
THE PERCEPTRON LEARNING RULE (2)
The updated weights (α = 1)
W_1 = 0.2 + 0.4 · 1 = 0.6
W_2 = −0.5 + 0.8 · 1 = 0.3
W_bias = 0.2 + 1 · 1 = 1.2
[Figure: the updated network, with bias input 1 (weight 1.2), input I_1 (weight 0.6) and input I_2 (weight 0.3).]
The new output
O = 0.6 · 0.4 + 0.3 · 0.8 + 1.2 = 0.24 + 0.24 + 1.2 = 1.68
f(I_1, I_2) > 0, the classification is fighter
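The worked example can be replayed in Python. The sketch assumes, as in the example, that the bias input is +1, that the classes are coded 1 = fighter and 0 = bomber, and that the error uses the thresholded output. The function names are illustrative.

```python
def weighted_sum(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))

def predict(weights, inputs):
    """1 = fighter when the weighted sum is non-negative, else 0 = bomber."""
    return 1 if weighted_sum(weights, inputs) >= 0 else 0

def update(weights, inputs, target, alpha=1.0):
    """Perceptron learning rule: W_j = W_j + alpha * I_j * Err."""
    err = target - predict(weights, inputs)
    return [w + alpha * x * err for w, x in zip(weights, inputs)]

weights = [0.2, 0.2, -0.5]  # [W_bias, W_1, W_2]
example = [1.0, 0.4, 0.8]   # [bias input, speed, weight], a fighter (target 1)

print(weighted_sum(weights, example))      # about -0.12: classified as bomber
new_weights = update(weights, example, target=1)
print(new_weights)                         # about [1.2, 0.6, 0.3]
print(weighted_sum(new_weights, example))  # about 1.68: classified as fighter
```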
MULTILAYERED FEED-FORWARD NETWORKS (1)
• Networks with one or more layers of hidden units are called multilayer networks.
[Figure: a multilayer network with input units a_k (I_1, I_2), hidden units a_j and output units a_i; weights W_{k,j} connect input units to hidden units and weights W_{j,i} connect hidden units to output units.]
• The advantage of adding hidden layers is that it enlarges the space of hypotheses that the network can represent.
MULTILAYERED FEED-FORWARD NETWORKS (2)
• With a single, sufficiently large hidden layer, it is possible to represent any continuous function of the inputs with arbitrary accuracy.
• Unfortunately, for any particular network structure, it is harder to characterize exactly which functions can be represented and which ones cannot. The problem of choosing the right number of hidden units in advance is still not well understood.
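MULTILAYERED FEED-FORWARD NETWORKS (3)
EXAMPLE
Boolean XOR function
[Figure: XOR plotted in the (I_1, I_2) plane. Linear classification is not possible.]
The two-layer feed-forward network
[Figure: hidden unit a_1 with bias weight 1.5 and weights −1 from I_1 and −1 from I_2; hidden unit a_2 with bias weight 0.5 and weights −1 from I_1 and −1 from I_2; output unit a_3 with bias weight −0.5, weight 1 from a_1 and weight −1 from a_2. The bias input is 1.]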
MULTILAYERED FEED-FORWARD NETWORKS (4)
First case: Input = (0, 0)
a_1 = (1 · 1.5) + (0 · (−1)) + (0 · (−1)) = 1.5      Output = 1
a_2 = (1 · 0.5) + (0 · (−1)) + (0 · (−1)) = 0.5      Output = 1
a_3 = (1 · (−0.5)) + (1 · 1) + (1 · (−1)) = −0.5     Output = 0
Second case: Input = (1, 0)
a_1 = (1 · 1.5) + (1 · (−1)) + (0 · (−1)) = 0.5      Output = 1
a_2 = (1 · 0.5) + (1 · (−1)) + (0 · (−1)) = −0.5     Output = 0
a_3 = (1 · (−0.5)) + (1 · 1) + (0 · (−1)) = 0.5      Output = 1
MULTILAYERED FEED-FORWARD NETWORKS (5)
Third case: Input = (0, 1)
a_1 = (1 · 1.5) + (0 · (−1)) + (1 · (−1)) = 0.5      Output = 1
a_2 = (1 · 0.5) + (0 · (−1)) + (1 · (−1)) = −0.5     Output = 0
a_3 = (1 · (−0.5)) + (1 · 1) + (0 · (−1)) = 0.5      Output = 1
Fourth case: Input = (1, 1)
a_1 = (1 · 1.5) + (1 · (−1)) + (1 · (−1)) = −0.5     Output = 0
a_2 = (1 · 0.5) + (1 · (−1)) + (1 · (−1)) = −1.5     Output = 0
a_3 = (1 · (−0.5)) + (0 · 1) + (0 · (−1)) = −0.5     Output = 0
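The four cases can be checked mechanically. The sketch below takes the weights from the worked computations (bias input 1 with weights 1.5, 0.5 and −0.5) and assumes a threshold unit that outputs 1 when its weighted sum is positive. The function names are illustrative.

```python
def step(x):
    """Threshold activation: 1 for a positive weighted sum, otherwise 0."""
    return 1 if x > 0 else 0

def xor_network(i1, i2):
    a1 = step(1 * 1.5 + i1 * -1 + i2 * -1)   # first hidden unit
    a2 = step(1 * 0.5 + i1 * -1 + i2 * -1)   # second hidden unit
    a3 = step(1 * -0.5 + a1 * 1 + a2 * -1)   # output unit
    return a3

for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print((i1, i2), xor_network(i1, i2))
```

With these weights the first hidden unit behaves like NAND, the second like NOR, and the output unit fires only when the first is on and the second is off.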
CLASSIFICATION EXAMPLE
Problem: Any pair of persons must be classified into two classes: brothers and friends. There are 3 brothers in each family. Persons from different families are friends.
[Figure: a network with input units John, Paul, George (one family) and Charles, Dick, Rodger (the other family), and outputs FRIENDS and BROTHERS; the link weights shown are 0.5, 1, 1.5, −1.5 and −1.]
LEARNING IN MULTILAYER NETWORKS
• Learning algorithms for multilayer networks are similar to the perceptron learning algorithm.
• Inputs are presented to the network, and if the network computes an output vector that matches the target, nothing is done.
• If there is an error (a difference between the output and target), then the weights are adjusted to reduce this error.
• The trick is to assess the blame for an error and divide it among the contributing weights, because in multilayer networks there are many weights connecting each input to each output, and each of these weights contributes to more than one output.
BACK-PROPAGATION LEARNING (1)
• The back-propagation algorithm is a sensible approach to dividing the contribution of each weight.
• At the output layer, the weight update rule uses the activation of the hidden unit a_j instead of the input value, and the rule contains a term for the gradient of the activation function:
W_{j,i} = W_{j,i} + α · a_j · Err_i · g'(in_i)
where Err_i is the error (T_i − O_i) at the output node and g'(in_i) is the derivative of the activation function g (for this reason the activation function must be differentiable; of the three common choices, only the sigmoid function qualifies for use in a multilayer network).
BACK-PROPAGATION LEARNING (2)
• A new error term Δ_i is defined, which for output nodes is Δ_i = Err_i · g'(in_i).
• The update rule now is the following
W_{j,i} = W_{j,i} + α · a_j · Δ_i
• The propagation rule for the Δ values is the following
Δ_j = g'(in_j) · Σ_i W_{j,i} · Δ_i
• The weight update rule for the weights between the inputs and the hidden layer is the following
W_{k,j} = W_{k,j} + α · I_k · Δ_j
SUMMARIZED BACK-PROPAGATION ALGORITHM
• Compute the Δ values for the output units using the observed error.
• Starting with the output layer, repeat the following for each layer in the network, until the earliest hidden layer is reached:
  – Propagate the Δ values back to the previous layer.
  – Update the weights between the two layers.
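A compact sketch of one back-propagation step for a network with a single hidden layer. It assumes the sigmoid activation, omits bias weights for brevity, and uses invented starting weights and an invented training example. All function and variable names are illustrative.

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    s = g(x)
    return s * (1.0 - s)

def forward(inputs, w_kj, w_ji):
    """w_kj[k][j]: input k -> hidden j; w_ji[j][i]: hidden j -> output i."""
    in_j = [sum(w_kj[k][j] * inputs[k] for k in range(len(inputs)))
            for j in range(len(w_kj[0]))]
    a_j = [g(x) for x in in_j]
    in_i = [sum(w_ji[j][i] * a_j[j] for j in range(len(a_j)))
            for i in range(len(w_ji[0]))]
    a_i = [g(x) for x in in_i]
    return in_j, a_j, in_i, a_i

def backprop_step(inputs, targets, w_kj, w_ji, alpha):
    in_j, a_j, in_i, a_i = forward(inputs, w_kj, w_ji)
    # Delta values for the output units, from the observed error.
    delta_i = [(targets[i] - a_i[i]) * g_prime(in_i[i]) for i in range(len(a_i))]
    # Propagate the delta values back to the hidden layer (old weights).
    delta_j = [g_prime(in_j[j]) * sum(w_ji[j][i] * delta_i[i]
                                      for i in range(len(delta_i)))
               for j in range(len(a_j))]
    # Update hidden-to-output weights, then input-to-hidden weights.
    for j in range(len(a_j)):
        for i in range(len(delta_i)):
            w_ji[j][i] += alpha * a_j[j] * delta_i[i]
    for k in range(len(inputs)):
        for j in range(len(delta_j)):
            w_kj[k][j] += alpha * inputs[k] * delta_j[j]

def squared_error(inputs, targets, w_kj, w_ji):
    a_i = forward(inputs, w_kj, w_ji)[3]
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, a_i))

w_kj = [[0.1, -0.2], [0.3, 0.4]]  # hypothetical starting weights
w_ji = [[0.5], [-0.3]]
x, t = [1.0, 0.0], [1.0]          # hypothetical training example

before = squared_error(x, t, w_kj, w_ji)
for _ in range(20):
    backprop_step(x, t, w_kj, w_ji, alpha=0.5)
after = squared_error(x, t, w_kj, w_ji)
print(before, after)              # the error shrinks on this example
```

The delta values for the hidden layer are computed before any weight is changed, so the propagation uses the same weights that produced the output.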
BUILDING OF NEURAL NETWORKS
• To build a neural network to perform some task, one must:
  – Decide how many units are to be used.
  – Decide what kind of units are appropriate.
  – Decide how the units are connected to form a network.
  – Initialize the weights of the network.
  – Decide how to encode the examples in terms of inputs and outputs of the network.
  – Train the weights using a learning algorithm applied to a set of training examples for the task.