Chapter 9. Artificial Neural Networks



Biological Neurons and Neural Networks


The human brain consists of a densely interconnected network of around 10 billion neurons, roughly the same number as the stars in a typical galaxy (and there are more than 100 billion galaxies in the universe). The brain's neural network provides it with enormous processing power, enabling it to perform computationally demanding perceptual acts such as face recognition and speech recognition. The brain also provides us with advanced control functions, such as walking, but perhaps most importantly, it can learn how to do all of this. Whereas a modern CPU chip's performance derives from its raw speed (3 GHz = 3,000,000,000 Hz), a neural network is slow, computing at a frequency of 10-100 Hz. The brain's performance derives from the massive parallelism of the neural network, where each of the 10 billion neurons may be connected to around 1000 other neurons.




To get an idea of the structure of a biological neural network, glance at Figure 1, which is a sketch of a section through an animal retina, one of the first ever visualisations of a neural network, produced by Golgi and Cajal, who received a Nobel Prize in 1906. You can see roundish neurons with their output axons. Some leave the area (those at the bottom, which form the 'optic nerve') and other axons input into other neurons via their input connections, called dendrites. Neuron e receives its input from four other neurons and sends its output down to the optic nerve. So we see that a neuron has several inputs, a body and one output which connects to other neurons.


It was realised that neurons function electrically, by sending electrical signals down their axons, based on flows of potassium and sodium ions. This signal is in the form of a pulse (rather like the sound of a hand clap). A single neuron can only emit a pulse ("fires") when the total input is above a certain threshold. This characteristic led to the McCulloch and Pitts model (1943) of the artificial neural network (ANN).


Figure 1.

Glance briefly at Figure 2, which illustrates how learning occurs in a biological neural network. It is assumed here that both cells A and B fire often and simultaneously over time. When this condition occurs, the strength of the synaptic connection between A and B increases. This concept was contributed by Hebb (1949) and is known as Hebbian Learning. The idea is that if two neurons are simultaneously active, then their interconnection should be strengthened.





The Mathematics of a Single Artificial Neuron.


The McCulloch-Pitts model of a single neuron is sketched in Figure 3. We can think about the calculation as proceeding in two parts: input processing, then the calculation of the output. The inputs to the neuron body are shown as $x_1, x_2, \ldots, x_n$. When the inputs (from the previous neurons) reach this neuron, their values are multiplied by the synaptic strength. As mentioned above, these strengths are dependent on how much learning has occurred. These strengths are called weights and are represented by $w_1, w_2$, etc. So a single weighted input is calculated as

$w_i x_i$

So the total weighted input to the neuron is

$\sum_{i=1}^{n} w_i x_i$
This total input is then passed on to the output-calculation part of the neuron. Here, if the total input is greater than some threshold, then the neuron will 'fire' and produce an output. This is shown in Figure 4, where the mathematical function used to model the output calculation (the "activation function") is a simple step function. Note that the output of the neuron is effectively binary: fire or no-fire, 0 or 1. While it is not directly relevant to our work here, it is interesting to note that correct choices of weights and thresholds make the neuron behave like logic gates, especially AND and NOT. Given that all CPUs can be described as a combination of AND and NOT gates, we conclude that neural nets can provide all the computations performed on PCs.
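To make this concrete, here is a minimal Python sketch (not from the chapter) of a McCulloch-Pitts neuron with a step activation. The weights and thresholds below are illustrative choices that make the neuron behave as AND and NOT gates, as described above.

```python
def mp_neuron(inputs, weights, threshold):
    """Step-activation neuron: fires (returns 1) if the weighted sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# AND gate: both inputs must be 1 for the weighted sum (1 + 1 = 2) to reach the threshold.
for a in (0, 1):
    for b in (0, 1):
        print(a, "AND", b, "=", mp_neuron([a, b], weights=[1, 1], threshold=2))

# NOT gate: a negative weight means the neuron fires only when the input is 0.
for a in (0, 1):
    print("NOT", a, "=", mp_neuron([a], weights=[-1], threshold=0))
```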




It is possible to relax the binary nature of the output, to obtain real numbers (such as decimals). This is done using the 'sigmoid' function, which is illustrated in Figure 5. The parameter k adjusts the slope of the transition between fire and no-fire states, as shown in Figure 5. The mathematical description of the sigmoid function is

$f(x) = \dfrac{1}{1 + e^{-kx}}$

It is clear that choosing a large value of k approximates the binary McCulloch-Pitts neuron more closely.
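As a quick illustration (the function name and sample points below are our own choices, not the chapter's), this sketch evaluates the sigmoid for k = 5 and k = 10 and shows that the larger k gives a sharper fire/no-fire transition:

```python
import math

def sigmoid(x, k=1.0):
    """Sigmoid activation f(x) = 1 / (1 + exp(-k*x)); larger k gives a steeper slope."""
    return 1.0 / (1.0 + math.exp(-k * x))

for x in (-1.0, -0.1, 0.0, 0.1, 1.0):
    print(f"x={x:+.1f}  k=5: {sigmoid(x, 5):.3f}  k=10: {sigmoid(x, 10):.3f}")
```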


Neural Networks and Braitenberg Vehicles


In order to understand how individual neurons may be connected together to form neural networks, let's consider the simple example of a "Braitenberg Vehicle". Braitenberg is a … He proposed his vehicles to demonstrate that apparently purposive behaviour does not need a representation of the external environment in a creature's brain. Rather, behaviour can be obtained by simply reacting to the environment in a structured manner.


Let's have a look at the vehicles shown in Figure 6 (from his book). Each vehicle has two eye sensors and two motor actuators. In his vehicle (a) the right eye is connected to the right motor and vice versa. As located in the diagram, the right eye receives more light than the left eye, since it is close to the light. So the drive to the right motor is greater than the drive to the left, and the vehicle moves away from the light. On the other hand, in his vehicle (b) the eyes and motors are cross-coupled, so now, when the left eye receives more light, it gives the right motor more drive, and the vehicle moves towards the light.
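As a rough sketch (the weight matrices below are illustrative, not taken from Braitenberg's book), the two wiring schemes can be written as 2 x 2 weight matrices, with each motor drive computed as a weighted sum of the eye readings:

```python
def motor_drives(left_eye, right_eye, weights):
    """weights[motor][eye]: rows are (left motor, right motor), columns (left eye, right eye)."""
    left_motor = weights[0][0] * left_eye + weights[0][1] * right_eye
    right_motor = weights[1][0] * left_eye + weights[1][1] * right_eye
    return left_motor, right_motor

direct = [[1.0, 0.0],   # vehicle (a): left eye -> left motor
          [0.0, 1.0]]   #              right eye -> right motor
crossed = [[0.0, 1.0],  # vehicle (b): right eye -> left motor
           [1.0, 0.0]]  #              left eye -> right motor

# Light to the right of the vehicle: the right eye reads more than the left.
left_eye, right_eye = 0.2, 0.8
print("vehicle (a):", motor_drives(left_eye, right_eye, direct))   # right motor stronger -> turns away
print("vehicle (b):", motor_drives(left_eye, right_eye, crossed))  # left motor stronger -> turns towards
```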




Figure 5. Sigmoid Functions. Left has k = 10, right has k = 5. Large k means steep slope.

Figure 6.

How does this relate to neural networks? Well, each eye can be considered as a sensor neuron and each motor is driven by an actuator neuron. So we have a basic 2-input, 2-output neural network, shown in Figure 7(a). This is indeed a simple network, yet it turns out that it can be trained to function as an AND gate, a NAND gate, an OR gate, and a NOR gate, as well as a NOT gate. However, it cannot ever function as an XOR gate.

Fortunately, more complex neural networks can be designed, as shown in Figure 7(b). Here there is an input layer of neurons and an output layer, but now also an in-between layer, the hidden layer. Each layer may contain an arbitrary number of neurons (which is related to the application), but in general the hidden layer contains more neurons than either the input or the output layers.




The mathematics of the network is simple: it's just a question of using the output of each neuron connected to the input of the neurons in the next layer, and computing the weighted sums as above. One important point to note about the network sketched in Figure 7(b): all neurons in the input layer are connected to all neurons in the hidden layer, and all neurons in the hidden layer are connected to all neurons in the output layer. Of course the strength of each connection may not be the same, and this is where learning comes into play.
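A minimal sketch of this forward pass, assuming a 2-3-2 layer arrangement, illustrative weights and the sigmoid activation from above, might look like this in Python:

```python
import math

def sigmoid(x, k=1.0):
    return 1.0 / (1.0 + math.exp(-k * x))

def layer_forward(inputs, weights):
    """weights[j][i] is the strength from input i to neuron j; each neuron outputs
    the sigmoid of its total weighted input."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

# 2 inputs -> 3 hidden neurons -> 2 outputs (illustrative weights, normally learned).
hidden_weights = [[0.5, -0.2], [0.1, 0.8], [-0.6, 0.3]]
output_weights = [[0.7, -0.4, 0.2], [-0.3, 0.5, 0.9]]

hidden = layer_forward([0.1, 0.9], hidden_weights)
outputs = layer_forward(hidden, output_weights)
print(outputs)
```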

Learning by Back-Propagation of Errors.


We saw above that we could construct a 2-in/2-out network to elicit two different behaviours of the vehicle. While this is interesting, it is not the point of ANNs, since, as we mentioned above, the biological neural networks in our brain are able to learn. So can we construct an ANN which can learn the two behaviours discussed above? Of course the answer is we can! We shall explore how to train an ANN, causing it to learn, in this section. Look at Figure 8. On the left is an untrained network for the vehicle problem. Each input neuron is connected to each output neuron with a random weight. To train the vehicle to move towards a light, we need to strengthen the "cross" connections and weaken the "direct" connections between the left and right eyes and actuators.





We do this by applying a training set of inputs and outputs to the net, and adjusting the weights to make the applied values agree. Specifically, we apply an input pair of values, and the net produces an output pair. We compare the net's output pair with the pair we require and use the error between the two to change the network weights. (How we do this is explained below.) This process (a "learning cycle") is repeated many times, until the error is sufficiently small.

For the above example, we need the following training set:


Training Set | Input to Left Eye Neuron | Input to Right Eye Neuron | Output to Left Motor Neuron | Output to Right Motor Neuron
A            | 0.1                      | 0.9                       | 0.9                         | 0.1
B            | 0.9                      | 0.1                       | 0.1                         | 0.9


This set is applied to the ANN, while it is in the learning state, hundreds or thousands of times. We can monitor the "mean squared error" (MSE) between actual and desired outputs, and stop the learning when this is at an acceptable level. The MSE is the average of the squared errors over all the output neurons; the "squared" is to prevent negative errors from cancelling out positive errors.
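For example, with two output neurons the MSE is the average of the two squared errors (the values below are illustrative):

```python
# Illustrative desired and actual outputs for the two output neurons.
desired = [0.9, 0.1]
actual = [0.7, 0.3]
mse = sum((d - y) ** 2 for d, y in zip(desired, actual)) / len(desired)
print(mse)  # 0.04
```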


The mathematics of the back-propagation of errors is straightforward. Let's say for output neuron $i$, the desired output is $d_i$ and the actual output is $y_i$, which gives us an error $(d_i - y_i)$. So we change the weight of each neuron $j$ feeding into this output neuron (with value $x_j$) according to the expression

$\Delta w_{ij} = \eta \, x_j (d_i - y_i)$

where $\eta$ is the rate of learning. While it is not possible here to explain how this expression is derived, it is straightforward to understand. The change in weight $\Delta w_{ij}$ of each neuron $j$ feeding into neuron $i$ is determined both by the error on neuron $i$ and by the value $x_j$ of the input neuron $j$. This is Hebbian Learning as discussed above: the synaptic strength is determined by the simultaneous activity of pairs of neurons. The effect of changing the weight by the amount $\Delta w_{ij}$ is to reduce the output error $(d_i - y_i)$.
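Putting the pieces together, here is a minimal sketch (our own, assuming linear output neurons and a learning rate of 0.1) of training the 2-input/2-output vehicle network on training set A and B using the weight-update rule above:

```python
import random

# Train the 2-in/2-out network with the rule: delta_w[i][j] = eta * x[j] * (d[i] - y[i]).
training_set = [([0.1, 0.9], [0.9, 0.1]),   # pattern A
                ([0.9, 0.1], [0.1, 0.9])]   # pattern B

weights = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
eta = 0.1                                    # learning rate (assumed value)

for cycle in range(2000):                    # learning cycles
    mse = 0.0
    for x, d in training_set:
        y = [sum(weights[i][j] * x[j] for j in range(2)) for i in range(2)]
        for i in range(2):
            error = d[i] - y[i]
            mse += error ** 2
            for j in range(2):
                weights[i][j] += eta * x[j] * error
    if mse / 4 < 1e-6:                       # average over 2 patterns x 2 outputs
        break

print("trained weights:", weights)           # "cross" weights end up strong, "direct" weak
```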

Coding of some Simple ANNs


[next instance]

Applications of Neural Networks


[next instance]