# Neural Networks

AI and Robotics

Oct 20, 2013


## Background

Neural networks can be:

- Biological models
- Artificial models

The goal is to produce artificial systems capable of sophisticated computations similar to those of the human brain.

## Biological Analogy and Some Main Ideas

- The brain is composed of a mass of interconnected neurons; each neuron is connected to many other neurons.
- Neurons transmit signals to each other.
- Whether a signal is transmitted is an all-or-nothing event (the electrical potential in the cell body of the neuron is thresholded).
- Whether a signal is sent depends on the strength of the bond (synapse) between the two neurons.

## How Does the Brain Work? (1)

- NEURON: the cell that performs information processing in the brain.
- It is the fundamental functional unit of all nervous system tissue.
- Each neuron consists of a SOMA, DENDRITES, an AXON, and SYNAPSES.

## How Does the Brain Work? (2)

## Brain vs. Digital Computers (1)

- Computers require hundreds of cycles to simulate the firing of a single neuron.
- The brain can fire all of its neurons in a single step: parallelism.
- Serial computers require billions of cycles to perform some tasks that the brain completes in less than a second, e.g. face recognition.

## Definition of a Neural Network

A neural network is a system composed of many simple processing elements operating in parallel which can acquire, store, and utilize experiential knowledge.

## Artificial Neural Network?

## Neurons vs. Units (1)

- Each element of a neural network is a node called a unit.
- Units are connected by links, and each link has a numeric weight.

## Neurons vs. Units (2)

A real neuron is far removed from our simplified model unit: chemistry, biochemistry, and quantum effects all play a role in the real thing.

## Computing Elements

A typical unit computes a weighted sum of its inputs and then applies an activation function to the result.

## Planning in Building a Neural Network

Decisions must be taken on the following:

- The number of units to use.
- The type of units required.
- The connections between the units.

## Issues to Be Discussed

- Initializing the weights.
- Choosing a learning algorithm.
- Assembling a set of training examples.
- Encoding the examples as inputs.
- Converting the output into meaningful results.

## Neural Network Example

A very simple two-layer, feed-forward network with two inputs, two hidden nodes, and one output node.

## Simple Computations in This Network

There are two types of components, linear and non-linear:

- Linear: the input function calculates the weighted sum of all inputs.
- Non-linear: the activation function transforms the sum into an activation level.

## Calculations

Input function (the weighted sum of the unit's inputs):

$$in_i = \sum_j W_{j,i} \, a_j$$

Activation function $g$ (applied to the weighted sum):

$$a_i = g(in_i)$$

## A Computing Unit

Now in more detail, but for a particular model only.

## Activation Functions

Use different activation functions to obtain different models. The three most common choices are:

1) Step function
2) Sign function
3) Sigmoid function

An output of 1 represents the firing of a neuron down the axon.

## Step Function Perceptrons

## 3 Activation Functions
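As a concrete illustration, here is a minimal NumPy sketch of the three activation functions; the vectorized form and the convention of treating an input of exactly 0 as "firing" are my own choices, not from the slides:

```python
import numpy as np

def step(x):
    # Step function: 1 if the input reaches the threshold (0), else 0.
    return np.where(x >= 0, 1.0, 0.0)

def sign(x):
    # Sign function: +1 for non-negative input, -1 otherwise.
    return np.where(x >= 0, 1.0, -1.0)

def sigmoid(x):
    # Sigmoid: a smooth, differentiable "squashing" of x into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

if __name__ == "__main__":
    xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(step(xs))
    print(sign(xs))
    print(sigmoid(xs))
```

The sigmoid is the choice that matters for backpropagation later, because unlike the step and sign functions it is differentiable.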

## Standard Structure of an Artificial Neural Network

- Input units: represent the input as a fixed-length vector of numbers (user-defined).
- Hidden units: calculate thresholded weighted sums of the inputs; they represent intermediate calculations that the network learns.
- Output units: represent the output as a fixed-length vector of numbers.

## Representations

- Logic rules: e.g. if color = red ∧ shape = square then +.
- Decision trees: a tree.
- Nearest neighbor: the training examples themselves.
- Probabilities: a table of probabilities.
- Neural networks: inputs in [0, 1].

Neural networks can be used for all of these representations, and many variants exist.

## Notation

## Operation of Individual Units

$$Output_i = f\left(W_{i,j} \cdot Input_j + W_{i,k} \cdot Input_k + W_{i,l} \cdot Input_l\right)$$

where $f(x)$ is a threshold (activation) function, for example the sigmoid

$$f(x) = \frac{1}{1 + e^{-x}}$$

or a step function.
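A one-line version of this unit computation in NumPy; the weight and input values are made up for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up weights W_i,j, W_i,k, W_i,l and the three inputs to unit i.
weights = np.array([0.5, -0.3, 0.8])
inputs = np.array([1.0, 0.0, 1.0])

# Output_i = f(W_i,j * Input_j + W_i,k * Input_k + W_i,l * Input_l)
output_i = sigmoid(np.dot(weights, inputs))
print(output_i)  # sigmoid(1.3) ~ 0.786
```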

## Artificial Neural Networks

## Perceptron Learning Theorem

Recap: a perceptron (threshold unit) can *learn* anything that it can *represent* (i.e. anything separable with a hyperplane).

## The Exclusive-OR Problem

A perceptron cannot represent exclusive OR (XOR), since XOR is not linearly separable: no single line in the input plane puts (0,0) and (1,1) on one side and (0,1) and (1,0) on the other.
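A quick way to see this empirically is to run the classical perceptron learning rule on both AND (separable) and XOR (not separable). The sketch below uses an arbitrary learning rate and epoch count of my choosing:

```python
import numpy as np

def train_perceptron(X, t, epochs=100, eta=0.1):
    """Perceptron learning rule for a step-threshold unit.
    The bias is folded in as an extra weight on a constant input of 1."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, t):
            y = 1.0 if np.dot(w, x) >= 0 else 0.0
            w += eta * (target - y) * x  # update only on mistakes
    return w

def accuracy(w, X, t):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    y = (Xb @ w >= 0).astype(float)
    return np.mean(y == t)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t_and = np.array([0, 0, 0, 1], dtype=float)  # linearly separable
t_xor = np.array([0, 1, 1, 0], dtype=float)  # not linearly separable

print("AND accuracy:", accuracy(train_perceptron(X, t_and), X, t_and))  # reaches 1.0
print("XOR accuracy:", accuracy(train_perceptron(X, t_xor), X, t_xor))  # stuck below 1.0
```

The perceptron converges on AND, but on XOR it can never reach perfect accuracy no matter how long it trains, because its final weights always define a single linear separator.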

## Properties of the Architecture

- No connections within a layer.
- No direct connections between input and output layers.
- Fully connected between layers.
- Often more than 3 layers.
- The number of output units need not equal the number of input units.
- The number of hidden units per layer can be more or fewer than the number of input or output units.
- Each unit is a perceptron.
- A bias is often included as an extra weight.

## Conceptually: Forward Activity, Backward Error

## Backpropagation Learning Algorithm ('BP')

Backpropagation is the solution to the credit assignment problem in the MLP. It is usually credited to Rumelhart, Hinton and Williams (1986), though it was actually invented earlier, in a PhD thesis relating to economics.

BP has two phases:

- Forward pass phase: computes the 'functional signal' by feed-forward propagation of the input pattern signals through the network.
- Backward pass phase: computes the 'error signal' and propagates the error backwards through the network, starting at the output units (where the error is the difference between actual and desired output values).

## Forward Propagation of Activity

Step 1: Initialize the weights at random and choose a learning rate η.

Until the network is trained, for each training example (i.e. input pattern and target output(s)):

Step 2: Do a forward pass through the net (with fixed weights) to produce the output(s), i.e., in the forward direction, layer by layer:

- Inputs are applied.
- They are multiplied by the weights.
- The products are summed.
- The sum is 'squashed' by the sigmoid activation function.
- The output is passed to each neuron in the next layer.

Repeat the above until the network output(s) are produced. A code sketch of this forward pass is given below.
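Here is a minimal sketch of the forward pass for the 2-2-1 network from the earlier example; all weight values are made up, and the bias is handled as an extra weight on a constant input of 1:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W_hidden, W_output):
    """One forward pass through a 2-2-1 feed-forward net.
    Each weight row carries a trailing bias weight."""
    xb = np.append(x, 1.0)      # input plus constant bias term
    h = sigmoid(W_hidden @ xb)  # hidden activations: weight, sum, squash
    hb = np.append(h, 1.0)      # hidden activations plus bias term
    y = sigmoid(W_output @ hb)  # output activation(s)
    return h, y

# Made-up weights: 2 hidden units x (2 inputs + bias),
# and 1 output unit x (2 hidden units + bias).
W_hidden = np.array([[0.2, -0.1, 0.05],
                     [0.4,  0.3, -0.2]])
W_output = np.array([[0.6, -0.4, 0.1]])

h, y = forward(np.array([1.0, 0.0]), W_hidden, W_output)
print("hidden:", h, "output:", y)
```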

## Step 3: Back-Propagation of Error

## 'Back-Prop' Algorithm Summary (with Maths!)
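For reference, a standard formulation of these equations for a sigmoid MLP (not necessarily the slide's exact notation): with $o_i$ the output of unit $i$, $t_k$ the target for output unit $k$, and $\eta$ the learning rate,

$$\delta_k = o_k (1 - o_k)(t_k - o_k) \qquad \text{(output units)}$$

$$\delta_j = o_j (1 - o_j) \sum_k w_{j,k} \, \delta_k \qquad \text{(hidden units)}$$

$$\Delta w_{i,j} = \eta \, \delta_j \, o_i$$

The factor $o(1-o)$ is the derivative of the sigmoid, which is why BP needs a differentiable activation function.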

## 'Back-Prop' Algorithm Summary (with NO Maths!)

## MLP/BP: A Worked Example

## Worked Example: Forward Pass

## Worked Example: Backward Pass

## Worked Example: Update Weights

The weights are updated using the generalized delta rule (BP), and similarly for all of the weights w_ij.

## Verification That It Works
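The numbers from the original slides are not reproduced here, so the following self-contained sketch re-creates the same four steps (forward pass, backward pass, weight update, verification that the error shrinks) on the 2-2-1 network, with made-up weights, input, and target:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

eta = 0.5                          # learning rate (arbitrary choice)
x = np.array([1.0, 0.0])           # input pattern (made up)
t = 1.0                            # target output (made up)
W1 = np.array([[0.2, -0.1, 0.05],  # hidden weights (+ bias column)
               [0.4,  0.3, -0.2]])
W2 = np.array([0.6, -0.4, 0.1])    # output weights (+ bias)

def forward(W1, W2, x):
    xb = np.append(x, 1.0)
    h = sigmoid(W1 @ xb)
    hb = np.append(h, 1.0)
    y = sigmoid(W2 @ hb)
    return xb, h, hb, y

# --- Forward pass ---
xb, h, hb, y = forward(W1, W2, x)
print("output before update:", y, "error:", 0.5 * (t - y) ** 2)

# --- Backward pass: deltas via the generalized delta rule ---
delta_out = y * (1 - y) * (t - y)               # output unit delta
delta_hid = h * (1 - h) * W2[:2] * delta_out    # hidden unit deltas

# --- Update weights: w += eta * delta * input feeding that weight ---
W2 = W2 + eta * delta_out * hb
W1 = W1 + eta * np.outer(delta_hid, xb)

# --- Verification: the error on this example has decreased ---
_, _, _, y_new = forward(W1, W2, x)
print("output after update:", y_new, "error:", 0.5 * (t - y_new) ** 2)
```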

## Training

- This was a single iteration of back-prop.
- Training requires many iterations with many training examples, i.e. epochs (one epoch is one entire presentation of the complete training set).
- It can be slow!
- Note that computation in the MLP is local (with respect to each neuron), so a parallel implementation is also possible.

A minimal epoch-based training loop is sketched below.
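This sketch trains the 2-2-1 sigmoid network on XOR by repeating the forward/backward/update steps over many epochs. All hyperparameters are arbitrary choices of mine, and backprop on XOR can occasionally stall in a local minimum depending on the random initialization:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 0], dtype=float)  # XOR targets

W1 = rng.normal(0, 0.5, (2, 3))  # hidden weights (+ bias column)
W2 = rng.normal(0, 0.5, 3)       # output weights (+ bias)
eta = 0.5

for epoch in range(10000):       # one epoch = one full pass over the set
    for x, t in zip(X, T):
        xb = np.append(x, 1.0)
        h = sigmoid(W1 @ xb)
        hb = np.append(h, 1.0)
        y = sigmoid(W2 @ hb)
        delta_out = y * (1 - y) * (t - y)
        delta_hid = h * (1 - h) * W2[:2] * delta_out
        W2 += eta * delta_out * hb
        W1 += eta * np.outer(delta_hid, xb)

# After training, the outputs should typically approach the XOR targets.
for x, t in zip(X, T):
    xb = np.append(x, 1.0)
    y = sigmoid(np.append(sigmoid(W1 @ xb), 1.0) @ W2)
    print(x, "->", round(float(y), 3), "target", t)
```

Note how many passes this takes even on a four-example problem: "it can be slow!" is not an exaggeration.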

## Training and Testing Data

How many examples? The more the merrier!

Use disjoint training and testing data sets: learn from the training data, but evaluate performance (generalization ability) on unseen test data.

Aim: minimize the error on the test data.
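A minimal sketch of this discipline; the dataset here is synthetic, and the 80/20 split ratio is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic dataset: 100 examples, 2 features, binary labels.
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Disjoint split: 80% for training, 20% held out for testing.
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:80], idx[80:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# Train on X_train/y_train only; report error on the unseen test set.
# (Training code omitted; see the XOR loop above for a full example.)
print(len(X_train), "training examples,", len(X_test), "test examples")
```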

## More Resources

- Binary Logic Unit in an example: http://www.cs.usyd.edu.au/~irena/ai01/nn/5.html
- MultiLayer Perceptron Learning Algorithm: http://www.cs.usyd.edu.au/~irena/ai01/nn/8.html