# Lecture 9: Feed-Forward Neural Networks

AI and Robotics

Oct 20, 2013


Learning outcomes

You will be able to:

describe the common feed-forward neural network architecture;

describe various common transfer functions;

know how to code different types of data for input to a network.

We have met the hardlim transfer function. We do, however, allow other transfer functions; in fact, for multilayer networks we need other transfer functions because hardlim makes training ineffective.

The most common are:

hardlim

hardlims

purelin

logsig (or sigmoid)

tansig
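
As a rough illustration, the five transfer functions named above can be sketched in plain Python. The names follow the list; the definitions are the usual ones for these names (hardlim steps from 0 to 1 at zero, hardlims steps from -1 to 1, tansig is the hyperbolic tangent), and the convention that an input of exactly 0 gives 1 is an assumption of this sketch.

```python
import math

def hardlim(n):
    """Hard limit: 1 if n >= 0, otherwise 0."""
    return 1.0 if n >= 0 else 0.0

def hardlims(n):
    """Symmetric hard limit: 1 if n >= 0, otherwise -1."""
    return 1.0 if n >= 0 else -1.0

def purelin(n):
    """Linear: the output is just the net input."""
    return n

def logsig(n):
    """Logistic sigmoid, squashes any n into (0, 1)."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)  # avoids overflow for large negative n
    return e / (1.0 + e)

def tansig(n):
    """Hyperbolic tangent sigmoid, squashes any n into (-1, 1)."""
    return math.tanh(n)

for n in (-2.0, 0.0, 2.0):
    print(n, hardlim(n), hardlims(n), purelin(n), logsig(n), tansig(n))
```

Note that hardlim and hardlims are flat everywhere except at the jump, so they give gradient-based training nothing to work with; the smooth functions logsig and tansig do not have this problem.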

How do we use feed-forward neural networks (or the five-second guide to neural network modelling)

Feed-forward nets are used to classify patterns, recognise things or calculate functions. We get them to do this by supervised training: that is, we present examples to them and say "do this thing when you see something like this other thing". After training (if we get it right) the network not only knows how to behave with the data we trained it on but also acts correctly with completely new data. Sometimes this very simple idea gets lost in the detail, because the things we want the net to recognise don't come ready-made to put into a computer. We have to do some work to get them into the right form.

Description of the Feed-Forward Neural Network used for function approximation

The standard feed-forward network that we will use attempts to model a function.

It has three layers:

input layer: no transfer function

hidden layer: uses the logsig transfer function

output layer: uses the purelin transfer function
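
A minimal sketch of how a signal passes through such a three-layer network, assuming each hidden and output neuron also has a bias term. The function name `forward` and the weight values in the usage lines are made up for illustration.

```python
import math

def logsig(n):
    """Logistic sigmoid used by the hidden layer."""
    return 1.0 / (1.0 + math.exp(-n))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input (no transfer) -> hidden (logsig) -> output (purelin).

    w_hidden holds one row of input weights per hidden neuron;
    w_out holds one row of hidden weights per output neuron.
    """
    hidden = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # purelin: the output is the weighted sum itself
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]

# Illustrative (made-up) weights: 2 inputs, 2 hidden neurons, 1 output.
print(forward([1.0, 2.0],
              w_hidden=[[0.5, -0.3], [0.1, 0.2]], b_hidden=[0.0, 0.1],
              w_out=[[1.0, -1.0]], b_out=[0.5]))
```

Because the output layer is linear, the network can produce values outside the range 0 to 1, which is what we want when approximating a general function.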

How many neurons in each layer?

input: determined by the function

hidden: ????? (choose to get a good fit)

output: determined by the function (well, almost)

eg Credit scoring. Input size depends on the data you possess. Output size depends on what you are after: 1 if a credit score, 1 if yes/no, 3 if £x at y% paid back over z months.

An example problem

We have some data on irises: numerical values and the classification of the iris the data was taken from. Can we tell what kind of iris we are looking at just from the data (and so do away with the need for botanists to do this job for us)?

See the data in iris.data. [we will make available in the labs]

A sample of this data is:

6, 2.7, 5.1, 1.6, Iris-versicolor

6.7, 3.1, 4.7, 1.5, Iris-versicolor

4.3, 3, 1.1, 0.1, Iris-setosa

6, 2.2, 5, 1.5, Iris-virginica

5.8, 2.6, 4, 1.2, Iris-versicolor

The data is interpreted as follows:

6, 2.7, 5.1, 1.6 are the observations of 4 features of the iris. This iris is of type Iris-versicolor.

4.3, 3, 1.1, 0.1 are measurements on an iris of type Iris-setosa.

We need to get this into a form we can classify with a neural net, so we use a numeric coding. See the data in irisnumeric.data.

A sample of this data is:

6 2.7 5.1 1.6 1

6.7 3.1 4.7 1.5 1

4.3 3 1.1 0.1 2

6 2.2 5 1.5 0

5.8 2.6 4 1.2 1

The data is interpreted as follows:

6, 2.7, 5.1, 1.6 are the observations of 4 features of the iris. This iris is of type 1 or Iris-versicolor.

4.3, 3, 1.1, 0.1 are measurements on an iris of type 2 or Iris-setosa.
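
The conversion from the labelled data to the numeric coding can be sketched as follows. The species codes (0 virginica, 1 versicolor, 2 setosa) are read off the samples above; the comma-separated line format and the helper name `encode_line` are assumptions of this sketch, not a description of how irisnumeric.data was actually produced.

```python
# Numeric codes for the species, as used in the numeric sample above.
CLASS_CODES = {"Iris-virginica": 0, "Iris-versicolor": 1, "Iris-setosa": 2}

def encode_line(line):
    """Turn 'f1, f2, f3, f4, species' into [f1, f2, f3, f4, code]."""
    parts = [p.strip() for p in line.split(",")]
    features = [float(p) for p in parts[:4]]
    return features + [CLASS_CODES[parts[4]]]

sample = [
    "6, 2.7, 5.1, 1.6, Iris-versicolor",
    "6.7, 3.1, 4.7, 1.5, Iris-versicolor",
    "4.3, 3, 1.1, 0.1, Iris-setosa",
    "6, 2.2, 5, 1.5, Iris-virginica",
    "5.8, 2.6, 4, 1.2, Iris-versicolor",
]

encoded = [encode_line(line) for line in sample]
for row in encoded:
    print(row)
```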

We want to train a NN to recognise such data, i.e. we want to plug in four values for the features and have the network say "That was an Iris-versicolor" (or rather "That was a number 1", or rather output 1).

So we create a network with four inputs, some number of neurons in the hidden layer (3, say) and one neuron in the output layer, and use the default transfer functions.

[picture]

We give it lots of samples to train on, and if it works, fine. If not, we try altering the hidden layer size or using other transfer functions.
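
To make the idea concrete, here is a toy sketch of training a 4-3-1 network (logsig hidden layer, purelin output) on the five sample rows above. The training method (plain per-sample gradient descent on squared error), the learning rate, the number of epochs and the random starting weights are all assumptions chosen for illustration; the lab software may well train differently, and five rows are far too few for a real model.

```python
import math
import random

def logsig(n):
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)

def forward(x, W1, b1, W2, b2):
    """Return hidden activations and the single (purelin) output."""
    h = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hj for w, hj in zip(W2, h)) + b2
    return h, y

def mse(data, W1, b1, W2, b2):
    """Mean squared error over the data set."""
    return sum((forward(x, W1, b1, W2, b2)[1] - t) ** 2
               for x, t in data) / len(data)

def train(data, n_hidden=3, lr=0.01, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
          for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x, W1, b1, W2, b2)
            err = y - t  # derivative of 0.5 * (y - t)^2 with respect to y
            for j in range(n_hidden):
                # error passed back through the logsig of hidden neuron j
                delta = err * W2[j] * h[j] * (1.0 - h[j])
                W2[j] -= lr * err * h[j]
                b1[j] -= lr * delta
                for i in range(n_in):
                    W1[j][i] -= lr * delta * x[i]
            b2 -= lr * err
    return W1, b1, W2, b2

# The five sample rows: four features and the numeric class code.
data = [
    ([6.0, 2.7, 5.1, 1.6], 1),
    ([6.7, 3.1, 4.7, 1.5], 1),
    ([4.3, 3.0, 1.1, 0.1], 2),
    ([6.0, 2.2, 5.0, 1.5], 0),
    ([5.8, 2.6, 4.0, 1.2], 1),
]

print("error before training:", mse(data, *train(data, epochs=0)))
print("error after training: ", mse(data, *train(data, epochs=2000)))
```

If the error does not come down far enough, the knobs to try first are the ones mentioned above: `n_hidden` and the choice of transfer function.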

How to code information into the inputs:

Numeric input

eg weight, height, salary, age data (assuming a credit score output). These are probably left as numerical inputs: one neuron each.

Picture

However, beware of data over time. Example with inflation: don't use raw numbers but categorise as low, medium, high, for example.

Non-Numeric Input

More data: weight, height, salary, age, gender.

Gender (male/female) is categorical data, not a number. Convert to numeric: 0 for male, say, and 1 for female.

Even more data: ethnic origin (white european, black british, black african, asian subcontinent etc.).

Here one choice is a single neuron taking the values 0, 0.25, 0.5, 0.75, 1.0, say.

Alternatively use a bit map approach: a group of neurons for ethnic origin. Code ethnic origin as a bit pattern: 0 0 0 white, 0 0 1 black british etc.

picture

There are three main types of coding data:

Linear or Local [suitable for numeric or ordered categorical data (eg income on a categorised scale: low income 1, middle 2, high 3)]

Binary coding [for categorical data where there is no expected relationship (eg ethnic origin or Iris data). Only used to reduce output dimension; try to avoid it, since it introduces a hidden order]

One-of-n or Distributed [for categorical data where there is no expected relationship (eg ethnic origin or Iris data). Preferred over binary but may be unwieldy if n is big]
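
The three codings can be sketched side by side. The function names and the order of the category lists are illustrative assumptions; in particular, which bit pattern goes with which category in the binary coding is arbitrary.

```python
def linear_code(value, ordered_levels):
    """Linear/local coding: position on an ordered scale, starting at 1."""
    return ordered_levels.index(value) + 1

def binary_code(value, categories):
    """Binary coding: the category's index written as a bit pattern."""
    n_bits = max(1, (len(categories) - 1).bit_length())
    idx = categories.index(value)
    return [(idx >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]

def one_of_n_code(value, categories):
    """One-of-n coding: one neuron per category, only the matching one is 1."""
    return [1 if c == value else 0 for c in categories]

income_levels = ["low", "middle", "high"]
species = ["Iris-virginica", "Iris-versicolor", "Iris-setosa"]

print(linear_code("middle", income_levels))   # one input neuron
print(binary_code("Iris-setosa", species))    # two neurons for three species
print(one_of_n_code("Iris-setosa", species))  # three neurons, one per species
```

Binary coding needs fewer neurons, but two categories whose patterns share a bit look "closer" to the network than they really are; one-of-n avoids this at the cost of n neurons.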

NB: coding can affect how easy it is to find a network which recognises the data.

How to code information into the outputs:

Similar choices apply to the output neurons. If you want to train a network to recognise categories you need to code them somehow. There is the additional problem of decoding the output you actually get (this applies whatever you do).

Suppose we expect numerical output. Then if the neuron spits out 25, OK, it's 25.

Suppose instead we expect categorical output: 0 male, 1 female. The neuron spits out 0.4?? Well, it's nearer to 0, so male. [You might want to know what values the NN produced for the training data before deciding to put the cut-off at 0.5].

picture

Similarly with distributed output 0.2 0.3 0.7: is this really 0 0 1?

The usual rule is to take the winning neuron as a 1.
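
Both decoding rules can be sketched in a few lines. The function names are illustrative, and the default cut-off of 0.5 is only the starting point suggested above; it may need adjusting after looking at the outputs on the training data.

```python
def decode_threshold(output, cutoff=0.5):
    """Single-neuron category: below the cut-off is 0, otherwise 1."""
    return 1 if output >= cutoff else 0

def decode_winner(outputs):
    """Distributed output: the neuron with the largest value becomes the 1."""
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    return [1 if i == winner else 0 for i in range(len(outputs))]

print(decode_threshold(0.4))           # nearer to 0, so category 0 (male)
print(decode_winner([0.2, 0.3, 0.7]))  # third neuron wins
```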

How many neurons

No definite rule but… there is a "rule of thumb" about the relationship between the number of weights and the number of training values needed to give a certain level of performance:

#training set > #Weights/error proportion

eg if we want 10% as the maximum error and design a network with 20 weights: 20/0.1 = 200. We need 200 data points in the training set.

Alternatively, if you have a network with 15 weights and only 50 data points: error = #W/#train = 15/50 ~ 1/3. So the network won't be very good.
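
The arithmetic of the rule of thumb is easy to put into code. The helper `count_weights` is an illustrative addition: whether bias terms are counted as weights is an assumption, so it is left as an option.

```python
def count_weights(layer_sizes, include_biases=True):
    """Count the weights in a fully connected feed-forward network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out
        if include_biases:
            total += n_out
    return total

def training_points_needed(n_weights, error_proportion):
    """Rule of thumb: #training set > #weights / error proportion."""
    return n_weights / error_proportion

def expected_error(n_weights, n_training):
    """The same rule turned round: error ~ #weights / #training set."""
    return n_weights / n_training

print(training_points_needed(20, 0.1))  # the 20-weight example
print(expected_error(15, 50))           # the 15-weight, 50-point example
print(count_weights([4, 3, 1]))                        # 4-3-1 net with biases
print(count_weights([4, 3, 1], include_biases=False))  # connections only
```

For a sense of scale, a 4-3-1 network has 15 connection weights, or 19 if the biases are counted as well.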