20 - Neural Networks

AI and Robotics

Oct 20, 2013


LINEAR CLASSIFICATION

Biological inspirations

Some numbers…

The human brain contains about 10 billion nerve cells
(neurons)

Each neuron is connected to other neurons through about
10,000 synapses

Properties of the brain

It can learn, reorganize itself from experience

It is robust and fault tolerant

Biological neuron (simplified model)

A neuron has

A branching input (dendrites)

A branching output (the axon)

The information circulates from the dendrites to the
axon via the cell body

The cell body sums up the inputs in some way and fires
(generates a signal through the axon) if the result is
greater than some threshold

An Artificial Neuron

[Figure: an artificial neuron: inputs x_i with weights w_i feeding a weighted sum and an activation function]

Definition: a non-linear, parameterized function with a
restricted output range

Activation Function

Usually not pictured (we’ll
see why), but you can
imagine a threshold
parameter here.

Same Idea using the Notation in the Book

in_j = Σ_i w_{i,j} a_i , with a_0 = 1

a_j = g( in_j ) = g( Σ_i w_{i,j} a_i )

The Output of a Neuron

As described so far…

This simplest form of a neuron is also called a
perceptron.
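As a concrete illustration, a hard-threshold perceptron can be sketched in a few lines of Python (the function name, weights, and threshold here are illustrative choices, not values from the slides):

```python
def perceptron(inputs, weights, threshold):
    """Fire (output 1) if the weighted sum of the inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Two inputs with unit weights and a threshold of 1.5:
# the unit fires only when both inputs are on (logical AND)
print(perceptron([1, 1], [1.0, 1.0], 1.5))  # 1
print(perceptron([1, 0], [1.0, 1.0], 1.5))  # 0
```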

The Output of a Neuron

Other possibilities, such as the sigmoid function for
continuous output.

a = 1 / ( 1 + e^( −in / p ) )

in is the activation of the neuron

p is a parameter which controls the shape of the
curve (usually 1.0)
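The sigmoid formula translates directly to code; a minimal sketch (parameter names are assumptions):

```python
import math

def sigmoid(in_activation, p=1.0):
    """a = 1 / (1 + e^(-in/p)); p controls the shape of the curve."""
    return 1.0 / (1.0 + math.exp(-in_activation / p))

print(sigmoid(0.0))  # 0.5: the curve is centered at zero
```

Unlike the step function, the output varies continuously between 0 and 1, which is what makes gradient-based learning possible later on.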

Linear Regression using a Perceptron

Linear regression:

Find a linear function (straight line)

y = w_1 x + w_0

that best predicts the continuous-valued output.

Linear Regression As an Optimization
Problem

Finding the optimal weights could be solved through:

Simulated annealing

Genetic algorithms

… and now Neural Networks

Linear Regression using a Perceptron


[Figure: a perceptron with fixed input 1 (weight w_0) and input x (weight w_1), computing h_w(x) = w_1 x + w_0]

The Bias Term

So far we have defined the output of a perceptron
as controlled by a threshold

x_1 w_1 + x_2 w_2 + x_3 w_3 + … + x_n w_n >= t

But just like the weights, this threshold is a
parameter that needs to be adjusted

Solution: make it another weight

x_1 w_1 + x_2 w_2 + x_3 w_3 + … + x_n w_n + (1)(−t) >= 0

The bias term.
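The rewrite can be checked mechanically: folding the threshold t into an extra weight on a constant input of 1 leaves the output unchanged. A small sketch (all names and numbers are illustrative):

```python
def fires_with_threshold(xs, ws, t):
    """Original form: fire when the weighted sum reaches the threshold t."""
    return sum(x * w for x, w in zip(xs, ws)) >= t

def fires_with_bias(xs, ws, t):
    """Bias form: add the term (1)(-t) and compare against zero instead."""
    return sum(x * w for x, w in zip(xs, ws)) + 1 * (-t) >= 0

# The two forms agree on any input
xs, ws, t = [0.2, 0.7, 1.0], [0.5, -0.3, 0.8], 0.4
print(fires_with_threshold(xs, ws, t) == fires_with_bias(xs, ws, t))  # True
```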

A Neuron with a Bias Term

Another Example

Assign weights to perform the logical OR operation.

x_1 | x_2 | x_1 OR x_2
 0  |  0  |     0
 0  |  1  |     1
 1  |  0  |     1
 1  |  1  |     1

[Figure: a perceptron with bias input 1 (weight w_0) and inputs x_1 (weight w_1) and x_2 (weight w_2)]

w_0 = ___ , w_1 = ___ , w_2 = ___
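One weight assignment that works (an assumption; many others do too) is w_1 = w_2 = 1 with bias weight w_0 = -0.5, i.e. a threshold of 0.5:

```python
def or_perceptron(x1, x2, w0=-0.5, w1=1.0, w2=1.0):
    """Perceptron for logical OR: the bias weight w0 multiplies a fixed input of 1."""
    return 1 if w0 * 1 + w1 * x1 + w2 * x2 >= 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", or_perceptron(x1, x2))  # matches the OR truth table
```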

Artificial Neural Network (ANN)

A mathematical model to solve engineering
problems

Group of highly connected neurons that realize
compositions of non-linear functions

Classification

Discrimination

Estimation

Feed Forward Neural Networks

The information is
propagated from the inputs
to the outputs

There are no cycles
between outputs and
inputs

the state of the system is not
preserved from one iteration
to another

[Figure: inputs x1, x2, …, xn feeding the 1st hidden layer, the 2nd hidden layer, and the output layer]

ANN Structure

Finite number of inputs

Zero or more hidden layers

One or more outputs

All nodes at the hidden and output layers contain a
bias term.
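A forward pass through such a structure can be sketched as follows (layer sizes and weight values are arbitrary illustrations; by convention here, each node's weight list starts with its bias term):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights):
    """One layer: each node has a bias weight (index 0) plus one weight per input."""
    return [sigmoid(w[0] + sum(x * wi for x, wi in zip(inputs, w[1:])))
            for w in weights]

def feed_forward(x, layers):
    """Propagate the inputs through each layer in turn: no cycles, no stored state."""
    for weights in layers:
        x = layer(x, weights)
    return x

# Illustrative network: 2 inputs -> 2 hidden nodes -> 1 output
net = [
    [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]],  # hidden layer: [bias, w1, w2] per node
    [[0.05, 0.6, -0.4]],                    # output layer
]
print(feed_forward([1.0, 0.0], net))
```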

Examples

Handwriting character recognition

Control of a virtual agent

ALVINN

Neural Network controlled AGV (1994)

961 × 4 + 5 × 30 = 3994 weights

(961 = 960 inputs + 1 bias into each of 4 hidden units;
5 = 4 hidden units + 1 bias into each of 30 output units)

http://blog.davidsingleton.org/nnrccar

Learning

The procedure that consists of estimating the weight
parameters so that the whole network can perform a specific
task.

The learning process (supervised):

Present the network a number of inputs and their corresponding outputs

See how closely the actual outputs match the desired ones

Modify the parameters to better approximate the desired
outputs

Perceptron Learning Rule

1.
Initialize the weights to some random values (or 0)

2.
For each sample (x, y) in the training set

1.
Calculate the current output of the perceptron, h_w(x)

2.
Update the weights: w_i = w_i + α ( y − h_w(x) ) x_i

3.
Repeat until the error y − h_w(x) is smaller than
some predefined threshold

α is the learning rate, usually 0.1
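The rule above, sketched in Python and demonstrated on the OR data (the names and the stopping criterion are simplifications; a fixed number of epochs is run instead of an error threshold):

```python
def predict(w, x):
    """Threshold unit: the first component of x is a fixed 1 for the bias weight."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

def train_perceptron(samples, alpha=0.1, epochs=50):
    """Perceptron learning rule: w_i <- w_i + alpha * (y - h_w(x)) * x_i."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, y in samples:
            h = predict(w, x)
            w = [wi + alpha * (y - h) * xi for wi, xi in zip(w, x)]
    return w

# Learn OR: each input vector starts with the constant bias input 1
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = train_perceptron(data)
print([predict(w, x) for x, _ in data])  # [0, 1, 1, 1]
```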

Linear Separability

Perceptrons can classify any input that is linearly
separable.

For more complex problems we need a more complex
model.

Different Non-Linearly Separable Problems

Structure    | Types of Decision Regions
Single-Layer | Half plane bounded by a hyperplane
Two-Layer    | Convex open or closed regions
Three-Layer  | Arbitrary (complexity limited by the number of nodes)

[Figure columns: the decision regions each structure can form for the Exclusive-OR problem, for classes with meshed regions, and for the most general region shapes (classes A and B)]

Calculating the Weights

The weights are a 1 × n vector of parameters for which
we need to find a global optimum

Could be solved by:

Simulated annealing

Genetic algorithms


The perceptron learning rule is gradient descent.

Learning the Weights in a Neural
Network

Perceptron learning rule (gradient descent) worked
before, but it required us to know the correct
output of the node.

How do we know the correct output of a given
hidden node?

Backpropagation

Algorithm

Performs gradient descent over the entire network
weight vector

Easily generalized to arbitrary directed graphs

Will find a local, not necessarily global, error
minimum

In practice it often works well (can be invoked multiple
times with different initial weights)

Backpropagation

Algorithm

1.
Initialize the weights to some random values (or 0)

2.
For each sample (x, y) in the training set

1.
Calculate the current output of each node

2.
For each output node k, compute the error term

Δ_k = a_k ( 1 − a_k ) ( y_k − a_k )

3.
For each hidden node j, compute the error term

Δ_j = a_j ( 1 − a_j ) Σ_k w_{j,k} Δ_k

3.
For all network weights, update

w_{i,j} = w_{i,j} + α a_i Δ_j

4.
Repeat until weights converge or desired accuracy is achieved

(a_i denotes the activation of node i.)
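Putting the update equations together, here is a sketch of the algorithm for a network with one hidden layer and a single output node, trained on the OR data for simplicity (the structure, names, and hyperparameters are all illustrative choices, not from the slides):

```python
import math
import random

def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """One hidden layer, one output node; weight index 0 is the bias term."""
    a_h = [g(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_hidden]
    a_o = g(w_out[0] + sum(wi * ai for wi, ai in zip(w_out[1:], a_h)))
    return a_h, a_o

def backprop(samples, n_hidden=2, alpha=0.5, epochs=2000, seed=0):
    """Backpropagation sketch following the update rules above (2-input samples)."""
    rnd = random.Random(seed)
    w_hidden = [[rnd.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
    w_out = [rnd.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in samples:
            a_h, a_o = forward(x, w_hidden, w_out)
            # Output node: Delta_k = a_k (1 - a_k) (y_k - a_k)
            d_o = a_o * (1 - a_o) * (y - a_o)
            # Hidden nodes: Delta_j = a_j (1 - a_j) * sum_k w_jk Delta_k
            d_h = [a * (1 - a) * w_out[j + 1] * d_o for j, a in enumerate(a_h)]
            # All weights: w_ij <- w_ij + alpha * a_i * Delta_j
            w_out = [w + alpha * a * d_o for w, a in zip(w_out, [1.0] + a_h)]
            w_hidden = [[w + alpha * a * dj for w, a in zip(ws, [1.0] + list(x))]
                        for ws, dj in zip(w_hidden, d_h)]
    return w_hidden, w_out

# Train on the OR function; inputs are pairs (x1, x2)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w_h, w_o = backprop(data)
print([round(forward(x, w_h, w_o)[1], 2) for x, _ in data])
```

Note that the hidden deltas are computed before the output weights are updated, so every update in one step uses the same forward-pass activations.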

Intuition

General idea: hidden nodes are “responsible” for
some of the error at the output nodes they connect to

The change in the hidden weights is proportional to
the strength (magnitude) of the connection
between the hidden node and the output node

This is the same as the perceptron learning rule, but
for a sigmoid decision function instead of a step
decision function (full derivation on p. 726)

w_i = w_i + α ( y − h_w(x) ) h_w(x) ( 1 − h_w(x) ) x_i


Intuition

When expanded, the update
to the output nodes is
almost the same
as the perceptron rule

Slight difference is that the algorithm uses a
sigmoid function instead of a step function

(
full derivation on p. 726)

w_i = w_i + α ( y − h_w(x) ) h_w(x) ( 1 − h_w(x) ) x_i

Questions