Neural Networks


LINEAR CLASSIFICATION

Biological inspirations


Some numbers…


The human brain contains about 10 billion nerve cells (neurons)


Each neuron is connected to others through about 10,000 synapses



Properties of the brain


It can learn and reorganize itself from experience


It adapts to the environment


It is robust and fault tolerant



Biological neuron (simplified model)


A neuron has


A branching input (dendrites)


A branching output (the axon)


The information circulates from the dendrites to the
axon via the cell body


The cell body sums up the inputs in some way and fires (generates a signal through the axon) if the result is greater than some threshold

An Artificial Neuron






[Figure: an artificial neuron, with labeled inputs and weights feeding a weighted sum and an activation function]




Definition: a non-linear, parameterized function with a restricted output range


Activation Function

Usually not pictured (we’ll see why), but you can imagine a threshold parameter here.

Same Idea using the Notation in the Book

$in_j = \sum_{i=0}^{n} w_{i,j}\, a_i$

$a_j = g(in_j) = g\!\left(\sum_{i=0}^{n} w_{i,j}\, a_i\right)$
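A minimal sketch of this computation in Python (my illustration, not from the slides); the step activation in the example is an assumption here, standing in for a generic $g$:

```python
def unit_output(weights, inputs, g):
    """Compute a_j = g(in_j) where in_j = sum_i weights[i] * inputs[i]."""
    in_j = sum(w * a for w, a in zip(weights, inputs))
    return g(in_j)

# Example: the first input a_0 = 1 plays the role of a bias input,
# and g is a step function with threshold at 0.
step = lambda z: 1 if z >= 0 else 0
print(unit_output([-0.5, 1.0, 1.0], [1, 0, 1], step))  # prints 1
```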

The Output of a Neuron


As described so far…

This simplest form of a neuron is also called a perceptron.

The Output of a Neuron


Other possibilities, such as the sigmoid function for
continuous output.

$g(in) = \dfrac{1}{1 + e^{-in/p}}$

$in$ is the activation of the neuron

$p$ is a parameter which controls the shape of the curve (usually 1.0)
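A small sketch of this output function in Python (mine, not from the slides); the example input values are illustrative only:

```python
import math

def sigmoid(in_, p=1.0):
    # g(in) = 1 / (1 + e^(-in/p)); p (usually 1.0) controls how steep the curve is
    return 1.0 / (1.0 + math.exp(-in_ / p))

print(sigmoid(0.0))         # 0.5
print(sigmoid(2.0))         # ~0.88
print(sigmoid(2.0, p=0.5))  # ~0.98, a steeper curve
```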

Linear Regression using a Perceptron


Linear regression:


Find a linear function (straight line)

$y = w_1 x + w_0$

that best predicts the continuous-valued output.

Linear Regression As an Optimization
Problem


Finding the optimal weights could be solved through:


Gradient descent (see the sketch after this list)


Simulated annealing


Genetic algorithms


… and now Neural Networks
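
As a concrete illustration of the gradient-descent option, here is a sketch (mine, not from the slides) of fitting $y = w_1 x + w_0$ by stochastic gradient descent on squared error; the data, learning rate, and epoch count are made up:

```python
# Fit y = w1*x + w0 by repeatedly nudging the weights against the prediction error.
def fit_line(xs, ys, alpha=0.01, epochs=1000):
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = w1 * x + w0
            err = y - pred
            w0 += alpha * err          # gradient step for the intercept
            w1 += alpha * err * x      # gradient step for the slope
    return w0, w1

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]                   # generated from y = 2x + 1
print(fit_line(xs, ys))                # approximately (1.0, 2.0)
```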


Linear Regression using a Perceptron



$y = w_1 x + w_0$

[Figure: a perceptron realizing $y = w_1 x + w_0$ — input $x$ weighted by $w_1$, and a constant input 1 weighted by $w_0$]

The Bias Term


So far we have defined the output of a perceptron
as controlled by a threshold


$x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n \ge t$


But just like the weights, this threshold is a
parameter that needs to be adjusted



Solution: make it another weight


$x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n + (1)(-t) \ge 0$

The term $(1)(-t)$ is the bias term.
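A short sketch of this rewrite in Python (my example, not from the slides); the AND-style weights in the usage lines are just an illustration:

```python
def perceptron_with_bias(weights, bias, inputs):
    # Equivalent to x1*w1 + ... + xn*wn + (1)(-t) >= 0, with bias playing the role of -t
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= 0 else 0

print(perceptron_with_bias([1.0, 1.0], -1.5, [1, 1]))  # 1: acts like logical AND
print(perceptron_with_bias([1.0, 1.0], -1.5, [1, 0]))  # 0
```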

A Neuron with a Bias Term

Another Example


Assign weights to perform the logical OR operation.






[Figure: a perceptron with inputs $x_1$ and $x_2$, weights $w_1$ and $w_2$, and a bias weight $w_0$ on a constant input 1; the output is 1 if $w_1 x_1 + w_2 x_2 + w_0 \ge 0$]

$w_0 = ?$, $w_1 = ?$, $w_2 = ?$
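One possible answer, sketched in Python (these particular weight values are my choice, not necessarily the slide's intended answer):

```python
# w1 = w2 = 1 and bias weight w0 = -0.5 implement logical OR.
def or_perceptron(x1, x2, w0=-0.5, w1=1.0, w2=1.0):
    return 1 if (w1 * x1 + w2 * x2 + w0) >= 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, or_perceptron(x1, x2))  # 0 0 -> 0, otherwise -> 1
```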

Artificial Neural Network (ANN)


A mathematical model to solve engineering
problems


A group of highly connected neurons that realize compositions of non-linear functions


Tasks


Classification


Discrimination


Estimation

Feed Forward Neural Networks


The information is propagated from the inputs to the outputs


There are no cycles between outputs and inputs, so the state of the system is not preserved from one iteration to another

[Figure: a feed-forward network with inputs x1, x2, …, xn, a 1st hidden layer, a 2nd hidden layer, and an output layer]

ANN Structure


Finite number of inputs


Zero or more hidden layers


One or more outputs



All nodes at the hidden and output layers contain a
bias term.
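A minimal sketch (mine, not from the slides) of a forward pass through such a structure with one hidden layer; the layer sizes and weight values are made up, and each node carries a bias weight as stated above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows):
    # Each row is [bias, w_1, ..., w_n] for one node in the layer.
    return [sigmoid(row[0] + sum(w * x for w, x in zip(row[1:], inputs)))
            for row in weight_rows]

hidden_w = [[0.1, 0.5, -0.4], [-0.3, 0.8, 0.2]]   # 2 hidden nodes, 2 inputs each
output_w = [[0.05, 1.0, -1.0]]                     # 1 output node, 2 hidden inputs

x = [0.7, 0.2]
print(layer(layer(x, hidden_w), output_w))         # a single output in (0, 1)
```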

Examples


Handwriting character recognition



Control of a virtual agent

ALVINN

Neural Network controlled AGV (1994)

$961 \times 4 + 5 \times 30 = 3994$ weights


http://blog.davidsingleton.org/nnrccar

Learning


The procedure of estimating the weight parameters so that the whole network can perform a specific task



The
Learning process (supervised)


Present the network with a number of inputs and their corresponding outputs


See how closely the actual outputs match the desired ones


Modify the parameters to better approximate the desired
outputs

Perceptron Learning Rule

1.
Initialize the weights to some random values (or 0)

2.
For each sample $(\vec{x}, y)$ in the training set

1.
Calculate the current output of the perceptron, $h_w(\vec{x})$

2.
Update the weights: $w_i = w_i + \alpha \, (y - h_w(\vec{x})) \, x_i$

3.
Repeat until the error $y - h_w(\vec{x})$ is smaller than some predefined threshold

$\alpha$ is the learning rate, usually 0.1
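A sketch of this rule in Python (mine, not from the slides); $\alpha = 0.1$ follows the slide, while the OR training data and the fixed number of passes are illustrative choices:

```python
def train_perceptron(samples, n_inputs, alpha=0.1, epochs=20):
    w = [0.0] * (n_inputs + 1)               # w[0] is the bias weight
    for _ in range(epochs):
        for x, y in samples:
            out = 1 if w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) >= 0 else 0
            err = y - out
            w[0] += alpha * err               # the bias input is the constant 1
            for i, xi in enumerate(x):
                w[i + 1] += alpha * err * xi
    return w

or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
print(train_perceptron(or_samples, 2))        # e.g. [-0.1, 0.1, 0.1], which implements OR
```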

Linear Separability


Perceptrons can classify any input that is linearly separable.








For more complex problems we need a more complex
model.

Different Non-Linearly Separable Problems

Structure     | Types of Decision Regions
Single-Layer  | Half plane bounded by a hyperplane
Two-Layer     | Convex open or closed regions
Three-Layer   | Arbitrary (complexity limited by number of nodes)

[Figure: each structure illustrated on the exclusive-OR problem, on classes with meshed regions (A vs. B), and with its most general region shapes]
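To make the table concrete: XOR is not linearly separable, so no single-layer perceptron computes it, but a two-layer network can. The weights below are one hand-picked assignment (my own, for illustration):

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)        # OR-like hidden node
    h2 = step(x1 + x2 - 1.5)        # AND-like hidden node
    return step(h1 - h2 - 0.5)      # fires for OR but not AND

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # outputs 1 exactly when the inputs differ
```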

Calculating the Weights


The weights are a $1 \times n$ vector of parameters for which we need to find a global optimum



Could be solved by:


Simulated annealing


Gradient descent


Genetic algorithms


http://www.youtube.com/watch?v=0Str0Rdkxxo




The perceptron learning rule is pretty much gradient descent.

Learning the Weights in a Neural
Network


Perceptron learning rule (gradient descent) worked
before, but it required us to know the correct
output of the node.



How do we know the correct output of a given
hidden node?

Backpropagation

Algorithm


Gradient descent over entire
network
weight
vector



Easily generalized to arbitrary directed graphs


Will find a local, not necessarily global error
minimum


In practice it often works well (can be invoked multiple times with different initial weights)


Backpropagation

Algorithm

1.
Initialize the weights to some random values (or 0)

2.
For each sample $(\vec{x}, y)$ in the training set

1.
Calculate the current output of the node, $h_w(\vec{x})$

2.
For each output node $k$, compute the error term
$\Delta_k = h_w(\vec{x})\,(1 - h_w(\vec{x}))\,(y_k - h_w(\vec{x}))$

3.
For each hidden node $j$, compute the error term
$\Delta_j = h_w(\vec{x})\,(1 - h_w(\vec{x})) \sum_k w_{j,k}\, \Delta_k$

4.
For all network weights, update
$w_{i,j} = w_{i,j} + \alpha \, \Delta_j \, x_i$

3.
Repeat until weights converge or desired accuracy is achieved
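A compact sketch of these steps in Python for a single hidden layer (my illustration, not from the slides). The sigmoid output, the XOR training data, the learning rate, the epoch count, and the random initialization are all assumptions of the example:

```python
import math, random

def g(z):
    # Sigmoid output: g(in) = 1 / (1 + e^-in)
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(alpha=0.5, epochs=5000, n_hidden=2, seed=0):
    rng = random.Random(seed)
    # w_in[i][j]: weight from input i (index 0 is the constant bias input) to hidden node j
    w_in = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(3)]
    # w_out[j]: weight from hidden node j to the single output (index 0 is its bias weight)
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x1, x2):
        a_in = [1.0, x1, x2]
        a_hid = [g(sum(a_in[i] * w_in[i][j] for i in range(3)))
                 for j in range(n_hidden)]
        a_out = g(w_out[0] + sum(a_hid[j] * w_out[j + 1] for j in range(n_hidden)))
        return a_in, a_hid, a_out

    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    for _ in range(epochs):
        for (x1, x2), y in data:
            a_in, a_hid, a_out = forward(x1, x2)
            # Error term at the output node: a(1 - a)(y - a)
            delta_out = a_out * (1 - a_out) * (y - a_out)
            # Error terms at hidden nodes: a(1 - a) * w_jk * delta_k
            delta_hid = [a_hid[j] * (1 - a_hid[j]) * w_out[j + 1] * delta_out
                         for j in range(n_hidden)]
            # Weight updates: w = w + alpha * delta * (input to that weight)
            w_out[0] += alpha * delta_out                  # bias input is the constant 1
            for j in range(n_hidden):
                w_out[j + 1] += alpha * delta_out * a_hid[j]
                for i in range(3):
                    w_in[i][j] += alpha * delta_hid[j] * a_in[i]

    return lambda x1, x2: forward(x1, x2)[2]

predict = train_xor()
for x1 in (0, 1):
    for x2 in (0, 1):
        # Ideally close to XOR; as noted above, training may settle in a local minimum
        print(x1, x2, round(predict(x1, x2), 2))
```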





Intuition


General idea: hidden nodes are “responsible” for some of the error at the output nodes they connect to


The change in the hidden weights is proportional to the strength (magnitude) of the connection between the hidden node and the output node




This is the same as the perceptron learning rule, but for a sigmoid decision function instead of a step decision function (full derivation on p. 726)

$w_i = w_i + \alpha \, (y - h_w(\vec{x})) \, h_w(\vec{x})\,(1 - h_w(\vec{x})) \, x_i$




Intuition


When expanded, the update to the output nodes is almost the same as the perceptron rule


The slight difference is that the algorithm uses a sigmoid function instead of a step function (full derivation on p. 726)

$w_i = w_i + \alpha \, (y - h_w(\vec{x})) \, h_w(\vec{x})\,(1 - h_w(\vec{x})) \, x_i$



Questions