LINEAR CLASSIFICATION
Biological inspirations
Some numbers…
The human brain contains about 10 billion nerve cells (neurons)
Each neuron is connected to other neurons through about 10,000 synapses
Properties of the brain
It can learn and reorganize itself from experience
It adapts to the environment
It is robust and fault tolerant
Biological neuron (simplified model)
A neuron has
A branching input (dendrites)
A branching output (the axon)
The information circulates from the dendrites to the axon via the cell body
The cell body sums up the inputs in some way and fires (generates a signal through the axon) if the result is greater than some threshold
An Artificial Neuron
[Figure: an artificial neuron, with inputs multiplied by weights and passed through an activation function]
Definition: a non-linear, parameterized function with a restricted output range.
A threshold parameter is usually not pictured (we'll see why), but you can imagine one here.
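
A minimal sketch of such a neuron in Python (the inputs, weights, and threshold below are illustrative, not from the slides):

    import numpy as np

    def neuron(inputs, weights, threshold=0.0):
        """Weighted sum of the inputs, then a hard threshold activation."""
        total = np.dot(weights, inputs)        # sum up the weighted inputs
        return 1 if total >= threshold else 0  # fire if above the threshold

    # Example with hand-picked values
    print(neuron(np.array([0.5, 0.8]), np.array([1.0, -0.4])))  # -> 1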
Same Idea using the Notation in the Book

$in_j = \sum_{i=0}^{n} w_{i,j}\, a_i$

$a_j = g(in_j) = g\left(\sum_{i=0}^{n} w_{i,j}\, a_i\right)$
The Output of a Neuron
As described so far…
This simplest form of a neuron is also called a perceptron.
The Output of a Neuron
Other possibilities exist, such as the sigmoid function for continuous output:

$\dfrac{1}{1 + e^{-in \cdot p}}$

• $in$ is the activation of the neuron
• $p$ is a parameter which controls the shape of the curve (usually 1.0)
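
A quick sketch of this activation in Python (the sample inputs are illustrative):

    import math

    def sigmoid(in_value, p=1.0):
        """1 / (1 + e^(-in * p)); p controls the shape of the curve."""
        return 1.0 / (1.0 + math.exp(-in_value * p))

    for v in (-2.0, 0.0, 2.0):
        print(v, sigmoid(v))  # smooth outputs between 0 and 1; 0.5 at in = 0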
Linear Regression using a Perceptron
Linear regression: find a linear function (straight line) $y = w_1 x + w_0$ that best predicts the continuous-valued output.
Linear Regression As an Optimization Problem
Finding the optimal weights could be solved through:
Gradient descent
Simulated annealing
Genetic algorithms
… and now Neural Networks
Linear Regression using a Perceptron
[Figure: a perceptron computing $y = w_1 x_1 + w_0(1)$, with input $x_1$ weighted by $w_1$ and a constant input 1 weighted by $w_0$]
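
A sketch of fitting $y = w_1 x + w_0$ by gradient descent on squared error (the toy data and learning rate are my choices):

    import numpy as np

    # Toy data sampled from the line y = 2x + 1 (illustrative)
    xs = np.array([0.0, 1.0, 2.0, 3.0])
    ys = 2.0 * xs + 1.0

    w1, w0, alpha = 0.0, 0.0, 0.05
    for _ in range(2000):
        err = ys - (w1 * xs + w0)          # prediction error
        w1 += alpha * np.mean(err * xs)    # gradient step for the slope
        w0 += alpha * np.mean(err)         # gradient step for the intercept

    print(w1, w0)  # approaches 2.0 and 1.0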
The Bias Term
So far we have defined the output of a perceptron as controlled by a threshold:

$x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n \ge t$
But just like the weights, this threshold is a
parameter that needs to be adjusted
Solution: make it another weight
$x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n + (1)(-t) \ge 0$

The weight $-t$ on the constant input 1 is the bias term.
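
A sketch of the trick in Python: the threshold $t$ becomes a weight $-t$ on a constant input of 1 (the values are illustrative):

    import numpy as np

    x = np.array([0.3, 0.9])
    w = np.array([0.5, 0.5])
    t = 0.4

    # Threshold form: compare the weighted sum against t
    fires_threshold = np.dot(w, x) >= t

    # Bias form: append a constant input 1 with weight -t, compare against 0
    fires_bias = np.dot(np.append(w, -t), np.append(x, 1.0)) >= 0

    print(fires_threshold, fires_bias)  # always agree: True True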
A Neuron with a Bias Term
Another Example
Assign weights to perform the logical OR operation.
$w_2 x_2 + w_1 x_1 + w_0 \ge 0$

$w_0 = ?$   $w_1 = ?$   $w_2 = ?$
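
One valid assignment (my choice; the slide leaves the values blank) is $w_1 = w_2 = 1$ and $w_0 = -0.5$. A quick check in Python:

    # OR via a perceptron: fires when w1*x1 + w2*x2 + w0 >= 0
    w1, w2, w0 = 1.0, 1.0, -0.5  # one valid choice; many others work

    for x1 in (0, 1):
        for x2 in (0, 1):
            out = 1 if w1 * x1 + w2 * x2 + w0 >= 0 else 0
            print(x1, x2, '->', out)  # matches x1 OR x2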
Artificial Neural Network (ANN)
A mathematical model to solve engineering
problems
A group of highly connected neurons realizing compositions of non-linear functions
Tasks
Classification
Discrimination
Estimation
Feed Forward Neural Networks
The information is propagated from the inputs to the outputs.
There are no cycles between outputs and inputs: the state of the system is not preserved from one iteration to the next.

[Figure: feed-forward network with inputs x1, x2, …, xn feeding a 1st hidden layer, a 2nd hidden layer, and an output layer]
ANN Structure
Finite number of inputs
Zero or more hidden layers
One or more outputs
All nodes at the hidden and output layers contain a
bias term.
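
A sketch of a forward pass through such a structure in numpy, assuming sigmoid activations and random weights (the layer sizes are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, layers):
        """Propagate the input through each layer; no cycles, no stored state."""
        a = x
        for W, b in layers:        # b is the bias term at each layer
            a = sigmoid(W @ a + b)
        return a

    rng = np.random.default_rng(0)
    sizes = [3, 4, 4, 2]  # 3 inputs, two hidden layers, 2 outputs
    layers = [(rng.normal(size=(m, n)), rng.normal(size=m))
              for n, m in zip(sizes, sizes[1:])]
    print(forward(np.array([0.1, 0.5, 0.9]), layers))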
Examples
Handwriting character recognition
Control of a virtual agent
ALVINN
Neural Network controlled AGV (1994)
$961 \times 4 + 5 \times 30 = 3994$ weights
http://blog.davidsingleton.org/nnrccar
Learning
The procedure of estimating the weight parameters so that the whole network can perform a specific task
The learning process (supervised):
Present the network with a number of inputs and their corresponding outputs
See how closely the actual outputs match the desired ones
Modify the parameters to better approximate the desired
outputs
Perceptron Learning Rule
1. Initialize the weights to some random values (or 0)
2. For each sample $(x, y)$ in the training set:
   1. Calculate the current output of the perceptron, $h(x)$
   2. Update the weights: $w_i = w_i + \alpha\,(y - h(x))\,x_i$
3. Repeat until the error $y - h(x)$ is smaller than some predefined threshold

$\alpha$ is the learning rate, usually 0.1
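
A sketch of the rule in Python, trained on the OR data from earlier (the epoch count is my choice):

    import numpy as np

    # Training set for OR, with a constant 1 appended for the bias weight
    X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
    Y = np.array([0, 1, 1, 1], dtype=float)

    w = np.zeros(3)  # step 1: initialize the weights to 0
    alpha = 0.1      # the learning rate

    for _ in range(20):                            # step 3: repeat
        for x, y in zip(X, Y):                     # step 2: each sample (x, y)
            h = 1.0 if np.dot(w, x) >= 0 else 0.0  # current output h(x)
            w = w + alpha * (y - h) * x            # w_i += alpha*(y - h(x))*x_i

    print(w)  # e.g. positive input weights and a negative bias weight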
Linear Separability
Perceptrons can classify any input that is linearly separable.
For more complex problems we need a more complex model.
Different Non-Linearly Separable Problems

Structure    | Types of Decision Regions
Single-Layer | Half plane bounded by a hyperplane
Two-Layer    | Convex open or closed regions
Three-Layer  | Arbitrary (complexity limited by number of nodes)

[Figure: for each structure, example decision regions (classes A and B) for the exclusive-OR problem, for classes with meshed regions, and the most general region shapes]
Calculating the Weights
The weights are a $1 \times n$ vector of parameters for which we need to find a global optimum
Could be solved by:
Simulated annealing
Gradient descent
Genetic algorithms
http://www.youtube.com/watch?v=0Str0Rdkxxo
The perceptron learning rule is pretty much gradient descent.
Learning the Weights in a Neural Network
The perceptron learning rule (gradient descent) worked before, but it required us to know the correct output of the node.
How do we know the correct output of a given hidden node?
Backpropagation Algorithm
Gradient descent over the entire network weight vector
Easily generalized to arbitrary directed graphs
Will find a local, not necessarily global, error minimum
In practice it often works well (can be invoked multiple times with different initial weights)
Backpropagation Algorithm
1. Initialize the weights to some random values (or 0)
2. For each sample $(x, y)$ in the training set:
   1. Calculate the current output of the node, $h(x)$
   2. For each output node $k$, compute its error term:
      $\Delta_k = h(x)\,(1 - h(x))\,(y_k - h(x))$
   3. For each hidden node $j$, compute its error term:
      $\Delta_j = h(x)\,(1 - h(x)) \sum_k w_{j,k}\,\Delta_k$
3. For all network weights, do the update:
   $w_{i,j} = w_{i,j} + \alpha\, a_i\, \Delta_j$
4. Repeat until weights converge or desired accuracy is achieved
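
A compact sketch of these updates for one hidden layer in numpy, assuming sigmoid activations and the XOR task as toy data (sizes, seed, and learning rate are my choices):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([0, 1, 1, 0], dtype=float)  # XOR: not linearly separable

    rng = np.random.default_rng(1)
    W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)  # hidden layer
    W2, b2 = rng.normal(size=2), rng.normal()             # output node
    alpha = 0.5

    for _ in range(10000):
        for x, y in zip(X, Y):
            a1 = sigmoid(W1 @ x + b1)            # hidden activations
            h = sigmoid(W2 @ a1 + b2)            # output h(x)
            d_out = h * (1 - h) * (y - h)        # Delta_k for the output node
            d_hid = a1 * (1 - a1) * W2 * d_out   # Delta_j for the hidden nodes
            W2 += alpha * a1 * d_out; b2 += alpha * d_out
            W1 += alpha * np.outer(d_hid, x); b1 += alpha * d_hid

    # Should approach [0, 1, 1, 0]; as noted above, a local minimum is possible
    print([round(float(sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)), 2) for x in X])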
Intuition
General idea: hidden nodes are "responsible" for some of the error at the output nodes they connect to.
The change in the hidden weights is proportional to the strength (magnitude) of the connection between the hidden node and the output node.
This is the same as the perceptron learning rule, but for a sigmoid decision function instead of a step decision function (full derivation on p. 726):

$w_i = w_i + \alpha\,(y - h(x))\,h(x)\,(1 - h(x))\,x_i$
Questions