

19 Oct 2013


Neural Networks

Kostas Kontogiannis

E&CE

General Concepts


Neurons: the cells that perform information processing in
the brain. They are the fundamental functional units of all
nervous-system tissue, including the brain


Soma: The neuron’s cell body


Dendrites: a collection of fibers branching out of the soma
(the cell body)


Axon: a single long fiber that, like the dendrites, branches out of the cell body.
Eventually, the axon also branches into strands and sub-strands that connect to the dendrites and cell bodies of
other neurons


Synapse: The point where strands from two neurons
connect

Neural Networks


A neural network is composed of a number of nodes, or
units, connected by links. Each link has a numeric weight
associated with it.



Weights are the primary means of long-term storage in
neural networks, and learning usually takes place by
updating the weights.



Each unit has a set of input links from other units, a set of
output links to other units, a current activation level, and a
means of computing the activation level at the next step in
time, given its inputs and weights.


Neural Networks


To build a neural network to perform some task, one must
first decide how many units are to be used, what kind of
units, and how the units are connected to form a network.



One then initializes the weights of the network, and “trains”
the weights using a learning algorithm applied to a set of
training examples for the task.



The use of examples also implies that one must decide how
to encode the examples in terms of inputs and outputs of the
network.

Simple Computing Elements


Each unit performs a simple computation: It receives signals
from its input links and computes a new activation level that
it sends along each of its output links.


The computation of the activation level is based on the
values of the input signals received from neighboring
nodes and on the weight of each input link.


The computation is split into two components. First is a
linear component, $in_i$, that computes the weighted sum of the unit’s
input values. Second is a nonlinear component called the
activation function, $g$, that transforms the weighted sum into
the final value that serves as the unit’s activation value,
$a_i$.
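As an illustration, the two-stage computation of a single unit might look like the Python sketch below. The helper name `unit_activation` is hypothetical, and the sketch assumes the usual weighted sum $in_i = \sum_j W_{j,i} a_j$ together with a sigmoid for $g$; the notes leave the choice of $g$ open at this point.

```python
import math
from typing import Callable, Sequence


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def unit_activation(inputs: Sequence[float], weights: Sequence[float],
                    g: Callable[[float], float] = sigmoid) -> float:
    """Return a_i = g(in_i) for one unit i (hypothetical helper).

    inputs[j] is the activation a_j arriving on input link j and
    weights[j] is the weight W_{j,i} on that link.
    """
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input link")
    # Linear component: in_i = sum over j of W_{j,i} * a_j
    in_i = sum(w * a for w, a in zip(weights, inputs))
    # Nonlinear component: the activation function g
    return g(in_i)


# in_i = 0.5 * 1.0 + (-0.25) * 2.0 = 0.0, and sigmoid(0.0) = 0.5
print(unit_activation([1.0, 2.0], [0.5, -0.25]))
```

Passing a different function as `g` swaps the activation model without touching the linear part.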

Models for Activation Functions


Different models are obtained by using different mathematical
functions for $g$. Three common choices are the step, sign, and
sigmoid functions.





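[Figure: the Step (with threshold $t$), Sign, and Sigmoid activation functions, each plotted as output versus $in_i$; output levels +1 and -1 are marked.]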
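The three activation functions can be written down directly. In this sketch, the step function outputs 1 once its input reaches the threshold $t$ and 0 otherwise, and sign(0) is taken to be +1; these boundary conventions are assumptions, since the plots do not pin them down.

```python
import math


def step(x: float, t: float = 0.0) -> float:
    """Step function with threshold t (assumed convention: x >= t gives 1)."""
    return 1.0 if x >= t else 0.0


def sign(x: float) -> float:
    """Sign function: +1 for non-negative input (assumed for x = 0), -1 otherwise."""
    return 1.0 if x >= 0 else -1.0


def sigmoid(x: float) -> float:
    """Logistic sigmoid: rises smoothly from 0 to 1; stable for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


for x in (-2.0, 0.0, 2.0):
    print(x, step(x, t=1.0), sign(x), round(sigmoid(x), 3))
```

The step and sign functions are discontinuous at their thresholds, while the sigmoid is differentiable everywhere, which matters later for back-propagation.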

Network Structures


There are a variety of kinds of network structure, each of
which results in very different computational properties.


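The main distinction is between feed-forward and recurrent
networks.


In a feed-forward network, the links are unidirectional and there are no
cycles; in essence, these networks are DAGs (directed acyclic graphs). In a
recurrent network, by contrast, the links can form arbitrary topologies.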


Usually we deal with networks that are arranged in layers. In
a layered feed-forward network, each unit is linked only to
the units in the next layer; there are no links between units in
the same layer, no links backward to a previous layer, and no
links that skip a layer.
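Because each unit in a layered feed-forward network depends only on the layer directly before it, the network can be evaluated one layer at a time. The sketch below assumes sigmoid units and stores each layer's weights as `w[k][j]`, the weight on the link from unit k of one layer to unit j of the next; the function name `feed_forward` and this layout are illustrative choices, not taken from the notes.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def feed_forward(inputs: List[float], layers: List[List[List[float]]]) -> List[float]:
    """Propagate activations through a strictly layered feed-forward network.

    layers[l][k][j] is the weight from unit k of layer l to unit j of layer l + 1.
    Since links only go to the next layer (no intra-layer, backward, or
    layer-skipping links), one pass over the layers computes every activation.
    """
    activations = list(inputs)
    for w in layers:
        if len(w) != len(activations):
            raise ValueError("weight matrix does not match the previous layer's size")
        n_next = len(w[0])
        activations = [
            sigmoid(sum(w[k][j] * activations[k] for k in range(len(activations))))
            for j in range(n_next)
        ]
    return activations


# 2 inputs -> 2 hidden units -> 1 output unit
layers = [
    [[1.0, 0.5], [-1.0, 0.5]],  # input layer -> hidden layer
    [[1.0], [1.0]],             # hidden layer -> output layer
]
print(feed_forward([0.0, 0.0], layers))
```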

Fundamental Network Types


Hopfield Networks: They use bidirectional connections with symmetric
weights; all of the units are both input and output units; the activation function $g$ is
the sign function; and the activation levels can only be +1 or -1.
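To make the Hopfield description concrete, here is a small Python sketch. The notes specify symmetric weights and the sign activation; they do not say how the weights are set or in what order units update, so this sketch assumes the common Hebbian outer-product rule for storing ±1 patterns and asynchronous updates in a fixed unit order. The names `hebbian_weights` and `recall` are hypothetical.

```python
from typing import List


def hebbian_weights(patterns: List[List[int]]) -> List[List[float]]:
    """Assumed storage rule: W[i][j] = sum over patterns of p[i] * p[j], zero diagonal.

    The result is symmetric (W[i][j] == W[j][i]), as the notes require.
    """
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def sign(x: float) -> int:
    return 1 if x >= 0 else -1


def recall(w: List[List[float]], state: List[int], sweeps: int = 10) -> List[int]:
    """Update units one at a time with the sign function until nothing changes."""
    s = list(state)  # every unit is an input unit: the input is the initial state
    for _ in range(sweeps):
        changed = False
        for i in range(len(s)):
            new = sign(sum(w[i][j] * s[j] for j in range(len(s))))
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s  # every unit is an output unit: the settled state is the output


stored = [1, -1, 1, -1, 1, -1]
w = hebbian_weights([stored])
noisy = [1, 1, 1, -1, 1, -1]  # unit 1 flipped
print(recall(w, noisy))
```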


Boltzmann Machines: They also use symmetric weights, but include units that are
neither input nor output units. They also use a stochastic activation function,
such that the probability of the output being 1 is some function of the total
weighted input.
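A stochastic unit of the kind used in Boltzmann machines can be sketched as below. The notes only say that the probability of outputting 1 is some function of the total weighted input; this sketch assumes that function is the sigmoid of the input divided by a temperature T, with outputs 0 or 1. The name `stochastic_unit` and the `temperature` parameter are illustrative assumptions.

```python
import math
import random
from typing import Optional, Sequence


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def stochastic_unit(inputs: Sequence[float], weights: Sequence[float],
                    temperature: float = 1.0,
                    rng: Optional[random.Random] = None) -> int:
    """Output 1 with probability sigmoid(total / T), otherwise 0 (assumed form)."""
    rng = rng if rng is not None else random.Random()
    total = sum(w * a for w, a in zip(weights, inputs))  # total weighted input
    p_on = sigmoid(total / temperature)
    return 1 if rng.random() < p_on else 0


rng = random.Random(0)
samples = [stochastic_unit([1.0], [2.0], rng=rng) for _ in range(10_000)]
# The observed frequency of 1s should be close to sigmoid(2.0), about 0.88
print(sum(samples) / len(samples), sigmoid(2.0))
```

Raising the temperature flattens the sigmoid and makes the unit more random; lowering it makes the unit behave more like a deterministic step.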



Networks with no hidden units are called perceptrons.


Input units are directly connected to the external input sources. Output units are
connected to the observed output. Hidden units are neither connected to input
sources nor the observed output.


Networks with one or more layers of hidden units are called multilayer
networks.

Perceptron Neural Network Learning


function NEURAL-NETWORK-LEARNING(examples) returns network
    network = a network with randomly assigned weights;
    repeat
        for each e in examples do
            O = NEURAL-NETWORK-OUTPUT(network, e);
            T = the observed output values from e;
            update the weights in network based on e, O, T;
        end
    until all examples correctly predicted or stopping criterion is reached
    return network



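Essentially, for each weight $W_j$ on input $I_j$:

$$Err = T - O$$

$$W_j \leftarrow W_j + \alpha \cdot I_j \cdot Err$$

where $\alpha$ is the learning rate.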
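The learning loop and the update rule can be combined into a runnable Python sketch. It assumes a single step-function output unit, treats the threshold as the weight on an extra constant input of -1, initializes weights uniformly in [-0.5, 0.5], and uses a fixed epoch budget as the stopping criterion; these details and the names `train_perceptron` and `predict` are illustrative rather than taken from the notes.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]


def step(x: float) -> int:
    return 1 if x >= 0 else 0


def predict(weights: List[float], inputs: Sequence[float]) -> int:
    """O = step(sum_j W_j * I_j), with I_0 = -1 so that W_0 acts as the threshold."""
    x = [-1.0] + list(inputs)
    return step(sum(w * xj for w, xj in zip(weights, x)))


def train_perceptron(examples: List[Example], alpha: float = 0.1,
                     max_epochs: int = 1000, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    n = len(examples[0][0]) + 1  # one extra weight for the threshold input
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n)]  # randomly assigned weights
    for _ in range(max_epochs):  # stopping criterion: epoch budget
        all_correct = True
        for inputs, target in examples:
            err = target - predict(weights, inputs)  # Err = T - O
            if err != 0:
                all_correct = False
                x = [-1.0] + list(inputs)
                # W_j <- W_j + alpha * I_j * Err
                weights = [w + alpha * xj * err for w, xj in zip(weights, x)]
        if all_correct:  # all examples correctly predicted
            break
    return weights


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)
print([predict(w, inputs) for inputs, _ in AND])
```

Because AND is linearly separable, the perceptron convergence theorem guarantees that the updates stop after finitely many mistakes; for a non-separable function such as XOR, the loop would simply run out of epochs.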



Multi-Layer Feed-Forward Networks


Initial work in the 1950s.



Learning algorithms for multilayer networks are neither efficient nor
guaranteed to converge to a global optimum



On the other hand, learning general functions from examples is an intractable
problem in the worst case





The most popular method for learning in multilayer networks is called
back-propagation.


Back Propagation Learning


Learning in multilayer feed-forward networks using back-propagation
proceeds the same way as for perceptrons: example inputs are presented to the
network, and if the network computes an output vector that matches the target output,
nothing is done. If there is an error, then the weights are adjusted to reduce the
error.



The trick is to assess the blame for an error and divide it among the
contributing weights. In perceptrons, this is easy because there is only one
weight between each input and the output. But in multilayer networks, there
are many weights connecting each input to an output, and each of these
weights contributes to more than one output



The back-propagation algorithm is a sensible approach to dividing the
blame among the weights according to the contribution of each.


Back Propagation Learning


As in the perceptron learning algorithm, we try to minimize the error between
each target output and the output value computed by the network.


At the output layer, the weight update rule is very similar to the rule for the
perceptrons. However, there are two differences: the activation of the hidden
unit, $a_j$, is used instead of the input value, and the rule contains a term for the
gradient of the activation function.


If $Err_i$ is the error $T_i - O_i$ at output node $i$, then the weight update rule for the
link from unit $j$ to unit $i$ is

$$W_{j,i} \leftarrow W_{j,i} + \alpha \cdot a_j \cdot Err_i \cdot g'(in_i)$$



where $g'$ is the derivative of the activation function $g$. Defining
$\Delta_i = Err_i \cdot g'(in_i)$, the rule can be rewritten as:

$$W_{j,i} \leftarrow W_{j,i} + \alpha \cdot a_j \cdot \Delta_i$$
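A single application of the output-layer rule can be checked by hand. The sketch assumes a sigmoid activation function, for which $g'(x) = g(x)(1 - g(x))$; the helper names `sigmoid_prime` and `update_output_unit` are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_prime(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # for the sigmoid, g'(x) = g(x) * (1 - g(x))


def update_output_unit(w_i: List[float], a: List[float], target: float,
                       alpha: float) -> Tuple[List[float], float]:
    """Apply W_{j,i} <- W_{j,i} + alpha * a_j * Delta_i to every link into output unit i.

    w_i[j] is W_{j,i}; a[j] is the activation a_j of hidden unit j.
    Returns the new weights and Delta_i.
    """
    in_i = sum(w * aj for w, aj in zip(w_i, a))
    err_i = target - sigmoid(in_i)          # Err_i = T_i - O_i
    delta_i = err_i * sigmoid_prime(in_i)   # Delta_i = Err_i * g'(in_i)
    return [w + alpha * aj * delta_i for w, aj in zip(w_i, a)], delta_i


# in_i = 0, O_i = 0.5, Err_i = 0.5, g'(0) = 0.25, so Delta_i = 0.125
new_w, delta = update_output_unit([0.0, 0.0], [1.0, 0.5], target=1.0, alpha=0.5)
print(new_w, delta)  # [0.0625, 0.03125] 0.125
```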


Back Propagation Learning


As with the previous formula, to update the connections between the input units
and the hidden units we need to define a quantity analogous to the error term
for output nodes.


The idea is that hidden node $j$ is “responsible” for some fraction of the error
$\Delta_i$ in each of the output nodes to which it connects. Thus, the $\Delta_i$ values
are divided according to the strength of the connection between the hidden
node and the output node, and propagated back to provide the $\Delta_j$ values for
the hidden layer. The propagation rule for the Delta values is the following:

$$\Delta_j = g'(in_j) \sum_i W_{j,i} \, \Delta_i$$


Now the update rule for the weights between the inputs and the hidden layer is
almost identical to the update rule for the output layer:



$$W_{k,j} \leftarrow W_{k,j} + \alpha \cdot I_k \cdot \Delta_j$$
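The back-propagated Delta values and the input-to-hidden update can be sketched the same way, again assuming sigmoid units. The weight layouts `w_out[j][i]` for $W_{j,i}$ and `w_in[k][j]` for $W_{k,j}$, and the function names, are illustrative assumptions.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_prime(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)


def hidden_deltas(in_hidden: List[float], w_out: List[List[float]],
                  delta_out: List[float]) -> List[float]:
    """Delta_j = g'(in_j) * sum_i W_{j,i} * Delta_i for each hidden unit j."""
    return [
        sigmoid_prime(in_j) * sum(w_ji * d_i for w_ji, d_i in zip(w_out[j], delta_out))
        for j, in_j in enumerate(in_hidden)
    ]


def update_input_to_hidden(w_in: List[List[float]], inputs: List[float],
                           delta_hidden: List[float], alpha: float) -> List[List[float]]:
    """W_{k,j} <- W_{k,j} + alpha * I_k * Delta_j for every input k and hidden unit j."""
    return [
        [w_in[k][j] + alpha * inputs[k] * delta_hidden[j] for j in range(len(delta_hidden))]
        for k in range(len(inputs))
    ]


# One hidden unit with in_j = 0 (so g'(in_j) = 0.25), linked by W_{j,i} = 2.0
# to one output unit whose Delta_i = 0.125: Delta_j = 0.25 * 2.0 * 0.125 = 0.0625
d_hidden = hidden_deltas([0.0], [[2.0]], [0.125])
print(d_hidden, update_input_to_hidden([[0.0], [0.0]], [1.0, 0.0], d_hidden, alpha=0.5))
```

An input with value 0 leaves its outgoing weights unchanged, since the update is proportional to $I_k$.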


Back Propagation Learning


The learning algorithm can be summarized as follows:


Compute the Delta values for the output units using the
observed behavior


Starting with the output layer, repeat the following for each
layer in the network, until the earliest (closest to input) hidden
layer is reached


Propagate the Delta values back to the previous layer


Update the weights between the two layers



Back Propagation Learning Algorithm


Algorithm Back-Prop-Update(network, examples, alpha) : new network weights

    repeat
        for each e in examples do
            O = Run-Network(network, I^e)
            Err^e = T^e - O
            for each output unit i: Delta_i = Err^e_i * g'(in_i)
            W_j,i = W_j,i + (alpha * a_j * Delta_i)
            for each subsequent layer in network (moving back toward the inputs) do
                Delta_j = g'(in_j) * Sum_i (W_j,i * Delta_i)
                W_k,j = W_k,j + (alpha * I_k * Delta_j)
            end
        end
    until network has converged
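Putting the pieces together, a compact Python version of Back-Prop-Update for a network with one hidden layer might look like this. It is a sketch under several assumptions not fixed by the notes: sigmoid units throughout, a constant -1 bias input feeding both layers, weights initialized uniformly in [-1, 1], a fixed epoch budget standing in for "until network has converged", and the names `TwoLayerNet` and `back_prop_train`. It also computes every Delta value before changing any weight, which is the usual formulation; the pseudocode above updates W_j,i first, and for a small alpha the difference is negligible.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], List[float]]


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class TwoLayerNet:
    """Inputs -> one hidden layer -> outputs, all units sigmoid.

    w_in[k][j]  is W_{k,j}: input k -> hidden unit j (input 0 is a constant -1 bias).
    w_out[j][i] is W_{j,i}: hidden j -> output unit i (hidden 0 is a constant -1 bias).
    """

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_in = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in + 1)]
        self.w_out = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hidden + 1)]

    def forward(self, inputs: Sequence[float]):
        x = [-1.0] + list(inputs)
        in_hidden = [sum(x[k] * self.w_in[k][j] for k in range(len(x)))
                     for j in range(len(self.w_in[0]))]
        a = [-1.0] + [sigmoid(v) for v in in_hidden]
        in_out = [sum(a[j] * self.w_out[j][i] for j in range(len(a)))
                  for i in range(len(self.w_out[0]))]
        return x, a, [sigmoid(v) for v in in_out]

    def error(self, examples: List[Example]) -> float:
        """Half the summed squared error, 0.5 * sum (T - O)^2, over the examples."""
        return sum(0.5 * sum((t - o) ** 2 for t, o in zip(target, self.forward(inputs)[2]))
                   for inputs, target in examples)

    def backprop_update(self, inputs: Sequence[float], target: List[float], alpha: float):
        x, a, o = self.forward(inputs)
        # Output layer: Delta_i = Err_i * g'(in_i), where g'(in_i) = O_i * (1 - O_i)
        delta_out = [(t - oi) * oi * (1.0 - oi) for t, oi in zip(target, o)]
        # Hidden layer: Delta_j = g'(in_j) * sum_i W_{j,i} * Delta_i (bias unit a[0] skipped)
        delta_hidden = [
            a[j] * (1.0 - a[j]) * sum(self.w_out[j][i] * delta_out[i] for i in range(len(o)))
            for j in range(1, len(a))
        ]
        # W_{j,i} <- W_{j,i} + alpha * a_j * Delta_i
        for j in range(len(a)):
            for i in range(len(o)):
                self.w_out[j][i] += alpha * a[j] * delta_out[i]
        # W_{k,j} <- W_{k,j} + alpha * I_k * Delta_j
        for k in range(len(x)):
            for j in range(len(delta_hidden)):
                self.w_in[k][j] += alpha * x[k] * delta_hidden[j]


def back_prop_train(net: TwoLayerNet, examples: List[Example],
                    alpha: float = 0.5, epochs: int = 5000) -> TwoLayerNet:
    """Repeat the per-example update; a fixed epoch budget replaces 'until converged'."""
    for _ in range(epochs):
        for inputs, target in examples:
            net.backprop_update(inputs, target, alpha)
    return net


XOR = [((0, 0), [0.0]), ((0, 1), [1.0]), ((1, 0), [1.0]), ((1, 1), [0.0])]
net = TwoLayerNet(n_in=2, n_hidden=3, n_out=1, seed=1)
print("error before:", net.error(XOR))
back_prop_train(net, XOR)
print("error after:", net.error(XOR))
```

Each weight change is alpha times the negative gradient of the squared error for the current example, so the usual caveats apply: training can stall in a local minimum, and whether XOR is learned may depend on the seed and the number of hidden units.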




Discussion


Expressiveness: Well suited to continuous inputs and outputs, but they do not have the expressive
power of general logical representations



Computational Efficiency: For $m$ examples and $|W|$ weights, each epoch takes $O(m|W|)$
time. The worst-case number of epochs is exponential in the number of inputs



Generalization: Good at generalizing for continuous functions that vary smoothly with
the input



Sensitivity to noise: Very tolerant of noise in the input data, since they essentially perform
nonlinear regression and find the best fit given the network topology



Transparency: Neural networks are essentially black boxes



Prior knowledge: It is difficult to choose good training examples and the best network
topology