Neural Networks - KBS


Neural Networks

Slides from: Doug Gray, David Poole

What is a Neural Network?


An information processing paradigm inspired by the way biological nervous systems, such as the brain, process information

A method of computing based on the interaction of multiple connected processing elements

What can a Neural Net do?

Compute a known function

Approximate an unknown function

Pattern Recognition

Signal Processing


Learn to do any of the above


Basic Concepts


A neural network generally maps a set of inputs to a set of outputs

The number of inputs/outputs is variable

The network itself is composed of an arbitrary number of nodes with an arbitrary topology




Basic Concepts


Definition of a node:

A node is an element which performs the function

    y = f_H( Σ (w_i · x_i) + W_b )


[Diagram: a node and its weighted connections]
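Read directly off that formula, a node is a few lines of Python (a minimal sketch; choosing a sigmoid for f_H is an assumption that anticipates the activation functions slide):

    import math

    def node(inputs, weights, bias_weight):
        """y = f_H(sum(w_i * x_i) + W_b), with a sigmoid assumed as f_H."""
        s = sum(w * x for w, x in zip(weights, inputs)) + bias_weight
        return 1.0 / (1.0 + math.exp(-s))  # f_H: sigmoid activation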

Properties

Inputs are flexible:
any real values
highly correlated or independent

The target function may be discrete-valued, real-valued, or a vector of discrete or real values

Outputs are real numbers between 0 and 1

Resistant to errors in the training data

Long training time

Fast evaluation

The function produced can be difficult for humans to interpret

Perceptrons

Basic unit in a neural network

Linear separator

Parts


N inputs, x1 ... xn

Weights for each input, w1 ... wn

A bias input x0 (constant) and associated weight w0

Weighted sum of inputs, y = w0x0 + w1x1 + ... + wnxn

A threshold function (activation function), i.e. 1 if y > 0, -1 if y <= 0
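These parts translate almost line for line into code (a minimal sketch; the names are illustrative):

    def perceptron(x, w):
        """Perceptron: threshold the weighted sum of the inputs.
        x and w include the bias entry (x[0] is the constant bias input x0)."""
        y = sum(wi * xi for wi, xi in zip(w, x))  # y = w0x0 + w1x1 + ... + wnxn
        return 1 if y > 0 else -1                 # threshold activation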

Diagram

[Diagram: inputs x0, x1, x2, ..., xn, each multiplied by its weight w0, w1, w2, ..., wn, feed a summation node Σ computing y = Σ wixi, followed by a threshold: 1 if y > 0, -1 otherwise]

Typical Activation Functions

F(x) = 1 / (1 + e^(-x))

Using a nonlinear function which approximates a linear threshold allows a network to approximate nonlinear functions
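A few sample values make the threshold-like shape concrete (an illustrative sketch, not from the slides):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Far from zero the sigmoid acts like a hard 0/1 threshold, but it is
    # smooth everywhere, which is what gradient-based training needs.
    for x in (-6, -1, 0, 1, 6):
        print(x, round(sigmoid(x), 3))  # 0.002, 0.269, 0.5, 0.731, 0.998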


Simple Perceptron

Binary logic application

f_H(x) = u(x) [linear threshold]

W_i = random(-1, 1)

Y = u(W_0·X_0 + W_1·X_1 + W_b)


Now how do we train it?

Basic Training

Perceptron learning rule:

ΔW_i = η · (D − Y) · X_i

η = learning rate

D = desired output

Adjust weights based on how well the current weights match an objective
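The rule is a one-liner per weight (a sketch with assumed names):

    def update_weights(w, x, d, y, eta=0.1):
        """Perceptron learning rule: w_i += eta * (d - y) * x_i."""
        return [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]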

Logic Training

Expose the network to the logical OR operation

Update the weights after each epoch

As the output approaches the desired output for all cases, ΔW_i will approach 0

X0  X1 | D
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 1
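Putting the rule and the truth table together (a minimal sketch; the learning rate, epoch count, and 0/1 threshold are assumptions):

    import random

    def step(s):
        return 1 if s > 0 else 0  # 0/1 threshold to match the OR targets

    random.seed(1)
    w0, w1, wb = (random.uniform(-1, 1) for _ in range(3))  # W_i = random(-1,1)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR truth table
    eta = 0.1

    for _ in range(100):
        dw0 = dw1 = dwb = 0.0
        for (x0, x1), d in data:
            y = step(w0 * x0 + w1 * x1 + wb)
            dw0 += eta * (d - y) * x0
            dw1 += eta * (d - y) * x1
            dwb += eta * (d - y)       # bias input is a constant 1
        # update the weights after each epoch, as the slides describe;
        # once every case is right, all deltas are 0 and the weights freeze
        w0, w1, wb = w0 + dw0, w1 + dw1, wb + dwb

    print(w0, w1, wb)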

Results

[Plot: the evolution of the weights W0, W1, Wb over training]

Details

The network converges on a hyperplane decision surface:

    X1 = -(W0/W1)·X0 - (Wb/W1)

(set W0·X0 + W1·X1 + Wb = 0 and solve for X1)

[Plot: the decision line in the (X0, X1) plane]
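Given learned weights, two points on that line are enough to draw it (a sketch; the weight values are placeholders):

    def boundary_x1(w0, w1, wb, x0):
        """X1 on the separating line W0*X0 + W1*X1 + Wb = 0."""
        return -(w0 / w1) * x0 - wb / w1

    print(boundary_x1(0.3, 0.2, -0.1, 0.0))  # 0.5
    print(boundary_x1(0.3, 0.2, -0.1, 1.0))  # -1.0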

Feed-forward neural networks

Feed-forward neural networks are the most common models.

These are directed acyclic graphs:

Neural Network for the news example

Axiomatizing the Network

The values of the attributes are real numbers.

The thirteen parameters w0, ..., w12 are real numbers.

The attributes h1 and h2 correspond to the values of hidden units.

There are 13 real numbers to be learned. The hypothesis space is thus a 13-dimensional real space.

Each point in this 13-dimensional space corresponds to a particular logic program that predicts a value for reads given known, new, short, and home.
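One way to realize those 13 parameters, assuming two sigmoid hidden units and a sigmoid output, each with its own bias (this layout is inferred from the parameter count, not stated on the slide):

    import math

    def f(x):
        return 1.0 / (1.0 + math.exp(-x))  # sigmoid activation

    def predict_reads(w, known, new, short, home):
        """13 parameters: 2 hidden units * (bias + 4 weights) + output (bias + 2 weights)."""
        h1 = f(w[0] + w[1] * known + w[2] * new + w[3] * short + w[4] * home)
        h2 = f(w[5] + w[6] * known + w[7] * new + w[8] * short + w[9] * home)
        return f(w[10] + w[11] * h1 + w[12] * h2)  # predicted value for reads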

Prediction Error

Neural Network Learning

Aim of neural network learning: given a set of examples, find parameter settings that minimize the error.

Back-propagation learning is gradient descent search through the parameter space to minimize the sum-of-squares error.
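The objective itself is short enough to write out (a sketch; the example format is an assumption):

    def sum_of_squares_error(examples, predict):
        """examples: (inputs, target) pairs; predict maps inputs to an output."""
        return sum((target - predict(inputs)) ** 2 for inputs, target in examples)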

Backpropagation Learning

Inputs:

A network, including all units and their connections

Stopping criteria

Learning rate (constant of proportionality of gradient descent search)

Initial values for the parameters

A set of classified training data

Output: updated values for the parameters

Backpropagation Learning Algorithm

Repeat:

evaluate the network on each example given the current parameter settings

determine the derivative of the error for each parameter

change each parameter in proportion to its derivative

until the stopping criterion is met
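The loop can be rendered directly; this sketch estimates the derivatives numerically rather than back-propagating them analytically, and every name and the stopping test are assumptions:

    def gradient_descent(error, w, eta=0.05, epsilon=1e-4, max_steps=1000):
        """error(w) -> sum-of-squares error of the network with parameters w."""
        for _ in range(max_steps):                      # repeat ...
            base = error(w)                             # evaluate the network
            grad = []
            for i in range(len(w)):                     # derivative of the error
                bumped = w[:]                           # ... for each parameter,
                bumped[i] += epsilon                    # estimated numerically
                grad.append((error(bumped) - base) / epsilon)
            w = [wi - eta * g for wi, g in zip(w, grad)]  # step against the gradient
            if all(abs(g) < 1e-6 for g in grad):        # stopping criterion (assumed)
                break
        return w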

Gradient Descent for Neural Net Learning

Bias in neural networks and decision trees

It's easy for a neural network to represent "at least two of I1, ..., Ik are true":

    w0 = -15,  w1 = ... = wk = 10

This concept forms a large decision tree.
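A quick check of that unit for k = 3, using the 0/1 threshold from the perceptron slides:

    from itertools import product

    def at_least_two(inputs):
        """Single unit: bias weight w0 = -15, every input weight 10."""
        return (-15 + sum(10 * i for i in inputs)) > 0

    # true exactly when two or more inputs are 1
    for bits in product((0, 1), repeat=3):
        assert at_least_two(bits) == (sum(bits) >= 2)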


Consider representing a conditional, "if c then a else b":

Simple in a decision tree.

Needs a complicated neural network to represent (c ∧ a) ∨ (¬c ∧ b).

Neural Networks and Logic

Meaning is attached to the input and output units.

There is no a priori meaning associated with the hidden units.

What the hidden units actually represent is something that's learned.