Modeling with neural networks


1

Modeling with neural networks

Garrison W. Cottrell

Gary's Unbelievable Research Unit (GURU)

Computer Science and Engineering Department

Temporal Dynamics of Learning Center

Institute for Neural Computation

UCSD

Modeling with Neural Networks

2

Ways to understand how the brain works

Behavioral measures
  Choices
  Reaction times
  Eye movements

Brain imaging
  PET
  fMRI
  MEG
  EEG
  NIRS
  DTI

Neural recording
  Single cell recording
  Multicell recording
  Optical imaging
  Voltage-sensitive dyes
  Optogenetics
  ECoG

Modeling
  Neural networks
  Bayesian Models
  Abstract Mathematical Models

Modeling with Neural Networks

3

Why model?



Models rush in where theories fear to tread.



Models can be manipulated in ways people cannot.



Models can be analyzed in ways people cannot.


Modeling with Neural Networks

4

Models rush in where theories fear to tread

Theories are high level descriptions of the processes underlying behavior.

They are often not explicit about the processes involved.

They are difficult to reason about if no mechanisms are explicit -- they may be too high level to make explicit predictions.

Theory formation itself is difficult.

Modeling with Neural Networks

5

Models rush in where theories fear to tread

Using machine learning techniques, one can often build a working model of a task for which we have no theories or algorithms (e.g., expression recognition).

A working model provides an “intuition pump” for how things might work, especially if they are “neurally plausible” (e.g., development of face processing - Dailey and Cottrell).

A working model may make unexpected predictions (e.g., the Interactive Activation Model and SLNT).

Modeling with Neural Networks

6

Your first neural net:

The Interactive Activation Model:

A model of reading from print




[Figure: the model's three levels - Word level, Letter level, Feature level]

Modeling with Neural Networks

7

Operation of the model


Modeling with Neural Networks

8

Operation of the model


Modeling with Neural Networks

9

Example of data accounted for…

Pseudoword effect


Modeling with Neural Networks

10

Example of data accounted for…

Pseudoword effect

Modeling with Neural Networks

11

Example of data predicted


What about non-pronounceable non-words like SLNT?

SLNT has a lot of friends at the word level.

The model predicts that there should be a superiority effect for SLNT.

They tested this in UCSD Psychology sophomores and got the predicted effect.

Modeling with Neural Networks

12

Summary


Why model?


Models make assumptions explicit

Models (because they are run on a computer and can be highly non-linear) can make unexpected predictions

While no model is “correct”, the more data a model predicts, the more we “believe” that model…

Modeling with Neural Networks

13

Models can be manipulated in ways people cannot

We can see the effects of variations in cortical architecture (e.g., split (hemispheric) vs. non-split models (Shillcock and Monaghan word perception model)).

We can see the effects of variations in processing resources (e.g., variations in number of hidden units in Plaut et al. models).

Modeling with Neural Networks

14

Models can be manipulated in ways people cannot

We can see the effects of variations in environment (e.g., what if our parents were cans, cups or books instead of humans? I.e., is there something special about face expertise versus visual expertise in general? (Sugimoto and Cottrell, Joyce and Cottrell)).

We can see variations in behavior due to different kinds of brain damage within a single “brain” (e.g. Juola and Plunkett, Hinton and Shallice).

Modeling with Neural Networks

15

Models can be analyzed in ways people cannot

In the following, I specifically refer to neural network models.

We can do single unit recordings.

We can selectively ablate and restore parts of the network, even down to the single unit level, to assess the contribution to processing.

We can measure the individual connections -- e.g., the receptive and projective fields of a unit.

We can measure responses at different layers of processing (e.g., which level accounts for a particular judgment: perceptual, object, or categorization? (Dailey et al., J Cog Neuro 2002)).

Modeling with Neural Networks

16

How (I like) to build Cognitive Models

In a domain where there is a lot of data and controversy!

I like to be able to relate them to the brain, so “neurally plausible” models are preferred -- neural nets.

The model should be a working model of the actual task, rather than a cartoon version of it.

Of course, the model should nevertheless be simplifying (i.e. it should be constrained to the essential features of the problem at hand):

Do we really need to model the (supposed) translation invariance and size invariance of biological perception?

As far as I can tell, NO!

Then, take the model “as is” and fit the experimental data: No fitting parameters is to be preferred over 1, 2, or 3.

Modeling with Neural Networks

17

The other way (I like) to build Cognitive Models

In domains where there is little data and much mystery

Use them as exploratory models -- in domains where there is little direct data (e.g. no single cell recordings in infants or undergraduates) to suggest what we might find if we could get the data. These can then serve as “intuition pumps.”

Examples:

Why we might get specialized face processors

Why those face processors get recruited for other tasks
Modeling with Neural Networks

18

A few giants

Modeling with Neural Networks

19

A few giants

Frank Rosenblatt invented the perceptron:

One of the first neural networks to learn by supervised training

Still in use today!

Modeling with Neural Networks

20

A few giants

Dave E. Rumelhart, with Geoff Hinton and Ron Williams, invented back-propagation

Many had invented back propagation before; few could appreciate as deeply as Dave did what they had when they discovered it.

Modeling with Neural Networks

21

A few giants



Hal White was a theoretician of neural networks

Hal White's paper with Kurt Hornik and Max Stinchcombe, “Multilayer feedforward networks are universal approximators,” is his second most-cited paper, at 8,114 cites.

Modeling with Neural Networks

22

A few giants


In yet another paper (in Neural Computation, 1989), he wrote:

“The premise of this article is that learning procedures used to train artificial neural networks are inherently statistical techniques. It follows that statistical theory can provide considerable insight into the properties, advantages, and disadvantages of different network learning methods…”

This was one of the first papers to make the connection between neural networks and statistical models - and thereby put them on a sound statistical foundation.

Modeling with Neural Networks

23

What is backpropagation, and why is/was it important?

We have billions and billions of neurons that somehow work together to create the mind.

These neurons are connected by 10^14 - 10^15 synapses, which we think encode the “knowledge” in the network - too many for us to explicitly program them in our models.

Rather, we need some way to indirectly set them via a procedure that will achieve some goal by changing the synaptic strengths (which we call weights).

This is called learning in these systems.

Modeling with Neural Networks

24

Learning: A bit of history


Frank Rosenblatt studied a simple version of a neural net called a perceptron:

A single layer of processing

Binary output

Can compute simple things like (some) boolean functions (OR, AND, etc.)

Modeling with Neural Networks

25

Learning: A bit of history


[Figure: perceptron diagram, labeled with net input and output]

Modeling with Neural Networks

26

Learning: A bit of history


Modeling with Neural Networks

27

Learning: A bit of history


Rosenblatt (1962) discovered a learning rule for perceptrons called the perceptron convergence procedure.

Guaranteed to learn anything computable (by a two-layer perceptron)

Unfortunately, not everything was computable (Minsky & Papert, 1969)

Modeling with Neural Networks

28

Perceptron Learning Demonstration


Output activation rule:

First, compute the net input to the output unit:
    Σᵢ wᵢxᵢ = net

Then, compute the output as:
    If net ≥ θ, then output = 1
    else output = 0

[Figure: perceptron diagram, labeled with net input and output]

Modeling with Neural Networks

29

Perceptron Learning Demonstration


Output activation rule:

First, compute the net input to the output unit:
    Σᵢ wᵢxᵢ = net

If net ≥ θ, then output = 1
else output = 0

Learning rule:

If output is 1 and should be 0, then lower weights to active inputs and raise the threshold (θ)

If output is 0 and should be 1, then raise weights to active inputs and lower the threshold (θ)

(“active input” means xᵢ = 1, not 0)
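
A minimal Python sketch of this procedure, keeping the threshold explicit (the OR task, the step size of 1, and all names below are illustrative choices of mine, not taken from the slides):

```python
import random

def perceptron_output(w, theta, x):
    """Binary threshold unit: output 1 iff the net input sum_i w_i*x_i reaches theta."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net >= theta else 0

def perceptron_update(w, theta, x, target):
    """One step of the learning rule above: adjust only when the output is wrong."""
    out = perceptron_output(w, theta, x)
    if out == 1 and target == 0:
        # Output is 1 and should be 0: lower weights to active inputs, raise the threshold.
        w = [wi - xi for wi, xi in zip(w, x)]
        theta += 1
    elif out == 0 and target == 1:
        # Output is 0 and should be 1: raise weights to active inputs, lower the threshold.
        w = [wi + xi for wi, xi in zip(w, x)]
        theta -= 1
    return w, theta

# Example: learn Boolean OR, with patterns presented in random order.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, theta = [0.0, 0.0], 0.5
for _ in range(50):
    x, t = random.choice(data)
    w, theta = perceptron_update(w, theta, x, t)
print(w, theta, [perceptron_output(w, theta, x) for x, _ in data])
```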


Modeling with Neural Networks

30

Characteristics of perceptron learning



Supervised learning: Gave it a set of input-output examples for it to model the function (a teaching signal)

Error correction learning: only correct it when it is wrong.

Random presentation of patterns.

Slow! Learning on some patterns ruins learning on others.


Modeling with Neural Networks

31

Perceptron Learning Made Simple


Output activation rule:

First, compute the net input to the output unit:
    Σᵢ wᵢxᵢ = net

If net ≥ θ, then output = 1
else output = 0

Learning rule:

If output is 1 and should be 0, then lower weights to active inputs and raise the threshold (θ)

If output is 0 and should be 1, then raise weights to active inputs and lower the threshold (θ)


Modeling with Neural Networks

32

Perceptron Learning Made Simple


Learning rule:

If output is 1 and should be 0, then lower weights to active inputs and raise the threshold (θ)

If output is 0 and should be 1, then raise weights to active inputs and lower the threshold (θ)

Learning rule:

    wᵢ(t+1) = wᵢ(t) + α·(teacher - output)·xᵢ

    (α is the learning rate)

Modeling with Neural Networks

33

Perceptron Learning Made Simple


Learning rule:

If output is 1 and should be 0, then lower weights to active inputs and raise the threshold (θ)

If output is 0 and should be 1, then raise weights to active inputs and lower the threshold (θ)

Learning rule:

    wᵢ(t+1) = wᵢ(t) + α·(teacher - output)·xᵢ

    (α is the learning rate)

This is known as the delta rule because learning is based on the delta (difference) between what you did and what you should have done:

    δ = (teacher - output)
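
In code, the delta rule is a one-line update per weight; a small sketch (α, the example numbers, and the bias trick in the comments are mine, not from the slides):

```python
def delta_rule_step(w, x, teacher, output, alpha=1.0):
    """w_i(t+1) = w_i(t) + alpha * (teacher - output) * x_i."""
    delta = teacher - output                      # the "delta" the rule is named for
    return [wi + alpha * delta * xi for wi, xi in zip(w, x)]

# When teacher=1 and output=0, weights to active inputs go up;
# when teacher=0 and output=1, they go down; otherwise nothing changes.
# The threshold can be treated as a weight on a constant input of -1,
# so "raise the threshold" is just "lower that weight", and vice versa.
print(delta_rule_step([0.2, -0.1, 0.5], [1, 0, -1], teacher=1, output=0))
# -> [1.2, -0.1, -0.5]
```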


Modeling with Neural Networks

34

Problems with perceptrons



The learning rule comes with a great guarantee: anything a perceptron can compute, it can learn to compute.

Problem: Lots of things were not computable, e.g., XOR (Minsky & Papert, 1969)

Minsky & Papert said: if you had hidden units, you could compute any boolean function. But no learning rule exists for such multilayer networks, and we don’t think one will ever be discovered.
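
One way to make the XOR problem concrete is to brute-force every small integer weight/threshold setting of a single threshold unit and observe that none of them reproduces XOR; a quick sketch (not from the slides):

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def unit(w1, w2, theta, x1, x2):
    """A single threshold unit: 1 iff w1*x1 + w2*x2 >= theta."""
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

# Search a grid of integer weights and thresholds; no single unit reproduces XOR.
solutions = [
    (w1, w2, theta)
    for w1, w2, theta in product(range(-5, 6), repeat=3)
    if all(unit(w1, w2, theta, x1, x2) == t for (x1, x2), t in XOR.items())
]
print(solutions)  # -> [] : XOR is not linearly separable
```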

Modeling with Neural Networks

35

Problems with perceptrons

Modeling with Neural Networks

36

Aside about perceptrons



They didn’t have hidden units - but Rosenblatt assumed nonlinear preprocessing!

Hidden units compute features of the input

The nonlinear preprocessing is a way to choose features by hand.

Support Vector Machines essentially do this in a principled way, followed by a (highly sophisticated) perceptron learning algorithm.
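
For instance, adding one hand-chosen nonlinear feature (the product x1*x2) makes XOR linearly separable, so the plain perceptron rule from earlier can learn it; a small sketch (the feature choice and all names are my illustration, not from the slides):

```python
def output(w, theta, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

# Hand-chosen nonlinear preprocessing: append the product x1*x2 as a third input.
def preprocess(x1, x2):
    return [x1, x2, x1 * x2]

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w, theta = [0, 0, 0], 0
for _ in range(20):                       # cycle through the patterns
    for (x1, x2), t in data:
        x = preprocess(x1, x2)
        if output(w, theta, x) != t:      # error correction: update only when wrong
            step = 1 if t == 1 else -1
            w = [wi + step * xi for wi, xi in zip(w, x)]
            theta -= step
print([output(w, theta, preprocess(x1, x2)) for (x1, x2), _ in data])  # -> [0, 1, 1, 0]
```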

Modeling with Neural Networks

37

Enter Rumelhart, Hinton, & Williams (1985)


Discovered a learning rule for networks with hidden units.

Works a lot like the perceptron algorithm:

Randomly choose an input-output pattern

present the input, let activation propagate through the network

give the teaching signal

propagate the error back through the network (hence the name back propagation)

change the connection strengths according to the error
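
A compact sketch of that loop, as a 2-2-1 sigmoid network trained on XOR with plain gradient descent (the layer sizes, learning rate, seed, and squared-error gradient are my choices for illustration, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [0.0]])                 # XOR teaching signal

W1 = rng.normal(0, 1, (2, 2)); b1 = np.zeros(2)            # input -> hidden
W2 = rng.normal(0, 1, (2, 1)); b2 = np.zeros(1)            # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for step in range(20000):
    i = rng.integers(len(X))                                # randomly choose a pattern
    x, t = X[i], T[i]
    h = sigmoid(x @ W1 + b1)                                # forward: activation propagates
    y = sigmoid(h @ W2 + b2)
    err_out = (y - t) * y * (1 - y)                         # error at the output...
    err_hid = (err_out @ W2.T) * h * (1 - h)                # ...propagated back to the hidden units
    W2 -= lr * np.outer(h, err_out); b2 -= lr * err_out     # change the weights downhill in the error
    W1 -= lr * np.outer(x, err_hid); b1 -= lr * err_hid

# With only two hidden units, training can occasionally stall in a local minimum;
# rerunning with a different seed usually fixes it.
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```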

Modeling with Neural Networks

38

Enter Rumelhart, Hinton, & Williams (1985)


The actual algorithm uses the chain rule of calculus to go downhill in an error measure with respect to the weights.

The hidden units must learn features that solve the problem.

[Figure: layered network with INPUTS, Hidden Units, and OUTPUTS; activation flows forward, error flows back]

Modeling with Neural Networks

39

XOR


Here, the hidden units learned AND and OR - two features that, when combined appropriately, can solve the problem.

[Figure: back propagation learning takes a Random Network to an XOR Network; the hidden units are labeled AND and OR]
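
To see why AND and OR suffice, here is a hand-wired version of such a solution using threshold units; the particular weights are my illustration, not the network's learned values:

```python
def unit(weights, theta, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

def xor(x1, x2):
    h_or  = unit([1, 1], 1, [x1, x2])        # hidden unit 1: OR
    h_and = unit([1, 1], 2, [x1, x2])        # hidden unit 2: AND
    return unit([1, -2], 1, [h_or, h_and])   # output fires when OR is on but AND is off

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # -> [0, 1, 1, 0]
```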

Modeling with Neural Networks

40

XOR

But, depending on initial conditions, there are an infinite number of ways to do XOR - backprop can surprise you with innovative solutions.

[Figure: back propagation learning takes a Random Network to an XOR Network; the hidden units are labeled AND and OR]

Modeling with Neural Networks

41

Why is/was this wonderful?



Efficiency

Learns internal representations

Learns internal representations

Learns internal representations

Generalizes to recurrent networks

Modeling with Neural Networks

42

Hinton’s Family Trees example


Idea: Learn to represent relationships between people that
are encoded in a family tree:

Modeling with Neural Networks

43

Hinton’s Family Trees example


Idea 2: Learn distributed representations of concepts

[Figure: network architecture with localist outputs at the top, a hidden layer that learns features of these entities useful for solving the task, and localist people and localist relations as inputs]

Localist: one unit “ON” to represent each item
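
A localist code is just a one-hot vector per item. A tiny sketch (the helper and the truncated vocabularies are mine for illustration; Christopher, Penelope, and the relation names come from the family-trees task):

```python
people = ["Christopher", "Penelope", "Andrew", "Christine"]   # first few people in the task
relations = ["father", "mother", "husband", "wife"]           # first few relations

def one_hot(item, vocabulary):
    """Localist code: exactly one unit ON for the given item."""
    return [1 if v == item else 0 for v in vocabulary]

# Input to the network: person vector concatenated with relation vector.
x = one_hot("Christopher", people) + one_hot("father", relations)
print(x)  # -> [1, 0, 0, 0, 1, 0, 0, 0]
```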

Modeling with Neural Networks

44

People hidden units: Hinton diagram










What is unit 1 encoding?

Modeling with Neural Networks

45

People hidden units: Hinton diagram











What is unit 2 encoding?

Modeling with Neural Networks

46

People hidden units: Hinton diagram










What is unit 6 encoding?

Modeling with Neural Networks

47

People hidden units: Hinton diagram

When all three are on, these units pick out Christopher and Penelope:


Other combinations pick out other parts of the trees

Modeling with Neural Networks

48

Relation units















What does the lower middle one code?

Modeling with Neural Networks

49

Lessons


The network learns features in the service of the task - i.e., it learns features on its own.

This is useful if we don’t know what the features ought to be.

Can explain some human phenomena


Modeling with Neural Networks

50

Thanks to funders, GURONS, and you!

Questions?