Thomas Trappenberg
Learning: A modern review of anticipatory systems in brains and machines
Outline
Machine Learning / Computational Neuroscience
1. Supervised Learning / Synaptic Plasticity
2. Sparse Unsupervised Learning / Cortical Object Recognition
3. Reinforcement Learning / Basal Ganglia
Universal Learning Machines
1961: Outline of a theory of thought processes and thinking machines
• Neuronic & mnemonic equations
• Reverberation
• Oscillations
• Reward learning
Eduardo Renato Caianiello (1921–1993)
But: NOT STOCHASTIC (only small noise in weights)
Stochastic networks: the Boltzmann machine (Hinton & Sejnowski 1983)
Multilayer Perceptron (MLP)
A universal approximator (learner), but:
• Overfitting
• Meaningful input
• Unstructured learning
• Only deterministic (just use the chain rule)
Linear large margin classifiers: Support Vector Machines (SVM)
MLP: minimize training error, i.e. empirical risk (here a threshold perceptron)
SVM: minimize generalization error (structural risk)
Linear-in-parameters learning
Thanks to Doug Tweed (U of T) for pointing out LIP
Linear hypothesis
Non-linear hypothesis
SVM in dual form
Kernel function
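The dual form touches the data only through inner products, so a kernel function can replace them. A minimal numpy sketch of a Gaussian (RBF) kernel Gram matrix; the toy data and the `rbf_kernel` helper are illustrative, not from the slides:

```python
import numpy as np

# Sketch of the kernel trick behind the dual SVM: the decision function
# depends on inputs only through inner products k(x, x'), so a kernel
# replaces them. rbf_kernel is an illustrative helper.
def rbf_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
K = rbf_kernel(X, X)            # Gram matrix of pairwise similarities
print(K.shape)                  # (3, 3)
```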
Liquid/echo state machines
Extreme learning machines
Fundamental stochasticity
Irreducible indeterminacy
Epistemological limitations
Sources of fluctuations
Probabilistic framework
Goal of learning: make predictions!
(learning vs. memory)
Goal of learning: plant equation for a robot
Distance traveled when both motors are running with power 50
Hypothesis:
Learning: choose parameters that make the training data most likely
The hard problem: how to come up with a useful hypothesis
Assume independence of the training examples and consider this as a function of the parameters (the log likelihood)
Maximum Likelihood Estimation
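A minimal sketch of maximum likelihood estimation under the independence assumption above. The Bernoulli data are illustrative; the log likelihood sums over training examples and is maximized over the parameter, which here recovers the sample mean:

```python
import numpy as np

# Maximum likelihood sketch: with independent training examples the log
# likelihood is a sum over examples; maximize it over the parameter.
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])

def log_likelihood(theta, x):
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

thetas = np.linspace(0.01, 0.99, 99)          # grid over the parameter
ll = [log_likelihood(t, data) for t in thetas]
theta_mle = float(thetas[np.argmax(ll)])

print(theta_mle, float(data.mean()))          # both ~0.75
```

For Bernoulli data the grid search and the closed-form MLE (the sample mean) agree, which is a quick sanity check on the likelihood.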
How about building more elaborate multivariate models?
Causal (graphical) models (Judea Pearl)
Parameters of the CPTs are usually learned from data!
Arguing with 10 parameters instead of 31
Hidden Markov Model (HMM) for localization
• Integrating sensor information becomes trivial
• Breakdown of point estimates in global localization (particle filters)
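A minimal sketch of the HMM localization idea. The ring world, motion model, and sensor model are all made up for illustration: the belief over states is pushed through the transition model and then reweighted by the sensor likelihood, so integrating sensor information is a one-line Bayes update.

```python
import numpy as np

# HMM localization sketch on a toy ring of 5 cells: predict with the
# transition model, then correct with the sensor likelihood.
n = 5
belief = np.ones(n) / n                      # uniform prior over cells

def predict(belief):
    # move one cell to the right with prob 0.8, stay put with prob 0.2
    return 0.8 * np.roll(belief, 1) + 0.2 * belief

def correct(belief, likelihood):
    posterior = belief * likelihood          # Bayes: prior times likelihood
    return posterior / posterior.sum()       # renormalize

# the sensor reports "door"; doors sit at cells 0 and 2
door_likelihood = np.array([0.9, 0.1, 0.9, 0.1, 0.1])
belief = correct(predict(belief), door_likelihood)
print(belief)                                # mass concentrates on the doors
```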
Synaptic Plasticity
Gradient descent rule for the LMS loss function:
… with a linear hypothesis: perceptron learning rule
Hebb rule
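The perceptron learning rule above can be sketched as follows (toy AND problem; data and learning rate are illustrative): weights move by the prediction error times the input.

```python
import numpy as np

# Perceptron learning rule sketch: w <- w + eta * (target - output) * x
# with a threshold unit, trained on the linearly separable AND problem.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
X = np.hstack([X, np.ones((4, 1))])    # constant bias input
t = np.array([0, 0, 0, 1], dtype=float)
w = np.zeros(3)
eta = 0.5

for _ in range(20):                    # a few epochs suffice here
    for x, target in zip(X, t):
        y = 1.0 if w @ x > 0 else 0.0  # threshold output
        w += eta * (target - y) * x    # update only on errors

preds = [1.0 if w @ x > 0 else 0.0 for x in X]
print(preds)                           # matches the AND targets
```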
Donald O. Hebb (1904–1985)
The Organization of Behavior (1949)
See also Sigmund Freud, Law of association by simultaneity, 1888
Classical LTP/LTD
R. Enoki, Y. Hu, D. Hamilton, and A. Fine, Neuron 62 (2009)
D. Standage, S. Jalil and T. Trappenberg, Biological Cybernetics 96 (2007)
Data from G.Q. Bi and M.M. Poo, J Neurosci 18 (1998)
Population argument of 'weight dependence'
Is Bi and Poo's weight-dependent STDP data an experimental artifact?
- Three sets of assumptions (B, C, D)
- Their data may reflect population effects
… with Dominic Standage (Queen's University)
2. Sparse Unsupervised Learning
Horace Barlow
Possible mechanisms underlying the transformations of sensory messages (1961)
"… reduction of redundancy is an important principle guiding the organization of sensory messages …"
Sparseness & overcompleteness
The Ratio Club
Minimizing reconstruction error and sparsity
PCA
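PCA as minimization of reconstruction error can be sketched with a plain SVD; the synthetic data, dimensions, and noise level are illustrative:

```python
import numpy as np

# PCA sketch: minimize reconstruction error by projecting centered data
# onto the leading principal component. Data lie near a 1-D subspace.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1)) @ np.array([[2.0, 1.0]]) \
    + 0.1 * rng.normal(size=(200, 2))
Xc = X - X.mean(axis=0)                  # center the data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:1]                               # top principal direction
X_rec = Xc @ W.T @ W                     # rank-1 reconstruction

err = float(np.mean((Xc - X_rec) ** 2))
print(err)   # small residual: one direction carries almost all variance
```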
Geoffrey E. Hinton
Deep belief networks: the stacked Restricted Boltzmann Machine
Sparse convolutional RBM
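A minimal RBM trained with one step of contrastive divergence (CD-1) on two toy binary patterns. Sizes, learning rate, and data are assumptions for illustration; this is not the sparse convolutional RBM itself:

```python
import numpy as np

# Minimal RBM sketch trained with CD-1: positive statistics from the
# data, negative statistics from a one-step Gibbs reconstruction.
rng = np.random.default_rng(0)
nv, nh, eta = 4, 3, 0.1
W = 0.01 * rng.normal(size=(nv, nh))
bv, bh = np.zeros(nv), np.zeros(nh)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)

for _ in range(2000):
    v0 = data
    ph0 = sigmoid(v0 @ W + bh)                   # hidden probabilities
    h0 = (rng.random(ph0.shape) < ph0) * 1.0     # sampled hidden states
    pv1 = sigmoid(h0 @ W.T + bv)                 # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + bh)
    W += eta * (v0.T @ ph0 - pv1.T @ ph1) / len(data)
    bv += eta * (v0 - pv1).mean(axis=0)
    bh += eta * (ph0 - ph1).mean(axis=0)

recon = sigmoid(sigmoid(data @ W + bh) @ W.T + bv)
err = float(np.mean((data - recon) ** 2))
print(err)                                       # low reconstruction error
```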
… with Paul Hollensen & Warren Connors
Truncated cone
Side-scan sonar
Synthetic aperture sonar
scRBM/SVM mine sensitivity: .983 ± .024, specificity: .954 ± .012
SIFT/SVM mine sensitivity: .970 ± .025, specificity: .944 ± .008
scRBM reconstruction
Sonar images
Sparse and topographic RBM (rtRBM)
… with Paul Hollensen
… with Pitoyo Hartono
Map-Initialized Perceptron (MIP)
Free-Energy-Based Supervised Learning:
TD learning generalized to Boltzmann machines (Sallans & Hinton 2004)
Paul Hollensen: sparse, topographic RBM successfully learns to drive the e-puck and avoid obstacles, given training data (proximity sensors, motor speeds)
RBM features
3. Reinforcement Learning

[Grid world example: reward -0.1 in each non-terminal state; from Russell and Norvig]
Markov Decision Process (MDP)
If we know all these factors, the problem is said to be fully observable,
and we can just sit down and contemplate the problem before moving.
Goal: maximize total expected payoff
Two important quantities
policy:
value function:
Optimal Control
Calculate value function (dynamic programming)
Richard Bellman (1920–1984)
Bellman equation for policy π
Deterministic policies to simplify notation
Solution: analytic or incremental
Value iteration: Bellman equation for the optimal policy
Policy iteration:
choose one policy,
calculate the corresponding value function,
choose a better policy based on this value function
For each state, evaluate all possible actions
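The value iteration step above can be sketched on a toy two-state MDP (all transition and reward numbers made up for illustration): repeatedly apply the Bellman optimality backup until the value function stops changing, then read off the greedy policy.

```python
import numpy as np

# Value iteration sketch:
#   V(s) <- max_a sum_s' P(s'|s,a) * (R(s,a) + gamma * V(s'))
gamma = 0.9
P = np.array([[[0.9, 0.1],    # P[a][s][s'] transition probabilities
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.1, 0.9]]])
R = np.array([[1.0, 0.0],     # R[a][s] expected immediate rewards
              [0.0, 2.0]])

V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * (P @ V)   # Q[a, s] for all actions and states
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)     # greedy policy from the converged values
print(V, policy)
```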
But:
Environment not known a priori
Observability of states
Curse of Dimensionality
Solution:
Online (TD)
POMDP
Model-based RL
Online value function estimation (TD learning)
If the environment is not known, use a Monte Carlo method with bootstrapping.
Expected payoff before taking a step
Expected payoff after taking a step = actual reward plus discounted expected payoff of the next step
= Temporal Difference
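The temporal-difference idea can be sketched on a toy two-state chain (illustrative numbers, not from the slides): the TD error is the actual reward plus the discounted value of the next state, minus the current estimate.

```python
# TD(0) sketch: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)),
# i.e. Monte Carlo sampling with bootstrapping. Chain: 0 -> 1 -> terminal,
# reward 1 on the last step only.
gamma, alpha = 1.0, 0.1
V = [0.0, 0.0, 0.0]            # V[2] is the terminal state, fixed at 0

for _ in range(500):           # repeated episodes through the chain
    for s, s_next, r in [(0, 1, 0.0), (1, 2, 1.0)]:
        td_error = r + gamma * V[s_next] - V[s]   # temporal difference
        V[s] += alpha * td_error

print(V[:2])                   # both estimates approach the true value 1
```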
What if the environment is not completely known?
This leads to the exploration-exploitation dilemma.
Online optimal control: Exploitation versus Exploration
On-policy TD learning: Sarsa
Off-policy TD learning: Q-learning
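A minimal Q-learning sketch (off-policy TD) with epsilon-greedy exploration; the two-state world and its rewards are made up for illustration, not from the slides:

```python
import numpy as np

# Q-learning sketch:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
rng = np.random.default_rng(1)
gamma, alpha, eps = 0.9, 0.1, 0.2
Q = np.zeros((2, 2))                      # Q[state, action]

def step(s, a):
    # action 1 moves toward / stays in state 1; reward 1 only for (1, 1)
    if s == 0:
        return (1, 0.0) if a == 1 else (0, 0.0)
    return (1, 1.0) if a == 1 else (0, 0.0)

s = 0
for _ in range(20000):
    # epsilon-greedy: explore with prob eps, otherwise act greedily
    a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(Q.argmax(axis=1))                   # greedy policy per state
```

Because the update bootstraps from `max Q` rather than the action actually taken, this is off-policy; replacing the max with the next epsilon-greedy action would give Sarsa.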
Model-based RL: TD(λ)
Instead of the tabular methods mainly discussed before, use a function approximator with parameters θ and gradient descent, with an exponential eligibility trace e that weights updates with λ at each step (Sutton 1988):
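The TD(λ) update can be sketched as follows; the linear value function with one-hot features on a toy chain, and all the numbers, are assumptions for illustration:

```python
import numpy as np

# TD(lambda) sketch with a linear function approximator V(s) = theta . phi(s):
#   e <- gamma * lambda * e + phi(s)          (eligibility trace)
#   theta <- theta + alpha * delta * e        (TD error weighted by trace)
# (Sutton 1988).
gamma, lam, alpha = 1.0, 0.8, 0.1
phi = np.eye(3)                     # one-hot features for states 0, 1, 2
theta = np.zeros(3)

for _ in range(500):                # episodes: 0 -> 1 -> terminal, r = 1 at end
    e = np.zeros(3)
    for s, s_next, r, terminal in [(0, 1, 0.0, False), (1, 2, 1.0, True)]:
        v_next = 0.0 if terminal else theta @ phi[s_next]
        delta = r + gamma * v_next - theta @ phi[s]
        e = gamma * lam * e + phi[s]
        theta += alpha * delta * e  # credit flows back along the trace

print(theta[:2])                    # both state values approach 1
```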
Free-energy-based reinforcement learning (Sallans & Hinton 2004)
… with Paul Hollensen
Basal Ganglia
… work with Patrick Connor
Our questions
• How do humans learn values that guide behaviour? (human behaviour)
• How is this implemented in the brain? (anatomy and physiology)
• How can we apply this knowledge? (medical interventions and robotics)
Ivan Pavlov (1849–1936), Nobel Prize 1904
Classical Conditioning
Rescorla-Wagner Model (1972)
Stimulus B
Stimulus A
Reward
Stimulus A
No reward
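The Rescorla-Wagner rule can be sketched as follows (illustrative acquisition-then-extinction schedule, not the slides' data): the associative strengths of the stimuli present on a trial change in proportion to the reward prediction error.

```python
# Rescorla-Wagner sketch: on each trial, the associative strengths of
# the stimuli that are present change by eta * (r - total prediction).
eta = 0.1
V = {"A": 0.0, "B": 0.0}

def trial(stimuli, r):
    delta = r - sum(V[s] for s in stimuli)   # reward prediction error
    for s in stimuli:
        V[s] += eta * delta

for _ in range(100):          # acquisition: stimulus A paired with reward
    trial(["A"], 1.0)
v_acq = V["A"]                # near the asymptote R = 1

for _ in range(100):          # extinction: stimulus A without reward
    trial(["A"], 0.0)
print(v_acq, V["A"])          # strength rises toward 1, then decays to 0
```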
Wolfram Schultz
Reward Signals in the Brain
Maia & Frank 2011
Disorders with effects on the dopamine system:
Parkinson's disease
Tourette's syndrome
ADHD
Drug addiction
Schizophrenia
Adding Biological Qualities to the Model
Input
Rescorla-Wagner Model
Rescorla and Wagner, 1972
Dopamine and Reward Prediction Error
Schultz, 1998
Striatum
Adding Biological Qualities to the Model
GPe
Striatum
Dual pathway model
e.g. M. Frank
Striatal interactions
and plasticity
e.g. J. Wickens
SLIM: Striatum with Lateral Inhibition Model
Model Equations
Parameters to Vary
>Input Salience
>Tuning Curve Width
>Mean Input Weight
>Std Input Weight
>% Input Connection
>Input Learning Rate
>Mean Lateral Weight
>Std Lateral Weight
>% Lateral Connection
>Lateral Learning Rate
>Number of Neurons
>Activation Threshold
>Activation Slope
>Activation Exponent
>Number of Inputs
>Input Noise
>Reward Salience
Conditioning Paradigms 1
Acquisition (Pavlov, 1927):
CS+ (classical conditioning)
V(S) = R (reinforcement learning)
Conditioning Paradigms 2
Extinction:
CS- (classical conditioning)
V(S) = 0 (reinforcement learning)
Conditioning Paradigms 3
Partial (probabilistic) conditioning: CS+/-
V(S) = P
Conditioning Paradigms 4
Negative patterning (Woodbury, 1943): CS_A+, CS_B+, CS_AB-
V(S_A) = R
V(S_B) = R
V(S_AB) = 0
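Negative patterning is an XOR-like problem: with purely elemental cues, a linear Rescorla-Wagner model cannot reach V(S_A) = R, V(S_B) = R, V(S_AB) = 0 at the same time. Adding a configural "AB" cue that fires only on the compound does solve it; the configural unit and all numbers below are assumptions for illustration.

```python
# Negative patterning sketch with an added configural cue "AB":
# interleave A+, B+, and AB- trials under the Rescorla-Wagner rule.
eta = 0.2
V = {"A": 0.0, "B": 0.0, "AB": 0.0}     # "AB" fires only on the compound

def trial(stimuli, r):
    delta = r - sum(V[s] for s in stimuli)   # prediction error
    for s in stimuli:
        V[s] += eta * delta

for _ in range(2000):                   # interleaved A+, B+, AB- trials
    trial(["A"], 1.0)
    trial(["B"], 1.0)
    trial(["A", "B", "AB"], 0.0)

print(V["A"], V["A"] + V["B"] + V["AB"])   # element ~1, compound ~0
```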
… with Patrick and Laurent Mattina, ENSTA Bretagne, Brest, France
Simulating Hyperactivity in ADHD using Reinforcement Learning
Higher initial value → loses interest faster → more switching
Works only when taking a switching cost into account
Conclusion and Outlook
Three basic categories of learning:
Supervised: lots of progress through statistical learning theory (kernel machines, graphical models, etc.)
Unsupervised: hot research area with some progress; deep temporal learning
Reinforcement: important topic in animal behavior; model-based RL
The Anticipating Brain
learn to model the world
… argue about it
… and act accordingly