Contemporary Learning Theory


Dr Pam Blundell

Lecture Five

Today

Introduce instrumental conditioning

How it's different from Pavlovian conditioning

Describe theoretical accounts of
instrumental conditioning

Objectives

At the end of this lecture, students
should be able to:

Describe instrumental conditioning
procedures

Discuss what is learnt in instrumental
conditioning

Evaluate formal models of instrumental
conditioning


Reading


Dickinson, pp. 102-122


Pearce Ch 4

Instrumental conditioning

Both prediction and control are required for successful adaptation in a changing environment


Instrumental conditioning

Instrumental behaviour refers to those
actions whose acquisition and
maintenance depend upon the fact that
the action is instrumental in causing some outcome

Allows us (and animals) to control our
environment in service of our needs and
desires

Instrumental conditioning

Consider approach to a food source

Hungry chick will learn to approach a bowl

Instrumental analysis suggests the animal
is sensitive to the contingency between its
own behaviour, and access to food

Pavlovian account suggests that predictive
relationship between bowl and food is
important

Instrumental conditioning

In a stable environment, we cannot
discriminate between those two
accounts of the behaviour

We need to change the causal structure
of the environment to determine what
is governing behaviour

Hershberger (1986)

Arranged a looking glass world

Approach to a food bowl actually increased
the distance to the food bowl

A Pavlovian animal (insensitive to the consequences of its actions) would never be able to adapt

Instrumental animal would learn to withdraw

Chicks showed little evidence of learning to run away across 100 minutes of training, and thus were not sensitive to the instrumental contingency

Miller & Konorski (1969)

Passive flexion of a dog's leg, in the presence of a stimulus, was paired with food.

After a number of pairings, the dog began to flex its leg in the presence of the stimulus

At odds with the notion of stimulus substitution

Termed 'type II conditioning'.

But this doesn't demonstrate the instrumental character of type II conditioning

Grindley (1932)

Guinea pigs

Trained to turn the head to left or right when a buzzer sounded, in order to receive food

Reversed the contingency (i.e. making them turn the head the other way)

Animals could perform this

So the S-O (Pavlovian) relationship was kept constant (buzzer-food)

Behaviour changed

Instrumental conditioning

Many 'instrumental' tasks have a strong Pavlovian component

Pigeon key peck

Runways

Mazes

Free operant lever pressing in rats is a
fairly pure instrumental task

Free operant lever pressing

Is sensitive to contingency reversal
(David & Bitterman, 1971)

Trained rats to lever press

Then changed the contingency: either no contingency, or pressing postponed the delivery of food

Postponed group reduced responding more

Bolles, Holtz, Dunn & Hill (1980)

Trained rats to press a lever down and push a lever up for food; which action was required was randomly determined, so rats tended to alternate

Punished one category of responding with shock

There was suppression only of the response that was punished, showing sensitivity to the consequences of their actions

Types of instrumental contingency (the effect of the instrumental response crossed with the nature of the event):

REINFORCEMENT: the response produces an appetitive event (e.g., food)

PUNISHMENT: the response produces an aversive event (e.g., shock)

OMISSION: the response turns off or prevents an appetitive event (e.g., NO food)

ESCAPE / AVOIDANCE: the response turns off or prevents an aversive event (e.g., NO shock)

What is learned in instrumental
conditioning?

Earliest explanation of instrumental
conditioning is the Law of Effect
(Thorndike)

Association between stimulus and
response, strengthened by presentation
of a reinforcer
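One minimal, textbook-style way to write this S-R strengthening (an illustrative formalization, not given in the lecture) is

ΔV(S-R) = α · R

where R = 1 when the response is followed by the reinforcer (0 otherwise) and α is a learning-rate parameter. Note that the outcome itself does not enter the learned association; it merely stamps in the S-R link.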

What is learned in instrumental
conditioning

As in Pavlovian conditioning, this
suggests no knowledge of the
consequences of the action

Instrumental action is simply a habitual
response triggered by the training
stimuli


'Drive' will potentiate habits

Tolman (1932, 1959)

Cognitive theory of instrumental action

Belief about consequences of action
(means-end readiness)


'Value' assigned to the outcome interacts with 'expectancy' to produce the behaviour

How to distinguish between the S-R and the cognitive accounts?

As in Pavlovian conditioning, examine
the effects of changing reward value!

Adams and Dickinson (1981)

(see also Colwill & Rescorla 1986)

LP1 → food 1

LP2 → 0 (no outcome)

Food 2 delivered noncontingently

4 groups: PF, PS, UF, US (paired or unpaired, food or sucrose)

Test lever pressing in extinction


Dickinson & Adams (1981)

Supports the cognitive theory of instrumental
action

BUT there was some residual responding in the paired (devalued) condition

When training was reintroduced, the devalued foods were ineffective reinforcers, so the residual responding was not due to ineffective devaluation

Perhaps both S-R and cognitive learning are going on?


What is learned in instrumental
conditioning

Problem with Tolman's account: no specification of the psychological mechanism by which expectancies, beliefs, and values interact to cause instrumental action

Bidirectional theory

Pairing two events not only causes a
forward connection between them, but
also a backwards connection


E1 → E2 (forward connection)

E2 → E1 (backward connection)

Bidirectional theory

If E1 is instrumental action and E2 is
the reinforcer


context → action → reinforcer (with a backward reinforcer → action connection)

Bidirectional theory

Gormezano & Tait (1976)

E1: Airpuff

E2: Water delivery

If backward associations form, then
water delivery should elicit eyeblink

But it doesn't



Bidirectional theory

Can't explain punishment

If the action is paired with shock, the backward shock → action connection should tend to elicit the action; but in fact the action is reduced by punishment

context → action → shock (with a backward shock → action connection)

Associative-cybernetic model

Associative: involves the formation of a connection between representations of the action and the outcome

Cybernetic: activation of the outcome representation feeds back to modulate performance


Habit memory

Array of stimulus detecting units linked
to an array of response units

Corresponds to URs or pretrained
responses



Associative memory

Representations of actions and outcomes

Performance of an action activates the
representation of that action

Contiguous activation of actions in habit and
associative memory allows growth of a
connection between the habit and associative
representations of the action

Making a response becomes associated with
the outcome of that response


Incentive system

Any event in associative memory that has motivational significance has associations with units in the incentive system

Innate? But learnable! (see next lecture)

Activation of 'reward' units exerts a general and indiscriminate excitation on units in the motor system

Similarly, activation of 'punishment' units inhibits all units in the motor system


Associative-cybernetic model

Important associations:

Habit response → associative action
(ability to detect and represent the animal's own behaviour)

Associative action → associative outcome
(ability to detect and represent the contingency between action and outcome)

Associative outcome → reward incentive system
(represents the desires of the animal)
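As a rough illustration of how these three links could combine, here is a toy sketch in Python. It is illustrative only: the function name, the multiplicative feedback rule, and all numbers are assumptions for the sake of the example, not Dickinson's formal model.

```python
# Toy sketch of the associative-cybernetic idea (illustrative assumptions only).

# Habit memory: stimulus -> response strengths (S-R links)
habit = {"lever_stimulus": {"press": 0.6}}

# Associative memory: which outcome each action is represented as producing
associative = {"press": "food"}

# Incentive system: current motivational value of each outcome
incentive = {"food": 1.0}   # set to 0.0 to mimic outcome devaluation


def response_tendency(stimulus, action):
    """Habit strength, modulated by feedback from the incentive value of
    the outcome that the action is associated with."""
    sr_strength = habit.get(stimulus, {}).get(action, 0.0)
    outcome = associative.get(action)
    reward_feedback = incentive.get(outcome, 0.0)   # 'reward' units excite the motor side
    return sr_strength * (1 + reward_feedback)


print(response_tendency("lever_stimulus", "press"))  # 1.2 while food is valued
incentive["food"] = 0.0                              # devalue the outcome
print(response_tendency("lever_stimulus", "press"))  # 0.6: habit alone remains
```

On this sketch, devaluing the outcome reduces responding via the incentive feedback, but some responding survives through the habit link, in line with the residual responding seen after devaluation.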


Representations of instrumental
actions

Habit response → associative action
(ability to detect and represent the animal's own behaviour)

Shettleworth (1975) compared
sensitivity of a variety of behaviours to
food conditioning in hamsters

Rearing could be conditioned

Face washing and scratching couldn't be conditioned

Morgan & Nicholas (1979)

Presented two levers in the operant chamber after the rat had either face washed or reared

The animal had to make one response if it had just reared, or the other if it had just washed

They could learn this

A second group was trained with scratching and face washing

They couldn't learn this

Scratching and face washing are poorly represented (there is no associative memory of scratching in the associative-cybernetic model)


Heyes

Observational learning

Seeing a conspecific carrying out an
action also activates associative
representation of the action

Observer rats pushed pole in same
direction as demonstrator rats

Instrumental learning

Associative action → associative outcome
(ability to detect and represent the contingency between action and outcome)

In instrumental conditioning, it is the causal relationship that determines behaviour.

BUT can instrumental behaviour be explained simply by a sensitivity to the temporal contiguity of events?

Contiguity

Animals are certainly sensitive to the
contiguity between action and outcome

BUT learning is still maintained even with a 30 s gap between action and outcome

Perhaps the apparent sensitivity to contiguity is due to a difficulty in discriminating a causal relationship, in which A → O, from a noncontingent schedule in which the outcome occurs frequently but independently of behaviour

Contingency

Hammond (1980)

Varied P(O/A) and P(O/¬A), the probability of the outcome given an action and given no action


Contingency

With P(O/¬A) = 0, there was higher pressing with higher P(O/A)

As P(O/¬A) increases towards P(O/A) (i.e. as the contingency decreases), responding decreases

So outcomes that follow no response don't act as delayed reinforcers (which would increase responding); instead, animals appear sensitive to the causal relationship
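The contingency Hammond manipulated is often summarized with the standard ΔP measure (a conventional definition, not spelled out in the lecture):

ΔP = P(O/A) − P(O/¬A)

ΔP > 0 means the action makes the outcome more likely, ΔP = 0 means the outcome is independent of behaviour (a noncontingent schedule), and ΔP < 0 means responding reduces or postpones the outcome.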

Alternatively

Perhaps the noncontingent reinforcers enhanced Pavlovian approach behaviours at the expense of the instrumental response?

Dickinson & Mulatero (1989)

(also see
Dickinson, Campos, Varga, &
Balleine 1996)

L1 → food; L2 → sucrose

Present noncontiguous food

Only L1 responding reduced

Rats are sensitive to the contingencies

However


We can still claim contiguity as a crucial element of conditioning, as we can explain Hammond's results very easily

Context → food associations will be stronger in the noncontiguous groups, which will block learning of the action → food association (recall blocking)

Signalling the noncontiguous outcomes increases the rate of instrumental lever pressing (the context is overshadowed by the signal; Dickinson & Charnock, 1985)

In summary

A simple contiguity-based learning process provides an account of the sensitivity of instrumental performance to variations in the causal effectiveness of an action


BUT effect of schedules


Ratio vs interval schedules

Ratio schedules: the more you press, the more you earn (e.g. FR15, VR20, RR20)

Interval schedules: the amount of reinforcement earned is largely independent of the rate of responding (e.g. FI15, VI25, RI30); see the sketch below
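The contrast can be made concrete with a small simulation. This is a hedged sketch of the schedule logic only: the ratio of 20, the 30 s mean interval, and the 5 vs 20 presses-per-minute response rates are arbitrary illustrations, not values from the studies discussed.

```python
import random

def earned_on_rr(n_responses, ratio=20):
    """Random-ratio: each response is reinforced with probability 1/ratio,
    so earnings scale with the number of responses."""
    return sum(random.random() < 1 / ratio for _ in range(n_responses))

def earned_on_ri(presses_per_min, minutes=60, mean_interval_s=30):
    """Random-interval: a reinforcer 'sets up' on average every
    mean_interval_s seconds and the next response collects it, so earnings
    are mostly capped by elapsed time rather than by response rate."""
    rewards, armed = 0, False
    p_arm = 1 / mean_interval_s        # chance per second a reinforcer sets up
    p_resp = presses_per_min / 60.0    # chance per second of a response
    for _ in range(minutes * 60):
        if not armed and random.random() < p_arm:
            armed = True
        if armed and random.random() < p_resp:
            rewards += 1
            armed = False
    return rewards

for rate in (5, 20):   # low vs high response rates (presses per minute)
    print(f"{rate:>2} presses/min: RR20 earns ~{earned_on_rr(rate * 60)}, "
          f"RI30s earns ~{earned_on_ri(rate)}")
```

Quadrupling the response rate roughly quadruples earnings on the ratio schedule but barely changes them on the interval schedule, which is the causal difference between the two schedule types that the lecture points to.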

Dawson & Dickinson 1990

Trained rats to chain pull on either an RR20 or an RI schedule

The inter-reinforcement interval (IRI) on the RI schedule was determined by a yoked animal on the RR20 schedule

The temporal distribution of reinforcers was therefore matched

P(O/A) was higher on the RI schedule

But the RR schedule produced more responding

Rats are sensitive to the causal relationship between performance and reward rate

Further evidence comes from Dickinson (1983), who found that ratio schedules are more sensitive to reward devaluation


Summary

Instrumental action mediated by two
systems

Habit

Associative-cybernetic system


Next time we'll discuss incentive learning and Pavlovian-instrumental interactions!