Vinay Papudesi and Manfred Huber


INTRODUCTION


Staged skill learning involves:


To Begin:


“Skills” are innate reflexes and a raw representation of the world.


The Process:


Abstract away details of learnt skills


Use these abstractions as part of a higher-level representation:


Behavioural results


Affordances


Rinse and repeat

THE DEVELOPMENTAL LEARNER


The state representation encodes only those aspects of the environmental state that have behavioural and reward implications in the context of the agent's current capabilities.


A compact representation


Becomes more and more abstract over time



But how to model this?...

STATE-SPACES


Three yummy flavours:


External (World) State Space

(…maps to…)


Internal State Space


(…composed of…)


Action State Spaces



Internal and External spaces are good friends:


Sᵢ ← I(Sₑ)

Where:

Internal state = Sᵢ
External state = Sₑ
Mapping function = I

Objective:
Don't hard-code the mapping function, automate it!



The Internal State Space is a vector of Action Spaces, one for each action the agent provides…
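A minimal sketch of that composition (all names here are illustrative, not from the paper): the mapping I just bundles each action's own abstraction of the world state into one vector.

```python
# Sketch: composing the internal state S_i from per-action abstractions
# of the external state S_e. Names are illustrative, not the paper's.
from typing import Callable, Dict, Tuple

ExternalState = dict                      # S_e: raw world state (assumed form)
InternalState = Tuple[int, ...]           # S_i: one component per action

# Each action contributes its own abstraction of the world state.
ActionAbstraction = Callable[[ExternalState], int]

def make_mapping(abstractions: Dict[str, ActionAbstraction]):
    """Build I : S_e -> S_i, one component per action the agent provides."""
    names = sorted(abstractions)          # fix a stable component order
    def I(s_e: ExternalState) -> InternalState:
        return tuple(abstractions[name](s_e) for name in names)
    return I
```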

ACTION SPACE


An action space is defined as a vector
of paired

(indicator, predicator
)

conditions.


Conditions are task-agnostic


Can be reused for learning different tasks


Improvement over previous work


When an action is performed:


Signals a transition between internal states, S₁ → S₂.


Observes an outcome from the world, oʹ.


Two conditions are constructed:


Indicator: C_ind(S₂) = oʹ

Predicator: C_pre(S₁) = oʹ
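A rough sketch of how those two conditions might be recorded per action (the paper gives no code; the class and method names are illustrative):

```python
# Sketch: recording (indicator, predicator) conditions after one execution.
from dataclasses import dataclass, field
from typing import Dict, Tuple

State = Tuple[int, ...]                   # internal states S1, S2

@dataclass
class ActionSpace:
    # Task-agnostic conditions, keyed by internal state.
    indicator: Dict[State, str] = field(default_factory=dict)   # C_ind(S2) = o'
    predicator: Dict[State, str] = field(default_factory=dict)  # C_pre(S1) = o'

    def on_action_performed(self, s1: State, s2: State, outcome: str) -> None:
        """Record both conditions for the transition S1 -> S2 with outcome o'."""
        self.indicator[s2] = outcome      # arriving in S2 indicates o'
        self.predicator[s1] = outcome     # being in S1 predicts o'
```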

OUTCOMES, GENETIC ALGORITHMS, NON-DETERMINISM, OH MY!


World state space is potentially vast


Must measure outcome somehow


Genetic Algorithms (GAs) are used to train hierarchical, rule-based classifiers


What if an outcome cannot be accurately
measured?


Classifiers simply flag the world state as non-deterministic.


The outcome is thus a triple:


(success%, failure%, undetermined)
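One way to compute that triple from recorded trials (a sketch only: the GA-trained classifiers themselves are out of scope here, and representing an unmeasurable outcome as None is my assumption):

```python
# Sketch: summarising an outcome as (success%, failure%, undetermined).
from collections import Counter
from typing import Optional, Sequence, Tuple

def outcome_triple(trials: Sequence[Optional[bool]]) -> Tuple[float, float, float]:
    """Fractions of success, failure, and undetermined over recorded trials.

    None marks a trial whose outcome could not be accurately measured,
    i.e. the classifier flagged the world state as non-deterministic.
    """
    if not trials:
        return (0.0, 0.0, 1.0)            # no evidence at all: undetermined
    n = len(trials)
    counts = Counter(trials)
    return (counts[True] / n, counts[False] / n, counts[None] / n)
```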

‘FIND’ ACTION

“Rotate 360° or until an object is visible”
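A toy sketch of this innate action, assuming rotate and object_visible stand in for the robot's motor and sensing primitives (the step size is arbitrary):

```python
# Sketch: the 'FIND' innate action as described above.
def find(rotate, object_visible, step_deg: float = 30.0) -> bool:
    """Rotate up to 360 degrees, stopping early if an object becomes visible."""
    swept = 0.0
    while swept < 360.0:
        if object_visible():              # stop as soon as something is seen
            return True
        rotate(step_deg)                  # motor primitive: turn in place
        swept += step_deg
    return object_visible()               # final check after the full sweep
```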

TASKS


With the abstract state space constructed, the
agent can now learn optimal policies for
completing tasks.


Treat the problem as a Markov Decision Process
(MDP).


From some internal state the agent must select an
appropriate action to progress toward completing the
task optimally.


Reinforcement learning is used to compute such
policies:


Select the policy which maximises the expected future
return.


Future reward is estimated from prior experience.
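For concreteness, a standard tabular Q-learning sketch over the abstract internal state space (the env interface, parameter values, and the choice of Q-learning as the RL method are assumptions, not details given on these slides):

```python
# Sketch: tabular Q-learning over the abstract internal state space.
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Learn action values that estimate the expected future return."""
    Q = defaultdict(float)                          # Q[(state, action)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy selection: mostly exploit, sometimes explore
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s2, reward, done = env.step(a)          # assumed env interface
            best_next = max(Q[(s2, act)] for act in actions)
            # one-step TD update toward the estimated future return
            Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```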

THE TASK MODEL


Must acquire a Task Model


Agent interacts with environment, recording
experiences as it does so.


The internal source and destination states get
updated with new conditions.


The reward function is re-computed as the average reinforcement value over all the recorded experiences pertaining to the chosen action.


Will eventually converge on the true model
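A sketch of that running-average update (a hypothetical class; the full task model also tracks the source/destination conditions, omitted here):

```python
# Sketch: reward as the average reinforcement over recorded experiences.
from collections import defaultdict

class TaskModel:
    """Running-average reward model built from recorded experiences."""

    def __init__(self):
        self.total = defaultdict(float)   # summed reinforcement per (s, a)
        self.count = defaultdict(int)     # experiences recorded per (s, a)

    def record(self, s, a, reinforcement: float) -> None:
        self.total[(s, a)] += reinforcement
        self.count[(s, a)] += 1

    def reward(self, s, a) -> float:
        """Average reinforcement over all experiences for this action."""
        n = self.count[(s, a)]
        return self.total[(s, a)] / n if n else 0.0
```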

TASK-SPECIFIC CONDITIONS


Not all tasks can be optimally represented with
this approach.


Actions are individually encapsulated; the knowledge contained within them is not shared among them.


E.g. ‘GOTO’ and ‘PICK’


Solution is to build ‘bipartition’ states


Allow the GOTO task a condition on whether the item can be PICKed.


… but only if the reward for doing so is significant and
the condition is statistically stable (low variance) and
deterministic.
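A sketch of that adoption gate (the thresholds and the exact statistical test are illustrative assumptions; the slides only name the three criteria):

```python
# Sketch: gate for adopting a task-specific ('bipartition') condition,
# e.g. letting GOTO condition on whether the item can be PICKed.
import statistics

def adopt_condition(rewards_with, rewards_without,
                    min_gain=0.5, max_variance=0.05, undetermined=0.0):
    """Adopt the condition only if it pays off and is reliable."""
    gain = statistics.mean(rewards_with) - statistics.mean(rewards_without)
    stable = statistics.pvariance(rewards_with) <= max_variance  # low variance
    deterministic = undetermined == 0.0    # no undetermined outcome mass
    return gain >= min_gain and stable and deterministic
```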

RESULTS

FORAGING


Left: a hard-coded, expert-designed state space and policy.

Right: the dynamically acquired equivalent.

RESULTS


STATE SPACE SIZE


As the agent interacts with the environment, the proposed algorithm maintains a near-constant state-space complexity.


The representation is continually abstracted.

RESULTS


POLICY PERFORMANCE


The presented technique is comparable to manually designed behaviour.


Domain-specific models are slow to converge.


Their state spaces are more complex = harder to learn.

CONCLUSIONARY SENTIMENTS


The paper describes an approach that constructs
an abstract internal state space that is grounded
in the set of actions that the agent provides.
Reinforcement learning aids in selecting actions to
complete tasks.


By applying an inherently epigenetic design they have devised a developmental learner that produces results that are comparable to hand-rolled solutions.


Task learning is performed in a bottom-up fashion (actions to tasks), but the representation of new tasks thereafter can be constructed from the top down using previously acquired state abstractions.