Life-Long Autonomous Learning


16 Oct 2013


Developmental Mechanisms for Life-Long Autonomous Learning in Robots

Pierre-Yves Oudeyer

Project-Team INRIA-ENSTA-ParisTech FLOWERS


http://www.pyoudeyer.com

http://flowers.inria.fr


Sensorimotor and social learning:

Autonomous

Open, « life-long learning »

Real world, physical and social

Experimental validation

Developmental robotics


Intrinsic motivation

Maturation

Imitation, social guidance

Fundamental understanding of the mechanisms of development

Application to assistive robotics


Engineered robot learning


Engineer shows, with fixed interaction protocol in the lab:

Target:

Regression algorithms (e.g. LGP, LWPR, Gaussian Mixture Regression)
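A minimal sketch of the regression route, assuming scalar states and actions: the engineer's demonstrations are (state, action) pairs, and the policy is a kernel-weighted average of nearby demonstrated actions. This plain kernel regression is a simpler stand-in for LGP, LWPR or Gaussian Mixture Regression; `Demo`, `predict_action` and the `bandwidth` parameter are illustrative names, not taken from the slides.

```python
import math
from typing import Sequence, Tuple

Demo = Tuple[float, float]  # (state, demonstrated action), scalar for brevity


def predict_action(demos: Sequence[Demo], state: float,
                   bandwidth: float = 0.5) -> float:
    """Kernel-weighted average of the demonstrated actions near `state`."""
    # Gaussian weight: demonstrations close to the query state count more.
    weights = [math.exp(-((s - state) ** 2) / (2 * bandwidth ** 2))
               for s, _ in demos]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query state is too far from every demonstration")
    return sum(w * a for w, (_, a) in zip(weights, demos)) / total
```

With demonstrations `[(0.0, 1.0), (1.0, 3.0)]`, a query at state 0.5 is equidistant from both and returns the average of the two demonstrated actions.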

Action

State/context

Action policy


Engineer provides a reward/fitness function:

Target:

Optimization algorithms (e.g. NAC, non-linear Nelder-Mead, …)

OR

« Real » world



Developmental approach

Which generic reward function for spontaneous curiosity-driven learning?

Axis 2

?

Behaviour of human (non-engineer)


?

Axis 1

Learning from interactions with non-engineers

Non-engineer human behaviour


?

1. Intuitive multimodal interfaces

Synthesis and recognition of emotion in speech (IJHCS, 2001, 5 patents)

Clicker-training (RAS, 2002; 1 patent)

Physical human-robot interfaces (Humanoids 2011)

2. User studies (Humanoids 2009, HRI 2011)

3. Adaptation: learning flexible teaching interfaces (Conn. Sci., 2006, ICDL 2011, IROS 2010)



Spontaneous active exploration, artificial curiosity

in the vicinity of

Non-stationary function, difficult to model

Algorithms for empirical evaluation of de/dt with statistical regression




IAC (2004, 2007), R-IAC (2009), SAGG-RIAC (2010), McSAGG-RIAC (2011), SGIM (2011)
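The empirical evaluation of de/dt can be sketched as a least-squares line fit over a recent window of prediction errors, with learning progress taken as the negative slope. This is only an illustration under stated assumptions: the names `error_slope` and `learning_progress` and the default window of 20 samples are hypothetical, and the estimators actually used in IAC or R-IAC may differ.

```python
from typing import Sequence


def error_slope(errors: Sequence[float]) -> float:
    """Least-squares slope of the errors against their index (estimate of de/dt)."""
    n = len(errors)
    if n < 2:
        return 0.0  # not enough samples to estimate a derivative
    mean_t = (n - 1) / 2.0
    mean_e = sum(errors) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in enumerate(errors))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


def learning_progress(errors: Sequence[float], window: int = 20) -> float:
    """Learning progress: decrease of the error over the last `window` samples."""
    recent = list(errors)[-window:]
    return -error_slope(recent)
```

A steadily decreasing error sequence gives positive progress, while a flat one gives zero, whatever its absolute error level.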

No!

Intrinsic Motivation

Berlyne (1960), Csikszentmihalyi (1996)

Dayan and Balleine (2002)

Which generic reward function?

Exploring and learning generalized forward and inverse models

Parameterized by

Parameterized by

[Figure labels, two panels: simple, complex, complex]

Explore zones where:

Uncertainty/errors maximal

Least explored

Assume:

Spatial or temporal stationarity

Everything is learnable within lifetime


Which experiment?

Developmental approach

Explore zones where empirical learning progress is maximal
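The contrast between exploring where errors are maximal and exploring where learning progress is maximal can be sketched as follows. The setup is assumed for illustration: each zone is represented by the list of prediction errors observed there, progress is the drop in mean error between two consecutive half-windows, and the epsilon-greedy choice is a hypothetical selection rule. `progress` and `choose_region` are not names from the cited papers.

```python
import random
from typing import Dict, List, Optional


def progress(errors: List[float], half: int = 5) -> float:
    """Mean error of the older half-window minus mean error of the newer one."""
    if len(errors) < 2 * half:
        return float("inf")  # under-explored zones are sampled first
    older = errors[-2 * half:-half]
    newer = errors[-half:]
    return sum(older) / half - sum(newer) / half


def choose_region(history: Dict[str, List[float]], epsilon: float = 0.1,
                  rng: Optional[random.Random] = None) -> str:
    """Pick the zone with maximal learning progress (epsilon-greedy)."""
    rng = rng or random.Random()
    names = sorted(history)  # fixed order makes tie-breaking deterministic
    if rng.random() < epsilon:
        return rng.choice(names)  # occasional random exploration
    return max(names, key=lambda name: progress(history[name]))
```

In a toy history with a trivial zone (constant low error), a learnable zone (decreasing error) and a noisy zone (high error that never decreases), a maximal-error rule keeps returning to the noisy zone, whereas `choose_region` with `epsilon=0.0` selects the learnable one.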

Active learning of models

Sensory state

Action state

Context state

Classic machine learner M (e.g. neural net, SVM, Gaussian process)

Meta machine learner metaM

Progressive categorization

Local model of learning progress

. . .

Sensory state at t+1

Prediction

Error feedback

Action selection system

Intrinsic reward

Local model of learning progress

IAC, IEEE Trans. EC (2007)

R-IAC, IEEE Trans. AMD (2009)

The Playground Experiments

(IEEE Trans. EC 2007; Connection Science 2006; AAAI Work. Dev. Learn. 2005)

Experiments on Open Learning in the Real World

Playground Experiments



Autonomous learning of novel affordances and skills, e.g. object manipulation


IEEE TEC, 2007; IROS 2010; IEEE TAMD, 2009; Front. Neurorobotics, 2007; Connect. Sc., 2006; IEEE ICDL 2010, 2011


[Figure labels: simple, complex, complex]


Self-organization of developmental trajectories, bootstrapping of communication

New hypotheses for understanding infant development

Front. Neuroscience 2007, Infant and Child Dev. 2008, Connect. Science 2006

Active learning of inverse models

SAGG-RIAC (RAS, 2012)

(Context, Movement) → Effect

Redundancy of sensorimotor spaces

From the active choice of action, followed by observation of effect …

… to the active choice of effect, followed by the search for a corresponding action policy through goal-directed optimization (e.g. using NAC, POWER, PI^2-CMA, …)
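The shift from choosing actions to choosing effects can be sketched on a toy problem. Everything here is an assumption made for illustration: the environment is a simulated 2-link planar arm, goals are drawn uniformly rather than by learning progress, and plain stochastic hill climbing stands in for NAC, POWER or PI^2-CMA. The names `forward`, `reach_goal` and `goal_babbling` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Action = Tuple[float, float]  # joint angles of a toy 2-link arm (assumed)
Effect = Tuple[float, float]  # resulting hand position (x, y)


def forward(action: Action) -> Effect:
    """Toy environment: hand position of a planar arm with two unit links."""
    a, b = action
    return (math.cos(a) + math.cos(a + b), math.sin(a) + math.sin(a + b))


def distance(p: Effect, q: Effect) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])


def reach_goal(goal: Effect, start: Action, rng: random.Random,
               steps: int = 200, sigma: float = 0.2) -> Tuple[Action, float]:
    """Goal-directed optimization by stochastic hill climbing (a stand-in)."""
    best, best_cost = start, distance(forward(start), goal)
    for _ in range(steps):
        cand = (best[0] + rng.gauss(0, sigma), best[1] + rng.gauss(0, sigma))
        cost = distance(forward(cand), goal)
        if cost < best_cost:  # keep only candidates that get closer to the goal
            best, best_cost = cand, cost
    return best, best_cost


def goal_babbling(n_goals: int, seed: int = 0) -> List[Tuple[Effect, Action, float]]:
    """Choose an effect first, then search for an action that produces it."""
    rng = random.Random(seed)
    memory: List[Tuple[Effect, Action, float]] = []
    action: Action = (0.5, 0.5)
    for _ in range(n_goals):
        # Self-generated goal; goals farther than 2 from the origin are unreachable.
        goal = (rng.uniform(-2, 2), rng.uniform(-2, 2))
        action, cost = reach_goal(goal, action, rng)  # reuse last action as start
        memory.append((goal, action, cost))
    return memory
```

Because several joint configurations reach the same hand position, searching in effect space sidesteps part of the redundancy of the action space. The unreachable goals that uniform sampling produces illustrate why a learner would want to select goals by the progress it measures rather than at random.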




self-defined RL problem

Spontaneous active exploration of a space of fitness functions parameterized by
where one iteratively chooses the which maximizes the empirical evaluation of:





Learning omnidirectional locomotion


Performance higher than more classical active learning algorithms in real sensorimotor spaces (non-stationary, non-homogeneous)

(IEEE TAMD 2009; ICDL 2010, 2011; IROS 2010; RAS 2012)

Experimental evaluation of active learning efficiency

Control Space:

Task Space:

Maturational constraints


Progressive growth of the number of DOFs and of spatio-temporal resolution

Adaptive maturational schedule controlled by active learning / learning progress

(Bjorklund, 1997; Turkewitz and Kenny, 1985)

McSAGG-RIAC

Maturationally constrained curiosity-driven learning

(IEEE ICDL-Epirob 2011a)
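One way to picture a maturational schedule controlled by learning progress is a clock that advances only while the learner is progressing and that gates how many degrees of freedom and how much resolution are available. The sketch below is an assumption-laden illustration, not the McSAGG-RIAC update rule: the class `MaturationalClock`, its `rate` gain, the total of 6 DOFs and the linear resolution schedule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MaturationalClock:
    """Hypothetical maturational clock psi in [0, 1] gating robot capabilities."""
    psi: float = 0.0
    rate: float = 0.5   # assumed gain from learning progress to clock advance
    max_dof: int = 6    # assumed total number of degrees of freedom

    def update(self, learning_progress: float) -> None:
        # Assumption: the clock advances only while learning progress is positive.
        self.psi = min(1.0, self.psi + self.rate * max(0.0, learning_progress))

    def active_dof(self) -> int:
        # Release degrees of freedom progressively, always keeping at least one.
        return max(1, round(self.psi * self.max_dof))

    def resolution(self, coarse: float = 0.5, fine: float = 0.01) -> float:
        # Interpolate sensorimotor resolution from coarse to fine as psi grows.
        return coarse + self.psi * (fine - coarse)
```

Starting from `psi = 0`, the robot controls a single DOF at coarse resolution; each `update` with positive progress releases more DOFs and refines the resolution, while zero or negative progress leaves the schedule unchanged.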

SGIM: Socially Guided Intrinsic Motivation

(ICDL-Epirob, 2011b)


« Life-long » Experimentation

Acroban

(Siggraph 2010, IROS 2011, World Expo, South Korea, 2012)


Experimentation of algorithms for « life-long » learning in the real world

Technological experimental platforms: robust, reconfigurable, precise, easily repaired, cheap


Ergo-Robots

(Exhibition « Mathematics, a beautiful elsewhere », Fond. Cartier, 2011-2012)





Ergo-Robots

Mid-term: open-source distribution of the platform to the scientific community





Baranes, A., Oudeyer, P-Y. (2012) Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots, Robotics and Autonomous Systems. http://www.pyoudeyer.com/RAS-SAGG-RIAC-2012.pdf


Baranes, A., Oudeyer, P-Y. (2011a) The Interaction of Maturational Constraints and Intrinsic Motivation in Active Motor Development, in Proceedings of IEEE ICDL-Epirob 2011. http://flowers.inria.fr/BaranesOudeyerICDL11.pdf


Lopes, M., Melo, F., Montesano, L. (2009) Active Learning for Reward Estimation in Inverse Reinforcement Learning, European Conference on Machine Learning (ECML/PKDD), Bled, Slovenia, 2009. http://flowers.inria.fr/mlopes/myrefs/09-ecml-airl.pdf


Nguyen, M., Baranes, A., Oudeyer, P-Y. (2011b) Bootstrapping Intrinsically Motivated Learning with Human Demonstrations, in Proceedings of IEEE ICDL-Epirob 2011. http://flowers.inria.fr/NguyenBaranesOudeyerICDL11.pdf


Oudeyer, P-Y., Kaplan, F., Hafner, V. (2007) Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, 11(2), pp. 265-286. http://www.pyoudeyer.com/ims.pdf


Baranes, A., Oudeyer, P-Y. (2009) R-IAC: Robust intrinsically motivated exploration and active learning, IEEE Transactions on Autonomous Mental Development, 1(3), pp. 155-169.


Ly, O., Lapeyre, M., Oudeyer, P-Y. (2011) Bio-inspired vertebral column, compliance and semi-passive dynamics in a lightweight robot, in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), San Francisco, US.


Lopes, M., Lang, T., Toussaint, M., Oudeyer, P-Y. (2012) Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress, Neural Information Processing Systems (NIPS 2012), Tahoe, USA. http://flowers.inria.fr/mlopes/myrefs/12-nips-zeta.pdf


Lopes, M., Oudeyer, P-Y. (2012) The Strategic Student Approach for Life-Long Exploration and Learning, in Proceedings of IEEE ICDL-Epirob 2012. http://flowers.inria.fr/mlopes/myrefs/12-ssp.pdf