
Machine Learning and Motion Planning

Dave Millman

October 17, 2007

Machine Learning intro

Machine Learning (ML)

"The study of algorithms which improve automatically through experience." - Mitchell


General description


Data driven


Extract some information from data


Mathematically based


Probability, Statistics, Information theory,
Computational learning theory, optimization

A very small set of uses of ML


Text


Document labeling, Part of speech tagging,
Summarization


Vision


Object recognition, handwriting recognition, emotion labeling, surveillance


Sound


Speech recognition, music genre classification


Finance


Algorithmic trading


Medical, Biological, Chemical, on and on and on…

A few types of ML


Supervised


Given: labeled data


Usual goal: learn function


Ex: SVM, Neural Networks, Boosting etc.


Unsupervised


Given: unlabeled data


Usual goal: cluster data, learn conditional
probabilities


Ex: Nearest Neighbors, Decision trees
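
A minimal supervised-learning sketch in Python (the toy data and the choice of scikit-learn's SVC are my own illustration, not from the slides): labeled points go in, a classifier (the learned function) comes out.

# Minimal supervised-learning sketch (illustrative only): fit an SVM
# on toy labeled data and predict the label of an unseen point.
from sklearn.svm import SVC

# Labeled data: each 2-D point comes with a class label (0 or 1).
X = [[0.0, 0.0], [0.2, 0.1], [0.9, 1.0], [1.1, 0.8]]
y = [0, 0, 1, 1]

clf = SVC(kernel="linear")   # the "learned function" is the classifier
clf.fit(X, y)                # learn from the labeled examples

print(clf.predict([[1.0, 0.9]]))  # -> [1]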

A few types of ML (cont.)


Semi-Supervised

Given: labeled and unlabeled data

Usual goal: use unlabeled data to increase the labeled data

Ex: Cluster, then label the unlabeled data from the clusters (sketch after this slide)


Reinforcement

Given: Reward function and set of actions

Goal: Learn a function which optimizes the reward function

Ex: Q-learning, Ant-Q
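
A sketch of the cluster-then-label recipe from the Semi-Supervised bullet above (the toy data, k-means, and the majority-label mapping are assumptions for illustration, not from the slides):

# Cluster all points (labeled + unlabeled), then give each unlabeled
# point the majority label of the labeled points sharing its cluster.
import numpy as np
from sklearn.cluster import KMeans

X_labeled = np.array([[0.0, 0.1], [0.1, 0.0], [1.0, 1.1], [1.1, 0.9]])
y_labeled = np.array([0, 0, 1, 1])
X_unlabeled = np.array([[0.05, 0.05], [0.95, 1.0], [1.2, 1.1]])

X_all = np.vstack([X_labeled, X_unlabeled])
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_all)

# Majority label of the labeled points in each cluster.
cluster_to_label = {}
for c in np.unique(clusters[: len(X_labeled)]):
    members = y_labeled[clusters[: len(X_labeled)] == c]
    cluster_to_label[c] = np.bincount(members).argmax()

y_unlabeled = [cluster_to_label.get(c, -1) for c in clusters[len(X_labeled):]]
print(y_unlabeled)  # expected: [0, 1, 1]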

General Idea Supervised

General Idea Unsupervised

General Idea Semi-Supervised

General Idea Reinforcement


Markov Decision Process (MDP)

State space (fully or partially observable)

Action space (static or time dependent)

Transition function produces the next state (based on the present state, not the past)

Reward function (based on the action)
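
A toy MDP written out in Python, just to make the pieces above concrete (the states, actions, transition probabilities, and rewards are invented numbers, not from the slides):

# Minimal MDP sketch: states, actions, a Markov transition function
# P(s' | s, a), and a reward that depends on the action taken.
import random

states = ["near", "far"]
actions = ["approach", "retreat"]

# transition[s][a] = list of (next_state, probability)
transition = {
    "near": {"approach": [("near", 0.9), ("far", 0.1)],
             "retreat":  [("far", 0.8), ("near", 0.2)]},
    "far":  {"approach": [("near", 0.7), ("far", 0.3)],
             "retreat":  [("far", 1.0)]},
}
reward = {"approach": 1.0, "retreat": -1.0}

def step(s, a):
    """Sample s' from P(. | s, a); it depends only on the present state s."""
    nxt, probs = zip(*transition[s][a])
    return random.choices(nxt, weights=probs, k=1)[0], reward[a]

print(step("far", "approach"))   # e.g. ('near', 1.0)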

Why regression is not enough… The XOR problem
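
A small numeric illustration of the point (mine, not from the slides): ordinary linear least squares cannot represent XOR, while adding one nonlinear feature makes the fit exact.

# XOR: no linear function of (x1, x2) fits it, but adding the
# nonlinear feature x1*x2 makes it exactly solvable.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])  # XOR targets

# Linear least squares on [1, x1, x2]: the best it can do is predict 0.5.
A_lin = np.hstack([np.ones((4, 1)), X])
w_lin, *_ = np.linalg.lstsq(A_lin, y, rcond=None)
print(A_lin @ w_lin)          # roughly [0.5, 0.5, 0.5, 0.5]

# Add the product feature x1*x2: now an exact fit exists.
A_nl = np.hstack([A_lin, (X[:, 0] * X[:, 1]).reshape(-1, 1)])
w_nl, *_ = np.linalg.lstsq(A_nl, y, rcond=None)
print(np.round(A_nl @ w_nl))  # approximately [0, 1, 1, 0]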

Textbook Q-Learning [MI06]

Learning flocking behavior

N agents

Discrete time steps

Agent i, partner j

Define Q-state Q(s_t, a_t)

s_t - state

a_t - action
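
The slide defines the Q-state but leaves the update implicit; the standard tabular Q-learning update (from the general literature, with learning rate α and discount γ, not a formula shown on the slide) is

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]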

Our textbook example

State of i

[R] = floor(|i - j|)

Actions for i

a1 - Attract to j

a2 - Parallel positive orientation to j

a3 - Parallel negative orientation to j

a4 - Repulsion from j

Reward Function - no predator

Distances R1, R2, R3 s.t. R1 < R2 < R3

s_t               a_t               r
0 < [R] ≤ R1      a4                 1
                  a1, a2, a3        -1
R1 < [R] ≤ R2     a2                 1
                  a1, a3, a4        -1
R2 < [R] ≤ R3     a1                 1
                  a2, a3, a4        -1
R3 < [R]          a1, a2, a3, a4     0

Reward Function - predator

Distances R1, R2, R3 s.t. R1 < R2 < R3

s_t               a_t               r
0 < [R] ≤ R3      a4                 1
                  a1, a2, a3        -1
R3 < [R]          a1, a2, a3, a4     0
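
A minimal tabular Q-learning sketch for the no-predator reward table above; the learning rate, discount, exploration rate, and the crude way the next distance bin is simulated are my assumptions, not values from [MI06].

# Tabular Q-learning for one agent i against partner j, using the
# no-predator reward table. State = distance bin of [R];
# actions a1..a4 = attract, +parallel, -parallel, repulse.
import random

STATES = [0, 1, 2, 3]    # bins: (0,R1], (R1,R2], (R2,R3], (R3,inf)
ACTIONS = [1, 2, 3, 4]

def reward(state, action):
    rewarded = {0: 4, 1: 2, 2: 1}        # the action the table rewards per bin
    if state == 3:
        return 0                          # beyond R3: reward 0 for everything
    return 1 if action == rewarded[state] else -1

def step(state, action):
    # Crude stand-in for the flocking simulation: attract shrinks the
    # distance bin, repulse grows it, the parallel actions keep it,
    # plus a little noise.
    drift = {1: -1, 2: 0, 3: 0, 4: 1}[action]
    noise = random.choice([-1, 0, 0, 1])
    return min(max(state + drift + noise, 0), 3)

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.9, 0.1

s = random.choice(STATES)
for _ in range(20000):
    a = random.choice(ACTIONS) if random.random() < eps else \
        max(ACTIONS, key=lambda act: Q[(s, act)])
    r, s_next = reward(s, a), step(s, a)
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, act)] for act in ACTIONS)
                          - Q[(s, a)])
    s = s_next

# Greedy action per distance bin; bins 0..2 should recover a4, a2, a1.
print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in STATES})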

Don’t repeat work!!

Basic planners work from scratch

Ex: path planning for parking, no difference between the first time and the hundredth time

Ideally, learn some general higher-level “strategies” that can be reused

General solution patterns in the problem space

Viability Filtering [KP07]

Agent can “see”: perceptual information

Range-finder-like virtual sensors

Database of successful, perceptually-parameterized motions

From its own experimentation or an external source

Database exploited for future queries

Search is based on what has previously been successful in similar situations.

Sensors in Viability Filtering - some defs

X - set of agent states

E - set of environment states

Define x+ ∈ X+, the combined agent-environment state: X+ = {x+ = (x, e) | x ∈ X, e ∈ E}

Sensor function σ(x+): X+ → R

At a specific state x+ ∈ X+, define the sensor state s = (σ1(x+), …, σn(x+))

And the sensor space: s ∈ S, where S is the set of all sensor state values

Finally

Define the locally situated state of the agent: x̃ = (s, x′) ∈ X̃

where x′ is some agent state information independent of the sensors.

Now we want to collect data to train a function v(x̃): X̃ → {viable, nonviable}

Note: errors in v(x̃) could cause problems

Check Viability not Collision

Function IS_NONVIABLE(x+)
    if is_collision(x+) then
        return True
    s := (σ1(x+), …, σn(x+))
    x′ := extract_internal_state(x+)
    x̃ := (s, x′)
    return ¬v(x̃)
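
A runnable Python rendering of the check above. Here is_collision, the sensor functions, extract_internal_state, and the trained classifier v are placeholders supplied by the planner and the learned model, so this is only a sketch under those assumptions.

# Sketch of the viability check: reject a combined state x+ either on
# collision or when the learned classifier says its locally situated
# state looks nonviable.
from typing import Callable, Sequence, Tuple

SensorFn = Callable[[object], float]

def is_nonviable(x_plus,
                 is_collision: Callable[[object], bool],
                 sensors: Sequence[SensorFn],
                 extract_internal_state: Callable[[object], tuple],
                 v: Callable[[Tuple[tuple, tuple]], bool]) -> bool:
    """Return True if x+ collides or the learned model deems it nonviable."""
    if is_collision(x_plus):
        return True
    s = tuple(sigma(x_plus) for sigma in sensors)   # sensor state
    x_prime = extract_internal_state(x_plus)        # sensor-independent state
    x_tilde = (s, x_prime)                          # locally situated state
    return not v(x_tilde)                           # ¬v(x̃)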

Results and Further work


Bootstrapping


Use of history to create macroscopic plans


Model transfer

Training a Dog [B02]

MIT Media Lab: a system where the user interactively trains the dog using “click training”

Uses acoustic patterns as cues for actions

Can be taught cues on different acoustic patterns

Can create new actions from state space search

Simplified Q-learning based on animal training techniques

Training a Dog (cont.)

Predictable regularities

Animals will tend toward the successful state

Small time window

Maximize use of supervisor feedback

Limit the state space by only looking at states that matter; e.g., if utterance u followed by action a produces a reward, then utterance u is important.

Easy to train

Credit accumulation

Allow a state-action pair to delegate credit to another state-action pair.

Alternatives to Q-Learning

Q-decomp [RZ03]

Complex agent as a set of simpler subagents

Subagent has its own reward function

Arbitrator decides the best actions based on “advice” from the subagents

A simple world with initial state S_0 and three terminal states S_L, S_U, S_R, each with an associated reward of dollars and/or euros. The discount factor is γ ∈ (0,1). [fig. from RZ03]
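
A sketch of the arbitration step: each subagent reports Q-values for its own reward stream (its “advice”) and the arbitrator acts greedily on their sum. The dollars/euros subagents mirror the figure, but the numbers are invented.

# Q-decomposition arbitration sketch: the arbitrator adds up the
# Q-values reported by each subagent and acts greedily on the sum.
ACTIONS = ["left", "up", "right"]

# Each subagent has learned Q-values for its OWN reward stream
# (illustrative numbers only).
q_dollars = {"left": 1.0, "up": 0.4, "right": 0.0}
q_euros   = {"left": 0.0, "up": 0.5, "right": 0.9}

def arbitrate(action_values_per_subagent):
    """Pick argmax_a sum_i Q_i(s, a) over the subagents' advice."""
    totals = {a: sum(q[a] for q in action_values_per_subagent) for a in ACTIONS}
    return max(totals, key=totals.get), totals

best, totals = arbitrate([q_dollars, q_euros])
print(best, totals)   # -> left {'left': 1.0, 'up': 0.9, 'right': 0.9}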

Learning Behavior with Q-Decomp [CT06]

Q-Decomp as the learning technique

Reward function - Inverse Reinforcement Learning (IRL) [NR00]

Mimicking behavior from an “expert”
Support Vector Path Planning

Idea that uses the SVM algorithm to generate a smooth path.

Not really machine learning, but a neat application of an ML algorithm

Here is the idea

Support Vector Path Planning
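
A rough sketch of the idea as I read [M06] (the obstacle points, kernel, and parameters are made up): label the obstacles on either side of the corridor as two classes, fit a kernel SVM, and read the smooth, maximum-clearance path off the zero level of the decision function.

# Support-vector path sketch: obstacles on one side are class +1, on
# the other side class -1; the SVM decision boundary threads the gap
# with maximum clearance and the RBF kernel keeps it smooth.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
upper = np.column_stack([np.linspace(0, 10, 30), 2.0 + rng.uniform(0, 1, 30)])
lower = np.column_stack([np.linspace(0, 10, 30), -2.0 - rng.uniform(0, 1, 30)])
X = np.vstack([upper, lower])
y = np.hstack([np.ones(30), -np.ones(30)])

clf = SVC(kernel="rbf", C=10.0, gamma=0.5).fit(X, y)

# Walk along x and, at each column, keep the y where the decision
# function is closest to zero: those points trace the "path".
ys = np.linspace(-3.0, 3.0, 200)
path = []
for x0 in np.linspace(0.0, 10.0, 100):
    column = np.column_stack([np.full_like(ys, x0), ys])
    d = clf.decision_function(column)
    path.append((x0, ys[np.argmin(np.abs(d))]))

print(path[:3])   # the path should hug the middle of the corridor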

Videos


Robot learning to pick up objects


http://www.cs.ou.edu/~fagg/movies/index.html#torso_2004


Training a Dog


http://characters.media.mit.edu/projects/dobie.html

References

[NR00] A. Y. Ng and S. Russell. Algorithms for inverse reinforcement learning. In Proc. 17th International Conf. on Machine Learning, pages 663-670. Morgan Kaufmann, San Francisco, CA, 2000.

[B02] B. Blumberg et al. Integrated learning for interactive synthetic characters. In SIGGRAPH '02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 417-426, New York, NY, USA, 2002. ACM Press.

[RZ03] S. J. Russell and A. Zimdars. Q-decomposition for reinforcement learning agents. In ICML, pages 656-663, 2003.

[MI06] K. Morihiro, T. Isokawa, H. Nishimura, and N. Matsui. Emergence of flocking behavior based on reinforcement learning. In Knowledge-Based Intelligent Information and Engineering Systems, pages 699-706, 2006.

[CT06] T. Conde and D. Thalmann. Learnable behavioural model for autonomous virtual agents: low-level learning. In AAMAS '06: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, pages 89-96, New York, NY, USA, 2006. ACM Press.

[M06] J. Miura. Support vector path planning. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 2894-2899, 2006.

[KP07] M. Kalisiak and M. van de Panne. Faster motion planning using learned local viability models. In ICRA, pages 2700-2705, 2007.

Machine Learning Ref

[M07] Mehryar Mohri. Foundations of Machine Learning course notes. http://www.cs.nyu.edu/~mohri/ml07.html

[M97] Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.

[RN05] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence, Englewood Cliffs, New Jersey, 1995.

[CV95] Corinna Cortes and Vladimir Vapnik. Support-Vector Networks. Machine Learning, 20, 1995.

[V98] Vladimir N. Vapnik. Statistical Learning Theory. Wiley, 1998.

[KV94] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.