# Machine Learning and Motion Planning

Dave Millman

October 17, 2007

Machine Learning intro

Machine Learning (ML)

The study of algorithms which improve automatically through experience. - Mitchell

General description

Data driven

Extract some information from data

Mathematically based

Probability, Statistics, Information theory,
Computational learning theory, optimization

A very small set of uses of ML

Text

Document labeling, Part of speech tagging,
Summarization

Vision

Object recognition, Handwriting recognition, Emotion labeling, Surveillance

Sound

Speech recognition, music genre classification

Finance

Medical, Biological, Chemical, on and on and on…

A few types of ML

Supervised

Given: labeled data

Usual goal: learn function

Ex: SVM, Neural Networks, Boosting etc.

Unsupervised

Given: unlabeled data

Usual goal: cluster data, learn conditional
probabilities

Ex: Nearest Neighbors, Decision trees

A few types of ML (cont.)

Semi-Supervised

Given: labeled and unlabeled data

Usual goal: use unlabeled data to increase labeled
data

Ex: Cluster, Label unlabeled data from clusters
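
The "cluster, then label unlabeled data from clusters" example can be sketched in a few lines. This is only an illustration under stated assumptions: the data are scalars, the clustering is a plain k-means, and the names `kmeans_1d` and `cluster_then_label`, the sample points, and the initial centers are all hypothetical.

```python
from collections import Counter

def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on scalars; returns the final list of centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            groups[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def cluster_then_label(labeled, unlabeled, initial_centers):
    """Cluster all points, name each cluster by majority vote, label the rest."""
    points = [x for x, _ in labeled] + list(unlabeled)
    centers = kmeans_1d(points, list(initial_centers))

    def assign(x):
        return min(range(len(centers)), key=lambda k: abs(x - centers[k]))

    votes = {}
    for x, y in labeled:
        votes.setdefault(assign(x), Counter())[y] += 1
    cluster_label = {k: c.most_common(1)[0][0] for k, c in votes.items()}
    # Points in a cluster with no labeled member stay unlabeled (None).
    return [(x, cluster_label.get(assign(x))) for x in unlabeled]

labeled = [(0.1, "low"), (9.8, "high")]   # hypothetical labeled data
unlabeled = [0.3, 0.5, 9.1, 10.2]         # hypothetical unlabeled data
result = cluster_then_label(labeled, unlabeled, initial_centers=[0.0, 10.0])
print(result)
```

The newly labeled points could then be added to the labeled set before training a supervised learner.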

Reinforcement

Given: Reward function and set of actions

Goal: Learn a function which optimizes the reward
function

Ex: Q-learning, Ant-Q

General Idea Supervised

General Idea Unsupervised

General Idea Semi-Supervised

General Idea Reinforcement

Markov Decision Process (MDP)

State space (fully or partially observable)

Action space (static or time dependent)

Transition function produces the next state
(based on the present state and action, not the past)

Reward function (based on action)
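
To make the ingredients concrete, here is a minimal sketch of a fully observable MDP solved by value iteration. The two states, two actions, rewards, and the discount `gamma` are hypothetical values chosen for illustration, and value iteration is just one standard way to solve such a model.

```python
# Hypothetical two-state MDP:
# transitions[state][action] = [(probability, next_state, reward), ...]
transitions = {
    "far":  {"approach": [(1.0, "near", 1.0)], "wait": [(1.0, "far", 0.0)]},
    "near": {"approach": [(1.0, "near", 0.0)], "wait": [(1.0, "far", 0.0)]},
}

def action_value(outcomes, values, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)

def value_iteration(transitions, gamma=0.9, sweeps=100):
    """Repeatedly back up the best action value for every state."""
    values = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        values = {
            s: max(action_value(o, values, gamma) for o in actions.values())
            for s, actions in transitions.items()
        }
    return values

def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in each state, the action with the highest backed-up value."""
    return {
        s: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for s, actions in transitions.items()
    }

values = value_iteration(transitions)
policy = greedy_policy(transitions, values)
print(policy)
```

Because each backup only looks at the current state and action, the sketch relies on the Markov property listed above.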

Why regression is not
enough…The XOR problem
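
The XOR problem is the classic case where no single linear threshold classifies all four points correctly. The brute-force check below is only a sketch: the weight grid is an arbitrary choice, and the function name `correct_count` is hypothetical.

```python
from itertools import product

# The four XOR points and their labels.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def correct_count(w1, w2, b):
    """Number of XOR points a linear threshold w1*x1 + w2*x2 + b > 0 gets right."""
    return sum((w1 * x1 + w2 * x2 + b > 0) == bool(y) for (x1, x2), y in XOR)

# Try every combination of weights and bias on a coarse grid from -2 to 2.
grid = [i / 4 for i in range(-8, 9)]
best = max(correct_count(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print(best)  # 3: at most three of the four points are classified correctly
```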

Textbook Q-Learning [MI06]

Learning flocking behavior

N agents

discrete time steps

Agent $i$, partner $j$

Define Q-state $Q(s_t, a_t)$

$s_t$ - state

$a_t$ - action
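
For reference, the standard tabular Q-learning update is $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[r + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)]$. The sketch below implements that textbook rule; the learning rate `alpha`, discount `gamma`, and the epsilon-greedy action choice are assumptions for illustration and may differ from the settings used in [MI06].

```python
import random
from collections import defaultdict

def q_learning_step(Q, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

def epsilon_greedy(Q, state, actions, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise take the best known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

Q = defaultdict(float)                 # unseen (state, action) pairs start at 0
actions = ["a1", "a2", "a3", "a4"]
q_learning_step(Q, state=3, action="a1", reward=1.0, next_state=2, actions=actions)
print(Q[(3, "a1")])
```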

Our textbook example

State of $i$: $[R] = \lfloor |i - j| \rfloor$

Actions for $i$

$a_1$ - Attract to $j$

$a_2$ - Parallel positive orientation to $j$

$a_3$ - Parallel negative orientation to $j$

$a_4$ - Repulsion from $j$

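Reward Function - no predator

Distances $R_1, R_2, R_3$ s.t. $R_1 < R_2 < R_3$

| $s_t$ | $a_t$ | $r$ |
| --- | --- | --- |
| $0 < [R] \le R_1$ | $a_4$ | 1 |
| $0 < [R] \le R_1$ | $a_1, a_2, a_3$ | -1 |
| $R_1 < [R] \le R_2$ | $a_2$ | 1 |
| $R_1 < [R] \le R_2$ | $a_1, a_3, a_4$ | -1 |
| $R_2 < [R] \le R_3$ | $a_1$ | 1 |
| $R_2 < [R] \le R_3$ | $a_2, a_3, a_4$ | -1 |
| $R_3 < [R]$ | $a_1, a_2, a_3, a_4$ | 0 |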

Reward Function - predator

Distances $R_1, R_2, R_3$ s.t. $R_1 < R_2 < R_3$

| $s_t$ | $a_t$ | $r$ |
| --- | --- | --- |
| $0 < [R] \le R_3$ | $a_4$ | 1 |
| $0 < [R] \le R_3$ | $a_1, a_2, a_3$ | -1 |
| $R_3 < [R]$ | $a_1, a_2, a_3, a_4$ | 0 |
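
The two reward tables translate directly into small lookup functions. This is a sketch only: the function names, the string action labels, and the threshold values in the usage lines are hypothetical, and the case $[R] = 0$, which the tables do not list, is treated here as belonging to the closest band.

```python
import math

def distance_state(pos_i, pos_j):
    """State [R]: the floored Euclidean distance between agent i and partner j."""
    return math.floor(math.dist(pos_i, pos_j))

def reward_no_predator(R, action, R1, R2, R3):
    """Reward when no predator is present (R1 < R2 < R3)."""
    if R > R3:
        return 0                # too far apart: every action is neutral
    if R > R2:
        preferred = "a1"        # attract
    elif R > R1:
        preferred = "a2"        # parallel positive orientation
    else:
        preferred = "a4"        # repulsion (assumption: [R] = 0 also lands here)
    return 1 if action == preferred else -1

def reward_predator(R, action, R3):
    """Reward in the predator case: only repulsion is rewarded within R3."""
    if R > R3:
        return 0
    return 1 if action == "a4" else -1

# Hypothetical thresholds R1=2, R2=5, R3=8.
print(reward_no_predator(distance_state((0, 0), (3, 4.5)), "a1", 2, 5, 8))
print(reward_predator(3, "a4", 8))
```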

Don’t repeat work!!

Basic planners work from scratch

Ex: path planning for parking, no
difference between the first time and the
hundredth time

Ideally, learn some general higher-level
“strategies” that can be reused

General solution patterns in the problem
space

Viability Filtering [KP07]

Agent can “see”, perceptual information

Range-finder-like virtual sensors

Database of successful perceptually-parameterized motions

From its own experimentation or external source

Database exploited for future queries

Search based on what has previously been
successful in similar situations.

Sensors in Viability Filtering: some definitions

$X$: set of agent states

$E$: set of environment states

Define $x^+ \in X^+$, where $X^+ = \{x^+ = (x, e) \mid x \in X,\ e \in E\}$

Sensor function $\sigma(x^+): X^+ \to \mathbb{R}$

At a specific state $x^+ \in X^+$, define the sensor state $s = (\sigma_1(x^+), \ldots, \sigma_n(x^+))$

And the sensor space $S$, with $s \in S$, where $S$ is the set of all sensor state values

Finally

Define the locally situated state of the agent $\bar{x} = (s, x') \in \bar{X}$, where $x'$ is some state information independent of the sensors.

Now we want to collect data to train a function $V(\bar{x}): \bar{X} \to \{\text{viable}, \text{nonviable}\}$

Note: errors in $V(\bar{x})$ could cause problems

Check Viability not Collision

Function IS_NONVIABLE(x+)
    if is_collision(x+) then
        return True
    s := (σ_1(x+), …, σ_n(x+))
    x' := extract_internal_state(x+)
    x̄ := (s, x')
    return ¬V(x̄)
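
The viability check can be sketched as runnable code. Everything concrete here is an assumption made for illustration: the one-dimensional world, the single range sensor, and especially `toy_model`, a hand-written stopping-distance rule standing in for the learned viability function described in [KP07].

```python
def is_nonviable(x_plus, is_collision, sensors, extract_internal_state,
                 viability_model):
    """Return True if the state collides or the model predicts it is nonviable."""
    if is_collision(x_plus):
        return True
    s = tuple(sensor(x_plus) for sensor in sensors)    # sensor state
    x_internal = extract_internal_state(x_plus)        # internal state x'
    situated = (s, x_internal)                         # locally situated state
    return not viability_model(situated)

# Hypothetical 1-D world: x_plus = ((position, speed), wall_position).
def range_sensor(x_plus):
    (position, _speed), wall = x_plus
    return wall - position

def toy_model(situated):
    """Stand-in for a learned model: viable if the agent can still stop in time."""
    (gap,), speed = situated
    return gap > speed ** 2 / 2.0    # stopping distance with unit deceleration

checks = dict(
    is_collision=lambda xp: xp[0][0] >= xp[1],
    sensors=[range_sensor],
    extract_internal_state=lambda xp: xp[0][1],
    viability_model=toy_model,
)
print(is_nonviable(((0.0, 1.0), 5.0), **checks))   # slow and far from the wall
print(is_nonviable(((4.0, 3.0), 5.0), **checks))   # fast and close to the wall
```

The second query is rejected before any collision occurs, which is the point of checking viability rather than collision.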

Results and Further work

Bootstrapping

Use of history to create macroscopic plans

Model transfer

Training a Dog [B02]

MIT lab: system where the user interactively trains the dog using “clicker training”

Uses acoustic patterns as cues for actions

Can be taught cues on different acoustic patterns

Can create new actions from state space search

Simplified Q-learning based on animal training techniques

Training a Dog (cont.)

Predictable regularities

animals will tend toward successful states

small time window

Maximize use of supervisor feedback

limit the state space by only looking at states that matter, e.g. if utterance $u$ followed by action $a$ produces a reward, then utterance $u$ is important.

Easy to train

Credit accumulation

And allowing a state-action pair to delegate credit to
another state-action pair.

Alternatives to Q-Learning

Q-decomp [RZ03]

Complex agent as a set of simpler subagents

Each subagent has its own reward function

Arbitrator decides best actions based on the Q-values reported by the subagents
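
The arbitration step can be sketched as follows: each subagent reports its own Q-value for every candidate action, and the arbitrator picks the action with the largest sum. The function name `arbitrate`, the two subagents, and all numbers below are hypothetical; how each subagent learns its Q-values is left out of this sketch.

```python
def arbitrate(state, actions, subagent_q_functions):
    """Choose the action that maximizes the sum of the subagents' Q-values."""
    def total_q(action):
        return sum(q(state, action) for q in subagent_q_functions)
    return max(actions, key=total_q)

# Hypothetical Q-tables for two subagents with different reward functions.
dollars_q = {("s0", "left"): 1.0, ("s0", "up"): 0.6, ("s0", "right"): 0.0}
euros_q = {("s0", "left"): 0.0, ("s0", "up"): 0.6, ("s0", "right"): 1.0}

subagents = [
    lambda s, a: dollars_q[(s, a)],
    lambda s, a: euros_q[(s, a)],
]
choice = arbitrate("s0", ["left", "up", "right"], subagents)
print(choice)  # up
```

With these numbers neither subagent ranks "up" first on its own, yet it is the best compromise once the values are summed.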

A simple world with initial state $S_0$ and three terminal states $S_L$, $S_U$, $S_R$, each with an associated reward of dollars and/or euros. The discount factor is $\gamma \in (0, 1)$.

[fig. from RZ03]

Learning Behavior with Q-Decomp [CT06]

Q-Decomp as the learning technique

Reward function - Inverse Reinforcement Learning (IRL) [NR00]

Mimicking behavior from an “expert”

Support Vector Path Planning

Idea: use the SVM algorithm to
generate a smooth path.

Not really machine learning, but a neat
application of an ML algorithm

Here is the idea

Support Vector Path Planning

Videos

Robot learning to pick up objects

http://www.cs.ou.edu/~fagg/movies/index.html#torso_2004

Training a Dog

http://characters.media.mit.edu/projects/dobie.html

References

[NR00] A. Y. Ng and S. Russell. Algorithms for inverse reinforcement learning. In Proc. 17th International Conf. on Machine Learning, pages 663-670. Morgan Kaufmann, San Francisco, CA, 2000.

[B02] B. Blumberg et al. Integrated learning for interactive synthetic characters. In SIGGRAPH '02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 417-426, New York, NY, USA, 2002. ACM Press.

[RZ03] S. J. Russell and A. Zimdars. Q-decomposition for reinforcement learning agents. In ICML, pages 656-663, 2003.

[MI06] K. Morihiro, T. Isokawa, H. Nishimura, and N. Matsui. Emergence of flocking behavior based on reinforcement learning. In Knowledge-Based Intelligent Information and Engineering Systems, pages 699-706, 2006.

[CT06] T. Conde and D. Thalmann. Learnable behavioural model for autonomous virtual agents: low-level learning. In AAMAS '06: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, pages 89-96, New York, NY, USA, 2006. ACM Press.

[M06] J. Miura. Support vector path planning. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 2894-2899, 2006.

[KP07] M. Kalisiak and M. van de Panne. Faster motion planning using learned local viability models. In ICRA, pages 2700-2705, 2007.

Machine Learning Ref

[M07] Mehryar Mohri. Foundations of Machine Learning course notes. http://www.cs.nyu.edu/~mohri/ml07.html

[M97] Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.

[RN05] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence. Englewood Cliffs, New Jersey, 1995.

[CV95] Corinna Cortes and Vladimir Vapnik. Support-Vector Networks. Machine Learning, 20, 1995.

[V98] Vladimir N. Vapnik. Statistical Learning Theory. Wiley, 1998.

[KV94] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.