CS599: Reinforcement Learning and Learning Control


Instructor: Stefan Schaal

3 Credits

Description: In a mixture of tutorial and seminar style, this course will introduce and discuss machine learning methods for learning control, particularly with a focus on robotics, but also applicable to models of learning in biology and any other control process. The course will cover the basics of reinforcement learning with value functions (dynamic programming, temporal difference learning, Q-learning). The emphasis, however, will be on learning methods that scale to complex high-dimensional control problems. Thus, we will cover function approximation methods for reinforcement learning, policy gradients, probabilistic reinforcement learning, learning from trajectory trials, optimal control methods, stochastic optimal control methods, dynamic Bayesian networks for learning control, Gaussian processes for reinforcement learning, etc.
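To give a flavor of the value-function methods mentioned above, here is a minimal sketch of tabular Q-learning; the five-state chain MDP, its rewards, and all parameter values are invented for illustration and are not part of the course material:

```python
import random

N_STATES = 5
ACTIONS = (0, 1)  # 0 = step left, 1 = step right
ALPHA, GAMMA = 0.1, 0.9  # illustrative learning rate and discount

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """One transition of the toy chain: reward 1.0 only on reaching the last state."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

random.seed(0)
s = 0
for _ in range(5000):
    a = random.choice(ACTIONS)  # off-policy: behave randomly, learn the greedy values
    s2, r = step(s, a)
    # Q-learning backup: bootstrap off the best action in the next state.
    Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
    s = 0 if r > 0 else s2  # reset the episode once the goal is reached

# The greedy policy should prefer moving right in every non-terminal state.
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)]
```

Because Q-learning is off-policy, a purely random behavior policy suffices here; the learned values converge toward the optimal ones, which scale as powers of the discount along the chain.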

Grading: 2 Paper Presentations per student (20% per presentation), a Final Project (40%), and Participation in Class (20%).

Prerequisites: CS545, CS542 or CS567, or any other graduate-level classes that provide the foundations of machine learning and robotics, or permission by the instructor.

Tentative Syllabus:

- Introduction to reinforcement learning [1]
- Dynamic programming methods [1, 2]
- Optimal control methods [2, 3]
- Temporal difference methods [1]
- Q-Learning [1]
- Problems of value-function-based RL methods
- Function Approximation for RL [1]
- Incremental Function Approximation Methods for RL [4, 5]
- Least Squares Methods [6]
- Direct Policy Learning: REINFORCE [7]
- Modern policy gradient methods: GPOMDP and the Policy Gradient Theorem [8, 9]
- Natural Policy Gradient Methods [9]
- Probabilistic Reinforcement Learning with Reward-Weighted Averaging [10, 11]
- Q-Learning on Trajectories [12]
- Path Integral Approaches to Reinforcement Learning I [13]
- Path Integral Approaches to Reinforcement Learning II
- Dynamic Bayesian Networks for RL [14]
- Gaussian Processes in Reinforcement Learning [5]
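For a flavor of the direct policy learning topic (REINFORCE, [7]), here is a hedged sketch of the score-function gradient on a toy two-armed bandit; the arm payoffs, learning rate, and setup are invented for illustration only:

```python
import math
import random

# Hypothetical two-armed bandit: these payoff means are made up for this sketch.
TRUE_MEANS = [0.2, 0.8]
LR = 0.1

def softmax(prefs):
    z = [math.exp(p) for p in prefs]
    total = sum(z)
    return [x / total for x in z]

random.seed(0)
theta = [0.0, 0.0]  # one preference parameter per arm

for _ in range(2000):
    probs = softmax(theta)
    a = 0 if random.random() < probs[0] else 1
    reward = random.gauss(TRUE_MEANS[a], 0.1)  # noisy payoff sample
    # REINFORCE update: theta += lr * reward * grad log pi(a | theta);
    # for a softmax policy, grad log pi is (indicator of a) - probs.
    for i in range(len(theta)):
        theta[i] += LR * reward * ((1.0 if i == a else 0.0) - probs[i])

probs = softmax(theta)  # after training, most probability mass is on the better arm
```

Even without a baseline, the expected update pushes the preference of the higher-payoff arm up faster, so the policy concentrates on it; adding a reward baseline, as discussed in [7], reduces the variance of this estimator.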

Academic Integrity:

All students should read, understand, and abide by the University Student Conduct Code:

http://www.usc.edu/dept/publications/SCAMPUS/governance/gov03.html

Students With Disabilities:

Any student requesting academic accommodations based on a disability is required to register with Disability Services and Programs (DSP) each semester. A letter of verification for approved accommodations can be obtained from DSP. Please be sure the letter is delivered to me (or to your TA) as early in the semester as possible. DSP is located in STU 301 and is open 8:30 a.m.-5:00 p.m., Monday through Friday. The phone number for DSP is (213) 740-0776.


Readings:

[1] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. Cambridge: MIT Press, 1998.

[2] P. Dyer and S. R. McReynolds, The computation and theory of optimal control. New York: Academic Press, 1970.

[3] D. H. Jacobson and D. Q. Mayne, Differential dynamic programming. New York: American Elsevier Pub. Co., 1970.

[4] S. Schaal and C. G. Atkeson, "Constructive incremental learning from only local information," Neural Computation, vol. 10, pp. 2047-2084, 1998.

[5] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning. Cambridge, Mass.: MIT Press, 2006.

[6] J. Boyan, "Least-squares temporal difference learning," in Proceedings of the Sixteenth International Conference on Machine Learning. Morgan Kaufmann, 1999, pp. 49-56.

[7] R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, vol. 8, pp. 229-256, 1992.

[8] J. Peters and S. Schaal, "Reinforcement learning of motor skills with policy gradients," Neural Networks, vol. 21, pp. 682-697, May 2008.

[9] J. Peters and S. Schaal, "Natural actor critic," Neurocomputing, vol. 71, pp. 1180-1190, 2008.

[10] J. Peters and S. Schaal, "Reinforcement learning by reward-weighted regression for operational space control," in Proceedings of the International Conference on Machine Learning (ICML 2007), Corvallis, Oregon, June 19-21, 2007.

[11] J. Kober and J. Peters, "Learning motor primitives in robotics," in Advances in Neural Information Processing Systems 21 (NIPS 2008), Vancouver, BC, Dec. 8-11, 2009.

[12] G. Neumann and J. Peters, "Fitted Q-iteration by advantage weighted regression," in Advances in Neural Information Processing Systems 21 (NIPS 2008), Vancouver, BC, Dec. 8-11, 2009.

[13] E. Theodorou, J. Buchli, and S. Schaal, "Path integral stochastic optimal control for rigid body dynamics," in IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, Tennessee, March 30-April 2, 2009.

[14] M. Toussaint and A. Storkey, "Probabilistic inference for solving discrete and continuous state Markov Decision Processes," in 23rd International Conference on Machine Learning (ICML 2006), 2006.