CS599: Reinforcement Learning and Learning Control
Instructor: Stefan Schaal
3 Credits
Description: In a mixture of tutorial and seminar style, this course will introduce and discuss machine learning methods for learning control, particularly with a focus on robotics, but also applicable to models of learning in biology and any other control process. The course will cover the basics of reinforcement learning with value functions (dynamic programming, temporal difference learning, Q-learning). The emphasis, however, will be on learning methods that scale to complex high-dimensional control problems. Thus, we will cover function approximation methods for reinforcement learning, policy gradients, probabilistic reinforcement learning, learning from trajectory trials, optimal control methods, stochastic optimal control methods, dynamic Bayesian networks for learning control, Gaussian processes for reinforcement learning, etc.
Grading: 2 Paper Presentations per student (20% per presentation), a Final Project (40%), and Participation in Class (20%).
Prerequisites: CS545, CS542 or CS567, or any other graduate-level class that provides a foundation in machine learning and robotics, or permission of the instructor.
Tentative Syllabus:
- Introduction to reinforcement learning [1]
- Dynamic programming methods [1, 2]
- Optimal control methods [2, 3]
- Temporal difference methods [1]
- Q-Learning [1]
- Problems of value-function-based RL methods
- Function Approximation for RL [1]
- Incremental Function Approximation Methods for RL [4, 5]
- Least Squares Methods [6]
- Direct Policy Learning: REINFORCE [7]
- Modern policy gradient methods: GPOMDP and the Policy Gradient Theorem [8, 9]
- Natural Policy Gradient Methods [9]
- Prob. Reinforcement Learning with Reward Weighted Averaging [10, 11]
- Q-Learning on Trajectories [12]
- Path Integral Approaches to Reinforcement Learning I [13]
- Path Integral Approaches to Reinforcement Learning II
- Dynamic Bayesian Networks for RL [14]
- Gaussian Processes in Reinforcement Learning [5]
Academic Integrity: All students should read, understand and abide by the University Student Conduct Code:
http://www.usc.edu/dept/publications/SCAMPUS/governance/gov03.html
Students With Disabilities: Any student requesting academic accommodations based on a disability is required to register with Disability Services and Programs (DSP) each semester. A letter of verification for approved accommodations can be obtained from DSP. Please be sure the letter is delivered to me (or to your TA) as early in the semester as possible. DSP is located in STU 301 and is open 8:30 a.m. - 5:00 p.m., Monday through Friday. The phone number for DSP is (213) 740-0776.
Readings:
[1] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. Cambridge: MIT Press, 1998.
[2] P. Dyer and S. R. McReynolds, The computation and theory of optimal control. New York: Academic Press, 1970.
[3] D. H. Jacobson and D. Q. Mayne, Differential dynamic programming. New York: American Elsevier Pub. Co., 1970.
[4] S. Schaal and C. G. Atkeson, "Constructive incremental learning from only local information," Neural Computation, vol. 10, pp. 2047-2084, 1998.
[5] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning. Cambridge, Mass.: MIT Press, 2006.
[6] J. Boyan, "Least-squares temporal difference learning," in Proceedings of the Sixteenth International Conference on Machine Learning: Morgan Kaufmann, 1999, pp. 49-56.
[7] R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, vol. 8, pp. 229-256, 1992.
[8] J. Peters and S. Schaal, "Reinforcement learning of motor skills with policy gradients," Neural Networks, vol. 21, pp. 682-697, May 2008.
[9] J. Peters and S. Schaal, "Natural actor critic," Neurocomputing, vol. 71, pp. 1180-1190, 2008.
[10] J. Peters and S. Schaal, "Reinforcement learning by reward-weighted regression for operational space control," in Proceedings of the International Conference on Machine Learning (ICML2007), Corvallis, Oregon, June 19-21, 2007.
[11] J. Kober and J. Peters, "Learning motor primitives in robotics," in Advances in Neural Information Processing Systems 21 (NIPS 2008), Vancouver, BC, Dec. 8-11, 2009.
[12] G. Neumann and J. Peters, "Fitted Q-iteration by advantage weighted regression," in Advances in Neural Information Processing Systems 21 (NIPS 2008), Vancouver, BC, Dec. 8-11, 2009.
[13] E. Theodorou, J. Buchli, and S. Schaal, "Path integral stochastic optimal control for rigid body dynamics," in IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL2009), Nashville, Tennessee, March 30 - April 2, 2009.
[14] M. Toussaint and A. Storkey, "Probabilistic inference for solving discrete and continuous state Markov Decision Processes," in 23rd International Conference on Machine Learning (ICML 2006), 2006.