Multi-armed Bandit Problem and Bayesian Optimization in Reinforcement Learning



From Cognitive Science and Machine Learning
Summer School 2010


Loris Bazzani


Outline Summer School

www.videolectures.net

Outline Presentation

What are Machine Learning and Cognitive Science?
How are they related to each other?
Reinforcement Learning
  Background
  Discrete case
  Continuous case


What is Machine Learning (ML)?

Endow computers with the ability to "learn" from "data"
  Present data from sensors, the internet, experiments
  Expect the computer to make decisions

Traditionally categorized as:
  Supervised Learning: classification, regression
  Unsupervised Learning: dimensionality reduction, clustering
  Reinforcement Learning: learning from feedback, planning

From N. Lawrence's slides

What is Cognitive Science (CogSci)?

How does the mind get so much out of so little?
  Rich models of the world
  Strong generalizations

Reverse-engineering the brain
  Create computational models of the brain

Much of cognition involves induction: finding patterns in data

From N. Chater's slides


Link between CogSci and ML

ML takes inspiration from psychology, CogSci and computer science
  Rosenblatt's Perceptron
  Neural Networks

CogSci uses ML as an engineering toolkit
  Bayesian inference in generative models
  Hierarchical probabilistic models
  Approximate methods for learning and inference



Multi-armed Bandit Problem [Auer et al. '95]

"I wanna win a lot of cash!"

Multi-armed Bandit Problem [Auer et al. '95]

Trade-off between Exploration and Exploitation
An adversary controls the payoffs
No statistical assumptions on the reward distribution
Performance measure: Regret = Best Reward $-$ Player Reward
Upper bound on the expected regret

Multi-armed Bandit Problem [Auer et al. '95]

Setup: a set of $K$ actions (arms), a sequence of trials $t = 1, \dots, T$, and a reward $x_i(t) \in [0, 1]$ for each action $i$ at each trial.

Goal: define a probability distribution $p(t)$ over the actions, used to select the arm to pull at each trial.

The Full Information Game [Freund & Schapire '95]

Regret bound for the exponential-weights (Hedge) algorithm: $O(\sqrt{T \ln K})$ over $T$ trials with $K$ actions.

Problem: the player must observe the reward of every action at every trial!
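To make the full-information game concrete, here is a minimal NumPy sketch of an exponential-weights (Hedge-style) player; the function name, the reward matrix layout, and the learning rate eta are illustrative assumptions, not code from the talk.

```python
import numpy as np

def hedge(rewards, eta=0.1):
    """Exponential-weights player for the full-information game.

    rewards: (T, K) array; rewards[t, i] is the reward of action i at
             trial t, observed for EVERY action after each trial.
    Returns the player's total expected reward.
    """
    T, K = rewards.shape
    log_w = np.zeros(K)              # log-weights, for numerical stability
    total = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                 # distribution over the K actions
        total += p @ rewards[t]      # expected reward on this trial
        log_w += eta * rewards[t]    # full information: update all actions
    return total
```

The last line is exactly what breaks in the bandit setting: only the pulled arm's reward is observed, so the full update is impossible.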

The Partial Information Game

Exp3 = Exponential-weight algorithm for Exploration and Exploitation

Updates only the selected action (via an importance-weighted reward estimate)
Tries out all the possible actions (mixes in uniform exploration)

Bound, for suitable values of the exploration parameter $\gamma$ and depending on the best reward: $G_{\max} - \mathbb{E}[G_{\text{Exp3}}] \leq 2\sqrt{e-1}\,\sqrt{g\,K \ln K}$ for any $g \geq G_{\max}$.
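A minimal sketch of Exp3 in NumPy, assuming rewards in [0, 1] delivered one arm at a time by a caller-supplied reward_fn; the names and the default gamma are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp3(reward_fn, K, T, gamma=0.1):
    """Exp3 for the partial-information (bandit) game.

    reward_fn(i, t) -> reward in [0, 1] of the single pulled arm i at trial t.
    gamma mixes in uniform exploration so every arm keeps being tried.
    """
    log_w = np.zeros(K)
    total = 0.0
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        p = (1 - gamma) * w / w.sum() + gamma / K   # explore/exploit mixture
        i = rng.choice(K, p=p)
        x = reward_fn(i, t)
        total += x
        x_hat = x / p[i]               # importance-weighted reward estimate
        log_w[i] += gamma * x_hat / K  # update only the selected action
    return total

# e.g. exp3(lambda i, t: float(i == 2), K=5, T=1000) converges on arm 2
```

Dividing by p[i] makes the reward estimate unbiased for every arm, which is what lets the analysis go through despite observing only one reward per trial.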

The Partial Information Game

Exp3.1 = Exp3 with rounds, where a round consists of a sequence of trials.

Each round guesses a bound for the total reward of the best action, runs Exp3 tuned to that guess, and restarts with a larger guess once the guess is exceeded.

Bound: $G_{\max} - \mathbb{E}[G_{\text{Exp3.1}}] = O\!\left(\sqrt{G_{\max}\,K \ln K}\right)$ plus lower-order terms.
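A sketch of the round structure, following Auer et al.'s analysis as I recall it; treat the exact constants and the round-termination threshold as assumptions rather than a faithful transcription of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp3_1(reward_fn, K, T):
    """Exp3.1 sketch: restart Exp3 in rounds with growing guesses g_r
    for the best action's total reward, tuning gamma to each guess."""
    t, total = 0, 0.0
    c = K * np.log(K) / (np.e - 1)
    for r in range(64):                      # in practice only a few rounds run
        g = c * 4.0**r                       # this round's guessed bound
        gamma = min(1.0, np.sqrt(c / g))     # = sqrt(K ln K / ((e - 1) g))
        log_w = np.zeros(K)
        G_hat = np.zeros(K)                  # importance-weighted reward totals
        while G_hat.max() <= g - K / gamma:  # round ends when the guess is beaten
            if t >= T:
                return total
            w = np.exp(log_w - log_w.max())
            p = (1 - gamma) * w / w.sum() + gamma / K
            i = rng.choice(K, p=p)
            x = reward_fn(i, t)
            total += x
            x_hat = x / p[i]
            G_hat[i] += x_hat
            log_w[i] += gamma * x_hat / K
            t += 1
    return total
```

The point of the doubling trick is that Exp3's best tuning needs $G_{\max}$ in advance; guessing it in geometrically growing rounds costs only a constant factor in the bound.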

Applications [Hedge] [Bazzani et al. '10]

Learning attentional mechanisms for simultaneous object tracking and recognition with deep networks.


Bayesian Optimization [Brochu et al. '10]

Optimize a nonlinear function over a set: $\max_{x \in A} f(x)$, where $x \in A$ are the actions and $f$ is the function that gives rewards.

Classic optimization tools assume:
  a known mathematical representation of the function
  convexity
  that the function can be evaluated at all points

Bayesian optimization tools instead handle:
  no closed-form expression
  non-convex functions
  evaluation of the function at only one point at a time, returning a noisy response

Bayesian Optimization [Brochu et al. '10]

Uses Bayes' theorem:

$P(f \mid \mathcal{D}_{1:t}) \propto P(\mathcal{D}_{1:t} \mid f)\, P(f)$

where $\mathcal{D}_{1:t} = \{x_{1:t}, f(x_{1:t})\}$ are the observations accumulated so far.

Prior: our beliefs about the space of possible objective functions.
Likelihood: given what we think we know about the prior, how likely is the data we have seen?
Posterior: our updated beliefs about the unknown objective function.

Goal: maximize the posterior at each step, so that each new evaluation decreases the distance between the true global maximum and the expected maximum given the model.
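A minimal sketch of the resulting loop, using scikit-learn's GP regressor as the surrogate and a confidence-bound acquisition maximized over a grid; the toy objective, kernel, grid, and constant 2.0 are illustrative assumptions, not the talk's setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def objective(x):                      # hypothetical noisy reward function
    return -(x - 0.7)**2 + 0.05 * rng.normal()

grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)   # candidate points in A
X = np.array([[0.5]])                  # initial observation
y = np.array([objective(0.5)])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-2)
for t in range(20):
    gp.fit(X, y)                       # posterior given data D_{1:t}
    mu, sigma = gp.predict(grid, return_std=True)
    x_next = grid[np.argmax(mu + 2.0 * sigma)]     # confidence-bound acquisition
    X = np.vstack([X, [x_next]])       # evaluate f once, at the chosen point
    y = np.append(y, objective(x_next[0]))

print("best observed x:", float(X[np.argmax(y), 0]))
```

Each iteration spends exactly one (noisy) function evaluation, which is the whole point when $f$ is expensive.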


Priors over Functions

Convergence conditions of BO:
  the acquisition function is continuous and approximately minimizes the risk
  the conditional variance converges to zero

Guaranteed by Gaussian Processes (GP) when:
  the objective is continuous
  the prior is homogeneous
  the optimization is independent of the m-th differences

Priors over Functions

GP = extension of the multivariate Gaussian distribution to an infinite-dimensional stochastic process
Any finite linear combination of samples is normally distributed
Defined by its mean function and covariance function
  In practice, the focus is on defining the covariance function
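Because a GP is fully specified by its mean and covariance functions, one can draw functions from the prior by evaluating the covariance on a finite grid; a small sketch, with an assumed squared-exponential covariance and hyperparameter theta.

```python
import numpy as np

def k_se(a, b, theta=0.2):
    """Illustrative squared-exponential covariance on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / theta)**2)

x = np.linspace(0.0, 1.0, 100)              # finite grid of input points
K = k_se(x, x) + 1e-8 * np.eye(len(x))      # jitter keeps K positive definite
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)  # 3 draws f ~ GP(0, k)
```

Any finite slice of the process is just a multivariate Gaussian, which is exactly what the definition above says.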

Why use GPs?

Assume a zero-mean GP; the function values $f_{1:t} = [f(x_1), \dots, f(x_t)]^\top$ are drawn according to $f_{1:t} \sim \mathcal{N}(0, K)$, where $K$ is the $t \times t$ kernel matrix with entries $K_{ij} = k(x_i, x_j)$.

When a new observation at $x_{t+1}$ comes, it is jointly Gaussian with the previous ones, and the predictive distribution is

$P(f_{t+1} \mid \mathcal{D}_{1:t}, x_{t+1}) = \mathcal{N}\!\left(\mu_t(x_{t+1}),\, \sigma_t^2(x_{t+1})\right)$
$\mu_t(x_{t+1}) = k^\top K^{-1} f_{1:t}$
$\sigma_t^2(x_{t+1}) = k(x_{t+1}, x_{t+1}) - k^\top K^{-1} k$

with $k = [k(x_{t+1}, x_1), \dots, k(x_{t+1}, x_t)]^\top$. Using the Sherman-Morrison-Woodbury formula, $K^{-1}$ can be updated incrementally as new points arrive instead of being recomputed from scratch.
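The predictive equations above translate directly into a few lines of NumPy; a minimal sketch using a Cholesky factorization in place of the explicit inverse (the kernel, noise level, and names are illustrative assumptions).

```python
import numpy as np

def k_se(a, b, theta=0.2):
    """Illustrative squared-exponential covariance on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / theta)**2)

def gp_posterior(x, f, x_star, k=k_se, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at test inputs x_star,
    given training inputs x with observed values f."""
    K = k(x, x) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)                 # stable alternative to inverting K
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f))
    ks = k(x, x_star)                         # cross-covariance vector k
    mu = ks.T @ alpha                         # k^T K^{-1} f_{1:t}
    v = np.linalg.solve(L, ks)
    var = k(x_star, x_star).diagonal() - np.sum(v**2, axis=0)  # k(x*,x*) - k^T K^{-1} k
    return mu, var

# usage: mu, var = gp_posterior(np.array([0.1, 0.4, 0.9]),
#                               np.array([1.0, 0.2, -0.5]),
#                               np.linspace(0.0, 1.0, 50))
```

The Cholesky factor plays the role of $K^{-1}$ throughout; in an incremental setting one would update it (or use Sherman-Morrison-Woodbury) rather than refactorize.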

Choice of Covariance Functions

Isotropic model with hyperparameter $\theta$ (the length scale).

Squared Exponential Kernel:

$k(x_i, x_j) = \exp\!\left(-\frac{1}{2\theta^2}\,\lVert x_i - x_j \rVert^2\right)$

Matérn Kernel:

$k(x_i, x_j) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\,\lVert x_i - x_j \rVert}{\theta}\right)^{\!\nu} H_\nu\!\left(\frac{\sqrt{2\nu}\,\lVert x_i - x_j \rVert}{\theta}\right)$

where $\Gamma$ is the Gamma function and $H_\nu$ is a Bessel function of order $\nu$.
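Both kernels are a few lines of code; a sketch using SciPy's Gamma and modified Bessel functions, with the Matérn parametrization as in Rasmussen & Williams (hyperparameter defaults are assumptions).

```python
import numpy as np
from scipy.special import gamma as gamma_fn, kv  # Gamma function, Bessel K_nu

def k_se(a, b, theta=0.2):
    """Squared-exponential (isotropic) kernel on 1-D inputs."""
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-0.5 * (d / theta)**2)

def k_matern(a, b, nu=1.5, theta=0.2):
    """Matérn kernel; Gamma and the Bessel function appear explicitly."""
    d = np.abs(a[:, None] - b[None, :]) / theta
    d = np.where(d == 0.0, 1e-12, d)     # kv is singular at 0; limit of k is 1
    s = np.sqrt(2 * nu) * d
    return (2**(1 - nu) / gamma_fn(nu)) * s**nu * kv(nu, s)
```

The SE kernel gives infinitely smooth sample functions; the Matérn's $\nu$ controls smoothness, which is why it is often preferred for real objectives.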

Acquisition Functions

The role of the acquisition function is to guide the search for the optimum towards points where the predicted objective is potentially high or the uncertainty is great.

Assumption: optimizing the acquisition function is simple and cheap.

Goal: high acquisition corresponds to potentially high values of the objective function.

Maximizing the probability of improvement over the current best observation $f(x^+)$:

$PI(x) = P\!\left(f(x) \geq f(x^+)\right) = \Phi\!\left(\frac{\mu(x) - f(x^+)}{\sigma(x)}\right)$

Acquisition Functions

Expected improvement:

$EI(x) = \begin{cases} \left(\mu(x) - f(x^+)\right)\Phi(Z) + \sigma(x)\,\phi(Z) & \text{if } \sigma(x) > 0 \\ 0 & \text{if } \sigma(x) = 0 \end{cases}$
with $Z = \dfrac{\mu(x) - f(x^+)}{\sigma(x)}$.

Confidence bound criterion:

$UCB(x) = \mu(x) + \kappa\,\sigma(x)$

where $\Phi$ and $\phi$ are the CDF and PDF of the normal distribution.
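All three criteria are direct transcriptions of the formulas above; a short sketch using SciPy's normal CDF and PDF (function names and the default kappa are illustrative assumptions).

```python
import numpy as np
from scipy.stats import norm   # Phi (CDF) and phi (PDF) of the standard normal

def probability_of_improvement(mu, sigma, f_best):
    z = (mu - f_best) / sigma
    return norm.cdf(z)                          # PI(x) = Phi(z)

def expected_improvement(mu, sigma, f_best):
    z = (mu - f_best) / sigma
    ei = (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)
    return np.where(sigma > 0, ei, 0.0)         # EI is defined as 0 where sigma = 0

def confidence_bound(mu, sigma, kappa=2.0):
    return mu + kappa * sigma                   # optimism in the face of uncertainty
```

Given mu and sigma from the GP posterior on a set of candidates, the next evaluation point is simply the argmax of whichever criterion is chosen.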

Applications [BO]

Learn a set of robot gait parameters that maximize the velocity of a Sony AIBO ERS-7 robot
Find a policy for robot path planning that would minimize uncertainty about the robot's location and heading
Select the locations of a set of sensors (e.g., cameras) in a dynamic system

Take-home Message

ML and CogSci are connected
Reinforcement Learning is useful for optimization when dealing with temporal information
  Discrete case: multi-armed bandit problem
  Continuous case: Bayesian optimization
We can employ these techniques for Computer Vision and System Control problems

http://heli.stanford.edu/

[Abbeel et al. 2007]


Some References

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. 1995. Gambling in a rigged casino: The adversarial multi-armed bandit problem. FOCS '95.

Yoav Freund and Robert E. Schapire. 1995. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT '95.

Eric Brochu, Vlad Cora and Nando de Freitas. 2009. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. Technical Report TR-2009-023. UBC.

Loris Bazzani, Nando de Freitas and Jo-Anne Ting. 2010. Learning attentional mechanisms for simultaneous object tracking and recognition with deep networks. NIPS 2010 Deep Learning and Unsupervised Feature Learning Workshop.

Carl Edward Rasmussen and Christopher K. I. Williams. 2005. Gaussian Processes for Machine Learning. The MIT Press.

Pieter Abbeel, Adam Coates, Morgan Quigley, and Andrew Y. Ng. 2007. An Application of Reinforcement Learning to Aerobatic Helicopter Flight. NIPS 2007.