Topology Discovery

Aug 7, 2012 (5 years and 5 months ago)

Improving Systems Management Policies Using Hybrid Reinforcement Learning

Gerry Tesauro <gtesauro@us.ibm.com>

IBM TJ Watson Research Center

Joint work with Rajarshi Das (IBM), Nick Jong (U. Texas), and Mohamed Bennani (George Mason Univ.)

2


Outline: Main points of the talk

Introduction:
  Brief Overview of “Autonomic Computing”
  Grandiose Motivation: Combining Machine Learning with domain knowledge in Autonomic Computing

Problem Description
  Scenario: Online server allocation in Internet Data Center
  Data Center Prototype Implementation

Reinforcement Learning Approach
  Quick RL Overview
  Prior Online RL Approach
  New Hybrid RL Approach

Results/Insights into Hybrid RL outperformance

Fresh results on new application: Power Management



3

Challenges in Systems Management

[Figure: network diagram "SDC North Physical/Logical WAN Connectivity" (IBM Global Services, Network Services; drawn by Gregg Machovec, SDC North Network Architect, 9/13/2001). It shows IBM's Global IP Network: routers at sites including Poughkeepsie, Endicott, Fishkill, Armonk, Hawthorne, Yorktown, Somers, Sterling Forest, and Rochester NY; Southbury CT; Palisades NJ; Burlington VT; and Rochester MN, linked by FDDI, Token-Ring, ATM, HSSI, Frame Relay, and Sync segments.]

[Figure: event data across hosts, annotated with morning rebooting p-patterns, weekly rebooting p-patterns, outages, event bursts, and excessive DM events.]

Large-scale, heterogeneous distributed systems with highly dynamic, complex multi-component interactions

Large volumes of real-time high-dimensional data, but also lots of missing information and uncertainty

Too much complexity, too few (skilled) administrators

Need for “self-managing” systems

autonomic computing

4

What is Autonomic Computing?


“Computing systems that manage themselves in accordance with high-level objectives from humans”
Kephart and Chess, A Vision of Autonomic Computing, IEEE Computer, 2003

“Self-management” capabilities include

Self-Configuration: Automated configuration of components, systems according to high-level policies; rest of system adjusts seamlessly.

Self-Healing: Automated detection, diagnosis, and repair of localized software/hardware problems.

Self-Optimization: Automatic and continual adaptive tuning of hundreds of parameters (database params, server params, …) affecting performance & efficiency.

Self-Protection: Automated defense against malicious attacks or cascading failures; use early warning to anticipate and prevent system-wide failures.

Good application domain for ML: rich opportunities, little previously done

5

A “Knowledge Bottleneck” in Autonomic Computing

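[Figure: an Autonomic Manager running a Monitor, Analyze, Plan, Execute loop around a shared Knowledge component, connected through sensor (S) and effector (E) interfaces to a Managed Element.]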

6

Machine Learning to the Rescue


Can avoid knowledge bottleneck: automatically extract knowledge from observations of data

Examples:

Supervised Learning: Input → Predicted Output (classification, regression)

Unsupervised Learning: Input → Structure among input variables (clustering, data mining)

Reinforcement Learning: Learns behavioral policies: State → Action

7

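Will ML Without Built-In Knowledge Work?

[Figure: the Autonomic Manager loop (Monitor, Analyze, Plan, Execute) over a Managed Element, with sensor (S) and effector (E) interfaces; the Knowledge component is replaced by "Tabula Rasa ML".]

Tabula Rasa = “blank slate” (Latin)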

8

A Hybrid Approach Combining Knowledge + ML


Initial Knowledge → Behavioral Data → ML → Improved Knowledge

Several advantages:

No direct interface between ML and Initial Knowledge; don’t engineer knowledge into ML

Initial knowledge can be virtually anything:
  very simple (e.g. crude heuristic)
  highly sophisticated (multi-tier closed queuing network)
  could even be human behavior

Can do multiple iterations to keep improving


10

Application: Allocating Server Resources in a Data Center

Scenario: Data center serving multiple customers, each running high-volume web apps with independent time-varying workloads

[Figure: a Data Center hosting three customer application environments (Macy’s Online Shopping; E-Trade: online trading; Citibank: online banking). Each has an Application Manager, Router, Servers, and DB2, and each pays SLA $$. A Resource Arbiter allocates servers to maximize business value across all customers.]

11


Problem Description

Scenario: Online server allocation in Internet Data Center

Data Center Prototype Implementation:

Real servers: Linux cluster (xSeries machines)

Realistic Web-based workload: Trade3 (online trading emulation)
  Runs on top of WebSphere and DB2

Realistic time-varying demand generation:
  Open-loop scenario: Poisson HTTP requests; vary mean arrival rate λ
  Closed-loop scenario: Finite number of customers M with fixed think time distribution; M varies with time
  Use Squillante-Yao-Zhang time-series model to vary M or λ above
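
The two demand scenarios can be sketched in a few lines of Python. This is only an illustration, not the prototype's workload generator: the function names, the exponential think-time distribution, and the fixed per-request response time are all assumptions made here, and the Squillante-Yao-Zhang model that varies λ or M over time is not reproduced.

```python
import random

def poisson_arrival_times(rate_per_sec, duration_sec, seed=0):
    """Open-loop demand: arrival times of a Poisson process with mean rate lambda."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        # Inter-arrival gaps of a Poisson process are exponentially distributed.
        t += rng.expovariate(rate_per_sec)
        if t >= duration_sec:
            return arrivals
        arrivals.append(t)

def closed_loop_request_times(num_customers, think_time_sec, response_time_sec,
                              duration_sec, seed=0):
    """Closed-loop demand: each of M customers repeats think -> request -> wait."""
    rng = random.Random(seed)
    requests = []
    for _ in range(num_customers):
        t = 0.0
        while True:
            # Assumption: exponential think time with the given mean.
            t += rng.expovariate(1.0 / think_time_sec)
            if t >= duration_sec:
                break
            requests.append(t)
            # Assumption: constant response time, i.e. no queueing effects.
            t += response_time_sec
    return sorted(requests)
```

In the open-loop case the request rate is independent of how the system performs; in the closed-loop case a slow response delays that customer's next request, so demand depends on M and on response time.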




12

Data Center Prototype: Experimental setup

[Figure: experimental setup. A Resource Arbiter divides 8 xSeries servers among two Trade3 application environments (each running WebSphere 5.1 and DB2) and a Batch environment, with the goal "Maximize Total SLA Revenue". Each Trade3 App Manager sees its Demand (HTTP req/sec) and an SLA Value(RT); each App Manager reports Value(#srvrs) to the arbiter. The arbiter is labeled "5 sec".]

13

Standard Approach: Queuing Models



Design an appropriate model of flows and queues (arrival process/routing discipline/service process etc.) in system

Estimate model parameters offline or online

Model estimates Value(numServers) by estimating (asymptotic) performance changes due to changes in numServers

Has worked well in many deployed systems

Two main limitations:
  Model design is difficult and knowledge-intensive
  Model assumptions don’t exactly match real system
    Real systems have complex dynamics; standard models assume steady-state behavior

Two prospective benefits of machine learning approach:
  Avoid knowledge bottleneck
  Decisions can reflect dynamic consequences of actions
    e.g. properly handle transients and switching delays
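
As a rough illustration of how a queuing model turns a server count into Value(numServers), the sketch below assumes the simplest possible setup: n identical M/M/1 servers with the load split evenly, and a step-shaped SLA. The talk does not say which model the prototype used, so the model, the SLA shape, and every number here are hypothetical.

```python
import math

def mm1_response_time(arrival_rate, service_rate, num_servers):
    """Steady-state mean response time when load is split evenly over M/M/1 servers."""
    per_server_rate = arrival_rate / num_servers
    if per_server_rate >= service_rate:
        return math.inf  # overloaded: the queue grows without bound
    return 1.0 / (service_rate - per_server_rate)

def sla_payment(response_time, target=0.5, payment=100.0, penalty=-50.0):
    """Hypothetical step SLA: paid if mean response time meets the target."""
    return payment if response_time <= target else penalty

def value_of_servers(arrival_rate, service_rate, max_servers):
    """Value(numServers) for numServers = 1..max_servers, via RT -> SLA(RT)."""
    return {n: sla_payment(mm1_response_time(arrival_rate, service_rate, n))
            for n in range(1, max_servers + 1)}
```

Because the formula is a steady-state result, it says nothing about what happens in the seconds after an allocation changes, which is the limitation the slide points at.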


15

Reinforcement Learning (RL) approach


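[Figure: RL in a loop with the System. State comes from monitored data streams, Reward is App 1's Value(RT), and Action is the # servers; the learning algorithm is left open ("Alg?").]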

16

Reinforcement Learning: 1-slide Tutorial

A learning agent interacts with the environment
  Observes current state s of the environment
  Takes an action a
  Receives an (immediate) scalar reward r

Agent learns a long-range value function V(s,a) estimating cumulative future reward:

$R_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}$

We use a standard RL algorithm “Sarsa”: learns state-action value function

$\Delta V(s,a) = \alpha \left[ r + \gamma V(s',a') - V(s,a) \right]$

By design RL does “trial-and-error” learning without model of environment

Naturally handles long-range dynamic consequences of actions (e.g., transients, switching delays)

Solid theoretical grounding for MDPs; recent practical success stories

[Figure: Agent and System exchanging State, Reward, and Action.]
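
A minimal tabular sketch of the Sarsa rule, V(s,a) ← V(s,a) + α [r + γ V(s',a') − V(s,a)], may help make the tutorial concrete. The function names, the dictionary-based table, and the default α and γ are choices made for this example, not values from the talk.

```python
import random
from collections import defaultdict

def sarsa_update(V, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one Sarsa(0) step to the table V and return the TD error."""
    td_error = r + gamma * V[(s_next, a_next)] - V[(s, a)]
    V[(s, a)] += alpha * td_error
    return td_error

def epsilon_greedy(V, s, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: V[(s, a)])

# Usage: an empty table starts at 0 for every (state, action) pair.
V = defaultdict(float)
sarsa_update(V, "low_demand", 2, 1.0, "high_demand", 3)
```

The update uses the action a' actually taken in the next state, which is what makes Sarsa an on-policy method.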






20

Assumptions Behind RL Formulation

[Figure: data center diagram with three application environments (Macy’s Online Shopping; E-Trade: online trading; Citibank: online banking), each with an Application Manager, Router, Servers, and DB2, each paying SLA $$, and all sharing servers through a Resource Arbiter.]

Each application has local state; unaffected by other apps

Each app. has local state transitions and local rewards, depending only on local state and local resource

Collection of separate local MDPs, but global decision maker wants to maximize sum of local rewards


21

Global RL versus Local RL


One approach: Make the Resource Arbiter a global Q-Learner

Advantages:
  Arbiter’s problem is a true MDP
  Can rely on convergence guarantee

Main Disadvantage:
  Arbiter’s state space is huge: cross product of all local state spaces
  Serious curse-of-dimensionality if many applications

Alternative Approach: Local RL
  Each application does local Sarsa(0) based on local state, local provisioning, and local reward → learns local value function
  Each application conveys current V(resource) estimates to arbiter
  Arbiter then acts to maximize sum of current value functions

Local learning should be much easier than global learning; but
  No longer have a convergence guarantee
  Related work: Russell & Zimdars, ICML-03. (local rewards only)
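
The arbiter's step in the local approach, choosing the allocation that maximizes the sum of the value estimates it receives, can be sketched by brute-force enumeration. The table format (one dict per application mapping a server count n to V(n)) and the function name are assumptions for illustration; the talk does not describe how the prototype's arbiter searches.

```python
from itertools import product

def best_allocation(value_tables, total_servers):
    """Return (allocation, total_value) maximizing the sum of V_i(n_i).

    value_tables: one dict per application, mapping server count n -> V_i(n).
    Only allocations that use exactly total_servers servers are considered.
    """
    best, best_value = None, float("-inf")
    choices = [sorted(table) for table in value_tables]
    for alloc in product(*choices):
        if sum(alloc) != total_servers:
            continue
        total = sum(table[n] for table, n in zip(value_tables, alloc))
        if total > best_value:
            best, best_value = alloc, total
    return best, best_value
```

Enumeration is fine for a handful of applications and servers; the number of candidate allocations grows quickly with both, so a larger deployment would need a smarter search.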


22

Online RL in Trade3 Application Manager (AAAI 2005)

[Figure: Trade3 Application Environment. The TRADE3 App Mgr observes Demand λ and Response Time, receives utility U from the SLA(RT), and uses RL to learn V(λ, n); it reports V(n) to the Resource Arbiter, which assigns Servers.]

Observed state = current demand λ only

Arbiter action = # servers provided (n)

Instantaneous reward U = SLA payment

Learns long-range expected value function V(state,action) = V(λ, n) (two-dimensional lookup table)

Data Center results:
  good asymptotic performance, but
  poor performance during long training period
  method scales poorly with state space size

23

Amazingly Enough, RL Works! :-)

Results of overnight training (~25k RL updates = 16 hours real time) with random initial condition

24

Comparison of Performance: 2 Application Environments

25

3 Application Environments: Performance

26


Outline: Main points of the talk

Introduction
Problem Description
Reinforcement Learning Approach
  Quick RL Overview
  Online RL Approach
  Hybrid RL Approach (Tesauro et al., ICAC 2006)
Results
Insights into Hybrid RL outperformance
Wrapup




27

Hybrid Reinforcement Learning Illustrated

[Figure: the System is controlled by MBP through the State / Action / Reward loop, with RL modules attached.]

Run RL offline on data from initial policy

Bellman Policy Improvement Theorem (1957) → V(state,action) defines a new policy guaranteed better than original policy

Combines best aspects of both RL and model-based (e.g. queuing) methods

Very general method that automatically improves any existing systems management policy

In Data Center prototype:
  Implement best queuing models within each Trade3 mgr
  Log system data in overnight run (~12-20 hrs)
  Train RL on log data (~2 cpu hrs) → new value functions
  Replace queuing models by RL value functions and rerun experiment
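
The offline training step can be sketched as repeated Sarsa sweeps over a log recorded while the initial policy was in control, followed by acting greedily on the learned values. This is a tabular toy under stated assumptions: the log format (time-ordered (state, action, reward) tuples), the function names, and the hyperparameters are invented here, and the prototype described in the talk trained neural networks rather than a table.

```python
from collections import defaultdict

def train_offline_sarsa(log, sweeps=20, alpha=0.1, gamma=0.5):
    """Fit V(state, action) to a time-ordered log of (state, action, reward)."""
    V = defaultdict(float)
    for _ in range(sweeps):
        # Each consecutive pair of log entries gives one (s, a, r, s', a') sample.
        for (s, a, r), (s_next, a_next, _) in zip(log, log[1:]):
            V[(s, a)] += alpha * (r + gamma * V[(s_next, a_next)] - V[(s, a)])
    return V

def improved_policy(V, actions):
    """New policy: in each state, take the action with the highest learned value."""
    def policy(state):
        return max(actions, key=lambda a: V[(state, a)])
    return policy
```

One caveat of the tabular form: a (state, action) pair that never appears in the log keeps its initial value of 0, so the log needs enough variety in the actions taken for the greedy step to be meaningful.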


28

Two key ingredients of Trade3 implementation



1. “Delay-Aware” State Representation:

Include previous allocation decision as part of current state
  V = V(λ_t, n_{t-1}, n_t)

Can learn to properly evaluate switching delay (provided that delay < allocation interval)
  e.g. can distinguish V(λ, 2, 3) from V(λ, 3, 3)
  delay need not be directly observable: RL only observes delayed reward

Also handles transient suboptimal performance

2. Nonlinear Function Approximation (Neural Nets)

Generalizes across states and actions
  Obviates visiting every state in space
  Greatly reduces need for “exploratory” actions

Much better scaling to larger state spaces
  From 2-3 state variables to 20-30, potentially

But: lose guaranteed optimality
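
To illustrate the delay-aware representation, the sketch below turns a per-interval log into training samples keyed by (λ_t, n_{t-1}, n_t). The log layout and function name are assumptions for this example only.

```python
def delay_aware_samples(log):
    """Build ((demand_t, n_prev, n_t), reward_t) pairs from a time-ordered log.

    log: list of (demand, n_servers, reward) tuples, one per allocation interval.
    The first interval is skipped because it has no previous allocation.
    """
    samples = []
    for (_, n_prev, _), (demand, n, reward) in zip(log, log[1:]):
        samples.append(((demand, n_prev, n), reward))
    return samples

# Usage: moving from 2 to 3 servers and staying at 3 servers yield different keys,
# so a learner can assign them different values even at the same demand.
samples = delay_aware_samples([(100, 2, 5.0), (100, 3, 2.0), (100, 3, 6.0)])
```

With these keys, the cost of a switch shows up as a lower reward for (λ, 2, 3) than for (λ, 3, 3), without the delay itself ever being measured.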


30

Results: Open Loop, No Switching Delay

+2.6% Trade3 RT

+12.7% Batch thrput

-0.4% Trade3 RT

+38.9% Batch thrput

+73% Trade3 RT

+221% Batch thrput

31

Results: Closed Loop, No Switching Delay

32

Results: Effects of Switching Delay

34

Insights into Hybrid RL outperformance



1. Less biased estimation errors
  Queuing model predicts indirectly: RT → SLA(RT) → V
  Nonlinear SLA induces overprovisioning bias
  RL estimates utility directly → less biased estimate of V

2. RL handles transients and switching delays
  Steady-state queuing models cannot

3. RL learns to avoid thrashing
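
The first insight can be illustrated with a toy calculation: when the SLA is nonlinear, applying it to a predicted mean response time is not the same as averaging the SLA payments actually earned. The step-shaped SLA and the four response times below are invented for illustration and are not measurements from the talk.

```python
from statistics import mean

def sla(response_time, threshold=0.6, payment=100.0, penalty=-50.0):
    """Hypothetical step SLA: paid if the response time meets the threshold."""
    return payment if response_time <= threshold else penalty

def indirect_estimate(response_times):
    """Queuing-model style: predict mean RT first, then apply the SLA to it."""
    return sla(mean(response_times))

def direct_estimate(response_times):
    """RL style: average the utility that was actually observed."""
    return mean(sla(rt) for rt in response_times)

rts = [0.4, 0.4, 0.4, 1.6]          # mean RT is 0.7, above the 0.6 threshold
print(indirect_estimate(rts))       # -50.0
print(direct_estimate(rts))         # 62.5
```

In this example the indirect route undervalues the current allocation, which would push an allocator to add servers it does not need; the direction and size of the gap depend on the SLA shape and the response-time distribution.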





35

Policy Hysteresis in Learned Value Function

Stable joint allocations (T1, T2, Batch) at fixed

2





36


Hybrid RL learns not to thrash

[Figure: time series under closed-loop demand (#Customers in T1 & T2) with allocation delay 4.5s, comparing Queuing Model Servers(T1) and Servers(T2) against Hybrid RL Servers(T1) and Servers(T2).]

37

Hybrid RL does less swapping than QM

[Bar chart: < n> by experiment, vertical axis from 0 to 0.9.]

Experiment             QM       RL
Open,   Delay=0        0.578    0.464
Open,   Delay=4.5      0.581    0.269
Closed, Delay=0        0.654    0.486
Closed, Delay=4.5      0.736    0.331

38


Outline: Main points of the talk

Introduction
Problem Description
Reinforcement Learning Approach
Results
Insights into Hybrid RL outperformance
Power Management (Kephart et al., ICAC 2007)



39

[Figure: Power and Performance Management. Stock Trading (ST), Account Mgmt (AM), and Financial Advice (FA) requests of High, Medium, and Low Importance pass through the WebSphere On Demand Router (Classification, Prioritization and Flow Control, Routing and Load Balancing) to Computing Resources on Nodes 1-4. WebSphere XD Controllers make Placement Decisions and Placement Executions; the WebSphere XD Performance Manager sets load balancing parameters according to U(RT). A Power Executive controls CPU speeds through IBM Director, which manipulates power controls dynamically. The joint objective is labeled {U(RT), C(Pwr)}.]

40

Architecture Overview


ICAC 2007, to appear

(IBM Director)

2007 Tivoli/AC Joint Program

© 2007 IBM Corporation

IBM Software Group | Tivoli software

41

Experiment with hand-tuned policy

No power management: Avg power = 107.9 watts

Power management, using Hand-tuned Policy: Avg power = 96.6 watts (savings: 11.3 watts = 10.5%)

[Figure: two panels, one per experiment, each plotting workload intensity, CPU, power, and response time against time.]

42

Hybrid RL Results


Learn V = V(s,a); state s uses a single input variable (numClients)


Both response time performance and power consumption comparable to hand-crafted policy



43


Hybrid RL results (15 input variables)

Avg power = 98.3 watts (savings = 8.9%)

SLA violations = 1.5% vs 21%
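
The savings percentages can be checked with a short calculation. It assumes that both figures are measured against the 107.9 watt average from the run with no power management; that baseline is stated for the hand-tuned experiment, and applying it to the 15-variable Hybrid RL result is an inference, though the numbers agree with it.

```python
def power_savings(baseline_watts, managed_watts):
    """Return (watts saved, percent saved relative to the baseline)."""
    saved = baseline_watts - managed_watts
    return saved, 100.0 * saved / baseline_watts

hand_tuned = power_savings(107.9, 96.6)   # about 11.3 W, about 10.5 %
hybrid_rl = power_savings(107.9, 98.3)    # about 9.6 W, about 8.9 %
```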

44

Conclusions



Hybrid RL works quite well for server allocation
  combines disparate strengths of RL and queuing models
  exploits domain knowledge built into queuing model
  but doesn’t need access to knowledge: only uses externally observable behavior of queuing model policy

Initial promising results in power management
  suggests a basic 2-d value function V(load_intensity, resource_knob) may be generally useful and easy to learn

Potential for wide usage of Hybrid RL in systems management
  managing other resource types: memory, storage, VMs etc.
  manage control params: OS/DB params etc.
  simultaneous management of multiple criteria: performance/utilization, performance/availability etc.








45

For further info/reading material


Papers:

“Online Resource Allocation using Decompositional Reinforcement Learning,” G. Tesauro, Proc. of AAAI-05.

“A Hybrid Reinforcement Learning Approach to Autonomic Computing,” G. Tesauro et al., Proc. of ICAC-06.

“Coordinating Multiple Autonomic Managers to Achieve Specified Power-Performance Tradeoffs,” J. Kephart et al., Proc. of ICAC-07.

More info about R & D in Autonomic Computing:

Our work: www.research.ibm.com/nedar

AC toolkit (Autonomic Manager ToolSet): AMTS v1.0 available as part of Emerging Technologies Toolkit v1.1 on IBM alphaWorks: www.alphaworks.com

IBM: www.research.ibm.com/autonomic

Intl. Conf. on Autonomic Computing (ICAC-07): www.autonomic-conference.org

Summer internships: email me: gtesauro@us.ibm.com

Thanks! Any questions??


46

The End

48

Evolution of Computing