CMPS242 Project Report
Alastair Fyfe

Using Two-Person, Zero-Sum Games To Investigate On-line Learning Algorithms.

Introduction
This project report describes two investigations only partly related to one another. The first two sections focus on a game-theoretic framework introduced by Freund and Schapire [3],[4] as an alternative view of on-line learning and boosting algorithms. The last section explores the application of randomized algorithms to the disk spin-down problem [2].
One of the striking features of this quarter's review of machine learning algorithms is the variety of alternative techniques available for solving classification/prediction problems. The relative merits of competing algorithms are commonly assessed by (a) comparison of demonstrable error bounds and (b) performance on real-world data sets. Both approaches are important but have drawbacks. Proving an upper bound on error rate does not give any information about how far below this rate an algorithm will perform on actual data. Observing a difference in error rate on a particular real-world data set does not necessarily shed light on its cause: often it is hard to explain why one algorithm outperforms another. A third approach involves the construction of small, synthetic data sets which can maximize the error difference between competing algorithms. Careful analysis of how algorithms interact with such data sets may help to illuminate their differences.
The first section of this report describes a framework introduced by Freund and Schapire for repeatedly playing rounds of a two-person zero-sum game. Both the weighted-majority and boosting algorithms can be transformed into this framework. Because the minmax solution of a two-person zero-sum game can be computed via linear programming, this framework also establishes a link between on-line learning and approximate solutions of linear programming problems. The discussion in Freund and Schapire's papers focuses on the use of this framework as a novel device for investigating the theoretical properties of algorithms. In this project I wanted to investigate whether game matrices could also be used as data sets to characterize differences between algorithms and parameter settings. The results are given in section two.
The third section considers a different problem. Randomized algorithms can be used in on-line prediction problems such as deciding whether to block or spin a thread while waiting for a lock [1] or whether to shut down or keep spinning a disk drive once it becomes idle [2]. In [1] Karlin et al. showed that no randomized algorithm can achieve a cost ratio better than e/(e-1) ~ 1.58 over the optimal, omniscient algorithm, and also provided a simple randomized algorithm with a continuous probability density that achieves this bound. Interestingly, the bound can be closely approximated by using a multinomial over a small number of fixed values. The technique for choosing coefficients for this multinomial is discussed for both cost ratio and cost difference comparisons.
I) The Game-Theoretic Framework for Online Learning and Boosting.
In [3] and [4] Freund and Schapire set out a connection between on-line learning algorithms, game theory and boosting. The crux of their idea is that both on-line learning and boosting can be cast in the context of learning to play repeated rounds of a non-cooperative two-person game. Such games have well-characterized properties, including an optimal minmax solution which can be computed by linear programming.

In the framework introduced in [3], games are played as follows. The game is characterized by a payoff matrix M with entries in [0,1]. The row player, sometimes referred to as the learner, does not have prior knowledge of M and plays a mixed strategy P over the rows of M. The column player, sometimes referred to as the environment, knows M, knows the learner's current mixed strategy and plays a mixed strategy Q. The cost to the learner of playing P when the environment plays Q is

    sum_i sum_j P(i)M(i,j)Q(j) = P^T MQ

which is also represented as M(P,Q). If the environment chooses a fixed strategy j, the cost to the learner is sum_i P(i)M(i,j), which is also represented as M(P,j). The learner maintains weights on the rows of M which are used to calculate P. He starts by initializing all weights to 1. Play then proceeds in a sequence of rounds t=1..T as follows:
1. the learner selects a mixed strategy P_t computed as P_t(i) = w_t(i)/weight_sum
2. the environment selects a mixed column strategy Q_t
3. the learner is told M(i,Q_t) = sum_j M(i,j)Q_t(j) for all i. This is the expected loss it would have incurred had it chosen to only play row i in the face of the environment's current mixed strategy Q_t. Interestingly, the learner is never told the actual contents of the game matrix, only the expected loss that results from application of the opponent's strategy.
4. the learner's total loss is incremented by M(P_t,Q_t)
5. the learner calculates new weights with a simple multiplicative update: w_{t+1}(i) = w_t(i) * β^M(i,Q_t), where β is a given parameter.
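The steps above can be sketched as a short simulation loop (a sketch, not code from the project; the name mw_play and the column_strategy callback are illustrative):

```python
def mw_play(M, beta, T, column_strategy):
    """One run of the repeated-game protocol above.

    M is the payoff matrix (list of rows), beta the update parameter,
    column_strategy a function returning the environment's mixed
    strategy Q_t (a list of column probabilities) for round t.
    Returns the learner's total loss and final mixed strategy P.
    """
    n = len(M)            # number of rows
    w = [1.0] * n         # all weights start at 1
    total_loss = 0.0
    for t in range(T):
        weight_sum = sum(w)
        P = [wi / weight_sum for wi in w]        # step 1: P_t(i) = w_t(i)/weight_sum
        Q = column_strategy(t)                   # step 2: environment's Q_t
        # step 3: learner observes M(i, Q_t) for every row i
        row_loss = [sum(M[i][j] * Q[j] for j in range(len(Q))) for i in range(n)]
        # step 4: total loss grows by M(P_t, Q_t)
        total_loss += sum(P[i] * row_loss[i] for i in range(n))
        # step 5: multiplicative update
        w = [w[i] * beta ** row_loss[i] for i in range(n)]
    return total_loss, P
```

Against the uniform column strategy on the Rock, Paper, Scissors matrix of section II, every round costs exactly 1/2, so the total loss over T rounds is T/2 for any β.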
Freund and Schapire go on to show that the loss bounds established in [6] for the Weighted Majority algorithm transfer to this framework. By use of this algorithm, the learner's expected loss can be brought arbitrarily close to the best loss that would have been realized if the learner had known the environment's strategies Q_t for t=1..T.
The standard framework for on-line learning can be reduced to this variant of repeated game playing with the following extensions. Along with M we are now given an instance space X, a hypothesis space H and a target concept c. M is taken to have |H| rows and |X| columns. The environment plays by selecting an instance x_t from X. The row player randomly selects a row i of M according to P_t and predicts with h_i(x_t), incurring loss M(i,x_t). The weight update rule is as given above. Loss bounds for on-line learning can be calculated using this reduction and, not surprisingly, they are the same as those originally obtained for WM.
II) Results
The game matrix for the game "Rock, Paper, Scissors" expressed with entries in [0,1] is

    1/2    1     0
     0    1/2    1
     1     0    1/2

The minmax strategies for the row and column players are P* = Q* = [1/3, 1/3, 1/3] and the expected loss to the learner, that is the value of the game, is 1/2. In this form, the game provides no incentive for the row player to adopt a particular strategy. If the column player plays Q*, then MQ* is [1/2, 1/2, 1/2] and regardless of how the row player selects p1, p2 and p3, P^T MQ* is p1/2 + p2/2 + p3/2 = 1/2(p1+p2+p3) = 1/2.
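This identity is easy to check mechanically with exact rational arithmetic (a small sketch, not code from the report):

```python
from fractions import Fraction as F

# Rock, Paper, Scissors payoff matrix with entries in [0,1].
M = [[F(1, 2), F(1), F(0)],
     [F(0), F(1, 2), F(1)],
     [F(1), F(0), F(1, 2)]]
Q_star = [F(1, 3)] * 3

# MQ* has every entry equal to 1/2, so P^T MQ* = 1/2 for any mixed P.
MQ = [sum(M[i][j] * Q_star[j] for j in range(3)) for i in range(3)]
print(MQ == [F(1, 2)] * 3)  # True
```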
To obtain a game that motivates the row player towards a particular strategy, I used asymmetric game matrices of the form:

    1/2     1     2*x
     0     1/2     1
    1-x     0    1/2-2*x/3

For values of x in (0,1/2] the row player's minmax strategy and game value are shown below.
      x       p1        p2      p3    value*1000
     1/2       0       2/3     1/2       667
     1/4    11/30     2/15     1/2       567
     1/6     5/14     3/14     3/7       536
     1/8    58/165   41/165    2/5       524
    1/10    95/273   73/273   5/13       518
    1/12    47/136    19/68    3/8       514
    1/14    98/285   82/285   7/19       512
    1/16   260/759  223/759   4/11       510
    1/18   111/325   97/325   9/25       509
    1/20  415/1218  184/609   5/14       508
The minmax strategies and game value were calculated with the Maple "simplex" package. Simulations were run by having both players start with a uniform probability distribution and then iteratively update their weights. I had initially wanted to have the column player only play its maxmin strategy so that the effect of the learner's updates could be tracked more closely, but had difficulty relating the dual solution to the column player's probabilities.
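The same minmax computation can be set up outside Maple. The sketch below formulates the row player's LP with SciPy's linprog (an illustrative assumption — SciPy was not the tool used in this project):

```python
from scipy.optimize import linprog

def row_minmax(M):
    """Minmax mixed strategy for the loss-minimizing row player of M.

    Solves: minimize v subject to sum_i P(i)*M(i,j) <= v for every
    column j, sum_i P(i) = 1 and P(i) >= 0.  Variables are [P..., v].
    """
    n_rows, n_cols = len(M), len(M[0])
    c = [0.0] * n_rows + [1.0]                         # objective: minimize v
    A_ub = [[M[i][j] for i in range(n_rows)] + [-1.0]  # one constraint per column
            for j in range(n_cols)]
    b_ub = [0.0] * n_cols
    A_eq = [[1.0] * n_rows + [0.0]]                    # probabilities sum to 1
    b_eq = [1.0]
    bounds = [(0, 1)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return list(res.x[:n_rows]), res.x[-1]

# For Rock, Paper, Scissors this recovers P* = [1/3, 1/3, 1/3], v = 1/2.
P, v = row_minmax([[0.5, 1, 0], [0, 0.5, 1], [1, 0, 0.5]])
```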
Convergence of the observed loss to the game value was quick. The graph below shows the difference between the average observed loss and the game value of .567 as a function of the β parameter for the game with x=1/4. Results for other values of x were similar. The red and green circles correspond to 10 and 50 rounds of the game respectively.
Convergence of the probability distributions to the maxmin solutions was more of a problem. The relative entropy D(p||q), where p is the final mixture calculated by a player after n rounds of the game and q is the minmax solution calculated by linear programming, should also vary monotonically as a function of β. However, I was not able to confirm this with the results obtained.
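For reference, the divergence measure used here can be computed as (a small sketch, not code from the project):

```python
import math

def relative_entropy(p, q):
    """D(p||q) = sum_i p_i * log(p_i/q_i): the divergence of the learned
    mixture p from the linear-programming minmax solution q.  Terms with
    p_i = 0 contribute nothing, by the usual 0*log(0) = 0 convention."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

D(p||q) is zero exactly when the two mixtures coincide and grows as they diverge.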
III) Computing Mixture Coefficients for a Randomized Algorithm in Disk Spin-Down
This section investigates the use of randomized algorithms for the disk spin-down problem [2]. The following notation is used: i is the idle time, x is the selected time-out, s is the spin-down cost. The cost incurred by algorithm A if it selects a time-out of x when the observed idle time is i is:

    cost_A(x,i) = i      if i <= x
                  x+s    if i > x

A hypothetical algorithm that could look ahead and know i before choosing x could minimize its cost by choosing a time-out of 0 when i exceeds s and some x>i when i is less than or equal to s. Its cost would then be:

    opt(i) = i    if i <= s
             s    if i > s
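These two definitions translate directly into code (a minimal sketch; the function names follow the notation above):

```python
def cost(x, i, s):
    """Cost to an algorithm that picks time-out x when the idle time is i:
    it pays i if the idle period ends before the time-out expires;
    otherwise it waits for x and then pays the spin-down cost s on top."""
    return i if i <= x else x + s

def opt(i, s):
    """Cost of the omniscient algorithm that knows i before choosing x."""
    return i if i <= s else s
```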
The quality of an algorithm can be assessed by considering either the ratio cost_A(i)/opt(i) or the difference cost_A(i) - opt(i). For example, the algorithm that simply sets x=s will never incur a cost greater than twice that of the optimal algorithm. For values of i<s its cost will be the same as opt(i). For values of i>=s its cost will be exactly twice opt(i).

Various classes of algorithms for choosing x have been studied. One class of algorithms maintains a weighted collection of sub-algorithms, or "experts", each of which nominates a single time-out. The master algorithm focuses on updating the weights of the experts to reflect past performance. An alternative class, called randomized algorithms, chooses the time-out at random from some probability distribution.
Karlin et al. [1] considered the problem of how long a thread should spin waiting for a lock before choosing to block and incurring the cost of a context switch, a problem very similar to disk spin-down. They showed that no randomized algorithm can hope for a cost ratio smaller than e/(e-1) ~ 1.58 relative to the optimal algorithm. They also showed that a randomized algorithm that selects its time-out from the density f(x) = e^(x/s)/(s(e-1)) on [0,s] will meet this bound. A plot of this density for a spin-down cost of 1 is shown below.
An alternative randomized algorithm might pick its time-out by selecting one of several candidates according to some discrete probability distribution. Given the above result it is interesting to explore how well this alternative fares, in particular how the coefficients of the distribution should be chosen to minimize the worst-case cost ratio or cost difference.
To calculate cost_A(i) we must calculate the expectation over the algorithm's probability distribution for time-outs. Assume for simplicity that we are only using two time-outs, s and s/2. We will choose s/2 with probability p and s with probability 1-p; then

    cost_A(i) = p*cost(s/2,i) + (1-p)*cost(s,i)

For this case, a graph of the cost ratio, cost_A(i)/opt(i), as a function of x and i is shown below:
To obtain a more informative expression of cost_A(i) we can divide i over three ranges: [0, s/2), [s/2, s) and [s, ∞). The resulting expressions for the algorithm's cost, cost ratio and cost difference are given in the following table:

    idle value range:           [0, s/2)       [s/2, s)             [s, ∞)
    cost_A(i)                 p*i+(1-p)*i   p*(s/2+s)+(1-p)*i   p*(s/2+s)+(1-p)*(s+s)
    max_i(cost_A(i)/opt(i))        1            1+2p                (4-p)/2
    max_i(cost_A(i)-opt(i))        0            p*s                 (2*s-p*s)/2
From inspection of this table it is apparent that the maximum value of the ratio cost_A(i)/opt(i) grows with p over the range s/2 <= i < s and shrinks with p over the range i >= s. The value of p that minimizes the larger of these two constraints is given by the intersection of the two lines, 1+2*p = (4-p)/2. This occurs at p=2/5 and the value of the ratio at this point is 9/5. This implies that a randomized algorithm that chooses a time-out of s/2 with probability 2/5 and a time-out of s with probability 3/5 will be 1.8-competitive, already significantly better than the 2-competitive algorithm above.
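The 9/5 figure can be confirmed by a brute-force sweep over idle times (a self-contained sketch, not code from the project; the function names are illustrative):

```python
def cost(x, i, s):
    return i if i <= x else x + s

def opt(i, s):
    return i if i <= s else s

def worst_ratio(p, s=1.0, steps=20000):
    """Worst-case expected-cost ratio for the two-point mixture that
    plays time-out s/2 with probability p and s with probability 1-p."""
    worst = 0.0
    for k in range(1, steps + 1):
        i = 2.0 * s * k / steps        # idle times up to 2s cover all cases
        exp_cost = p * cost(s / 2, i, s) + (1 - p) * cost(s, i, s)
        worst = max(worst, exp_cost / opt(i, s))
    return worst

print(worst_ratio(0.4))  # ≈ 9/5
```

Sweeping p instead of fixing it at 2/5 shows the same tradeoff as the table: smaller p is penalized on idle times beyond s, larger p on idle times just above s/2.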
Applying the same approach to the difference cost_A(i) - opt(i) gives a value of p = 2/3 at the intersection p*s = (2*s-p*s)/2, so that the value of the difference will be 2*s/3. One disadvantage of using the cost difference to compare the two algorithms is that it is not possible to eliminate a dependence on the spin-down cost.
General expressions for both the cost ratio and cost difference can be obtained for multiple equally spaced time-outs. Construction of these expressions can be illustrated by expanding the expected cost ratio for three time-outs. For this case there will be four idle-time intervals to consider: [0,s/3), [s/3,2*s/3), [2*s/3,s), [s,∞). For idle times in the third of these intervals, the expected cost ratio over the randomized time-outs will be:

    ( p1*(s/3+s) + p2*(2*s/3+s) + p3*i ) / i

For a spin-down cost s <= 1 and i in [2*s/3,s), this ratio will be at a maximum when i takes the minimum point of the interval, i=2*s/3. Using p3 = 1-p1-p2 we can rewrite it as:

    p1*(s/3+s)/(2*s/3) + p2*(2*s/3+s)/(2*s/3) + 1 - p1 - p2

or

    1 + p1*(3+1-2)/2 + p2*(3+2-2)/2

or more generally as:

    S(j) = 1 + sum_{m=1..j} p_m*(k+m-j)/j

where k is the number of time-outs and j identifies the interval of idle-time values over which we are calculating the expectation. To solve for the mixture probabilities we can find the intersection of the above hyperplanes; for example, setting S(1)=S(2)=S(3) for the three time-out case gives p1 = 9/37, p2 = 12/37 and p3 = 16/37.
The same approach can be used to solve for the cost difference cost_A(i) - opt(i), or

    ( p1*(s/3+s) + p2*(2*s/3+s) + p3*i ) - i

This difference will be at a maximum when i takes the minimum point in the interval, i = 2*s/3. So we can write it as

    p1*(s/3+s) + p2*(2*s/3+s) - p1*2*s/3 - p2*2*s/3 + 1

or

    p1*(s+3*s-2*s)/3 + p2*(2*s+3*s-2*s)/3 + 1

or

    1 + p1*s*(3+1-2)/3 + p2*s*(3+2-2)/3

or more generally as:

    SD(j) = 1 + sum_{m=1..j} p_m*s*(k+m-j)/k

Once again we can obtain an optimal mixture by solving for the intersection of hyperplanes. For the three time-out case, setting SD(1)=SD(2)=SD(3) gives p1 = 9/16, p2 = 3/16 and p3 = 1/4.
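Both intersections, for the ratio and for the difference, are small linear systems and can be solved exactly with rational arithmetic. The sketch below is not code from the project (the function name and the ratio flag are illustrative); it uses the per-interval coefficient (k+m-j)/j for the ratio case and (k+m-j)/k for the difference case, following the expansions above:

```python
from fractions import Fraction as F

def equalizing_mixture(k, ratio=True):
    """Solve S(1) = ... = S(k) (ratio=True) or SD(1) = ... = SD(k)
    (ratio=False) together with p1 + ... + pk = 1.  Interval j puts
    coefficient (k+m-j)/j (ratio) or (k+m-j)/k (difference) on p_m;
    the spin-down cost s cancels out of the difference equalities."""
    def coeff(m, j):
        return F(k + m - j, j if ratio else k)
    A, b = [], []
    for j in range(1, k):                 # equation: S(j) - S(j+1) = 0
        row = [F(0)] * k
        for m in range(1, j + 1):
            row[m - 1] += coeff(m, j)
        for m in range(1, j + 2):
            row[m - 1] -= coeff(m, j + 1)
        A.append(row)
        b.append(F(0))
    A.append([F(1)] * k)                  # probabilities sum to 1
    b.append(F(1))
    # Gauss-Jordan elimination with exact rationals.
    for col in range(k):
        piv = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        inv = A[col][col]
        A[col] = [a / inv for a in A[col]]
        b[col] /= inv
        for r in range(k):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return b

print(equalizing_mixture(3))         # ratio case: 9/37, 12/37, 16/37
print(equalizing_mixture(3, False))  # difference case: 9/16, 3/16, 1/4
```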
A plot of the cost ratio obtained by using the optimal mixtures over equally spaced sets of 2, 3, 5, 10, 15 and 20 points in the interval [0,s] is shown below. Clearly there is little advantage to considering a greater number of points, as 20 points comes quite close to the theoretical limit.
IV) Additional Work.

As might be expected from a project of this scope, the work has raised more questions than it answered. With respect to the use of game matrices as data sets for exploring the properties of on-line algorithms, it would be interesting to understand the problems in convergence of the mixture distribution. Another topic, not addressed here, would be to extend this approach to the investigation of alternative boosting algorithms.

For the mixture calculation work, it would be interesting to compare the relative merits of the ratio-derived and difference-derived mixtures on actual data traces. It would also be interesting to explore alternatives to uniform spacing, such as "harmonic" spacing.
V) References:

[1] Karlin, A. R., Manasse, M. S., McGeoch, L. A., Owicki, S. 1990. "Competitive Randomized Algorithms for Non-Uniform Problems". In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pp. 301-309.

[2] Sherrod, B. 1997. "A Dynamic Disk Spin-Down Technique For Mobile Computing". Master of Science Thesis, University of California, Santa Cruz.

[3] Freund, Y., Schapire, R. E. 1996. "Game Theory, On-line Prediction and Boosting". In Proceedings of the 9th Annual Conference on Computational Learning Theory, pp. 325-332.

[4] Freund, Y., Schapire, R. E. 1999. "Adaptive Game Playing Using Multiplicative Weights". Games and Economic Behavior 29:79-103.

[5] Breiman, L. 1997. "Prediction Games and Arcing Algorithms". Technical Report 504, Statistics Department, University of California, Berkeley.

[6] Littlestone, N., Warmuth, M. K. 1994. "The Weighted Majority Algorithm". Information and Computation 108:212-261.