Learning and Implementation on the Internet


Eric J. Friedman
Department of Economics, Rutgers University
New Brunswick, NJ
Scott Shenker
Xerox PARC
Coyote Hill Road, Palo Alto, CA
August
Abstract
We address the problem of learning and implementation on the Internet. When agents play repeated games in distributed environments like the Internet, they have very limited a priori information about the other players and the payoff matrix, and the play can be highly asynchronous. Consequently, standard solution concepts like Nash equilibria, or even the serially undominated set, do not apply in such a setting. To construct more appropriate solution concepts, we first describe the essential properties that constitute "reasonable" learning behavior in distributed environments. We then study the convergence behavior of such algorithms; these results lead us to propose rather nontraditional solution concepts for this context. Finally, we discuss implementation of social choice functions with these solution concepts.

We would like to thank Roger Klein and Hervé Moulin for useful discussions, and seminar participants at Princeton, Rutgers, Stony Brook, and UBC for helpful comments. This research was supported by NSF Grant ANI. Email: friedman@econ.rutgers.edu, shenker@parc.xerox.com



Introduction

The Internet is rapidly becoming a centerpiece of the global telecommunications infrastructure, and someday it may well provide all of our telecommunication needs. In this paper we consider the Internet as an exercise in resource sharing, where the sharing occurs on several different levels. Most importantly, Internet users share access to the underlying transmission facilities themselves. With the "best-effort" nature of the Internet, where resources are not reserved and all packets are serviced on a first-come-first-serve basis, one user's usage can affect the quality of service seen by another user. In addition, the Internet provides a seamless way of accessing remote services, such as databases or web servers, which are themselves examples of shared resources where usage can induce congestion. For example, delays on the World-Wide Web have increased significantly in recent years (it is now sometimes waggishly referred to as the World-Wide Wait), and service providers such as America Online have faced lawsuits over their access delays. Both of these are cases where overuse has resulted in deteriorating service quality for all users.

In each case, aggressive applications (or users) get more than an equal share of these shared facilities, and so the Internet is likely to be a place where noncooperative game theory is particularly relevant. For instance, web browsers that open more TCP connections receive more bandwidth at the expense of less opportunistic users of the Internet. Similarly, users that modify their TCP implementation to be less responsive when congestion is detected can obtain much larger shares of the bandwidth (Demers et al.).

Footnote: This effect does not occur on telephone networks because the underlying transmission facilities are not shared on a packet-by-packet basis; bandwidth is reserved for each call, and so the quality of service perceived by a particular user is independent of the presence of other callers.

Footnote: TCP stands for Transmission Control Protocol, and this is the protocol that governs bandwidth usage in data transfers. In particular, TCP is designed so that flows slow down their rate of transmission when they detect congestion.

Footnote: In the Netscape Navigator browser the maximum number of TCP connections can be set by the user, so that this form of greediness is under user control.


For the Internet architecture to be viable in the long term, it must not be vulnerable to such greedy users, and thus it must be designed with incentives in mind. Network architects are increasingly addressing the incentive properties of their designs. For example, McCanne et al. discuss the incentive issues in packet dropping algorithms and their implications for layered multicast (see Bajaj et al. for a continuation of this line of investigation). Nagle was the first to explore the incentive issues inherent in packet scheduling in network routers, and this has been the focus of much subsequent research (see, for example, Sanders; Demers et al.; Shenker; Korilis and Lazar; Korilis et al.). Resnick et al. have proposed market-based solutions to the problem of network address allocation and route advertisements. Networks with multiple qualities of service raise interesting incentive issues, and this has promoted much of the recent interest in pricing and accounting for computer networks; a few examples include Cocchi et al., Clark et al., MacKie-Mason and Varian, Murphy and Murphy, and Mendelson and Whang.

For similar reasons, many theorists have begun applying game theory to the Internet (see, for example, Ferguson; Ferguson et al.; Gupta et al.; Hsiao and Lazar; and Korilis et al.). Most of these analyses assume that the appropriate solution concept (the set of asymptotic plays in a repeated game) is contained within the set of Nash equilibria. To the contrary, in this paper we argue that Nash equilibria are not necessarily achieved as a result of learning in the Internet setting and that, in fact, distributed settings like the Internet require a dramatically different solution concept.

Because of the Internet's increasing role in the telecommunications infrastructure, it is important that we achieve socially desirable allocations of service in the Internet. This will require understanding the nature of learning and convergence in the Internet and other distributed settings, so that we can identify the appropriate solution concept. Learning and convergence, and their implications for mechanism design in the Internet, are the subject of this paper.
For a concrete example of the incentive issues with which we are concerned, consider the scenario (which is more fully described in Shenker) where several Internet users are simultaneously sending data across a particular link. The delay experienced by the packets is a function of the load (the bandwidth consumed by various users) on the link. Each user's utility function $U_i$ depends on her average bandwidth (transmission rate) $r_i$ and on the average queuing (or congestion) experienced by her packets, $c_i$. Users control their bandwidth usage $r_i$, and the network determines the vector of average queuing $c$ as a function of the set of bandwidths, i.e., $c = C(r)$, where the function $C$ reflects the particular packet scheduling algorithm used by the network and must obey the sum rule that $\sum_i C_i(r) = f(\sum_i r_i)$ for some constraint function $f$ (because the overall average queue length is independent of the order in which packets are served). This congestion game, where each player's usage can impose delay on other players, can be modeled as a normal form game with the bandwidths $r_i$ being the actions and the payoffs given by $U_i(r_i, C_i(r))$. The equilibria, or more generally the solution concept, of this congestion game will determine the allocation of network bandwidth among these users. Since network designers can choose the scheduling algorithm $C$ in order to attain some socially desirable outcome, the solution concept of this congestion game has significant practical ramifications.
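To make the congestion game concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the model analyzed in the paper: the constraint function f(x) = x / (1 - x) (the average queue of an M/M/1 server with unit capacity), the proportional FIFO split of that queue, and the linear utility with its `delay_cost` parameter are all hypothetical choices made only so that the sum rule can be checked numerically.

```python
from typing import List


def total_queue(load: float) -> float:
    """Assumed constraint function f: average queue of an M/M/1 server with unit capacity."""
    if not 0.0 <= load < 1.0:
        raise ValueError("total load must lie in [0, 1)")
    return load / (1.0 - load)


def fifo_congestion(rates: List[float]) -> List[float]:
    """Assumed allocation C: under FIFO each user's queue is proportional to her rate."""
    load = sum(rates)
    if load == 0.0:
        return [0.0 for _ in rates]
    total = total_queue(load)
    return [total * r / load for r in rates]


def utility(rate: float, congestion: float, delay_cost: float = 0.5) -> float:
    """Hypothetical utility U_i(r_i, c_i): bandwidth is good, queuing is bad."""
    return rate - delay_cost * congestion


rates = [0.1, 0.2, 0.3]
congestion = fifo_congestion(rates)
payoffs = [utility(r, c) for r, c in zip(rates, congestion)]

# Sum rule: the per-user congestion must add up to f(total load).
sum_rule_gap = abs(sum(congestion) - total_queue(sum(rates)))

# Externality: when user 3 raises her rate, user 1's congestion rises too.
congestion_after_greed = fifo_congestion([0.1, 0.2, 0.5])
```

Under these assumptions each user's payoff depends on every other user's rate only through the shared queue, which is the sole channel of interaction the text describes.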
This congestion game also arises in many other settings. For instance, $r_i$ could be the usage level of a shared database (such as a video or text library) or web server, with $c_i$ being the processing delay, or $r_i$ could be the average time connected to an online service, with $c_i$ being the expected time required to connect (see Friedman for a discussion of this and other games arising on the Internet). These examples suggest that there are many game-like situations arising in distributed systems like the Internet. We call them distributed systems because the users are geographically dispersed and are accessing the resource through the network. The games in these distributed systems share the feature that the agents interact only through their joint use of a shared resource; for instance, the only form of interaction between users in the congestion game is that their packets happen to collide somewhere inside the network. Thus, it is quite likely that the agents have little or no information about each other. Moreover, the users probably know very little about the detailed nature (i.e., capacity, latency, etc.) of the resource itself. To use the congestion game again as a specific example, users have little knowledge of the underlying network topology and characteristics, so they cannot always distinguish between delays due to the characteristics of the underlying network (e.g., speed-of-light delays in transmission links) and delays due to the behavior of other users (queuing delays in routers).
In this paper we ask two questions. What is the appropriate solution concept for the congestion game and other games that arise in distributed settings? Given this solution concept, can we design scheduling or sharing algorithms to achieve the allocations we desire?

If the congestion game were a canonical one-shot game with common knowledge, then one could invoke standard solution concepts such as Nash equilibria or the rationalizable set. However, the congestion game is neither a one-shot game nor one with common knowledge. Many data transmissions persist for a significant period of time, and the users are able to adjust their bandwidth at any point while transmitting. Thus, the congestion game should be modeled as a repeated game rather than a one-shot game. Moreover, because users are geographically distributed and have no direct contact with, or knowledge of, each other, solution concepts based on common knowledge are not applicable here. We instead must look at the process of learning through repeated play. Traditional approaches to learning through repeated play, which we discuss more fully later, typically assume the players use their experience to build a model of the likely actions of other players and then play some form of best response: either exact best response, as in the original fictitious play approach (Robinson), or a stochastic best response, as in Fudenberg and Levine. Bayesian learning, as in Kalai and Lehrer, is a particular example of this approach, whereby agents begin the game with priors about the expected play of individuals and then update those beliefs as they observe the play. Many of the analyses of such learning algorithms suggest that they result in either Nash or correlated equilibria (see, e.g., Kalai and Lehrer; Fudenberg and Levine; Foster and Vohra).
These results, while important to understanding the rational foundations of equilibria, do not apply in distributed settings due to the factors we discussed above. In terms of the underlying game, users know their own action space and can observe (after some delay) the payoff resulting from a particular action at a particular time, but do not know their own payoff function nor any other player's payoff function, and cannot observe the actions of other players. Given this very limited information, users have no sense of what other players are doing, nor any idea of what would constitute a best reply if they did, and so users cannot adopt a fictitious play approach. Instead, we analyze the case in which users (or the software on the machines they are using) employ simple learning algorithms that experiment with various actions and then focus their play on the actions providing the highest payoffs. This is similar in spirit to the stimulus-response approaches studied in Roth and Erev, Borgers and Sarin, and Erev and Roth. Often the work on such learning approaches concentrates on matching the results of the learning algorithm to experimental data. Our focus here is quite different and has three distinct components.

First, we want to understand the nature of learning in settings like the Internet, where players are geographically distributed and have little or no information about each other and the underlying game. In the next section we discuss some of the relevant considerations arising in the Internet and other distributed settings. We then present criteria that all reasonable learning algorithms in this setting must satisfy. The key components are optimization, monotonicity, and responsiveness.

Second, in a later section, we address the asymptotic result of play among a set of reasonable learners. In a previous paper (Friedman and Shenker) we analyzed one particular family of learning algorithms with these properties. Here we attempt to identify the class of all learning algorithms that can be considered reasonable, and then study the union of asymptotic plays for all populations of reasonable learners. In other words, if all we know is that agents are reasonable, what predictions can we make about their asymptotic play? We find that the asymptotic play always resides in the serially unoverwhelmed set (defined later). We are not able to show, and in fact do not believe, that reasonable learning algorithms actually visit (with significantly large probability) all points in the serially unoverwhelmed set. We discuss this in detail later.

Third, we discuss the implications of these convergence results for mechanism design and explore which social choice functions can be implemented in these distributed settings. We find that social choice functions implementable in this decentralized setting must be strictly strategyproof (any deviation that leads to a different outcome results in lower utility for the deviator) and Maskin monotonic. Moreover, any social choice function implementable with the serially unoverwhelmed solution concept (which, as stated above, is a superset of the true solution concept) must be strictly coalitionally strategyproof (defined later). We then present examples of some implementable social choice functions.
Learning in Distributed Systems
In this section we first informally discuss the nature of learning algorithms appropriate for the Internet. We then formalize these notions of what makes a reasonable learning algorithm into precise definitions and provide some examples. It is important to emphasize that we are not claiming that these algorithms are justified by being truly rational or provably optimal in any precise sense. We are merely trying to model the kinds of adaptive learning procedures that either are currently, or could potentially be, used on the Internet.
Learning in the Internet

The game-theoretic properties of the Internet are common to many other distributed settings, but for concreteness in the paragraphs below we focus solely on the Internet context. There are four main aspects of the Internet that are particularly relevant to our game theoretic formulation.

First, as discussed above, players typically have extremely limited information. They do not know who the other players are, or even how many there are, and they do not observe other players' actions. In addition, because they are not aware of the underlying network topology or characteristics, players typically do not know the payoff functions; that is, they do not know how their payoffs depend on the actions of other players, or even on their own actions. The only information available to users is their own actions and the resulting payoffs, and they may only learn the payoffs after some delay. This lack of information is actually a central design principle of the Internet. The architectural notion of layering (see Tanenbaum for a textbook discussion of layering of network protocols) is intended to allow computers to utilize the network without knowledge of the underlying physical infrastructure, and to allow applications (such as email or file transfer) to operate without detailed knowledge of the current level of network congestion.
Second, players do not carry out any sophisticated optimization procedures. Often the actual decisions about resource utilization are made by computer programs (either the application or lower level protocols like TCP) without direct human intervention. Thus, the learning algorithm must be embedded in software, and that limits the flexibility and complexity of the optimization procedure. Moreover, such learning algorithms are intended to be portable (i.e., usable on any machine located anywhere) and so are expressly designed to not rely on the details of the specific context. In particular, a Bayesian approach based on updating priors is not realistic here, since the layering of network protocols ensures that any priors would be quite ill-informed. Even in cases where the resource decisions are made directly by the human user, it seems unlikely that the user will be making complex optimization decisions given the very meager information available. Typically the user's actions in such cases are limited to adjusting parameter settings for underlying programs (such as adjusting the number of TCP connections a browser opens) rather than actually exercising detailed control.
Third, there is no synchronization and no natural unit of time on the Internet. Players do not all update their actions at the same time, as they do in the standard repeated game literature. To the contrary, the rate at which the updating occurs can vary by several orders of magnitude. Note that there is a delay between when agents update their action and the time they notice a change in their payoff; for the congestion game described in the Introduction, this delay is typically on the order of a round-trip time (the time it takes a packet to get to its destination and the acknowledgment to make the return trip). These round-trip delays vary from many microseconds (if the destination is on the same Ethernet; the delay is due to operating system overhead) to many milliseconds (if the destination is across the country; the delay is the speed-of-light delay of propagation). Standard control theoretic results suggest that control loops should not update faster than the round-trip time. Since updating rates are tied to round-trip times, the variation in update rates will be quite large. Moreover, some learning agents will be people, not programs, and their updating rates are most likely on the order of at least seconds, if not significantly slower. Thus, the standard model of a repeated game, in which players are synchronized, can be misleading in the Internet context.

Footnote: For example, TCP does not know the content of the data it is conveying, nor does it know anything about the network over which the data is flowing. It merely waits for signs of congestion and responds appropriately.

Footnote: In TCP, the receipt of each data packet is acknowledged by an ACK packet sent from receiver to sender.
Fourth, and finally, it is neither the long term nor the short term but the medium term (as defined by Roth and Erev) that is relevant. Players typically use the system for many time units (measured in their appropriate timescale); however, the nature of their payoff function changes fairly often as new players enter the system, or as the system configuration changes (often due to equipment failures, for which the network automatically compensates). The important point here is that players do not know directly that the payoff function has changed. They can only observe the payoffs they get, and so cannot distinguish between when another player changes her action and when the environment itself changes. This requires the learning algorithm to always be responsive, which we define more formally below.

These four properties characterize what we call distributed systems. The natural question, then, is what forms of learning algorithms are appropriate in distributed systems. We claim that in such settings there are three primary requirements that one would expect of any reasonable learning algorithm. One requirement is that against a fixed payoff function, when there are no other players (just nature), the player learns to achieve the optimal payoff. This seems to be the most basic requirement of an optimizing learning algorithm, and it would be hard to justify any algorithm that did not satisfy this criterion. Another reasonable requirement is that the learning algorithm be monotonic in the payoffs; that is, if we modify the payoff function by raising the payoff for a certain action, the probability of the agent playing that action should not decrease. This is similar to the Law of the Effect, which is well known in the psychology literature and is discussed by Roth and Erev as a fundamental property in experimental learning. Finally, many of the learning algorithms in the literature decrease the rate at which they respond with time; in settings like the Internet, where the payoff function changes frequently as players come and go, agents must always be prepared to respond to a new situation in a bounded amount of time. Thus, there are three informal components of being a reasonable learner: optimization, monotonicity, and responsiveness. We now proceed to make these concepts precise, but first we must describe our basic model.

Footnote: Lagunoff and Matsui have also made a similar point about the role of asynchrony in the set of sequential equilibria for repeated games.
Model

In this section we describe a simple model to capture the key elements of a distributed setting such as the Internet. Consider a game with a set of $P$ players, where each player $i$ has a finite action set $A_i$. The payoffs of the game are described by a time-dependent and possibly stochastic function $G(a, t)$ of the action profile $a \in A = A_1 \times \cdots \times A_P$ and the time $t$, where for convenience and to simplify notation we have restricted payoffs to a fixed bounded interval. The game is played in continuous time: $a_i(t) \in A_i$ denotes player $i$'s action at time $t$, and $G_i(a(t), t)$ denotes her instantaneous payoff flow at time $t$. A stable game is one in which $G(a, t) = G(a, t')$ for all $t, t'$, i.e., there is no time dependence. For stable games we will drop the last argument from the notation and just write $G_i(a(t))$. Later we will refer to games that are stable after time $t_0$, which means that $G(a, t) = G(a, t')$ for all $t, t' \ge t_0$.
While the payoffs arise from the game structure, each individual player is completely unaware of the presence of other players and of the payoff function $G$. Thus, from the perspective of an individual player, we need only model the fact that they receive some payoff flow $\pi_i(t)$. This payoff flow $\pi_i(t)$ can depend explicitly on time (perhaps in a stochastic manner) and on all the player's previous actions.

Footnote: To guarantee that integrals are well defined, we assume that on any finite time interval $G(a, t)$ is continuous in $t$ except at perhaps a finite number of places.
Preferences over different payoff flows can be extremely complex. Here we restrict our attention to a simple case by assuming that players have a fixed sampling rate, evaluating average payoffs at discrete and deterministic epochs. In our model, a player has discrete time horizons $t_i^n$ at which she evaluates her payoff as some (possibly weighted) average of her flow payoff, and then at the end of the epoch can decide to alter her action. We let $a_i(n)$ be player $i$'s action chosen at $t_i^n$, which is then maintained until $t_i^{n+1}$. Note that there is no synchronization in the system, so the time horizons are different for each player $i$; we can have, and generally do have, $t_i^n \ne t_j^n$ for $i \ne j$. We say that play is synchronous if $t_i^n = t_j^n$ for all $n$ and all $i, j$.
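The lack of synchronization can be illustrated with a short Python sketch. The three player names and their sampling periods below are hypothetical values chosen only to span several orders of magnitude, in the spirit of the round-trip-time discussion earlier; times are kept as integer microseconds so that the epoch counts are exact.

```python
from typing import Dict, List


def decision_epochs(period_us: int, horizon_us: int) -> List[int]:
    """Deterministic decision times t^1, t^2, ... (in microseconds) up to the horizon."""
    return list(range(period_us, horizon_us + 1, period_us))


def is_synchronous(schedule: Dict[str, List[int]]) -> bool:
    """Play is synchronous when the n-th epoch coincides for every player and every n."""
    epoch_lists = list(schedule.values())
    return all(epochs == epoch_lists[0] for epochs in epoch_lists[1:])


# Hypothetical sampling periods: a LAN flow, a cross-country flow, and a human user.
periods_us = {"lan_flow": 500, "wan_flow": 100_000, "human": 5_000_000}
horizon_us = 10_000_000  # ten seconds

schedule = {name: decision_epochs(p, horizon_us) for name, p in periods_us.items()}
updates = {name: len(epochs) for name, epochs in schedule.items()}

# Two players with identical periods would, by contrast, be synchronous.
lockstep = {"a": decision_epochs(1000, 10_000), "b": decision_epochs(1000, 10_000)}
```

Over the same ten seconds the fastest hypothetical player revises her action thousands of times while the slowest does so twice, so no common notion of a "round" of the repeated game exists.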
Define

$$\pi_i(n) = \int_{t_i^n}^{t_i^{n+1}} G_i(a(t), t)\, d\mu_i\!\left(\frac{t - t_i^n}{t_i^{n+1} - t_i^n}\right)$$

where $\mu_i$ is some continuous nondecreasing cumulative distribution function with $\mu_i(0) = 0$ and $\mu_i(1) = 1$. Thus $\pi_i(n)$ is a weighted average of $G_i(a(t), t)$ over the time period $[t_i^n, t_i^{n+1}]$.

Let $h_i^A(n) = (a_i(1), \ldots, a_i(n))$, $h_i^\pi(n) = (\pi_i(1), \ldots, \pi_i(n))$, and let $h_i(n) = (h_i^A(n), h_i^\pi(n))$ be player $i$'s history up to period $n$, and let $H_i(n)$ be the set of all possible histories for player $i$. The payoff $\pi_i(n)$ is a function of the time $n$, the current action $a_i(n)$, and the history $h_i(n)$, and may also be stochastic. For the remainder of this section we will write $\pi_i(a_i(n), n)$. In this formulation, the other players are modeled as part of the environment; the fact that their behavior is affected by agent $i$'s history of play is incorporated in $\pi_i$'s dependency on $h_i$.

Footnote: Thus, we are not considering anything as complex as the equilibria of repeated games in continuous time (see, for example, Stinchcombe for a discussion), but are only attempting to analyze the behavior of fairly simple learners.

Footnote: Note that the decision points are often determined by the technology and are typically not treated as strategic variables. Nonetheless, we believe that most of our results are still valid for learners that strategically manipulate their decision points, given noisy payoffs and delays in observation. In particular, the ability to manipulate decision points should not decrease the set of outcomes that arise.
Agent $i$ uses a learning algorithm to choose $a_i(n)$. Since in this setting agents cannot observe the actions of other agents, their choice of $a_i(n)$ can only depend on the history of agent $i$'s own play and own payoffs, $h_i(n)$. With such little a priori information about the game, players must experiment with various actions in order to learn about the resulting payoffs. Such experimentation is often best done with randomized algorithms. While randomization is often extremely useful, it can be unlucky, and so we must allow for occasional "mistakes" (suboptimal behavior). We will consider learning to be sufficiently optimal if it is almost optimal almost all of the time. This type of learning is known as PAC learning (probably approximately correct learning) and can be extremely powerful. See, for example, Valiant or Blumer et al.

Given a payoff function $\pi_i$ and history $h_i(n)$, we must be able to compare the value of different actions. One method, which we choose for its simplicity, is to compare the means of the random variable $\pi_i(a_i, n)$. Formally, we will write $\pi_i(a_i, n) \ge \pi_i(b_i, n)$ to mean that $E[\pi_i(a_i, n)] \ge E[\pi_i(b_i, n)]$.

In the remainder of this section we will consider a single player and thus will drop the subscript $i$, which will be implicit. Let $E^N$ be an environment defined over $N$ periods, i.e., a payoff function defined for periods $n \le N$.
Reasonable Learning Algorithms

As we discussed above, the three requirements of a reasonable learner are optimization, monotonicity, and responsiveness. These informal concepts can be made more precise with the help of the following definitions.

The requirement of optimization is simply the notion that in an environment with a single action that is better (provides higher payoffs) than any other, the learning algorithm should eventually learn to almost always take this optimal action. Certainly one cannot imagine reasonable learning algorithms doing otherwise.

Definition. An environment $E^N$ is simple with optimal action $a^* \in A$ if for all $n \le N$,

$$\pi(a^*, n) > \pi(a, n) \mid h(n)$$

for all $a \in A$ such that $a \ne a^*$ and for all $h(n) \in H(n)$.

A reasonable learner should be able to learn the optimal action in such games if $N$ is sufficiently large. A learning algorithm, or learner, $L$ is a mapping from histories $h(n)$ to probability distributions over actions in $A$. Given an environment $E^N$, this induces a probability distribution over the set of all histories $H(n)$, which we will denote $\mu_{L, E^N}$.

Definition (Optimization). A player is a simple learner if, for any $E^N$ which is simple with optimal action $a^* \in A$ such that $N > N_0$, and any $m$ such that $N_0 < m \le N$, there exists a subset $H'(m) \subseteq H(m)$ such that $\mu_{L, E^N}(H'(m)) > 1 - \epsilon$ and, for all $h(m) \in H'(m)$, $\Pr[a(m) = a^* \mid h(m)] > 1 - \epsilon$.

Simple learners can find the optimal action in simple games, in the sense of playing the optimal action with high probability for most histories, where "most" is defined by the probability distribution induced by the learner. Note that the probabilistic formulation of the above definition, with the allowance of occasional "mistakes," is necessary since a randomized learning algorithm can be unlucky.
Now we attempt to capture the more general idea of responsiveness, or medium term learning. Let $H_x(m)$ denote the set of all histories $h$ on $[x, m]$.

Definition (Responsiveness). A learner is responsive if, given any environment $E^N$ and any $N_0 < m \le N$ such that $E^N$ restricted to $[m - N_0, m]$ is simple with optimal action $a^*$, there exists a subset $H'_{m-N_0}(m) \subseteq H_{m-N_0}(m)$ such that $\mu_{L, E^N|_{[m-N_0, m]}}(H'_{m-N_0}(m)) > 1 - \epsilon$ and, for all $h(m) \in H'_{m-N_0}(m)$, $\Pr[a(m) = a^* \mid h(m)] > 1 - \epsilon$.

Being responsive requires that the learner respond to changes in the environment within a bounded time $N_0$; that is, in any period of length $N_0$ during which the environment has been simple, the learning algorithm must converge (in a probabilistic sense) to the optimal action.
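As an informal illustration of responding within a bounded time, the following Python sketch shows one hypothetical learner that only sees its own actions and payoffs. It is not an algorithm from the paper and is not claimed to satisfy the formal definitions; the class name, the constant step size, the exploration rate, and the payoff values 0.9 and 0.1 are all assumptions made for the example. The constant step size makes old observations decay geometrically, so the learner's memory is effectively bounded.

```python
import random
from typing import List


class RecencyWeightedLearner:
    """Hypothetical learner: epsilon-greedy over recency-weighted payoff estimates."""

    def __init__(self, n_actions: int, explore: float = 0.1,
                 step: float = 0.2, seed: int = 0) -> None:
        self.estimates = [0.0] * n_actions
        self.explore = explore
        self.step = step
        self.rng = random.Random(seed)

    def choose(self) -> int:
        # Keep experimenting forever so that a changed environment is noticed.
        if self.rng.random() < self.explore:
            return self.rng.randrange(len(self.estimates))
        return max(range(len(self.estimates)), key=self.estimates.__getitem__)

    def observe(self, action: int, payoff: float) -> None:
        # Constant step size: past payoffs are discounted geometrically.
        self.estimates[action] += self.step * (payoff - self.estimates[action])


def run(learner: RecencyWeightedLearner, optimal_by_period: List[int]) -> List[bool]:
    """Play against nature; the optimal action pays 0.9 and every other action 0.1."""
    hits = []
    for optimal in optimal_by_period:
        action = learner.choose()
        learner.observe(action, 0.9 if action == optimal else 0.1)
        hits.append(action == optimal)
    return hits


# The environment is simple on each half, but the optimal action changes at period 500.
schedule = [0] * 500 + [2] * 500
hits = run(RecencyWeightedLearner(n_actions=3), schedule)
frac_before = sum(hits[400:500]) / 100
frac_after = sum(hits[900:1000]) / 100
```

In this sketch the learner is never told that the payoff function changed; it recovers only because exploration and bounded memory eventually expose the new best action.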
Note that responsiveness is strictly stronger than being a simple learner. For example, consider the following quasi-static environment, in which every $\tau$ periods the optimal action may change, but in between changes the environment is simple. Let $I(n)$ be the indicator variable which is 1 when the agent chooses the optimal action in time period $n$ and 0 otherwise. We consider the case where these stable intervals can vary, and so we can let $\tau$ be a random variable with mean $\bar{\tau}$.

Theorem. In the quasi-static environment,

$$\lim \, \lim_{m \to \infty} \frac{1}{m} \sum_{t=1}^{m} I(t) = 1$$

almost surely for any responsive learner.

Proof. Fix $r > 1$ and consider a stable period during which the environment is simple. With some probability that depends on $r$, the period is longer than $rN_0$. For such a period, responsiveness bounds from below the expected value of $I$ in every time step after the first $N_0$, so the expected fraction of optimal play over the period is bounded from below as well. Note that this bound is independent of all previous periods, so the corresponding lower bound on $\lim_{m \to \infty} \frac{1}{m} \sum_{t=1}^{m} I(t)$ holds almost surely, and taking the limit as $r \to \infty$ completes the proof.

Footnote: Most adaptive learning algorithms in the literature (Fudenberg and Levine; Erev and Roth; Borgers and Sarin) are not adaptive, because as time goes on they become less reactive to changes in their environment. In theory, Bayesian-type learners (Kalai and Lehrer; Foster and Vohra) could satisfy responsiveness by including the possibility of switching in the priors. Because the space of all possible environmental changes is huge and players are ill-informed about their probabilities, this would result in an algorithm that is extremely difficult to implement and completely impractical.

Note that nonresponsive learners do not satisfy this theorem. For example, no-regret learners, such as those in Foster and Vohra, do quite badly: $\lim_{m \to \infty} \frac{1}{m} \sum_{t=1}^{m} I(t)$ can be on the order of $1 / |A|$.
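The contrast between responsive and nonresponsive behavior in a quasi-static environment can be explored with a small simulation. This is a hedged sketch, not the construction used in the proof and not a no-regret algorithm: the two-action environment, the flip probability `1 / mean_stable`, the payoffs 0.9 and 0.1, and both learners (a constant-step learner, and a sample-average learner whose exploration decays as 1/n) are hypothetical choices. The quantity returned is the empirical average of the indicator I(t).

```python
import random
from typing import Callable


def simulate(step_rule: Callable[[int], float],
             explore_rule: Callable[[int], float],
             periods: int = 20000,
             mean_stable: int = 200,
             seed: int = 1) -> float:
    """Return the fraction of periods in which the learner plays the optimal action."""
    rng = random.Random(seed)
    estimates = [0.0, 0.0]
    counts = [0, 0]
    optimal = 0
    hits = 0
    for n in range(1, periods + 1):
        # Quasi-static environment: the optimal action flips with probability 1/mean_stable.
        if rng.random() < 1.0 / mean_stable:
            optimal = 1 - optimal
        if rng.random() < explore_rule(n):
            action = rng.randrange(2)
        else:
            action = 0 if estimates[0] >= estimates[1] else 1
        payoff = 0.9 if action == optimal else 0.1
        counts[action] += 1
        # step_rule receives how many times this action has been played so far.
        estimates[action] += step_rule(counts[action]) * (payoff - estimates[action])
        hits += action == optimal
    return hits / periods


# Constant step and constant exploration: reactions stay fast forever.
responsive = simulate(step_rule=lambda k: 0.2, explore_rule=lambda n: 0.1)

# Sample averages and vanishing exploration: reactions slow down over time.
decaying = simulate(step_rule=lambda k: 1.0 / k, explore_rule=lambda n: 1.0 / n)
```

One would expect the constant-step learner to track the changing optimum far more often than the decaying learner, which tends to lock onto one action; the exact fractions depend on the seed and on the assumed parameters.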
Our next deitions formalize a notion of monotonicit y o r the a w o f the Ect
horndyk e First w e d ee what it means for one h istory to b e b e tter with resp ect
to an action

Deition Given two histories h n and h n we say t hat h n is higher r esp e ct to
A A A

action a A if h n h n dn h n h n whenever h n a and
m m m
A

h n h n whenever h n a
m m m

Definition (Monotonicity). A learner is monotonic if, for any pair of histories $h(n)$, $h'(n)$ such that $h(n)$ is higher with respect to $a \in A$ than $h'(n)$,
\[ \mathrm{Prob}[a(n) = a \mid h(n)] \ge \mathrm{Prob}[a(n) = a \mid h'(n)]. \]
Combining these definitions, we can now precisely define what we consider to be a reasonable learning algorithm in distributed settings like the Internet.
Definition. A learner is a reasonable learner if it is monotonic and responsive.
Note that monotonicity allows us to make statements about environments that are not simple. For example, in an environment there may be several actions, any one of which may be optimal depending on exogenous effects, but there may also be actions that are clearly suboptimal. In this case we can show that such clearly suboptimal actions will be played rarely by a reasonable learner.

This is demonstrated numerically in Greenwald, Friedman and Shenker.


Theorem Consider an envir onment E N ssume A that ther e is an action a A and a


set of actions A A such that al l ctions a i n A ar e always worse han t a a n



a n h n for al l a A If a player is a r e asonable le arner with N then


for any m with N m N ther e exists a subset H m H m such that H m
L E N

a nd for al l h m H m Pr a m A j h m

Pro of Consider the en vironmen tin iwch h as h the s ame pa y o as E N when either he t


action a a or a A but has zero a p y o for an y other action This n e vironmen tis imple


with optimal action a nd th us Pr a m A j h m b y heorem T H o w ev er for all


a A this en vironmen t is higher than E N Th us in E N the p robabilit y of pla

a A can not b e larger han t this
Examples
Each of the three notions (optimizing, monotonicity, and responsiveness) that comprise our definition of reasonableness seems on the surface to be a quite natural and undemanding requirement. Surprisingly, few formal learning algorithms in the economics literature satisfy this definition of reasonableness. Many of the learning algorithms in the standard literature do not have the responsive property; typically their responsiveness to changes in payoffs, or their level of experimentation, diminishes over time. We also note that there are no deterministic algorithms which are responsive.
We now present two examples of reasonable learning algorithms.
Stage Learners
The first is a stage learner, which is a very simple reasonable learner. The stage learner SL learns in stages of fixed length. During each stage, the action that had the highest


With suitable choices of parameters, Roth and Erev's model of learning is reasonable.

A slight variant of this statement is proven in Fudenberg and Levine.




average payoff in the previous stage (with ties broken randomly) is played with high probability, while the remaining actions are each played with the same small probability. The choice of action in any time period is i.i.d. Note that the stage learner almost always plays the action with the highest expected value, based on the payoffs observed in the last stage, but experiments with sufficient frequency to notice changes in the environment and react to them.
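The stage learner described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `StageLearner`, `stage_length`, `explore_prob`, and `fraction_optimal` are hypothetical, and the exploration scheme (the favored action is played with probability `1 - explore_prob`, otherwise one of the remaining actions is drawn uniformly) is only one plausible reading of the description.

```python
import random


class StageLearner:
    """Illustrative stage learner; all parameter names are assumptions."""

    def __init__(self, actions, stage_length, explore_prob, rng=None):
        self.actions = list(actions)
        self.stage_length = stage_length    # periods per stage (assumed name)
        self.explore_prob = explore_prob    # total exploration mass (assumed)
        self.rng = rng or random.Random()
        self.favorite = self.rng.choice(self.actions)
        self._reset_stage()

    def _reset_stage(self):
        self.totals = {a: 0.0 for a in self.actions}
        self.counts = {a: 0 for a in self.actions}
        self.steps = 0

    def choose(self):
        # Independent draw each period: explore with probability explore_prob.
        others = [a for a in self.actions if a != self.favorite]
        if others and self.rng.random() < self.explore_prob:
            return self.rng.choice(others)
        return self.favorite

    def observe(self, action, payoff):
        self.totals[action] += payoff
        self.counts[action] += 1
        self.steps += 1
        if self.steps == self.stage_length:
            self._end_stage()

    def _end_stage(self):
        # Average payoff per action over the stage; unplayed actions are skipped.
        averages = {a: self.totals[a] / self.counts[a]
                    for a in self.actions if self.counts[a] > 0}
        best = max(averages.values())
        tied = [a for a, v in averages.items() if v == best]
        self.favorite = self.rng.choice(tied)   # ties broken randomly
        self._reset_stage()


def fraction_optimal(learner, payoff_of, optimal, periods):
    """Fraction of periods in which the learner plays `optimal`."""
    hits = 0
    for _ in range(periods):
        action = learner.choose()
        learner.observe(action, payoff_of(action))
        hits += (action == optimal)
    return hits / periods
```

In a simple environment with fixed payoffs, `fraction_optimal` plays the role of the time average of the indicator $I(t)$: it stays bounded away from 1 by roughly the exploration probability, which is the price the learner pays for remaining responsive.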
q
p

Theorem F or suiently smal l SL is an j A j e j A j

r e asonable le arner

Pro of Assume t hat d uring a particular p erio d of length the en vironmen tis imple
p
with optimal action a and Then the stage l earner will ha v e faced a imple en vi


ronmen t during ts i previous stage Dee a n a n E a n h n

Note that not restricting o do es not ct a the stage learner In this en vironmen t
p


E a n h n and E a n for a ll a a oteN that Var a n

for all a A ince s

Dee a tage s to b e ormal if eac h action has b een pla y ed at least j A j times

The exp e cted n um ber of pla ys for an y p articular action is greater than j A j while the
q


standard deviation of the n b er of t imes it is pla y ed i s less than j A j h us from

the cen tral limit theorem the probabilit y of a n a ction not b eing pla y ed at least j A j
q q


times is less than er f j A j whic h s i b ounded b y exp j A j so the probabilit y
of a stage b eing normal is greater than
j A j
q q
X
j A j
j A j j
exp j A j exp j j A j
j j A j j
j
When j A j is o d d w e c an rearrange t erms to g et

j A j
q q q
X
j A j j A j j
A j exp j j A j exp j j A j j A j
j j A j j j
i
q
A j exp j j A j




exp


um




xp since the terms in he t sum are ll a p sitiv o e for suien tly small When j A j is ev en w e use
q q
j A j j A j
the same argumen t after noting that j A j exp j A j

Dee a o beteh a v erage a p y o for ction a a A o v er a n ormal learning stage
q


The standard deviation of a i s less than j A j while the a v erage i s if a
p

optimal action and less than if a is not ptimal o T h us the probabilit y o f t he optimal
q
p

action ha ving a v erage less than s i l ess than exp j A j since the sequence s a
martingale ee Ho eing or f details This s i also the probabilit y of a nonoptimal
p
action ha ving pa y o greater than Th us the probabilit y of the optimal action ha
q
j A j
the highest pa y o is greater than exp j A j whic h i s reater g than j A j
q
exp j j A j c ompleting the pro o f
Note that if there are two optimal actions then the stage learner will alternate randomly between them. For constructive purposes it is often useful to make this choice deterministic.

Let A b e the set of all strict orderings o n A e for A j A j
S

with i A and i A hen T giv en an ordering A dee the rioritized
i A

stage learner SL to b e a stage learner that pla ys with robabilit p y the highest

ranking ccording to trategy s whose a v erage pa y o in the last stage w as no less then
less than the a v erage pa y o from an y other strategy remaining actions are still pla y
probabilit y j A j
p
Note that this modification of the stage learner has no effect for a simple environment, other than slightly increasing the probability that the learner mistakes the action with the highest payoff.
p


Theorem F or suiently smal l and any A SL is an j A j

q

exp j A j r e asonable le arner
Proof. The proof is identical to the previous proof for ordinary stage learners, except for the conditions under which it chooses the incorrect optimal action. This may arise when the average payoff for a suboptimal action is sufficiently close to the average payoff for the optimal action, which changes the probability of a mistake slightly.
Responsive Learning Automata
Our second example is the responsive learning automata (RLA), which was studied in Friedman and Shenker and motivated the analysis in this paper. RLAs are based on algorithms studied in the engineering literature, and have been implemented for many network optimization tasks (see, e.g., Chrysalis and Mars; Mason and Gu; and Shrikantakumar). They are also closely related to several models proposed for experimental economic learning (Arthur; Mookerji and Sopher; Roth and Erev). An RLA consists of a probability vector, which can be interpreted as a mixed action: at every decision epoch, with probability $p_a(n)$ action $a$ is played. After action $a$ is played and the payoff $\pi_a(n)$ is observed, the probability vector $p(n)$ is updated by the following rule:
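\[ p_a(n+1) = p_a(n) + \beta\, \pi_a(n) \sum_{b \neq a} c_b(n)\, p_b(n), \]
\[ \forall b \neq a: \quad p_b(n+1) = p_b(n) - \beta\, \pi_a(n)\, c_b(n)\, p_b(n), \]
where
\[ c_b(n) = \min\left[ 1,\ \frac{p_b(n) - \epsilon/2}{\beta\, p_b(n)\, \pi_a(n)} \right]. \]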
We will denote these learners by RLA.
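The update rule above moves probability mass from the unplayed actions to the played one, in proportion to the observed payoff, while a capping factor keeps every probability from collapsing to zero. The Python sketch below illustrates that mechanism only. It rests on assumptions that should be checked against the original RLA paper: the step size `beta`, the floor of `eps / 2` on each probability, payoffs scaled to lie in [0, 1], and the function name `rla_update` are all hypothetical choices made for illustration.

```python
def rla_update(p, played, payoff, beta, eps):
    """One RLA-style update step; returns a new probability list.

    Assumptions (not confirmed by the text): payoff lies in [0, 1],
    beta is a small step size, and eps / 2 is the probability floor.
    """
    new_p = list(p)
    gained = 0.0
    for b, pb in enumerate(p):
        if b == played:
            continue
        if payoff > 0 and pb > 0:
            # Cap the transfer so that p[b] never falls below eps / 2.
            c = min(1.0, (pb - eps / 2) / (beta * pb * payoff))
            c = max(c, 0.0)
        else:
            c = 0.0
        delta = beta * payoff * c * pb
        new_p[b] = pb - delta
        gained += delta
    # The played action receives exactly the mass removed from the others.
    new_p[played] = p[played] + gained
    return new_p
```

Because the played action gains exactly what the others lose, the vector stays a probability distribution, and the floor is what lets the learner keep experimenting and so remain responsive.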

Theorem F or suiently smal l ther e exist c onstants such that RLA is


an exp r e asonable le arner
Pro of This follo ws directly from F riedman and S henk er Theorem



Groups of Reasonable Learners
Context and Definitions
Our discussion of learning algorithms considered an environment seen by a single player, which consisted of a general payoff function with no restriction on how these payoffs were generated. Here we return to the original situation where this payoff function arises from a game $G$ involving $P$ players, each with action space $A_i$. When focusing on a single player in a general environment, results like those of the previous section allow us to make some statements about the asymptotic nature of play of a reasonable learner, as defined there. Similarly, in this section we assume that each of the $P$ players is a reasonable learner, and ask what the asymptotic nature of the joint play is. This asymptotic set of actions is the solution concept appropriate for learning in distributed systems like the Internet. Note that the solution concept must contain the eventual play of all possible sets of learning algorithms. We are not interested in results for one particular learning algorithm, even if the set of such learners has particularly nice convergence properties. All we can assume is that learners are reasonable, not that they conform to some specific algorithm.
Milgrom and Roberts define an adaptive learner as one who eventually eliminates actions that are strictly dominated in pure actions over time. They prove that when a group of adaptive learners play together, they converge to the serially undominated set, the result of the iterated deletion of these dominated actions.
In this section we parallel those results, with two main distinctions. First, we only assume that players are reasonable learners, as defined in the previous section. In this setting it is not true that players always eventually abandon dominated actions. Players cannot explicitly identify dominated actions because they do not know the payoff matrix, and furthermore we show that in some cases dominated actions can even be played in equilibrium. Thus we can only impose the requirement of reasonableness, as we have defined it, on learners. Second, since in this distributed setting no action can ever be completely discarded, the convergence to any set of actions, or the elimination of others, is only approximate. The fact that all actions remain in play forever makes the analysis of the joint play quite delicate.
As we shall see, a set of reasonable learners need not converge to the serially undominated set. The main result of this section is that a set of reasonable learners eventually play in the serially unoverwhelmed set, the set remaining after iterated elimination of overwhelmed actions. We do not believe this characterization is tight, in that there are some games where no set of reasonable learners will eventually play with significant probability in some portions of the serially unoverwhelmed set. However, the serially unoverwhelmed solution concept is the tightest local set based solution concept possible, where local set based solution concepts are the natural generalizations of the serially undominated set. Moreover, we present another two sets, the Stackelberg correlated set and the Stackelberg undominated set, and raise the question as to whether the true solution concept lies between these two.
Before proceeding we require two definitions.
Deition oAl c al dominanc eop er ator on a table s game A is a set of monotone

i A A
i i
op er ators one for e ach i he notation r esents the act f t hat
i
i

dep ends on player i p ayo matrix G We denote this set of op er ators by wher e
i

i
for e ach i and A Note that an op er ator is monotone if
i i
i

A i i
i
for such that if enh
i i
Each local dominance operator describes the set of possible strategies agent $i$ might employ as a function of the possible plays the other agents might make. For each local

Recall that a reasonable learner, in order to remain responsive, can never completely stop playing an action, since exogenous effects could modify the payoffs, making that action optimal at some later time.

Duggan and Le Breton study the fixed points of local dominance operators, which they denote Dominance Structures.



dominance operator we can define the related solution concept.

Deition Given a lo c al dominanc eop er ator the asso ciate d lo cal et s based
m
solution concept SB is the op er ator dee dby G A lim A
m
One standard LSB is deed using dominated actions The lo cal dominance op erator is

i
giv en b y f a A j b A s a G a b g e will
i i i i i i i i i
i

denote the LSB for this op erator b y D a nd so D G A d enotes the serially undominated
set of the game A
The relev t LSB for decen tralized games s i based on uno v erwhelmed actions The lo c al
dominance op erator is

i
f a A j b A s a G a b g
i i i i i i i i
i

W e will denote the LSB that results from the iteration of this op erator b y O a nd refer to

O G A as the serially uno v erwhelmed et s of the g ame A W e w ill o ccasionally

abbreviate this as O G w hen the action subset is the en tire action set and will further

abbreviate the notation to O when the game i s also unam biguous Similarly when the
game is unam biguous and the action subset i s the en tire action set w e w ill se u the notation
k
O to denote the k h iteration of the uno v erwhelmed o l c al dominance op e rator applied to
the en tire action s et
For comparison, note that one action dominates another if all payoffs for the one are greater than those for the other, for every given fixed set of other players' actions. In contrast, one action overwhelms another if all payoffs for the one, over all sets of other players' actions, are greater than all payoffs for the other, over all sets of other players' actions. Domination compares the vectors of payoffs term by term; overwhelming compares the entire bag of payoffs available, and thus is a much stronger requirement.
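The difference between the two notions is easy to see by running both iterated eliminations on a small game. The sketch below is an illustration for two-player games only, under assumptions of our own: payoffs are given as nested lists `G1[r][c]` (row player) and `G2[r][c]` (column player), comparisons are strict, and the function name `eliminate` and the example game are hypothetical.

```python
def eliminate(G1, G2, overwhelmed):
    """Iterated elimination in a two-player game.

    With overwhelmed=False, strictly dominated actions are removed;
    with overwhelmed=True, only overwhelmed actions are removed.
    Returns the surviving (row set, column set).
    """
    rows = set(range(len(G1)))
    cols = set(range(len(G1[0])))

    def beaten(a, b, others, payoff):
        # True if action b beats action a against the surviving opponent actions.
        if overwhelmed:
            # Worst payoff of b must exceed best payoff of a.
            return (min(payoff(b, o) for o in others)
                    > max(payoff(a, o) for o in others))
        # Term-by-term comparison against each opponent action.
        return all(payoff(b, o) > payoff(a, o) for o in others)

    def row_payoff(r, c):
        return G1[r][c]

    def col_payoff(c, r):
        return G2[r][c]

    while True:
        new_rows = {a for a in rows
                    if not any(beaten(a, b, cols, row_payoff)
                               for b in rows if b != a)}
        new_cols = {a for a in cols
                    if not any(beaten(a, b, rows, col_payoff)
                               for b in cols if b != a)}
        if new_rows == rows and new_cols == cols:
            return rows, cols
        rows, cols = new_rows, new_cols


# Hypothetical example: action 1 dominates action 0 for both players,
# but does not overwhelm it, since 0 can earn 3 while 1 can earn only 1.
G1 = [[3, 0], [5, 1]]
G2 = [[3, 5], [0, 1]]
undominated = eliminate(G1, G2, overwhelmed=False)
unoverwhelmed = eliminate(G1, G2, overwhelmed=True)
```

In this example the serially undominated set is a single profile, while nothing at all is removed under the overwhelming criterion, which matches the remark that overwhelming is the much stronger requirement and so eliminates less.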

The limit exists since the operator is a monotone set operator and $A$ is finite.




For any game, the serially unoverwhelmed set contains the serially undominated set, which contains the set of rationalizable actions.
Convergence Results
Giv en a nite set of r easonable learners L f L g where eac h L is an
m i i i i i
reasonable learner let L amx max max N max w consider a
i i i i i i i i

rep eated game pla y ed b y these pla y ers with pa y o functions G a t nd elt be het
i
i


largest time in terv al b et w een pla y er i decision ep o c hs the smallest and et l L
i

L


max L min D ee L N L and et l j A j j A j
i i i
i i
L
Note that a set of learners $L$ and a game induce a measure over histories $H$ by their play, which we will call $\mu_L$. We now present our main result, which is that decentralized learning leads to the serially unoverwhelmed set.
Theorem Given any game G a which s i table s after time t and any Ther e

exists such that or f any s N dn anyste L of r e asonable le arners

playing satisfying L d n L L the players c onver ge


to O in the fol lowing sense ther e exists a s et H s H s with H s such
L

that Pr a s O G j h s
k
Pro of Fixing a game A ho ose an a ction a O and dee
i
i

A
a max min G b max G a
ki i i i i i i i
k k k
b O
i b O a O
i i
i
i i
k
and let in f a j a O g and note t hat if an action is eliminated for
ki ki i i ki
i
pla y er i at round k a nd otherwise etL ax
ik ki ki

Dee time in terv al k b y I t k L N L t k L N L Note that in
k

I all pla yisin O e p ro ceed inductiv Assume that for n a y s in p erio d I learner i
k
k
is pla ying in O with probabilit y greater than L If m L then learner i is
i


k
pla ying in an en vironmen tin whci h ll a actions n ot in O m ust b e exceeded in exp ected
k
v alue b y hose t a ctions in O and th w e can apply heorem T o t sho w that the learner
learns to pla y these actions in p erio d k with probabilit y ess l han t with probabilit y
i
greater than The probabilit y that the pla y er do es this at ev ery in terv al in p rio e d
i
k is greater than L L Th us the robabilit p y that a ll learners do this is greater
than m L L Finally the probabilit y that this o ccurs o v er all stages is g reater than

m j A j L L since there c an b e at most j A j stages required to reac h O Th us if

j A j L L this sho ws that con v ergence will o ccur
This theorem immediately applies to Stage Learners and RLAs.

Corollary Ther e e xists some such that any gr oup L of Stage l e arners and RLAs
p
satisfying L ax min c onver ge to the erial s y l u noverwhelme d s et
i i i i
wher ec onver genc e is dee d as n i The or em
The above results hold for stable games. However, the analogous results hold even with time-varying games. For instance, consider, as we did earlier for games against nature, the quasi-static game in which every $\tau$ periods the payoff functions may change, but in between changes the game is constant. Let $I(t)$ be the indicator variable which is 1 when the current action is in the serially unoverwhelmed set, $a(t) \in O(G)$. Let $\tau$ be a random variable with mean $\bar{\tau}$. Then the theorem also implies convergence in this game.
Corollary. In the quasi-static game just described,
m
X

lim lim I t L L
m
m
t
for any group of learners satisfying the conditions in the previous theorem.
As discussed earlier, nonresponsive learners, including no-regret learners, do very poorly in quasi-static environments.



Synchronous Play
Interestingly, if we restrict to sets of players who play synchronously ($t_i^n = t_j^n$ for all $i, j, n$), then we revert to the standard results: play converges to $D(G)$.
Theorem L et L b e a set of r e asonable le arners playing sync hronously a game G a

which is stable after time t T hen for any ther e exists such that for

any s N f L nd L L then ther e e xists a set


H s H s with H s such that Pr a s D G j h s
L
Proof. The proof of this theorem is analogous to that of the main convergence theorem, after noting that in a synchronous game the expected payoff of any dominated action is always less than that of the dominating action, since for player $i$, $a_{-i}^t$ is governed by a random distribution that does not depend on the choice of $a_i^t$, although they may be correlated ex post.
We do not know if there is a smaller set (the support of the set of correlated equilibria, or the rationalizable strategies) for which this result continues to hold.
Minimal Solution Concepts
The main convergence theorem (for the serially unoverwhelmed set) establishes a bounding set on the asymptotic play. The true solution concept may be somewhat smaller. Let $C(G, A) \subseteq A$ be the true solution concept, that is, the union of the sets of strategies played with non-negligible probability by all possible groups of reasonable learners. More formally, we have the following definition.

Definition. $C(G, A)$ is the smallest set for which the main convergence theorem is true when $O$ is replaced by $C$.
First we will show that $C(G, A)$ contains some Stackelberg equilibria. Given a strict order on the set of players, define the Stackelberg game to be the extensive form game, with payoffs given by $G$, in which the players move one at a time in that order: the first player in the order moves first, then the second, and so on up to the last player.

Definition. A Stackelberg equilibrium with respect to a given order on the players is a subgame perfect equilibrium of the corresponding Stackelberg game.
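For two players, a subgame perfect equilibrium of the Stackelberg game can be found by backward induction: the follower best-replies to each leader action, and the leader picks the action whose induced reply is best for her. The sketch below illustrates this for a bimatrix game. The function name `stackelberg_outcome`, the payoff layout (`G_leader[r][c]`, `G_follower[r][c]` with the row player leading), the tie-breaking rule (first maximizer), and the example payoffs are all assumptions made for illustration.

```python
def stackelberg_outcome(G_leader, G_follower):
    """Backward induction in a two-player game where the row player moves first.

    Returns the (row, column) pair played on the equilibrium path.
    Ties are broken in favor of the lowest index (an arbitrary assumption).
    """
    best = None
    for r in range(len(G_leader)):
        n_cols = len(G_leader[r])
        # Follower's best reply to the leader's action r.
        c = max(range(n_cols), key=lambda j: G_follower[r][j])
        # Leader keeps the action whose induced outcome she likes best.
        if best is None or G_leader[r][c] > G_leader[best[0]][best[1]]:
            best = (r, c)
    return best


# Hypothetical payoffs: the follower replies with column 1 to row 0
# and with column 0 to row 1; the leader prefers the first outcome.
leader_payoffs = [[5, 3], [2, 4]]
follower_payoffs = [[0, 1], [1, 0]]
outcome = stackelberg_outcome(leader_payoffs, follower_payoffs)
```

With more than two players the same idea applies recursively down the order, each player treating earlier moves as fixed and anticipating the replies of later ones.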

The key aspect of the following proof is the observation that, by separating their timescales, players behave as if they are playing a Stackelberg game. Note that the role of leader is not intentional on the part of the learner (in fact, the learner is not even aware that it is the leader) and is merely the product of learning slowly. Thus learning slowly, usually perceived as a disadvantage, provides the benefits of being a Stackelberg leader. This is an example where superior sophistication, such as faster computer processors or better learning algorithms, may lead to inferior results.

Theorem. For every ordering of the players there exists some $a \in C(G, A)$ such that $a$ is a Stackelberg equilibrium for the game with respect to that order.
j

Pro of Consider a group of iden tical rioritized p stage l earners SL eac hwtihan y ordering

j

A Cho ose suc h that
j
p

min fj G a G a j G a G a g
i i i i

a A i
n P i
Set t n heer d e here d x e is the l east in teger greater than x so
i i
i
the pla y ers up date at ed but diren t in terv als with the rst pla y er in the S tac k elb erg
ordering b eing the slo w est L astly et t for t t t
t o s pla y ers a v erage o v er pa y o o nly uring d the second half of their time in al
n
No w consider pla y P She c ho oses an action at time t nda in the p n e
P
n
n
in terv al b et w t and t no pla y er b e fore her i n he t order will c hange their
P P

tincurren t action and pla y P ill w con v erge to a b est reply ith high probabilit y
n
curren t action a b y time t us rom f her p o in t of view the game i s a
P P
P
n
Stac k elb erg one here w P follo ws her since she only ev aluates pa y o b et w een t
P
P
n
and t Con tin uing bac ards through the ordering w e see that e ac hpla y er follo ws the
P
pla y er b efore her and that p la y will con v erge to the sp ecid equilibria

Lastly, the main convergence theorem shows that $C(G, A) \subseteq O(G, A)$. As we discuss later, we suspect that this inclusion is strict for some games. However, if we restrict ourselves to LSBs, then that theorem is tight in the following sense.

Theorem L et b e an LSB such that O for al l A and

A nd C G A G A for al l A Then O G A G A for al l
A

Pro of Assume that there exists a ame g A suc h that O G A G A us
k k k k
there m ust exist some k suc htath A O A ut A O A where w e d rop
k
the sup erscript A here and b elo w for notational con v enience A
k
O A Cho ose some b A that do es app ear in O tu tino n No w construct
i i i i i i

the game in the follo wing manner F or all a A set G a G a
i
i
Cho ose a unction f r A A h that or f all a b
i i i i
r a argmin G a
i i i i
a
i i
and
r b argmax G b
i i i i
a
i i

No w for all j i dee G a when a r a G a ewhn a
i i i i i i i i
j j

and G a for all other cases
i i
j


By construction O A and th b O A nice G G
i i i i i
i


Since A O A this implies that b A b y onotonicit m y
i i


whic h also implies that b G Ho w ev er w ewill no wsho w that b G pro
i i
the theorem
Construct a Stackelberg ordering where player $i$ is the leader, i.e., first in the ordering, and let her be a prioritized stage learner where action $b_i$ is the top priority action. Let the other players be in any order, and assume that they are ordinary stage learners.
Now we use the same construction as in the previous theorem to show that the outcome of this game is the strategy profile $(b_i, b_{-i})$, since all followers will play $r_j(a_i)$ in response to the leader's action, and the leader will then see a game in which action $b_i$ has the highest payoff, by construction of the function $r$ and the fact that it is not overwhelmed. Note that this payoff may not be strictly highest; however, the action $b_i$ will be chosen because of the priority ordering used.
A Tighter Solution Concept

While the $O$ solution concept is the tightest LSB solution concept, it is probably not the tightest solution concept for decentralized learning. That is, we expect that there are games for which $C(G, A) \neq O(G, A)$. Consider the following game:
L R
T
B

$O$ of this game is the set of all actions. It seems intuitive, although we have no formal proof, that any pair of decentralized learners will converge to (T, L). In Greenwald, Friedman and Shenker, simulations of the RLAs and Stage Learners were consistent with this intuition. Since our goal here is to describe the possible outcomes of a game played by decentralized learners, it is important to find the tightest solution concept to which decentralized learners converge.
We now describe a class of solution concepts which is suggested by the Stackelberg constructions in the preceding proofs. We do not know whether any of these is the correct solution concept; we introduce these solution concepts to formulate a testable open question whose resolution would greatly improve our understanding of reasonable learners in distributed settings.
Stackelberg Solution Concepts
Consider some solution concept G A that s i deemed appropriate for sync hronous games
W eno w d ee a s olution concept based on G A that ore m ppropriate a for games with
arbitrary degrees of async hron y Giv en a ite set of pla y ers P ith P j dee
S
a on strict pla y rder o m where r r P nd
r g
T

for r r r r et P b e the set of all on strict pla y orders r
S S

r and r r
r g r r g
Given a play order, define the associated Stackelberg game, where players move according to that order. Each player takes the actions of the players earlier in the order as given, and plays accordingly. Thus a player sees the behavior of the earlier players as fixed, and sees the later players as reacting to their moves. Each player's elemental action in this Stackelberg game is actually a response function, in which an action of the underlying normal form game is chosen as a function of the actions of the previous (in terms of the ordering) players. That is, for agent $i$, a strategy in the Stackelberg game is a


resp onse function A A et G b e the set of all suc h tacS k elb erg strategies
i i
r

for the ordering dn let G b e he t restriction of G to r F or et Out
i
b e the action c hosen b y pla y er i when pla y i s eed d b y or example if P

then Out hic h s i a ed strategy indep enden t o f the other la p y ers mo v es



Out Out and so o n Giv en a v ector f o strategies he



pa y o is G Out


F or an y r a and consider the game pla y ed b y the pla y ers in r
r r

Note that we do not allow these response functions to be mixed strategies.







They see the strategies of the pla y ers in rsatxed a and see the strategies of
r

the pla y ers in r as a function of their j oin t action Th us to the pla y ers in r the game
has pa y o f o the form
r
G a a G a a
r r r r r r r r
Giv en an order and an y solution concept G A eno w ee d the set G A in
r
ductiv ely or all a A et
r r

r

G A a G a a
r r r r r r
G

r r

where the union is o v er all resp onse functions whose i mage Out iesin eth set
r r
G A Let G A b e t f o suc htath a G A a
r r r r r r r

F or a strategy set B dee the et s of reac hable a ctions b y

R B f a A j B s a Out g

W e prop ose that the set R G A represen ts a p ossible s olution concept or f a g ame

with ordering e can no w d ee the set of Stac k elb erg actions denoted b y S G A
of a game A

Deition The set of Stackelb er g actions S G A of a game A is given by


S G A R G A
P
A Conjecture and a Question
A possible conjecture is that the correct solution concept for reasonable learners in a distributed setting is the set of Stackelberg actions built from the correct solution concept for reasonable learners in a synchronous game. If this is true, then the only impact of asynchrony is in separating timescales, as in the proof of the theorem on Stackelberg equilibria, while if it is false it implies that the effect of asynchrony is more subtle.
First we note some relationships between the various solution concepts.
Lemma F or any olution s c onc ept G A the fol lowing hold

i G A S G A



ii G A G A for al l G A S G A S G A al l G A

O
iii S G A O G A
Pro of i This follo ws imme diately since the order P thi P ws that the
Stac k elb erg v ersion of m ust on c tain

ii This follo ws imme diately rom f the eition d of S G A

O
iii The relation O G A S G A olol ws from art p i nd a w eno wsho wtath het

rev erse holds Assume a O G A from the deition f o O a is o v erwhelmed b y
i i
another action then i t m ust b e o v erwhelmed for an y subset o f the other pla y ers actions

O
Therefore a S G A ro ving the qualit e y
i
i
The Stackelberg solution concepts are a way to take a synchronous solution concept and generalize it to a setting with arbitrary asynchrony. Thus we propose the Stackelberg solution concepts as a possible candidate for a decentralized solution concept $C(G, A)$. The obvious question then is what synchronous solution concept is appropriate. Foster and Vohra show that the appropriate solution concept for calibrated learners is the set of correlated equilibria. Let $Corr(G, A)$ represent the support of the set of correlated equilibria. If reasonable learners, rather than calibrated ones, also fill out the space of correlated equilibria, then the following conjecture may be true.

The standard forms of calibrated learning algorithms are not responsive, so the question is whether the Foster and Vohra result holds for the responsive versions of such learning algorithms; such algorithms were simulated in Greenwald, Friedman and Shenker.


forCorr
Conjecture S G A G A
Corr
We call $S^{Corr}(G, A)$ the Stackelberg correlated set. In essence this conjecture says that, while we do not know what the correct solution concept is for synchronous games, we suspect that it contains the set $Corr(G, A)$, and we further conjecture that the set $S^{Corr}(G, A)$ captures the effects of asynchrony. On the other hand, the set $D(G, A)$ is usually taken to be a superset of the actual asymptotic play in synchronous games. If that is indeed true, then it leads to the following question.

D
Question Is C G A S G A

We call $S^{D}(G, A)$ the Stackelberg undominated set. We have noted before that the possible disparity in learning rates leads to Stackelberg-like phenomena. If the only effect of asynchrony added to our definition of reasonability is to produce these Stackelberg-like phenomena, then this conjecture will be true, and in fact we would have that $C(G, A)$ is the set of Stackelberg actions of some solution concept. We leave this as an open question which requires further investigation.
Example
To get a more concrete sense of these solution concepts, recall the game discussed at the beginning of this section:
L R
T
B
Note that for the ab o v e g ame there are three orders f g f g f g and f g f g
F or the order f g e just ha v e the original ame g hic w h s i dominance solv able with

While our search for a tight solution concept has not yet succeeded, we are not alone. There are few solution concepts which have been proved to be tight for a class of learners. For example, various conditions have been shown to hold for fictitious play, but no tight solution concept is known. The only non-trivial example we know of is the tightness of correlated equilibria for calibrated learners (Foster and Vohra).

actions T L while for f g f gf apl y er s a ction s i xed then p la y er s only
undominated strategy is T L and B L and after restricting to this pla y er s

only undominated strategy is s T h us the o utcome for this game i s T L whic histhe


Corr D
same outcome for the order f g f g b y symmetry h S G A S G A

T L whic h is the same as D G A whereas O is the n e tire game
Solvability and Mechanism Design
Solvable Games
Often the sets of play in the various solution concepts, such as $S^{Corr}(G, A)$ or $O(G, A)$, are quite large, and in those cases one cannot predict with precision the asymptotic play of reasonable learners. There are, however, some games where the outcome is unambiguous. We will call such games solvable.

Definition. A game is O-solvable if $|G(O(G, A))| = 1$. Similarly, a game is SC-solvable if $|G(S^{Corr}(G, A))| = 1$; it is SD-solvable if $|G(S^{D}(G, A))| = 1$; it is C-solvable if $|G(Corr(G, A))| = 1$; and it is D-solvable if $|G(D(G, A))| = 1$.
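Read as a computation, solvability asks whether every action profile in the relevant solution set yields the same payoff vector, so that the eventual outcome is unique even when the eventual play is not. A minimal sketch of that check follows; the function name `is_solvable`, the representation of profiles and payoff vectors as tuples, and the toy payoff function are assumptions made for illustration.

```python
def is_solvable(payoff, profiles):
    """Return True if all profiles in the solution set share one payoff vector.

    payoff:   function mapping an action profile (tuple) to a payoff vector (tuple)
    profiles: iterable of action profiles in the chosen solution set
    """
    outcomes = {tuple(payoff(profile)) for profile in profiles}
    return len(outcomes) == 1


def coordination_payoff(profile):
    # Hypothetical two-player game: both players earn 1 when they match, 0 otherwise.
    return (1, 1) if profile[0] == profile[1] else (0, 0)


# Two distinct plays with the same outcome: still solvable.
same_outcome = is_solvable(coordination_payoff, [(0, 0), (1, 1)])
# Two plays with different outcomes: not solvable.
different_outcome = is_solvable(coordination_payoff, [(0, 0), (0, 1)])
```

The same function can be applied to any of the solution sets in the definition, which makes the nesting of the solvability notions easy to check on small examples.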
Note that solvability does not require that there is a single eventual play, only that there is a single eventual outcome (payoff vector). Because $Corr(G, A) \subseteq S^{Corr}(G, A) \subseteq O(G, A)$ and $Corr(G, A) \subseteq D(G, A) \subseteq S^{D}(G, A) \subseteq O(G, A)$, any O-solvable game is both SC-solvable and SD-solvable, and any SD-solvable game is D-solvable and C-solvable (see the lemma above).
Below is an example of a game with varying degrees of solvability as $x$ varies:
L C R
T
M
B x






When x this game is Oolv able nd a when x it is SDolv able nd Colv S able
but not Oolv able When x this g ame i s not ev en Dolv able r C olv able
To illustrate a more general O-solvable game, we define the class of generalized serial games, following Moulin and Shenker, to be those that have the following five properties for any $i, j$ with $i \neq j$:
Ordered action domains A
i
Crossonotonicit y G a G a f y a a i j
i i j j j j
Serialit y G a G a for an y a a a i j
i j j i j j j j i
Unique b est reply or f eac h a there exists an elemen t BR a cu hthat
i i i
x BR a G BR a x
i i i i i i i i i i
Serialit y f o b est reply BR a BR a or an y a BR a
i i i j ij j i i
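Theorem. Generalized serial games are O-solvable.

Proof. Since the $O$ operator is monotonic, the iteration process must converge to a nontrivial fixed point. Let this fixed point of $O$ be denoted by $I = I_1 \times \cdots \times I_n$, with $\alpha_i$ denoting the minimal element of $I_i$, $\beta_i$ denoting the maximal element of $I_i$, and $\alpha$ and $\beta$ denoting the vectors of these extremal elements. Let $\mathrm{MAX}_i(x_i) = \max_{a \in I} G_i(x_i, a_{-i})$ and $\mathrm{MIN}_i(x_i) = \min_{a \in I} G_i(x_i, a_{-i})$. For any $a \in I$ and for any $x_i \in I_i$, $G_i(x_i, \beta_{-i}) \le G_i(x_i, a_{-i}) \le G_i(x_i, \alpha_{-i})$, so $\mathrm{MAX}_i(x_i) = G_i(x_i, \alpha_{-i})$ and $\mathrm{MIN}_i(x_i) = G_i(x_i, \beta_{-i})$. Assume that $I$ is not a singleton, so the set $\{ i \mid \alpha_i \ne \beta_i \}$ is nonempty; we can define $i$ as the element in this set with the smallest $\alpha_i$. In particular, $G_i(\alpha_i, \alpha_{-i}) = G_i(\alpha_i, \beta_{-i})$, so $\mathrm{MIN}_i(\alpha_i) = \mathrm{MAX}_i(\alpha_i)$. If there exists some $x_i \in I_i$ such that $G_i(x_i, \alpha_{-i}) < G_i(\alpha_i, \alpha_{-i})$, then $\mathrm{MAX}_i(x_i) < \mathrm{MIN}_i(\alpha_i)$ and so $\alpha_i$ overwhelms $x_i$. If there exists some $x_i \in I_i$ such that $G_i(x_i, \beta_{-i}) > G_i(\alpha_i, \beta_{-i})$, then $\mathrm{MIN}_i(x_i) > \mathrm{MAX}_i(\alpha_i)$ and so $x_i$ overwhelms $\alpha_i$. Thus we must have $G_i(x_i, \alpha_{-i}) \ge G_i(\alpha_i, \alpha_{-i}) = G_i(\alpha_i, \beta_{-i})$ and $G_i(x_i, \beta_{-i}) \le G_i(\alpha_i, \beta_{-i})$ for all $x_i \in I_i$. Consequently $BR_i(\alpha_{-i}) \ne \alpha_i$ and $BR_i(\beta_{-i}) = \alpha_i$. This contradicts the seriality of the function $BR_i$.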
Later we will encounter examples of such generalized serial games. Another solvable game arises when rationing a fixed amount $C$ of some good when all utilities are single-peaked (see, for example, Sprumont). Let $p_i$ be the location of agent $i$'s peak. The uniform game can be defined as follows. Each agent announces a request $a_i$. If $\sum_i a_i \ge C$, then the allocations $q_i$ are given by $q_i = \min(a_i, \lambda)$, where $\lambda$ is the unique value such that $\sum_i q_i = C$. If $\sum_i a_i \le C$, then the allocations $q_i$ are given by $q_i = \max(a_i, \lambda)$, where $\lambda$ is the unique value such that $\sum_i q_i = C$. In the case where $a_i = p_i$, the resulting allocation reduces to the uniform mechanism.
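The allocation rule of the uniform game can be computed directly. The sketch below is one possible way to do it, assuming nonnegative requests and finding the common level $\lambda$ by bisection (the source does not prescribe a method); the function name `uniform_allocation` and the `iterations` parameter are hypothetical.

```python
def uniform_allocation(requests, C, iterations=200):
    """Allocate a fixed amount C given nonnegative requests a_i.

    Excess demand (sum >= C): q_i = min(a_i, level).
    Excess supply (sum <  C): q_i = max(a_i, level).
    The level is found by bisection, which is an implementation choice
    made for this sketch only.
    """
    clip = min if sum(requests) >= C else max
    lo, hi = 0.0, float(C)
    for _ in range(iterations):
        level = (lo + hi) / 2.0
        # The total allocation is nondecreasing in the level in both cases.
        if sum(clip(a, level) for a in requests) < C:
            lo = level
        else:
            hi = level
    return [clip(a, hi) for a in requests]

# Excess demand: large requests are capped at a common level.
print(uniform_allocation([1, 4, 6], 9))
# Excess supply: small requests are raised to a common level.
print(uniform_allocation([1, 2, 3], 9))
```

With requests `[1, 4, 6]` and `C = 9`, the two largest requests are capped at the same level; with requests `[1, 2, 3]`, every request is raised to the same level so that the whole amount is handed out.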
Theorem. The uniform game is SD-solvable and SC-solvable but not O-solvable.

Proof. First we prove that the uniform game is D-solvable. Let $D = I_1 \times \cdots \times I_P$, with $I_i = [l_i, u_i]$, denote the result of iterated elimination of dominated actions. Note that $l_i \le p_i \le u_i$, since each agent gets the highest payoff by announcing $p_i$. Assume $k$ is such that
C
l l uf l then eac hacotinv ector n i D results i n he t same allo c ation
i k i i k
P
C C
with eac haeng t getting q Assume to the con trary tha l f l then p
i k k k k
P P
dominates l he allo cations are monotonic n i r and a re strictly monotonic in the vicinit y
k k
C
of l If l l p then p dominates u he allo cations are monotonic in r and
k k k k k k k
P
are strictly monotonic in the vicinity of $u_k$. Therefore, by contradiction, there can be no such $k$, and so all sets $I_i$ are merely the singleton $\{p_i\}$.
Note that this proof shows that if we held some of the actions fixed (not necessarily at their peak), then the $D$ set of the game among the remaining players converges to the singleton with each player's peak $p_i$ the only remaining action. Since on each subgame the set $D$ converges to the same singleton, the construction used in the Stackelberg undominated set also reduces to that singleton.
Next we show that the uniform game is not O-solvable. Denote by $[l_i(r), u_i(r)]$ the set of player $i$'s allocations (not payoffs) resulting from announcing action $r$ and letting the other actions vary over their full range. Both $l_i$ and $u_i$ are monotonically increasing in $r$, and $l_i(0) = 0$, $u_i(0) = l_i(C) = C/P$, $u_i(C) = C$. Because $u_i(0) = l_i(C)$, $[l_i(r), u_i(r)] \cap [l_i(r'), u_i(r')] \ne \emptyset$ for all $r, r'$. The allocation intervals always overlap, and so the payoff sets for any two actions overlap, so there are no overwhelmed actions; $O$ is the entire strategy space for this game.
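The overlap argument can be checked numerically on a coarse grid. This is only a sketch under stated assumptions: three agents, `C = 6`, requests restricted to a five-point grid, and a bisection-based implementation of the uniform rule; all names are hypothetical.

```python
from itertools import product

def uniform_allocation(requests, C, iterations=200):
    """Uniform rule via bisection on the common level (a sketch, not the authors' code)."""
    clip = min if sum(requests) >= C else max
    lo, hi = 0.0, float(C)
    for _ in range(iterations):
        level = (lo + hi) / 2.0
        if sum(clip(a, level) for a in requests) < C:
            lo = level
        else:
            hi = level
    return [clip(a, hi) for a in requests]

def allocation_interval(r, C, P, grid):
    """Smallest and largest allocation of player 0 when announcing r
    while the other P - 1 players range over the grid."""
    vals = [uniform_allocation((r,) + others, C)[0]
            for others in product(grid, repeat=P - 1)]
    return min(vals), max(vals)

C, P = 6.0, 3                       # illustrative values only
grid = [0.0, 1.5, 3.0, 4.5, 6.0]    # hypothetical action grid on [0, C]
intervals = {r: allocation_interval(r, C, P, grid) for r in grid}

for r, (low, high) in intervals.items():
    # Every interval should contain the equal share C / P.
    print(r, round(low, 6), round(high, 6), low <= C / P <= high + 1e-9)
```

On this grid every allocation interval contains the equal share $C/P$, so no announcement's best outcome falls below another announcement's worst outcome, which is why no action is overwhelmed.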

Our final example is that of ordered externality games (Friedman). These are nonatomic games where agents, labeled by a parameter, decide whether to participate or not. If they participate, their payoff depends only on the size of the participating population, and if they don't participate their payoff is zero. The payoffs decrease with the level of participation. Thus, for a given vector of actions, the payoffs are of the form $U$, which is nonincreasing in the level of participation and is zero for agents who do not participate. It is shown in Friedman that this game is O-solvable if and only if it converges under best-reply dynamics.
For example, consider the congestion game discussed in the Introduction, played by a large number of players. Each player decides whether to send a packet of information. Let $a$ be the total number (measure) of players who decide to send a packet. The delay to a player is $D(a, \mu)$, which is nondecreasing in $a$, where $\mu$ is the capacity of the link. For an M/M/1 FIFO queue, $D(a, \mu) = 1/(\mu - a)$ for $a < \mu$ and $\infty$ otherwise. Thus the payoff to a player who sends a packet is $v - c(D(a, \mu))$, where $v$ is the personal value of the packet and $c$ is the delay cost, which is assumed to be nondecreasing. The payoff is $0$ if the player does not send a packet. For many typical queuing processes this game converges under best reply dynamics if the capacity of the queue is sufficiently large (see Friedman and Landsberg for details). Thus in this case the game is O-solvable. These results also apply to similar congestion games with multiple links and players at different locations, with the corresponding quantity becoming a vector depending on the type and location of the player.
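Best-reply dynamics for this participation game can be simulated in a few lines. The sketch below rests on assumptions that are not in the source: packet values $v$ are uniformly distributed on $[0, 1]$, the delay cost is $c(D) = D$, the delay takes the standard M/M/1 form $1/(\mu - a)$, and all players update simultaneously. The function names are hypothetical.

```python
import math

def delay(a, mu):
    """Assumed M/M/1 mean delay: 1 / (mu - a) below capacity, infinite otherwise."""
    return 1.0 / (mu - a) if a < mu else math.inf

def best_reply_mass(a, mu):
    """Measure of players whose best reply is to send when the load is a.

    Assumes v ~ Uniform[0, 1] and c(D) = D, so a player sends iff v > D."""
    return max(0.0, 1.0 - delay(a, mu))

def best_reply_dynamics(mu, a0=0.0, steps=200):
    """Iterate the best-reply map starting from load a0 and return the whole path."""
    path = [a0]
    for _ in range(steps):
        path.append(best_reply_mass(path[-1], mu))
    return path

path = best_reply_dynamics(mu=3.0)
print(path[:4], path[-1])
```

With capacity $\mu = 3$ the best-reply map is a contraction on $[0, 1]$ under these assumptions, and the load settles at the solution of $a = 1 - 1/(3 - a)$, namely $a = 2 - \sqrt{2}$. This is one concrete instance of the convergence that the text associates with O-solvability when capacity is sufficiently large.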
Implications for Mechanism Design on the Internet

So far in our discussions of learning and convergence we have implicitly assumed that the game is exogenously given. However, in the Internet and in other distributed contexts, one would want to design the game in order to shape the nature of the resulting play and thereby achieve certain social goals. This is the mechanism design or implementation paradigm.

To fix notation, consider an allocation problem with $P$ agents. Let $\mathcal{U}$ denote the domain of utility functions (assumed, for the sake of simplicity, to be the same for each agent) and let $O$ denote the set of possible outcomes. A social choice function is a mapping $F: \mathcal{U}^P \to O$. A mechanism is a set of action spaces $A_i$ and a mapping $M: A \to O$. Associated with each mechanism $(M, A)$ and a utility profile $U \in \mathcal{U}^P$ is a stable game $(G, A)$ defined by $G_i(a) = U_i(M(a))$. We denote by $C_M(U) \subseteq A$ the solution concept for a mechanism $M$ at a particular utility profile $U$. A mechanism $(M, A)$ implements a social choice function $F$ if $M(a) = F(U)$ for all $a \in C_M(U)$.
We now ask which social choice functions can be implemented in a distributed setting. To be more precise, for which $F$ is there a mechanism $(M, A)$ such that $M(a) = F(U)$ for all $a \in C_M(U)$? Since we do not know the exact nature of $C$, we cannot answer this question definitively; however, we do have some partial results. Before presenting these results we need a few definitions.

Note that the set $A$ is not necessarily the natural action space on the network, but is more commonly denoted the message space. For example, in the congestion game $A$ could include a priority request along with a transmission rate.

This is sometimes called strong implementation in the literature.
Definition. Consider any pair $U, V \in \mathcal{U}^P$ and define $E = \{ i \mid U_i \ne V_i \}$. $F$ is weakly coalitionally strategyproof (WCSP) if, when $E$ is nonempty, there always exists some $j \in E$ such that $U_j(F(V)) \le U_j(F(U))$. $F$ is strictly coalitionally strategyproof (SCSP) if $F(U) \ne F(V) \Rightarrow$ there exists $j \in E$ such that $U_j(F(V)) < U_j(F(U))$. $F$ is strictly strategyproof (SSP) if $F(U) \ne F(V_i, U_{-i}) \Rightarrow U_i(F(U)) > U_i(F(V_i, U_{-i}))$. $F$ is Maskin monotonic (MM) if $F(V) = F(U)$ whenever $U_i(x) \le U_i(F(U)) \Rightarrow V_i(x) \le V_i(F(U))$ for all allocations $x$ and all $i$.
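For a finite domain these properties can be checked by brute force. The sketch below encodes the SSP and Maskin monotonicity conditions as they are described in the surrounding text (an individual deviation that changes the outcome must strictly hurt the deviator; nested lower contour sets must leave the outcome unchanged). The two-outcome domain, the two example rules, and all names are hypothetical.

```python
from itertools import product

# Hypothetical finite domain: two outcomes, two utility "types".
OUTCOMES = ["a", "b"]
TYPES = [{"a": 1, "b": 0},   # type 0 prefers a
         {"a": 0, "b": 1}]   # type 1 prefers b

def is_ssp(F, types, n):
    """Strict strategyproofness: any unilateral misreport that changes the
    outcome must make the misreporting agent strictly worse off."""
    for profile in product(range(len(types)), repeat=n):
        truthful = F(profile)
        for i, lie in product(range(n), range(len(types))):
            deviated = F(profile[:i] + (lie,) + profile[i + 1:])
            u = types[profile[i]]
            if deviated != truthful and not u[truthful] > u[deviated]:
                return False
    return True

def is_maskin_monotonic(F, types, n, outcomes):
    """F(V) must equal F(U) whenever, for every agent, everything weakly
    below F(U) under U_i stays weakly below F(U) under V_i."""
    profiles = list(product(range(len(types)), repeat=n))
    for U, V in product(profiles, repeat=2):
        fU = F(U)
        nested = all(types[V[i]][x] <= types[V[i]][fU]
                     for i in range(n) for x in outcomes
                     if types[U[i]][x] <= types[U[i]][fU])
        if nested and F(V) != fU:
            return False
    return True

def dictator(profile):
    """Agent 0 gets the outcome it reports as preferred."""
    return "a" if profile[0] == 0 else "b"

def agree(profile):
    """Outcome a if the two reports coincide, b otherwise."""
    return "a" if profile[0] == profile[1] else "b"

for rule in (dictator, agree):
    print(rule.__name__, is_ssp(rule, TYPES, 2),
          is_maskin_monotonic(rule, TYPES, 2, OUTCOMES))
```

On this toy domain the dictatorial rule passes both checks, while the `agree` rule fails both, which is consistent with (though of course no proof of) the implication from SSP to Maskin monotonicity stated later in this section.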

WCSP merely requires that not all members of the deviating coalition can strictly gain by deviating. SCSP requires that there is no other outcome that is equivalent or better, in the eyes of the deviating coalition, to the truthful outcome. SSP requires that for an individual deviator no other outcome is equivalent or better. Thus, for an SSP social choice function $F$, the truth is a strict Nash equilibrium of $F$ (though perhaps not the only Nash equilibrium), while for an SCSP social choice function $F$, the truth is a strict strong equilibrium (though again perhaps not the only one). Note that the definition of SSP implies nonbossiness (in fact, coalitional nonbossiness) when applied to a private goods context. Maskin proved that if $F$ is Nash implementable (in the sense we mean here) then $F$ is Maskin monotonic.
We do not yet have a tight definition of the solution concept $C$, and so below we present results for implementation with different possible solution concepts. If a social choice function is implementable with a given solution concept, we say it is implementable under that concept, prefixing the name of the concept (for example, O-implementable). We can now state our first theorem, which holds if the solution concept is indeed the upper bound $O$.

Theorem. If a social choice function $F$ is O-implementable then it must be SCSP.

This is also referred to as Group Strategyproof; see Muller-Satterthwaite.




Proof. Consider some mechanism $(M, A)$ that implements $F$. Assume to the contrary that $F$ is not SCSP. Then there exist two utility profiles $U, V$ such that $F(U) \ne F(V)$ but $U_i(F(U)) \le U_i(F(V))$ for all $i$ such that $U_i \ne V_i$. Let $E = \{ i \mid U_i \ne V_i \}$. Since $M$ implements $F$, there must be two action vectors $u$ and $v$ in $A$ such that $M(u) = F(U)$ and $M(v) = F(V)$, and each are in the solution concepts at the respective utility profiles $U$ and $V$: $u \in C_M(U)$ and $v \in C_M(V)$. Since $F(U) \ne F(V)$, we have $v \notin C_M(U)$ and $u \notin C_M(V)$. At the utility profile $U$, consider the Stackelberg ordering with the elements in $E$ leading the elements in $P \setminus E$. The allocations that result from this Stackelberg game must be the allocation $F(U)$, but the allocation $F(V)$ is different from $F(U)$ yet gives all the elements in $E$ at least as good outcomes. Recall the relationship between $S^{O}$ and $O$, so we can apply the solution concept $O$ to the players in $E$, assuming that the agents in $P \setminus E$ are responding to these plays. The solution concept $O$ applied to the game played by the agents in $E$ contains the point $u_E$. Therefore it must also contain $v_E$, since the payoffs for $v_E$ Pareto dominate the payoffs for $u_E$, and therefore none of the strategies in $v_E$ are overwhelmed. Thus the point $v$ must be included in the solution set $C_M(U)$, which contradicts our earlier result.

Note that the coalitional aspects of the $O$ solution concepts, and hence the coalitional requirements of SCSP, did not arise because of some explicit notion of collusion among agents in our distributed setting. They arose because of the asynchrony, where there could be multiple agents with long timescales even though there was no explicit collusion.

Our next result is a slight extension of the original observation, due to d'Aspremont and Gérard-Varet, on Stackelberg-solvable games.
Theorem. If a social choice function $F$ is C-implementable then $F$ must be SSP.

Proof. Assume to the contrary that there exists a social choice function $F$ that is C-implementable, with $(M, A)$ as the implementing mechanism, but for which there exist $U$ and $V_i$ such that $F(U) \ne F(V_i, U_{-i})$ but $U_i(F(U)) \le U_i(F(V_i, U_{-i}))$. Write $V = (V_i, U_{-i})$. Without loss of generality, consider a strict Stackelberg ordering in which agent $i$ leads. All points in the solution concept $C_M(U)$ are mapped by $M$ into $F(U)$; similarly, all points in $C_M(V)$ are mapped by $M$ into $F(V)$. Let $u$ be some Stackelberg equilibrium with this order in $C_M(U)$ and let $v$ be some Stackelberg equilibrium with this order in $C_M(V)$. Then the payoff for agent $i$ at $v$ is at least as great as the payoff at $u$, and we can choose agent $i$'s learning algorithm to favor $v_i$ over $u_i$, such as in the prioritized stage learners. Since $v_{-i}$ is the Stackelberg response to $v_i$ and $u_{-i}$ is the Stackelberg response to $u_i$, by construction $v$ must also be in the set $C_M(U)$, as it is a possible outcome of the learning process. This contradicts our original assumption.
The following is a standard result about SSP and Maskin Monotonicity; for convenience we include the trivial proof.

Theorem. If a social choice function $F$ is SSP then $F$ must be Maskin Monotonic.

Proof. Consider an SSP social choice function $F$ and some $V_i$ such that $U_i(x) \le U_i(F(U)) \Rightarrow V_i(x) \le V_i(F(U))$ for all allocations $x$. Assume to the contrary that $F(U) \ne F(V_i, U_{-i})$. Because $F$ is SSP we must have $V_i(F(V_i, U_{-i})) > V_i(F(U))$ and $U_i(F(V_i, U_{-i})) < U_i(F(U))$. This contradicts our assumption about $V_i$.
This leads immediately to the following Corollary.

Corollary. If a social choice function $F$ is C-implementable then $F$ must be Maskin Monotonic.

Note that in certain restricted domains Maskin Monotonicity implies WCSP (see Shenker, and Barberà and Jackson); see Dasgupta, Hammond and Maskin for a definition of a monotonically closed domain.

Theorem. If a social choice function $F$ is Maskin Monotonic and the domain is monotonically closed then $F$ is WCSP.

This leads to the following Corollary.

Corollary. If $F$ is C-implementable and the domain is monotonically closed then $F$ is WCSP.

Note that many of the most notable strategyproof mechanisms do not have any degree of resistance to coalitional manipulations. For instance, the Clarke-Groves (Clarke, Groves) mechanisms are not in general weakly coalitionally strategyproof.
Examples

We now discuss a few SD-implementable and O-implementable social choice functions and their implementing mechanisms.

The first example is the uniform social choice function; its SD-implementability follows trivially from the theorem on the uniform game. Since the uniform mechanism relies only on the peaks of the preferences, there is no real distinction between the uniform game and the uniform social choice function. Thus that theorem implies that the uniform social choice function is SD-implementable, because the direct mechanism is itself SD-solvable. While we have shown that the direct mechanism is not itself O-solvable, it remains an open question as to whether the uniform social choice function is O-implementable through some other mechanism.
The second example comes from the congestion game with strictly monotonic (increasing in $r$, decreasing in $c$) and concave utilities $U_i(r_i, c_i)$ and a strictly convex constraint function $f$. The serial mechanism (see Moulin and Shenker for a description) can be described as follows. When the agents are labeled so that $r_i \le r_{i+1}$ for all $i$, the congestions $c_i$ are recursively determined by the equation

$$\sum_{i=1}^{k} c_i + (n - k)\, c_k = f\Big( \sum_{i=1}^{n} \min(r_i, r_k) \Big).$$
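The recursion can be solved one agent at a time, from the smallest rate upward. The sketch below assumes the recursion reads $\sum_{i \le k} c_i + (n-k)\,c_k = f(\sum_i \min(r_i, r_k))$ with agents sorted by rate, and uses $f(x) = x^2$ purely as an illustrative strictly convex constraint; the function name `serial_congestion` is hypothetical.

```python
def serial_congestion(rates, f):
    """Congestion c_i assigned to each agent by the serial recursion.

    Solving the recursion for c_k gives
        c_k = (f(sum_i min(r_i, r_k)) - sum_{i<k} c_i) / (n - k + 1),
    with agents processed in increasing order of their rate."""
    n = len(rates)
    order = sorted(range(n), key=lambda i: rates[i])
    congestion = [0.0] * n
    assigned = 0.0  # running sum of c_i over agents already processed
    for k, agent in enumerate(order, start=1):
        r_k = rates[agent]
        capped_total = sum(min(r, r_k) for r in rates)
        c_k = (f(capped_total) - assigned) / (n - k + 1)
        congestion[agent] = c_k
        assigned += c_k
    return congestion

def square(x):
    """Illustrative strictly convex constraint function (an assumption)."""
    return x * x

print(serial_congestion([1, 2, 3], square))
# Raising only the largest rate leaves the smaller agents' congestion unchanged.
print(serial_congestion([1, 2, 10], square))
```

In the second call only the largest rate changes, and the congestion of the two smaller agents is unaffected. This insulation of an agent from rates above its own is the seriality property that the following theorem relies on.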
We have the following theorem.

Theorem. The serial mechanism with strictly monotonic and concave utilities and a strictly convex constraint function $f$ is a generalized serial game.

Proof. Consider some $i$ and some $j \ne i$. The payoff $G_i(r) = U_i(r_i, c_i(r))$ is monotonic in $r_j$, since $c_i(r)$ is monotonic in $r_j$ and $U_i$ is monotonic in $c_i$. Moreover, from the construction it is clear that $c_i(r_j, r_{-j}) = c_i(r'_j, r_{-j})$ for all $r_j, r'_j \ge r_i$, so the same holds for the payoff $G_i(r)$. Consider the function $g(x) = G_i(x, r_{-i}) = U_i(x, c_i(x, r_{-i}))$. Since $U_i$ is convex and the opportunity set $(x, c_i(x, r_{-i}))$ is strictly concave, there is a unique point of tangency, and so the game has unique best replies $BR_i(r_{-i})$. Lastly, consider some agent $j$ such that $r_j > BR_i(r_{-i})$. Varying $r_j$ changes the opportunity set $(x, c_i(x, r_{-i}))$, but the tangent at $x = BR_i(r_{-i})$ remains unchanged. Therefore the best reply remains unchanged.

Therefore the serial mechanism is O-solvable in this setting. Define the serial social choice function as the allocation resulting from the unique Nash equilibrium of this game. This social choice function is obviously O-implementable.

Corollary. The serial mechanism with strictly monotonic and concave utilities and a strictly convex constraint function $f$ is O-solvable.
Discussion

One might ask why, if one can only implement strategyproof social choice functions, does one bother with the mechanism design paradigm at all. Why not always use the direct method of asking for utilities to be revealed and then applying $F$, instead of using an indirect mechanism $M$? In the former case you can utilize the focal point nature of truthful revelation and can implement all strategyproof social choice functions, whereas in the indirect method one can only implement a narrower class of social choice functions (SSP and Maskin Monotonic).

The serial mechanism is a formalization of the fair queuing packet scheduling algorithm in routers (Demers et al.); variants of fair queuing are currently implemented on some Internet routers.
While in many cases it is obviously preferable to use direct methods, there are occasions where indirect mechanisms are preferable. In some contexts the utility functions are very complex and revealing them involves significant communication overhead. For instance, the performance of a video application is not a simple function of, say, the average and variance of the packet delays; instead, the performance depends on the exact string of packet delays. In such cases, the ability to use indirect mechanisms, with their substantially less complex signaling, is a significant advantage.
In addition, and perhaps more fundamentally, in many network situations the agents do not know their exact utility functions. Agents can compare two different levels of service and decide with which they are happier, but they cannot abstractly represent these tradeoffs without actively experiencing them. For instance, the optimal tradeoff between bandwidth and delay in a video stream for an agent will depend on many details of the particular instance (such as the particular scene being transmitted, the exact delay distribution, and the clarity of speech), and quantifying this relationship beforehand is quite impractical. To use an analogy, specifying the exact utility function of such network applications is much like trying to specify the optimal contrast setting on a television set. Since the optimal contrast setting depends on many details, such as the lighting in the room and the darkness of the scene, most users could not accurately articulate the underlying utility function; most of us merely turn the contrast knob until we notice that any deviation from that setting produces worse results. Similarly, in many networking situations users can compare their