Learning and Implementation on the Internet

Eric J. Friedman

Department of Economics, Rutgers University

New Brunswick, NJ

Scott Shenker

Xerox PARC

Coyote Hill Road, Palo Alto, CA

August

Abstract

We address the problem of learning and implementation on the Internet. When agents play repeated games in distributed environments like the Internet, they have very limited a priori information about the other players and the payoff matrix, and the play can be highly asynchronous. Consequently, standard solution concepts like Nash equilibria, or even the serially undominated set, do not apply in such a setting. To construct more appropriate solution concepts, we first describe the essential properties that constitute "reasonable" learning behavior in distributed environments. We then study the convergence behavior of such algorithms; these results lead us to propose rather nontraditional solution concepts for this context. Finally, we discuss implementation of social choice functions with these solution concepts.

We would like to thank Roger Klein and Hervé Moulin for useful discussions, and seminar participants at Princeton, Rutgers, Stonybrook, and UBC for helpful comments. This research was supported by NSF Grant ANI. Email: friedman@econ.rutgers.edu, shenker@parc.xerox.com

Introduction

The Internet is rapidly becoming a centerpiece of the global telecommunications infrastructure, and someday it may well provide all of our telecommunication needs. In this paper we consider the Internet as an exercise in resource sharing, where the sharing occurs on several different levels. Most importantly, Internet users share access to the underlying transmission facilities themselves. With the "best-effort" nature of the Internet, where resources are not reserved and all packets are serviced on a first-come-first-serve basis, one user's usage can affect the quality of service seen by another user. In addition, the Internet provides a seamless way of accessing remote services, such as databases or web servers, which are themselves examples of shared resources where usage can induce congestion. For example, delays on the World-Wide Web have increased significantly in recent years (it is now sometimes waggishly referred to as the "World-Wide Wait"), and service providers such as America Online have faced lawsuits over their access delays. Both of these are cases where overuse has resulted in deteriorating service quality for all users.

In each case, aggressive applications or users get more than an equal share of these shared facilities, and so the Internet is likely to be a place where noncooperative game theory is particularly relevant. For instance, web browsers that open more TCP connections receive more bandwidth at the expense of less opportunistic users of the Internet. Similarly, users that modify their TCP implementation to be less responsive when congestion is detected can obtain much larger shares of the bandwidth (Demers et al.).

Footnote: This effect does not occur on telephone networks because the underlying transmission facilities are not shared on a packet-by-packet basis; bandwidth is reserved for each call, and so the quality of service perceived by a particular user is independent of the presence of other callers.

Footnote: TCP stands for Transmission Control Protocol, and this is the protocol that governs the bandwidth usage in data transfers. In particular, TCP is designed so that flows slow down their rate of transmission when they detect congestion.

Footnote: In the Netscape Navigator browser the maximum number of TCP connections can be set by the user, so that this form of greediness is under user control.

For the Internet architecture to be viable in the long term, it must not be vulnerable to such greedy users, and thus it must be designed with incentives in mind. Network architects are increasingly addressing the incentive properties of their designs. For example, McCanne et al. discuss the incentive issues in packet dropping algorithms and their implications for layered multicast (see Bajaj et al. for a continuation of this line of investigation). Nagle was the first to explore the incentive issues inherent in packet scheduling in network routers, and this has been the focus of much subsequent research (see, for example, Sanders; Demers et al.; Shenker; Korilis and Lazar; and Korilis et al.). Resnick et al. have proposed market-based solutions to the problem of network address allocation and route advertisements. Networks with multiple qualities of service raise interesting incentive issues, and this has prompted much of the recent interest in pricing and accounting for computer networks; a few examples include Cocchi et al.; Clark et al.; MacKie-Mason and Varian; Murphy and Murphy; and Mendelson and Whang.

For similar reasons, many theorists have begun applying game theory to the Internet (see, for example, Ferguson; Ferguson et al.; Gupta et al.; Hsiao and Lazar; and Korilis et al.). Most of these analyses assume that the appropriate solution concept (the set of asymptotic plays in a repeated game) is contained within the set of Nash equilibria. To the contrary, in this paper we argue that Nash equilibria are not necessarily achieved as a result of learning in the Internet setting and that, in fact, distributed settings like the Internet require a dramatically different solution concept.

Because of the Internet's increasing role in the telecommunications infrastructure, it is important that we achieve socially desirable allocations of service in the Internet. This will require understanding the nature of learning and convergence in the Internet and other distributed settings, so that we can identify the appropriate solution concept. Learning and convergence, and its implications for mechanism design in the Internet, is the subject of this paper.

For a concrete example of the incentive issues with which we are concerned, consider the scenario, which is more fully described in Shenker, where several Internet users are simultaneously sending data across a particular link. The delay experienced by the packets is a function of the load (the bandwidth consumed by various users) on the link. Each user's utility function $U_i$ depends on her average bandwidth (transmission rate) $r_i$ and on the average queuing (or congestion) experienced by her packets, $c_i$. Users control their bandwidth usage $r_i$, and the network determines the vector of average queuing $c$ as a function of the set of bandwidths, i.e., $c = C(r)$, where the function $C$ reflects the particular packet scheduling algorithm used by the network and must obey the sum rule that $\sum_i C_i(r) = f(\sum_i r_i)$ for some constraint function $f$, because the overall average queue length is independent of the order in which packets are served. This congestion game, where each player's usage can impose delay on other players, can be modeled as a normal form game with the bandwidths $r_i$ being the actions and the payoffs given by $U_i(r_i, C_i(r))$. The equilibria (or, more generally, the solution concept) of this congestion game will determine the allocation of network bandwidth among these users. Since network designers can choose the scheduling algorithm $C$ in order to attain some socially desirable outcome, the solution concept of this congestion game has significant practical ramifications.
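As a minimal sketch of the congestion game just described, the following Python fragment assumes a single FIFO link that behaves like an M/M/1 queue, so that user i's average queue is r_i / (capacity - total load). The FIFO/M/M/1 form, the function names, and the linear utility with its `delay_cost` parameter are all illustrative assumptions, not taken from the text. The point is only to show one allocation function C that obeys the sum rule.

```python
from typing import List


def fifo_congestion(rates: List[float], capacity: float = 1.0) -> List[float]:
    """Average queue c_i per user under an assumed FIFO, M/M/1-style link."""
    total = sum(rates)
    if total >= capacity:
        raise ValueError("total load must stay below the link capacity")
    # Under FIFO every user sees the same delay, so queues are proportional to rates.
    return [r / (capacity - total) for r in rates]


def total_queue(load: float, capacity: float = 1.0) -> float:
    """Constraint function f: total average queue as a function of total load."""
    return load / (capacity - load)


def utility(rate: float, congestion: float, delay_cost: float = 0.5) -> float:
    """Hypothetical utility U_i(r_i, c_i): increasing in rate, decreasing in congestion."""
    return rate - delay_cost * congestion


rates = [0.1, 0.2, 0.3]
queues = fifo_congestion(rates)
payoffs = [utility(r, c) for r, c in zip(rates, queues)]

# Sum rule: the individual queues add up to f(total load).
sum_rule_gap = abs(sum(queues) - total_queue(sum(rates)))

# A greedier third user raises the congestion seen by the first user.
queues_after_greed = fifo_congestion([0.1, 0.2, 0.5])
```

Under this assumed allocation, one user's larger rate raises every other user's queue, which is the externality that makes the setting a game; a different scheduling discipline would correspond to a different function C with the same total.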

This congestion game also arises in many other settings. For instance, $r_i$ could be the usage level of a shared database (such as a video or text library) or web server, with $c_i$ being the processing delay; or $r_i$ could be the average time connected to an online service, with $c_i$ being the expected time required to connect (see Friedman for a discussion of this and other games arising on the Internet). These examples suggest that there are many game-like situations arising in distributed systems like the Internet. We call them distributed systems because the users are geographically dispersed and are accessing the resource through the network. The games in these distributed systems share the feature that the agents interact only through their joint use of a shared resource; for instance, the only form of interaction between users in the congestion game is that their packets happen to collide somewhere inside the network. Thus, it is quite likely that the agents have little or no information about each other. Moreover, the users probably know very little about the detailed nature (e.g., capacity, latency, etc.) of the resource itself; to use the congestion game again as a specific example, users have little knowledge of the underlying network topology and characteristics, so they cannot always distinguish between delays due to the characteristics of the underlying network (e.g., speed-of-light delays in transmission links) and delays due to the behavior of other users (queuing delays in routers).

In this paper we ask two questions. What is the appropriate solution concept for the congestion game and other games that arise in distributed settings? Given this solution concept, can we design scheduling or sharing algorithms to achieve the allocations we desire?

If the congestion game were a canonical one-shot game with common knowledge, then one could invoke standard solution concepts such as Nash equilibria or the rationalizable set. However, the congestion game is neither a one-shot game nor one with common knowledge. Many data transmissions persist for a significant period of time, and the users are able to adjust their bandwidth at any point while transmitting. Thus, the congestion game should be modeled as a repeated game rather than a one-shot game. Moreover, because users are geographically distributed and have no direct contact with, or knowledge of, each other, solution concepts based on common knowledge are not applicable here. We instead must look at the process of learning through repeated play. Traditional approaches to learning through repeated play, which we discuss more fully later in the paper, typically assume the players use their experience to build a model of the likely actions of other players and then play some form of best response, either exact best response as in the original fictitious play approach (Robinson) or a stochastic best response as in Fudenberg and Levine. Bayesian learning, as in Kalai and Lehrer, is a particular example of this approach, whereby agents begin the game with priors about the expected play of individuals and then update those beliefs as they observe the play. Many of the analyses of such learning algorithms suggest that they result in either Nash or correlated equilibria (see, e.g., Kalai and Lehrer; Fudenberg and Levine; Foster and Vohra).

These results, while important to understanding the rational foundations of equilibria, do not apply in distributed settings due to the factors we discussed above. In terms of the underlying game, users know their own action space and can observe, after some delay, the payoff resulting from a particular action at a particular time, but do not know their own payoff function nor any other player's payoff function, and cannot observe the actions of other players. Given this very limited information, users have no sense of what other players are doing, nor any idea of what would constitute a best reply if they did, and so users cannot adopt a fictitious play approach. Instead, we analyze the case in which users (or the software on the machines they are using) employ simple learning algorithms that experiment with various actions and then focus their play on the actions providing the highest payoffs. This is similar in spirit to the stimulus-response approaches studied in Roth and Erev, Borgers and Sarin, and Erev and Roth. Often the work on such learning approaches concentrates on matching the results of the learning algorithm to experimental data. Our focus here is quite different and has three distinct components.

First, we want to understand the nature of learning in settings like the Internet, where players are geographically distributed and have little or no information about each other and the underlying game. In the next section we discuss some of the relevant considerations arising in the Internet and other distributed settings. We then present criteria that all reasonable learning algorithms in this setting must satisfy. The key components are optimization, monotonicity, and responsiveness.

Second, we address the asymptotic result of play among a set of reasonable learners. In a previous paper (Friedman and Shenker) we analyzed one particular family of learning algorithms with these properties. Here we attempt to identify the class of all learning algorithms that can be considered reasonable, and then study the union of asymptotic plays for all populations of reasonable learners. In other words, if all we know is that agents are reasonable, what predictions can we make about their asymptotic play? We find that the asymptotic play always resides in the serially unoverwhelmed set (defined later in the paper). We are not able to show, and in fact do not believe, that reasonable learning algorithms actually visit, with significantly large probability, all points in the serially unoverwhelmed set. We discuss this in detail later in the paper.

Third, we discuss the implications of these convergence results for mechanism design and explore which social choice functions can be implemented in these distributed settings. We find that social choice functions implementable in this decentralized setting must be strictly strategyproof (any deviation that leads to a different outcome results in lower utility for the deviator) and Maskin Monotonic. Moreover, any social choice functions implementable with the serially unoverwhelmed solution concept (which, as stated above, is a superset of the true solution concept) must be strictly coalitionally strategyproof (a definition is given later in the paper). We then present examples of some implementable social choice functions.

Learning in Distributed Systems

In this section we first informally discuss the nature of learning algorithms appropriate for the Internet. We then formalize these notions of what makes a "reasonable" learning algorithm into precise definitions and provide some examples. It is important to emphasize that we are not claiming that these algorithms are justified by being truly rational or provably optimal in any precise sense. We are merely trying to model the kinds of adaptive learning procedures that either are currently, or could potentially be, used on the Internet.

Learning in the Internet

The game-theoretic properties of the Internet are common to many other distributed settings, but for concreteness in the paragraphs below we focus solely on the Internet context. There are four main aspects of the Internet that are particularly relevant to our game-theoretic formulation.

First, as discussed above, players typically have extremely limited information. They do not know who the other players are, or even how many, and they do not observe other players' actions. In addition, because they are not aware of the underlying network topology or characteristics, players typically don't know the payoff functions; that is, they don't know how their payoffs depend on the actions of other players, or even on their own actions. The only information available to users is their own actions and the resulting payoffs, and they may only learn the payoffs after some delay. This lack of information is actually a central design principle of the Internet. The architectural notion of layering (see Tanenbaum for a textbook discussion of layering of network protocols) is intended to allow computers to utilize the network without knowledge of the underlying physical infrastructure and to allow applications (such as email or file transfer) to operate without detailed knowledge of the current level of network congestion.

Second, players do not carry out any sophisticated optimization procedures. Often the actual decisions about resource utilization are made by computer programs (either the application or lower level protocols like TCP) without direct human intervention. Thus, the learning algorithm must be embedded in software, and that limits the flexibility and complexity of the optimization procedure. Moreover, such learning algorithms are intended to be portable (i.e., usable on any machine located anywhere) and so are expressly designed to not rely on the details of the specific context. In particular, a Bayesian approach based on updating priors is not realistic here, since the layering of network protocols ensures that any priors would be quite ill-informed. Even in cases where the resource decisions are made directly by the human user, it seems unlikely that the user will be making complex optimization decisions given the very meager information available. Typically, the user's actions in such cases are limited to adjusting parameter settings for underlying programs (such as adjusting the number of TCP connections a browser opens) rather than actually exercising detailed control.

Third, there is no synchronization and no natural unit of time on the Internet. Players do not all update their actions at the same time, as they do in the standard repeated game literature. To the contrary, the rate at which the updating occurs can vary by several orders of magnitude. Note that there is a delay between when agents update their action and the time they notice a change in their payoff; for the congestion game described in the Introduction, this delay is typically on the order of a round-trip time (the time it takes a packet to get to its destination and the acknowledgment to make the return trip). These round-trip delays vary from the microsecond range (if the destination is on the same Ethernet; the delay is due to operating system overhead) to the millisecond range (if the destination is across the country; the delay is the speed-of-light delay of propagation). Standard control-theoretic results suggest that control loops should not update faster than the round-trip time. Since updating rates are tied to round-trip times, the variation in update rates will be quite large. Moreover, some learning agents will be people, not programs, and their updating rates are most likely on the order of at least seconds, if not significantly slower. Thus, the standard model of a repeated game, in which players are synchronized, can be misleading in the Internet context.

Footnote: For example, TCP does not know the content of the data it is conveying, nor does it know anything about the network over which the data is flowing. It merely waits for signs of congestion and responds appropriately.

Footnote: In TCP the receipt of each data packet is acknowledged by an ACK packet sent from receiver to sender.

Fourth, and finally, it is neither the long term nor the short term, but the medium term (as defined by Roth and Erev) that is relevant. Players typically use the system for many time units (measured in their appropriate timescale); however, the nature of their payoff function changes fairly often as new players enter the system or as the system configuration changes (often due to equipment failures, for which the network automatically compensates). The important point here is that players do not know directly that the payoff function has changed; they can only observe the payoffs they get, and so cannot distinguish between when another player changes her action and when the environment itself changes. This requires the learning algorithm to always be responsive, which we define more formally below.

These four properties characterize what we call distributed systems. The natural question, then, is what forms of learning algorithms are appropriate in distributed systems. We claim that in such settings there are three primary requirements that one would expect of any reasonable learning algorithm. One requirement is that against a fixed payoff function (when there are no other players, just nature) the player learns to achieve the optimal payoff. This seems to be the most basic requirement of an optimizing learning algorithm, and it would be hard to justify any algorithm that did not satisfy this criterion. Another reasonable requirement is that the learning algorithm be monotonic in the payoffs; that is, if we modify the payoff function by raising the payoff for a certain action, the probability of the agent playing that action should not decrease. This is similar to the "Law of the Effect", which is well known in the psychology literature and is discussed by Roth and Erev as a fundamental property in experimental learning. Finally, many of the learning algorithms in the literature decrease the rate at which they respond with time; in settings like the Internet, where the payoff function changes frequently as players come and go, agents must always be prepared to respond to a new situation in a bounded amount of time. Thus, there are three informal components of being a reasonable learner: optimization, monotonicity, and responsiveness. We now proceed to make these concepts precise, but first we must describe our basic model.

Footnote: Lagunoff and Matsui have also made a similar point about the role of asynchrony in the set of sequential equilibria for repeated games.

Model

In this section we describe a simple model to capture the key elements of a distributed setting such as the Internet. Consider a game with a set $\mathcal{P}$ of players, $\|\mathcal{P}\| = P$, where each player $i$ has a finite action set $A_i$. The payoffs of the game are described by a time-dependent and possibly stochastic function $G : A \times \mathbb{R}_+ \to [0,1]^P$, with $A = A_1 \times \cdots \times A_P$, where for convenience, and to simplify notation, we have restricted payoffs to $[0,1]$. The game is played in continuous time: $a_i(t) \in A_i$ denotes player $i$'s action at time $t$, and $G_i(a(t), t)$ denotes her instantaneous payoff flow at time $t$. A stable game is one in which $G(a, t) = G(a, t')$ for all $t, t'$, i.e., there is no time dependence. For stable games we will drop the last argument from the notation and just write $G_i(a(t))$. Later we will refer to games that are stable after time $t_0$, which means that $G(a, t) = G(a, t')$ for all $t, t' > t_0$.

While the payoffs arise from the game structure, each individual player is completely unaware of the presence of other players and of the payoff function $G$. Thus, from the perspective of an individual player, we need only model the fact that they receive some payoff flow $\pi_i(t)$. This payoff flow $\pi_i(t)$ can depend explicitly on time (perhaps in a stochastic manner) and on all the players' previous actions.

Footnote: To guarantee that integrals are well defined, we assume that on any finite time interval $[s, t]$, $G(a, t)$ is continuous in $t$ except at perhaps a finite number of places.

Preferences over different payoff flows can be extremely complex. Here we restrict our attention to a simple case by assuming that players have a fixed sampling rate, evaluating average payoffs at discrete and deterministic epochs. In our model a player has discrete time horizons $t_i^n$ at which she evaluates her payoff as some (possibly weighted) average of her flow payoffs, and then at the end of the epoch can decide to alter her action. We let $a_i(n)$ be player $i$'s action chosen at $t_i^n$, which is then maintained until $t_i^{n+1}$. Note that there is no synchronization in the system, so the time horizons are different for each player $i$; we can have, and generally do have, $t_i^n \neq t_j^n$ for $i \neq j$. We say that play is synchronous if $t_i^n = t_j^n$ for all $n$ and all $i, j$.

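Define

$$\pi_i(n) = \int_{t_i^n}^{t_i^{n+1}} G_i(a(t), t) \, d\omega_i\!\left( \frac{t - t_i^n}{t_i^{n+1} - t_i^n} \right)$$

where $\omega_i$ is some continuous nondecreasing cumulative distribution function with $\omega_i(0) = 0$ and $\omega_i(1) = 1$. Thus $\pi_i(n)$ is a weighted average of $\pi_i(t)$ over the time period $[t_i^n, t_i^{n+1}]$.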
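To make the epoch payoff concrete, here is a small Python sketch under the simplifying assumption of uniform weighting, in which case the weighted average reduces to the ordinary time average of the payoff flow over the epoch. The function name `epoch_average`, the midpoint-rule approximation, and the example flow are hypothetical choices made for illustration only.

```python
from typing import Callable


def epoch_average(flow: Callable[[float], float],
                  t_start: float,
                  t_end: float,
                  steps: int = 1000) -> float:
    """Approximate the uniformly weighted average of a payoff flow over one epoch."""
    if not t_end > t_start:
        raise ValueError("an epoch must have positive length")
    width = (t_end - t_start) / steps
    total = 0.0
    for k in range(steps):
        # Midpoint rule: sample the flow at the centre of each sub-interval.
        midpoint = t_start + (k + 0.5) * width
        total += flow(midpoint)
    return total / steps


def example_flow(t: float) -> float:
    """Hypothetical payoff flow that drops when congestion starts at t = 1."""
    return 0.8 if 1.0 > t else 0.2


# Two players with different, unsynchronized epochs observe the same flow differently.
fast_player_payoffs = [epoch_average(example_flow, 0.0, 1.0),
                       epoch_average(example_flow, 1.0, 2.0)]
slow_player_payoff = epoch_average(example_flow, 0.0, 2.0)
```

The fast player sees two distinct epoch payoffs, while the slow player sees only their blend, which illustrates why asynchronous sampling matters for what each learner can infer.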

Let $h_i(n)$ be player $i$'s history up to period $n$, consisting of the action history $h_i^A(n)$ and the payoff history $h_i^\pi(n)$, and let $H_i(n)$ be the set of all possible histories for player $i$. The payoff $\pi_i(n)$ is a function of the time $n$, the current action $a_i(n)$, and the history $h_i(n)$, and may also be stochastic. For the remainder of this section we will write $\pi_i(a_i(n), n)$. In this formulation the other players are modeled as part of the environment; the fact that their behavior is affected by agent $i$'s history of play is incorporated in $\pi_i$'s dependency on $h_i$.

Footnote: Thus we are not considering anything as complex as the equilibria of repeated games in continuous time (see, for example, Stinchcombe for a discussion), but are only attempting to analyze the behavior of fairly simple learners.

Footnote: Note that the decision points are often determined by the technology and are typically not treated as strategic variables. Nonetheless, we believe that most of our results are still valid for learners that strategically manipulate their decision points, given noisy payoffs and delays in observation. In particular, the ability to manipulate decision points should not decrease the set of outcomes that arise.

Agent $i$ uses a learning algorithm to choose $a_i(n)$. Since in this setting agents cannot observe the actions of other agents, their choice of $a_i(n)$ can only depend on the history of agent $i$'s own play and own payoffs, $h_i(n)$. With such little a priori information about the game, players must experiment with various actions in order to learn about the resulting payoffs. Such experimentation is often best done with randomized algorithms. While randomization is often extremely useful, it can be unlucky, and so we must allow for occasional "mistakes" (suboptimal behavior). We will consider learning to be sufficiently optimal if it is almost optimal almost all of the time. This type of learning is known as PAC learning ("probably approximately correct" learning) and can be extremely powerful. See, for example, Valiant or Blumer et al.

Given a payoff function $\pi_i$ and history $h_i(n)$, we must be able to compare the value of different actions. One method, which we choose for its simplicity, is to compare the means of the random variable $\pi_i(a_i, n)$. Formally, we will write $\pi_i(a_i, n) \succ \pi_i(b_i, n)$ to mean that $E[\pi_i(a_i, n)] > E[\pi_i(b_i, n)]$.

In the remainder of this section we will consider a single player and thus will drop the subscript $i$, which will be implicit. Let $E_N$ be an environment defined over $N$ periods, i.e., a payoff function defined on $n \le N$.

Reasonable Learning Algorithms

As we discussed above, the three requirements of a reasonable learner are optimization, monotonicity, and responsiveness. These informal concepts can be made more precise with the help of the following definitions.

The requirement of optimization is simply the notion that in an environment with a single action that is better (provides higher payoffs) than any other, the learning algorithm should eventually learn to almost always take this optimal action. Certainly one cannot imagine reasonable learning algorithms doing otherwise.

Definition. An environment $E_N$ is "simple" with optimal action $a^* \in A$ if for all $n \le N$,

$$\pi(a^*, n) \succ \pi(a, n) \quad \text{given } h(n)$$

for all $a \in A$ such that $a \neq a^*$ and all $h(n) \in H(n)$.

A reasonable learner should be able to learn the optimal action in such games if $N$ is sufficiently large. A learning algorithm, or learner, $L$ is a mapping from histories $h(n)$ to probability distributions over actions in $A$. Given an environment $E_N$, this induces a probability distribution over the set of all histories $H(n)$, which we will denote $\mu_{L, E_N}$.

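Definition (Optimization). A player is an $(\epsilon, \delta, N_0)$ simple learner if for any $E_N$ which is "simple" with optimal action $a^* \in A$ such that $N > N_0$, and any $m$ such that $N \ge m > N_0$, there exists a subset $\tilde{H}(m) \subset H(m)$ such that $\mu_{L, E_N}(\tilde{H}(m)) > 1 - \delta$ and for all $h(m) \in \tilde{H}(m)$, $\Pr[a(m) = a^* \mid h(m)] > 1 - \epsilon$.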

Simple learners can find the optimal action in simple games, in the sense of playing the optimal action with high probability for most histories, where "most" is defined by the probability distribution induced by the learner. Note that the probabilistic formulation of the above definition, with the allowance of occasional "mistakes", is necessary since a randomized learning algorithm can be unlucky.

Now we attempt to capture the more general idea of responsiveness, or medium term learning. Let $H_x(m)$ denote the set of all histories on $[x, m]$.

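Definition (Responsiveness). A learner is $(\epsilon, \delta, N_0)$ responsive if, given any environment $E_N$ and any $m$ with $N \ge m > N_0$ such that $E_N$ restricted to $[m - N_0, m]$ is "simple" with optimal action $a^*$, there exists (for $E_N$ restricted to $[m - N_0, m]$) a subset $\tilde{H}_{m - N_0}(m) \subset H_{m - N_0}(m)$ such that $\mu_{L, E_N}(\tilde{H}_{m - N_0}(m)) > 1 - \delta$ and for all $h(m) \in \tilde{H}_{m - N_0}(m)$, $\Pr[a(m) = a^* \mid h(m)] > 1 - \epsilon$.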

Being $(\epsilon, \delta, N_0)$ responsive requires that the learner respond to changes in the environment within a bounded time $N_0$; that is, in any period of length $N_0$ during which the environment has been "simple", the learning algorithm must converge (in a probabilistic sense) to the optimal action.
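The idea of responding within a bounded time can be illustrated with a toy learner. The sketch below is not an algorithm from the paper and is not claimed to satisfy the formal definitions; it is a hypothetical learner that explores with a small fixed probability and otherwise plays the action with the best average payoff over a sliding window of recent periods. Because old observations fall out of the window, the learner can recover after the optimal action changes. The class name, window length, exploration rate, and payoff values are all assumptions made for the illustration.

```python
import random
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class WindowedLearner:
    """Toy learner: fixed exploration rate plus a bounded memory of recent payoffs."""

    def __init__(self, n_actions: int, window: int = 50,
                 explore: float = 0.1, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.explore = explore
        self.history: Deque[Tuple[int, float]] = deque(maxlen=window)
        self.rng = random.Random(seed)

    def choose(self) -> int:
        # Experiment at a constant rate, so the learner never stops sampling.
        if not self.history or self.explore > self.rng.random():
            return self.rng.randrange(self.n_actions)
        totals: Dict[int, float] = {}
        counts: Dict[int, int] = {}
        for action, payoff in self.history:
            totals[action] = totals.get(action, 0.0) + payoff
            counts[action] = counts.get(action, 0) + 1
        # Otherwise play the action with the highest mean payoff in the window.
        return max(totals, key=lambda a: totals[a] / counts[a])

    def observe(self, action: int, payoff: float) -> None:
        self.history.append((action, payoff))


def run(learner: WindowedLearner,
        payoff_at: Callable[[int, int], float],
        periods: int) -> List[int]:
    """Play the learner against an environment and record the chosen actions."""
    actions = []
    for n in range(periods):
        action = learner.choose()
        learner.observe(action, payoff_at(n, action))
        actions.append(action)
    return actions


def payoff_at(n: int, action: int) -> float:
    """Environment that is simple on each half, with the optimal action switching."""
    best = 2 if n >= 1000 else 0
    return 1.0 if action == best else 0.2


actions = run(WindowedLearner(n_actions=3), payoff_at, periods=2000)
first_phase = sum(a == 0 for a in actions[500:1000]) / 500
second_phase = sum(a == 2 for a in actions[1500:2000]) / 500
```

In this sketch the learner plays the currently optimal action in most periods of both phases. A learner that averaged over its entire history, or that reduced its exploration rate over time, would take progressively longer to notice the switch, which is the behavior the responsiveness requirement rules out.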

Note that responsiveness is strictly stronger than being a simple learner. For example, consider the following quasi-static environment, in which every $\tau$ periods the optimal action may change, but in between changes the environment is "simple". Let $I(n)$ be the indicator variable which is 1 when the agent chooses the optimal action in time period $n$ and 0 otherwise. We consider the case where these stable intervals can vary, and so we can let $\tau$ be a random variable with mean $\bar{\tau}$.

Theorem. In the quasi-static environment,

m

X

lim lim I t

m

m

t

almost surely for any responsive learner.

Pro of Let rN for r and consider a p erio d of length where the en vironmen tis

p p

simple With probabilit y reater g than r the p erio d is longer than rN hen T for that

p

period E I T r r Note that this b ound is indep nden e t f o a ll previous

p P

m

p erio ds and since with probabilit y r the b ound holds w eget ilm I t

m

t

m

Footnote: Most adaptive learning algorithms in the literature (Fudenberg and Levine; Erev and Roth; Borgers and Sarin) are not responsive because, as time goes on, they become less reactive to changes in their environment. In theory, Bayesian-type learners (Kalai and Lehrer; Foster and Vohra) could satisfy responsiveness by including the possibility of switching in the priors. Because the space of all possible environmental changes is huge, and players are ill-informed about their probabilities, this would result in an algorithm that is extremely difficult to implement and completely impractical.

on xistsp p

r r r almost surely and taking the limit as r completes

the pro of

Note that nonresponsive learners do not satisfy this theorem. For example, "no-regret" learners, such as those in Foster and Vohra, do quite badly: $\lim_{m \to \infty} \frac{1}{m} \sum_{t=1}^{m} I(t)$ can be on the order of $1 / |A|$.

Our next definitions formalize a notion of monotonicity, or the "Law of the Effect" (Thorndike). First we define what it means for one history to be "better" with respect to an action.

Deition Given two histories h n and h n we say t hat h n is higher r esp e ct to

A A A

action a A if h n h n dn h n h n whenever h n a and

m m m

A

h n h n whenever h n a

m m m

Deition onotonicit y Ale arner is monotonic if for any p air of histories h n h n

such that h n is higher with r esp e ct to a A than h n hen t

Prob a n a j h n Prob a n a j h n

Combining these definitions, we can now precisely define what we consider to be a reasonable learning algorithm in distributed settings like the Internet.

Definition. A learner is a reasonable learner if it is monotonic and responsive.

Note that monotonicity allows us to make statements about environments that are not simple. For example, in an environment there may be several actions, any one of which may be optimal depending on exogenous effects, but there may also be actions that are clearly suboptimal. In this case we can show that such clearly suboptimal actions will be played rarely by a reasonable learner.

This is demonstrated numerically in Greenwald, Friedman and Shenker.

Theorem Consider an envir onment E N ssume A that ther e is an action a A and a

set of actions A A such that al l ctions a i n A ar e always worse han t a a n

a n h n for al l a A If a player is a r e asonable le arner with N then

for any m with N m N ther e exists a subset H m H m such that H m

L E N

a nd for al l h m H m Pr a m A j h m

Pro of Consider the en vironmen tin iwch h as h the s ame pa y o as E N when either he t

action a a or a A but has zero a p y o for an y other action This n e vironmen tis imple

with optimal action a nd th us Pr a m A j h m b y heorem T H o w ev er for all

a A this en vironmen t is higher than E N Th us in E N the p robabilit y of pla

a A can not b e larger han t this

Examples

Each of the three notions (optimizing, monotonicity, and responsiveness) that comprise our definition of reasonableness seems, on the surface, to be a quite natural and undemanding requirement. Surprisingly, few formal learning algorithms in the economics literature satisfy this definition of reasonableness. Many of the learning algorithms in the standard literature do not have the responsive property; typically their responsiveness to changes in payoffs, or their level of experimentation, diminishes over time. We also note that there are no deterministic algorithms which are responsive.

We now present two examples of reasonable learning algorithms.

Stage Learners

The first is a stage learner, which is a very simple reasonable learner. The stage learner SL learns in stages of fixed length. During each stage, the action that had the highest average payoff in the previous stage (with ties broken randomly) is played with high probability, while the remaining actions are each played with the same small probability. The choice of action in any time period is i.i.d. Note that the stage learner almost always plays the action with the highest expected value (based on the payoffs observed in the last stage) but experiments with sufficient frequency to notice changes in the environment and react to them.

With suitable choices of parameters, Roth and Erev's model of learning is reasonable.

A slight variant of this statement is proven in Fudenberg and Levine.
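The stage learner described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the parameters `stage_length` and `epsilon`, the rule that experimentation is spread uniformly over the non-favourite actions, and the choice to ignore actions that were never sampled in a stage are all hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


class StageLearner:
    """Illustrative stage learner; all parameter names are assumptions."""

    def __init__(self, actions, stage_length, epsilon, seed=0):
        self.actions = list(actions)
        self.stage_length = stage_length   # number of periods per stage
        self.epsilon = epsilon             # total experimentation probability
        self.rng = random.Random(seed)
        self.favourite = self.rng.choice(self.actions)
        self.totals = defaultdict(float)   # payoff sums within the current stage
        self.counts = defaultdict(int)     # play counts within the current stage
        self.t = 0

    def act(self):
        """Play the favourite, or experiment with one of the other actions."""
        others = [a for a in self.actions if a != self.favourite]
        if others and self.rng.random() < self.epsilon:
            return self.rng.choice(others)
        return self.favourite

    def observe(self, action, payoff):
        """Record a payoff; at the end of a stage, re-pick the favourite."""
        self.totals[action] += payoff
        self.counts[action] += 1
        self.t += 1
        if self.t % self.stage_length == 0:
            # Assumption: actions never sampled in this stage are ignored.
            averages = {a: self.totals[a] / self.counts[a]
                        for a in self.actions if self.counts[a] > 0}
            best = max(averages.values())
            # Ties are broken randomly.
            self.favourite = self.rng.choice(
                [a for a, v in averages.items() if v == best])
            self.totals.clear()
            self.counts.clear()


def run(learner, payoff_of, periods):
    """Let the learner face a fixed payoff table for a number of periods."""
    for _ in range(periods):
        a = learner.act()
        learner.observe(a, payoff_of[a])


learner = StageLearner(["a", "b"], stage_length=200, epsilon=0.2)
run(learner, {"a": 0.0, "b": 1.0}, 200)   # environment in which "b" is optimal
first = learner.favourite
run(learner, {"a": 1.0, "b": 0.0}, 200)   # the optimal action switches to "a"
second = learner.favourite
```

Because the learner keeps experimenting at a constant rate, it notices the switch within one stage; a learner whose experimentation decays over time would react more and more slowly.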

q

p

Theorem F or suiently smal l SL is an j A j e j A j

r e asonable le arner

Pro of Assume t hat d uring a particular p erio d of length the en vironmen tis imple

p

with optimal action a and Then the stage l earner will ha v e faced a imple en vi

ronmen t during ts i previous stage Dee a n a n E a n h n

Note that not restricting o do es not ct a the stage learner In this en vironmen t

p

E a n h n and E a n for a ll a a oteN that Var a n

for all a A ince s

Dee a tage s to b e ormal if eac h action has b een pla y ed at least j A j times

The exp e cted n um ber of pla ys for an y p articular action is greater than j A j while the

q

standard deviation of the n b er of t imes it is pla y ed i s less than j A j h us from

the cen tral limit theorem the probabilit y of a n a ction not b eing pla y ed at least j A j

q q

times is less than er f j A j whic h s i b ounded b y exp j A j so the probabilit y

of a stage b eing normal is greater than

j A j

q q

X

j A j

j A j j

exp j A j exp j j A j

j j A j j

j

When j A j is o d d w e c an rearrange t erms to g et

j A j

q q q

X

j A j j A j j

A j exp j j A j exp j j A j j A j

j j A j j j

i

q

A j exp j j A j

exp

um

xp since the terms in he t sum are ll a p sitiv o e for suien tly small When j A j is ev en w e use

q q

j A j j A j

the same argumen t after noting that j A j exp j A j

Dee a o beteh a v erage a p y o for ction a a A o v er a n ormal learning stage

q

The standard deviation of a i s less than j A j while the a v erage i s if a

p

optimal action and less than if a is not ptimal o T h us the probabilit y o f t he optimal

q

p

action ha ving a v erage less than s i l ess than exp j A j since the sequence s a

martingale ee Ho eing or f details This s i also the probabilit y of a nonoptimal

p

action ha ving pa y o greater than Th us the probabilit y of the optimal action ha

q

j A j

the highest pa y o is greater than exp j A j whic h i s reater g than j A j

q

exp j j A j c ompleting the pro o f

Note that if there are two optimal actions then the stage learner will alternate randomly between them. For constructive purposes it is often useful to make this choice deterministic. Consider the set of all strict orderings on $A$. Given an ordering from this set, define the prioritized stage learner to be a stage learner that plays, with the same high probability as before, the highest ranking (according to the ordering) strategy whose average payoff in the last stage was within a small tolerance of the highest average payoff of any strategy; the remaining actions are still played with the same small probability as before.

Note that this modification of the stage learner has no effect for a simple environment, other than slightly increasing the probability that the learner mistakes the action with the highest payoff.

p

Theorem F or suiently smal l and any A SL is an j A j

q

exp j A j r e asonable le arner

Proof. The proof is identical to the previous proof for ordinary stage learners, except for the conditions under which the learner chooses the incorrect optimal action. This may arise when the average payoff for a suboptimal action is sufficiently close to the average payoff for the optimal action, which changes the probability of a mistake slightly.

Responsive Learning Automata

Our second example is the responsive learning automaton (RLA), which was studied in Friedman and Shenker and motivated the analysis in this paper. RLAs are based on algorithms studied in the engineering literature, and have been implemented for many network optimization tasks (see e.g. Chrysalis and Mars; Mason and Gu; and Shrikantakumar). They are also closely related to several models proposed for experimental economic learning (Arthur; Mookerji and Sopher; Roth and Erev). An RLA consists of a probability vector, which can be interpreted as a mixed action: at every decision epoch, with probability $p_a(n)$, action $a$ is played. After action $a$ is played and the payoff $\pi(n)$ is observed, the probability vector $p(n)$ is updated by the following rule:

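$$p_a(n+1) = p_a(n) + \epsilon\,\pi(n) \sum_{b \ne a} c_b(n)\, p_b(n)$$

$$\forall b \ne a:\quad p_b(n+1) = p_b(n) - \epsilon\,\pi(n)\, c_b(n)\, p_b(n)$$

where

$$c_b(n) = \min\left[1,\ \frac{p_b(n) - \epsilon/2}{\epsilon\, p_b(n)\, \pi(n)}\right].$$

We will denote these learners by $\mathrm{RLA}_\epsilon$.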
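A single step of such an update can be sketched as follows. The sketch assumes the rule has the form $p_b \leftarrow p_b - \epsilon\,\pi\,c_b\,p_b$ for every action $b$ other than the one played, with the removed mass handed to the played action and $c_b = \min[1, (p_b - \epsilon/2)/(\epsilon\,p_b\,\pi)]$; it also assumes payoffs lie in $[0, 1]$ and that every probability starts at or above $\epsilon/2$. The function name `rla_update` and the dictionary encoding are hypothetical.

```python
def rla_update(p, played, payoff, eps):
    """One assumed RLA step: p maps action -> probability, payoff in [0, 1]."""
    new_p = dict(p)
    gain = 0.0
    for b, pb in p.items():
        if b == played:
            continue
        if payoff == 0:
            c_b = 1.0  # irrelevant: the step size is zero anyway
        else:
            # c_b caps the step so that p_b never drops below eps / 2
            c_b = min(1.0, (pb - eps / 2) / (eps * pb * payoff))
        delta = eps * payoff * c_b * pb
        new_p[b] = pb - delta
        gain += delta
    # the played action receives exactly the mass removed from the others
    new_p[played] = p[played] + gain
    return new_p


probs = {"a": 0.5, "b": 0.5}
for _ in range(200):
    probs = rla_update(probs, played="a", payoff=1.0, eps=0.1)
```

Under these assumptions the probability of the unrewarded action shrinks geometrically until it reaches the floor $\epsilon/2$, which is what keeps the learner able to react if that action later becomes optimal.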

Theorem F or suiently smal l ther e exist c onstants such that RLA is

an exp r e asonable le arner

Proof. This follows directly from the corresponding theorem in Friedman and Shenker.

Groups of Reasonable Learners

Context and Definitions

Our discussion of learning algorithms considered an environment seen by a single player, which consisted of a general payoff function with no restriction on how these payoffs were generated. Here we return to the original situation where this payoff function arises from a game $G$ involving $P$ players (with $\mathcal{P}$ denoting the set of players), each with action space $A_i$. When focusing on a single player in a general environment, results like the theorems of the previous section allow us to make some statements about the asymptotic nature of play of a reasonable learner, as defined in the previous section. Similarly, in this section we assume that each of the $P$ players is a reasonable learner and ask what the asymptotic nature of the joint play is. This asymptotic set of actions is the solution concept appropriate for learning in distributed systems like the Internet. Note that the solution concept must contain the eventual play of all possible sets of learning algorithms. We are not interested in results for one particular learning algorithm, even if the set of such learners has particularly nice convergence properties. All we can assume is that learners are reasonable, not that they conform to some specific algorithm.

Milgrom and Roberts define an adaptive learner as one who eventually eliminates actions that are strictly dominated in pure actions over time. They prove that when a group of adaptive learners play together, they converge to the serially undominated set, the result of the iterated deletion of these dominated actions.

In this section we parallel those results, with two main distinctions. First, we only assume that players are reasonable learners, as defined in the previous section. In this setting it is not true that players always eventually abandon dominated actions. Players cannot explicitly identify dominated actions, because they do not know the payoff matrix; furthermore, we show that in some cases dominated actions can even be played in equilibrium. Thus we can only impose the requirement of reasonableness, as we have defined it, on learners. Second, since in this distributed setting no action can ever be completely discarded, the convergence to any set of actions (or the elimination of others) is only approximate. The fact that all actions remain in play forever makes the analysis of the joint play quite delicate.

As we shall see, a set of reasonable learners need not converge to the serially undominated set. The main result of this section is that a set of reasonable learners eventually plays in the serially unoverwhelmed set, the set remaining after iterated elimination of overwhelmed actions. We do not believe this characterization is tight, in that there are some games where no set of reasonable learners will eventually play with significant probability in some portions of the serially unoverwhelmed set. However, the serially unoverwhelmed solution concept is the tightest local set based solution concept possible, where local set based solution concepts are the natural generalizations of the serially undominated set. Moreover, we present another two sets, the Stackelberg correlated set and the Stackelberg undominated set, and raise the question as to whether the true solution concept lies between these two.

Before proceeding we require two definitions.

Deition oAl c al dominanc eop er ator on a table s game A is a set of monotone

i A A

i i

op er ators one for e ach i he notation r esents the act f t hat

i

i

dep ends on player i p ayo matrix G We denote this set of op er ators by wher e

i

i

for e ach i and A Note that an op er ator is monotone if

i i

i

A i i

i

for such that if enh

i i

Each local dominance operator describes the set of possible strategies agent $i$ might employ as a function of the possible plays the other agents might make. For each local dominance operator we can define the related solution concept.

Recall that a reasonable learner, in order to remain responsive, can never completely stop playing an action, since exogenous effects could modify the payoff, making that action optimal at some later time.

Duggan and Le Breton study the fixed points of local dominance operators, which they denote Dominance Structures.

Definition. Given a local dominance operator, the associated local set based solution concept (LSB) is the operator defined by the limit of repeated applications of the local dominance operator to the action set $A$.

One standard LSB is defined using dominated actions. For an action subset $S$, the local dominance operator is given by

$$\{\, a_i \in S_i \mid \nexists\, b_i \in S_i \ \text{s.t.}\ G_i(b_i, a_{-i}) > G_i(a_i, a_{-i})\ \ \forall a_{-i} \in S_{-i} \,\}.$$

We will denote the LSB for this operator by $D^\infty$, and so $D^\infty(G, A)$ denotes the serially undominated set of the game.

The relevant LSB for decentralized games is based on unoverwhelmed actions. The local dominance operator is

$$\{\, a_i \in S_i \mid \nexists\, b_i \in S_i \ \text{s.t.}\ G_i(b_i, a_{-i}) > G_i(a_i, a'_{-i})\ \ \forall a_{-i}, a'_{-i} \in S_{-i} \,\}.$$

We will denote the LSB that results from the iteration of this operator by $O^\infty$ and refer to $O^\infty(G, A)$ as the serially unoverwhelmed set of the game. We will occasionally abbreviate this as $O^\infty(G)$ when the action subset is the entire action set, and will further abbreviate the notation to $O^\infty$ when the game is also unambiguous. Similarly, when the game is unambiguous and the action subset is the entire action set, we will use the notation $O^k$ to denote the $k$-th iteration of the unoverwhelmed local dominance operator applied to the entire action set.

For comparison, note that one action dominates another if all payoffs for the one are greater than those for the other, for all given fixed sets of other players' actions. In contrast, one action overwhelms another if all payoffs, over all sets of other players' actions, for the one are greater than all payoffs, over all sets of other players' actions, for the other. Domination compares the vectors of payoffs term by term; overwhelming compares the entire bag of payoffs available, and thus is a much stronger requirement.
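The contrast between the two tests can be made concrete with a short Python sketch of iterated elimination. The function names, the dictionary encoding of payoffs, and the payoff numbers in the 2x2 example are illustrative assumptions, not values from the paper.

```python
from itertools import product


def dominates(better, worse):
    """Term-by-term test: better beats worse against every opposing profile."""
    return all(x > y for x, y in zip(better, worse))


def overwhelms(better, worse):
    """Bag test: the worst payoff of better beats the best payoff of worse."""
    return min(better) > max(worse)


def surviving_actions(payoffs, actions, eliminates):
    """Iterate simultaneous deletion for all players until nothing changes.

    payoffs[i] maps a full action profile (a tuple) to player i's payoff;
    actions[i] is the list of player i's actions.
    """
    current = [list(a) for a in actions]
    changed = True
    while changed:
        changed = False
        nxt = []
        for i, own in enumerate(current):
            others = [current[j] for j in range(len(current)) if j != i]
            opp_profiles = list(product(*others))

            def row(a):
                # Player i's payoffs from action a against each opposing profile.
                return [payoffs[i][opp[:i] + (a,) + opp[i:]] for opp in opp_profiles]

            keep = [a for a in own
                    if not any(eliminates(row(b), row(a)) for b in own if b != a)]
            changed = changed or len(keep) < len(own)
            nxt.append(keep)
        current = nxt
    return current


# Hypothetical game: T dominates B and L dominates R, but nothing is overwhelmed.
actions = [["T", "B"], ["L", "R"]]
payoffs = [
    {("T", "L"): 3, ("T", "R"): 1, ("B", "L"): 2, ("B", "R"): 0},  # row player
    {("T", "L"): 3, ("B", "L"): 1, ("T", "R"): 2, ("B", "R"): 0},  # column player
]
undominated = surviving_actions(payoffs, actions, dominates)
unoverwhelmed = surviving_actions(payoffs, actions, overwhelms)
```

In this example iterated dominance leaves a single profile, while the unoverwhelmed set is the whole game, because the worst payoff of T (1) does not exceed the best payoff of B (2).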

The limit exists since the operator is a monotone set operator and $A$ is finite.

For any game, the serially unoverwhelmed set contains the serially undominated set, which contains the set of rationalizable actions.

Convergence Results

Giv en a nite set of r easonable learners L f L g where eac h L is an

m i i i i i

reasonable learner let L amx max max N max w consider a

i i i i i i i i

rep eated game pla y ed b y these pla y ers with pa y o functions G a t nd elt be het

i

i

largest time in terv al b et w een pla y er i decision ep o c hs the smallest and et l L

i

L

max L min D ee L N L and et l j A j j A j

i i i

i i

L

Note that a set of learners $L$ and a game induce, by their play, a measure over histories $H$. We now present our main result, which is that decentralized learning leads to the serially unoverwhelmed set.

Theorem Given any game G a which s i table s after time t and any Ther e

exists such that or f any s N dn anyste L of r e asonable le arners

playing satisfying L d n L L the players c onver ge

to O in the fol lowing sense ther e exists a s et H s H s with H s such

L

that Pr a s O G j h s

k

Pro of Fixing a game A ho ose an a ction a O and dee

i

i

A

a max min G b max G a

ki i i i i i i i

k k k

b O

i b O a O

i i

i

i i

k

and let in f a j a O g and note t hat if an action is eliminated for

ki ki i i ki

i

pla y er i at round k a nd otherwise etL ax

ik ki ki

Dee time in terv al k b y I t k L N L t k L N L Note that in

k

I all pla yisin O e p ro ceed inductiv Assume that for n a y s in p erio d I learner i

k

k

is pla ying in O with probabilit y greater than L If m L then learner i is

i

ely

and

No

k

pla ying in an en vironmen tin whci h ll a actions n ot in O m ust b e exceeded in exp ected

k

v alue b y hose t a ctions in O and th w e can apply heorem T o t sho w that the learner

learns to pla y these actions in p erio d k with probabilit y ess l han t with probabilit y

i

greater than The probabilit y that the pla y er do es this at ev ery in terv al in p rio e d

i

k is greater than L L Th us the robabilit p y that a ll learners do this is greater

than m L L Finally the probabilit y that this o ccurs o v er all stages is g reater than

m j A j L L since there c an b e at most j A j stages required to reac h O Th us if

j A j L L this sho ws that con v ergence will o ccur

This theorem immediately applies to Stage Learners and RLAs.

Corollary Ther e e xists some such that any gr oup L of Stage l e arners and RLAs

p

satisfying L ax min c onver ge to the erial s y l u noverwhelme d s et

i i i i

wher ec onver genc e is dee d as n i The or em

The above results hold for stable games. However, analogous results hold even with time-varying games. For instance, consider, as we did earlier for games against nature, the quasi-static game in which every so often the payoff functions may change but, in between changes, the game is constant. Let $I(t)$ be the indicator variable which is 1 when the current action is in the serially unoverwhelmed set, $a(t) \in O^\infty(G)$, and 0 otherwise. Let the length of the stable intervals be a random variable with a fixed mean. Then the theorem above also implies convergence in this game.

Corollary In the quasitatic game just describ e d

m

X

lim lim I t L L

m

m

t

for any gr oup of le arners satisfying the c onditions in the pr evious the or em

As discussed earlier, nonresponsive learners, including no-regret learners, do very poorly in quasi-static environments.

Synchronous Play

Interestingly, if we restrict to sets of players who play synchronously ($t_i^n = t_j^n$ for all $i, j, n$), then we revert to the standard results: play converges to $D^\infty(G)$.

Theorem L et L b e a set of r e asonable le arners playing sync hronously a game G a

which is stable after time t T hen for any ther e exists such that for

any s N f L nd L L then ther e e xists a set

H s H s with H s such that Pr a s D G j h s

L

Proof. The proof of this theorem is analogous to that of the preceding convergence theorem, after noting that in a synchronous game the expected payoff of any dominated action is always less than that of the dominating action, since for player $i$, $a_{-i}^t$ is governed by a random distribution that does not depend on the choice of $a_i^t$, although they may be correlated ex post.

We do not know if there is a smaller set (the support of the set of correlated equilibria, or the rationalizable strategies) for which this result continues to hold.

Minimal Solution Concepts

The main convergence theorem establishes a bounding set on the asymptotic play. The true solution concept may be somewhat smaller. Let $C(G, A) \subseteq A$ be the true solution concept, that is, the union of the sets of strategies played with non-negligible probability by all possible groups of reasonable learners. More formally, we have the following definition.

Definition. $C(G, A)$ is the smallest set for which the main convergence theorem is true when $O^\infty$ is replaced by $C$.

First we will show that $C(G, A)$ contains some Stackelberg equilibria. Given a strict order on the players, define the Stackelberg game to be the extensive form game with payoffs given by $G$ in which the first player in the order moves first, then the second, and continuing up to the last player.

Definition. A Stackelberg equilibrium with respect to a given order is a subgame perfect equilibrium of the corresponding Stackelberg game.

The key aspect of the following proof is the observation that, by separating their timescales, players behave as if they are playing a Stackelberg game. Note that the role of leader is not intentional on the part of the learner (in fact, the learner is not even aware that it is the leader) and is merely the product of learning slowly. Thus learning slowly, usually perceived as a disadvantage, provides the benefits of being a Stackelberg leader. This is an example where superior sophistication, such as faster computer processors or better learning algorithms, may lead to inferior results.
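To see how the order of moves can change the outcome, the following Python sketch computes the subgame perfect outcome of a Stackelberg game by backward induction. It is only an illustration: the function name `solve_stackelberg`, the payoff encoding, the tie-breaking rule (the first listed action wins ties), and the example payoffs are assumptions, not taken from the paper.

```python
def solve_stackelberg(payoffs, actions, order, fixed=()):
    """Return the action profile reached by backward induction.

    payoffs[i] maps a full profile (tuple indexed by player) to i's payoff;
    order lists the players from leader to last follower;
    fixed holds the (player, action) pairs already chosen by earlier movers.
    """
    if len(fixed) == len(order):
        chosen = dict(fixed)
        return tuple(chosen[p] for p in sorted(chosen))
    mover = order[len(fixed)]
    best = None
    for a in actions[mover]:
        # Outcome if the mover picks a and all later movers respond optimally.
        outcome = solve_stackelberg(payoffs, actions, order, fixed + ((mover, a),))
        if best is None or payoffs[mover][outcome] > payoffs[mover][best]:
            best = outcome
    return best


# Hypothetical coordination game: each player prefers a different joint choice.
actions = [["A", "B"], ["A", "B"]]
payoffs = [
    {("A", "A"): 2, ("B", "B"): 1, ("A", "B"): 0, ("B", "A"): 0},  # player 0
    {("A", "A"): 1, ("B", "B"): 2, ("A", "B"): 0, ("B", "A"): 0},  # player 1
]
leader0 = solve_stackelberg(payoffs, actions, order=[0, 1])
leader1 = solve_stackelberg(payoffs, actions, order=[1, 0])
```

Whoever moves first obtains their preferred outcome in this example, which mirrors the point that a slowly updating learner can end up in the leader's role.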

Theorem. For every ordering of $P$ there exists some $a \in C(G, A)$ such that $a$ is a Stackelberg equilibrium of the game with respect to that order.

j

Pro of Consider a group of iden tical rioritized p stage l earners SL eac hwtihan y ordering

j

A Cho ose suc h that

j

p

min fj G a G a j G a G a g

i i i i

a A i

n P i

Set t n heer d e here d x e is the l east in teger greater than x so

i i

i

the pla y ers up date at ed but diren t in terv als with the rst pla y er in the S tac k elb erg

ordering b eing the slo w est L astly et t for t t t

t o s pla y ers a v erage o v er pa y o o nly uring d the second half of their time in al

n

No w consider pla y P She c ho oses an action at time t nda in the p n e

P

n

n

in terv al b et w t and t no pla y er b e fore her i n he t order will c hange their

P P

een

er

terv

for and

ub is

tincurren t action and pla y P ill w con v erge to a b est reply ith high probabilit y

n

curren t action a b y time t us rom f her p o in t of view the game i s a

P P

P

n

Stac k elb erg one here w P follo ws her since she only ev aluates pa y o b et w een t

P

P

n

and t Con tin uing bac ards through the ordering w e see that e ac hpla y er follo ws the

P

pla y er b efore her and that p la y will con v erge to the sp ecid equilibria

Lastly, the main convergence theorem shows that $C(G, A) \subseteq O^\infty(G, A)$. As we discuss later, we suspect that this inclusion is strict for some games. However, if we restrict ourselves to LSBs, then the theorem is tight in the following sense.

Theorem L et b e an LSB such that O for al l A and

A nd C G A G A for al l A Then O G A G A for al l

A

Pro of Assume that there exists a ame g A suc h that O G A G A us

k k k k

there m ust exist some k suc htath A O A ut A O A where w e d rop

k

the sup erscript A here and b elo w for notational con v enience A

k

O A Cho ose some b A that do es app ear in O tu tino n No w construct

i i i i i i

the game in the follo wing manner F or all a A set G a G a

i

i

Cho ose a unction f r A A h that or f all a b

i i i i

r a argmin G a

i i i i

a

i i

and

r b argmax G b

i i i i

a

i i

No w for all j i dee G a when a r a G a ewhn a

i i i i i i i i

j j

and G a for all other cases

i i

j

By construction O A and th b O A nice G G

i i i i i

i

Since A O A this implies that b A b y onotonicit m y

i i

us

suc

Let

Th

er

kw

Th

the to er

whic h also implies that b G Ho w ev er w ewill no wsho w that b G pro

i i

the theorem

Construct a Stackelberg ordering where player $i$ is the leader (i.e., first in the ordering), and let her be a prioritized stage learner where action $b_i$ is the top priority action. Let the other players be in any order and assume that they are ordinary stage learners.

Now we use the same construction as in the previous theorem to show that the outcome of this game is the strategy profile $(b_i, r(b_i))$, since all followers will play $r_j(a_i)$ in response to the leader's action, and the leader will then see a game in which action $b_i$ has the highest payoff, by construction of the function $r$ and the fact that it is not overwhelmed. Note that this payoff may not be strictly highest; however, the action $b_i$ will be chosen because of the priority ordering used.

A Tighter Solution Concept

While the $O^\infty$ solution concept is the tightest LSB solution concept, it is probably not the tightest solution concept for decentralized learning. That is, we expect that there are games for which $C(G, A) \ne O^\infty(G, A)$. Consider the following game:

L R

T

B

$O^\infty$ of this game is the set of all actions. It seems intuitive, although we have no formal proof, that any pair of decentralized learners will converge to (T, L). In Greenwald, Friedman and Shenker, simulations of RLAs and Stage Learners were consistent with this intuition. Since our goal here is to describe the possible outcomes of a game played by decentralized learners, it is important to find the tightest solution concept to which decentralized learners converge.

We now describe a class of solution concepts which is suggested by the proofs above. We do not know whether any of these is the correct solution concept; we introduce these solution concepts to formulate a testable open question whose resolution would greatly improve our understanding of reasonable learners in distributed settings.

Stackelberg Solution Concepts

Consider some solution concept that is deemed appropriate for synchronous games. We now define a solution concept, based on it, that is more appropriate for games with arbitrary degrees of asynchrony. Given a finite set of players $P$, define a non-strict play order to be an ordered partition $(r_1, \ldots, r_m)$ of $P$: the groups $r_k \subseteq P$ are pairwise disjoint and their union is $P$. Consider the set of all non-strict play orders. For a group $r$ in an order, we distinguish the players in the groups that precede $r$ from the players in the groups that follow $r$.

Given a play order, define the associated Stackelberg game, where players move according to that order. Each player takes the actions of the players earlier in the order as given and plays accordingly. Thus, a player sees the behavior of the earlier players as fixed and sees the later players as reacting to their moves. Each player's elemental action in this Stackelberg game is actually a response function, in which an action of the underlying normal form game is chosen as a function of the actions of the previous (in terms of the ordering) players. That is, for an agent $i$ in group $r$, a strategy in the Stackelberg game is a response function from the actions of the earlier players to $A_i$. Consider the set of all such Stackelberg strategies for the ordering, and its restriction to a group $r$. Let $\mathrm{Out}_i$ be the action chosen by player $i$ when play is defined by a given vector of strategies. For example, with three players moving one after another, the first player's outcome is a fixed strategy independent of the other players' moves, the second player's outcome is her response function evaluated at the first player's outcome, and so on. Given a vector of strategies, the payoff is $G$ evaluated at $\mathrm{Out}$.
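The outcome map described above can be sketched directly. In the Python sketch below, a play order is a list of groups, and each player's strategy is a function of the actions already chosen by players in earlier groups; the function name `outcome`, this encoding, and the example strategies are illustrative assumptions rather than the paper's notation.

```python
def outcome(order, strategies):
    """Resolve the actions actually played, given response functions.

    order is a list of groups (lists of players), earliest movers first;
    strategies[p] maps a dict of earlier players' actions to p's action.
    """
    played = {}
    for group in order:
        # Members of a group move together: they see only earlier groups.
        seen = dict(played)
        for p in group:
            played[p] = strategies[p](seen)
    return played


# Hypothetical three-player example with players moving one after another.
strategies = {
    1: lambda seen: "T",                               # fixed: no one moves earlier
    2: lambda seen: "L" if seen[1] == "T" else "R",    # responds to player 1
    3: lambda seen: seen[1] + seen[2],                 # responds to players 1 and 2
}
result = outcome([[1], [2], [3]], strategies)

# With players 1 and 2 in the same group, player 2 cannot condition on player 1.
simultaneous = outcome([[1, 2]], {1: lambda seen: "T", 2: lambda seen: "R"})
```

The response functions here are pure, in line with the restriction that these functions are not mixed strategies.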

For any group $r$ in the order, fix the actions of the players who move before $r$ and the response functions of the players who move after $r$, and consider the game played by the players in $r$. They see the strategies of the earlier players as fixed, and see the strategies of the later players as a function of their joint action. Thus, to the players in $r$, the game has payoffs of the form

Note that we do not allow these response functions to be mixed strategies.

r

G a a G a a

r r r r r r r r

Giv en an order and an y solution concept G A eno w ee d the set G A in

r

ductiv ely or all a A et

r r

r

G A a G a a

r r r r r r

G

r r

where the union is o v er all resp onse functions whose i mage Out iesin eth set

r r

G A Let G A b e t f o suc htath a G A a

r r r r r r r

F or a strategy set B dee the et s of reac hable a ctions b y

R B f a A j B s a Out g

W e prop ose that the set R G A represen ts a p ossible s olution concept or f a g ame

with ordering e can no w d ee the set of Stac k elb erg actions denoted b y S G A

of a game A

Deition The set of Stackelb er g actions S G A of a game A is given by

S G A R G A

P

A Conjecture and a Question

A possible conjecture is that the correct solution concept for reasonable learners in a distributed setting is the Stackelberg set built on the correct solution concept for reasonable learners in a synchronous game. If this is true, then the only impact of asynchrony is in separating timescales, as in the proof of the Stackelberg equilibrium theorem above, while if it is false, it implies that the effect of asynchrony is more subtle.

First we note some relationships between the various solution concepts.

Lemma F or any olution s c onc ept G A the fol lowing hold

i G A S G A

ii G A G A for al l G A S G A S G A al l G A

O

iii S G A O G A

Pro of i This follo ws imme diately since the order P thi P ws that the

Stac k elb erg v ersion of m ust on c tain

ii This follo ws imme diately rom f the eition d of S G A

O

iii The relation O G A S G A olol ws from art p i nd a w eno wsho wtath het

rev erse holds Assume a O G A from the deition f o O a is o v erwhelmed b y

i i

another action then i t m ust b e o v erwhelmed for an y subset o f the other pla y ers actions

O

Therefore a S G A ro ving the qualit e y

i

i

The Stackelberg solution concepts are a way to take a synchronous solution concept and generalize it to a setting with arbitrary asynchrony. Thus, we propose the Stackelberg solution concepts as a possible candidate for a decentralized solution concept $C(G, A)$. The obvious question, then, is what synchronous solution concept is appropriate. Foster and Vohra show that the appropriate solution concept for calibrated learners is the set of correlated equilibria. Let $\mathrm{Corr}(G, A)$ represent the support of the set of correlated equilibria. If reasonable learners, rather than calibrated ones, also fill out the space of correlated equilibria, then the following conjecture may be true.

The standard forms of calibrated learning algorithms are not responsive, so the question is whether the Foster and Vohra result holds for the responsive versions of such learning algorithms; such algorithms were simulated in Greenwald, Friedman and Shenker.

forCorr

Conjecture S G A G A

We call $S^{\mathrm{Corr}}(G, A)$ the Stackelberg correlated set. In essence, this conjecture says that while we do not know what the correct solution concept is for synchronous games, we suspect that it contains the set $\mathrm{Corr}(G, A)$, and we further conjecture that the set $S^{\mathrm{Corr}}(G, A)$ captures the effects of asynchrony. On the other hand, the set $D^\infty(G, A)$ is usually taken to be a superset of the actual asymptotic play in synchronous games. If that is indeed true, then it leads to the following question.

Question. Is $C(G, A) \subseteq S^{D}(G, A)$?

We call $S^{D}(G, A)$ the Stackelberg undominated set. We have noted before that the possible disparity in learning rates leads to Stackelberg-like phenomena. If the only effect of asynchrony added to our definition of reasonability is to produce these Stackelberg-like phenomena, then this conjecture will be true, and in fact we would have that $C(G, A)$ is the Stackelberg set built on some synchronous solution concept. We leave this as an open question which requires further investigation.

Example

To get a more concrete sense of these solution concepts, recall the game discussed at the beginning of this section:

L R

T

B

Note that for the above game there are three orders: $\{1,2\}$; $\{1\},\{2\}$; and $\{2\},\{1\}$. For the order $\{1,2\}$ we just have the original game, which is dominance solvable with actions (T, L), while for $\{1\},\{2\}$, if player 1's action is fixed then player 2's only undominated strategy is (T → L and B → L), and after restricting to this, player 1's only undominated strategy is T. Thus the outcome for this game is (T, L), which is the same outcome for the order $\{2\},\{1\}$ by symmetry. Thus $S^{\mathrm{Corr}}(G, A) = S^{D}(G, A) = \{(T, L)\}$, which is the same as $D^\infty(G, A)$, whereas $O^\infty$ is the entire game.

While our search for a tight solution concept has not yet succeeded, we are not alone. There are few solution concepts which have been proved to be tight for a class of learners. For example, various conditions have been shown to hold for fictitious play, but no tight solution concept is known. The only nontrivial example we know of is the tightness of correlated equilibria for calibrated learners (Foster and Vohra).

Solvability and Mechanism Design

Solvable Games

Often the sets of play in the various solution concepts, such as $S^{\mathrm{Corr}}(G, A)$ or $O^\infty(G, A)$, are quite large, and in those cases one cannot predict with precision the asymptotic play of reasonable learners. There are, however, some games where the outcome is unambiguous. We will call such games solvable.

Deition A game A is Oolvable if j G O G A j Similarly a g ame

Corr D

A is SColvable if j G S G A j it is Dolvable S if j G S G A j t

is Colvable if j G Corr G A j and it is Dolvable f i j G D G A j

Note that solvability does not require that there is a single eventual play, only that there is a single eventual outcome (payoff vector). Because $Corr^\infty(G, A) \subseteq S^{Corr}(G, A) \subseteq O^\infty(G, A)$ and $Corr^\infty(G, A) \subseteq D^\infty(G, A) \subseteq S^{D}(G, A) \subseteq O^\infty(G, A)$, any O-solvable game is both SC-solvable and SD-solvable, and any SD-solvable game is D-solvable and C-solvable (see Lemma).
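The difference between these notions of solvability can be sketched on a small finite game. The block below is a minimal illustration, not the paper's construction: it assumes that "dominated" means strictly dominated by another pure action and that an action $x$ overwhelms $y$ when the worst payoff of $x$ exceeds the best payoff of $y$ (the reading used in the proofs of this section). The payoff numbers and the names `iterated_elimination`, `dominates`, `overwhelms`, and `is_solvable` are hypothetical.

```python
from itertools import product

def iterated_elimination(payoffs, actions, eliminated_by):
    """Repeatedly delete every action that some surviving action eliminates."""
    surviving = [list(a) for a in actions]
    changed = True
    while changed:
        changed = False
        for i in range(len(surviving)):
            others = [surviving[j] for j in range(len(surviving)) if j != i]

            def payoff_row(x):
                # Player i's payoffs for action x against every surviving profile.
                row = []
                for rest in product(*others):
                    profile = list(rest)
                    profile.insert(i, x)
                    row.append(payoffs[tuple(profile)][i])
                return row

            keep = [y for y in surviving[i]
                    if not any(eliminated_by(payoff_row(x), payoff_row(y))
                               for x in surviving[i] if x != y)]
            if len(keep) < len(surviving[i]):
                surviving[i] = keep
                changed = True
    return surviving

def dominates(row_x, row_y):
    # Assumption: strict dominance by a pure action.
    return all(a > b for a, b in zip(row_x, row_y))

def overwhelms(row_x, row_y):
    # x overwhelms y if the worst case of x beats the best case of y.
    return min(row_x) > max(row_y)

def is_solvable(payoffs, surviving):
    # Solvable: all surviving profiles give a single payoff vector.
    return len({payoffs[p] for p in product(*surviving)}) == 1

# Hypothetical payoffs (a prisoner's-dilemma-like game), not taken from the paper.
actions = [["T", "B"], ["L", "R"]]
payoffs = {("T", "L"): (2, 2), ("T", "R"): (0, 3),
           ("B", "L"): (3, 0), ("B", "R"): (1, 1)}

d_set = iterated_elimination(payoffs, actions, dominates)
o_set = iterated_elimination(payoffs, actions, overwhelms)
print(d_set, is_solvable(payoffs, d_set))
print(o_set, is_solvable(payoffs, o_set))
```

In this illustrative game dominance leaves a single profile, while no action is overwhelmed, so the game is D-solvable under these assumptions but not O-solvable.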

Below is an example of a game with varying degrees of solvability as $x$ varies:

L C R

T

M

B x

Depending on the value of $x$, this game is O-solvable; SD-solvable and SC-solvable but not O-solvable; or not even D-solvable or C-solvable.

To illustrate a more general O-solvable game, we define the class of generalized serial games $(G, A)$, following Moulin and Shenker, to be those that have the following five properties for any $i, j$ with $i \ne j$:

1. Ordered action domains $A_i$.

2. Cross-monotonicity: $G_i(a) \ge G_i(a_{-j}, a'_j)$ for any $a'_j > a_j$, $i \ne j$.

3. Seriality: $G_i(a_{-j}, a_j) = G_i(a_{-j}, a'_j)$ for any $a_j, a'_j \ge a_i$, $i \ne j$.

4. Unique best reply: for each $a_{-i}$ there exists an element $BR_i(a_{-i})$ such that $x_i \ne BR_i(a_{-i}) \Rightarrow G_i(BR_i(a_{-i}), a_{-i}) > G_i(x_i, a_{-i})$.

5. Seriality of best reply: $BR_i(a_{-i}) = BR_i(a'_j, a_{-ij})$ for any $a_j, a'_j \ge BR_i(a_{-i})$.

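Theorem. Generalized serial games are O-solvable.

Proof. Since the $O$ operator is monotonic, the iteration process must converge to a nontrivial fixed point. Let this fixed point of $O$ be denoted by $I = I_1 \times \cdots \times I_n$, with $\underline{a}_i$ denoting the minimal element of $I_i$, $\overline{a}_i$ denoting the maximal element of $I_i$, and $\underline{a}$ and $\overline{a}$ denoting the vectors of these extremal elements. Let $MAX_i(x_i) = \max_{a_{-i} \in I_{-i}} G_i(x_i, a_{-i})$ and $MIN_i(x_i) = \min_{a_{-i} \in I_{-i}} G_i(x_i, a_{-i})$. For any $a_{-i} \in I_{-i}$ and for any $x_i \in I_i$, cross-monotonicity gives $G_i(x_i, \underline{a}_{-i}) \ge G_i(x_i, a_{-i}) \ge G_i(x_i, \overline{a}_{-i})$, so $MAX_i(x_i) = G_i(x_i, \underline{a}_{-i})$ and $MIN_i(x_i) = G_i(x_i, \overline{a}_{-i})$. Assume that $I$ is not a singleton, so the set $\{ i \mid \underline{a}_i \ne \overline{a}_i \}$ is nonempty; we can define $i$ as the element in this set with the smallest $\underline{a}_i$. In particular, by seriality, $G_i(\underline{a}_i, \underline{a}_{-i}) = G_i(\underline{a}_i, \overline{a}_{-i})$, so $MIN_i(\underline{a}_i) = MAX_i(\underline{a}_i)$. If there exists some $x_i \in I_i$ such that $G_i(x_i, \underline{a}_{-i}) < G_i(\underline{a}_i, \underline{a}_{-i})$, then $MAX_i(x_i) < MIN_i(\underline{a}_i)$ and so $\underline{a}_i$ overwhelms $x_i$. If there exists some $x_i \in I_i$ such that $G_i(x_i, \overline{a}_{-i}) > G_i(\underline{a}_i, \overline{a}_{-i})$, then $MIN_i(x_i) > MAX_i(\underline{a}_i)$ and so $x_i$ overwhelms $\underline{a}_i$. Thus we must have $G_i(x_i, \underline{a}_{-i}) \ge G_i(\underline{a}_i, \underline{a}_{-i}) = G_i(\underline{a}_i, \overline{a}_{-i})$ and $G_i(x_i, \overline{a}_{-i}) \le G_i(\underline{a}_i, \overline{a}_{-i})$ for all $x_i \in I_i$. Consequently $BR_i(\overline{a}_{-i}) = \underline{a}_i$ and $BR_i(\underline{a}_{-i}) \ne \underline{a}_i$. This contradicts the seriality of the function $BR_i$.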

In a later section we will encounter examples of such generalized serial games. Another solvable game arises when rationing a fixed amount $C$ of some good when all utilities are single-peaked (see, for example, Sprumont). Let $p_i$ be the location of agent $i$'s peak. The uniform game can be defined as follows. Each agent announces a request $a_i$. If $\sum_i a_i \ge C$ then the allocations $q_i$ are given by $q_i = \min[a_i, \lambda]$, where $\lambda$ is the unique value such that $\sum_i q_i = C$. If $\sum_i a_i \le C$ then the allocations $q_i$ are given by $q_i = \max[a_i, \mu]$, where $\mu$ is the unique value such that $\sum_i q_i = C$. In the case where $a_i = p_i$ the resulting allocation reduces to the uniform mechanism.
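The allocation rule of the uniform game can be sketched in a few lines. The block below is an illustrative implementation under stated assumptions: requests are nonnegative numbers, and the thresholds $\lambda$ and $\mu$ are found by a sorting ("water-filling") pass rather than by solving the defining equation directly. The function name `uniform_allocation` is hypothetical.

```python
def uniform_allocation(requests, capacity):
    """Split `capacity` by the uniform rule: min(a_i, lambda) under excess
    demand, max(a_i, mu) under excess supply (a sketch, not a reference
    implementation)."""
    n = len(requests)
    excess = sum(requests) >= capacity
    # Excess demand: settle the smallest requests first.
    # Excess supply: settle the largest requests first.
    order = sorted(range(n), key=lambda i: requests[i], reverse=not excess)
    allocation = [0.0] * n
    remaining = float(capacity)
    for position, agent in enumerate(order):
        equal_share = remaining / (n - position)
        if excess:
            keeps_request = requests[agent] <= equal_share
        else:
            keeps_request = requests[agent] >= equal_share
        if keeps_request:
            allocation[agent] = float(requests[agent])
            remaining -= requests[agent]
        else:
            # Everyone left is capped (or topped up) at the common threshold.
            for other in order[position:]:
                allocation[other] = equal_share
            break
    return allocation

print(uniform_allocation([1, 4, 5], 6))    # excess demand
print(uniform_allocation([0.5, 1, 4], 7))  # excess supply
```

In the first call the threshold plays the role of $\lambda$ and caps the two large requests; in the second it plays the role of $\mu$ and tops up the two small ones.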

Theorem. The uniform game is SD-solvable and SC-solvable, but not O-solvable.

Proof. First we prove that the uniform game is D-solvable. Let $D^\infty = I_1 \times \cdots \times I_P$, with $I_i = [l_i, u_i]$, denote the result of iterated elimination of dominated actions. Note that $l_i \le p_i \le u_i$, since each agent gets the highest payoff by announcing $p_i$. Assume $k$ is such that $l_k < u_k$ and $l_k \le l_i$ for every $i$ with $l_i < u_i$. If $l_k \ge C/P$ then each action vector in $D^\infty$ results in the same allocation, with each agent getting $q_i = C/P$. Assume to the contrary that $l_k < C/P$. If $l_k < p_k$ then $p_k$ dominates $l_k$ (the allocations are monotonic in $r_k$ and are strictly monotonic in the vicinity of $l_k$). If $C/P > l_k = p_k$ then $p_k$ dominates $u_k$ (the allocations are monotonic in $r_k$ and are strictly monotonic in the vicinity of $u_k$). Therefore, by contradiction, there can be no such $k$, and so all sets $I_i$ are merely the singleton $p_i$.

i i

Note that this proof shows that if we held some of the actions fixed (not necessarily at their peak), then the $D^\infty$ set of the game among the remaining players converges to the singleton with each player's peak $p_i$ the only remaining action. Since on each subgame the set $D^\infty$ converges to the same singleton, the construction used in the Stackelberg undominated set also reduces to that singleton.

Next we show that the uniform game is not O-solvable. Denote by $[l(r_i), u(r_i)]$ the set of player $i$'s allocations (not payoffs) resulting from announcing action $r_i$ and letting the other actions vary from $0$ to $\infty$. Both $l$ and $u$ are monotonically increasing in $r_i$, and $l(0) = 0$, $u(0) = C/P$, $l(\infty) = C/P$, $u(\infty) = C$. Because $u(0) = l(\infty)$, $[l(r_i), u(r_i)] \cap [l(r'_i), u(r'_i)] \ne \emptyset$ for all $r_i, r'_i$. The allocation intervals always overlap, and so the payoff sets for any two actions overlap, so there are no overwhelmed actions: $O^\infty$ is the entire strategy space for this game.
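The overlap argument can be checked numerically. The sketch below is illustrative only: it re-implements the uniform rule by bisection on the threshold, approximates "the other actions vary from $0$ to $\infty$" by the two extreme cases (all others request `0` or all request a large number `big`), and relies on the assumption that a player's allocation is monotone in the other requests, so those extremes give the interval endpoints. The names `uniform_rule` and `allocation_range` are hypothetical.

```python
def uniform_rule(requests, capacity, iterations=200):
    """Uniform rule via bisection on the common threshold (a sketch)."""
    excess = sum(requests) >= capacity

    def allocate(threshold):
        if excess:
            return [min(a, threshold) for a in requests]
        return [max(a, threshold) for a in requests]

    low, high = 0.0, float(max(max(requests), capacity))
    for _ in range(iterations):
        mid = (low + high) / 2
        if sum(allocate(mid)) < capacity:
            low = mid
        else:
            high = mid
    return allocate((low + high) / 2)

def allocation_range(request, n_players, capacity, big=1e6):
    """Smallest and largest allocation a player can receive for `request`."""
    others_greedy = uniform_rule([request] + [big] * (n_players - 1), capacity)
    others_silent = uniform_rule([request] + [0.0] * (n_players - 1), capacity)
    return others_greedy[0], others_silent[0]

P, C = 3, 6.0
for r in [0.0, 1.0, 2.0, 3.0, 10.0]:
    lower, upper = allocation_range(r, P, C)
    # Every interval should contain the equal split C / P.
    print(r, round(lower, 6), round(upper, 6))
```

Under these assumptions every interval contains the equal split $C/P$, which is why no announcement can overwhelm another.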

Our final example is that of ordered externality games (Friedman). These are nonatomic games where agents, labeled by a parameter $\theta$, decide to participate (setting $a_\theta = 1$) or not (setting $a_\theta = 0$). If they participate, their payoff depends only on the size of the participating population, and if they don't participate their payoff is zero. The payoffs decrease with the level of participation. Thus, for a given vector of actions, the payoffs are of the form $U_\theta(a_\theta, \bar{a})$, which is nonincreasing in the participation level $\bar{a}$, and $U_\theta(0, \bar{a}) = 0$. It is shown in Friedman that this game is O-solvable if and only if it converges under best-reply dynamics.

For example, consider the congestion game discussed in the Introduction, played by a large number of players. Each player decides whether to send a packet of information. Let $\lambda = \int a_\theta \, d\theta$ be the total number (measure) of players who decide to send a packet. The delay to a player is $D(\lambda)$, which is nondecreasing in $\lambda$, where $\mu$ is the capacity of the link. For an M/M/1 FIFO queue, $D(\lambda) = 1/(\mu - \lambda)$ for $\lambda < \mu$, and $\infty$ otherwise. Thus the payoff to a player $\theta$ who sends a packet is $v_\theta - c_\theta(D(\lambda))$, where $v_\theta$ is the personal value of the packet and $c_\theta$ is the delay cost, which is assumed to be nondecreasing. The payoff is $0$ if the player does not send a packet. For many typical queuing processes this game converges under best-reply dynamics if the capacity of the queue is sufficiently large (see Friedman and Landsberg for details). Thus, in this case, the game is O-solvable. These results also apply to similar congestion games with multiple links and players at different locations, with $\theta$ becoming a vector depending on the type and location of the player.
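The role of capacity in the convergence of best-reply dynamics can be sketched with a finite population standing in for the continuum of players. Everything in the block below is an assumption made for illustration: players revise simultaneously, each of the `n` players has mass `1/n`, the delay follows the M/M/1 formula, and the delay cost is linear. The names `mm1_delay` and `best_reply_dynamics` are hypothetical.

```python
import math

def mm1_delay(load, capacity):
    """Mean M/M/1 delay; infinite once the load reaches capacity."""
    return 1.0 / (capacity - load) if load < capacity else math.inf

def best_reply_dynamics(values, delay_cost, capacity, rounds=100):
    """Synchronous best replies: send iff value exceeds last round's delay cost.

    Returns the final sending pattern and whether a fixed point was reached.
    """
    n = len(values)
    sending = [False] * n
    for _ in range(rounds):
        load = sum(sending) / n            # each player has mass 1/n
        cost = delay_cost(mm1_delay(load, capacity))
        revised = [value - cost > 0 for value in values]
        if revised == sending:
            return sending, True
        sending = revised
    return sending, False

values = [k / 100 for k in range(1, 101)]   # hypothetical packet values
linear_cost = lambda delay: 0.1 * delay     # hypothetical delay cost

large, large_converged = best_reply_dynamics(values, linear_cost, capacity=10.0)
small, small_converged = best_reply_dynamics(values, linear_cost, capacity=0.5)
print(sum(large), large_converged)
print(small_converged)
```

With the large capacity the pattern settles after two rounds; with the small one the population alternates between overloading the link and abandoning it, so this simple dynamic does not converge.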

Implications for Mechanism Design on the Internet

So far in our discussions of learning and convergence we have implicitly assumed that the game is exogenously given. However, in the Internet and in other distributed contexts, one would want to design the game in order to shape the nature of the resulting play and thereby achieve certain social goals. This is the mechanism design or implementation paradigm.

To fix notation, consider an allocation problem with $P$ agents. Let $\mathcal{U}$ denote the domain of utility functions (assumed, for the sake of simplicity, to be the same for each agent) and let $\mathcal{O}$ denote the set of possible outcomes. A social choice function is a mapping $F: \mathcal{U}^P \to \mathcal{O}$. A mechanism is a set of action spaces $A_i$ and a mapping $M: A \to \mathcal{O}$. Associated with each mechanism $(M, A)$ and a utility profile $U \in \mathcal{U}^P$ is a stable game $(G, A)$ defined by $G_i(a) = U_i(M(a))$. We denote by $C_M(U) \subseteq A$ the solution concept for a mechanism $M$ at a particular utility profile $U$. A mechanism $(M, A)$ implements a social choice function $F$ if $M(a) = F(U)$ for all $a \in C_M(U)$.

We now ask which social choice functions can be implemented in a distributed setting. To be more precise, for which $F$ is there a mechanism $(M, A)$ such that $M(a) = F(U)$ for all $a \in C_M(U)$? Since we do not know the exact nature of $C$, we cannot answer this question definitively; however, we do have some partial results. Before presenting these results we need a few definitions.

Note that the set $A$ is not necessarily the natural action space on the network, but is more commonly denoted the message space. For example, in the congestion game $A$ could include a priority request along with a transmission rate.

Implementation in the sense used here, with $M(a) = F(U)$ for all $a \in C_M(U)$, is sometimes called strong implementation in the literature.

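Definition. Consider any pair $U, V \in \mathcal{U}^P$ and define $E = \{ i \mid U_i \ne V_i \}$. $F$ is weakly coalitionally strategy-proof (WCSP) if, when $E$ is nonempty, there always exists some $j \in E$ such that $U_j(F(V)) \le U_j(F(U))$. $F$ is strictly coalitionally strategy-proof (SCSP) if $F(U) \ne F(V)$ implies that there exists $j \in E$ such that $U_j(F(V)) < U_j(F(U))$. $F$ is strictly strategy-proof (SSP) if $F(U) \ne F(U|^i V_i) \Rightarrow U_i(F(U)) > U_i(F(U|^i V_i))$, where $U|^i V_i$ denotes the profile $U$ with agent $i$'s utility replaced by $V_i$. $F$ is Maskin monotonic (MM) if $F(V) = F(U)$ whenever $U_i(x) \le U_i(F(U)) \Rightarrow V_i(x) \le V_i(F(U))$ for all allocations $x$ and all $i$.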

WCSP merely requires that not all members of the deviating coalition can strictly gain by deviating. SCSP requires that there is no other outcome that is equivalent or better, in the eyes of the deviating coalition, to the truthful outcome. SSP requires that for an individual deviator no other outcome is equivalent or better. Thus, for an SSP social choice function $F$, the truth is a strict Nash equilibrium of $F$ (though perhaps not the only Nash equilibrium), while for an SCSP social choice function $F$, the truth is a strict strong equilibrium (though again perhaps not the only one). Note that the definition of SSP implies nonbossiness (in fact, coalitional nonbossiness) when applied to a private goods context. Maskin proved that if $F$ is Nash implementable (in the sense we mean here) then $F$ is Maskin monotonic.
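For a finite domain, the SSP and Maskin monotonicity conditions can be checked by brute force. The sketch below is illustrative and rests on assumptions not in the text: utilities are dictionaries from outcomes to numbers, a profile is a tuple of such dictionaries, and the two example social choice functions (`dictator`, `anti_dictator`) and checker names are hypothetical.

```python
from itertools import product

def is_ssp(F, types, n):
    """Strict strategy-proofness: any unilateral misreport that changes the
    outcome must make the deviator strictly worse off."""
    for U in product(types, repeat=n):
        for i in range(n):
            for V_i in types:
                deviated = U[:i] + (V_i,) + U[i + 1:]
                truthful, lied = F(U), F(deviated)
                if truthful != lied and not U[i][truthful] > U[i][lied]:
                    return False
    return True

def is_maskin_monotonic(F, types, outcomes, n):
    """F(V) = F(U) whenever F(U) does not fall in anyone's ranking from U to V."""
    for U in product(types, repeat=n):
        chosen = F(U)
        for V in product(types, repeat=n):
            premise = all(V[i][chosen] >= V[i][x]
                          for i in range(n) for x in outcomes
                          if U[i][chosen] >= U[i][x])
            if premise and F(V) != chosen:
                return False
    return True

outcomes = ("a", "b")
types = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]   # two hypothetical utility types

def dictator(U):
    return max(outcomes, key=lambda x: U[0][x])   # agent 0's favourite

def anti_dictator(U):
    return min(outcomes, key=lambda x: U[0][x])   # agent 0's least favourite

print(is_ssp(dictator, types, 2), is_maskin_monotonic(dictator, types, outcomes, 2))
print(is_ssp(anti_dictator, types, 2),
      is_maskin_monotonic(anti_dictator, types, outcomes, 2))
```

On this toy domain the dictatorial rule passes both checks and the reversed rule fails both, in line with the relationship between SSP and Maskin monotonicity established in this section.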

We do not yet have a tight definition of the solution concept $C$, and so below we present results for implementation with different possible solution concepts. If a social choice function is implementable with a solution concept $X$, we say it is $X$-implementable. We can now state our first theorem, which holds if the solution concept is indeed the upper bound $O^\infty$.

Theorem. If a social choice function $F$ is O-implementable then it must be SCSP.

(Coalitional strategy-proofness is also referred to as group strategy-proofness; see Muller and Satterthwaite.)

Proof. Consider some mechanism $(M, A)$ that implements $F$. Assume to the contrary that $F$ is not SCSP. Then there exist two utility profiles $U, V$ such that $F(U) \ne F(V)$ but $U_i(F(U)) \le U_i(F(V))$ for all $i$ such that $U_i \ne V_i$. Let $E = \{ i \mid U_i \ne V_i \}$. Since $M$ implements $F$, there must be two action vectors $u$ and $v$ in $A$ such that $M(u) = F(U)$ and $M(v) = F(V)$, each of which is in the solution concept at the respective utility profiles $U$ and $V$; i.e., $u \in C_M(U)$ and $v \in C_M(V)$. Since $F(U) \ne F(V)$, we have $v \notin C_M(U)$ and $u \notin C_M(V)$. At the utility profile $U$, consider the Stackelberg ordering with elements in $E$ leading: $(E, P \setminus E)$. The allocations that result from this Stackelberg game must be the allocation $F(U)$, but the allocation $F(V)$ is different from $F(U)$ yet gives all the elements in $E$ at least as good outcomes. Recall that $S^{O} = O^\infty$, so we can apply the solution concept $O$ to the players in $E$, assuming that the agents in $P \setminus E$ are responding to these plays. The solution concept $O$ applied to the game played by the agents in $E$ contains the point $u_E$. Therefore it must also contain $v_E$, since the payoffs for $v_E$ Pareto dominate the payoffs for $u_E$, and therefore none of the strategies in $v_E$ are overwhelmed. Thus the point $v$ must be included in the solution set $C_M(U)$, which contradicts our earlier result.

M

Note that the coalitional aspects of the $O$ solution concepts, and hence the coalitional requirements of SCSP, did not arise because of some explicit notion of collusion among agents in our distributed setting. They arose because of the asynchrony, where there could be multiple agents with long timescales even though there was no explicit collusion.

Our next result is a slight extension of the original observation, due to d'Aspremont and Gérard-Varet, on Stackelberg-solvable games.

Theorem. If a social choice function $F$ is C-implementable then $F$ must be SSP.

Proof. Assume to the contrary that there exists a social choice function $F$ that is C-implementable, with $(M, A)$ as the implementing mechanism, but for which there exist $U$ and $V_i$ such that $F(U) \ne F(U|^i V_i)$ but $U_i(F(U)) \le U_i(F(U|^i V_i))$. Without loss of generality assume $i = 1$ and consider a strict Stackelberg ordering with $\{1\}$ leading. All points in the solution concept $C_M(U)$ are mapped by $M$ into $F(U)$; similarly, all points in $C_M(U|^1 V_1)$ are mapped by $M$ into $F(U|^1 V_1)$. Let $u$ be some Stackelberg equilibrium with this order in $C_M(U)$ and let $v$ be some Stackelberg equilibrium with this order in $C_M(U|^1 V_1)$. Then the payoff for agent 1 at $v$ is at least as great as the payoff at $u$, and we can choose agent 1's learning algorithm to favor $v_1$ over $u_1$ (such as in the prioritized stage learners). Since $v_{-1}$ is the Stackelberg response to $v_1$ and $u_{-1}$ is the Stackelberg response to $u_1$, by construction $v$ must also be in the set $C_M(U)$, as it is a possible outcome of the learning process. This contradicts our original assumption.

The following is a standard result about SSP and Maskin Monotonicity; for convenience we include the trivial proof.

Theorem. If a social choice function $F$ is SSP then $F$ must be Maskin Monotonic.

Proof. Consider an SSP social choice function $F$ and some $V_i$ such that $U_i(x) \le U_i(F(U)) \Rightarrow V_i(x) \le V_i(F(U))$ for all allocations $x$. Assume to the contrary that $F(U) \ne F(U|^i V_i)$. Because $F$ is SSP, we must have $V_i(F(U|^i V_i)) > V_i(F(U))$ and $U_i(F(U|^i V_i)) < U_i(F(U))$. This contradicts our assumption about $V_i$.

This leads immediately to the following Corollary.

Corollary. If a social choice function $F$ is C-implementable then $F$ must be Maskin Monotonic.

Note that in certain restricted domains Maskin Monotonicity implies WCSP (see Shenker, and Barberà and Jackson; see Dasgupta, Hammond and Maskin for a definition of a monotonically closed domain).

Theorem. If a social choice function $F$ is Maskin Monotonic and the domain is monotonically closed then $F$ is WCSP.

This leads to the following Corollary.

Corollary. If $F$ is C-implementable and the domain is monotonically closed then $F$ is WCSP.

Note that many of the most notable strategy-proof mechanisms do not have any degree of resistance to coalitional manipulations. For instance, the Clarke-Groves mechanisms (Clarke, Groves) are not, in general, weakly coalitionally strategy-proof.

Examples

We now discuss a few SD-implementable and O-implementable social choice functions and their implementing mechanisms.

The first example is the uniform social choice function; its SD-implementability follows trivially from the theorem on the uniform game above. Since the uniform mechanism relies only on the peaks of the preferences, there is no real distinction between the uniform game and the uniform social choice function. Thus that theorem implies that the uniform social choice function is SD-implementable, because the direct mechanism is itself SD-solvable. While we have shown that the direct mechanism is not itself O-solvable, it remains an open question as to whether the uniform social choice function is O-implementable through some other mechanism.

The second example comes from the congestion game with strictly monotonic (increasing in $r_i$, decreasing in $c_i$) and concave utilities $U_i(r_i, c_i)$ and a strictly convex constraint function $f$. The serial mechanism (see Moulin and Shenker for a description) can be described as follows. When the agents are labeled so that $r_i \le r_{i+1}$ for all $i$, the congestions $c_i$ are recursively determined by the equation

$$\sum_{i=1}^{k} c_i + (n - k)\, c_k = f\Big(\sum_{i=1}^{n} \min[r_i, r_k]\Big).$$
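The recursion for the serial mechanism can be unrolled agent by agent in increasing order of rate. The block below is a minimal sketch assuming the recursion reads $\sum_{i \le k} c_i + (n - k) c_k = f(\sum_i \min[r_i, r_k])$; the function name `serial_congestion` and the quadratic constraint function used in the example are hypothetical.

```python
def serial_congestion(rates, f):
    """Congestion shares under the serial mechanism (a sketch).

    Agents are processed from the smallest rate to the largest; each share
    is computed as if every larger rate were truncated to the current one.
    """
    n = len(rates)
    order = sorted(range(n), key=lambda i: rates[i])
    congestion = [0.0] * n
    assigned = 0.0                      # sum of shares of lower-ranked agents
    for position, agent in enumerate(order):
        truncated_load = sum(min(r, rates[agent]) for r in rates)
        # (n - position) agents, this one included, split what is left equally.
        congestion[agent] = (f(truncated_load) - assigned) / (n - position)
        assigned += congestion[agent]
    return congestion

square = lambda x: x ** 2               # hypothetical strictly convex f

print(serial_congestion([1.0, 2.0, 3.0], square))
print(serial_congestion([1.0, 2.0, 10.0], square))
```

Raising the largest rate from 3 to 10 leaves the first two shares unchanged in this example, which is the seriality property: an agent's congestion does not depend on the exact size of rates larger than its own.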

We have the following theorem.

Theorem. The serial mechanism with strictly monotonic and concave utilities and a strictly convex constraint function $f$ is a generalized serial game.

Proof. Consider some $i$ and some $j \ne i$. The payoff $G_i(r) = U_i(r_i, c_i(r))$ is monotonic in $r_j$, since $c_i(r)$ is monotonic in $r_j$ and $U_i$ is monotonic in $c_i$. Moreover, from the construction it is clear that $c_i(r) = c_i(r|^j r'_j)$ for all $r_j, r'_j \ge r_i$, so the same holds for the payoff $G_i(r)$. Consider the function $g(x) = G_i(r|^i x) = U_i(x, c_i(r|^i x))$. Since $U_i$ is convex and the opportunity set $(x, c_i(r|^i x))$ is strictly concave, there is a unique point of tangency, and so the game has unique best replies $BR_i(r_{-i})$. Lastly, consider some agent $j$ such that $r_j \ge BR_i(r_{-i})$. Varying $r_j$ changes the opportunity set $(x, c_i(r|^i x))$, but the tangent at $x = BR_i(r_{-i})$ remains unchanged. Therefore the best reply remains unchanged.

i i

Therefore the serial mechanism is O-solvable in this setting. Define the serial social choice function as the allocation resulting from the unique Nash equilibrium of this game. This social choice function is obviously O-implementable.

Corollary. The serial mechanism with strictly monotonic and concave utilities and a strictly convex constraint function $f$ is O-solvable.

Discussion

One might ask why, if one can only implement strategy-proof social choice functions, one bothers with the mechanism design paradigm at all. Why not always use the direct method, asking for utilities to be revealed and then applying $F$, instead of using an indirect mechanism $M$? In the former case one can utilize the focal point nature of truthful revelation and can implement all strategy-proof social choice functions, whereas in the indirect method one can only implement a narrower class of social choice functions (SSP and Maskin Monotonic).

The serial mechanism is a formalization of the fair queuing packet scheduling algorithm in routers (Demers et al.); variants of fair queuing are currently implemented on some Internet routers.

While in many cases it is obviously preferable to use direct methods, there are occasions where indirect mechanisms are preferable. In some contexts the utility functions are very complex, and revealing them involves significant communication overhead. For instance, the performance of a video application is not a simple function of, say, the average and variance of the packet delays; instead the performance depends on the exact string of packet delays. In such cases the ability to use indirect mechanisms, with their substantially less complex signaling, is a significant advantage.

In addition, and perhaps more fundamentally, in many network situations the agents do not know their exact utility functions. Agents can compare two different levels of service and decide with which they are happier, but they cannot abstractly represent these tradeoffs without actively experiencing them. For instance, the optimal tradeoff between bandwidth and delay in a video stream for an agent will depend on many details of the particular instance, such as the particular scene being transmitted, the exact delay distribution, and the clarity of speech, and quantifying this relationship beforehand is quite impractical. To use an analogy, specifying the exact utility function of such network applications is much like trying to specify the optimal contrast setting on a television set. Since the optimal contrast setting depends on many details, such as the lighting in the room and the darkness of the scene, most users could not accurately articulate the underlying utility function; most of us merely turn the contrast knob until we notice that any deviation from that setting produces worse results. Similarly, in many networking situations users can compare their
