A Window of Cognition: Eyetracking the
Reasoning Process in Spatial Beauty
Contest Games∗
Chun-Ting Chen, Chen-Ying Huang and Joseph Tao-yi Wang
February 5, 2013
Abstract
We study the reasoning process people use to reach a decision in an environment where
final choices are well understood, the associated theory is procedural, and the
decision-making process is observable. In particular, we introduce a two-person "beauty
contest" game played spatially on a two-dimensional plane. Players choose locations and
are rewarded for hitting "targets" that depend on their opponents' locations. By tracking
subjects' eye movements (termed the lookups), we infer their reasoning process and classify
subjects into various types based on a level-k model. More than half of the subjects'
lookup-based classifications coincide with their classifications using final choices alone,
supporting a literal interpretation of the level-k model for subjects' reasoning process.
When choice data is noisy, lookup data could provide additional separation of types.
Keywords: beauty contest game, level-k model, best response hierarchy, cognitive hierarchy
JEL: C91, C72, D87
∗ Department of Economics, National Taiwan University, 21 Hsu-Chow Road, Taipei 100,
Taiwan. Chen: r94323016@ntu.edu.tw, cuc230@psu.edu; Huang: chenying@ntu.edu.tw; Wang:
josephw@ntu.edu.tw (corresponding author). Research support was provided by the National
Science Council of Taiwan (grant 962415H002006). Joseph thanks Colin F. Camerer for his
advice, guidance and support. We thank Ming Hsu for valuable suggestions that directed us
to eyetracking. We thank Vincent Crawford, Rosemarie Nagel, Matthew Shum, Yi-Ting Chen,
Shih-Hsun Hsu, Ching-Kang Ing, Chung-Ming Kuan, and the audiences of the ESA 2008
International Meeting and North American Region Meeting, TEA 2008 Annual Meeting, 2009
Stony Brook Workshop on Behavioral Game Theory, AEA 2010 Annual Meeting, 2010 KEEL
conference, the 12th BDRM conference and the 2010 World Congress of the Econometric
Society for their comments.
I Introduction
Since Samuelson [1938] developed the theory of revealed preferences, economic theory has
focused on interpreting people's observed choices as directly reflecting their personal
preferences, which are usually unobserved by outsiders. Based on the theoretical
predictions, empirical researchers then collect data from either naturally occurring or
controlled environments, and construct econometric models to analyze it. The revealed
preference approach has achieved tremendous success by simply assuming utility
optimization. Nonetheless, this focus on final choices (and the preferences they reflect)
does not exclude the possibility of analyzing the decision-making process in the middle.
Just as modern theories of the firm open up the black box of profit maximization and
explore the effect of contracts and organizational structures within the firm, there is no
reason why economic theory cannot consider the reasoning process prior to the final
decision, especially when it is potentially observable and can help make better predictions.
In many cases, economic theory could suggest a procedure by which people calculate and
reason to determine what is best. When economic theories provide clear predictions on the
underlying decision-making process, it is natural to ask whether one could test these
predictions using some form of empirical data. For example, in extensive form games,
subgame perfect equilibrium is typically solved by backward induction, a procedure that
can be carried out (and therefore tested) step-by-step by players of the game. Hence,
Camerer et al. [1993] and Johnson et al. [2002] employ a mouse-tracking technology called
"mouselab" to test predictions of backward induction, and find evidence against it even in
three-stage bargaining games. In addition to testing predictions, one could also use a
procedural theory to analyze how different reasoning processes can lead to systematically
different behavior. For example, Krajbich, Armel and Rangel [2010] consider an attentional
drift-diffusion model and demonstrate how different decision thresholds can lead to
specific premature choices in an individual decision-making problem. More recently,
Koszegi and Szeidl [2013] consider the possibility that people focus on certain attributes
of available options and hence become prone to present bias and time inconsistency
problems.
In this paper, we attempt to study the reasoning process as well as final choices in a
game-theoretic environment. In particular, we consider the reasoning process people use to
reach a decision, in which they perform different levels of strategic reasoning. To pursue
this alternative research strategy of studying the decision-making process, the task must
satisfy three important requirements. First, we need a setting in which final choices are
well understood and mature theories exist to explain how choices are made. If there is
still no consensus regarding which theory best explains final choices and why, it is
conceivably harder to come up with satisfactory hypotheses on reasoning processes to base
tests on. Second, to make a plausible hypothesis on reasoning, we want the associated
economic theory to be procedural. In other words, the theory, if taken literally, should
make predictions not only on choices, but also on a particular reasoning process that leads
to the final choice. Finally, we require a data collection method that allows us to observe
the reasoning process, and the task used has to suit that method.
We design a new set of games, termed two-person spatial beauty contest games, to analyze
individuals' reasoning process by observing lookup patterns with video-based eyetracking,
meeting all three requirements as follows. This new set of games, as its name suggests, is
essentially a graphical simplification of the p-beauty contest games for two players.¹ It
is known that initial responses in the p-beauty contest games can be well explained by
theories of heterogeneous levels of rationality such as the level-k model.² Since level-k
models can predict choices well in these guessing games, the first requirement that a
mature theory exists to explain final choices is met. Logically, the next question is
whether they can also predict the reasoning processes. A key feature of the level-k model
is that players of higher levels of rationality best respond to players of lower levels,
who in turn best respond to players of even lower levels, and so on. This best response
hierarchy is the perfect candidate for modeling the reasoning process of a subject prior to
making the final choice, since in a two-person game, the final choice should be a best
response to the subject's belief regarding the other player's choice, which in turn is a
best response to the subject's belief about the other player's belief about her choice,
and so on.³ In other words, to figure out which choice to make, a subject has to go through
a particular best response hierarchical procedure. Thus, the second requirement is squarely
met since, by taking the level-k model procedurally, one can come up with a natural
hypothesis regarding the reasoning process. Lastly, the graphical representation of the
spatial beauty contest games induces subjects to go through this hierarchical procedure of
best responses by counting on the computer screen (instead of reasoning in their minds),
leaving footprints that the experimenter can trace, and thus the third requirement is met.
¹ Nagel [1995] and Ho, Camerer and Weigelt [1998] studied the p-beauty contest game.
Variants of two-person guessing games are studied by Costa-Gomes and Crawford [2006] and
Grosskopf and Nagel [2008]. However, unlike the two-person guessing game considered in
Grosskopf and Nagel [2008], choosing the boundary is not a dominant strategy in our spatial
beauty contest game.
² Level-k models are proposed and applied by Stahl and Wilson [1995], Nagel [1995], and
Costa-Gomes and Crawford [2006]. A related model, the cognitive hierarchy model, is
proposed by Camerer, Ho and Chong [2004].
³ To avoid confusion, the subject is denoted by her while her opponent is denoted by him.

We eyetrack each subject's reasoning process by recording the entire sequence of locations
she looks at. In other words, we record not only her final choice, but also every location
the subject fixates on during an experimental trial, in real time. Following the
convention, we call this real-time fixation data the "lookups" even though there is really
nothing to be looked up in our experiment. When a subject reasons through a particular best
response hierarchy, designated by her level-k type, each step of thinking is characterized
as a "state." To describe changes between the thinking states of a subject, we construct a
constrained Markov-switching model between these states. Eye fixations conditional on each
thinking state are then modeled to allow for logit errors due to imprecise eyetracking or
peripheral vision. We classify subjects into various level-k types based on maximum
likelihood estimation using individual lookup data. Moreover, we adopt an empirical
likelihood ratio test for non-nested but overlapping models proposed by Vuong [1989] to
ensure the distinctive separation of the estimated type from other competing types. Results
show that among the seventeen subjects we tracked, one follows the level-0 (L0) best
response hierarchy the closest with her lookups, six follow the level-1 (L1) hierarchy,
four follow the level-2 (L2) hierarchy, another four follow the level-3 (L3) hierarchy, and
the remaining two follow the equilibrium (EQ) best response hierarchy, which coincides with
the level-4 (L4) hierarchy in most games of our experiment. Treating the EQ type as having
a thinking step of 4, the average thinking step is 2.00, in line with results of other
p-beauty contest games.
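The average thinking step quoted above follows directly from the reported type counts. A minimal sketch of the arithmetic, assuming (as the text states) that the EQ type counts as 4 steps; the function and variable names are illustrative only:

```python
# Lookup-based type counts reported in the text: level -> number of subjects.
# The two EQ subjects are treated as having a thinking step of 4.
type_counts = {0: 1, 1: 6, 2: 4, 3: 4, 4: 2}

def average_thinking_step(counts):
    """Subject-weighted mean of the thinking steps."""
    total_subjects = sum(counts.values())
    total_steps = sum(level * n for level, n in counts.items())
    return total_steps / total_subjects

print(sum(type_counts.values()))           # 17 subjects
print(average_thinking_step(type_counts))  # 34 / 17 = 2.0
```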
If the level-k model can predict not only choices but also reasoning processes well, the
estimated level of a player when we analyze her lookups should coincide with her level when
we analyze her choices alone, since k reflects her strategic sophistication. To check
whether the lookup data indeed align well with choice data, we also classify subjects using
their final choice data only. We find that choice-based and lookup-based classifications
are largely consistent, classifying ten of the seventeen subjects as the same type.
Consistency between choice-based and lookup-based classifications suggests that for a high
percentage of subjects, if their lookups are classified as a particular level-k type, their
final choices follow the prediction of that level-k type as well. This strongly supports a
literal interpretation of the level-k model as an explanation of both subjects' reasoning
process and their final choices in the spatial beauty contest game. It means that the
corresponding best response hierarchy implied by each level-k type is literally carried out
by subjects.
We look further into the subtle difference between lookup and choice data even though they
align well for the majority of subjects. Among the seven subjects whose two classifications
differ, for all but one subject, the choice-based level-k types are not robust to a
(nonparametric) bootstrap procedure, having a misclassification rate of at least 18% if one
resamples the choice data and performs the same estimation. On the other hand, for the ten
subjects whose two classifications are the same, the average misclassification rate is less
than 5%. The difference is significant, having a p-value of 0.0123 according to the
Mann-Whitney-Wilcoxon rank sum test. In other words, the two classifications differ when
the choice data is noisy. When the two classifications agree, choice data is quite robust.
This leaves open the possibility that lookup data may help classify subjects more sharply,
since when the two differ, choice data is noisy and thus there is room to improve
choice-based estimation.
Even when the level based on lookups and that based on choices differ, the level based on
lookups does a reasonable job in predicting choices and is thus a viable alternative to the
choice-based type. In fact, for six out of seven subjects whose two classifications differ,
their types based on analyzing lookups predict final choices reasonably well, ranking
second in terms of likelihood.⁴ According to a bootstrap procedure, their lookup-based
types are also the second most successful types in predicting choices. Moreover, we
demonstrate through an out-of-sample prediction exercise how lookups indeed provide better
classification when choice-based estimation is not robust. We estimate the models with 2/3
of the trials and predict the final choices of the remaining trials for the nine subjects
whose final choices are not robust according to the bootstrap procedure. We show that the
lookup-based model is superior in terms of both mean square errors and economic value
(Camerer, Ho and Chong, 2004). To sum up, when the classifications based on lookups and
choices differ, the lookup type predicts choices reasonably well. Moreover, when the choice
data is noisy, we can predict the later choices of a subject better by her earlier lookup
data than by her earlier choice data. In other words, looking into players' reasoning
process gives us valuable information if we are to classify them properly.⁵
⁴ The last subject's type based on lookups ranked third. The most successful type is of
course the one based on analyzing choices.
⁵ Even if we focus on the seven subjects whose two classifications differ, the lookup-based
model is still superior in terms of mean square errors and is comparable in economic value.

In the related literature, some experimental studies do attempt to investigate "information
search" patterns in games, in order to capture part of the reasoning process. In addition
to Camerer et al. [1993] and Johnson et al. [2002], Costa-Gomes, Crawford and Broseta
[2001] and Costa-Gomes and Crawford [2006] also employ the mouse-tracking technology
"mouselab" to study payoff lookups in normal form games and information search in
two-person guessing games. Gabaix, Laibson, Moloche and Weinberg [2006] also use mouselab
to observe information acquisition and analyze aggregate information search patterns to
test a heuristic "directed cognition" model. More recently, Wang, Spezio and Camerer [2010]
employ eyetracking to observe the decision-making process of a deceptive sender in
sender-receiver games. In all these studies, some information must be hidden and "looked
up" by subjects during the experiment. Hence, these studies rely on information search to
infer certain stages of the reasoning process, instead of directly observing the entire
process itself. Our paper differs from these previous attempts by observing lookup patterns
when there is no explicit hidden information to be acquired. We directly observe the
reasoning process instead of making an inference on it. To the best of our knowledge, this
is the first paper analyzing the reasoning process directly and comparing it with final
choice. Specifically, it is the graphical feature of our design that makes direct
observations of reasoning processes possible. This points to the importance of tailoring
games for tracking decision-making. The structure of the p-beauty contest games implies a
best response hierarchy of reasoning which can be fully exploited in our spatial design. In
other less-structured games, some viable hypotheses concerning the reasoning process have
to be formed and specific designs have to be tailor-made so that these reasoning processes
can be directly observed. This leaves open an interesting direction for future research.⁶
The remainder of the paper is structured as follows: Section II.A describes the spatial
beauty contest game and its theoretical predictions; Section II.B describes details of the
experiment; Section III reports aggregate statistics on lookups; Section IV reports
classification results from the Markov-switching model based on lookups; Section V compares
classification results with those based on final choices alone. Section VI concludes.
⁶ Several recent level-k papers estimate population mixture models to infer the fraction of
level-k types within the population (Burchardi and Penczynski [2011]). Instead of
investigating the population mixture of types, we focus on how well individual lookup
patterns correspond to a particular level-k best response hierarchy in an environment where
we already know the level-k model predicts aggregate subject behavior fairly well.
II The Experiment
A The Spatial Beauty Contest Game
We now introduce our design, the equilibrium prediction and the prediction of the level-k
model, and formulate the hypotheses to be tested. To create a spatial version of the
p-beauty contest game, we reduce the number of players to two, so that we can display the
action space of all players visually on the computer screen. Players choose locations
(instead of numbers) simultaneously on a two-dimensional plane, each attempting to hit a
target location determined by the opponent's choice. The target location is defined
relative to the other player's choice of location by a pair of coordinates (x, y). We use
the standard Euclidean coordinate system. For instance, (0, −2) means the target location
of a player is "two steps below the opponent," and (−4, 0) means the target location of a
player is "four steps to the left of the opponent." These targets are common knowledge to
the players. Payoffs are determined by how "far" (the sum of horizontal distance and
vertical distance) a player is away from the target. The larger this distance is, the lower
her payoff is. Players can only choose locations on a given grid map, though one's target
may fall outside if the opponent is close to or on the boundary.⁷ For example, consider the
7 × 7 grid map in Figure I. For the purpose of illustration, suppose a player's opponent
has chosen the center location, labeled O at (0, 0), and the player's target is (−4, 0).
Then to hit her target, she has to choose location (−4, 0). But location (−4, 0) is not on
the map, so choosing location (−3, 0) is optimal among all 49 feasible choices because
location (−3, 0) is the only feasible location that is one step from location (−4, 0).⁸
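The 7 × 7 example can be checked by brute force. The sketch below enumerates all 49 feasible locations and keeps those closest to the off-map target location under the "sum of horizontal and vertical distance" measure described in the text. The function names are illustrative and not part of the original design.

```python
from itertools import product

def city_block_distance(p, q):
    """Sum of horizontal and vertical distance between two grid locations."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def closest_feasible_locations(target_location, X, Y):
    """Enumerate the (2X+1) x (2Y+1) grid and keep every location at minimum distance."""
    grid = list(product(range(-X, X + 1), range(-Y, Y + 1)))
    best = min(city_block_distance(loc, target_location) for loc in grid)
    return [loc for loc in grid if city_block_distance(loc, target_location) == best]

# 7 x 7 map (X = Y = 3); the opponent is at the center (0, 0) and the own target
# is (-4, 0), so the target location is (-4, 0), one step off the map.
print(closest_feasible_locations((-4, 0), X=3, Y=3))  # [(-3, 0)]
```

Returning a list rather than a single location makes any tie visible; in this example the minimizer is unique.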
The spatial beauty contest game is essentially a spatial version of Costa-Gomes and
Crawford [2006]'s asymmetric two-person guessing games, in which one subject would like to
choose α times her opponent's choice and her opponent would like to choose β times her
choice. Hence, similar to Costa-Gomes and Crawford [2006], the equilibrium prediction of
this spatial beauty contest game is determined by the targets of both players. For example,
if the targets of the two players are (0, 2) and (4, 0) respectively, the equilibrium
consists of both players choosing the Top-Right corner of the map. This conceptually
coincides with a player hitting the lower bound in the two-person guessing game of
Costa-Gomes and Crawford [2006] where αβ is less than 1, or all choosing zero in the
p-beauty contest game where p is less than 1.⁹ Note that in general the equilibrium need
not be at the corner since targets can have opposite signs. For example, when the targets
are (4, −2) and (−2, 4) played on a 7 × 7 grid map, the equilibrium locations for the two
players are both two steps away from the corner (labeled as E1 and E2 for the two players
respectively in Figure I).
⁷ Similar designs of 3 × 3 games can also be found in Kuo et al. [2009], who address
different issues.
⁸ For instance, to go from location (−3, 1) to (−4, 0), one has to travel one step left and
one step down and hence the distance is 2.
⁹ However, choosing the Top-Right corner is not a dominant strategy, unlike in the
symmetric two-person guessing game analyzed by Grosskopf and Nagel [2008].

We derive the equilibrium predictions for the general case as follows. Formally, consider a
spatial beauty contest game with targets $(a_1, b_1)$ and $(a_2, b_2)$. With some abuse of
notation, suppose player $i$ chooses location $(x_i, y_i)$ on a map $G$ satisfying
$(x_i, y_i) \in G \equiv \{-X, -X+1, \ldots, X\} \times \{-Y, -Y+1, \ldots, Y\}$, where
$(0, 0)$ is the center of the map. For instance, $(x_i, y_i) = (X, Y)$ means player $i$
chooses the Top-Right corner of the map. The other player $-i$ also chooses a location
$(x_{-i}, y_{-i})$ on the same map: $(x_{-i}, y_{-i}) \in G$. The payoff to player $i$ in
this game is:

$$p_i(x_i, y_i; x_{-i}, y_{-i}; a_i, b_i) = \bar{s} - \left( \left| x_i - (x_{-i} + a_i) \right| + \left| y_i - (y_{-i} + b_i) \right| \right)$$

where $\bar{s}$ is a constant. Notice that payoffs are decreasing in the number of steps a
player is away from her target, which in turn depends on the choice of the other player.
There is no interaction between the choices of $x_i$ and $y_i$. Hence the maximization can
be obtained by choosing $x_i$ and $y_i$ separately to minimize the two absolute value terms.
We thus consider the case for $x_i$ only. The case for $y_i$ is analogous.¹⁰
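The separability argument can be illustrated numerically. The following sketch implements the payoff as the constant minus the sum of the two absolute value terms and checks that choosing each coordinate separately gives the same best response as searching the whole grid. The value of `S_BAR` is a placeholder assumption (the text only says it is a constant), and all function names are hypothetical.

```python
from itertools import product

S_BAR = 20  # hypothetical value; the text only states that s-bar is a constant

def payoff(own, opp, target, s_bar=S_BAR):
    """s_bar minus the number of steps between the own choice and the target location."""
    target_x, target_y = opp[0] + target[0], opp[1] + target[1]
    return s_bar - (abs(own[0] - target_x) + abs(own[1] - target_y))

def joint_best_response(opp, target, X, Y):
    """Search all grid locations for the payoff-maximizing choice."""
    grid = product(range(-X, X + 1), range(-Y, Y + 1))
    return max(grid, key=lambda loc: payoff(loc, opp, target))

def separate_best_response(opp, target, X, Y):
    """Minimize the horizontal and vertical absolute value terms one at a time."""
    best_x = min(range(-X, X + 1), key=lambda x: abs(x - (opp[0] + target[0])))
    best_y = min(range(-Y, Y + 1), key=lambda y: abs(y - (opp[1] + target[1])))
    return (best_x, best_y)

# On a 7 x 7 map with target (4, -2), both methods agree for every opponent location.
agree = all(
    joint_best_response(opp, (4, -2), 3, 3) == separate_best_response(opp, (4, -2), 3, 3)
    for opp in product(range(-3, 4), range(-3, 4))
)
print(agree)
```

Because targets and locations are integers, each one-dimensional minimizer is unique, so the comparison does not depend on how ties are broken.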
To ensure uniqueness, in all our experimental trials, $a_i + a_{-i} \neq 0$.¹¹ Without loss
of generality, we assume that $a_i + a_{-i} < 0$ so that the overall trend is to move
leftward.¹² Suppose $a_1 < 0$. If $a_1 a_2 < 0$, implying player 1 would like to move
leftward but player 2 would like to move rightward, since the overall trend is to move
leftward, it is straightforward to see that the force of equilibrium would make player 1
hit the lower bound while player 2 will best respond to that. The equilibrium choices of
both, denoted by $(x_1^e, x_2^e)$, are characterized by $x_1^e = -X$ and
$x_2^e = -X + a_2$.¹³ If $a_1 a_2 \geq 0$, since both players would like to move leftward,
they will both hit the lower bound. The equilibrium is characterized by
$x_1^e = x_2^e = -X$. To summarize, when $a_1 + a_2 < 0$, only the player whose target is
greater than zero will not hit the lower bound. Therefore, as a spatial analog to
Observation 1 of Costa-Gomes and Crawford [2006], we obtain:
Proposition 1
In a spatial beauty contest game with targets $(a_1, b_1)$ and $(a_2, b_2)$ where two
players each choose a location $(x_i, y_i) \in G$ satisfying
$G \equiv \{-X, -X+1, \ldots, X\} \times \{-Y, -Y+1, \ldots, Y\}$,
$-2X \leq a_1, a_2 \leq 2X$ and $-2Y \leq b_1, b_2 \leq 2Y$, the equilibrium choices
$(x_i^e, y_i^e)$ are characterized by ($I\{\cdot\}$ is the indicator function):

$$x_i^e = \begin{cases} -X + a_i \cdot I\{a_i > 0\} & \text{if } a_i + a_{-i} < 0 \\ \phantom{-}X + a_i \cdot I\{a_i < 0\} & \text{if } a_i + a_{-i} > 0 \end{cases}
\qquad \text{and} \qquad
y_i^e = \begin{cases} -Y + b_i \cdot I\{b_i > 0\} & \text{if } b_i + b_{-i} < 0 \\ \phantom{-}Y + b_i \cdot I\{b_i < 0\} & \text{if } b_i + b_{-i} > 0 \end{cases}$$
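Proposition 1 translates almost line by line into code. The sketch below is a minimal implementation under the proposition's own assumptions (targets within twice the map half-width and a nonzero sum of targets in each dimension); the function names are illustrative.

```python
def equilibrium_coordinate(own_target, opp_target, bound):
    """Equilibrium choice in one dimension, following the two cases of Proposition 1.

    Assumes -2 * bound <= own_target, opp_target <= 2 * bound.
    """
    total = own_target + opp_target
    if total == 0:
        # Excluded in the experiment: the equilibrium is not unique in this case.
        raise ValueError("targets sum to zero; equilibrium is not unique")
    if total < 0:
        # Overall trend is toward the lower bound; only a positive target moves off it.
        return -bound + (own_target if own_target > 0 else 0)
    # Overall trend is toward the upper bound; only a negative target moves off it.
    return bound + (own_target if own_target < 0 else 0)

def equilibrium(own_target, opp_target, X, Y):
    """Equilibrium location (x, y) of the player with target own_target."""
    return (
        equilibrium_coordinate(own_target[0], opp_target[0], X),
        equilibrium_coordinate(own_target[1], opp_target[1], Y),
    )

# Examples from the text, on a 7 x 7 map (X = Y = 3).
print(equilibrium((0, 2), (4, 0), 3, 3))    # (3, 3): Top-Right corner
print(equilibrium((4, -2), (-2, 4), 3, 3))  # (3, 1)
print(equilibrium((-2, 4), (4, -2), 3, 3))  # (1, 3)
```

With targets (4, −2) and (−2, 4), both equilibrium locations are two steps from the Top-Right corner, as stated in the text for E1 and E2.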
¹⁰ As an illustrative example, consider $a_1 = -2$ and $a_2 = +1$, indicating that player 1
wants to be two steps to the left of player 2, while player 2 wants to be one step to the
right of player 1.
¹¹ Suppose $a_1 = -2$ and $a_2 = +2$. Any location where player 1 is two steps to the left
of player 2 is an equilibrium since player 2 is then two steps to the right of player 1.
Note that this corresponds to the case where αβ = 1 in the two-person guessing game of
Costa-Gomes and Crawford [2006]. If $a_i = -a_{-i} = a$, any feasible $x_i, x_{-i}$
satisfying $x_i - x_{-i} = a$ constitutes an equilibrium.
¹² In the illustrative example of $a_1 = -2$ and $a_2 = +1$, $(-2) + 1 < 0$. Due to
symmetry, all other cases are isomorphic to this case.
¹³ In the illustrative example of $a_1 = -2$ and $a_2 = +1$, the equilibrium is
$(x_1^e, x_2^e) = (-X, -X+1)$. We impose $a_i \leq 2X$ for all games in the experiment,
thus we do not need to worry about the possibility that $x_i^e$ lies outside the upper
bound $X$ (i.e., $x_i^e = -X + a_i > X$). In general, if $a_i > 2X$, player $i$ would hit
the upper bound and thus $x_i^e = X$. Similarly, we assume $-2X \leq a_i$, so we need not
worry about the possibility that $x_i^e$ lies outside the lower bound $-X$ (i.e.,
$x_i^e = X + a_i < -X$).

In addition to the equilibrium prediction, one may also specify various level-k
predictions. First, we need to determine the anchoring L0 player, who is nonstrategic or
naïve. This is usually done by assuming that players choose randomly.¹⁴ In a spatial
setting, Reutskaja et al. [2011] find the center location focal, while Crawford and
Iriberri [2007a] define L0 players as being drawn toward focal points in the non-neutral
display of choices. In addition, due to a drift-correction procedure of the eyetracker
(fixating on a dot at the center and hitting a button or key) prior to every trial, the
center location is the first fixation of every trial. Therefore, a natural assumption here
is that an L0 player will either choose any location on the map randomly (according to the
uniform distribution), which is on average the center (0, 0), or will simply choose the
center. An L1 player $i$ with target $(a_i, b_i)$ would best respond to an L0 opponent who
either chooses the center on average or exactly chooses the center, and as a von
Neumann-Morgenstern utility maximizer, would choose the same location against these two
opponents.¹⁵ If an L0 player chooses (on average) the center, to best respond, an L1 player
would choose the location $(a_i, b_i)$ unless $X$ or $Y$ is too small for it to be
feasible.¹⁶ Similarly, for an L2 opponent $j$ with the target $(a_j, b_j)$ to best respond
to an L1 player $i$ who chooses $(a_i, b_i)$, he would choose $(a_i + a_j, b_i + b_j)$ when
$X$ and $Y$ are large enough. Repeating this procedure, one can determine the best
responses of all higher level-k (Lk) types. Figure I shows the various level-k predictions
of a 7 × 7 spatial beauty contest game for two players with targets (4, −2) and (−2, 4).
¹⁴ See Costa-Gomes, Crawford and Broseta [2001], Camerer, Ho and Chong [2004], Costa-Gomes
and Crawford [2006] and Crawford and Iriberri [2007b].
¹⁵ See proof in Supplementary Appendix A1. This is true because our payoff structure is
point symmetric about (0, 0) over the grid map. Hence, it makes no difference for an L1
opponent whether we assume an L0 player chooses exactly the center, or randomly (on average
the center). In our estimation, we assume L0 chooses the center but incorporate random L0
as a special case (when the logit parameter is zero).
¹⁶ In this case, an L1 player would choose the closest feasible location.

To account for the possibility that one's target may fall outside the map, we define the
adjusted choice $R(X, Y, (x, y))$. Formally, the adjusted choice is given by

$$R(X, Y, (x, y)) \equiv \left( \min\{X, \max\{-X, x\}\},\ \min\{Y, \max\{-Y, y\}\} \right).$$

In words, if the ideal best response which hits the target is location $(x, y)$, the
adjusted choice $(\tilde{x}, \tilde{y}) \equiv R(X, Y, (x, y))$ gives us the closest
feasible location on the map, so the choice $(\tilde{x}, \tilde{y})$ is constrained to lie
within the range $\tilde{x} \in \{-X, -X+1, \ldots, X\}$,
$\tilde{y} \in \{-Y, -Y+1, \ldots, Y\}$. This adjusted choice is the best feasible choice
on the map since payoffs are decreasing in the distance between the ideal best response
(target) and the final choice. Moreover, as shown in Supplementary Appendix A2, since the
grid map is of a finite size, eventually when k for a level-k type is large enough, the Lk
prediction will coincide with the equilibrium. To summarize, we have

Proposition 2
Consider a spatial beauty contest game with targets $(a_1, b_1)$ and $(a_2, b_2)$ where two
players choose locations $(x_1, y_1)$, $(x_2, y_2)$ satisfying
$(x_i, y_i) \in G \equiv \{-X, -X+1, \ldots, X\} \times \{-Y, -Y+1, \ldots, Y\}$,
$-2X \leq a_1, a_2 \leq 2X$ and $-2Y \leq b_1, b_2 \leq 2Y$. Denote the choice of a level-k
player $i$ by $(x_i^k, y_i^k)$, then $(x_1^0, y_1^0) = (x_2^0, y_2^0) \equiv (0, 0)$ and
1. $(x_i^k, y_i^k) = R\left(X, Y, (a_i + x_{-i}^{k-1},\ b_i + y_{-i}^{k-1})\right)$ for
$k = 1, 2, \ldots$
2. there exists a smallest positive integer $\bar{k}$ such that for all $k \geq \bar{k}$,
$(x_i^k, y_i^k) = (x_i^e, y_i^e)$.
Proof.
See Supplementary Appendix A2.
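The recursion in Proposition 2 is easy to trace in code. The sketch below, with illustrative function names, applies the adjusted choice $R$ repeatedly from the L0 anchor at the center, using the 7 × 7 example with targets (4, −2) and (−2, 4). The printed values follow from the recursion as stated; they are not read off Figure I.

```python
def adjust(X, Y, loc):
    """Adjusted choice R(X, Y, (x, y)): clamp each coordinate to the map."""
    x, y = loc
    return (min(X, max(-X, x)), min(Y, max(-Y, y)))

def level_k_choices(targets, X, Y, max_k):
    """Return choices[k][i], the Lk choice of player i (i = 0, 1), for k = 0..max_k."""
    choices = [[(0, 0), (0, 0)]]  # both L0 players choose the center
    for _ in range(max_k):
        previous = choices[-1]
        current = []
        for i in (0, 1):
            a, b = targets[i]
            opp_x, opp_y = previous[1 - i]  # opponent's choice one level down
            current.append(adjust(X, Y, (a + opp_x, b + opp_y)))
        choices.append(current)
    return choices

choices = level_k_choices([(4, -2), (-2, 4)], X=3, Y=3, max_k=5)
for k, (player_1, player_2) in enumerate(choices):
    print(f"L{k}: player 1 {player_1}, player 2 {player_2}")
```

In this example the sequence stops changing at k = 4, where both players reach (3, 1) and (1, 3), the equilibrium locations implied by Proposition 1 for these targets.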
In Table I we list all the 24 spatial beauty contest games used in the experiment, their
various level-k predictions, equilibrium predictions and the minimum $\bar{k}$'s. Notice
that in the first 12 games, targets of each player are one-dimensional while in the last 12
games, targets are two-dimensional. Also, Games (2m−1) and (2m) (where m = 1, 2, ..., 12)
are the same but with reversed roles of the two players, so for instance, Games 1 and 2 are
the same, Games 3 and 4 are the same, etc.
The $\bar{k}$'s for our 24 games are almost always 4, but some are 3 (Games 1, 10, 17), 5
(Games 5, 11, 12) or 6 (Game 6). This indicates that as long as we include level-k types
with k up to 3 and the equilibrium type, we will not miss the higher level-k types much
since higher types coincide with the equilibrium most of the time. Moreover, as evident in
Table I, different levels make different predictions. In other words, various levels are
strongly separated on the map.¹⁷ The level-k model predicts what final choices are made
for each level k. This is formulated in Hypothesis 1.
Hypothesis 1 (Final Choice) Consider a series of one-shot spatial beauty contest games
without feedback, $n = 1, 2, \ldots, N$, each with targets $(a_{1,n}, b_{1,n})$ and
$(a_{2,n}, b_{2,n})$ where two players choose locations $(x_{1,n}, y_{1,n})$,
$(x_{2,n}, y_{2,n})$ satisfying
$(x_{i,n}, y_{i,n}) \in G_n \equiv \{-X_n, -X_n+1, \cdots, X_n\} \times \{-Y_n, -Y_n+1, \cdots, Y_n\}$,
$-2X_n \leq a_{1,n}, a_{2,n} \leq 2X_n$, and $-2Y_n \leq b_{1,n}, b_{2,n} \leq 2Y_n$. A
level-k subject $i$'s choice for game $n$, denoted $(x_{i,n}^k, y_{i,n}^k)$, is
$(x_{i,n}^k, y_{i,n}^k) = R\left(X_n, Y_n, (a_{i,n} + x_{-i,n}^{k-1},\ b_{i,n} + y_{-i,n}^{k-1})\right)$
as defined in Proposition 2, and this k is constant across games.
¹⁷ The only exceptions are L3 and EQ in Games 1, 10, 17, L2 and L3 in Games 2, 6, 9, and L2
and EQ in Game 18. See the underlined predictions in Table I.

Since our games are spatial, players can literally count using their eyes how many steps on
the map they have to move to hit their targets. Thus, a natural way to use lookups is to
take the level-k reasoning processes literally in the following sense. Take an L2 player as
an example: the level-k model implies that she best responds to an L1 opponent, who in turn
best responds to an L0. Therefore, for the L2 player to make a final choice, she has to
first figure out what an L0 would choose since her opponent thinks of her as an L0. She
then needs to figure out what her opponent, an L1, would choose. Finally, she has to make a
choice as an L2. It is possible that this process is carried out solely in the mind of a
player. Yet since the games are spatial, one can simply figure all these out by looking at
and counting on the map. This has the advantage of greatly reducing memory load and being
much more straightforward. If this hypothesis is true, an L2 player would look at the
center (where an L0 player would choose), her opponent's L1 choice and her own final choice
as an L2. In other words, the hotspots of an L2 player in her lookups would consist of
these three locations on the map. This is probably the most natural prediction on the
lookup data one can make when the underlying model is the level-k model. Hence we formulate
Hypothesis 2 and base our econometric analysis of lookups on this.
Hypothesis 2 (Lookup) Consider a series of one-shot spatial beauty contest games with
targets $(a_{1,n}, b_{1,n})$ and $(a_{2,n}, b_{2,n})$ where two players choose locations
$(x_{1,n}, y_{1,n})$, $(x_{2,n}, y_{2,n})$ satisfying
$(x_{i,n}, y_{i,n}) \in G_n \equiv \{-X_n, -X_n+1, \cdots, X_n\} \times \{-Y_n, -Y_n+1, \cdots, Y_n\}$,
$-2X_n \leq a_{1,n}, a_{2,n} \leq 2X_n$, and $-2Y_n \leq b_{1,n}, b_{2,n} \leq 2Y_n$,
played without feedback. Denote the choice of a level-k player $i$ by
$(x_{i,n}^k, y_{i,n}^k)$. Assuming one carries out the reasoning process on the map, a
level-k subject $i$ will also:
a. (Duration of Lookups): Fixate at the following locations in the level-k best response
hierarchy longer than random: $(x_{\cdot,n}^0, y_{\cdot,n}^0)$ (L0 player's choices), ...,
$(x_{i,n}^{k-2}, y_{i,n}^{k-2})$ (own L(k−2) player's choice),
$(x_{-i,n}^{k-1}, y_{-i,n}^{k-1})$ (opponent L(k−1) player's choice),
$(x_{i,n}^k, y_{i,n}^k)$ (own Lk player's choice) associated with that particular k.¹⁸
b. (Sequence of Lookups): Have fixation sequences for each game $n$ with many transitions
from $(x_{-i,n}^{K-1}, y_{-i,n}^{K-1})$ to $(x_{i,n}^K, y_{i,n}^K)$ for
$K = k, k-2, \ldots$, and transitions from $(x_{i,n}^{K-1}, y_{i,n}^{K-1})$ to
$(x_{-i,n}^K, y_{-i,n}^K)$ for $K = k-1, k-3, \ldots$ (steps of the associated level-k best
response hierarchy).
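To make Hypothesis 2 concrete, the sketch below lists the locations in a level-k subject's best response hierarchy, alternating between opponent and own choices, together with the step-to-step transitions of part b. It reuses the recursion of Proposition 2. All function names and the "own"/"opponent" labels are illustrative; the L0 label is arbitrary because both L0 players choose the center.

```python
def adjust(X, Y, loc):
    """Adjusted choice R(X, Y, (x, y)): clamp each coordinate to the map."""
    return (min(X, max(-X, loc[0])), min(Y, max(-Y, loc[1])))

def level_choice(k, own_target, opp_target, X, Y):
    """Lk choice of the player whose target is own_target (Proposition 2 recursion)."""
    if k == 0:
        return (0, 0)
    opp_x, opp_y = level_choice(k - 1, opp_target, own_target, X, Y)
    return adjust(X, Y, (own_target[0] + opp_x, own_target[1] + opp_y))

def lookup_hierarchy(k, own_target, opp_target, X, Y):
    """Locations a level-k subject is hypothesized to fixate on, from L0 up to own Lk."""
    steps = []
    for level in range(k + 1):
        is_own = (k - level) % 2 == 0  # own choices sit at levels k, k-2, ...
        if is_own:
            location = level_choice(level, own_target, opp_target, X, Y)
        else:
            location = level_choice(level, opp_target, own_target, X, Y)
        steps.append(("own" if is_own else "opponent", level, location))
    return steps

def predicted_transitions(steps):
    """Consecutive pairs of hierarchy locations (part b of the hypothesis)."""
    return [(first[2], second[2]) for first, second in zip(steps, steps[1:])]

# An L2 subject with target (4, -2) facing an opponent with target (-2, 4) on a 7 x 7 map.
l2_steps = lookup_hierarchy(2, (4, -2), (-2, 4), 3, 3)
print(l2_steps)
print(predicted_transitions(l2_steps))
```

For this L2 subject the hierarchy consists of three locations, matching the three hotspots described in the text: the center, the opponent's L1 choice, and her own L2 choice.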
B Experimental Procedure
We conduct 24 spatial beauty contest games (with various targets and map sizes) ran
domly ordered without feedback at the Social Science Experimental Laboratory (SSEL),
California Institute of Technology.Each game is played twice,once on the twodimensional
grid map as shown in Figure II (which we denote as the GRAPH presentation),the other
time as two onedimensional choices chosen separately (see Figure III,denoted as the
SEPARATE presentation).
19
Half of the subjects are shown the twodimensional grid
18
The player subscript of $(x^0_{\cdot,n}, y^0_{\cdot,n})$ is dropped since both L0 players choose the center.
19
Note that these two presentations are mathematically identical. However, the GRAPH presentation
allows us to trace the decision-making process through observing the lookups.
maps first in trials 1–24 and the two one-dimensional choices later in trials 25–48, while
the rest are shown the two one-dimensional choices first (trials 1–24) and the maps later
(trials 25–48). The results of the two presentations are quite similar, so we focus on the
results of the two-dimensional presentation.
20
In addition to recording subjects' final choices, we also employ Eyelink II eyetrackers (SR Research Ltd.) to track the entire decision process before the final choice is made. The experiment is programmed using the Psychophysics Toolbox of Matlab (Brainard, 1997), which includes the Video Toolbox (Pelli, 1997) and the Eyelink Toolbox (Cornelissen et al., 2002). Every 4 milliseconds, the eyetracker records the location on the screen at which the subject is looking, together with pupil sizes. Location accuracy is ensured by first calibrating subjects' eyetracking patterns (video images and cornea reflections of the eyes) when they fixate at certain locations on the screen (typically 9 points), interpolating this calibration to all possible locations, and validating it with another set of similar locations. Since there is no hidden information in this game, the main goal of eyetracking is not to record information search. Instead, the goal is to capture how subjects reason before making their decision and to test whether they think through the best response hierarchy implied by a literal interpretation of the level-k model.
Before each game, a drift correction is performed in which subjects fixate at the center of the screen and hit a button (or the space bar). This realigns the calibration at the center of the screen. During each game, when subjects fixate at a location, the eyetracker sends the current location back to the display computer, and the display computer lights up the location in red in real time (as Figures II and III show). If subjects decide to choose the highlighted location, they hit the space bar. They are then asked to confirm the choice ("Are you sure?") and either confirm it ("YES") or restart the process ("NO") by looking at the bottom left or right corner of the screen.
In each session, two subjects were recruited to be eyetracked. Since there was no feedback, each subject was eyetracked individually in a separate room, and their results were matched with those of the other subject at the end of the experiment. Three of the 48 trials played were randomly drawn for payment. The average payment was US$15.24 plus a show-up fee of US$20. A sample of the instructions can be found in the Supplementary Appendix. Due to insufficient show-up of eligible subjects, three sessions were conducted with only one subject eyetracked, and their results were matched with a subject from a different session. Hence, we have eyetracking data for 17 subjects.
20
A comparison of the final choices under these two representations is shown in Supplementary Table
2. None of the subjects' two sets of final choices differ significantly.
III Lookup Summary Statistics
We first summarize subjects' lookups to test Hypothesis 2a, namely, that subjects do look at and count on the map during their reasoning process. Then, we analyze subjects' lookups with a constrained Markov-switching model to classify them into various level-k types to test Hypothesis 2b. As a part of the estimation, we employ Vuong's test for non-nested but overlapping models to ensure separation between competing types.
According to Hypothesis 2a, subjects will spend more time at locations corresponding to the thinking steps of a particular best response hierarchy. We present aggregate data regarding empirical lookups for all 24 spatial beauty contest games in Supplementary Figures 1 through 24. For each game, we calculate the percentage of time a subject spent on each location. The radius of each circle is proportional to the average percentage of time spent on that location, so bigger circles indicate longer time spent. The level-k choice predictions are labeled as O, L1, L2, L3, E for each game.
If Hypothesis 2a were true, the empirical lookups would concentrate on locations predicted by the level-k best response hierarchy. For some games, many big circles in Supplementary Figures 1–24 do fall on various locations corresponding to the thinking steps of the level-k best response hierarchy.
21
However, there seems to be a lot of noise in
the lookup data: many locations other than those specified in the best response hierarchy
are also looked up.
We attempt to quantify this concentration of attention. First, we define the Hit area for every level-k type as the minimal convex set enveloping the locations predicted by this level-k type's best response hierarchy in game n. For instance, for an L2 subject i (with opponent −i), the best response hierarchy consists of $(x^0_{\cdot,n}, y^0_{\cdot,n})$, $(x^1_{-i,n}, y^1_{-i,n})$, $(x^2_{i,n}, y^2_{i,n})$. Thus we can construct a minimal convex set enveloping these three locations. We then take the union of Hit areas of all level-k types and see if subjects' lookups are indeed within the union. Figure IV shows an example of Hit areas for various level-k types in a 7 × 7 spatial beauty contest game with target $(4, -2)$ and the opponent's target $(-2, 4)$ (Game 16).
Figure V shows the empirical percentage of time spent on the union of Hit areas, or hit time, denoted as $h_t$. Across the 24 games, average hit time is 0.62, ranging from $h_t = 0.81$ (in Game 9) to $h_t = 0.36$ (in Game 21). However, hit time depends on the
21
However, not all locations are looked up. This is likely because the error structure of high-speed video-based eyetracking is very different from the error structure of mouse-tracking (such as MouseLab). In particular, eyetrackers have imprecise spatial resolution due to imperfect calibration and peripheral vision, but little temporal error (usually 250 or more samples per second). In contrast, mouse-tracking has very precise spatial resolution for cursor locations and mouse clicks, but movements of the mouse cursor need not correspond to movements of the eye. Hybrid methods are a promising direction for future research.
size of the area. Even if subjects scan over the map uniformly, the empirical hit time would not be zero. Instead, it would be proportional to the percentage of the map covered by the union of Hit areas, or hit area size, denoted as $h_{as}$. To correct for this hit area size bias, we calculate Selten [1991]'s linear "difference measure of predicted success," $h_t - h_{as}$, i.e. the difference between empirical hit time and hit area size, and report it in Figure VI. Note that if a subject scans randomly over the map, the percentage of time she spends on the union of the Hit areas will roughly equal the hit area size. By subtracting the hit area size, we can evaluate how high the empirical hit time is compared with random scanning over the map. These measures are all positive (except for Game 21), strongly rejecting the null hypothesis of random lookups. The p-value of a one-sample t-test is 0.0001, suggesting that subjects indeed spend a disproportionately long time on the union of Hit areas.
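The difference measure can be computed per game along the following lines. This is a minimal Python sketch: the input format (a dictionary of fixation time per grid location and a set of grid locations inside the union of Hit areas) is an assumption made for illustration, and constructing the convex Hit areas themselves is not shown.

```python
from typing import Dict, Set, Tuple

Point = Tuple[int, int]


def difference_measure(durations: Dict[Point, float],
                       hit_cells: Set[Point],
                       X: int, Y: int) -> float:
    """Return h_t - h_as for one game.

    durations: total fixation time per grid location (hypothetical format).
    hit_cells: grid locations inside the union of Hit areas.
    X, Y: half-widths of the map, so the map has (2X+1)(2Y+1) locations.
    """
    total = sum(durations.values())
    # h_t: share of looking time spent inside the union of Hit areas.
    hit_time = sum(t for cell, t in durations.items() if cell in hit_cells) / total
    # h_as: share of the map covered by the union of Hit areas.
    hit_area_size = len(hit_cells) / ((2 * X + 1) * (2 * Y + 1))
    return hit_time - hit_area_size
```

With uniform scanning, the two shares coincide and the measure is zero; a subject who looks only inside the Hit areas obtains $1 - h_{as}$.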
In fact, sometimes subjects have a hit time of nearly 1. For example, Figure VII shows the lookups of subject 2 in round 17, acting as a Member B. The diameter of each fixation circle is proportional to the length of each lookup. Note that these circles fall almost exclusively on the best response hierarchy of an L2, which is exactly her level-k type (based on lookups) according to the fifth column of Table II.
To sum up, the aggregate result is largely consistent with Hypothesis 2a that subjects look at locations of the level-k best response hierarchy longer than random scanning would imply, although the data is noisy. We next turn to test Hypothesis 2b and consider whether individual lookup data can be used to classify subjects into various level-k types.
IV A Markov-Switching Model for Level-k Reasoning
A The State Space
According to Hypothesis 2b, a level-k type subject i goes through a particular best response hierarchy associated with her level-k type during the reasoning process, and carries out transitions from $(x^{K-1}_{-i,n}, y^{K-1}_{-i,n})$ to $(x^K_{i,n}, y^K_{i,n})$, for $K = k, k-2, \ldots$, and transitions from $(x^{K-1}_{i,n}, y^{K-1}_{i,n})$ to $(x^K_{-i,n}, y^K_{-i,n})$ for $K = k-1, k-3, \ldots$. Taking level-2 as an example, the two key transition steps are from $(x^0_{i,n}, y^0_{i,n})$ to $(x^1_{-i,n}, y^1_{-i,n})$, thinking as a level-1 opponent best-responding to her as a level-0 player, and from $(x^1_{-i,n}, y^1_{-i,n})$ to $(x^2_{i,n}, y^2_{i,n})$, thinking as a level-2 player best-responding to a level-1 opponent. Hence, the reasoning process of a level-2 subject i consists of three stages. First, she would fixate at $(x^0_{i,n}, y^0_{i,n})$ since she believes her opponent is level-1, who believes she is level-0. Then, she would fixate at $(x^1_{-i,n}, y^1_{-i,n})$, thinking through her opponent's choice as a level-1 best responding to a level-0. Finally, she would best respond to the belief that her opponent is a level-1 by making her choice, fixating at $(x^2_{i,n}, y^2_{i,n})$. These reasoning processes are gone through in the mind of a subject and may be reflected in her lookups.
We define each stage of the reasoning process as a state. The states are in the mind of a subject. If she is a level-2, then according to the best response hierarchy of reasoning, there are three states in her mind. To distinguish a state regarding beliefs about self from one regarding beliefs about the opponent, we indicate a state about the opponent by a minus sign. Thus, for a level-2 player, three states, namely $s = 0$ (fixating at the location $(x^0_{i,n}, y^0_{i,n})$ since she thinks her opponent thinks she is a level-0), $s = -1$ (fixating at the location $(x^1_{-i,n}, y^1_{-i,n})$ since she thinks her opponent is a level-1), and $s = 2$ (fixating at the location $(x^2_{i,n}, y^2_{i,n})$ since she is a level-2), are expected to be passed through during the reasoning process of a level-2 subject. We hasten to point out that a state is not the level of a player. Take a level-2 subject as an example. Her level, according to the level-k model, is 2. But there are three states, $s = 0$, $s = -1$, and $s = 2$, in her mind. Which state she is in depends on what she is currently reasoning about. A level-2 subject could be at state $s = -1$ because at that point of time, she is thinking about what her opponent would choose, who is a level-1 according to the best response hierarchy. However, this state $s = -1$ is not to be confused with $k = 1$ for a level-1 subject (whose states of thinking consist of $s = -0$ and $s = 1$).
More generally, for a level-k subject, define $s = k$ as the highest state, indicating that she is contemplating a choice by fixating at the location $(x^k_{i,n}, y^k_{i,n})$, best responding to an opponent of level-$(k-1)$. Imagining what an opponent of level-$(k-1)$ would do, state $s = -(k-1)$ is defined as the second highest state, when her fixation is at the location $(x^{k-1}_{-i,n}, y^{k-1}_{-i,n})$, contemplating her opponent's choice by best responding to herself as a level-$(k-2)$.
22
Lower states $s = k-2$, $s = -(k-3)$, etc. are defined similarly. Then, the steps of reasoning of a subject's best response hierarchy of Hypothesis 2b (associated with a particular "k") can be expressed as "$0, \ldots, k-2, -(k-1), k$." We regard these $(k+1)$ steps of reasoning as the $(k+1)$ states of the mind for a level-k player i. Hence, for a level-k subject, the state space $\Omega_k$ consists of all thinking steps in the best response hierarchy of this particular level-k type. Thus, $\Omega_k = \{0, \ldots, -(k-3), k-2, -(k-1), k\}$.
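The sign convention for the state space can be written out mechanically. The short Python sketch below (the function name is ours, for illustration) builds $\Omega_k$ by marking with a minus sign every step of the hierarchy that concerns the opponent, and writing the lowest state as 0 in either case.

```python
from typing import List


def state_space(k: int) -> List[int]:
    """States of a level-k subject, ordered from lowest to highest.

    Step j of the hierarchy is about the opponent when (k - j) is odd and
    then carries a minus sign; the lowest state is written 0 whether it
    refers to oneself or to the opponent.
    """
    states = []
    for j in range(k + 1):
        about_opponent = (k - j) % 2 == 1
        states.append(-j if about_opponent else j)
    return states


for level in range(5):
    print(level, state_space(level))
```

For example, a level-2 subject has states 0, −1, 2, while a level-3 subject has states 0, 1, −2, 3.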
B The Constrained Markov Transition Process
To account for the transitions of states within a subject's mind, we employ the Markov-switching model of Hamilton [1989] and characterize the transition of states by a Markov transition matrix. Instead of requiring a level-k subject to "strictly" obey a monotonic order of level-k thinking going from lower states to higher states, we allow subjects to
22
We use the minus sign (−) to refer to players contemplating their opponent. Note that the lowest state 0 can be about oneself or the opponent. Thus the states 0 and −0 should, strictly speaking, be distinguished. For ease of exposition, we do not make this distinction and call the lowest state 0.
move back from higher states to lower states. This is to account for the possibility that subjects may go back to double check, as may be typical in experiments. However, since a level-k player best responds to a level-$(k-1)$ opponent, it is difficult to imagine a subject jumping from the reasoning state of, say, $s = k-2$ to that of $s = k$ without first going through the reasoning state of $s = -(k-1)$. Thus, we restrict the probabilities for all transitions that involve a jump in states to be zero.
23
Specifically, suppose the subject is a particular level-k. Let $S_t$ be the random variable representing the subject's state at time t, drawn from the state space
\[ \Omega_k = \{0, \ldots, -(k-3), k-2, -(k-1), k\}. \]
Let the realization of the state at time t be $s_t$. Denote the state history up to time t by $\mathcal{S}^t \equiv \{s_1, \ldots, s_{t-1}, s_t\}$.
24
Since lookups may be serially correlated, we model this by estimating a constrained stationary Markov transition matrix of states. Let the transition probability from state $S_{t-1} = s_{t-1}$ to $S_t = s_t$ be
\[ \Pr(S_t = s_t \mid S_{t-1} = s_{t-1}) = \pi_{s_{t-1} \to s_t}. \tag{1} \]
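Thus, the state transition matrices $\theta_k$ for level-k types for $k \in \{0, 1, 2, 3, 4\}$ are
\[
\theta_0 = (\pi_{0\to 0}) = (1), \quad
\theta_1 = \begin{pmatrix} \pi_{0\to 0} & \pi_{0\to 1} \\ \pi_{1\to 0} & \pi_{1\to 1} \end{pmatrix}, \quad
\theta_2 = \begin{pmatrix} \pi_{0\to 0} & \pi_{0\to -1} & 0 \\ \pi_{-1\to 0} & \pi_{-1\to -1} & \pi_{-1\to 2} \\ \pi_{2\to 0} & \pi_{2\to -1} & \pi_{2\to 2} \end{pmatrix},
\]
\[
\theta_3 = \begin{pmatrix}
\pi_{0\to 0} & \pi_{0\to 1} & 0 & 0 \\
\pi_{1\to 0} & \pi_{1\to 1} & \pi_{1\to -2} & 0 \\
\pi_{-2\to 0} & \pi_{-2\to 1} & \pi_{-2\to -2} & \pi_{-2\to 3} \\
\pi_{3\to 0} & \pi_{3\to 1} & \pi_{3\to -2} & \pi_{3\to 3}
\end{pmatrix}, \quad
\theta_4 = \begin{pmatrix}
\pi_{0\to 0} & \pi_{0\to -1} & 0 & 0 & 0 \\
\pi_{-1\to 0} & \pi_{-1\to -1} & \pi_{-1\to 2} & 0 & 0 \\
\pi_{2\to 0} & \pi_{2\to -1} & \pi_{2\to 2} & \pi_{2\to -3} & 0 \\
\pi_{-3\to 0} & \pi_{-3\to -1} & \pi_{-3\to 2} & \pi_{-3\to -3} & \pi_{-3\to 4} \\
\pi_{4\to 0} & \pi_{4\to -1} & \pi_{4\to 2} & \pi_{4\to -3} & \pi_{4\to 4}
\end{pmatrix}.
\]
Note that the upper triangle where the column number is greater than one plus the row number is restricted to zero since we do not allow for jumps.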
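The zero pattern of these matrices, and the number of free transition probabilities it leaves, can be checked with a short Python sketch. The function names are illustrative; the count uses only the two restrictions stated in the text (no forward jumps, and each row summing to one).

```python
from typing import List


def allowed_transitions(k: int) -> List[List[bool]]:
    """Mask of the (k+1) x (k+1) transition matrix: True where pi may be non-zero.

    Rows and columns are indexed by the position of a state in the hierarchy
    (0 = lowest state). A transition is ruled out when the column index
    exceeds the row index plus one, i.e. when it skips a state going up.
    """
    n = k + 1
    return [[col <= row + 1 for col in range(n)] for row in range(n)]


def free_parameters(k: int) -> int:
    """Allowed entries minus one per row, since each row sums to one."""
    mask = allowed_transitions(k)
    return sum(sum(row) for row in mask) - (k + 1)


for level in range(5):
    print(level, free_parameters(level), level * (level + 3) // 2)
```

The two printed counts agree for every level, so a level-1 type has two free transition parameters and a level-2 type has five.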
C From States to Lookups
When a subject is in a particular state, her reasoning will be reflected in the lookups, which we can track. Recall that for each game n, $G_n$ is the map on which she can fixate.
23
Estimation results without such restrictions are similar to the results presented below and are provided
in Supplementary Table 4: 12 of the 17 subjects are classified as the same level-k lookup type.
24
In the experiment, subjects could look at the entire computer screen. Here, we only consider lookups
that fall on the grid map and drop the rest.
Define a state-to-lookup mapping $l^k_n : \Omega_k \to G_n$ which assigns each state s a corresponding lookup location on the map $G_n$ according to the level-k model.
25
Suppose a level-2 player is inferred to be in state $s = -1$; then by the mapping $l^2_n$, her lookup should fall exactly on the location $l^2_n(-1)$. In words, when a level-2 player is in state $s = -1$, she is thinking about what her opponent as a level-1 would choose. Hence, the state-to-lookup mapping $l^2_n(-1)$ should be on the location a level-1 opponent would choose. If her lookup is not on that location, we interpret this as an error. We assume a logit error structure so that looking at locations farther away from $l^2_n(-1)$ is less likely.
Formally, the lookup sequence in trial n is a time series over $t = 1, \ldots, T_n$, where $T_n$ is the number of her lookups in game n. Because of the logit error, a level-k subject may not look at a location with certainty. Therefore, at the t-th lookup, let the random variable $R^t_n$ be the probabilistic lookup location in $G_n$ and its realization be $r^t_n$. Denote the lookup history up to time t by $\mathcal{R}^t_n \equiv \{r^1_n, \ldots, r^{t-1}_n, r^t_n\}$.
Conditional on $S_t = s_t$, the probability distribution of a level-k subject's probabilistic lookup $R^t_n$ is assumed to follow a logit error quantal response model (centered at $l^k_n(s_t)$), independent of lookup history $\mathcal{R}^{t-1}_n$. In other words,
\[
\Pr(R^t_n = r^t_n \mid S_t = s_t, \mathcal{R}^{t-1}_n) = \frac{\exp\left(-\lambda_k \left\| r^t_n - l^k_n(s_t) \right\|\right)}{\sum_{g \in G_n} \exp\left(-\lambda_k \left\| g - l^k_n(s_t) \right\|\right)}, \tag{2}
\]
where $\lambda_k \in [0, \infty)$ is the precision parameter. If $\lambda_k = 0$, the subject randomly looks at locations in $G_n$. As $\lambda_k \to \infty$, her lookups concentrate on the lookup location $l^k_n(s_t)$ predicted by the state $s_t$ of a level-k.
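The logit error structure can be illustrated with the following Python sketch, which returns the distribution of lookups over the map for a given predicted location and precision. It assumes that the distance between two locations is Euclidean; the text does not specify the norm, so this choice, like the function name, is ours.

```python
import math
from typing import Dict, Tuple

Point = Tuple[int, int]


def lookup_probabilities(center: Point, lam: float,
                         X: int, Y: int) -> Dict[Point, float]:
    """Logit distribution over grid locations, centered at the predicted lookup.

    center: the location l(s) predicted by the current state.
    lam: precision parameter (lam = 0 gives uniform lookups).
    Assumption: the distance ||g - l(s)|| is Euclidean.
    """
    weights = {}
    for gx in range(-X, X + 1):
        for gy in range(-Y, Y + 1):
            dist = math.hypot(gx - center[0], gy - center[1])
            weights[(gx, gy)] = math.exp(-lam * dist)
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}
```

On a 7 × 7 map, `lookup_probabilities((-2, 3), 0.0, 3, 3)` assigns probability 1/49 to every location, while larger values of `lam` put most of the mass on the predicted location.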
Combining the state transition matrix and the logit error, we can calculate the probability of observing lookup $r^t_n$ conditional on past lookup history $\mathcal{R}^{t-1}_n$:
\[
\Pr(R^t_n = r^t_n \mid \mathcal{R}^{t-1}_n) = \sum_{s_t \in \Omega_k} \Pr(S_t = s_t \mid \mathcal{R}^{t-1}_n) \cdot \Pr(R^t_n = r^t_n \mid S_t = s_t, \mathcal{R}^{t-1}_n) \tag{1}
\]
25
For instance, if a level-2 player with target $(4, -2)$ in game $n = 16$ (player 1 as shown in Figure I) is at state $s = 0$ at a point of time, the mapping $l^2_{16}$ would give us the location $l^2_{16}(0) = (0, 0)$ which a level-0 player would choose (O in Figure I) since at this particular point of time, she is thinking about what her opponent thinks she would choose as a level-0. Similarly, if a level-2 player is in state −1, then the $l^2_{16}$ mapping would give us the location $l^2_{16}(-1) = (-2, 3)$ which a level-1 opponent would choose ($L1_2$ in Figure I) since at this particular point of time, she is thinking about what her opponent would choose as a level-1. Finally, if a level-2 player 1 is in state 2, then the mapping $l^2_{16}$ would give us the location $l^2_{16}(2) = (2, 1)$ which a level-2 subject would choose ($L2_1$ in Figure I) since at this particular point of time, she is thinking about her choice as a level-2.
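where
\begin{align}
\Pr(S_t = s_t \mid \mathcal{R}^{t-1}_n)
&= \sum_{s_{t-1} \in \Omega_k} \Pr(S_{t-1} = s_{t-1} \mid \mathcal{R}^{t-1}_n) \cdot \Pr(S_t = s_t \mid S_{t-1} = s_{t-1}, \mathcal{R}^{t-1}_n) \notag \\
&= \sum_{s_{t-1} \in \Omega_k} \Pr(S_{t-1} = s_{t-1} \mid \mathcal{R}^{t-1}_n) \cdot \pi_{s_{t-1} \to s_t} \notag \\
&= \sum_{s_{t-1} \in \Omega_k} \frac{\Pr(S_{t-1} = s_{t-1} \mid \mathcal{R}^{t-2}_n) \Pr(R^{t-1}_n = r^{t-1}_n \mid S_{t-1} = s_{t-1}, \mathcal{R}^{t-2}_n)}{\Pr(R^{t-1}_n = r^{t-1}_n \mid \mathcal{R}^{t-2}_n)} \cdot \pi_{s_{t-1} \to s_t}. \tag{2}
\end{align}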
The second equality in equation (2) follows since, according to the Markov property, $S_{t-1} = s_{t-1}$ is sufficient to predict $S_t = s_t$. Note that equation (2) depends on the Markov transition matrix. Meanwhile, the second term on the right hand side of equation (1), $\Pr(R^t_n = r^t_n \mid S_t = s_t, \mathcal{R}^{t-1}_n)$, depends on the logit error. Notice that all the terms on the last line of equation (2) are now expressed with the time index moving backwards by one period. Hence, for a given game n, coupled with the initial distribution of states, the joint density of a level-k subject's empirical lookups, denoted by
\begin{align*}
f^k_n(r^1_n, \ldots, r^{T_n-1}_n, r^{T_n}_n) &\equiv \Pr(r^1_n, \ldots, r^{T_n-1}_n, r^{T_n}_n) \\
&= \Pr(r^1_n) \Pr(r^2_n \mid r^1_n) \Pr(r^3_n \mid r^1_n, r^2_n) \cdots \Pr(r^{T_n}_n \mid r^1_n, r^2_n, \ldots, r^{T_n-1}_n),
\end{align*}
can be derived.
26
The log likelihood over all 24 trials is thus
\[
L(\lambda_k, \theta_k) = \ln\left[ \prod_{n=1}^{24} f^k_n(r^1_n, \ldots, r^{T_n-1}_n, r^{T_n}_n) \right]. \tag{3}
\]
Since level-k reasoning starts from the lowest state (here state 0), we assume the initial distribution of states degenerates to a mass point at the lowest state corresponding to level-0 (of herself if k is even and of her opponent if k is odd). With this assumption, we estimate the precision parameter $\lambda_k$ and the constrained Markov transition matrix $\theta_k$ using maximum likelihood estimation for each k, and classify subjects into the particular level-k type which has the largest likelihood.
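The recursion behind the likelihood can be sketched as a forward filter. The Python code below evaluates the log of the joint density for one game, given a precision parameter and a transition matrix; it is a minimal illustration under stated assumptions (Euclidean distance in the logit error, states indexed by their position in the hierarchy, function names of our choosing) and leaves out the maximization over the parameters.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[int, int]


def emission(r: Point, center: Point, lam: float, X: int, Y: int) -> float:
    """Logit probability of looking at r when the predicted location is center.

    Assumption: Euclidean distance on the grid.
    """
    num = math.exp(-lam * math.hypot(r[0] - center[0], r[1] - center[1]))
    den = sum(math.exp(-lam * math.hypot(gx - center[0], gy - center[1]))
              for gx in range(-X, X + 1) for gy in range(-Y, Y + 1))
    return num / den


def game_log_likelihood(lookups: Sequence[Point], centers: Sequence[Point],
                        theta: Sequence[Sequence[float]], lam: float,
                        X: int, Y: int) -> float:
    """Log joint density of one game's lookup sequence via forward filtering.

    centers[j]: predicted lookup of the j-th state (lowest state first).
    theta[i][j]: transition probability from state i to state j.
    """
    n = len(centers)
    prior = [1.0] + [0.0] * (n - 1)  # initial mass point on the lowest state
    loglik = 0.0
    for r in lookups:
        # Joint probability of each state and the current lookup, given history.
        joint = [prior[j] * emission(r, centers[j], lam, X, Y) for j in range(n)]
        p_r = sum(joint)  # probability of the lookup given the history
        loglik += math.log(p_r)
        # Bayes update, then one step of the Markov transition.
        posterior = [v / p_r for v in joint]
        prior = [sum(posterior[i] * theta[i][j] for i in range(n))
                 for j in range(n)]
    return loglik
```

Summing this quantity over the 24 games gives the log likelihood for a candidate type, which would then be maximized over the precision parameter and the unrestricted entries of the transition matrix. For very large precision values the probabilities can underflow, so a practical implementation would work in logs throughout.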
To summarize, for each level k, we estimate a state transition matrix and a precision parameter for the logit error. Thus for a given initial distribution of the states, we know the probability distribution of states at any point of time using the state transition matrix. Moreover, at any point of time, the mapping $l^k_n$ from the state to the lookup gives us the lookup location corresponding to any state when there is no error. Coupled with the error
26
See Supplementary Appendix A3 for a formal derivation.
structure, we can calculate the probability distribution of various errors and therefore the distribution of predicted lookup locations. We then maximize the likelihood to explain the entire observed sequence of lookups. We do this for various levels. The final step is to select the k among the various level-k types that best explains the observed sequence of lookups for each subject.
D Vuong's Test for Non-Nested but Overlapping Models
The above econometric model may be plagued by an overfitting problem since higher level-k types have more states and hence more parameters. It is not surprising if one discovers that models with more parameters fit better. In particular, the Markov-switching model for level-k has $(k+1)$ states with a $(k+1) \times (k+1)$ transition matrix. This gives the model $k(k+3)/2$ parameters in the transition matrix alone.
27
For example, a level-2 subject has 3 states (0, −1, and 2) and five (Markov) parameters, but a level-1 subject has only 2 states (0 and 1) and two (Markov) parameters. Hence, we need to make sure our estimation does not select higher levels merely because they contain more states and more parameters. However, the usual tests for model restrictions may not apply, since the parameters involved in different level-k types could be non-nested. In particular, the state space of a level-2 subject, $\{0, -1, 2\}$, and that of a level-1 subject, $\{0, 1\}$, are not nested. Yet the state space of a level-1 type, $\{0, 1\}$, is nested in the state space of a level-3 type, $\{0, 1, -2, 3\}$. In order to evaluate the classification, we use Vuong's test for non-nested but overlapping models (1989).
28
Let $Lk^*$ be the type which has the largest likelihood with corresponding parameters $(\lambda_{k^*}, \theta_{k^*})$. Let $Lk^a$ be an alternative type with corresponding parameters $(\lambda_{k^a}, \theta_{k^a})$. In our case $Lk^*$ is the type with the largest likelihood based on lookups. The alternative type $Lk^a$ is the type having the next largest likelihood among all lower level types.
29
If, according to Vuong's test, $Lk^*$ is a better model than $Lk^a$, we can be assured that the maximum likelihood criterion does not pick up the reported type by mere chance. Thus, we conclude that the lookup-based type is $Lk^*$. If instead we find that, according to Vuong's test, $Lk^*$ and $Lk^a$ are equally good, then we conservatively classify the subject as the second largest lower type $Lk^a$.
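The classification rule just described can be summarized in a few lines of Python. The sketch below treats Vuong's test as a black box: `vuong_prefers` is a hypothetical callback standing in for the test detailed in the Supplementary Appendix, and the statistic itself is not implemented here.

```python
from typing import Callable, Dict


def lookup_based_type(loglik: Dict[int, float],
                      vuong_prefers: Callable[[int, int], bool]) -> int:
    """Classify a subject from the maximized log likelihood of each level.

    loglik: maps a level k to its maximized log likelihood.
    vuong_prefers(k_star, k_alt): hypothetical callback returning True when
    Vuong's test finds level k_star significantly better than level k_alt.
    """
    k_star = max(loglik, key=loglik.get)  # type with the largest likelihood
    lower = {k: v for k, v in loglik.items() if k < k_star}
    if not lower:  # no lower-level alternative to test against
        return k_star
    k_alt = max(lower, key=lower.get)  # best-fitting lower-level type
    return k_star if vuong_prefers(k_star, k_alt) else k_alt
```

Only lower-level alternatives are considered because a higher-level type with a lower likelihood cannot owe its fit to extra parameters.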
Table II shows the results of the maximum likelihood estimation and Vuong’s test
27
Since each row sums up to one and elements with the column index greater than the row index plus one are zero, we have in total $(k+1)(k+1) - (k+1) - [k(k-1)]/2 = [k(k+3)]/2$ parameters.
28
See Supplementary Appendix A4 for the details of Vuong's test for non-nested but overlapping models.
Note that this is the generalized version of the well-known "nested" Vuong's test.
29
Recall that the reason why we look at Vuong's test is to avoid overfitting. Hence, if the alternative type has a larger transition matrix (more parameters) but a lower likelihood, there is no point in performing a test, since $Lk^*$ will not suffer from the problem of overfitting because it has fewer parameters but a higher likelihood. This leads us to consider only lower level types as the alternative type.
for each subject. For each subject, we list her $Lk^*$ type, her $Lk^a$ type, her Vuong's test statistic, and her lookup-based type according to Vuong's test, in that order. Six of the seventeen subjects (subjects 1, 5, 6, 8, 11, 13) pass Vuong's test and have their lookup-based type as $Lk^*$. The remaining eleven subjects are conservatively classified as $Lk^a$. The overall results are summarized in column (A) of Table III. After employing Vuong's test, the type distribution for (L0, L1, L2, L3, EQ) is (1, 6, 4, 4, 2).
30
The distribution is slightly higher than typical type distributions reported in previous studies. In particular, there are two EQ's and four L3's, accounting for more than one third of the data. Treating the EQ type as having a thinking step of 4, we find that the average number of thinking steps is 2.00, in line with results of the standard p-beauty contest games using Caltech subjects, but higher than that of typical subject pools.
31
Neither employing Hansen [1992]'s test (to avoid nuisance parameter problems), nor iteratively applying Vuong's test (until the likelihood of the current type is significantly higher than that of the next alternative) alters the distribution of level-k types by much (see Supplementary Appendix A4 and Supplementary Table 3).
Up to now, we have shown that lookups do fall on the hotspots of the best response hierarchy (Hypothesis 2a). Classifying subjects based on lookups (Hypothesis 2b) gives us a reasonable level of sophistication as argued above. However, one might still wonder whether the results reported in Table II are due to a misspecification of possible types. After all, many assumptions are required for Hypothesis 2b to hold. We take up this issue now. Our argument is that if we take the level-k theory literally to interpret the underlying reasoning process, the classification based on lookups should match well with the classification using final choices alone, since the level k reflects a player's sophistication.
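The summary figures quoted for the lookup-based type distribution follow directly from the counts. The following Python lines reproduce the arithmetic, using only the distribution (1, 6, 4, 4, 2) and the convention of treating EQ as four thinking steps; the variable names are ours.

```python
# Type counts after Vuong's test, as reported in the text.
counts = {"L0": 1, "L1": 6, "L2": 4, "L3": 4, "EQ": 2}
# Thinking steps per type, with EQ treated as 4 steps.
steps = {"L0": 0, "L1": 1, "L2": 2, "L3": 3, "EQ": 4}

n_subjects = sum(counts.values())
average_steps = sum(counts[t] * steps[t] for t in counts) / n_subjects
high_share = (counts["L3"] + counts["EQ"]) / n_subjects

print(n_subjects, average_steps, round(high_share, 3))  # 17 2.0 0.353
```

The L3 and EQ types together make up 6 of 17 subjects, which is just above one third.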
V Matching Up with Final Choices
We first classify subjects using their final choices and compare classifications based on choices to those based on lookups. We point out the similarity between these two classification results. Finally, we address how lookup data could help classify subjects when the choice data is noisy.
Following the literature, we classify individual subjects into various level-k types based on final choices alone. Supplementary Appendix A5 provides details of the maximum
30
Ignoring the two pseudo-17 subjects (subjects 3 and 17, both classified as L1) whose choices suggest non-compliance with level-k theory, the type distribution for (L0, L1, L2, L3, EQ) is (1, 4, 4, 4, 2). For pseudotypes, refer to Costa-Gomes and Crawford [2006].
31
Camerer [1997] reports that Caltech students play an average of 21.88 in a p-beauty contest game with $p = 0.7$. This is between L2's choice of 24.5 and L3's choice of 17.15. Higher than typical distributions could also result from the spatial beauty contest game being intuitive and not requiring mathematical multiplication (as compared with, say, the standard p-beauty contest game), as Chou et al. [2009] show that a graphical presentation of the standard p-beauty contest game yields results closer to equilibrium.
likelihood estimation and pseudotype test we adopt from Costa-Gomes and Crawford [2006], and subject-by-subject results are reported in the sixth column of Table II. The idea of the pseudotype is to treat each subject's choices as a possible type. This is to examine whether there are clusters of subjects whose choices resemble each other's and thus predict others' choices in the cluster better than the pre-specified level-k types. Since we have 17 subjects, we include 17 pseudotypes, each constructed from one of our subjects' choices in 24 trials. The aggregate distributions of types (with or without the pseudotype test) are reported in columns (B) and (C) of Table III. In Table III, the choice-based and lookup-based classification results look similar. The choice results indicate slightly more steps of reasoning (2.12–2.13 for choice-based types instead of 2.00 for lookup-based types). This suggests that the lookup-based estimation (and the underlying Hypothesis 2b) is in the right ballpark. In fact, if we consider the classification results on a subject-by-subject basis, the similarity between the two estimations is even more evident. As reported in Table II, for ten out of the seventeen subjects, their lookup-based types and choice-based types are the same. In other words, for most subjects, when their choices reflect a particular level of sophistication, their lookup data suggests the same level of sophistication. Such alignment in classification results would be surprising if one thought Hypothesis 2b was too strong a claim. This supports a literal interpretation of the level-k model. When a subject's choice data indicates a particular level of sophistication, her lookups suggest that the best response hierarchy of that level is carried out when she reasons.
Since the classification based on lookups and that based on choices align, we next turn to discuss the subtle differences between them. We evaluate the robustness of individual choice-based classification by performing a bootstrap. This is a departure from past literature such as Costa-Gomes and Crawford [2006], as they do not consider whether the maximum likelihood estimation has enough power to distinguish between various types. For example, reading from Supplementary Table 1, for subject 14, the log likelihood is −98.89 for L0, −84.17 for L1, −96.99 for L2, −76.67 for L3, and −74.45 for EQ. Maximum likelihood estimation classifies her as EQ, although the likelihood of L3 is also close. In this case, classifying this subject as EQ based on maximum likelihood alone may be questionable. To the best of our knowledge, there has not been any proposed test in experimental economics for evaluating the robustness of maximum likelihood-based type classifications. Hence we propose a bootstrap procedure (Efron [1979]; Efron and Tibshirani [1994]) to deal with the issue of robustness.
32
Imagine that from the maximum likelihood estimation, a subject is classified as a particular level-k type with the logit
32
Costa-Gomes and Crawford [2006] do use various information criteria to perform the horse race.
However, this still fails to address how "close" the runner-up is to the winner.
error parameter $\lambda_k$. Draw (with replacement) 24 new trials out of the original dataset and re-estimate her k and $\lambda_k$. We do this 1000 times to generate the discrete distribution of k and the distribution of $\lambda_k$. Then, we evaluate the robustness of k by looking at the distribution of k. Each level-k type estimated from a resampled dataset that is not the same as her original level-k type is viewed as a "misclassification," and counted against the original classification k. By calculating the total misclassification rate (out of 1000 resamples), we can measure the robustness of the original classification. This bootstrap procedure is in the spirit of the test reported in Salmon [2001], which evaluates the robustness of the parameters estimated in an EWA learning model using simulated data.
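The bootstrap procedure can be sketched in Python as follows. The classification step is passed in as `classify`, a hypothetical stand-in for the maximum likelihood type classification on choices; the resampling and the misclassification rate follow the description above, while the function name, seed handling, and input format are our own choices.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple


def bootstrap_misclassification(
        trials: Sequence,
        classify: Callable[[Sequence], Hashable],
        n_resamples: int = 1000,
        seed: int = 0) -> Tuple[Counter, float]:
    """Resample trials with replacement and re-classify each resample.

    trials: one subject's trial-level data (24 trials in the experiment).
    classify: hypothetical stand-in for the maximum likelihood classification;
        it takes a list of trials and returns a type label.
    Returns the bootstrap distribution of types and the share of resamples
    classified differently from the original data.
    """
    rng = random.Random(seed)
    original = classify(trials)
    counts = Counter()
    for _ in range(n_resamples):
        # Draw as many trials as in the original data, with replacement.
        resample = [rng.choice(trials) for _ in trials]
        counts[classify(resample)] += 1
    rate = 1.0 - counts[original] / n_resamples
    return counts, rate
```

As a check on the definition, a subject whose original type is recovered in 587 of 1000 resamples has a misclassification rate of 1 − 587/1000 = 0.413.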
The results of this bootstrap procedure are listed in Table IV. For each subject, we report the bootstrap distribution of k (the number of times a subject is classified into L0, L1, L2, L3 or EQ in the 1000 resampled datasets). The bootstrap misclassification rate (percentage of times classifying the subject as a type different from her original type) is listed in the last column. For example, subject 14 is originally classified as EQ, but is only reclassified as EQ 587 times during the bootstrap procedure. Subject 14 is instead classified as L3 228 times and as L1 185 times. Hence, the distribution of the number of times that subject 14 is classified into L0, L1, L2, L3 or EQ in the 1000 resampled datasets is (0, 185, 0, 228, 587) and the corresponding misclassification rate is 0.413.
The bootstrap results align surprisingly well with whether the lookup-based classifications match their choice-based types. In particular, for the ten subjects whose two classifications match, all but three of them have (choice-based) bootstrap misclassification rates lower than 0.05, suggesting that their classifications are truly sharp.
33
In contrast, for six of the remaining seven subjects whose two classifications do not match, their choice-based types have bootstrap misclassification rates higher than 18.4%, suggesting that misclassifying these subjects into the wrong types using choice data alone (due to insignificantly larger likelihoods) is possible. The difference is significant, having a p-value of 0.0123 according to the Mann-Whitney-Wilcoxon rank-sum test. To sum up, the lookup-based types match the choice-based types when the choice-based classification is quite sharp. In contrast, when they differ, the classification based on choice is not that sharp, suggesting that for these subjects, choice data may not be enough.
In this case, one wonders whether lookup data could provide additional separation of types to predict choices. A closer look at Table IV (see the type underlined) indicates that for ten subjects, when we resample their choices, the level they are most frequently classified into in the 1000 resampled choice datasets is exactly their level classified using their
33
One of these three subjects (subject 17) fails the pseudotype test and is unlikely to resemble any of the level-k types. The remaining two subjects (subjects 2 and 4) have misclassification rates of 0.076 and 0.110. These are marginally higher than 0.05.
22
lookups.
34
For six other subjects,their lookupbased type is the one they are second most
frequently classiﬁed into.
35
In fact,these subjects’ lookupbased type also rank second in
terms of likelihood based on choices.
36
A subject’s lookupbased type is classiﬁed using
her lookups,not using her choices.The high predictability of choices by her lookupbased
type suggests that the lookupbased type is a viable alternative for predicting choices even
when the lookupbased types diﬀer from the choicebased types.
In order to evaluate whether lookup data can indeed improve classification, we perform
an out-of-sample prediction horse race between the lookup-based and choice-based types.
Note that our lookup-based model makes predictions on lookups, not on final choices per
se. However, we can first classify individual subjects into a particular level-k type based
on either lookups or choices using two thirds of the trials, and see how well the classified
level-k type predicts the final choices in the remaining one third of the trials. In particular,
for each subject, we classify her as a level-$k^l_{16}$ type based on lookups (using the first 16
sequences of lookups) and a level-$k^c_{16}$ type based on final choices (using the first 16 final
choices), respectively. We then use these particular k's (one for lookups, the other for
choices) to predict the final choices of the last eight trials. Since we are mainly interested in
how lookup data can provide additional separation of types (to predict behavior) when
choice data is insufficient, we group subjects into those whose choice-based classification
is robust (having bootstrap misclassification rates lower than 0.05, as reported in the
right panel of Table II) and those whose classification is not.
To compare the prediction power of the two models, we report the mean square errors
of the predicted choices for the lookup-based and choice-based models. In particular,
suppose a subject chose location $g_n = (x_n, y_n)$ in trial $n$, while the lookup-based and
choice-based models predicted $(x^l_n, y^l_n)$ and $(x^c_n, y^c_n)$. Then, the mean square errors of the
two models are $(x_n - x^l_n)^2 + (y_n - y^l_n)^2$ and $(x_n - x^c_n)^2 + (y_n - y^c_n)^2$, respectively. As reported
in Table V, though the overall performance of the two models is comparable, among the nine
subjects whose choice-based types are not robust, the lookup-based model has a better
mean square error of 5.75 (compared with 8.67 for the choice-based model) in predicting
the last eight trials.³⁷ A Wilcoxon signed-rank test shows that this difference is marginally
significant (p = 0.0781).³⁸
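The error measure described above can be illustrated with a short Python sketch. The function name and the three trials of data below are hypothetical and are not taken from the experiment; averaging the per-trial squared distances over the held-out trials is an assumption about how the per-trial terms are aggregated.

```python
def mean_square_error(actual, predicted):
    """Average squared Euclidean distance between chosen and predicted
    locations, taken over the held-out trials."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(
        (x - px) ** 2 + (y - py) ** 2
        for (x, y), (px, py) in zip(actual, predicted)
    )
    return total / len(actual)


# Hypothetical held-out choices and model predictions (illustration only).
actual = [(3, -2), (2, 1), (3, 0)]
lookup_pred = [(3, -2), (2, 1), (3, 1)]
choice_pred = [(3, 0), (2, 1), (0, 0)]

print(mean_square_error(actual, lookup_pred))
print(mean_square_error(actual, choice_pred))
```

In this made-up example the lookup-based predictions miss by one grid step in a single trial, giving an error of 1/3, while the choice-based predictions give 13/3.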
³⁴ They are subjects 1, 2, 4, 5, 7, 10, 12, 13, 16, 17 (those whose two classifications match).
³⁵ They are subjects 3, 6, 8, 9, 11, 15.
³⁶ Refer to the double-underlined likelihoods in Supplementary Table 1.
³⁷ Even among the “robust” subjects, subject 7 is the only one whose lookup-based model has a much
larger mean square error than the choice-based model.
³⁸ If we focus only on the seven subjects whose two classifications differ, the lookup-based model still
has a better mean square error of 6.55 (compared with 8.68 for the choice-based model), though the
difference is not statistically significant.

To see how significant this gain in prediction power is, we calculate the “economic
value” (cf. Camerer, Ho and Chong, 2004) of the two models, to evaluate how much these
predictions could potentially add to the opponent's payoffs. In particular, we calculate
the opponent's payoffs had he followed these models and best responded to the model
predictions, $\pi_{\text{Follow}}$, and see how much an opponent can gain in addition to his actual
payoffs, $\pi_{\text{Actual}}$, in the experiment. The economic value is the percentage of this gain,
compared with the maximum gain possible, $\pi_{\text{BR}}$. (Note that economic values could be
negative if the model performs worse than actual subjects.)

$$EV = \frac{\pi_{\text{Follow}} - \pi_{\text{Actual}}}{\pi_{\text{BR}} - \pi_{\text{Actual}}}$$
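As a small worked illustration of the economic value formula, the Python sketch below evaluates EV for made-up payoff numbers. The function name and all payoff values are hypothetical and serve only to show how the ratio behaves, including the negative case mentioned in the text.

```python
def economic_value(pi_follow, pi_actual, pi_br):
    """Share of the maximum possible payoff gain that an opponent would
    capture by best responding to the model's predictions."""
    max_gain = pi_br - pi_actual
    if max_gain == 0:
        raise ValueError("economic value is undefined when pi_br equals pi_actual")
    return (pi_follow - pi_actual) / max_gain


# Hypothetical payoffs (illustration only, not experimental data).
print(economic_value(pi_follow=14.0, pi_actual=10.0, pi_br=20.0))
print(economic_value(pi_follow=8.0, pi_actual=10.0, pi_br=20.0))
```

In the first call the opponent captures 4 of a possible 10 units of gain, so EV is 0.4. In the second call following the model does worse than the actual payoff, so EV is negative.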
Results in the last two columns of Table V show that both the choice-based and
lookup-based models have good predictive power (compared to actual subjects) and can (on
average) increase opponent payoffs by 39 to 41% of the maximum possible gain. Moreover,
the bootstrap robustness test indeed evaluates choice-based models well: the second panel
of Table V shows that for the robust subjects, the average economic value for the
choice-based model is 56.3%, higher than that of the lookup-based model (42.0%). On the
other hand, the lookup-based model is a good complement, especially when choice data is not
good enough: as shown in the first panel of Table V, for the nonrobust subjects, the average
economic value for the lookup-based model is 40.4%, compared with 24.3% for the
choice-based model. In other words, among the subjects whose choice-based type is not
robust to bootstrap, had the opponent known her lookup-based level, his payoffs could be
increased by 40.4% of the maximum possible gain. As a comparison, had the opponent known
her choice-based level, his payoffs could be increased by 24.3%.
To summarize, these results show that lookup data can help us confirm classification
results based on choices alone and even provide better classification results when
choice-based classifications are not robust. Moreover, lookup data provide a chance to put the
level-k model to an ultimate test, asking whether the model can not only predict final choices,
but also describe the decision-making process employed by subjects as they go through the
best response hierarchy specified in Hypothesis 2b. Results in Table II show that the
level-k model does indeed hold up under this test for our spatial beauty contest games.
One ought to keep in mind that explaining the reasoning process is as hard as, if not
harder than, explaining choices. The fact that, in our dataset, lookup-based types align with
choice-based types for more than half of the subjects should be read as
strong support for the level-k model. This may be due to the graphical nature of the spatial
beauty contest games. How general this result is should be tested in future experiments
in which the reasoning process can be analyzed.
VI Conclusion
We introduce a new spatial beauty contest game in which the process of reasoning can
be tracked, and provide theoretical predictions based on the equilibrium and a literal
interpretation of the level-k theory. The theoretical predictions of the level-k model yield
a plausible hypothesis on the decision-making process when the game is actually played.
We then conduct laboratory experiments using video-based eyetracking technology to
test this conjecture, and fit the eyetracking data on lookups using a constrained
Markov-switching model of level-k reasoning. Results show that, based on lookups, experimental
subjects' lookup sequences can be classified as following various level-k best response
hierarchies, which for more than half of them coincide with the types that they were classified
into using final choices alone. Moreover, when the two classifications differ, most of the
choice-based types are not robust to bootstrap, indicating that we might have misclassified
them due to insignificantly larger likelihoods. In fact, lookup-based types often come out
second (if not first) in the bootstrap procedure. Finally, among the subjects whose
choice-based models are not robust to bootstrap, an out-of-sample prediction exercise shows
that lookup-based models predict final choices better on average. This suggests that studying the
reasoning process (such as through eyetracking lookups) can indeed help us understand
economic behavior (such as an individual's final choices) better.
Analyzing reasoning processes is a hard task. The spatial beauty contest game is
designed to fully exploit the structure of the p-beauty contest so that subjects are induced
to literally count on the map to carry out their reasoning, as implied by the best response
hierarchy of a level-k theory. The high percentage of subjects whose classifications based
on lookups and choices align could be read as support for the level-k model as a complete
theory of reasoning and choice altogether in the spatial beauty contest game. Whether
this holds true for more general games remains to be seen. Nevertheless, the paper points
out a possibility of analyzing reasoning before arriving at choices. A design that exploits
the structure of the game and is well suited to the tracking technology used seems to be
indispensable.
Pennsylvania State University
National Taiwan University
National Taiwan University
VII References
Brainard, D. H. [1997], ‘The psychophysics toolbox’, Spatial Vision 10, 433–436.
Burchardi, K. B. and Penczynski, S. P. [2011], Out of your mind: Eliciting individual
reasoning in one shot games.
Camerer, C. F. [1997], ‘Progress in behavioral game theory’, Journal of Economic
Perspectives 11(4), 167–188.
Camerer, C. F., Ho, T. H. and Chong, J. K. [2004], ‘A cognitive hierarchy model of
games’, Quarterly Journal of Economics 119(3), 861–898.
Camerer, C. F., Johnson, E., Rymon, T. and Sen, S. [1993], Cognition and Framing in
Sequential Bargaining for Gains and Losses, MIT Press, Cambridge, pp. 27–47.
Chou, E., McConnell, M., Nagel, R. and Plott, C. R. [2009], ‘The control of game form
recognition in experiments: Understanding dominant strategy failures in a simple
two-person “guessing” game’, Experimental Economics 12(2), 159–179.
Cornelissen, F. W., Peters, E. M. and Palmer, J. [2002], ‘The eyelink toolbox: Eye
tracking with matlab and the psychophysics toolbox’, Behavior Research Methods,
Instruments and Computers 34, 613–617.
Costa-Gomes, M. A. and Crawford, V. P. [2006], ‘Cognition and behavior in two-person
guessing games: An experimental study’, American Economic Review 96(5), 1737–1768.
Costa-Gomes, M., Crawford, V. P. and Broseta, B. [2001], ‘Cognition and behavior in
normal-form games: An experimental study’, Econometrica 69(5), 1193–1235.
Crawford, V. P. and Iriberri, N. [2007a], ‘Fatal attraction: Salience, naivete, and
sophistication in experimental hide-and-seek games’, American Economic Review 97(5),
1731–1750.
Crawford, V. P. and Iriberri, N. [2007b], ‘Level-k auctions: Can a nonequilibrium model
of strategic thinking explain the winner's curse and overbidding in private-value
auctions?’, Econometrica 75(6), 1721–1770.
Efron, B. [1979], ‘Bootstrap methods: Another look at the jackknife’, The Annals of
Statistics 7(1), 1–26.
Efron, B. and Tibshirani, R. J. [1994], An Introduction to the Bootstrap, Chapman and
Hall/CRC Monographs on Statistics and Applied Probability, Chapman and Hall/CRC.
Gabaix, X., Laibson, D., Moloche, G. and Weinberg, S. [2006], ‘Costly information
acquisition: Experimental analysis of a boundedly rational model’, American Economic
Review 96(4), 1043–1068.
Grosskopf, B. and Nagel, R. [2008], ‘The two-person beauty contest’, Games and
Economic Behavior 62(1), 93–99.
Hamilton, J. D. [1989], ‘A new approach to the economic analysis of nonstationary time
series and the business cycle’, Econometrica 57(2), 357–384.
Hansen, B. E. [1992], ‘The likelihood ratio test under nonstandard conditions: Testing
the markov switching model of gnp’, Journal of Applied Econometrics 7(S1), S61–S82.
Ho, T. H., Camerer, C. F. and Weigelt, K. [1998], ‘Iterated dominance and iterated best
response in experimental “p-beauty contests”’, American Economic Review 88(4),
947–969.
Johnson, E. J., Camerer, C., Sen, S. and Rymon, T. [2002], ‘Detecting failures of
backward induction: Monitoring information search in sequential bargaining’, Journal of
Economic Theory 104(1), 16–47.
Koszegi, B. and Szeidl, A. [2013], ‘A model of focusing in economic choice’, Quarterly
Journal of Economics 128(1), forthcoming.
Krajbich, I., Armel, C. and Rangel, A. [2010], ‘Visual fixations and the computation
and comparison of value in simple choice’, Nature Neuroscience 13(10), 1292–1298.
doi:10.1038/nn.2635.
Kuo, W. J., Sjostrom, T., Chen, Y. P., Wang, Y. H. and Huang, C. Y. [2009], ‘Intuition
and deliberation: Two systems for strategizing in the brain’, Science 324(5926), 519–522.
Nagel, R. [1995], ‘Unraveling in guessing games: An experimental study’, American
Economic Review 85(5), 1313–1326.
Pelli, D. G. [1997], ‘The videotoolbox software for visual psychophysics: Transforming
numbers into movies’, Spatial Vision 10, 437–442.
Reutskaja, E., Nagel, R., Camerer, C. F. and Rangel, A. [2011], ‘Search dynamics in
consumer choice under time pressure: An eye-tracking study’, American Economic Review
101(2), 900–926.
Salmon, T. C. [2001], ‘An evaluation of econometric models of adaptive learning’,
Econometrica 69(6), 1597–1628.
Samuelson, P. A. [1938], ‘A note on the pure theory of consumer's behaviour’, Economica
5(17), 61–71.
Selten, R. [1991], ‘Properties of a measure of predictive success’, Mathematical Social
Sciences 21(2), 153–167.
Stahl, D. O. and Wilson, P. W. [1995], ‘On players' models of other players: Theory
and experimental evidence’, Games and Economic Behavior 10(1), 218–254.
Vuong, Q. [1989], ‘Likelihood ratio tests for model selection and non-nested hypotheses’,
Econometrica 57(2), 307–333.
Wang, J. T.-y., Spezio, M. and Camerer, C. F. [2010], ‘Pinocchio's pupil: Using
eyetracking and pupil dilation to understand truth telling and deception in sender-receiver
games’, American Economic Review 100(3), 1–26.
Figures and Tables
[Grid plot omitted: the locations O, $L1_1$, $L2_1$, $L3_1$, $E_1$ and $L1_2$, $L2_2$, $L3_2$, $E_2$ marked on a 7x7 map
with both axes running from -3 to 3.]
Figure I: Equilibrium and Level-k Predictions of a 7x7 Spatial Beauty Contest Game
with Targets (4, -2) and (-2, 4) (Game 16). Predictions specifically for player 1 with
Target (4, -2) are $L1_1$ ~ $E_1$, and predictions for player 2 with Target (-2, 4) are $L1_2$ ~ $E_2$.
O stands for the prediction of L0 for both players. Note that $Lk_1$ and $Lk_2$ are the best
responses to $L(k-1)_2$ and $L(k-1)_1$, respectively. For example, $L2_2$'s choice (1, 2) is the
best response to $L1_1$ since (3, -2) + (-2, 4) = (1, 2).
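The best response hierarchy described in the Figure I caption can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the rule that an off-map best response is truncated coordinate by coordinate to the map boundary is an assumption, adopted here because it is consistent with the Game 16 predictions reported in Table I.

```python
def clip(value, half_width):
    """Truncate a coordinate to the range [-half_width, half_width]."""
    return max(-half_width, min(half_width, value))


def best_response(opponent_choice, target, map_size):
    """Opponent's location plus own target, truncated to the map
    (the truncation rule is an assumption of this sketch)."""
    half_x, half_y = map_size[0] // 2, map_size[1] // 2
    return (
        clip(opponent_choice[0] + target[0], half_x),
        clip(opponent_choice[1] + target[1], half_y),
    )


def level_k_hierarchy(target_1, target_2, map_size, max_level=4):
    """Return the lists [L0, L1, ..., L(max_level)] for players 1 and 2."""
    levels_1 = [(0, 0)]  # L0 is the center O for both players
    levels_2 = [(0, 0)]
    for _ in range(max_level):
        # Level k of each player best responds to level k-1 of the other.
        next_1 = best_response(levels_2[-1], target_1, map_size)
        next_2 = best_response(levels_1[-1], target_2, map_size)
        levels_1.append(next_1)
        levels_2.append(next_2)
    return levels_1, levels_2


# Game 16: 7x7 map, player 1 target (4, -2), player 2 target (-2, 4).
p1, p2 = level_k_hierarchy((4, -2), (-2, 4), (7, 7))
print("player 1:", p1)
print("player 2:", p2)
```

For Game 16 this gives (3, -2), (2, 1) and (3, 0) as player 1's L1 to L3 locations and (1, 2) as player 2's L2 location, in line with the caption's example.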
Figure II: Screen Shot of the GRAPH Presentation
Figure III: Screen Shot of the SEPARATE Presentation
Figure IV: Hit Areas for Various Level-k Types in Game 16 (7x7 with Target (4, -2) and
the Opponent Target (-2, 4)). Hit area is the minimal convex set enveloping the locations
predicted by each level-k type's best response hierarchy.
Note: If we refer to Figure I, for player 1, the Hit Area for level-1 is the minimal convex
set enveloping the locations (O, $L1_1$). The Hit Area for level-2 is the minimal convex
set enveloping the locations (O, $L1_2$, $L2_1$), and so on.
Figure V: Aggregate Empirical Percentage of Time Spent on the Union of Hit Areas
(“Hit Time”) in Each Game
[Bar chart omitted. Horizontal axis: Game, in the order 9, 24, 16, 20, 10, 6, 17, 23, 19, 5, 15, 11,
12, 4, 8, 2, 14, 13, 22, 18, 7, 3, 1, 21. Vertical axis: hit time, from 0.00 to 1.00.]
Figure VI: Aggregate Linear Difference Measure of Predicted Success in Each Game. It
measures the difference between hit time and the hit area size.
Figure VII: Subject 2’s Eye Lookups in Trial 17 (as a Member B). The radius of the
circle is proportional to the length of that lookup, so bigger circles indicate longer time
spent.
Table I: Level-k, Equilibrium Predictions and Minimum k's in All Games

Game  Map size  Player 1 target  Player 2 target  L0      L1        L2        L3        EQ        k
1     9×9       (-2, 0)          (0, -4)          (0, 0)  (-2, 0)   (-2, -4)  (-4, -4)  (-4, -4)  3
2     9×9       (0, -4)          (-2, 0)          (0, 0)  (0, -4)   (-2, -4)  (-2, -4)  (-4, -4)  4
3     7×7       (2, 0)           (0, -2)          (0, 0)  (2, 0)    (2, -2)   (3, -2)   (3, -3)   4
4     7×7       (0, -2)          (2, 0)           (0, 0)  (0, -2)   (2, -2)   (2, -3)   (3, -3)   4
5     11×5      (2, 0)           (0, 2)           (0, 0)  (2, 0)    (2, 2)    (4, 2)    (5, 2)    5
6     11×5      (0, 2)           (2, 0)           (0, 0)  (0, 2)    (2, 2)    (2, 2)    (5, 2)    6
7     9×7       (-2, 0)          (0, -2)          (0, 0)  (-2, 0)   (-2, -2)  (-4, -2)  (-4, -3)  4
8     9×7       (0, -2)          (-2, 0)          (0, 0)  (0, -2)   (-2, -2)  (-2, -3)  (-4, -3)  4
9     7×9       (-4, 0)          (0, 2)           (0, 0)  (-3, 0)   (-3, 2)   (-3, 2)   (-3, 4)   4
10    7×9       (0, 2)           (-4, 0)          (0, 0)  (0, 2)    (-3, 2)   (-3, 4)   (-3, 4)   3
11    7×9       (2, 0)           (0, 2)           (0, 0)  (2, 0)    (2, 2)    (3, 2)    (3, 4)    5
12    7×9       (0, 2)           (2, 0)           (0, 0)  (0, 2)    (2, 2)    (2, 4)    (3, 4)    5
13    9×9       (-2, -6)         (4, 4)           (0, 0)  (-2, -4)  (2, -2)   (0, -4)   (2, -4)   4
14    9×9       (4, 4)           (-2, -6)         (0, 0)  (4, 4)    (2, 0)    (4, 2)    (4, 0)    4
15    7×7       (-2, 4)          (4, -2)          (0, 0)  (-2, 3)   (1, 2)    (0, 3)    (1, 3)    4
16    7×7       (4, -2)          (-2, 4)          (0, 0)  (3, -2)   (2, 1)    (3, 0)    (3, 1)    4
17    11×5      (6, 2)           (-2, -4)         (0, 0)  (5, 2)    (4, 0)    (5, 0)    (5, 0)    3
18    11×5      (-2, -4)         (6, 2)           (0, 0)  (-2, -2)  (3, -2)   (2, -2)   (3, -2)   4
19    9×7       (-6, -2)         (4, 4)           (0, 0)  (-4, -2)  (-2, 1)   (-4, 0)   (-4, 1)   4
20    9×7       (4, 4)           (-6, -2)         (0, 0)  (4, 3)    (0, 2)    (2, 3)    (0, 3)    4
21    7×9       (-2, -4)         (4, 2)           (0, 0)  (-2, -4)  (1, -2)   (0, -4)   (1, -4)   4
22    7×9       (4, 2)           (-2, -4)         (0, 0)  (3, 2)    (2, -2)   (3, 0)    (3, -2)   4
23    7×9       (-2, 6)          (4, -4)          (0, 0)  (-2, 4)   (1, 2)    (0, 4)    (1, 4)    4
24    7×9       (4, -4)          (-2, 6)          (0, 0)  (3, -4)   (2, 0)    (3, -2)   (3, 0)    4

Note: Each row corresponds to a game and contains the following information in order: (1) the
game number, (2) the size of the grid map for that game, (3) the target of player 1, (4) the target
of player 2, (5) the theoretic prediction of L0 for player 1, (6) the theoretic prediction of L1 for
player 1, (7) the theoretic prediction of L2 for player 1, (8) the theoretic prediction of L3 for player
1, (9) the theoretic prediction of EQ for player 1, and (10) the minimum k for player 1 such that
as long as the level is weakly higher, the choice of that type is the same as the choice of EQ.