A Window of Cognition: Eyetracking the
Reasoning Process in Spatial Beauty
Contest Games

Chun-Ting Chen, Chen-Ying Huang and Joseph Tao-yi Wang
February 5, 2013
Abstract
We study the reasoning process people utilize to reach a decision in an environment
where final choices are well understood, the associated theory is procedural, and the
decision-making process is observable. In particular, we introduce a two-person “beauty
contest” game played spatially on a two-dimensional plane. Players choose locations and
are rewarded by hitting “targets” dependent on opponents’ locations. By tracking subjects’
eye movements (termed the lookups), we infer their reasoning process and classify subjects
into various types based on a level-k model. More than half of the subjects’ classifications
coincide with their classifications using final choices alone, supporting a literal
interpretation of the level-k model for subjects’ reasoning process. When choice data is
noisy, lookup data could provide additional separation of types.
Keywords: beauty contest game, level-k model, best response hierarchy, cognitive hierarchy
JEL: C91, C72, D87

Department of Economics, National Taiwan University, 21 Hsu-Chow Road, Taipei 100,
Taiwan. Chen: r94323016@ntu.edu.tw, cuc230@psu.edu; Huang: chenying@ntu.edu.tw; Wang:
josephw@ntu.edu.tw (corresponding author). Research support was provided by the National
Science Council of Taiwan (grant 96-2415-H-002-006). Joseph thanks Colin F. Camerer for
his advice, guidance and support. We thank Ming Hsu for valuable suggestions that directed
us to eyetracking. We thank Vincent Crawford, Rosemarie Nagel, Matthew Shum, Yi-Ting Chen,
Shih-Hsun Hsu, Ching-Kang Ing and Chung-Ming Kuan for comments, as well as the audiences of
the ESA 2008 International Meeting and North American Region Meeting, the TEA 2008 Annual
Meeting, the 2009 Stony Brook Workshop on Behavioral Game Theory, the AEA 2010 Annual
Meeting, the 2010 KEEL conference, the 12th BDRM conference and the 2010 World Congress of
the Econometric Society.
I Introduction
Since Samuelson [1938] developed the theory of revealed preferences, economic theory has
focused on interpreting people’s observed choices as directly reflecting their personal
preferences, which are usually unobserved by outsiders. Based on the theoretical
predictions, empirical researchers then collect data from either naturally occurring or
controlled environments, and construct econometric models to analyze it. The revealed
preference approach has achieved tremendous success by simply assuming utility
optimization. Nonetheless, this focus on final choices (and the preferences they reflect)
does not exclude the possibility of analyzing the decision-making process in the middle.
Just as modern theories of the firm open up the black box of profit maximization and
explore the effect of contracts and organizational structures within the firm, there is no
reason why economic theory cannot consider the reasoning process prior to the final
decision, especially when it is potentially observable and can help make better predictions.
In many cases, economic theory could suggest a procedure by which people calculate and
reason to determine what is best. When economic theories provide clear predictions on the
underlying decision-making process, it is natural to ask whether one could test these
predictions using some form of empirical data. For example, in extensive form games,
subgame perfect equilibrium is typically solved by backward induction, a procedure that
can be carried out (and therefore tested) step-by-step by players of the game. Hence,
Camerer et al. [1993] and Johnson et al. [2002] employ a mouse-tracking technology called
“mouselab” to test predictions of backward induction, and find evidence against it even in
three-stage bargaining games. In addition to testing predictions, one could also use a
procedural theory to analyze how different reasoning processes can lead to systematically
different behavior. For example, Krajbich, Armel and Rangel [2010] consider an attentional
drift-diffusion model and demonstrate how different decision thresholds can lead to
specific premature choices in an individual decision-making problem. More recently, Koszegi
and Szeidl [2013] consider the possibility that people focus on certain attributes of
available options and hence become prone to present bias and time inconsistency problems.
In this paper, we attempt to study the reasoning process as well as final choices in a
game-theoretic environment. In particular, we consider the reasoning process people utilize
to reach a decision, in which they perform different levels of strategic reasoning. To
pursue this alternative research strategy of studying the decision-making process, the
task used must meet three important requirements. First, we need a setting in which final
choices are well understood and mature theories exist to explain how choices are made.
If there is still no consensus regarding which theory best explains final choices and why,
it is conceivably harder to come up with satisfactory hypotheses on reasoning processes to
base tests on. Second, to make a plausible hypothesis on reasoning, we want the associated
economic theory to be procedural. In other words, if the theory is taken literally, it
should make predictions not only on choices, but also on the particular reasoning process
that leads to the final choice. Finally, we require a data collection method that allows
us to observe the reasoning process, and for that purpose the task used has to suit the
method.
We design a new set of games, termed two-person spatial beauty contest games, to analyze
an individual’s reasoning process by observing lookup patterns with video-based
eyetracking, meeting all three requirements as follows. This new set of games, as its name
suggests, is essentially a graphical simplification of the p-beauty contest games for two
players.^1 It is known that initial responses in the p-beauty contest games can be well
explained by theories of heterogeneous levels of rationality such as the level-k model.^2
Since level-k models can predict choices well in these guessing games, the first
requirement that a mature theory exists to explain final choices is met. Logically, the
next question is whether they can also predict the reasoning processes. A key feature of
the level-k model is that players of higher levels of rationality best respond to players
of lower levels, who in turn best respond to players of even lower levels, and so on. This
best response hierarchy is the perfect candidate for modeling the reasoning process of a
subject prior to making the final choice, since in a two-person game, the final choice
should be a best response to the subject’s belief regarding the other player’s choice,
which in turn is a best response to the subject’s belief about the other player’s belief
about her choice, and so on.^3 In other words, to figure out which choice to make, a
subject has to go through a particular best response hierarchical procedure. Thus, the
second requirement is squarely met, since by taking the level-k model procedurally, one can
come up with a natural hypothesis regarding the reasoning process. Lastly, the graphical
representation of the spatial beauty contest games induces subjects to go through this
hierarchical procedure of best responses by counting on the computer screen (instead of
reasoning in their minds), leaving footprints that the experimenter can trace, and thus the
third requirement is met.
^1 Nagel [1995] and Ho, Camerer and Weigelt [1998] studied the p-beauty contest game.
Variants of two-person guessing games are studied by Costa-Gomes and Crawford [2006] and
Grosskopf and Nagel [2008]. However, unlike the two-person guessing game considered in
Grosskopf and Nagel [2008], choosing the boundary is not a dominant strategy in our spatial
beauty contest game.
^2 Level-k models are proposed and applied by Stahl and Wilson [1995], Nagel [1995], and
Costa-Gomes and Crawford [2006]. A related model, the cognitive hierarchy model, is proposed
by Camerer, Ho and Chong [2004].
^3 To avoid confusion, the subject is denoted by her while her opponent is denoted by him.

We eyetrack each subject’s reasoning process by recording the entire sequence of locations
she looks at. In other words, we record not only her final choice, but also every location
the subject fixates on during an experimental trial, in real time. Following the
convention, we call this real-time fixation data the “lookups” even though there is really
nothing to be looked up in our experiment. When a subject reasons through a particular
best response hierarchy, designated by her level-k type, each step of thinking is
characterized as a “state.” To describe changes between the thinking states of a subject,
we construct a constrained Markov-switching model between these states. Eye fixations
conditional on each thinking state are then modeled to allow for logit errors due to
imprecise eyetracking or peripheral vision. We classify subjects into various level-k
types based on maximum likelihood estimation using individual lookup data. Moreover, we
adopt an empirical likelihood ratio test for non-nested but overlapping models proposed by
Vuong [1989] to ensure the distinctive separation of the estimated type from other
competing types. Results show that among the seventeen subjects we tracked, one follows the
level-0 (L0) best response hierarchy the closest with her lookups, six follow the level-1
(L1) hierarchy, four follow the level-2 (L2) hierarchy, another four follow the level-3
(L3) hierarchy, and the remaining two follow the equilibrium (EQ) best response hierarchy,
which coincides with the level-4 (L4) hierarchy in most games of our experiment. Treating
the EQ type as having a thinking step of 4, the average thinking step is 2.00, in line
with results of other p-beauty contest games.
If the level-k model can predict not only choices but also reasoning processes well, the
estimated level of a player when we analyze her lookups should coincide with her level when
we analyze her choices alone, since k reflects her strategic sophistication. To check
whether the lookup data indeed align well with choice data, we also classify subjects by
using their final choice data only. We find that choice-based and lookup-based
classifications are largely consistent, classifying ten of the seventeen subjects as the
same type. Consistency between choice-based and lookup-based classifications suggests that
for a high percentage of subjects, if their lookups are classified as a particular level-k
type, their final choices follow the prediction of that level-k type as well. This
strongly supports a literal interpretation of the level-k model as an explanation of both
subjects’ reasoning process and their final choices in the spatial beauty contest game. It
means that the corresponding best response hierarchy implied by each level-k type is
literally carried out by subjects.
We look further into the subtle difference between lookup and choice data even though for
the majority of subjects they align well. Among the seven subjects whose two
classifications differ, for all but one subject, the choice-based level-k types are not
robust to a (nonparametric) bootstrap procedure, having a misclassification rate of at
least 18% if one resamples the choice data and performs the same estimation. On the other
hand, for the ten subjects whose two classifications are the same, the average
misclassification rate is less than 5%. The difference is significant, having a p-value of
0.0123 according to the Mann-Whitney-Wilcoxon rank sum test. In other words, the two
classifications differ when the choice data is noisy; when the two classifications agree,
choice data is quite robust. This leaves open the possibility that lookup data may help
classify subjects more sharply, since when the classifications differ, choice data is noisy
and thus there is room to improve choice estimation.
Even when the level based on lookups and that based on choices differ, the level based on
lookups does a reasonable job in predicting choices and is thus a viable alternative to the
choice-based type. In fact, for six out of the seven subjects whose two classifications
differ, their types based on analyzing lookups predict final choices reasonably well,
ranking second in terms of likelihood.^4 According to a bootstrap procedure, their
lookup-based types are also the second most successful types in predicting choices.
Moreover, we demonstrate how lookups indeed provide better classification when choice-based
estimation is not robust through an out-of-sample prediction exercise. We estimate the
models with 2/3 of the trials and predict the final choices of the remaining trials for the
nine subjects whose final choices are not robust according to the bootstrap procedure. We
show that the lookup-based model is superior in terms of both mean square errors and
economic value (Camerer, Ho and Chong, 2004). To sum up, when the classifications based on
lookups and choices differ, the lookup type predicts choices reasonably well. Moreover,
when the choice data is noisy, we can predict the later choices of a subject better by her
earlier lookup data than by her earlier choice data. In other words, looking into players’
reasoning process gives us valuable information if we are to classify them properly.^5
^4 The last subject’s type based on lookups ranked third. The most successful type is of
course the one based on analyzing choices.
^5 Even if we focus on the seven subjects whose two classifications differ, the
lookup-based model is still superior in terms of mean square errors and is comparable in
economic value.

In the related literature, some experimental studies do attempt to investigate
“information search” patterns in games, in order to capture part of the reasoning process.
In addition to Camerer et al. [1993] and Johnson et al. [2002], Costa-Gomes, Crawford and
Broseta [2001] and Costa-Gomes and Crawford [2006] also employ the mouse-tracking
technology “mouselab” to study payoff lookups in normal form games and information search
in two-person guessing games. Gabaix, Laibson, Moloche and Weinberg [2006] also use
mouselab to observe information acquisition and analyze aggregate information search
patterns to test a heuristic “directed cognition” model. More recently, Wang, Spezio and
Camerer [2010] employ eyetracking to observe the decision-making process of a deceptive
sender in sender-receiver games. In all these studies some information must be withheld
and “looked up” by subjects during the experiment. Hence, these studies rely on information
search to infer certain stages of the reasoning process, instead of directly observing the
entire process itself. Our paper differs from these previous attempts by observing lookup
patterns when there is no explicit hidden information to be acquired. We directly observe
the reasoning process instead of making an inference on it. To the best of our knowledge,
this is the first paper analyzing the reasoning process directly and comparing it with
final choice. Specifically, it is the graphical feature of our design that makes direct
observations of reasoning processes possible. This points to the importance of tailoring
games for tracking decision-making. The structure of the p-beauty contest games implies a
best response hierarchy of reasoning which can be fully exploited in our spatial design. In
other less-structured games, some viable hypotheses concerning the reasoning process have
to be formed and specific designs have to be tailor-made so that these reasoning processes
can be directly observed. This leaves open an interesting direction for future research.^6
^6 Several recent level-k papers estimate population mixture models to infer the fraction
of level-k types within the population (Burchardi and Penczynski [2011]). Instead of
investigating the population mixture of types, we focus on how well individual lookup
patterns correspond to a particular level-k best response hierarchy in an environment where
we already know the level-k model predicts aggregate subject behavior fairly well.

The remainder of the paper is structured as follows: Section II.A describes the spatial
beauty contest game and its theoretical predictions; Section II.B describes details of the
experiment; Section III reports aggregate statistics on lookups; Section IV reports
classification results from the Markov-switching model based on lookups; Section V compares
classification results with those based on final choices alone; Section VI concludes.
II The Experiment
A The Spatial Beauty Contest Game
We now introduce our design, the equilibrium prediction and the prediction by the level-k
model, and formulate the hypotheses which will be tested. To create a spatial version of
the p-beauty contest game, we reduce the number of players to two, so that we can display
the action space of all players on the computer screen visually. Players choose locations
(instead of numbers) simultaneously on a two-dimensional plane, each attempting to hit
a target location determined by the opponent’s choice. The target location is defined
relative to the other player’s choice of location by a pair of coordinates (x, y). We use
the standard Euclidean coordinate system. For instance, (0, −2) means the target location
of a player is “two steps below the opponent,” and (−4, 0) means the target location of a
player is “four steps to the left of the opponent.” These targets are common knowledge to
the players. Payoffs are determined by how “far” (the sum of horizontal distance and
vertical distance) a player is away from the target. The larger this distance is, the
lower her payoff is. Players can only choose locations on a given grid map, though one’s
target may fall outside the map if the opponent is close to or on the boundary.^7 For
example, consider the 7 × 7 grid map in Figure I. For the purpose of illustration, suppose
a player’s opponent has chosen the center location labeled O, at (0, 0), and the player’s
target is (−4, 0). Then to hit her target, she has to choose location (−4, 0). But location
(−4, 0) is not on the map, while choosing location (−3, 0) is optimal among all 49 feasible
choices because location (−3, 0) is the only feasible location that is one step from
location (−4, 0).^8
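The 7 × 7 example above can be checked by brute force. The following is only an
illustrative sketch, not part of the experimental software; the function names
`manhattan` and `best_responses` are hypothetical, and the grid is assumed to be
{−3, ..., 3} × {−3, ..., 3} as in the example.

```python
from itertools import product

def manhattan(p, q):
    """Sum of horizontal and vertical distance between two grid locations."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def best_responses(opponent, target, X, Y):
    """Return every feasible location that is closest to the ideal location.

    opponent: the opponent's chosen location (x, y)
    target:   the player's target, expressed relative to the opponent
    X, Y:     half-widths of the map, so x ranges over -X..X and y over -Y..Y
    """
    ideal = (opponent[0] + target[0], opponent[1] + target[1])
    grid = list(product(range(-X, X + 1), range(-Y, Y + 1)))
    best = min(manhattan(loc, ideal) for loc in grid)
    return [loc for loc in grid if manhattan(loc, ideal) == best]

# 7 x 7 map (X = Y = 3), opponent at the center, target (-4, 0)
optimal = best_responses((0, 0), (-4, 0), 3, 3)
print(optimal)                       # the unique closest feasible location
print(manhattan((-3, 1), (-4, 0)))   # distance discussed in footnote 8
```

Under these assumptions the search returns a single location, (−3, 0), and the distance
from (−3, 1) to (−4, 0) is 2, matching the text.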
The spatial beauty contest game is essentially a spatial version of Costa-Gomes and
Crawford [2006]’s asymmetric two-person guessing games, in which one subject would like to
choose α times her opponent’s choice and her opponent would like to choose β times her
choice. Hence, similar to Costa-Gomes and Crawford [2006], the equilibrium prediction of
this spatial beauty contest game is determined by the targets of both players. For example,
if the targets of the two players are (0, 2) and (4, 0) respectively, the equilibrium
consists of both players choosing the Top-Right corner of the map. This conceptually
coincides with a player hitting the lower bound in the two-person guessing game of
Costa-Gomes and Crawford [2006] where αβ is less than 1, or all choosing zero in the
p-beauty contest game where p is less than 1.^9 Note that in general the equilibrium need
not be at the corner since targets can have opposite signs. For example, when the targets
are (4, −2) and (−2, 4) played on a 7 × 7 grid map, the equilibrium locations for the two
players are both two steps away from the corner (labeled as E1 and E2 for the two players
respectively in Figure I).
^7 Similar designs of 3 × 3 games could also be found in Kuo et al. [2009]. They addressed
different issues.
^8 For instance, to go from location (−3, 1) to (−4, 0), one has to travel one step left
and one step down and hence the distance is 2.
^9 However, choosing the Top-Right corner is not a dominant strategy, unlike in the
symmetric two-person guessing game analyzed by Grosskopf and Nagel [2008].

We derive the equilibrium predictions for the general case as follows. Formally, consider
a spatial beauty contest game with targets $(a_1, b_1)$ and $(a_2, b_2)$. With some abuse
of notation, suppose player $i$ chooses location $(x_i, y_i)$ on a map $G$ satisfying
$(x_i, y_i) \in G \equiv \{-X, -X+1, \ldots, X\} \times \{-Y, -Y+1, \ldots, Y\}$, where
$(0, 0)$ is the center of the map. For instance, $(x_i, y_i) = (X, Y)$ means player $i$
chooses the Top-Right corner of the map. The other player $-i$ also chooses a location
$(x_{-i}, y_{-i})$ on the same map: $(x_{-i}, y_{-i}) \in G$. The payoff to player $i$ in
this game is:
\[
p_i(x_i, y_i, x_{-i}, y_{-i}, a_i, b_i)
  = \bar{s} - \left( |x_i - (x_{-i} + a_i)| + |y_i - (y_{-i} + b_i)| \right)
\]
where $\bar{s}$ is a constant. Notice that payoffs are decreasing in the number of steps a
player is away from her target, which in turn depends on the choice of the other player.
There is no interaction between the choices of $x_i$ and $y_i$. Hence the maximization can
be obtained by choosing $x_i$ and $y_i$ separately to minimize the two absolute value
terms. We thus consider the case for $x_i$ only. The case for $y_i$ is analogous.^10
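As a rough check of the separability claim, the sketch below compares a coordinate-by-
coordinate best response with a joint search over the whole map. It is an illustration
under stated assumptions: the value of `S_BAR` is hypothetical (the text only says it is
a constant), the map is taken to be 7 × 7, and all function names are ours.

```python
from itertools import product

S_BAR = 20  # hypothetical value; the text only states that s-bar is a constant

def payoff(own, opp, target, s_bar=S_BAR):
    """p_i = s_bar - (|x_i - (x_-i + a_i)| + |y_i - (y_-i + b_i)|)."""
    (x, y), (xo, yo), (a, b) = own, opp, target
    return s_bar - (abs(x - (xo + a)) + abs(y - (yo + b)))

def closest_feasible(ideal, bound):
    """Feasible integer in -bound..bound that is closest to the ideal coordinate."""
    return min(range(-bound, bound + 1), key=lambda v: abs(v - ideal))

def separable_best_response(opp, target, X, Y):
    """Choose x and y separately, each minimizing its own absolute value term."""
    return (closest_feasible(opp[0] + target[0], X),
            closest_feasible(opp[1] + target[1], Y))

def joint_best_response(opp, target, X, Y):
    """Search all feasible locations for the payoff-maximizing one."""
    grid = product(range(-X, X + 1), range(-Y, Y + 1))
    return max(grid, key=lambda loc: payoff(loc, opp, target))

X = Y = 3            # assumed 7 x 7 map
target = (4, -2)     # one of the targets used in the text's example
same = all(
    separable_best_response(opp, target, X, Y) == joint_best_response(opp, target, X, Y)
    for opp in product(range(-X, X + 1), range(-Y, Y + 1))
)
print(same)
```

For every opponent location on the assumed map the two procedures agree, which is what
the absence of interaction between the two coordinates implies.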
To ensure uniqueness, in all our experimental trials, $a_i + a_{-i} \neq 0$.^11 Without
loss of generality, we assume that $a_i + a_{-i} < 0$ so that the overall trend is to move
leftward.^12 Suppose $a_1 < 0$. If $a_1 a_2 < 0$, implying player 1 would like to move
leftward but player 2 would like to move rightward, since the overall trend is to move
leftward, it is straightforward to see that the force of equilibrium would make player 1
hit the lower bound while player 2 will best respond to that. The equilibrium choices of
both, denoted by $(x_1^e, x_2^e)$, are characterized by $x_1^e = -X$ and
$x_2^e = -X + a_2$.^13 If $a_1 a_2 \geq 0$, since both players would like to move leftward,
they will both hit the lower bound. The equilibrium is characterized by
$x_1^e = x_2^e = -X$. To summarize, when $a_1 + a_2 < 0$, only the player whose target is
greater than zero will not hit the lower bound. Therefore, as a spatial analog to
Observation 1 of Costa-Gomes and Crawford [2006], we obtain:
Proposition 1
In a spatial beauty contest game with targets $(a_1, b_1)$ and $(a_2, b_2)$ where two
players each choose a location $(x_i, y_i) \in G$ satisfying
$G \equiv \{-X, -X+1, \ldots, X\} \times \{-Y, -Y+1, \ldots, Y\}$,
$-2X \leq a_1, a_2 \leq 2X$ and $-2Y \leq b_1, b_2 \leq 2Y$, the equilibrium choices
$(x_i^e, y_i^e)$ are characterized by ($I\{\cdot\}$ is the indicator function):
\[
x_i^e =
\begin{cases}
-X + a_i \cdot I\{a_i > 0\} & \text{if } a_i + a_{-i} < 0 \\
\phantom{-}X + a_i \cdot I\{a_i < 0\} & \text{if } a_i + a_{-i} > 0
\end{cases}
\quad \text{and} \quad
y_i^e =
\begin{cases}
-Y + b_i \cdot I\{b_i > 0\} & \text{if } b_i + b_{-i} < 0 \\
\phantom{-}Y + b_i \cdot I\{b_i < 0\} & \text{if } b_i + b_{-i} > 0
\end{cases}
\]
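Proposition 1 translates directly into a short computation. The sketch below is an
illustration only; the function names are hypothetical, and the 7 × 7 map (X = Y = 3)
used for the first example is our assumption, since the text does not state a map size
for the targets (0, 2) and (4, 0).

```python
def equilibrium_coordinate(a_own, a_opp, bound):
    """One coordinate of the equilibrium choice in Proposition 1.

    a_own, a_opp: the two players' targets along this dimension
    bound:        X (or Y), so feasible choices are -bound..bound
    """
    total = a_own + a_opp
    if total == 0:
        # The experiment excludes this case because the equilibrium is not unique.
        raise ValueError("equilibrium is not unique when the targets sum to zero")
    if total < 0:
        # Overall trend is leftward/downward: only a positive target stays off the bound.
        return -bound + (a_own if a_own > 0 else 0)
    # Overall trend is rightward/upward: only a negative target stays off the bound.
    return bound + (a_own if a_own < 0 else 0)

def equilibrium(targets, X, Y):
    """Equilibrium locations of both players for targets [(a1, b1), (a2, b2)]."""
    (a1, b1), (a2, b2) = targets
    return (
        (equilibrium_coordinate(a1, a2, X), equilibrium_coordinate(b1, b2, Y)),
        (equilibrium_coordinate(a2, a1, X), equilibrium_coordinate(b2, b1, Y)),
    )

print(equilibrium([(0, 2), (4, 0)], 3, 3))    # both players at the Top-Right corner
print(equilibrium([(4, -2), (-2, 4)], 3, 3))  # each two steps away from that corner
```

With these inputs the first call gives (3, 3) for both players, and the second gives
(3, 1) and (1, 3), consistent with the two examples discussed before the proposition.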
^10 As an illustrative example, consider $a_1 = -2$ and $a_2 = +1$, indicating that player
1 wants to be two steps to the left of player 2, while player 2 wants to be one step to the
right of player 1.
^11 Suppose $a_1 = -2$ and $a_2 = +2$. Any location where player 1 is two steps to the left
of player 2 is an equilibrium since player 2 is then two steps to the right of player 1.
Note that this corresponds to the case where αβ = 1 in the two-person guessing game of
Costa-Gomes and Crawford [2006]. If $a_i = -a_{-i} = a$, any feasible $x_i, x_{-i}$
satisfying $x_i - x_{-i} = a$ constitutes an equilibrium.
^12 In the illustrative example of $a_1 = -2$ and $a_2 = +1$, $(-2) + 1 < 0$. Due to
symmetry, all other cases are isomorphic to this case.
^13 In the illustrative example of $a_1 = -2$ and $a_2 = +1$, the equilibrium is
$(x_1^e, x_2^e) = (-X, -X+1)$. We impose $a_i \leq 2X$ for all games in the experiment,
thus we do not need to worry about the possibility that $x_i^e$ lies outside the upper
bound $X$ (i.e., $x_i^e = -X + a_i > X$). In general, if $a_i > 2X$, player $i$ would hit
the upper bound and thus $x_i^e = X$. Similarly, we assume $-2X \leq a_i$, so we need not
worry about the possibility that $x_i^e$ lies outside the lower bound $-X$ (i.e.,
$x_i^e = X + a_i < -X$).

In addition to the equilibrium prediction, one may also specify various level-k
predictions. First, we need to determine the anchoring L0 player who is non-strategic or
naïve. This is usually done by assuming that L0 players choose randomly.^14 In a spatial
setting, Reutskaja et al. [2011] find the center location focal, while Crawford and
Iriberri [2007a] define L0 players as being drawn toward focal points in the non-neutral
display of choices. In addition, due to a drift-correction procedure of the eyetracker
(fixating on a dot at the center and hitting a button or key) prior to every trial, the
center location is the first fixation of every trial. Therefore, a natural assumption here
is that an L0 player will either choose any location on the map randomly (according to the
uniform distribution), which is on average the center (0, 0), or will simply choose the
center. An L1 player $i$ with target $(a_i, b_i)$ would best respond to an L0 opponent who
either chooses the center on average or exactly chooses the center, and as a von
Neumann-Morgenstern utility maximizer, would choose the same location against these two
opponents.^15 If an L0 player chooses (on average) the center, to best respond, an L1
player would choose the location $(a_i, b_i)$ unless $X$ or $Y$ is too small for it to be
feasible.^16 Similarly, for an L2 opponent $j$ with the target $(a_j, b_j)$ to best respond
to an L1 player $i$ who chooses $(a_i, b_i)$, he would choose $(a_i + a_j, b_i + b_j)$ when
$X$ and $Y$ are large enough. Repeating this procedure, one can determine the best
responses of all higher level-k (Lk) types. Figure I shows the various level-k predictions
of a 7 × 7 spatial beauty contest game for two players with targets (4, −2) and (−2, 4).
^14 See Costa-Gomes, Crawford and Broseta [2001], Camerer, Ho and Chong [2004], Costa-Gomes
and Crawford [2006] and Crawford and Iriberri [2007b].
^15 See proof in Supplementary Appendix A1. This is true because our payoff structure is
point symmetric by (0, 0) over the grid map. Hence, it makes no difference for an L1
opponent whether we assume an L0 player chooses exactly the center, or randomly (on average
the center). In our estimation, we assume L0 chooses the center but incorporate random L0
as a special case (when the logit parameter is zero).
^16 In this case, an L1 player would choose the closest feasible location.

To account for the possibility that one’s target may fall outside the map, we define the
adjusted choice $R(X, Y, (x, y))$. Formally, the adjusted choice is given by
\[
R(X, Y, (x, y)) \equiv \left( \min\{X, \max\{-X, x\}\},\; \min\{Y, \max\{-Y, y\}\} \right).
\]
In words, if the ideal best response which hits the target is location $(x, y)$, the
adjusted choice $(\tilde{x}, \tilde{y}) \equiv R(X, Y, (x, y))$ gives us the closest
feasible location on the map, so the choice $(\tilde{x}, \tilde{y})$ is constrained to lie
within the range $\tilde{x} \in \{-X, -X+1, \ldots, X\}$,
$\tilde{y} \in \{-Y, -Y+1, \ldots, Y\}$. This adjusted choice is the best feasible choice
on the map since payoffs are decreasing in the distance between the ideal best response
(target) and the final choice. Moreover, as shown in Supplementary Appendix A2, since the
grid map is of a finite size, eventually when $k$ for a level-k type is large enough, the
Lk prediction will coincide with the equilibrium. To summarize, we have
Proposition 2
Consider a spatial beauty contest game with targets $(a_1, b_1)$ and $(a_2, b_2)$ where
two players choose locations $(x_1, y_1)$, $(x_2, y_2)$ satisfying
$(x_i, y_i) \in G \equiv \{-X, -X+1, \ldots, X\} \times \{-Y, -Y+1, \ldots, Y\}$,
$-2X \leq a_1, a_2 \leq 2X$ and $-2Y \leq b_1, b_2 \leq 2Y$. Denote the choice of a level-k
player $i$ by $(x_i^k, y_i^k)$, then $(x_1^0, y_1^0) = (x_2^0, y_2^0) \equiv (0, 0)$ and
1. $(x_i^k, y_i^k) = R\left(X, Y, (a_i + x_{-i}^{k-1},\, b_i + y_{-i}^{k-1})\right)$ for
   $k = 1, 2, \ldots$
2. there exists a smallest positive integer $\bar{k}$ such that for all $k \geq \bar{k}$,
   $(x_i^k, y_i^k) = (x_i^e, y_i^e)$.
Proof.
See Supplementary Appendix A2.
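The recursion in Proposition 2 can be traced numerically. The following sketch is our own
illustration (function names are hypothetical); it applies the adjusted choice R to the
Figure I example with targets (4, −2) and (−2, 4) on a 7 × 7 map, and takes the
equilibrium locations (3, 1) and (1, 3) from Proposition 1 as given inputs.

```python
def adjusted_choice(X, Y, loc):
    """R(X, Y, (x, y)): clamp each coordinate of the ideal location to the map."""
    x, y = loc
    return (min(X, max(-X, x)), min(Y, max(-Y, y)))

def level_k_choices(targets, X, Y, max_k):
    """Return a list whose k-th entry holds both players' Lk choices, k = 0..max_k."""
    choices = [((0, 0), (0, 0))]  # L0 anchors at the center for both players
    for _ in range(max_k):
        prev = choices[-1]
        step = tuple(
            adjusted_choice(X, Y, (targets[i][0] + prev[1 - i][0],
                                   targets[i][1] + prev[1 - i][1]))
            for i in (0, 1)
        )
        choices.append(step)
    return choices

def k_bar(choices, i, eq_i):
    """Smallest positive k such that player i's choice equals eq_i at every
    computed level from k onward (only the levels in `choices` are checked)."""
    k = len(choices)
    while k > 1 and choices[k - 1][i] == eq_i:
        k -= 1
    return k if k < len(choices) else None

targets = [(4, -2), (-2, 4)]
choices = level_k_choices(targets, 3, 3, max_k=6)
for k, (c1, c2) in enumerate(choices):
    print(f"L{k}: player 1 {c1}, player 2 {c2}")
print(k_bar(choices, 0, (3, 1)), k_bar(choices, 1, (1, 3)))
```

In this example player 1's predicted choices run (0, 0), (3, −2), (2, 1), (3, 0), (3, 1)
and then stay at (3, 1), so the level-k path reaches the equilibrium at k = 4 for both
players.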
In Table I we list all the 24 spatial beauty contest games used in the experiment, their
various level-k predictions, equilibrium predictions and the minimum $\bar{k}$’s. Notice
that in the first 12 games, targets of each player are one-dimensional while in the last 12
games, targets are two-dimensional. Also, Games $(2m-1)$ and $(2m)$ (where
$m = 1, 2, \ldots, 12$) are the same but with reversed roles of the two players, so for
instance, Games 1 and 2 are the same, Games 3 and 4 are the same, etc.
The $\bar{k}$’s for our 24 games are almost always 4, but some are 3 (Games 1, 10, 17), 5
(Games 5, 11, 12) or 6 (Game 6). This indicates that as long as we include level-k types
with k up to 3 and the equilibrium type, we will not miss the higher level-k types much
since higher types coincide with the equilibrium most of the time. Moreover, as evident in
Table I, different levels make different predictions. In other words, various levels are
strongly separated on the map.^17 The level-k model predicts what final choices are made
for each level k. This is formulated in Hypothesis 1.
Hypothesis 1 (Final Choice) Consider a series of one-shot spatial beauty contest games
without feedback, $n = 1, 2, \ldots, N$, each with targets $(a_{1,n}, b_{1,n})$ and
$(a_{2,n}, b_{2,n})$ where two players choose locations $(x_{1,n}, y_{1,n})$,
$(x_{2,n}, y_{2,n})$ satisfying
$(x_{i,n}, y_{i,n}) \in G_n \equiv \{-X_n, -X_n+1, \ldots, X_n\} \times \{-Y_n, -Y_n+1, \ldots, Y_n\}$,
$-2X_n \leq a_{1,n}, a_{2,n} \leq 2X_n$, and $-2Y_n \leq b_{1,n}, b_{2,n} \leq 2Y_n$. A
level-k subject $i$’s choice for game $n$, denoted $(x_{i,n}^k, y_{i,n}^k)$, is
$(x_{i,n}^k, y_{i,n}^k) = R\left(X_n, Y_n, (a_{i,n} + x_{-i,n}^{k-1},\, b_{i,n} + y_{-i,n}^{k-1})\right)$
as defined in Proposition 2, and this $k$ is constant across games.
^17 The only exceptions are L3 and EQ in Games 1, 10, 17, L2 and L3 in Games 2, 6, 9, and
L2 and EQ in Game 18. See the underlined predictions in Table I.

Since our games are spatial, players can literally count using their eyes how many steps on
the map they have to move to hit their targets. Thus, a natural way to use lookups is to
take the level-k reasoning processes literally in the following sense. Take an L2 player as
an example: the level-k model implies that she best responds to an L1 opponent, who in turn
best responds to an L0. Therefore, for the L2 player to make a final choice, she has to
first figure out what an L0 would choose since her opponent thinks of her as an L0. She
then needs to figure out what her opponent, an L1, would choose. Finally, she has to make a
choice as an L2. It is possible that this process is carried out solely in the mind of a
player. Yet since the games are spatial, one can simply figure all these out by looking at
and counting on the map. This has the advantage of greatly reducing memory load and being
much more straightforward. If this hypothesis is true, an L2 player would look at the
center (where an L0 player would choose), her opponent’s L1 choice and her own final choice
as an L2. In other words, the hotspots of an L2 player in her lookups would consist of
these three locations on the map. This is probably the most natural prediction on the
lookup data one can make when the underlying model is the level-k model. Hence we formulate
Hypothesis 2 and base our econometric analysis of lookups on this.
Hypothesis 2 (Lookup) Consider a series of one-shot spatial beauty contest games with
targets $(a_{1,n}, b_{1,n})$ and $(a_{2,n}, b_{2,n})$ where two players choose locations
$(x_{1,n}, y_{1,n})$, $(x_{2,n}, y_{2,n})$ satisfying
$(x_{i,n}, y_{i,n}) \in G_n \equiv \{-X_n, -X_n+1, \ldots, X_n\} \times \{-Y_n, -Y_n+1, \ldots, Y_n\}$,
$-2X_n \leq a_{1,n}, a_{2,n} \leq 2X_n$, and $-2Y_n \leq b_{1,n}, b_{2,n} \leq 2Y_n$,
played without feedback. Denote the choice of a level-k player $i$ by
$(x_{i,n}^k, y_{i,n}^k)$. Assuming one carries out the reasoning process on the map, a
level-k subject $i$ will also:
a. (Duration of Lookups): Fixate at the following locations in the level-k best response
   hierarchy: $(x_{\cdot,n}^0, y_{\cdot,n}^0)$ (L0 player’s choices), ...,
   $(x_{i,n}^{k-2}, y_{i,n}^{k-2})$ (own L(k−2) player’s choice),
   $(x_{-i,n}^{k-1}, y_{-i,n}^{k-1})$ (opponent L(k−1) player’s choice),
   $(x_{i,n}^k, y_{i,n}^k)$ (own Lk player’s choice) associated with that particular k,
   longer than random.^18
b. (Sequence of Lookups): Have fixation sequences for each game $n$ with many transitions
   from $(x_{-i,n}^{K-1}, y_{-i,n}^{K-1})$ to $(x_{i,n}^K, y_{i,n}^K)$ for
   $K = k, k-2, \ldots$, and transitions from $(x_{i,n}^{K-1}, y_{i,n}^{K-1})$ to
   $(x_{-i,n}^K, y_{-i,n}^K)$ for $K = k-1, k-3, \ldots$ (steps of the associated level-k
   best response hierarchy).
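To make the predicted hotspots in Hypothesis 2 concrete, the sketch below lists the
locations in a level-k player's best response hierarchy and the transitions between
consecutive steps. It is an illustration only, not the Markov-switching model estimated
in the paper; the function names are hypothetical, and the example reuses the Figure I
targets (4, −2) and (−2, 4) on a 7 × 7 map.

```python
def clamp(v, bound):
    """Restrict a coordinate to the feasible range -bound..bound."""
    return min(bound, max(-bound, v))

def level_choice(k, i, targets, X, Y):
    """Predicted choice of a level-k player i (0 or 1); L0 chooses the center."""
    if k == 0:
        return (0, 0)
    ox, oy = level_choice(k - 1, 1 - i, targets, X, Y)
    a, b = targets[i]
    return (clamp(a + ox, X), clamp(b + oy, Y))

def lookup_hierarchy(k, i, targets, X, Y):
    """Locations a level-k player i is predicted to fixate on, lowest step first.

    Step j of the hierarchy belongs to player i when k - j is even and to the
    opponent otherwise, as in part (a) of Hypothesis 2.
    """
    path = []
    for j in range(k + 1):
        owner = i if (k - j) % 2 == 0 else 1 - i
        role = "own" if owner == i else "opponent"
        path.append((j, role, level_choice(j, owner, targets, X, Y)))
    return path

def predicted_transitions(path):
    """Consecutive steps of the hierarchy, as in part (b) of Hypothesis 2."""
    return [(path[j][2], path[j + 1][2]) for j in range(len(path) - 1)]

targets = [(4, -2), (-2, 4)]
path = lookup_hierarchy(2, 0, targets, 3, 3)   # an L2 player in the role of player 1
print(path)
print(predicted_transitions(path))
```

For an L2 subject in the first player's role, the predicted hotspots under these
assumptions are the center, the opponent's L1 choice (−2, 3), and her own L2 choice
(2, 1), with transitions running from each step to the next.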
B Experimental Procedure
We conduct 24 spatial beauty contest games (with various targets and map sizes), randomly
ordered and without feedback, at the Social Science Experimental Laboratory (SSEL),
California Institute of Technology. Each game is played twice, once on the two-dimensional
grid map as shown in Figure II (which we denote as the GRAPH presentation), the other
time as two one-dimensional choices chosen separately (see Figure III, denoted as the
SEPARATE presentation).
19
Half of the subjects are shown the two-dimensional grid
18
The player subscript of $(x^{0}_{\bullet,n}, y^{0}_{\bullet,n})$ is dropped since both L0 players choose the center.
19
Note that these two presentations are mathematically identical.However,the GRAPH presentation
allows us to trace the decision-making process through observing the lookups.
maps first in trials 1-24 and the two one-dimensional choices later in trials 25-48, while
the rest are shown the two one-dimensional choices first (trials 1-24) and the maps later
(trials 25-48). The results of the two presentations are quite similar, so we focus on the
results of the two-dimensional presentation.
20
In addition to recording subjects’ final choices, we also employ Eyelink II eyetrackers
(SR-research Inc.) to track the entire decision process before the final choice is made. The
experiment is programmed using the Psychophysics Toolbox of Matlab (Brainard, 1997),
which includes the Video Toolbox (Pelli, 1997) and the Eyelink Toolbox (Cornelissen
et al., 2002). Every 4 milliseconds, the eyetracker records the location on the screen that
the subject’s eyes are looking at, as well as her pupil sizes. Location accuracy is ensured by first
calibrating subjects’ eyetracking patterns (video images and cornea reflections of the eyes)
when they fixate at certain locations on the screen (typically 9 points), interpolating this
calibration to all possible locations, and validating it with another set of similar locations.
Since there is no hidden information in this game, the main goal of eyetracking is not to
record information search. Instead, the goal is to capture how subjects reason before
making their decision and to test whether they think through the best response hierarchy
implied by a literal interpretation of the level-k model.
Before each game, a drift correction is performed in which subjects fixate at the center
of the screen and hit a button (or the space bar). This realigns the calibration at the center
of the screen. During each game, when subjects fixate at a location,
the eyetracker sends the current location back to the display computer, and the display
computer lights up the location in red in real time (as Figures II and III show). Seeing
this red location, subjects can hit the space bar if they decide to choose that location.
Subjects are then asked to confirm their choice (“Are you sure?”): they can either
confirm it (“YES”) or restart the process (“NO”) by looking at the
bottom left or right corner of the screen.
In each session, two subjects were recruited to be eyetracked. Since there was no
feedback, each subject was eyetracked individually in a separate room, and their results
were matched with the other subject’s at the end of the experiment. Three trials were randomly
drawn from the 48 trials played to be paid. The average payment was US$15.24 plus a show-up
fee of US$20. A sample of the instructions can be found in the Supplementary Appendix.
Due to insufficient show-up of eligible subjects, three sessions were conducted with only
one subject eyetracked, and their results were matched with a subject from a different session.
Hence, we have eyetracking data for 17 subjects.
20
A comparison of the final choices under these two representations is shown in Supplementary Table
2.None of the subjects’ two sets of final choices differ significantly.
III Lookup Summary Statistics
We first summarize subjects’ lookups to test Hypothesis 2a, namely, that subjects do look at
and count on the map during their reasoning process. Then, we analyze subjects’ lookups
with a constrained Markov-switching model to classify them into various level-k types to
test Hypothesis 2b. As a part of the estimation, we employ Vuong’s test for non-nested
but overlapping models to ensure separation between competing types.
According to Hypothesis 2a, subjects will spend more time at locations corresponding
to the thinking steps of a particular best response hierarchy. We present aggregate data
regarding empirical lookups for all 24 Spatial Beauty Contest games in Supplementary
Figures 1 through 24. For each game, we calculate the percentage of time a subject spent
on each location. The radius of the circle is proportional to the average percentage of time
spent on each location, so bigger circles indicate longer time spent. The level-k choice
predictions are labeled as O, L1, L2, L3, E for each game.
If Hypothesis 2a were true, the empirical lookups would concentrate on locations
predicted by the level-k best response hierarchy. For some games, many big circles in
Supplementary Figures 1–24 do fall on various locations corresponding to the thinking
steps of the level-k best response hierarchy.
21
However, there seems to be a lot of noise in
the lookup data: many locations other than those specified in the best response hierarchy
are also looked up.
We attempt to quantify this concentration of attention. First, we define the Hit area for
every level-k type as the minimal convex set enveloping the locations predicted by this
level-k type’s best response hierarchy in game n. For instance, for an L2 subject i (with
opponent −i), the best response hierarchy consists of $(x^{0}_{\bullet,n}, y^{0}_{\bullet,n})$,
$(x^{1}_{-i,n}, y^{1}_{-i,n})$, $(x^{2}_{i,n}, y^{2}_{i,n})$.
Thus we can construct a minimal convex set enveloping these three locations. We then
take the union of Hit areas of all level-k types and see if subjects’ lookups are indeed
within the union. Figure IV shows an example of Hit areas for various level-k types in a
$7 \times 7$ spatial beauty contest game with target $(4, -2)$ and the opponent’s target $(-2, 4)$
(Game 16).
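The Hit area construction can be sketched in code. The following is a minimal illustration, not the authors’ implementation: it assumes that the Hit area is measured as the set of grid locations lying inside or on the boundary of the convex hull, and the function names (`convex_hull`, `in_hull`, `hit_area`) are hypothetical. The three locations used are the L2 hierarchy of player 1 in Game 16 as given in footnote 25.

```python
from itertools import product

def cross(o, a, b):
    """Twice the signed area of triangle o-a-b (positive if counter-clockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]
    return half(pts) + half(reversed(pts))

def in_hull(p, hull):
    """True if p lies inside or on the boundary of the hull."""
    if len(hull) == 1:
        return p == hull[0]
    if len(hull) == 2:  # degenerate case: all hierarchy locations are collinear
        a, b = hull
        within = (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
                  and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))
        return cross(a, b, p) == 0 and within
    return all(cross(hull[i], hull[(i + 1) % len(hull)], p) >= 0
               for i in range(len(hull)))

def hit_area(hierarchy, X, Y):
    """Grid locations in the minimal convex set enveloping the hierarchy."""
    hull = convex_hull(hierarchy)
    grid = product(range(-X, X + 1), range(-Y, Y + 1))
    return {g for g in grid if in_hull(g, hull)}

# L2 hierarchy of player 1 in Game 16: center, opponent's L1 choice, own L2 choice.
l2_area = hit_area([(0, 0), (-2, 3), (2, 1)], X=3, Y=3)
print(sorted(l2_area))
print(len(l2_area) / 49)  # share of the 7 x 7 map covered by this Hit area
```

The union of Hit areas across level-k types would then be the set union of the individual `hit_area` results.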
Figure V shows the empirical percentage of time spent on the union of Hit areas,
or hit time, denoted as $h_t$. Across the 24 games, average hit time is 0.62, ranging from
$h_t = 0.81$ (in Game 9) to $h_t = 0.36$ (in Game 21). However, hit time depends on the
21
However, not all locations are looked up. This is likely because the error structure of high speed
video-based eyetracking is very different from the error structure of mouse-tracking (such as MouseLab).
In particular, eyetrackers have imprecise spatial resolution due to imperfect calibration and peripheral
vision, but little temporal error (usually 250 or more samples per second). In contrast, mouse-tracking
has very precise spatial resolution for cursor locations and mouse clicks, but movements of the mouse
cursor need not correspond to movements of the eye. Hybrid methods are a promising direction for future
research.
size of the area. Even if subjects scan over the map uniformly, the empirical hit time
would not be zero. Instead, it would be proportional to the size percentage of the union
of Hit areas, or hit area size, denoted as $h_{as}$. To correct for this hit area size bias, we
calculate Selten [1991]’s linear “difference measure of predicted success,” $h_t - h_{as}$, i.e., the
difference between empirical hit time and hit area size, and report it in Figure VI. Note
that if a subject scans randomly over the map, the percentage of time she spends on the
union of the Hit areas will roughly equal the hit area size. By subtracting the hit area
size, we can evaluate how high the empirical hit time is compared with random scanning
over the map. These measures are all positive (except for Game 21), strongly rejecting the
null hypothesis of random lookups. The p-value of a one-sample t-test is 0.0001, suggesting
that subjects indeed spend a disproportionately long time on the union of Hit areas.
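The difference measure and the accompanying t statistic can be sketched as follows. This is an illustration only: the numbers are hypothetical (they are not the data behind Figures V and VI), and the function names are assumptions introduced here.

```python
import math
import statistics

def difference_measure(hit_time, hit_area_size):
    """Selten's linear measure per game: empirical hit time minus hit area size."""
    return [ht - area for ht, area in zip(hit_time, hit_area_size)]

def one_sample_t(values, mu0=0.0):
    """t statistic for H0: mean(values) == mu0, using the sample standard deviation."""
    n = len(values)
    return (statistics.mean(values) - mu0) / (statistics.stdev(values) / math.sqrt(n))

# Hypothetical numbers for four games (NOT the paper's data).
hit_time = [0.81, 0.62, 0.55, 0.36]
hit_area_size = [0.40, 0.35, 0.30, 0.38]

diffs = difference_measure(hit_time, hit_area_size)
t_stat = one_sample_t(diffs)
print([round(d, 2) for d in diffs], round(t_stat, 2))
```

A positive difference in a game means that lookups fall on the union of Hit areas more often than uniform scanning would imply.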
In fact, sometimes subjects have a hit time of nearly 1. For example, Figure VII shows the
lookups of subject 2 in round 17, acting as a Member B. The diameter of each fixation
circle is proportional to the length of each lookup. Note that these circles fall almost
exclusively on the best response hierarchy of an L2, which is exactly her level-k type
(based on lookups) according to the fifth column of Table II.
To sum up, the aggregate result is largely consistent with Hypothesis 2a that subjects
look at locations of the level-k best response hierarchy longer than random scanning
would imply, although the data is noisy. We next turn to test Hypothesis 2b and consider
whether individual lookup data can be used to classify subjects into various level-k types.
IV A Markov-Switching Model for Level-k
Reasoning
A The State Space
According to Hypothesis 2b, a level-k type subject i goes through a particular best
response hierarchy associated with her level-k type during the reasoning process, and carries
out transitions from $(x^{K-1}_{-i,n}, y^{K-1}_{-i,n})$ to $(x^{K}_{i,n}, y^{K}_{i,n})$ for
$K = k, k-2, \ldots$, and transitions from $(x^{K-1}_{i,n}, y^{K-1}_{i,n})$ to
$(x^{K}_{-i,n}, y^{K}_{-i,n})$ for $K = k-1, k-3, \ldots$. Taking level-2 as an example, the
two key transition steps are from $(x^{0}_{i,n}, y^{0}_{i,n})$ to $(x^{1}_{-i,n}, y^{1}_{-i,n})$,
thinking as a level-1 opponent best-responding to her as a level-0 player, and from
$(x^{1}_{-i,n}, y^{1}_{-i,n})$ to $(x^{2}_{i,n}, y^{2}_{i,n})$, thinking
as a level-2 player best-responding to a level-1 opponent. Hence, the reasoning process
of a level-2 subject i consists of three stages. First, she would fixate at
$(x^{0}_{i,n}, y^{0}_{i,n})$ since
she believes her opponent is level-1, who believes she is level-0. Then, she would fixate
at $(x^{1}_{-i,n}, y^{1}_{-i,n})$, thinking through her opponent’s choice as a level-1 best responding to a
level-0. Finally, she would best respond to the belief that her opponent is a level-1 by
making her choice, fixating at $(x^{2}_{i,n}, y^{2}_{i,n})$. These reasoning processes are gone through in
the mind of a subject and may be reflected in her lookups.
We define each stage of the reasoning process as a state. The states are in the mind of
a subject. If she is a level-2, then according to the best response hierarchy of reasoning,
there are three states in her mind. To distinguish a state regarding beliefs about self from
beliefs about the opponent, if a state is about the opponent, we indicate it by a minus
sign. Thus, for a level-2 player, three states, namely s = 0 (fixating at the location
$(x^{0}_{i,n}, y^{0}_{i,n})$ since she thinks her opponent thinks she is a level-0), s = −1 (fixating at the
location $(x^{1}_{-i,n}, y^{1}_{-i,n})$ since she thinks her opponent is a level-1), and s = 2 (fixating at
the location $(x^{2}_{i,n}, y^{2}_{i,n})$ since she is a level-2), are expected to be passed through during
the reasoning process of a level-2 subject. We hasten to point out that these states are in
the mind of a subject. A state is not the level of a player. Take a level-2 subject as an example.
Her level, according to the level-k model, is 2. But there are three states, s = 0, s = −1,
and s = 2, in her mind. Which state she is in depends on what she is currently reasoning
about. A level-2 subject could be at state s = −1 because at that point in time, she is
thinking about what her opponent would choose, who is a level-1 according to the best
response hierarchy. However, this state s = −1 is not to be confused with k = 1 for a
level-1 subject (whose states of thinking consist of s = −0 and s = 1).
More generally, for a level-k subject, define $s = k$ as the highest state, indicating that
she is contemplating a choice by fixating at the location $(x^{k}_{i,n}, y^{k}_{i,n})$, best responding to an
opponent of level-(k−1). Imagining what an opponent of level-(k−1) would do, state
$s = -(k-1)$ is defined as the second highest state, when her fixation is at the location
$(x^{k-1}_{-i,n}, y^{k-1}_{-i,n})$, contemplating her opponent’s choice by best responding to herself as a
level-(k−2).
22
Lower states $s = k-2$, $s = -(k-3)$, etc. are defined similarly. Then, steps
of reasoning of a subject’s best response hierarchy of Hypothesis 2b (associated with a
particular “k”) can be expressed as “$0, \ldots, k-2, -(k-1), k$.” We regard these (k+1)
steps of reasoning as the (k+1) states of the mind for a level-k player i. Hence, for a
level-k subject, state space $\Omega_k$ consists of all thinking steps in the best response hierarchy
of this particular level-k type. Thus, $\Omega_k = \{0, \ldots, -(k-3), k-2, -(k-1), k\}$.
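As a quick illustration of this definition, the state space can be enumerated programmatically. The function name `state_space` is an assumption introduced here, and the lowest state is written as 0 regardless of whose level-0 choice it refers to, following the convention in the text.

```python
def state_space(k):
    """States of a level-k subject, lowest to highest: 0, ..., k-2, -(k-1), k."""
    states = []
    for level in range(k + 1):
        own = (k - level) % 2 == 0  # own-choice levels share the parity of k
        # Opponent levels carry a minus sign; the lowest state is always written 0.
        states.append(level if own or level == 0 else -level)
    return states

for k in range(5):
    print(k, state_space(k))
```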
B The Constrained Markov Transition Process
To account for the transitions of states within a subject’s mind,we employ a Markov-
switching model by Hamilton [1989] and characterize the transition of states by a Markov
transition matrix.Instead of requiring a level-k subject to “strictly” obey a monotonic
order of level-k thinking going from lower states to higher states,we allow subjects to
22
We use the minus sign (−) to refer to players contemplating their opponent. Note that the
lowest state 0 can be about oneself or the opponent. Thus the states 0 and −0 should be distinguished.
For ease of exposition, we do not make this distinction and call the lowest state 0.
move back from higher states to lower states. This is to account for the possibility that
subjects may go back to double check, as may be typical in experiments. However, since a
level-k player best responds to a level-(k−1) opponent, it is difficult to imagine a subject
jumping from the reasoning state of, say, $s = k-2$ to that of $s = k$ without first going
through the reasoning state of $s = -(k-1)$. Thus, we restrict the probabilities for all
transitions that involve a jump in states to be zero.
23
Specifically, suppose the subject is a particular level-k. Let $S_t$ be the random variable
representing the subject’s state at time t, drawn from the state space
\[ \Omega_k = \{0, \ldots, -(k-3), k-2, -(k-1), k\}. \]
Let the realization of the state at time t be $s_t$. Denote the state history up to time t
by $\mathcal{S}^t \equiv \{s_1, \ldots, s_{t-1}, s_t\}$.
24
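Since lookups may be serially correlated, we model this by
estimating a constrained Markov stationary transition matrix of states. Let the transition
probability from state $S_{t-1} = s_{t-1}$ to $S_t = s_t$ be
\[ \Pr(S_t = s_t \mid S_{t-1} = s_{t-1}) = \pi_{s_{t-1} \to s_t}. \tag{1} \]
Thus, the state transition matrices $\theta_k$ for level-k types for $k \in \{0, 1, 2, 3, 4\}$ are
\[
\theta_0 = (\pi_{0\to0}) = (1), \quad
\theta_1 = \begin{pmatrix}
\pi_{0\to0} & \pi_{0\to1} \\
\pi_{1\to0} & \pi_{1\to1}
\end{pmatrix}, \quad
\theta_2 = \begin{pmatrix}
\pi_{0\to0} & \pi_{0\to-1} & 0 \\
\pi_{-1\to0} & \pi_{-1\to-1} & \pi_{-1\to2} \\
\pi_{2\to0} & \pi_{2\to-1} & \pi_{2\to2}
\end{pmatrix},
\]
\[
\theta_3 = \begin{pmatrix}
\pi_{0\to0} & \pi_{0\to1} & 0 & 0 \\
\pi_{1\to0} & \pi_{1\to1} & \pi_{1\to-2} & 0 \\
\pi_{-2\to0} & \pi_{-2\to1} & \pi_{-2\to-2} & \pi_{-2\to3} \\
\pi_{3\to0} & \pi_{3\to1} & \pi_{3\to-2} & \pi_{3\to3}
\end{pmatrix},
\]
\[
\theta_4 = \begin{pmatrix}
\pi_{0\to0} & \pi_{0\to-1} & 0 & 0 & 0 \\
\pi_{-1\to0} & \pi_{-1\to-1} & \pi_{-1\to2} & 0 & 0 \\
\pi_{2\to0} & \pi_{2\to-1} & \pi_{2\to2} & \pi_{2\to-3} & 0 \\
\pi_{-3\to0} & \pi_{-3\to-1} & \pi_{-3\to2} & \pi_{-3\to-3} & \pi_{-3\to4} \\
\pi_{4\to0} & \pi_{4\to-1} & \pi_{4\to2} & \pi_{4\to-3} & \pi_{4\to4}
\end{pmatrix}.
\]
Note that the upper triangle where the column number is greater than one plus the row
number is restricted to zero since we do not allow for jumps.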
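The zero restrictions can be written as a simple mask over the transition matrix. The sketch below is illustrative, with hypothetical function names; it orders states from lowest to highest and counts the free transition probabilities after imposing that each row sums to one.

```python
def allowed_transitions(k):
    """(k+1) x (k+1) boolean mask: entry [r][c] is True if the transition from the
    r-th to the c-th state (ordered lowest to highest) is not restricted to zero."""
    n = k + 1
    return [[c <= r + 1 for c in range(n)] for r in range(n)]

def free_parameters(k):
    """Allowed entries minus one per row, because each row sums to one."""
    return sum(sum(row) - 1 for row in allowed_transitions(k))

for k in range(5):
    print(k, free_parameters(k))
```

For k = 1 and k = 2 this gives two and five free parameters respectively, which agrees with the count k(k+3)/2.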
C From States to Lookups
When a subject is in a particular state, her reasoning will be reflected in the lookups which
we can track. Recall that for each game n, $G_n$ is the map on which she can fixate.
23
Estimation results without such restrictions are similar to the results presented below and are provided
in Supplementary Table 4: 12 of the 17 subjects are classified as the same level-k lookup type.
24
In the experiment, subjects could look at the entire computer screen. Here, we only consider lookups
that fall on the grid map and drop the rest.
Define a state-to-lookup mapping $l^{k}_{n} : \Omega_k \to G_n$ which assigns each state s a corresponding
lookup location on the map $G_n$ according to the level-k model.
25
Suppose a level-2 player
is inferred to be in state s = −1; then by the mapping $l^{2}_{n}$, her lookup should fall exactly
on the location $l^{2}_{n}(-1)$. In words, when a level-2 player is in state s = −1, she is thinking
about what her opponent as a level-1 would choose. Hence, the state-to-lookup mapping
$l^{2}_{n}(-1)$ should be on the location a level-1 opponent would choose. If her lookup is not
on that location, we interpret this as an error. We assume a logit error structure so that
looking at locations farther away from $l^{2}_{n}(-1)$ is less likely.
Formally, the lookup sequence in trial n is a time series over $t = 1, \ldots, T_n$ where $T_n$ is
the number of her lookups in this game n. Because of the logit error, a level-k subject
may not look at a location with certainty. Therefore, at the t-th lookup, let the random
variable $R^{t}_{n}$ be the probabilistic lookup location in $G_n$ and its realization be $r^{t}_{n}$. Denote
the lookup history up to time t by $\mathcal{R}^{t}_{n} \equiv \{r^{1}_{n}, \ldots, r^{t-1}_{n}, r^{t}_{n}\}$.
Conditional on $S_t = s_t$, the probability distribution of a level-k subject’s probabilistic
lookup $R^{t}_{n}$ is assumed to follow a logit error quantal response model (centered at $l^{k}_{n}(s_t)$),
independent of lookup history $\mathcal{R}^{t-1}_{n}$. In other words,
\[
\Pr(R^{t}_{n} = r^{t}_{n} \mid S_t = s_t, \mathcal{R}^{t-1}_{n}) =
\frac{\exp\left(-\lambda_k \left\| r^{t}_{n} - l^{k}_{n}(s_t) \right\|\right)}
{\sum_{g \in G_n} \exp\left(-\lambda_k \left\| g - l^{k}_{n}(s_t) \right\|\right)}, \tag{2}
\]
where $\lambda_k \in [0, \infty)$ is the precision parameter. If $\lambda_k = 0$, the subject randomly looks
at locations in $G_n$. As $\lambda_k \to \infty$, her lookups concentrate on the lookup location $l^{k}_{n}(s_t)$
predicted by the state $s_t$ of a level-k.
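The logit error distribution can be sketched in a few lines. This is a minimal illustration: it assumes the norm is the Euclidean distance on the grid (the text does not name the norm), and `lookup_probabilities` is a hypothetical helper, not part of the authors’ code.

```python
import math
from itertools import product

def lookup_probabilities(center, lam, X, Y):
    """Logit-error distribution over the grid G_n, centred at the location
    predicted by the current state. Euclidean distance is an assumption."""
    grid = list(product(range(-X, X + 1), range(-Y, Y + 1)))
    weights = [math.exp(-lam * math.dist(g, center)) for g in grid]
    total = sum(weights)
    return {g: w / total for g, w in zip(grid, weights)}

# A level-2 subject in state s = -1 in Game 16: predicted location (-2, 3).
probs = lookup_probabilities(center=(-2, 3), lam=1.5, X=3, Y=3)
print(max(probs, key=probs.get))                    # most likely lookup location
print(lookup_probabilities((-2, 3), 0.0, 3, 3)[(0, 0)])  # lam = 0: uniform over the map
```

With a precision of zero every location is equally likely; larger values concentrate the probability mass on the predicted location.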
Combining the state transition matrix and the logit error, we can calculate the probability
of observing lookup $r^{t}_{n}$ conditional on past lookup history $\mathcal{R}^{t-1}_{n}$:
\[
\Pr(R^{t}_{n} = r^{t}_{n} \mid \mathcal{R}^{t-1}_{n}) =
\sum_{s_t \in \Omega_k} \Pr(S_t = s_t \mid \mathcal{R}^{t-1}_{n}) \cdot
\Pr(R^{t}_{n} = r^{t}_{n} \mid S_t = s_t, \mathcal{R}^{t-1}_{n}) \tag{1}
\]
25
For instance, if a level-2 player with target $(4, -2)$ in game n = 16 (player 1 as shown in Figure I)
is at state s = 0 at a point in time, the mapping $l^{2}_{16}$ would give us the location $l^{2}_{16}(0) = (0, 0)$ which a
level-0 player would choose (O in Figure I) since at this particular point in time, she is thinking about
what her opponent thinks she would choose as a level-0. Similarly, if a level-2 player is in state −1, then
the $l^{2}_{16}$ mapping would give us the location $l^{2}_{16}(-1) = (-2, 3)$ which a level-1 opponent would choose ($L1_2$
in Figure I) since at this particular point in time, she is thinking about what her opponent would choose
as a level-1. Finally, if a level-2 player 1 is in state 2, then the mapping $l^{2}_{16}$ would give us the location
$l^{2}_{16}(2) = (2, 1)$ which a level-2 subject would choose ($L2_1$ in Figure I) since at this particular point in
time, she is thinking about her choice as a level-2.
where
\[
\begin{aligned}
\Pr(S_t = s_t \mid \mathcal{R}^{t-1}_{n})
&= \sum_{s_{t-1} \in \Omega_k} \Pr(S_{t-1} = s_{t-1} \mid \mathcal{R}^{t-1}_{n}) \cdot
   \Pr(S_t = s_t \mid S_{t-1} = s_{t-1}, \mathcal{R}^{t-1}_{n}) \\
&= \sum_{s_{t-1} \in \Omega_k} \Pr(S_{t-1} = s_{t-1} \mid \mathcal{R}^{t-1}_{n}) \cdot \pi_{s_{t-1} \to s_t} \\
&= \sum_{s_{t-1} \in \Omega_k}
   \frac{\Pr(S_{t-1} = s_{t-1} \mid \mathcal{R}^{t-2}_{n}) \,
         \Pr(R^{t-1}_{n} = r^{t-1}_{n} \mid S_{t-1} = s_{t-1}, \mathcal{R}^{t-2}_{n})}
        {\Pr(R^{t-1}_{n} = r^{t-1}_{n} \mid \mathcal{R}^{t-2}_{n})}
   \cdot \pi_{s_{t-1} \to s_t}.
\end{aligned} \tag{2}
\]
The second equality in equation (2) follows since, according to the Markov property,
$S_{t-1} = s_{t-1}$ is sufficient to predict $S_t = s_t$. Note that equation (2) depends on the Markov
transition matrix. Meanwhile, the second term on the right hand side of equation (1),
$\Pr(R^{t}_{n} = r^{t}_{n} \mid S_t = s_t, \mathcal{R}^{t-1}_{n})$, depends on the logit error. Notice that all the terms on the
last line of equation (2) are now expressed with the time index moving backwards by one
period. Hence, for a given game n, coupled with the initial distribution of states, the joint
density of a level-k subject’s empirical lookups, denoted by
\[
\begin{aligned}
f^{k}_{n}(r^{1}_{n}, \ldots, r^{T_n-1}_{n}, r^{T_n}_{n})
&\equiv \Pr(r^{1}_{n}, \ldots, r^{T_n-1}_{n}, r^{T_n}_{n}) \\
&= \Pr(r^{1}_{n}) \Pr(r^{2}_{n} \mid r^{1}_{n}) \Pr(r^{3}_{n} \mid r^{1}_{n}, r^{2}_{n}) \cdots
   \Pr(r^{T_n}_{n} \mid r^{1}_{n}, r^{2}_{n}, \ldots, r^{T_n-1}_{n}),
\end{aligned}
\]
can be derived.
26
The log likelihood over all 24 trials is thus
\[
L(\lambda_k, \theta_k) = \ln \left[ \prod_{n=1}^{24} f^{k}_{n}(r^{1}_{n}, \ldots, r^{T_n-1}_{n}, r^{T_n}_{n}) \right]. \tag{3}
\]
Since level-k reasoning starts from the lowest state (here state 0), we assume this
initial distribution of states degenerates to a mass point at the lowest state corresponding
to level-0 (of herself if k is even and of her opponent if k is odd). With this assumption,
we estimate the precision parameter $\lambda_k$ and the constrained Markov transition matrix $\theta_k$
using maximum likelihood estimation for each k, and classify subjects into the particular
level-k type which has the largest likelihood.
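The recursion behind the likelihood can be sketched as a forward filter. This is a minimal illustration under stated assumptions rather than the authors’ estimation code: the norm is taken to be Euclidean, the first lookup is assumed to be emitted from the lowest state, the transition matrix and lookup sequence below are hypothetical, and the maximization over the parameters is omitted.

```python
import math
from itertools import product

def emission(r, center, lam, grid):
    """Logit-error probability of lookup r when the state predicts `center`."""
    total = sum(math.exp(-lam * math.dist(g, center)) for g in grid)
    return math.exp(-lam * math.dist(r, center)) / total

def log_likelihood(lookups, centers, theta, lam, grid):
    """Log joint density of one game's lookup sequence.

    centers[j] is the location predicted by the j-th state (lowest first);
    theta[i][j] is the transition probability from state i to state j.
    """
    n_states = len(centers)
    # Assumed timing: the first lookup is emitted from the lowest state.
    predicted = [1.0] + [0.0] * (n_states - 1)   # Pr(S_t = j | earlier lookups)
    loglik = 0.0
    for r in lookups:
        joint = [predicted[j] * emission(r, centers[j], lam, grid)
                 for j in range(n_states)]
        step = sum(joint)                         # Pr(r_t | earlier lookups)
        loglik += math.log(step)
        filtered = [q / step for q in joint]      # Bayes update of the state
        predicted = [sum(filtered[i] * theta[i][j] for i in range(n_states))
                     for j in range(n_states)]    # one Markov transition
    return loglik

# Level-2 player 1 in Game 16: states 0, -1, 2 map to these locations.
grid = list(product(range(-3, 4), range(-3, 4)))
centers = [(0, 0), (-2, 3), (2, 1)]
theta = [[0.6, 0.4, 0.0],      # hypothetical values; no jump from state 0 to state 2
         [0.2, 0.5, 0.3],
         [0.1, 0.2, 0.7]]
lookups = [(0, 0), (-1, 2), (-2, 3), (2, 1), (2, 1)]  # hypothetical sequence
ll = log_likelihood(lookups, centers, theta, lam=1.0, grid=grid)
print(ll)
```

Summing this quantity over the 24 games gives the log likelihood for one candidate level k; repeating the exercise for each k and comparing the maximized values corresponds to the classification step described above.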
To summarize, for each level k, we estimate a state transition matrix and a precision
parameter for the logit error. Thus for a given initial distribution of the states, we know
the probability distribution of states at any point in time using the state transition matrix.
Moreover, at any point in time, the mapping $l^{k}_{n}$ from the state to the lookup gives us the
lookup location corresponding to any state when there is no error. Coupled with the error
26
See Supplementary Appendix A3 for a formal derivation.
structure, we can calculate the probability distribution of various errors and therefore the
distribution of predicted lookup locations. We then maximize the likelihood to explain
the entire observed sequence of lookups. We do this for various levels. The final step is
to select the k among the various level-k types that best explains the observed sequence of lookups
for each subject.
D Vuong’s Test for Non-Nested but Overlapping Models
The above econometric model may be plagued by an overfitting problem since higher
level-k types have more states and hence more parameters. It is not surprising if one discovers
that models with more parameters fit better. In particular, the Markov-switching model
for level-k has (k+1) states with a $(k+1) \times (k+1)$ transition matrix. This gives the
model $k(k+3)/2$ parameters in the transition matrix alone.
27
For example, a level-2 subject
has 3 states 0, −1, and 2 and five (Markov) parameters, but a level-1 subject has only 2
states 0 and 1 and two (Markov) parameters. Hence, we need to make sure our estimation
does not select higher levels merely because they contain more states and more parameters.
However, usual tests for model restrictions may not apply, since the parameters involved
in different level-k types could be non-nested. In particular, the state space of a level-2
subject $\{0, -1, 2\}$ and the states of a level-1 subject $\{0, 1\}$ are not nested. Yet, the state
space of a level-1 type, $\{0, 1\}$, is nested in the state space of a level-3 type, $\{0, 1, -2, 3\}$.
In order to evaluate the classification, we use Vuong’s test for non-nested but overlapping
models (1989).
28
Let $Lk^{*}$ be the type which has the largest likelihood with corresponding parameters
$(\lambda_{k^{*}}, \theta_{k^{*}})$. Let $Lk^{a}$ be an alternative type with corresponding parameters
$(\lambda_{k^{a}}, \theta_{k^{a}})$. In
our case $Lk^{*}$ is the type with the largest likelihood based on lookups. The alternative
type $Lk^{a}$ is the type having the next largest likelihood among all lower level types.
29
If,
according to Vuong’s test, $Lk^{*}$ is a better model than $Lk^{a}$, we can be assured that the
maximum likelihood criterion does not pick up the reported type by mere chance. Thus,
we conclude that the lookup-based type is $Lk^{*}$. If instead we find that according to
Vuong’s test, $Lk^{*}$ and $Lk^{a}$ are equally good, then we conservatively classify the subject
as the second largest lower type $Lk^{a}$.
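The decision rule can be illustrated with a simplified statistic. The sketch below computes the standard Vuong z statistic for strictly non-nested models from per-trial log-likelihood differences; it is not the generalized version for overlapping models used in the paper (see Supplementary Appendix A4 for that). The critical value of 1.96, the function names, and the numbers are all assumptions for illustration.

```python
import math

def vuong_statistic(loglik_star, loglik_alt):
    """Vuong z statistic from per-trial log-likelihoods of two competing types
    (strictly non-nested version; a simplification of the paper's test)."""
    m = [a - b for a, b in zip(loglik_star, loglik_alt)]
    n = len(m)
    mean = sum(m) / n
    omega = math.sqrt(sum((x - mean) ** 2 for x in m) / n)
    return math.sqrt(n) * mean / omega

def lookup_type(k_star, k_alt, z, critical=1.96):
    """Keep Lk* only if it is significantly better; otherwise fall back to Lk^a.
    The critical value is an assumption, not taken from the paper."""
    return k_star if z > critical else k_alt

# Hypothetical per-trial log-likelihoods for four trials (NOT the paper's data).
ll_star = [-3.0, -2.5, -3.2, -2.8]
ll_alt = [-3.5, -3.1, -3.3, -3.6]
z = vuong_statistic(ll_star, ll_alt)
print(round(z, 2), lookup_type(3, 2, z))
```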
Table II shows the results of the maximum likelihood estimation and Vuong’s test
27
Since each row sums up to one and elements with the column index greater than the row index plus
one are zero, we have in total $(k+1)(k+1) - (k+1) - [k(k-1)]/2 = [k(k+3)]/2$ parameters.
28
See Supplementary Appendix A4 for the details of Vuong’s test for non-nested but overlapping models.
Note that this is the generalized version of the well-known “nested” Vuong’s test.
29
Recall that the reason why we look at Vuong’s test is to avoid overfitting. Hence, if the alternative
type has a larger transition matrix (more parameters) but a lower likelihood, there is no point in performing
a test, since $Lk^{*}$ will not suffer from the problem of overfitting because it has fewer parameters but has
a higher likelihood. This leads us to consider only lower level types as the alternative type.
for each subject. For each subject, we list her $Lk^{*}$ type, her $Lk^{a}$ type, her Vuong’s
test statistic, and her lookup-based type according to Vuong’s test, in that order. Six of the
seventeen subjects (subjects 1, 5, 6, 8, 11, 13) pass Vuong’s test and have their
lookup-based type as $Lk^{*}$. The remaining eleven subjects are conservatively classified as $Lk^{a}$.
The overall results are summarized in column (A) of Table III. After employing Vuong’s
test, the type distribution for (L0, L1, L2, L3, EQ) is (1, 6, 4, 4, 2).
30
The distribution is
shifted slightly toward higher levels compared with typical type distributions reported in previous studies. In particular,
there are two EQ’s and four L3’s, accounting for more than one third of the data. Treating
the EQ type as having a thinking step of 4, we find that the average number of thinking
steps is 2.00, in line with results of the standard p-beauty contest games using Caltech
subjects, but higher than for typical subjects.
31
Neither employing Hansen [1992]’s test
(to avoid nuisance parameter problems),nor iteratively applying Vuong’s test (until the
likelihood of the current type is significantly higher than that of the next alternative)
alters the distribution of level-k types by much (see A4 and Supplementary Table 3).
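The average of 2.00 thinking steps can be checked directly from the type distribution (1, 6, 4, 4, 2) for (L0, L1, L2, L3, EQ) reported above, counting EQ as 4 steps:
\[
\frac{0 \cdot 1 + 1 \cdot 6 + 2 \cdot 4 + 3 \cdot 4 + 4 \cdot 2}{17} = \frac{34}{17} = 2.00 .
\]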
Up to now, we have shown that lookups do fall on the hotspots of the best response
hierarchy (Hypothesis 2a). Classifying subjects based on lookups (Hypothesis 2b) gives
us a reasonable level of sophistication as argued above. However, one might still wonder
whether the results reported in Table II are due to a misspecification of possible types.
After all, many assumptions are required for Hypothesis 2b to hold. We take up this
issue now. Our argument is that if we take the level-k theory literally to interpret the
underlying reasoning process, the classification based on lookups should match well with the
classification using final choices alone, since the level k reflects a player’s sophistication.
V Matching Up with Final Choices
We first classify subjects using their final choices and compare classifications based on
choices to those based on lookups.We point out the similarity between these two clas-
sification results.Finally we address how lookup data could help classify subjects when
the choice data is noisy.
Following the literature,we classify individual subjects into various level-k types based
on final choices alone.Supplementary Appendix A5 provides details of the maximum
30
Ignoring the two pseudo-17 subjects (subjects 3 and 17, both classified as L1) whose choices
suggest non-compliance to level-k theory, the type distribution for (L0, L1, L2, L3, EQ) is (1, 4, 4, 4, 2). For
pseudotypes, refer to Costa-Gomes and Crawford [2006].
31
Camerer [1997] reports that Caltech students play an average of 21.88 in a p-beauty contest game with
p = 0.7. This is between L2’s choice of 24.5 and L3’s choice of 17.15. Higher than typical distributions
could also result from the spatial beauty contest game being intuitive and not requiring mathematical
multiplication (as compared with, say, the standard p-beauty contest game), as Chou et al. [2009] show
that a graphical presentation of the standard p-beauty contest game yields results closer to equilibrium.
likelihood estimation and pseudotype test we adopt from Costa-Gomes and Crawford
[2006], and subject-by-subject results are reported in the sixth column of Table II. The
idea of the pseudotype is to treat each subject’s choices as a possible type. This is to
examine whether there are clusters of subjects whose choices resemble each other’s and
thus predict others’ choices in the cluster better than the pre-specified level-k types.
Since we have 17 subjects, we include 17 pseudotypes, each constructed from one of our
subjects’ choices in 24 trials. The aggregate distributions of types (with or without the
pseudotype test) are reported in columns (B) and (C) of Table III. In Table III, the
choice-based and lookup-based classification results look similar. The choice results indicate
slightly more steps of reasoning (2.12–2.13 for choice-based types instead of 2.00 for
lookup-based types). This suggests that the lookup-based estimation (and the underlying
Hypothesis 2b) is in the right ballpark. In fact, if we consider the classification results
on a subject-by-subject basis, the similarity between the two estimations is even more
evident. As reported in Table II, overall, for ten out of the seventeen subjects, their
lookup-based types and choice-based types are the same. In other words, for most
subjects, when their choices reflect a particular level of sophistication, their lookup data
suggests the same level of sophistication. Such alignment in classification results would
be surprising if one thought Hypothesis 2b was too strong a claim. This supports a literal
interpretation of the level-k model. When a subject’s choice data indicates a particular
level of sophistication, her lookups suggest that the best response hierarchy of that level
is carried out when she reasons.
Since the classification based on lookups and that based on choices align, we next turn
to discuss the subtle differences between them. We evaluate the robustness of individual
choice-based classification by performing a bootstrap. This is a departure from past
literature such as Costa-Gomes and Crawford [2006], as they do not consider whether the
maximum likelihood estimation has enough power to distinguish between various types.
For example, reading from Supplementary Table 1, for subject 14, the log likelihood is
−98.89 for L0, −84.17 for L1, −96.99 for L2, −76.67 for L3, and −74.45 for EQ.
Maximum likelihood estimation classifies her as EQ, although the likelihood of L3 is also close.
In this case, classifying this subject as EQ based on maximum likelihood alone may be
questionable. To the best of our knowledge, there has not been any proposed test in
experimental economics for evaluating the robustness of maximum likelihood-based type
classifications. Hence we propose a bootstrap procedure (Efron [1979]; Efron and
Tibshirani [1994]) to deal with the issue of robustness.
32
Imagine that from the maximum
likelihood estimation,a subject is classified as a particular level-k type with the logit
32
Costa-Gomes and Crawford [2006] do use various information criteria to perform the horse-race.
However,this still fails to address how much the runner-up is “close” to the winner.
error parameter $\lambda_k$. Draw (with replacement) 24 new trials out of the original dataset
and re-estimate her k and $\lambda_k$. We do this 1000 times to generate the discrete
distribution of k and the distribution of $\lambda_k$. Then, we evaluate the robustness of k by looking
at the distribution of k. Each level-k type estimated from a re-sampled dataset that is
not the same as her original level-k type is viewed as a “misclassification,” and counted
against the original classification k. By calculating the total misclassification rate (out
of 1000 re-samples), we can measure the robustness of the original classification. This
bootstrap procedure is in the spirit of the test reported in Salmon [2001], which evaluates
the robustness of the parameters estimated in an EWA learning model using simulated
data.
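The bootstrap procedure can be sketched as follows. This is an illustration only: the classifier `classify_by_loglik` is a hypothetical stand-in that picks the type with the largest summed per-trial log-likelihood, whereas the procedure in the text re-estimates k and the logit precision on every resample. The trial data are made up.

```python
import random
from collections import Counter

def classify_by_loglik(trials):
    """Hypothetical classifier: each trial maps a type label to that trial's
    log-likelihood; return the type with the largest total."""
    return max(trials[0], key=lambda k: sum(t[k] for t in trials))

def bootstrap_misclassification(trials, classify, n_boot=1000, seed=0):
    """Resample trials with replacement, re-classify, and count disagreements
    with the classification obtained from the original sample."""
    rng = random.Random(seed)
    original = classify(trials)
    counts = Counter()
    for _ in range(n_boot):
        resample = [rng.choice(trials) for _ in trials]
        counts[classify(resample)] += 1
    rate = 1 - counts[original] / n_boot
    return original, counts, rate

# Hypothetical subject with 24 trials whose L3 and EQ likelihoods are close.
trials = [{"L3": -3.2, "EQ": -3.0}] * 12 + [{"L3": -3.0, "EQ": -3.3}] * 12
orig_type, counts, rate = bootstrap_misclassification(trials, classify_by_loglik)
print(orig_type, dict(counts), rate)
```

Any function that maps a list of trials to a type label can be passed as `classify`, so a full re-estimation routine could be substituted for the stand-in.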
The results of this bootstrap procedure are listed in Table IV. For each subject, we
report the bootstrap distribution of k (the number of times a subject is classified into L0,
L1, L2, L3 or EQ in the 1000 resampled datasets). The bootstrap misclassification rate
(percentage of times classifying the subject as a type different from her original type) is
listed in the last column. For example, subject 14 is originally classified as EQ, but is
only re-classified as EQ 587 times during the bootstrap procedure. Subject 14 is instead
classified as L3 228 times and as L1 185 times. Hence, the distribution of the number
of times that subject 14 is classified into L0, L1, L2, L3 or EQ in the 1000 resampled
datasets is (0, 185, 0, 228, 587) and the corresponding misclassification rate is 0.413.
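The misclassification rate for subject 14 follows directly from the bootstrap counts (0, 185, 0, 228, 587), since every resample not classified as EQ counts against the original classification:
\[
1 - \frac{587}{1000} = \frac{0 + 185 + 0 + 228}{1000} = 0.413 .
\]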
The bootstrap results align surprisingly well with whether the lookup-based
classifications match their choice-based types. In particular, for the ten subjects whose two
classifications match, all but three of them have (choice-based) bootstrap misclassification
rates lower than 0.05, suggesting that their classifications are truly sharp.
33
In contrast,
for six of the remaining seven subjects whose two classifications do not match, their
choice-based types have bootstrap misclassification rates higher than 18.4%, suggesting
that misclassifying these subjects into the wrong types using choice data alone (due to
insignificantly larger likelihoods) is possible. The difference is significant, with a p-value
of 0.0123 according to the Mann-Whitney-Wilcoxon rank sum test. To sum up, when the
lookup-based types match the choice-based types, the choice-based classification
is quite sharp. In contrast, when they differ, the classification based on choice is not
that sharp, suggesting that for these subjects, choice data may not be enough.
^33 One of these three subjects (subject 17) fails the pseudotype test and is unlikely to resemble any of
the level-k types. The remaining two subjects (subjects 2 and 4) have misclassification rates of 0.076
and 0.110. These are marginally higher than 0.05.
In this case, one wonders whether lookup data could provide additional separation of
types to predict choices. A closer look at Table IV (see the underlined types) indicates that
for ten subjects, when we resample their choices, the level they are most frequently classified
into in the 1000 resampled choice datasets is exactly the level classified using their
lookups.^34 For six other subjects, their lookup-based type is the one they are second most
frequently classified into.^35 In fact, these subjects’ lookup-based types also rank second in
terms of likelihood based on choices.^36 A subject’s lookup-based type is classified using
her lookups, not her choices. The high predictability of choices by her lookup-based
type suggests that the lookup-based type is a viable alternative for predicting choices even
when the lookup-based types differ from the choice-based types.
In order to evaluate whether lookup data can indeed improve classification, we perform
an out-of-sample prediction horse race between the lookup-based and choice-based types.
Note that our lookup-based model makes predictions on lookups, not on final choices per
se. However, we can first classify individual subjects into a particular level-k type based
on either lookups or choices using two thirds of the trials, and see how well the classified
level-k type predicts the final choices in the remaining one third of the trials. In particular,
for each subject, we classify her as a level-$k^{l}_{16}$ type based on lookups (using the first 16
sequences of lookups) and a level-$k^{c}_{16}$ type based on final choices (using the first 16 final
choices), respectively. We then use these particular k’s (one for lookups, the other for
choices) to predict the final choices of the last eight trials. Since we are mainly interested in
how lookup data can provide additional separation of types (to predict behavior) when
choice data is insufficient, we group subjects into those whose choice-based classification
is robust (having bootstrap misclassification rates lower than 0.05 as reported in the
right panel of Table II), and those whose classification is not.
To compare the prediction power of the two models, we report mean square errors
of the predicted choices for the lookup-based and choice-based models. In particular,
suppose a subject chose location $g_n = (x_n, y_n)$ in trial n, while the lookup-based and
choice-based models predicted $(x^l_n, y^l_n)$ and $(x^c_n, y^c_n)$. Then, the mean square errors of the
two models are $|x_n - x^l_n|^2 + |y_n - y^l_n|^2$ and $|x_n - x^c_n|^2 + |y_n - y^c_n|^2$, respectively. As reported
in Table V, though the overall performance of the two models is comparable, among the nine
subjects whose choice-based types are not robust, the lookup-based model has a better
mean square error of 5.75 (compared with 8.67 for the choice-based model) in predicting
the last eight trials.^37 A Wilcoxon signed-rank test shows that this difference is marginally
significant (p = 0.0781).^38
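The error measure can be illustrated with a small script. This is a sketch under stated assumptions: the squared errors are averaged over the held-out trials (the text gives only the per-trial expression), the locations below are hypothetical rather than taken from the experiment, and `mean_square_error` is an illustrative name.

```python
def mean_square_error(chosen, predicted):
    """Average of (x_n - x_hat_n)^2 + (y_n - y_hat_n)^2 over trials."""
    if len(chosen) != len(predicted) or not chosen:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0
    for (x, y), (x_hat, y_hat) in zip(chosen, predicted):
        total += (x - x_hat) ** 2 + (y - y_hat) ** 2
    return total / len(chosen)


# Hypothetical held-out choices and the two models' predictions.
chosen = [(3, -2), (2, 1), (0, 0)]
lookup_predictions = [(3, -2), (2, 1), (1, 0)]
choice_predictions = [(3, 0), (2, 1), (0, 2)]

mse_lookup = mean_square_error(chosen, lookup_predictions)  # (0 + 0 + 1) / 3
mse_choice = mean_square_error(chosen, choice_predictions)  # (4 + 0 + 4) / 3
print(mse_lookup < mse_choice)  # True
```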
^34 They are subjects 1, 2, 4, 5, 7, 10, 12, 13, 16, 17 (those whose two classifications match).
^35 They are subjects 3, 6, 8, 9, 11, 15.
^36 Refer to the likelihood double underlined in Supplementary Table 1.
^37 Even among the “robust” subjects, subject 7 is the only one whose lookup-based model has a much
larger mean square error than the choice-based model.
^38 If we focus only on the seven subjects whose two classifications differ, the lookup-based model still
has a better mean square error of 6.55 (compared with 8.68 for the choice-based model), though the
difference is not statistically significant.
To see how significant this gain in prediction power is, we calculate the “economic
value” (cf. Camerer, Ho and Chong, 2004) of the two models, to evaluate how much these
predictions could potentially add to the opponent’s payoffs. In particular, we calculate
the opponent’s payoffs had they followed these models and best responded to the model
predictions, $\pi_{\mathrm{Follow}}$, and see how much an opponent can gain in addition to his actual
payoffs, $\pi_{\mathrm{Actual}}$, in the experiment. The economic value is this gain expressed as a
percentage of the maximum gain possible, $\pi_{\mathrm{BR}} - \pi_{\mathrm{Actual}}$. (Note that economic values could be
negative if the model performs worse than actual subjects.)
$$EV = \frac{\pi_{\mathrm{Follow}} - \pi_{\mathrm{Actual}}}{\pi_{\mathrm{BR}} - \pi_{\mathrm{Actual}}}$$
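The economic value formula translates directly into code. This is a minimal sketch with hypothetical payoff numbers; the function name `economic_value` and the guard against a zero denominator are illustrative additions, not part of the paper.

```python
def economic_value(pi_follow, pi_actual, pi_br):
    """EV = (pi_follow - pi_actual) / (pi_br - pi_actual)."""
    max_gain = pi_br - pi_actual
    if max_gain == 0:
        raise ValueError("pi_br equals pi_actual, so no gain is possible")
    return (pi_follow - pi_actual) / max_gain


# Hypothetical payoffs: following the model recovers 4 of a possible 10.
ev = economic_value(pi_follow=14.0, pi_actual=10.0, pi_br=20.0)
print(ev)  # 0.4

# A model that does worse than the actual opponent gives a negative value.
ev_negative = economic_value(pi_follow=8.0, pi_actual=10.0, pi_br=20.0)
print(ev_negative)  # -0.2
```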
Results in the last two columns of Table V show that both choice-based and lookup-based
models have good predictive power (compared to actual subjects) and can (on
average) increase opponent payoffs by 39-41%. Moreover, the bootstrap robustness test
indeed evaluates choice-based models well: the second panel of Table V shows that for
the robust subjects, the average economic value for the choice-based model is 56.3%,
higher than that of the lookup-based model (42.0%). On the other hand, the lookup-based model
is a good complement, especially when choice data is not good enough: as shown in the
first panel of Table V, for the non-robust subjects, the average economic value for the
lookup-based model is 40.4%, compared with 24.3% for the choice-based model. In other
words, among the subjects whose choice-based type is not robust to bootstrap, had the
opponent known her lookup-based level, his payoffs could be increased by 40.4%. As a
comparison, had the opponent known her choice-based level, his payoffs could be increased
by 24.3%.
To summarize, these results show that lookup data can help us confirm classification
results based on choices alone and even provide better classification results when
choice-based classifications are not robust. Moreover, lookup data provide a chance to put the
level-k model to an ultimate test, asking if the model can not only predict final choices,
but also describe the decision-making process employed by subjects by going through the
best response hierarchy specified in Hypothesis 2b. Results in Table II show that the
level-k model does indeed hold up under this test for our spatial beauty contest games.
One ought to keep in mind that explaining the reasoning process is as hard as, if not
harder than, explaining choices. That, in our dataset, the lookup-based types of more than
half of the subjects align with their choice-based types should be read as
strong support for the level-k model. This may be due to the graphical nature of the spatial
beauty contest games. How general this result is should be tested in future experiments
in which the reasoning process can somehow be analyzed.
VI Conclusion
We introduce a new spatial beauty contest game in which the process of reasoning can
be tracked, and provide theoretical predictions based on the equilibrium and a literal
interpretation of the level-k theory. The theoretical predictions of the level-k model yield
a plausible hypothesis on the decision-making process when the game is actually played.
We then conduct laboratory experiments using video-based eyetracking technology to
test this conjecture, and fit the eyetracking data on lookups using a constrained
Markov-switching model of level-k reasoning. Results show that experimental
subjects’ lookup sequences can be classified as following various level-k best response
hierarchies, which for more than half of them coincide with the types that they were classified
into using final choices alone. Moreover, when the two classifications differ, most of the
choice-based types are not robust to bootstrap, indicating that we might have misclassified
them due to insignificantly larger likelihoods. In fact, lookup-based types often come out
second (if not first) in the bootstrap procedure. Finally, among the subjects whose
choice-based models are not robust to bootstrap, an out-of-sample prediction exercise shows
that lookup-based models predict final choices better. This suggests that studying the
reasoning process (such as through eyetracking lookups) can indeed help us understand
economic behavior (such as individuals’ final choices) better.
Analyzing reasoning processes is a hard task. The spatial beauty contest game is
designed to fully exploit the structure of the p-beauty contest so that subjects are induced
to literally count on the map to carry out their reasoning as implied by the best response
hierarchy of a level-k theory. The high percentage of subjects whose classifications based
on lookups and choices align could be read as support for the level-k model as a complete
theory of reasoning and choice altogether in the spatial beauty contest game. Whether
this holds true for more general games remains to be seen. Nevertheless, the paper points
out a possibility of analyzing reasoning before arriving at choices. A design that exploits
the structure of the game and is well suited to the tracking technology used seems to be
indispensable.
Pennsylvania State University
National Taiwan University
National Taiwan University
VII References
Brainard, D. H. [1997], ‘The psychophysics toolbox’, Spatial Vision 10, 433–436.
Burchardi, K. B. and Penczynski, S. P. [2011], Out of your mind: Eliciting individual reasoning in one shot games.
Camerer, C. F. [1997], ‘Progress in behavioral game theory’, Journal of Economic Perspectives 11(4), 167–188.
Camerer, C. F., Ho, T.-H. and Chong, J.-K. [2004], ‘A cognitive hierarchy model of games’, Quarterly Journal of Economics 119(3), 861–898.
Camerer, C. F., Johnson, E., Rymon, T. and Sen, S. [1993], Cognition and Framing in Sequential Bargaining for Gains and Losses, MIT Press, Cambridge, pp. 27–47.
Chou, E., McConnell, M., Nagel, R. and Plott, C. R. [2009], ‘The control of game form recognition in experiments: Understanding dominant strategy failures in a simple two person “guessing” game’, Experimental Economics 12(2), 159–179.
Cornelissen, F. W., Peters, E. M. and Palmer, J. [2002], ‘The eyelink toolbox: Eye tracking with MATLAB and the psychophysics toolbox’, Behavior Research Methods, Instruments and Computers 34, 613–617.
Costa-Gomes, M. A. and Crawford, V. P. [2006], ‘Cognition and behavior in two-person guessing games: An experimental study’, American Economic Review 96(5), 1737–1768.
Costa-Gomes, M., Crawford, V. P. and Broseta, B. [2001], ‘Cognition and behavior in normal-form games: An experimental study’, Econometrica 69(5), 1193–1235.
Crawford, V. P. and Iriberri, N. [2007a], ‘Fatal attraction: Salience, naivete, and sophistication in experimental hide-and-seek games’, American Economic Review 97(5), 1731–1750.
Crawford, V. P. and Iriberri, N. [2007b], ‘Level-k auctions: Can a nonequilibrium model of strategic thinking explain the winner’s curse and overbidding in private-value auctions?’, Econometrica 75(6), 1721–1770.
Efron, B. [1979], ‘Bootstrap methods: Another look at the jackknife’, The Annals of Statistics 7(1), 1–26.
Efron, B. and Tibshirani, R. J. [1994], An Introduction to the Bootstrap, Chapman and Hall/CRC Monographs on Statistics and Applied Probability, Chapman and Hall/CRC.
Gabaix, X., Laibson, D., Moloche, G. and Weinberg, S. [2006], ‘Costly information acquisition: Experimental analysis of a boundedly rational model’, American Economic Review 96(4), 1043–1068.
Grosskopf, B. and Nagel, R. [2008], ‘The two-person beauty contest’, Games and Economic Behavior 62(1), 93–99.
Hamilton, J. D. [1989], ‘A new approach to the economic analysis of nonstationary time series and the business cycle’, Econometrica 57(2), 357–384.
Hansen, B. E. [1992], ‘The likelihood ratio test under nonstandard conditions: Testing the Markov switching model of GNP’, Journal of Applied Econometrics 7(S1), S61–S82.
Ho, T. H., Camerer, C. F. and Weigelt, K. [1998], ‘Iterated dominance and iterated best response in experimental “p-beauty contests”’, American Economic Review 88(4), 947–969.
Johnson, E. J., Camerer, C., Sen, S. and Rymon, T. [2002], ‘Detecting failures of backward induction: Monitoring information search in sequential bargaining’, Journal of Economic Theory 104(1), 16–47.
Koszegi, B. and Szeidl, A. [2013], ‘A model of focusing in economic choice’, Quarterly Journal of Economics 128(1), forthcoming.
Krajbich, I., Armel, C. and Rangel, A. [2010], ‘Visual fixations and the computation and comparison of value in simple choice’, Nature Neuroscience 13(10), 1292–1298. doi:10.1038/nn.2635.
Kuo, W.-J., Sjostrom, T., Chen, Y.-P., Wang, Y.-H. and Huang, C.-Y. [2009], ‘Intuition and deliberation: Two systems for strategizing in the brain’, Science 324(5926), 519–522.
Nagel, R. [1995], ‘Unraveling in guessing games: An experimental study’, American Economic Review 85(5), 1313–1326.
Pelli, D. G. [1997], ‘The videotoolbox software for visual psychophysics: Transforming numbers into movies’, Spatial Vision 10, 437–442.
Reutskaja, E., Nagel, R., Camerer, C. F. and Rangel, A. [2011], ‘Search dynamics in consumer choice under time pressure: An eye-tracking study’, American Economic Review 101(2), 900–926.
Salmon, T. C. [2001], ‘An evaluation of econometric models of adaptive learning’, Econometrica 69(6), 1597–1628.
Samuelson, P. A. [1938], ‘A note on the pure theory of consumer’s behaviour’, Economica 5(17), 61–71.
Selten, R. [1991], ‘Properties of a measure of predictive success’, Mathematical Social Sciences 21(2), 153–167.
Stahl, D. O. and Wilson, P. W. [1995], ‘On players’ models of other players: Theory and experimental evidence’, Games and Economic Behavior 10(1), 218–254.
Vuong, Q. [1989], ‘Likelihood ratio tests for model selection and non-nested hypotheses’, Econometrica 57(2), 307–333.
Wang, J. T.-y., Spezio, M. and Camerer, C. F. [2010], ‘Pinocchio’s pupil: Using eyetracking and pupil dilation to understand truth telling and deception in sender-receiver games’, American Economic Review 100(3), 1–26.
Figures and Tables







[Figure I plot: a 7x7 grid with both axes running from -3 to 3, marking O, $L1_1$, $L2_1$, $L3_1$, $E_1$ for player 1 and $L1_2$, $L2_2$, $L3_2$, $E_2$ for player 2.]

Figure I: Equilibrium and Level-k Predictions of a 7x7 Spatial Beauty Contest Game
with Targets (4, -2) and (-2, 4) (Game 16). Predictions specifically for player 1 with
Target (4, -2) are $L1_1 \sim E_1$, and predictions for player 2 with Target (-2, 4) are $L1_2 \sim E_2$.
O stands for the prediction of L0 for both players. Note that $Lk_1$ and $Lk_2$ are the best
responses to $L(k-1)_2$ and $L(k-1)_1$, respectively. For example, $L2_2$’s choice (1, 2) is the
best response to $L1_1$ since (3, -2) + (-2, 4) = (1, 2).
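The best-response arithmetic in this caption can be traced with a short script. This is a minimal sketch, not the authors' code: it assumes that a best response adds a player's own target to the opponent's location and that a coordinate falling outside the map is truncated to the nearest edge (an assumption that reproduces the Game 16 row of Table I). All function names are hypothetical.

```python
def clip(value, half_width):
    """Truncate a coordinate to the range [-half_width, half_width]."""
    return max(-half_width, min(half_width, value))


def best_response(opponent_location, own_target, half_x, half_y):
    """Opponent's location plus own target, truncated to the map (assumed rule)."""
    return (
        clip(opponent_location[0] + own_target[0], half_x),
        clip(opponent_location[1] + own_target[1], half_y),
    )


def level_k_predictions(target_1, target_2, width, height, max_k=3):
    """Return the L0..L(max_k) predictions for player 1 and for player 2."""
    half_x, half_y = (width - 1) // 2, (height - 1) // 2
    player_1 = [(0, 0)]  # L0 is the origin O for both players
    player_2 = [(0, 0)]
    for k in range(1, max_k + 1):
        # Lk of each player best responds to L(k-1) of the other player.
        next_1 = best_response(player_2[k - 1], target_1, half_x, half_y)
        next_2 = best_response(player_1[k - 1], target_2, half_x, half_y)
        player_1.append(next_1)
        player_2.append(next_2)
    return player_1, player_2


# Game 16: 7x7 map, player 1 target (4, -2), player 2 target (-2, 4).
player_1, player_2 = level_k_predictions((4, -2), (-2, 4), width=7, height=7)
print(player_1)  # [(0, 0), (3, -2), (2, 1), (3, 0)]
print(player_2)  # [(0, 0), (-2, 3), (1, 2), (0, 3)]
```

Here `player_2[2]` is the (1, 2) of the caption's example, obtained as (3, -2) + (-2, 4).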






Figure II: Screen Shot of the GRAPH Presentation

Figure III: Screen Shot of the SEPARATE Presentation

Figure IV: Hit Areas for Various Level-k Types in Game 16 (7x7 with Target (4, -2) and
the Opponent Target (-2, 4)). Hit area is the minimal convex set enveloping the locations
predicted by each level-k type’s best response hierarchy.

Note: If we refer to Figure I, for player 1, the Hit Area for level-1 is the minimal convex
set enveloping the locations (O, $L1_1$). The Hit Area for level-2 is the minimal convex
set enveloping the locations (O, $L1_2$, $L2_1$), and so on.
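One way to read "minimal convex set enveloping the locations" is as a convex hull over map locations. The sketch below follows that reading under loud assumptions: each map location is treated as a lattice point, a location counts as inside when it lies in or on the hull, and the hull is built with the standard monotone-chain method. The figure in the paper may draw the areas over grid cells differently, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Monotone-chain convex hull, vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hit_area(points, width, height):
    """Map locations lying in or on the convex hull of the given points."""
    hull = convex_hull(points)
    xs = [p[0] for p in hull]
    ys = [p[1] for p in hull]
    half_x, half_y = (width - 1) // 2, (height - 1) // 2
    n = len(hull)
    inside = []
    for x in range(-half_x, half_x + 1):
        for y in range(-half_y, half_y + 1):
            # The bounding-box check also handles hulls that are a segment or a point.
            if not (min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)):
                continue
            if all(cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0 for i in range(n)):
                inside.append((x, y))
    return inside


# Game 16, player 1: level-1 uses (O, L1_1); level-2 uses (O, L1_2, L2_1).
level_1_area = hit_area([(0, 0), (3, -2)], width=7, height=7)
level_2_area = hit_area([(0, 0), (-2, 3), (2, 1)], width=7, height=7)
print(level_1_area)       # [(0, 0), (3, -2)]
print(len(level_2_area))  # 7
```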

Figure V: Aggregate Empirical Percentage of Time Spent on the Union of Hit Areas
(“Hit Time”) in Each Game


[Figure V plot: vertical axis “Hit time” from 0.00 to 1.00; horizontal axis “Game”, in the order 9, 24, 16, 20, 10, 6, 17, 23, 19, 5, 15, 11, 12, 4, 8, 2, 14, 13, 22, 18, 7, 3, 1, 21.]
Figure VI: Aggregate Linear Difference Measure of Predicted Success in Each Game. It
measures the difference between hit time and the hit area size.


Figure VII: Subject 2’s Eye Lookups in Trial 17 (as a Member B). The radius of the
circle is proportional to the length of that lookup, so bigger circles indicate longer time
spent.


Table I: Level-k, Equilibrium Predictions and Minimum k’s in All Games

Game  Map size  Player 1 target  Player 2 target  L0      L1        L2        L3        EQ        k
1     9×9       (-2, 0)          (0, -4)          (0, 0)  (-2, 0)   (-2, -4)  (-4, -4)  (-4, -4)  3
2     9×9       (0, -4)          (-2, 0)          (0, 0)  (0, -4)   (-2, -4)  (-2, -4)  (-4, -4)  4
3     7×7       (2, 0)           (0, -2)          (0, 0)  (2, 0)    (2, -2)   (3, -2)   (3, -3)   4
4     7×7       (0, -2)          (2, 0)           (0, 0)  (0, -2)   (2, -2)   (2, -3)   (3, -3)   4
5     11×5      (2, 0)           (0, 2)           (0, 0)  (2, 0)    (2, 2)    (4, 2)    (5, 2)    5
6     11×5      (0, 2)           (2, 0)           (0, 0)  (0, 2)    (2, 2)    (2, 2)    (5, 2)    6
7     9×7       (-2, 0)          (0, -2)          (0, 0)  (-2, 0)   (-2, -2)  (-4, -2)  (-4, -3)  4
8     9×7       (0, -2)          (-2, 0)          (0, 0)  (0, -2)   (-2, -2)  (-2, -3)  (-4, -3)  4
9     7×9       (-4, 0)          (0, 2)           (0, 0)  (-3, 0)   (-3, 2)   (-3, 2)   (-3, 4)   4
10    7×9       (0, 2)           (-4, 0)          (0, 0)  (0, 2)    (-3, 2)   (-3, 4)   (-3, 4)   3
11    7×9       (2, 0)           (0, 2)           (0, 0)  (2, 0)    (2, 2)    (3, 2)    (3, 4)    5
12    7×9       (0, 2)           (2, 0)           (0, 0)  (0, 2)    (2, 2)    (2, 4)    (3, 4)    5
13    9×9       (-2, -6)         (4, 4)           (0, 0)  (-2, -4)  (2, -2)   (0, -4)   (2, -4)   4
14    9×9       (4, 4)           (-2, -6)         (0, 0)  (4, 4)    (2, 0)    (4, 2)    (4, 0)    4
15    7×7       (-2, 4)          (4, -2)          (0, 0)  (-2, 3)   (1, 2)    (0, 3)    (1, 3)    4
16    7×7       (4, -2)          (-2, 4)          (0, 0)  (3, -2)   (2, 1)    (3, 0)    (3, 1)    4
17    11×5      (6, 2)           (-2, -4)         (0, 0)  (5, 2)    (4, 0)    (5, 0)    (5, 0)    3
18    11×5      (-2, -4)         (6, 2)           (0, 0)  (-2, -2)  (3, -2)   (2, -2)   (3, -2)   4
19    9×7       (-6, -2)         (4, 4)           (0, 0)  (-4, -2)  (-2, 1)   (-4, 0)   (-4, 1)   4
20    9×7       (4, 4)           (-6, -2)         (0, 0)  (4, 3)    (0, 2)    (2, 3)    (0, 3)    4
21    7×9       (-2, -4)         (4, 2)           (0, 0)  (-2, -4)  (1, -2)   (0, -4)   (1, -4)   4
22    7×9       (4, 2)           (-2, -4)         (0, 0)  (3, 2)    (2, -2)   (3, 0)    (3, -2)   4
23    7×9       (-2, 6)          (4, -4)          (0, 0)  (-2, 4)   (1, 2)    (0, 4)    (1, 4)    4
24    7×9       (4, -4)          (-2, 6)          (0, 0)  (3, -4)   (2, 0)    (3, -2)   (3, 0)    4

Note: Each row corresponds to a game and contains the following information in order: (1) the
game number, (2) the size of the grid map for that game, (3) the target of player 1, (4) the target
of player 2, (5) the theoretic prediction of L0 for player 1, (6) the theoretic prediction of L1 for
player 1, (7) the theoretic prediction of L2 for player 1, (8) the theoretic prediction of L3 for player
1, (9) the theoretic prediction of EQ for player 1, and (10) the minimum k for player 1 such that
as long as the level is weakly higher, the choice of that type is the same as the choice of EQ