Evaluation Through Conflict

Oct 29, 2013

Martin Zinkevich

Yahoo! Inc.

http://martin.zinkevich.org/lemonade

Who was I


Worked with U Alberta Computer Poker Research Group


Designed Counterfactual Regret Algorithm


Theory behind DIVAT


Worked on AAAI Computer Poker Competition


2006 as lead programmer, 2007 as chair


Work used in Man vs. Machine

Who am I


Run the Lemonade Stand Game Competition


Work with Yahoo Anti-Abuse Team

AAAI Computer Poker Competition


5 years running


Now the ANNUAL Computer Poker Competition


Latest: 11 universities et al.


Competitions: Science vs. Entertainment

AAAI Computer Poker Competition

May The Best Program Win!

And Win Again IF WE PLAYED AGAIN!

Head to Head

(Figure: two programs matched head to head for 1000 hands.)

All Combinations

   -        7, -7    10, -10
-7, 7         -       5, -5
-10, 10    -5, 5        -

OK, But Who Won?


Online: Maximize total winnings

Equilibrium: Maximize the number of people I can win money from (or at least not lose to)
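The two criteria can be compared on the head-to-head numbers from the All Combinations slide (7, 10 and 5 between three programs). This is only a sketch: the labels A, B and C and the function names are assumptions, since the slides show the numbers without naming the programs.

```python
# Head-to-head winnings taken from the "All Combinations" slide.
# The labels A, B, C are hypothetical; the slide gives only the numbers.
winnings = {
    ("A", "B"): 7, ("A", "C"): 10,
    ("B", "A"): -7, ("B", "C"): 5,
    ("C", "A"): -10, ("C", "B"): -5,
}
players = sorted({p for pair in winnings for p in pair})


def online_score(player):
    """'Online' criterion: total winnings summed over all opponents."""
    return sum(w for (p, _), w in winnings.items() if p == player)


def equilibrium_score(player):
    """'Equilibrium' criterion: number of opponents the player did not lose to."""
    return sum(1 for (p, _), w in winnings.items() if p == player and w >= 0)


online_winner = max(players, key=online_score)
equilibrium_winner = max(players, key=equilibrium_score)
print({p: (online_score(p), equilibrium_score(p)) for p in players})
```

With these numbers both criteria pick the same program. They can disagree when a program wins large amounts from weak opponents but loses narrowly to strong ones.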


Why a New Competition?

Computing Equilibria

Choosing Equilibria?

Bach or Stravinsky

              Bach    Stravinsky
Bach          2, 1    0, 0
Stravinsky    0, 0    1, 2
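A small check of why choosing an equilibrium is a problem here: the game has more than one pure Nash equilibrium, and each player prefers a different one. The sketch assumes the usual layout (row player's payoff first, rows and columns ordered Bach then Stravinsky); the function name is illustrative.

```python
from itertools import product

ACTIONS = ["Bach", "Stravinsky"]

# PAYOFFS[(row_action, col_action)] = (row_payoff, col_payoff).
# The row/column ordering is an assumption about the slide's layout.
PAYOFFS = {
    ("Bach", "Bach"): (2, 1),
    ("Bach", "Stravinsky"): (0, 0),
    ("Stravinsky", "Bach"): (0, 0),
    ("Stravinsky", "Stravinsky"): (1, 2),
}


def pure_nash_equilibria(payoffs, actions):
    """Return action pairs where neither player gains by deviating alone."""
    equilibria = []
    for r, c in product(actions, repeat=2):
        row_ok = all(payoffs[(r, c)][0] >= payoffs[(alt, c)][0] for alt in actions)
        col_ok = all(payoffs[(r, c)][1] >= payoffs[(r, alt)][1] for alt in actions)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, ACTIONS))
```

Both (Bach, Bach) and (Stravinsky, Stravinsky) pass the check, so computing equilibria does not by itself tell the players which one to play.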

Big Question: How Do (or Would) People Get to Nash Equilibria?

Solvable Games

Unsolvable Games


An Old Idea


Think about learning in the presence of other intelligent agents.

Prove cool stuff about your learning algorithm given:

constraints on the adversary

constraints on the game


Solving the Unsolvable


In current competitions, people often apply techniques that are effective in solvable games, even when the game is not solvable.

In what competitions is it useless to approximate the game as solvable?

Axelrod’s Iterated Prisoner’s Dilemma


A competition between many competitors.


One entry: tit-for-tat (Anatol Rapoport)


Nice (initially)


Retaliating


Forgiving


Non-envious


Learned that cooperation has value, but:


Cooperate with whom?


How do we cooperate?
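The tit-for-tat properties listed above can be seen in a few lines. This is a minimal sketch, assuming the commonly used payoffs (3 each for mutual cooperation, 1 each for mutual defection, 5 and 0 when one player defects on a cooperator); the slide gives no payoffs, and the function names are illustrative.

```python
# Assumed Prisoner's Dilemma payoffs; not taken from the slides.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(my_history, their_history):
    """Cooperate first (nice), then copy the opponent's last move."""
    return "C" if not their_history else their_history[-1]


def always_defect(my_history, their_history):
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated game and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))
print(play(tit_for_tat, always_defect))
```

Against itself tit-for-tat cooperates throughout. Against a constant defector it is exploited once and then retaliates, so it never outscores its opponent in a single match (non-envious) yet does well in total across many opponents.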




The Lemonade Stand Game

What Is The Lemonade Stand Game?


Every round for 100 rounds:


each person selects an action privately


then, the actions are revealed


A player's score is the distance clockwise to the next player plus the distance counterclockwise to the next player.
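A minimal sketch of the scoring rule, under two assumptions that this slide does not state: there are 12 positions around the circle (like hours on a clock), and players who pick the same spot share the score (8 each if all three coincide; 6 each for a pair, with the third player getting 12). The function name is illustrative.

```python
N = 12  # assumed number of positions on the circle


def lemonade_scores(positions):
    """Score each of three players from their chosen positions (0..N-1)."""
    result = []
    for i, p in enumerate(positions):
        others = [q for j, q in enumerate(positions) if j != i]
        if all(q == p for q in others):
            # Assumed tie rule: all three on one spot split the total evenly.
            result.append(2 * N // 3)
        elif any(q == p for q in others):
            # Assumed tie rule: two players on one spot get half the circle's worth.
            result.append(N // 2)
        else:
            clockwise = min((q - p) % N for q in others)
            counterclockwise = min((p - q) % N for q in others)
            result.append(clockwise + counterclockwise)
    return result


print(lemonade_scores((0, 4, 8)))  # evenly spread
print(lemonade_scores((0, 6, 3)))  # third player sits between two opposite players
```

Under these assumptions the three scores always add up to 24, an average of 8 per player per round.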


Key Observations


A constant-sum game between 3 players.


For every gain, someone has to lose.


Possibilities For Cooperation


Opposite sides of the circle, “sandwiching”


Not a “Solvable Game” (Nash, 1951)


Playing equilibrium strategies is not advisable


Easy To Set “Table Image”


The constant strategy often evokes cooperative behavior


Existing Techniques Fail


Experts algorithms lose to constant strategy

Strategy #1: Play Constant

Strategy #2: Play Opposite

Strategy #3: Sandwich
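The three strategies can be illustrated with a short calculation. As a sketch only, it assumes 12 positions on the circle and that players on the same spot split the score (8 each for three, 6 each for two); the helper `lemonade_scores` and the position choices are illustrative, not the competition code.

```python
N = 12  # assumed number of positions on the circle


def lemonade_scores(positions):
    """Scores for three players; tie handling is an assumption."""
    result = []
    for i, p in enumerate(positions):
        others = [q for j, q in enumerate(positions) if j != i]
        if all(q == p for q in others):
            result.append(2 * N // 3)
        elif any(q == p for q in others):
            result.append(N // 2)
        else:
            result.append(min((q - p) % N for q in others)
                          + min((p - q) % N for q in others))
    return result


def opposite(position):
    """Strategy #2: sit directly across from a constant player."""
    return (position + N // 2) % N


constant_spot = 0  # Strategy #1: always play the same position
partner_spot = opposite(constant_spot)

# Whatever the third player does, an opposite pair holds them to 6.
third_player_scores = [
    lemonade_scores((constant_spot, partner_spot, x))[2] for x in range(N)
]
print(third_player_scores)

# Strategy #3: two players sit on either side of the third player.
print(lemonade_scores((0, 11, 1)))
```

When one player is constant and another plays opposite, the third player gets 6 from every position, so the cooperating pair shares the remaining 18. A tight sandwich is harsher still on the player in the middle.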

Competition Structure


Every set of three players played 100 rounds 180 times (1.5 million rounds total)


Highest Total Score Wins


Mean and standard error can be calculated
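A quick check of the round count, with a sketch of the mean and standard error calculation. It assumes the sets of three are drawn from the 9 team entries listed under Competitors, which reproduces the 1.5 million figure; the function name and the sample values in the usage line are hypothetical.

```python
from math import comb, sqrt
from statistics import mean, stdev

entries = 9             # assumed: one entry per team
rounds_per_match = 100
repetitions = 180

triples = comb(entries, 3)  # every set of three entries
total_rounds = triples * rounds_per_match * repetitions
print(triples, total_rounds)


def mean_and_standard_error(samples):
    """Sample mean and its standard error (sample stdev / sqrt(n))."""
    m = mean(samples)
    se = stdev(samples) / sqrt(len(samples))
    return m, se


# Hypothetical per-match average scores for one entry, for illustration only.
print(mean_and_standard_error([8.5, 7.9, 8.2, 8.4]))
```

The count comes to 84 triples and 1,512,000 rounds. Repeating each match 180 times is what makes the standard error small enough to separate entries whose mean scores are close.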

Competitors


28 players, 9 teams


University of Southampton/Imperial College London (Soton)

Yahoo! Inc. (Pujara)

Rutgers University (RL3)

Brown University (Brown)

Carnegie Mellon (2 teams: Waugh, ACTR)

University of Michigan (FrozenPontiac)

Princeton University (Schapire)

(Greg Kuhlmann)

Competition Results

(Chart: score per round, by competitor.)

Results

(Chart: score per round minus 8, by competitor; labels include Modified Constant and Uniformly Random.)

Restricting to Top 6

(Chart: score per round minus 8, by competitor.)

Restricting to Top 4

Teach Simply!

EQUILIBRIUM FREE

The High Level


Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.

Lofty Goals


Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.

behavior: a fully specified strategy.

used: actually leveraged

Practical Concessions


Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.


Not any intelligent agent


Not any time (people change)


Not any task (context matters)

Thank You

http://martin.zinkevich.org/lemonade