# Evaluation Through Conflict

Oct 29, 2013

Martin Zinkevich

Yahoo! Inc.

http://martin.zinkevich.org

Who was I

Worked with U Alberta Computer Poker Research Group

Designed Counterfactual Regret Algorithm

Theory behind DIVAT

Worked on AAAI Computer Poker Competition

2006 as lead programmer, 2007 as chair

Work used in Man vs. Machine

Who am I

Run the Lemonade Stand Game Competition

Work with Yahoo Anti-Abuse Team

AAAI Computer Poker Competition

5 years running

Now the ANNUAL Computer Poker Competition

Latest: 11 universities et al

Competitions: Science vs. Entertainment

AAAI Computer Poker Competition

May The Best Program Win!

And Win Again IF WE PLAYED AGAIN!

VS

for 1000 hands

VS

for 1000 hands

All Combinations

7, -7

10, -10

-7, 7

5, -5

-10, 10

-5, 5

OK, But Who Won?

Online:

Maximize total winnings

Equilibrium:

Maximize the number of people I can win money from (or at least not lose to)
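The two ways of naming a winner can be sketched on a small cross table. The numbers below echo the pairs listed under All Combinations; the player names A, B, C and the reading of each pair as (row player's winnings, column player's winnings) are assumptions made for illustration, not details from the slides.

```python
# Hypothetical cross table: winnings[a][b] is what player a won from player b.
# Names and the pairing of numbers to players are assumptions.
winnings = {
    "A": {"B": 7, "C": 10},
    "B": {"A": -7, "C": 5},
    "C": {"A": -10, "B": -5},
}

def total_winnings(table):
    """'Online' criterion: sum of winnings over all opponents."""
    return {player: sum(row.values()) for player, row in table.items()}

def opponents_not_lost_to(table):
    """'Equilibrium' criterion: number of opponents a player did not lose to."""
    return {
        player: sum(1 for amount in row.values() if amount >= 0)
        for player, row in table.items()
    }

def winner(scores):
    """Player with the highest score under the chosen criterion."""
    return max(scores, key=scores.get)

online = total_winnings(winnings)
equilibrium = opponents_not_lost_to(winnings)
```

On this table both criteria pick the same player, but they need not agree in general: a program that wins small amounts from everyone can top the second ranking while losing the first to a program that wins heavily from a few weak opponents.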

Why a New Competition?

Computing Equilibria

Choosing Equilibria?

Bach or Stravinsky

2, 1    0, 0

0, 0    1, 2
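A minimal sketch of why choosing among equilibria is the hard part here: a brute-force check finds two pure-strategy Nash equilibria in this payoff table, and each player prefers a different one. The action labels "Bach" and "Stravinsky" and their assignment to rows and columns are assumptions; the slide gives only the numbers.

```python
# payoffs[(row_action, col_action)] = (row player's payoff, column player's payoff)
# Action labels are assumed; the payoff numbers come from the table above.
ACTIONS = ["Bach", "Stravinsky"]
payoffs = {
    ("Bach", "Bach"): (2, 1),
    ("Bach", "Stravinsky"): (0, 0),
    ("Stravinsky", "Bach"): (0, 0),
    ("Stravinsky", "Stravinsky"): (1, 2),
}

def pure_nash_equilibria(payoffs, actions):
    """Return every action pair where neither player gains by deviating alone."""
    equilibria = []
    for r in actions:
        for c in actions:
            row_u, col_u = payoffs[(r, c)]
            # Row player cannot do better by switching rows.
            row_ok = all(payoffs[(r2, c)][0] <= row_u for r2 in actions)
            # Column player cannot do better by switching columns.
            col_ok = all(payoffs[(r, c2)][1] <= col_u for c2 in actions)
            if row_ok and col_ok:
                equilibria.append((r, c))
    return equilibria

equilibria = pure_nash_equilibria(payoffs, ACTIONS)
```

The check only covers pure strategies. Even so, it already returns more than one answer, which is the point: computing an equilibrium does not say which one the players will coordinate on.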

Big Question: How Do (or Would) People Get to Nash Equilibria?

Solvable Games

$

Unsolvable Games

$?

An Old Idea

Think about learning in the presence of other
intelligent agents.

given:

Solving the Unsolvable

In current competitions, people are often
applying techniques that are effective in
solvable games, even when the game is not
solvable.

In what competitions is it useless to
approximate the game as solvable?

Axelrod’s Iterated Prisoner’s Dilemma

A competition between many competitors.

One entry: tit-for-tat (Anatol Rapoport)

Nice (initially)

Retaliating

Forgiving

Non-envious
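The four properties can be seen in a short sketch of tit-for-tat: cooperate on the first move, then copy the opponent's previous move. The payoff values (3, 0, 5, 1) are the conventional ones for the prisoner's dilemma and are an assumption here, as are the function names.

```python
COOPERATE, DEFECT = "C", "D"

# Assumed conventional payoffs: (my move, their move) -> my score.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(opponent_history):
    """Nice on the first move, then mirror the opponent's last move."""
    if not opponent_history:
        return COOPERATE
    return opponent_history[-1]

def always_defect(opponent_history):
    """Baseline opponent that never cooperates."""
    return DEFECT

def play(strategy_a, strategy_b, rounds):
    """Play an iterated match and return the two total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each side sees only the other's past moves
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
    return score_a, score_b
```

Against a copy of itself, tit-for-tat cooperates throughout. Against `always_defect` it loses only the first round and then retaliates, so it never outscores its opponent in a single match, which is the sense in which it is non-envious.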

Learned that cooperation has value, but:

Cooperate with whom?

How do we cooperate?

What Is The Lemonade Stand Game?

Every round for 100 rounds:

each person selects an action privately

then, the actions are revealed

The score of a player is the distance clockwise to the next player plus the distance counterclockwise to the next player.
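The scoring rule can be sketched as follows. The slides do not state the number of positions or what happens when players pick the same spot; this sketch assumes 12 positions around the circle, 6 points for each of two players sharing a spot, and 8 each when all three coincide. Those values, and the function name, are assumptions.

```python
N_POSITIONS = 12  # assumed number of spots around the circle

def lemonade_scores(actions):
    """Score one round for three players, given their chosen positions."""
    a, b, c = actions
    if a == b == c:
        return [8, 8, 8]  # assumed tie rule: everyone splits evenly
    scores = []
    for i, pos in enumerate(actions):
        others = [p for j, p in enumerate(actions) if j != i]
        if pos in others:
            scores.append(6)  # assumed tie rule: two players sharing a spot
            continue
        # Distance to the nearest other player in each direction.
        clockwise = min((p - pos) % N_POSITIONS for p in others)
        counterclockwise = min((pos - p) % N_POSITIONS for p in others)
        scores.append(clockwise + counterclockwise)
    return scores
```

For example, `lemonade_scores([0, 6, 3])` gives the players at 0 and 6 a score of 9 each and the player at 3 a score of 6. Under these assumptions every round hands out 24 points in total.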

Key Observations

A constant-sum game between 3 players.

For every gain, someone has to lose.

Possibilities For Cooperation

Opposite sides of the circle, “sandwiching”

Not a “Solvable Game” (Nash, 1951)

Playing equilibrium strategies is not advisable

Easy To Set “Table Image”

The constant strategy often evokes cooperative behavior

Existing Techniques Fail

Experts algorithms lose to constant strategy

Strategy #1: Play Constant

Strategy #2: Play Opposite

Strategy #3: Sandwich
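A rough sketch of how the first two strategies interact, under stated assumptions: 12 positions on the circle, 6 points each for two players sharing a spot, and 8 each when all three coincide. The strategy signatures, the opening move of the "opposite" player, and the scoring details are illustrative choices, not taken from the slides.

```python
N = 12  # assumed number of positions on the circle

def score(actions):
    """Round scores under the assumed rules (6 for a shared spot, 8 if all tie)."""
    if len(set(actions)) == 1:
        return [8, 8, 8]
    result = []
    for i, pos in enumerate(actions):
        others = [p for j, p in enumerate(actions) if j != i]
        if pos in others:
            result.append(6)
        else:
            cw = min((p - pos) % N for p in others)
            ccw = min((pos - p) % N for p in others)
            result.append(cw + ccw)
    return result

def constant(position):
    """Strategy #1: always play the same position."""
    return lambda history: position

def opposite_of(target, opening=6):
    """Strategy #2: play directly across from the target's previous action."""
    def strategy(history):
        if not history:
            return opening  # arbitrary first move (an assumption)
        return (history[-1][target] + N // 2) % N
    return strategy

def play(strategies, rounds):
    """Run the repeated game and return each player's total score."""
    history, totals = [], [0, 0, 0]
    for _ in range(rounds):
        actions = [s(history) for s in strategies]
        history.append(actions)
        for i, points in enumerate(score(actions)):
            totals[i] += points
    return totals

totals = play([constant(0), opposite_of(0), constant(3)], 100)
```

Here a constant player at 0 and a partner playing opposite at 6 each collect 9 per round, while the third player, stuck between them at 3, collects 6. That is one reading of how a constant strategy invites cooperation and how the third player ends up sandwiched.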

Competition Structure

Every set of three players played 100 rounds, 180 times (1.5 million rounds total)

Highest Total Score Wins

Mean, Standard Error can be calculated
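As a minimal sketch, the mean and standard error of a player's per-match scores can be computed with the standard library. The sample scores in the usage note are made up. The round count assumes one entrant per team, using the 9 teams listed under Competitors, which is an inference rather than something the slides state.

```python
import math
import statistics

def mean_and_standard_error(scores):
    """Return (mean, standard error of the mean) for a list of scores."""
    mean = statistics.mean(scores)
    # Sample standard deviation divided by sqrt(n).
    stderr = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, stderr

def total_rounds(entrants, rounds_per_match=100, repetitions=180):
    """Rounds played when every set of three entrants meets `repetitions` times."""
    return math.comb(entrants, 3) * repetitions * rounds_per_match
```

With 9 entrants there are 84 distinct triples, so `total_rounds(9)` returns 1,512,000, consistent with the 1.5 million figure. For hypothetical per-round averages `[8, 9, 7, 8]`, `mean_and_standard_error` returns a mean of 8 and a standard error of about 0.41.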

Competitors

28 players, 9 teams

University of Southampton/Imperial College London (Soton)

Yahoo! Inc. (Pujara)

Rutgers University (RL3)

Brown University (Brown)

Carnegie Mellon (2 teams: Waugh, ACTR)

University of Michigan (FrozenPontiac)

Princeton University (Schapire)

(Greg Kuhlmann)

Competition Results

[Chart: Score Per Round by Competitor]

Results

[Chart: Score Per Round - 8 by Competitor; labels include Modified Constant and Uniformly Random]

Restricting to Top 6

[Chart: Score Per Round - 8 by Competitor]

Restricting to Top 4

Teach Simply!

[Diagram: EQUILIBRIUM FREE; Learn; values 10 and 7]

The High Level

Phenomenal Intelligence: the observed behavior used by a set of people at a point in time.

Lofty Goals

Phenomenal Intelligence: the observed behavior used by a set of people at a point in time.

behavior: a fully specified strategy.

used: actually leveraged

Practical Concessions

Phenomenal Intelligence: the observed behavior used by a set of people at a point in time.

Not any intelligent agent

Not any time (people change)