Chikayama & Taura Lab.


M1 Ayato Miki

1. Introduction
2. Computer Game Players
3. Machine Learning in Computer Game Players
4. Tuning Evaluation Functions
   - Supervised Learning
   - Reinforcement Learning
   - Evolutionary Algorithms
5. Conclusion


Improvements in Computer Game Players

- DEEP BLUE defeated Kasparov in 1997
- GEKISASHI and TANASE SHOGI at the WCSC 2008

Strong computer game players are usually developed by strong human players, who

- input heuristics manually
- devote a lot of time and energy to tuning


Machine Learning enables automatic tuning using a large amount of data.

It is not necessary for the developer to be an expert of the game.

2. Computer Game Players


- Games
- Game Trees
- Game Tree Search
- Evaluation Function


Turn-based games

- ex. tic-tac-toe, chess, shogi, poker, mah-jong

Additional classification

- two-player or otherwise
- zero-sum or otherwise
- deterministic or non-deterministic
- perfect or imperfect information

Game Tree Model

[Figure: a game tree in which levels alternate between the player's turn and the opponent's turn, with an edge for each possible move (move 1, move 2, ...)]

Game Tree Search

- ex. Minimax search algorithm

[Figure: minimax backup on an example tree; leaf values are propagated upward, with Max levels taking the maximum and Min levels taking the minimum of their children]


Difficult to search all the way to the leaf nodes

- about 10^220 possible positions in shogi

Instead, stop the search at a practicable depth and "evaluate" the frontier nodes using an Evaluation Function
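A minimal sketch of depth-limited minimax in Python; `legal_moves`, `apply_move`, and `evaluate` are hypothetical helpers standing in for the game interface and the evaluation function.

    def minimax(position, depth, maximizing):
        """Depth-limited minimax: evaluate() scores positions where the search stops."""
        moves = legal_moves(position)                 # hypothetical game interface
        if depth == 0 or not moves:
            return evaluate(position)                 # evaluation function at the search frontier
        if maximizing:
            return max(minimax(apply_move(position, m), depth - 1, False) for m in moves)
        return min(minimax(apply_move(position, m), depth - 1, True) for m in moves)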


Estimate the superiority of a position

Elements

- feature vector of the position
- parameter vector

e.g., a linear evaluation

    V(s) = w^T phi(s)

    phi(s): feature vector of position s
    w:      parameter vector
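A minimal sketch of the linear form above, assuming NumPy vectors; `extract_features` is a hypothetical helper that produces the feature vector of a position.

    import numpy as np

    def evaluate(position, w):
        """Linear evaluation V(s) = w^T phi(s)."""
        phi = extract_features(position)   # hypothetical: feature vector phi(s)
        return float(np.dot(w, phi))       # inner product with the parameter vector w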


3. Machine Learning in Computer Game Players


Initial work

- Samuel's research [1959]

Learning objective

- What do computer game players learn?


Many useful techniques

- Rote learning
- Quiescence search
- 3-layer neural network evaluation function

And some machine learning techniques

- Learning through self-play
- Temporal-difference learning
- Comparison training


- Opening Book
- Search Control
- Evaluation Function


Automatic construction of the evaluation function

- construct and select a feature vector automatically
- ex. GLEM [Buro, 1998]
- difficult

Tuning evaluation function parameters

- make a feature vector manually and tune its parameters automatically
- easy and effective


Introduction


Computer Game Players


Machine Learning in Computer Game Players


Tuning Evaluation Functions


Supervised Learning


Reinforcement Learning


Evolutionary Algorithms


Conclusion

19


Supervised Learning


Provide the program with example positions and their exact evaluation values.

Adjust the parameters in a way that minimizes the error between the evaluation function outputs and the exact values.

[Figure: example positions labeled with exact values such as 20, 50, ..., 40, 50]
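A minimal sketch of this idea for a linear evaluation function, assuming the training set is a list of (feature vector, exact value) pairs of NumPy arrays and floats; the parameters follow a plain gradient-descent step on the squared error.

    import numpy as np

    def tune_supervised(examples, dim, lr=0.01, epochs=100):
        """Minimize the squared error between evaluation outputs and labeled values."""
        w = np.zeros(dim)
        for _ in range(epochs):
            for phi, target in examples:
                error = np.dot(w, phi) - target   # evaluation output minus exact value
                w -= lr * error * phi             # gradient step on 0.5 * error**2
        return w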


Difficulties

- positions must be labeled manually
- exact quantitative evaluation values are required

Consider a softer approach

Soft supervised training

- requires only the relative order of the possible moves
- easier and more intuitive


Comparison training using records of expert games

Simple relative order:

    the expert move > other moves


Based on optimal control theory

Minimize the cost function J, here written in the standard comparison-training form:

    J(w) = sum_{p=1}^{N} l(s_p, w)

    s_p: example positions in the records
    l:   error function
    N:   total number of example positions

Error function

    l(s, w) = sum_{m=1, m != m*}^{M} T( v(s_m, w) - v(s_{m*}, w) )

    s_m: child position with move m
    M:   total number of possible moves
    m*:  the move played in the record
    v:   minimax search value
    T:   order discriminant function


Sigmoid function

    T(x) = 1 / (1 + exp(-k x))

- k is the parameter that controls the gradient
- when k -> infinity, T(x) becomes the step function
- in that case, the error function counts "the number of moves that were considered to be better than the move in the record"
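A minimal sketch of the error function with the sigmoid order discriminant, assuming hypothetical `possible_moves`, `child`, and `search_value` helpers for move generation, child positions, and the minimax search value under parameters w.

    import math

    def T(x, k=1.0):
        """Sigmoid order discriminant; larger k approaches the step function."""
        return 1.0 / (1.0 + math.exp(-k * x))

    def error(position, expert_move, w, k=1.0):
        """Softly counts moves whose search value exceeds that of the expert move."""
        v_expert = search_value(child(position, expert_move), w)   # hypothetical helpers
        return sum(T(search_value(child(position, m), w) - v_expert, k)
                   for m in possible_moves(position) if m != expert_move)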


30,000 professional game records and 30,000 high-rating game records from SHOGI CLUB 24 were used.

The weight parameters of about 10,000 feature elements were tuned.

The resulting program won the World Computer Shogi Championship 2006.


It is costly to accumulate a training data set

- it takes a lot of time to label positions manually
- using expert records has been successful

But what if there are not enough expert records?

- new games
- minor games


Other approaches work without a training set

- ex. Reinforcement Learning (next)


Reinforcement Learning


The learner gets "a reward" from the environment.

In the domain of games, the reward is the final outcome (win/lose).

Reinforcement learning requires only this objective information about the game.
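A minimal temporal-difference sketch of one way such a reward can drive learning, assuming a linear value function over NumPy feature vectors: during self-play, the value of the previous position is nudged toward the value of the next position, and toward the final win/lose reward at the end of the game.

    import numpy as np

    def td_update(w, phi_prev, phi_next=None, reward=None, alpha=0.01):
        """TD(0) step: move V(s_prev) toward the reward (terminal) or V(s_next)."""
        v_prev = np.dot(w, phi_prev)
        target = reward if reward is not None else np.dot(w, phi_next)
        return w + alpha * (target - v_prev) * phi_prev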

[Figure: position values updated from the final outcomes of individual games (values such as +100, +60, +30, +10; +200, +120, +60, +20; -100, -60, -30, -10)]

Inefficient in games...

[Figure: position values on the same positions (+100, +60, +30, +10 and +80, +15, +10, +10)]


TD-Gammon: trained through self-play

Version        | Features                   | Strength
TD-Gammon 0.0  | Raw board information      | Top of computer players
TD-Gammon 1.0  | Plus additional heuristics | World-championship level


Falling into a local optimum

- lack of playing variation
- solutions:
  - add intentional randomness
  - play against various players (computer/human)

Credit Assignment Problem (CAP)

- it is not clear which action was effective


Evolutionary Algorithms

[Figure: evolutionary algorithm cycle: Initialize Population -> Randomly Vary Individuals -> Evaluate "Fitness" -> Apply Selection -> repeat]
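A minimal sketch of this cycle, assuming hypothetical `random_parameters`, `mutate`, and `fitness` helpers (in the chess experiment below, fitness would come from games against randomly selected opponents).

    def evolve(pop_size=10, offspring_per_parent=10, generations=50):
        """Initialize -> randomly vary -> evaluate fitness -> apply selection, repeated."""
        population = [random_parameters() for _ in range(pop_size)]             # initialize
        for _ in range(generations):
            offspring = [mutate(p) for p in population
                                   for _ in range(offspring_per_parent)]        # randomly vary
            ranked = sorted(population + offspring, key=fitness, reverse=True)  # evaluate fitness
            population = ranked[:pop_size]                                      # apply selection
        return population[0]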


Evolutionary algorithm for a chess player

- uses an open-source chess program
- attempts to tune its parameters

Make 10 initial parents

- initialize their parameters with random values


Create 10 offspring from each surviving parent by mutating the parental parameters, e.g.

    x' = x + sigma * N(0, 1)

    N(0, 1): Gaussian random variable
    sigma:   strategy parameter
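A minimal sketch of this mutation, assuming each individual carries a NumPy parameter vector x and a strategy parameter sigma; the log-normal update of sigma is a common self-adaptive choice and is an assumption here.

    import numpy as np

    def mutate_parameters(x, sigma, tau=0.1):
        """x' = x + sigma * N(0, 1), with sigma itself perturbed (assumed self-adaptation)."""
        sigma_new = sigma * np.exp(tau * np.random.randn())   # strategy parameter update
        x_new = x + sigma_new * np.random.randn(*x.shape)     # Gaussian perturbation of parameters
        return x_new, sigma_new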


Each player plays ten games against randomly selected opponents.

The ten best players become the parents of the next generation.


- Material value
- Positional value
- Weights and biases of three neural networks


Each network has 3 layers

- Input: the arrangement of a specific area (front 2 rows, back 2 rows, and center 4x4 square)
- Hidden: 10 units
- Output: worth of the area arrangement

[Figure: a network with 16 inputs, 10 hidden units, and 1 output]
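A minimal sketch of one such area network in NumPy (16 board squares in, 10 hidden units, 1 output); the tanh activation is an assumption, not something stated in the slides.

    import numpy as np

    def area_network(area, W1, b1, w2, b2):
        """3-layer net: 16 inputs -> 10 hidden units -> worth of the area arrangement.
        Shapes: area (16,), W1 (10, 16), b1 (10,), w2 (10,), b2 scalar."""
        hidden = np.tanh(W1 @ area + b1)        # hidden layer (activation assumed)
        return float(np.dot(w2, hidden) + b2)   # scalar worth of the arrangement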


Initial rating = 2066 (Expert)

- the rating of the open-source player

Best rating = 2437 (Senior Master)

- over 10 independent trials of 50 generations each

But the program cannot yet compete with the strongest chess programs (rating about 2800 and above).


5. Conclusion

Method                 | Advantages               | Disadvantages
Supervised Learning    | Direct and effective     | Manual labeling cost
Reinforcement Learning | Wide application         | Local optima, CAP
Evolutionary Algorithm | Wide application, no CAP | Indirect, random dispersion


Automatic position labeling

- using records or computer play

Sophisticated reward

- consider the opponent's strength
- move analysis for credit assignment

Experiments in other games