Chikayama & Taura Lab., M1 Ayato Miki
1. Introduction
2. Computer Game Players
3. Machine Learning in Computer Game Players
4. Tuning Evaluation Functions
◦ Supervised Learning
◦ Reinforcement Learning
◦ Evolutionary Algorithms
5. Conclusion
Improvements in Computer Game Players
◦ DEEP BLUE defeated Kasparov in 1997
◦ GEKISASHI and TANASE SHOGI at the World Computer Shogi Championship (WCSC) 2008
Strong computer game players are usually developed by strong human players
◦ Heuristics are input manually
◦ A lot of time and energy is devoted to tuning
Machine learning enables automatic tuning using a large amount of data
It is not necessary for the developer to be an expert at the game
2. Computer Game Players
Games
Game Trees
Game Tree Search
Evaluation Function
Turn system games
◦ ex. tic-tac-toe, chess, shogi, poker, mah-jong, …
Additional Classification
◦ two-player or otherwise
◦ zero-sum or otherwise
◦ deterministic or non-deterministic
◦ perfect or imperfect information
Game Tree Model
[Figure: game tree — from a position on the player’s turn, branches for move 1 and move 2 lead to positions on the opponent’s turn]
ex. Minimax search algorithm
[Figure: minimax search example — Max and Min levels alternate, and leaf evaluation values are backed up to the root]
Difficult to search up to leaf nodes
◦ 10^220 possible positions in shogi
Stop the search at a practicable depth
and “evaluate” those nodes
◦ using an evaluation function
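A depth-limited minimax can be sketched as follows (a minimal illustration, not any particular program's implementation; the game tree is hard-coded as nested lists and the leaf numbers stand in for evaluation-function values):

```python
def minimax(node, maximizing=True):
    # Leaves carry evaluation-function values; internal nodes list their children.
    if not isinstance(node, list):
        return node  # search stopped here: return the "evaluated" value
    values = [minimax(child, not maximizing) for child in node]
    # Max nodes pick the best value for the player, Min nodes for the opponent
    return max(values) if maximizing else min(values)
```
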
Estimate the superiority of the position
Elements
◦ feature vector of the position
◦ parameter vector
f(s) = θ · φ(s)
◦ φ(s) : feature vector of position s
◦ θ : parameter vector
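With this linear form, evaluating a position is just a weighted sum (a minimal sketch; the feature and parameter values in the usage note are invented):

```python
def evaluate(features, params):
    # f(s) = theta . phi(s): weighted sum of the position's feature values
    return sum(w * x for w, x in zip(params, features))
```

For example, `evaluate([1.0, 2.0], [3.0, 4.0])` returns 3·1 + 4·2 = 11.0.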
3. Machine Learning in Computer Game Players
Initial work
◦ Samuel’s research [1959]
Learning objective
◦ What do computer game players learn?
Many useful techniques
◦ Rote learning
◦ Quiescence search
◦ 3-layer neural network evaluation function
And some machine learning techniques
◦ Learning through self-play
◦ Temporal-difference learning
◦ Comparison training
Opening Book
Search Control
Evaluation Function
Automatic construction of the evaluation function
◦ Construct and select a feature vector automatically
◦ ex. GLEM [Buro, 1998]
◦ Difficult
Tuning evaluation function parameters
◦ Make a feature vector manually and tune its parameters automatically
◦ Easy and effective
4. Tuning Evaluation Functions
Supervised Learning
Reinforcement Learning
Evolutionary Algorithm
Provide the program with example positions and their exact evaluation values
Adjust the parameters in a way that minimizes the error between the evaluation function outputs and the exact values
[Figure: example positions labeled with exact values such as 20, 40, and 50]
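The adjustment described above can be sketched as plain gradient descent on the squared error (an illustration only; the example features, learning rate, and epoch count are invented):

```python
def train(examples, n_features, lr=0.01, epochs=500):
    # examples: list of (feature_vector, exact_value) pairs
    theta = [0.0] * n_features
    for _ in range(epochs):
        for phi, target in examples:
            pred = sum(w * x for w, x in zip(theta, phi))  # f(s) = theta . phi(s)
            err = pred - target
            # move each weight against the gradient of the squared error
            theta = [w - lr * err * x for w, x in zip(theta, phi)]
    return theta
```
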
Problems
◦ Manual labeling of positions is costly
◦ Exact quantitative evaluation is difficult
Consider a softer approach: Soft Supervised Training
Require only the relative order of the possible moves
◦ Easier and more intuitive
[Figure: one move is preferred (>) over another]
Comparison training using records of expert games
Simple relative order: the expert move > other moves
Based on optimal control theory
Minimize the cost function J:
J(θ) = Σ_{p=1}^{N} l(s_p, θ)
◦ s_p : example positions in the records
◦ l : error function
◦ N : total number of example positions
Error Function
l(s, θ) = Σ_{m=1, m≠m₀}^{M} T( ξ(s.m, θ) − ξ(s.m₀, θ) )
◦ s.m : child position after move m
◦ M : total number of possible moves
◦ m₀ : the move played in the record
◦ ξ : minimax search value
◦ T : order discriminant function
Sigmoid Function
T(x) = 1 / (1 + e^(−kx))
◦ k is the parameter that controls the gradient
◦ As k → ∞, T(x) becomes the step function
◦ In this case, the error function means “the number of moves that were considered to be better than the move in the record”
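The error function with the sigmoid T can be sketched for a single position as follows (a minimal illustration; the minimax values passed in are invented):

```python
import math

def order_error(xi_expert, xi_others, k=1.0):
    # T(x) = 1 / (1 + e^(-k x)); k controls the gradient
    def T(x):
        return 1.0 / (1.0 + math.exp(-k * x))
    # add a penalty for every move whose minimax value beats the expert move's
    return sum(T(xi_m - xi_expert) for xi_m in xi_others)
```

With a large k, T approaches the step function and the result counts the moves ranked above the recorded move.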
30,000 professional game records and 30,000 high-rating game records from SHOGI CLUB 24 were used
The weight parameters of about 10,000 feature elements were tuned
The resulting program won the World Computer Shogi Championship 2006
It is costly to accumulate a training data set
◦ Manual labeling takes a lot of time
◦ Using expert records has been successful
But what if there are not enough expert records?
◦ New games
◦ Minor games
Another approach needs no training set
◦ ex. Reinforcement Learning (next)
Supervised Learning
Reinforcement Learning
Evolutionary Algorithm
The learner gets “a reward” from the environment
In the domain of games, the reward is the final outcome (win/lose)
Reinforcement learning requires only the objective information of the game
[Figure: the final reward is propagated back over every position of the game]
Inefficient in games…
[Figure: temporal-difference learning — each position’s value is adjusted toward the value of the following position]
Trained through self-play
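The temporal-difference idea, in which each position's value is nudged toward its successor's, can be written as a minimal TD(0) sketch (invented state keys and learning rate):

```python
def td_update(values, states, reward, alpha=0.1):
    # adjust each position's value toward the value of the next position
    for s, s_next in zip(states, states[1:]):
        values[s] += alpha * (values[s_next] - values[s])
    # the last position is adjusted toward the actual game outcome
    values[states[-1]] += alpha * (reward - values[states[-1]])
    return values
```
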
Version        | Features                   | Strength
TD-Gammon 0.0  | Raw board information      | Top of computer players
TD-Gammon 1.0  | Plus additional heuristics | World-championship level
Falling into a local optimum
◦
Lack of playing variation
Solutions
◦
Add intentional randomness
◦
Play against various players (computer/human)
Credit Assignment Problem (CAP)
◦
Not clear which action was effective
Supervised Learning
Reinforcement Learning
Evolutionary Algorithm
Initialize Population → Randomly Vary Individuals → Evaluate “Fitness” → Apply Selection → (repeat)
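The loop above can be sketched generically (an illustration with an invented one-dimensional fitness, maximizing -x², rather than game play):

```python
import random

def evolve(pop_size=10, generations=50):
    population = [random.uniform(-10, 10) for _ in range(pop_size)]  # initialize
    for _ in range(generations):
        offspring = [x + random.gauss(0, 1) for x in population]     # randomly vary
        combined = population + offspring
        combined.sort(key=lambda x: -x * x, reverse=True)            # evaluate "fitness"
        population = combined[:pop_size]                             # apply selection
    return population
```
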
Evolutionary algorithm for a chess player
Using an open-source chess program
◦ Attempt to tune its parameters
Make 10 initial parents
◦ Initialize their parameters with random values
Create 10 offspring from each surviving parent by mutating the parental parameters:
θ' = θ + σ · N(0, 1)
◦ N(0, 1) : Gaussian random variable
◦ σ : strategy parameter
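One common self-adaptive form of this mutation can be sketched as follows (an illustration; the constant tau and the log-normal scheme are assumptions, not the experiment's exact values):

```python
import math, random

def mutate(params, sigmas, tau=0.5):
    # first perturb each strategy parameter (mutation step size) log-normally
    new_sigmas = [s * math.exp(tau * random.gauss(0, 1)) for s in sigmas]
    # then add Gaussian noise scaled by the strategy parameter to each parameter
    new_params = [p + s * random.gauss(0, 1) for p, s in zip(params, new_sigmas)]
    return new_params, new_sigmas
```
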
Each player plays ten games against randomly selected opponents
The ten best players become the parents of the next generation
Select 10 opponents randomly
Tuned parameters:
◦ Material values
◦ Positional values
◦ Weights and biases of three neural networks
Each network has 3 layers
◦ Input: arrangement of a specific area (front 2 rows, back 2 rows, and center 4x4 square), 16 input units
◦ Hidden: 10 units
◦ Output: worth of the area arrangement, 1 output unit
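A forward pass through one such 16-10-1 network might look like this (a sketch; the tanh activation is an assumption, and in the experiment the weights are evolved rather than set by hand):

```python
import math

def forward(area, w_hidden, b_hidden, w_out, b_out):
    # area: 16 input values describing the board-area arrangement
    hidden = [math.tanh(sum(w * x for w, x in zip(ws, area)) + b)
              for ws, b in zip(w_hidden, b_hidden)]          # 10 hidden units
    # single output: the worth of the area arrangement
    return math.tanh(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
```
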
10 independent trials (each of 50 generations)
Initial rating = 2066 (Expert)
◦ the rating of the open-source player
Best rating = 2437 (Senior Master)
But the program cannot yet compete with the strongest chess programs (~R2800)
5. Conclusion
Method                  | Advantages               | Disadvantages
Supervised Learning     | Direct and effective     | Manual labeling cost
Reinforcement Learning  | Wide application         | Local optima, CAP
Evolutionary Algorithm  | Wide application, no CAP | Indirect, random dispersion
Automatic position labeling
◦ Using records or computer play
Sophisticated rewards
◦ Consider the opponent’s strength
◦ Move analysis for credit assignment
Experiments in other games