Slides (Part 1) - Microsoft Research


Tutorial Outline
- Introduction
- Machine Learning in Commercial Games
- Supervised Learning
- Reinforcement Learning
- Unsupervised Learning
- Coffee Break
- Artificial Intelligence: Pathfinding and Planning in Games
- Next Steps: Testbeds and Future Challenges


Test Beds for Artificial Intelligence
- Perfect instrumentation and measurements
- Perfect control and manipulation
- Reduced cost
- Reduced risk
- Great way to showcase algorithms

Improve User Experience
- Create adaptive, believable game AI
- Compose great multiplayer matches based on skill and social criteria
- Mitigate network latency using prediction
- Create realistic character movement

Partially Observable Stochastic Games
- States are only partially observed
- Multiple agents choose actions
- Stochastic pay-offs and state transitions depend on the state and all the other agents' actions
- Goal: optimise the long-term pay-off (reward)

Just like life: complex, adversarial, uncertain, and we are in it for the long run!
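The long-term pay-off can be written as the familiar discounted return (a standard formulation added here for reference; the notation is not from the slides):

\[
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t}\right], \qquad 0 \le \gamma < 1
\]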




Techniques covered:
- Reinforcement Learning
- Unsupervised Learning
- Supervised Learning
- Planning and Pathfinding
- Approximations and Methods

What is the best AI?
- Maximises its chances of winning
- Delivers best entertainment value
- ???

[Figure: Agents in games, 1977 vs. 2001 — the Human Player and the Non-Player Character.]

Number of Games Consoles Sold (May 2009)
- Wii: 50 million
- Xbox 360: 30 million
- PS3: 23 million

Worldwide Video Game Revenues
- 2002: $21.9 bln
- 2003: $23.3 bln
- 2004: $26.3 bln
- 2005: $27.7 bln
- 2006: $31.6 bln
- 2007: $41.9 bln


Graphics (graphics cards, displays)


Sound (5.1, 7.1 surround, speakers, headphones)


CPU speed, cache


Memory, Hard disks, DVD, HDDVD (
Blu
-
Ray)


Networking (Broadband)

Hardware
development


Graphics (rendering etc.)


Sound (sound rendering etc.)


Physics simulation (e.g., Havoc engine)


Networking software (e.g., Xbox Live)


Artificial Intelligence

Software
development


Creatures
- Objective is to nurture creatures called norns
- The model incorporates artificial-life features
- Norns have neural-network brains
- Their development can be influenced by player feedback

Black & White
- Peter Molyneux's famous "god game"
- The player determines the fate of villagers as their "god" (seen as a hand)
- The creature can be taught complex behaviour
- Good and evil: actions have consequences



- Variety of tracks, drivers and road conditions
- Racing line provided by the author; a neural network keeps the car on the racing line
- Multilayer perceptrons trained with RPROP
- Simple rules for recovery and overtaking



Drivatar (Xbox game)
- Adaptive avatar for driving
- Separate game mode
- Basis of all in-game AI
- Basis of the "dynamic" racing line

[Demo videos: Xbox game, dynamic racing line, learning a Drivatar, using a Drivatar]


Ranking and Matchmaking for Xbox Live and Halo 3

Xbox 360 Live (launched 2005)
- Every game uses TrueSkill™ to match players
- > 20 million players, > 3 million matches per day
- > 2 billion hours of gameplay

Halo 3 (launched 2007)
- Largest entertainment launch in history
- > 200,000 players concurrently (peak: 1 million)

[Demo videos: Halo 3 game, matchmaking, skill stats, tight matches]



Supervised Learning
- Learning an input-output relationship from examples
- Tasks: regression, classification, ranking
- Applications: skill estimation, behavioural cloning

Reinforcement Learning
- Learning policies from state-action-reward sequences
- Tasks: control, value estimation, policy learning
- Applications: learning to drive, learning to walk, learning to fight

Unsupervised Learning
- Learning the underlying structure from examples
- Tasks: clustering, manifold learning, density estimation
- Applications: modelling motion-capture data, user behaviour

Goal: learn from skilled players how to act in a first-person shooter (FPS) game

Test environment:
- Unreal Tournament FPS game engine
- Gamebots control framework

Idea: a Naive Bayes classifier learns under which circumstances to switch behaviour

Variables:
- S_t: bot's state at time t
- S_t+1: bot's state at time t+1
- H: health level
- W: weapon
- OW: opponent's weapon
- HN: hear noise
- NE: number of close enemies
- PW: weapon close by?
- PH: health pack close by?

[Figure: naive Bayes network — the class node S_t+1 depends on S_t, and the feature nodes H, W, OW, HN, NE, PW, PH depend on S_t+1.]



Naive Bayes ("inverse programming"):
- Specify the probabilities P(H | S_t+1) and P(S_t+1 | S_t)
- Use Bayes' rule for P(S_t+1 | H, W, ...) etc.

Supervised learning:
- Recognition of the state of the human trainer
- Reading out variables from the game engine
- Determine relative frequencies to estimate the probabilities P(H | S_t+1) (table)
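A minimal sketch of how such a frequency-count behaviour switcher could look (the class interface and the add-one smoothing are my assumptions; the slides only specify the probability tables):

```python
from collections import defaultdict

class BehaviourNaiveBayes:
    """Naive Bayes behaviour switcher: P(S_t+1 | S_t, features), estimated
    as relative frequencies from a human trainer's play."""

    def __init__(self, states):
        self.states = states                  # e.g. ["attack", "explore", "flee"]
        self.trans = defaultdict(int)         # counts for (s_t, s_t+1)
        self.feat = defaultdict(int)          # counts for (s_t+1, feature, value)
        self.state_count = defaultdict(int)

    def observe(self, s_prev, s_next, features):
        """Record one (state, next state, feature dict) example from the trainer."""
        self.trans[(s_prev, s_next)] += 1
        self.state_count[s_next] += 1
        for name, value in features.items():
            self.feat[(s_next, name, value)] += 1

    def posterior(self, s_prev, features):
        """P(s_next | s_prev, features) via Bayes' rule with add-one smoothing."""
        scores = {}
        for s in self.states:
            p = self.trans[(s_prev, s)] + 1   # transition prior P(s_next | s_t)
            for name, value in features.items():   # likelihoods P(feature | s_next)
                p *= (self.feat[(s, name, value)] + 1) / (self.state_count[s] + 1)
            scores[s] = p
        z = sum(scores.values())
        return {s: p / z for s, p in scores.items()}
```

The bot would switch to the behaviour with the highest posterior at each decision point.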

Drivatar (recap)
- Adaptive avatar for driving
- Separate game mode
- Basis of all in-game AI
- Basis of the "dynamic" racing line

[Figure: Drivatar architecture. In the development tool, recorded player driving feeds the Drivatar learning system, which produces a behaviour model. Drivatar AI driving then combines the racing line and the behaviour model with vehicle interaction and racing strategy, a controller, and car behaviour ("built-in" AI behaviour).]

Two-phase process:
1. Pre-generate possible racing lines prior to the race from a (compressed) racing table.
2. Switch the lines during the race to add variability.

Compression reduces the memory needed per racing-line segment.
Switching makes the racing lines smoother.

[Figure: a track divided into segments, with candidate racing lines a_1, a_2, a_3, a_4.]
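A toy illustration of how phase 2 could be organised (the per-segment offset representation, the switch probability, and the smoothing pass are all assumptions for illustration; the slides do not describe the actual mechanism):

```python
import random

def switch_racing_lines(lines, switch_prob=0.2, seed=None):
    """Pick one pre-generated racing line per segment, occasionally switching
    lines to add variability, then smooth the joins between segments.
    `lines` is a list of racing lines; each line is a list of per-segment
    lateral offsets from the track centre (a stand-in representation)."""
    rng = random.Random(seed)
    current = rng.randrange(len(lines))
    offsets = []
    for seg in range(len(lines[0])):
        if rng.random() < switch_prob:        # switch to another line
            current = rng.randrange(len(lines))
        offsets.append(lines[current][seg])
    # a simple moving-average pass so switches do not produce kinks
    smoothed = offsets[:]
    for i in range(1, len(offsets) - 1):
        smoothed[i] = (offsets[i - 1] + offsets[i] + offsets[i + 1]) / 3.0
    return smoothed
```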

Physics ("causal"): the physics simulation system maps the car position and speed at time t, the static car properties, and the car controls to the car position and speed at time t+1.

Control ("inverse"): the controller maps the car position and speed at time t, and the desired position and speed at time t+1, to the car controls.

Competition is central to our lives:
- an innate biological trait
- the driving principle of many sports

Chess rating for fair competition:
- Elo: developed in 1960 by Árpád Imre Élő
- Matchmaking system for tournaments

Challenges of online gaming:
- Learn from few match outcomes efficiently
- Support multiple teams and multiple players per team
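For reference, the classic Elo update that TrueSkill generalises (the standard formulation with the usual 400-point logistic scale; K = 32 is a common choice, not a value from the slides):

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """One Elo rating update for players A and B.
    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a += k * (score_a - expected_a)
    r_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a, r_b
```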

Given:
- Match outcomes: orderings among k teams consisting of n_1, n_2, ..., n_k players, respectively

Questions:
- Skill s_i for each player such that …
- Global ranking among all players
- Fair matches between teams of players

Latent Gaussian performance model for fixed skills. Possible outcomes: player 1 wins over player 2 (and vice versa).

[Figure: TrueSkill factor graphs. Left: skills s_1, s_2 generate performances p_1, p_2, which determine the outcome y_12. Right: four skills s_1, ..., s_4 generate team performances t_1, t_2, t_3 with outcomes y_12, y_23. Gaussian prior factors sit above the skills; ranking likelihood factors sit below the team performances.]

Fast and efficient approximate message passing using Expectation Propagation.
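For two players and no draws, one round of this message passing reduces to a closed-form update (this follows the published TrueSkill model; the default β = 25/6 scale and the no-draw simplification are assumptions):

```python
import math

def _pdf(x):   # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):   # standard normal cumulative distribution
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def trueskill_update(mu_w, sig_w, mu_l, sig_l, beta=25.0 / 6.0):
    """Two-player TrueSkill update after a win by player w (draw margin 0)."""
    c = math.sqrt(2.0 * beta ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)          # additive mean correction
    w = v * (v + t)                # multiplicative variance correction
    mu_w += sig_w ** 2 / c * v
    mu_l -= sig_l ** 2 / c * v
    sig_w *= math.sqrt(1.0 - sig_w ** 2 / c ** 2 * w)
    sig_l *= math.sqrt(1.0 - sig_l ** 2 / c ** 2 * w)
    return mu_w, sig_w, mu_l, sig_l
```

A surprising win (low _cdf(t)) moves the means a lot; every observed outcome shrinks the uncertainties.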

Leaderboard
- Global ranking of all players

Matchmaking
- For gamers: the most uncertain outcome
- For inference: the most informative match
- Both are equivalent!
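The TrueSkill paper makes this concrete with a draw-probability match-quality criterion (reproduced here for reference; it is not spelled out on the slides):

\[
q_{\mathrm{draw}}(\beta, \mu_i, \mu_j, \sigma_i, \sigma_j)
= \sqrt{\frac{2\beta^2}{2\beta^2 + \sigma_i^2 + \sigma_j^2}}
\;\exp\!\left(-\frac{(\mu_i - \mu_j)^2}{2\left(2\beta^2 + \sigma_i^2 + \sigma_j^2\right)}\right)
\]

which is largest for players of equal estimated skill and low uncertainty.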

Data set: Halo 2 Beta
- 3 game modes: Free-for-All, Two Teams, 1 vs. 1
- > 60,000 match outcomes
- ≈ 6,000 players
- 6 weeks of game play
- Publicly available

[Figure: level (0-40) vs. number of games (0-400) for the players char and SQLWildman, comparing the Halo 2 rank with the TrueSkill™ estimate.]

[Figure: winning probability (0%-100%) vs. number of games played (0-500), showing the fractions of games where char wins, SQLWildman wins, or both players draw; 5/8 games won by char.]

- Golf (18 holes): 60 levels
- Car racing (3-4 laps): 40 levels
- UNO (chance game): 10 levels

Model a time series of skills by smoothing across time.

History of Chess:
- 3.5M game outcomes (ChessBase)
- 20 million variables (each of 200,000 players in each year of their lifetime, plus latent variables)
- 40 million factors

[Figure: TrueSkill-through-time factor graph. Player skills s_t,i and s_t,j evolve into s_t+1,i and s_t+1,j; performances p_t,i, p_t,j and p_t+1,i, p_t+1,j link each year's skills to the observed game outcomes.]
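The smoothing across time corresponds to a Gaussian drift factor between a player's skills in consecutive years (the standard TrueSkill-through-time dynamics factor; τ, the drift parameter, is my notation):

\[
p\left(s_{t+1,i} \mid s_{t,i}\right) = \mathcal{N}\!\left(s_{t+1,i};\; s_{t,i},\; \tau^2\right)
\]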

[Figure: skill estimates (1400-3000) by year (1850-2006) for ten chess champions: Adolf Anderssen, Mikhail Botvinnik, Jose Raul Capablanca, Robert James Fischer, Anatoly Karpov, Garry Kasparov, Emanuel Lasker, Paul Morphy, Boris V. Spassky and Wilhelm Steinitz.]


[Figure: the reinforcement learning loop. The Game sends the game state and a reward/punishment to the Agent; the Learning Algorithm observes them and issues a parameter update; the Agent sends an action back to the Game.]
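In code, the loop in the diagram might look like this (the game/agent/learner interfaces are placeholders for illustration, not a real API):

```python
def run_episode(game, agent, learner, max_steps=1000):
    """One episode of the agent/game/learning-algorithm loop."""
    state = game.reset()
    for _ in range(max_steps):
        action = agent.act(state)                      # agent picks an action
        state2, reward, done = game.step(action)       # game returns new state + reward
        learner.update(state, action, reward, state2)  # parameter update
        state = state2
        if done:
            break
```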

[Figure: two fighters standing 3 ft apart, next to a Q-table. The columns are the actions THROW, KICK and STAND; the rows are the game states 1ft-6ft / GROUND and 1ft-6ft / KNOCKED; entries are estimated values such as 13.2, 10.2, -1.3, 3.2, 6.0 and 4.0, with a reward of +10.0 for a successful attack.]

- Q(s, a) is the expected reward for action a in state s
- α is the rate of learning
- a is the action chosen
- r is the reward resulting from a
- s is the current state
- s′ is the state after executing a
- γ is the discount factor for future rewards

Q-Learning (off-policy) vs. SARSA (on-policy):
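The two update rules, reconstructed from the variable definitions above (these are the standard formulations; the equations themselves did not survive extraction):

Q-Learning (off-policy):
\[
Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]
\]

SARSA (on-policy), where a′ is the action actually taken in s′:
\[
Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma\, Q(s',a') - Q(s,a)\right]
\]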

Game state features:
- Separation (5 binned ranges)
- Last action (6 categories)
- Mode (ground, air, knocked)
- Proximity to obstacle

Available actions:
- 19 aggressive (kick, punch)
- 10 defensive (block, lunge)
- 8 neutral (run)

Q-function representation:
- One-layer neural net (tanh)

[Diagram: the reinforcement learner embedded in the in-game AI code.]
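A compact sketch of the kind of one-layer tanh Q-network the slides mention (the layer sizes, learning rate, and plain SGD update are illustrative assumptions, not the values used in the game):

```python
import numpy as np

class TanhQNet:
    """One-hidden-layer tanh network approximating Q(s, a): one output per action."""

    def __init__(self, n_features, n_actions, n_hidden=32, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_actions))
        self.lr = lr

    def q_values(self, s):
        """Q(s, ·) for a feature vector s; caches the hidden activations."""
        self.h = np.tanh(s @ self.W1)
        return self.h @ self.W2

    def update(self, s, a, target):
        """Nudge Q(s, a) towards a TD target, e.g. r + gamma * max_a' Q(s', a')."""
        err = target - self.q_values(s)[a]
        dh = err * self.W2[:, a] * (1.0 - self.h ** 2)   # backprop through tanh
        self.W2[:, a] += self.lr * err * self.h
        self.W1 += self.lr * np.outer(s, dh)
```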

[Video: early in the learning process … and after 15 minutes of learning, with a reward for a decrease in Wulong Goth's health.]

[Video: early in the learning process … and after 15 minutes of learning, with a punishment for a decrease in either player's health.]

1. Collect experience.
2. Learn transition probabilities and rewards.
3. Revise the value function and policy.
4. Revise the state-action abstraction (a split/merge sketch follows below).
5. Return to 1 and collect more experience.

[Figure: state abstraction over features such as speed and distance to the left. Representational complexity ranges from too coarse through "just right" to too fine; split operations refine the abstraction and merge operations coarsen it.]
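A toy version of step 4's split/merge revision (the variance and mean-gap criteria, the thresholds, and the dict-based state representation are all invented for illustration; the slides do not give the actual rule):

```python
import statistics

def revise_abstraction(states, split_var=1.0, merge_gap=0.05):
    """Split/merge pass over a 1-D state abstraction. Each abstract state is
    a dict {'lo': .., 'hi': .., 'returns': [..]} over one feature, e.g. speed."""
    revised = []
    for s in states:
        # split: high variance of observed returns means the cell is too coarse
        if len(s['returns']) > 1 and statistics.variance(s['returns']) > split_var:
            mid = (s['lo'] + s['hi']) / 2.0
            revised.append({'lo': s['lo'], 'hi': mid, 'returns': []})
            revised.append({'lo': mid, 'hi': s['hi'], 'returns': []})
        else:
            revised.append(s)
    merged = [revised[0]]
    for s in revised[1:]:
        a, b = merged[-1]['returns'], s['returns']
        # merge: adjacent cells with nearly equal mean return are redundant
        if a and b and abs(statistics.mean(a) - statistics.mean(b)) < merge_gap:
            merged[-1] = {'lo': merged[-1]['lo'], 'hi': s['hi'], 'returns': a + b}
        else:
            merged.append(s)
    return merged
```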

Real-time racing simulation. Goal: lap times as fast as possible.

- Features: laser range-finder measurements
- Reward: progress along the track
- Actions: Coast, Accelerate, Brake, Hard-Left, Hard-Right, Soft-Left, Soft-Right

2D demo: principles explained.
Xbox 360 integration: efficient implementation, runs at 60 fps.


Motion capture:
- Fix markers at key body positions
- Record their positions in 3D during motion
- Fundamental technology in animation today
- Free download of mo-cap files: www.bvhfiles.com

Gaussian Process Latent Variable Model (GPLVM)
- Generative model for dimensionality reduction
- Probabilistic equivalent of PCA that defines a probability distribution over data
- Non-linear manifolds based on kernels
- Visualisation of high-dimensional data
- Back-projection from latent space to data space
- Can deal with missing data

Principal Component Analysis: marginalise over x and optimise W.
Gaussian Process Latent Variable Model: marginalise over W and optimise x.

- x: latent variables
- W: weight matrix
- y: data
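Both views start from the same linear-Gaussian model; writing it out makes the duality explicit (standard PPCA/GPLVM formulation, reconstructed here since the slides' equations did not survive extraction):

\[
y = W x + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I)
\]

Marginalising the latents x ~ N(0, I) gives PCA's objective over W:
\[
p(y \mid W) = \mathcal{N}\!\left(y;\, 0,\; W W^{\top} + \sigma^2 I\right)
\]

Marginalising W instead (with a Gaussian prior on its rows) gives the GPLVM objective over the latent positions X:
\[
p(Y \mid X) = \prod_{d=1}^{D} \mathcal{N}\!\left(y_{:,d};\, 0,\; X X^{\top} + \sigma^2 I\right)
\]

Replacing the inner-product matrix XX^T with a kernel matrix K(X, X) yields the non-linear manifolds mentioned above.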
