© 2010 Autodesk
Massive Parallelism in AI
Throughput versus Realtime
Pierre Pontevia
10th March 2010
Agenda
• Where are we today?
• The pathfinding challenge: from throughput to realtime
• MASAI: the premises of an AI massive parallel solution
WHERE ARE WE TODAY?
Where are we today?
• Parallel programming has become a reality for game developers since the arrival of "next gen" consoles (2005-2006)
• Since then, many new languages and programming models have been suggested to better tackle parallelism
• New hardware is being announced, shaping the future of consoles…
• So this is a good moment to see how parallelism could be revisited for the games of tomorrow… with a special focus on pathfinding
As a start, the 13 dwarves should help us to find the right parallel pattern
• The 13 dwarves is an initiative from the University of California, Berkeley to help achieve high parallelism
• A dwarf is an algorithmic method that captures a pattern of computation and communication
• The first exercise is to identify which dwarves match the problems involved in pathfinding
As a start, the 13 dwarves should help us to find the right parallel pattern (cont'd)
Dwarf: Description
1. Dense Linear Algebra: Data are dense matrices or vectors
2. Sparse Linear Algebra: Data sets include many zero values. Data is usually stored in compressed matrices to reduce the storage and bandwidth requirements to access all of the nonzero values
3. Spectral Methods: Data are in the frequency domain, as opposed to time or spatial domains
4. N-Body Methods: Depends on interactions between many discrete points. Variations include particle-particle methods
5. Structured Grids: Represented by a regular grid; points on grid are conceptually updated together. It has high spatial locality
6. Unstructured Grids: An irregular grid where data locations are selected, usually by underlying characteristics of the application
7. Monte Carlo: Calculations depend on statistical results of repeated random trials
As a start, the 13 dwarves should help us to find the right parallel pattern (cont'd)
Dwarf: Description
8. Combinational Logic: Functions that are implemented with logical functions and stored state
9. Graph Traversal: Visits many nodes in a graph by following successive edges. These applications typically involve many levels of indirection, and a relatively small amount of computation
10. Dynamic Programming: Computes a solution by solving simpler overlapping subproblems. Particularly useful in optimization problems with a large set of feasible solutions
11. Backtrack and Branch + Bound: Finds an optimal solution by recursively dividing the feasible region into subdomains, and then pruning subproblems that are suboptimal
12. Construct Graphical Models: Constructs graphs that represent random variables as nodes and conditional dependencies as edges. Examples include Bayesian networks and Hidden Markov Models
13. Finite State Machine: A system whose behavior is defined by states, transitions defined by inputs and the current state, and events associated with transitions or states
Recent languages and programming models provide guidance for parallel implementation
Data Parallelism for homogeneous architectures
• OpenMP
• TBB
• Ct
Data Parallelism for heterogeneous architectures
• CUDA
• OpenCL
• DirectCompute
• SPURS
• RapidMind
PC clusters
• MPI
• Map Reduce
Concurrent Programming
• PPL, Asynchronous Agents
• Grand Central Dispatch
However, there are specific constraints in video games that impact parallel design…
• Memory resource constraints: how much scratch memory does the solver require?
• Concurrent memory access: computations are done on data which can change significantly from frame to frame
• Data lifetime / persistence: things are volatile by nature
• Reactivity / time delay / frequency constraints: when do you really need the result of your computation?
• Interruptibility: the system can change its mind (80% of the path goals are never reached)
…and even more constraints when you develop middleware
• Multiple cohabitant models
  • Several middleware with several threading models
  • Not blocking is not enough -> fine tuning issues
  • Spurs everywhere?
• Multiple HW targets
  • A PC is different from an Xbox 360 console, which is different from a PlayStation®3 (PS3) console
• Multiple exclusive programming languages
A gap analysis on existing solutions shows that no one solution fits the video game context perfectly
• No model really treats memory as a limiting resource in the design of parallel solutions
• No model takes into account time as a dimension of the problem
• All the approaches are very throughput-oriented
THE PATHFINDING CHALLENGE: FROM THROUGHPUT TO REALTIME
Pathfinding in a nutshell
Path Planning: LOW FREQUENCY (0.1 Hz)
• Input: topology, current position, destination
• Output: valid path
Path Smoothing: MEDIUM FREQUENCY (2 Hz)
• Input: current position, destination
• Output: target point
DA (*) & Steering: HIGH FREQUENCY (10 Hz)
• Input: current position, target point
• Output: new target point
(*) DA: Dynamic Avoidance
[Figure: diagram with endpoints labeled A and B]
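The three stages run at very different rates, so a frame loop only needs to trigger each one when its period elapses. A minimal sketch of that idea follows; the 30 Hz frame rate, the stage keys and the function name `stages_due` are assumptions made for illustration and do not come from the slides.

```python
from fractions import Fraction

# Stage frequencies in Hz, as listed on the slide.
STAGE_HZ = {
    "path_planning": Fraction(1, 10),  # 0.1 Hz
    "path_smoothing": Fraction(2),     # 2 Hz
    "da_steering": Fraction(10),       # 10 Hz
}


def stages_due(frame, frame_hz=30):
    """Return the stages whose period elapses on this frame.

    frame_hz is an assumed simulation rate, not a value from the slides.
    """
    due = []
    for name, hz in STAGE_HZ.items():
        period_frames = Fraction(frame_hz) / hz  # frames between two runs
        if frame % period_frames == 0:
            due.append(name)
    return due
```

With these assumed numbers, steering runs every 3 frames, smoothing every 15 frames and path planning every 300 frames.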
Pathfinding is made of different solvers with different characteristics
3 categories of solvers:
• A*, Graph Traversal: low frequency / large input and work memory
• Trajectory Smoothing: medium frequency / optional
• DA / Steering: high frequency / critical
[Chart: solvers (A*, Graph Traversals, Smoothing, DA, Steering) plotted by Frequency (axis ticks 10, 3, 0.2) against Work Memory requirements (from < 5 K to > 500 K)]
There are two kinds of data parallelism in pathfinding
• Number of characters: all solver jobs increase linearly with the number of characters
• Size of graph: Graph Traversal related solvers can use a Dwarf 9 (graph traversal) solving approach
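The first kind of data parallelism, one independent job per character, can be sketched as a parallel map. This is only an illustration of the decomposition: the `steer` job, the one-dimensional positions and the worker count are hypothetical, and CPython threads will not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def steer(character):
    """Hypothetical per-character job: move one unit step toward the target."""
    pos, target = character
    step = (target > pos) - (target < pos)  # -1, 0 or +1
    return pos + step


def steer_all(characters, workers=4):
    """Run one independent job per character; job count grows with len(characters)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(steer, characters))
```

Because the jobs share no state, the same structure maps onto most of the data-parallel programming models listed earlier.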
A first approach could be a single frame batch paradigm (throughput) compatible with most programming models
[Diagram components: Pathfinder (Entity 1); Path Request Queue, Target Request Queue, DA Request Queue, Steering Request Queue; Search Path Task, Select Target Task, Compute DA Task, Compute Steering Task; a pool of Compute Kernels inside the PPM (Parallel Programming Model). Legend: MiddleWare Queue, PPM Queue, Framework]
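A minimal sketch of the single frame batch idea: requests accumulate in one queue per solver, and everything pending is processed before the frame ends. The class name `FrameBatcher` and its methods are hypothetical; in a real parallel programming model each batch would be dispatched to compute kernels rather than processed in a Python loop.

```python
from collections import deque


class FrameBatcher:
    """Hypothetical framework: one request queue per solver, drained every frame."""

    def __init__(self, solvers):
        self.solvers = solvers  # solver name -> callable(request)
        self.queues = {name: deque() for name in solvers}

    def post(self, solver_name, request):
        """Queue a request for the named solver."""
        self.queues[solver_name].append(request)

    def run_frame(self):
        """Process every pending request before the frame ends (throughput style)."""
        results = {}
        for name, queue in self.queues.items():
            batch = list(queue)
            queue.clear()
            # A real PPM would hand this batch to parallel compute kernels.
            results[name] = [self.solvers[name](request) for request in batch]
        return results
```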
Each task request has a context composed of character data, global data, and potentially customized objects
Context categories: Character Context, Global Data, Customizable
• Searching Path. Context: Start & Destination, Movement Model, Constraint, LPF (*) Shortcut, Pathdata, potentially all PathObjects. Output: Path
• Selecting Target. Context: Current Pos, Current Target, Path, Movement Model, Constraint, LPF (*) Shortcut, PathObjects of the path, Pathdata. Output: Target Pos
• Computing DA Target. Context: Current Pos, Current Target, Movement Model, Cluster of entities, Pathdata. Output: DA Target Pos
• Steering. Context: Current Pos, Current DA Target, Movement Model, Current PathObject, LPF Shortcut. Output: Wanted Speed & Yaw
(*) LPF: Obstacle Avoidance
However, as the number of solvers can be limited by memory…
[Diagram: Compute Path, Compute Target Point, Compute DA Tgt Point and Compute Steering jobs laid out on Thread 1 and Thread 2]
…a throughput maximization approach to parallelization can be capped by Amdahl's law
[Diagram: three timelines. Serial, no memory limitation (Thread 1); Parallel, no memory limitation (Thread 1, Thread 2); Parallel, memory constrained environment (Thread 1, Thread 2)]
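Amdahl's law bounds the speedup by the fraction of work that must stay serial: with a parallel fraction f and n workers, the speedup is 1 / ((1 - f) + f / n). The helper below is a small sketch of that formula; the function name and the sample fractions are illustrative and not measurements from the talk.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)
```

If memory limits force, say, half of the solver work to run one at a time, two threads yield a speedup of about 1.33 rather than 2.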
To avoid that, the Pathfinding solution needs to find more task parallelism along the time dimension
Moving from "How to solve all the work within a frame"
to "How to distribute work across several frames"
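One way to distribute a search across several frames is to write it as a resumable routine that stops after a fixed node budget. The sketch below uses a breadth-first search rather than A* to stay short; the graph format, the `nodes_per_slice` budget and both function names are assumptions for illustration.

```python
from collections import deque


def sliced_bfs(graph, start, goal, nodes_per_slice):
    """Breadth-first search as a generator that pauses after a fixed node budget.

    Yields None while the search is unfinished, then yields the path
    (or [] if the goal is unreachable).
    """
    parents = {start: None}
    frontier = deque([start])
    expanded = 0
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            yield path[::-1]
            return
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                frontier.append(neighbor)
        expanded += 1
        if expanded % nodes_per_slice == 0:
            yield None  # budget spent: resume on a later frame
    yield []


def run_over_frames(search):
    """Advance the search one slice per frame; return (frames_used, path)."""
    for frame, result in enumerate(search, start=1):
        if result is not None:
            return frame, result
```

A search written this way can also be dropped between slices at no cost, which suits a setting where many path goals are never reached.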
A good illustration is describing Pathfinding as a statechart with 4 orthogonal states
Top-level states: Stopped, Path Not Found, Has Arrived, Active
Orthogonal regions of the Active state:
• Path Planning: No Path, Searching Path, Path Found (events: New Destination, Path Found, Has arrived)
• Target Selection: No Target, Selecting Target, Target Found (events: Path Updated, Target Found, Has arrived)
• DA Target: No DA Target, Computing DA Target, DA Target Computed (events: Target Updated, DA Target Found, Has arrived)
• Steering: No Steering, Computing Steering, Steering Computed (events: DA Target Updated, Steering Computed, Has arrived)
Other event labels in the diagram: New Destination, New Pos, Path Updated, Target Updated, DA Target Updated, Pos updated
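Orthogonal regions can be modeled as independent state variables that all receive the same event. The sketch below uses the state and event names from the statechart, but the exact transitions are an assumed reading of the diagram (the arrows cannot be recovered from the slide text), and `dispatch` is a hypothetical helper.

```python
# Initial state of each orthogonal region.
INITIAL = {
    "Path Planning": "No Path",
    "Target Selection": "No Target",
    "DA Target": "No DA Target",
    "Steering": "No Steering",
}

# (region, current state, event) -> next state. Assumed, not taken from the slide.
TRANSITIONS = {
    ("Path Planning", "No Path", "New Destination"): "Searching Path",
    ("Path Planning", "Searching Path", "Path Found"): "Path Found",
    ("Target Selection", "No Target", "Path Updated"): "Selecting Target",
    ("Target Selection", "Selecting Target", "Target Found"): "Target Found",
    ("DA Target", "No DA Target", "Target Updated"): "Computing DA Target",
    ("DA Target", "Computing DA Target", "DA Target Found"): "DA Target Computed",
    ("Steering", "No Steering", "DA Target Updated"): "Computing Steering",
    ("Steering", "Computing Steering", "Steering Computed"): "Steering Computed",
}


def dispatch(states, event):
    """Deliver one event to every region; each region reacts independently."""
    return {
        region: TRANSITIONS.get((region, state, event), state)
        for region, state in states.items()
    }
```

Each region only reacts to the events it knows, so the four solvers can progress at their own pace.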
It is still compatible with the previous approach, but multiframe (no longer capped by Amdahl's law)
[Diagram: inside the Framework, the middleware queues (Path Request Queue, Target Request Queue, DA Request Queue, Steering Request Queue) and their tasks (Search Path Task, Select Target Task, Compute DA Task, Compute Steering Task) are shown alongside three copies of the pathfinding statechart, each with an Active state containing the Path Planning, Target Selection, DA Target and Steering regions]
But now we have 3 new problems
• Problem 1: How to guarantee that high frequency steering solvers return a value on time?
• Problem 2: How to deal with multiframe volatility and dynamicity of data?
• Problem 3: What computation triggering logic do we want?
Problem 1 is a scheduling problem for realtime systems
Problem 1 can be reworded as follows:
"How to guarantee a deadline for each pathfinding solver request compatible with the frequency of the solver"
This is very close to the definition of realtime software as found on Wikipedia:
"In computer science, real-time computing (RTC), or 'reactive computing', is the study of hardware and software systems that are subject to a 'real-time constraint', i.e., operational deadlines from event to system response"
The good news is that there is a rich literature on realtime scheduling!
To answer problem 1 we restate pathfinding solvers in a realtime formalism…
Realtime formalism: a task X is defined by 4 parameters
• X.s: starting time
• X.d: deadline
• X.e: execution requirement
• X.p: execution period
Adapting to pathfinding solvers:
• Need to assume all tasks are periodic:
  • Easy for smoothing, steering or DA solvers
  • More tricky for A* and other graph traversal solvers
• Need to have an estimate of each core solver job duration:
  • Again quite simple for smoothing, steering or DA solvers
  • Much less easy for A* and other graph traversal solvers -> need to decompose graph traversal tasks into subtasks of constant duration
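The four parameters translate directly into a small record, and the ratio e/p gives the share of a processor each periodic task needs. The sketch below assumes time is counted in abstract integer slots; the class name, the helper functions and the sample numbers in the usage note are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class PeriodicTask:
    name: str
    s: int  # starting time
    d: int  # deadline
    e: int  # execution requirement
    p: int  # execution period

    @property
    def weight(self):
        """Share of one processor the task needs: e / p."""
        return Fraction(self.e, self.p)


def total_utilization(tasks):
    """Sum of the task weights."""
    return sum((task.weight for task in tasks), Fraction(0))


def fits(tasks, processors):
    """Necessary condition for any schedule: utilization <= processor count."""
    return total_utilization(tasks) <= processors
```

For example, three tasks with weights 1/3 each exactly fill one processor, so adding a fourth requires a second processor.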
…and select a scheduling algorithm
• P-fairness scheduling scheme (S.K. Baruah, N.K. Cohen, C.G. Plaxton, D.A. Varvel):
  • Defines a notion of proportionate progress called P-fairness
  • Uses it to define an efficient algorithm solving the periodic scheduling problem
• Cache-aware P-fair based scheduling scheme (J.H. Anderson, J.M. Calandrino, U.M. Devi):
  • Extends the P-fairness approach to avoid scheduling co-existent threads that would worsen the performance of shared caches
• Task-grouping P-fair based scheduling scheme (J.H. Anderson, J.M. Calandrino):
  • Extends the P-fairness approach to encourage grouping of tasks that share a common working set
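Proportionate progress can be made concrete with the lag of a task: its ideal allocation (e/p times the elapsed slots) minus what it actually received. A schedule is P-fair when every lag stays strictly between -1 and 1. The sketch below is a simplified earliest-pseudo-deadline-first scheduler, not the published algorithms: it omits their tie-breaking rules, and the task names and numbers are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def pseudo_release(e, p, i):
    """Earliest slot for the i-th unit subtask (i >= 1) of a task of weight e/p."""
    return floor(Fraction((i - 1) * p, e))


def pseudo_deadline(e, p, i):
    """Slot by which the i-th unit subtask must have completed."""
    return ceil(Fraction(i * p, e))


def epdf_schedule(tasks, processors, horizon):
    """tasks: name -> (e, p). Returns, for each slot, the list of tasks that run."""
    done = {name: 0 for name in tasks}  # subtasks completed so far
    schedule = []
    for t in range(horizon):
        ready = []
        for name, (e, p) in tasks.items():
            i = done[name] + 1
            if pseudo_release(e, p, i) <= t:
                ready.append((pseudo_deadline(e, p, i), name))
        # Earliest pseudo-deadline first; ties are broken by name (arbitrary).
        chosen = [name for _, name in sorted(ready)[:processors]]
        for name in chosen:
            done[name] += 1
        schedule.append(chosen)
    return schedule


def max_abs_lag(tasks, schedule):
    """Largest |ideal allocation - actual allocation| seen at any slot boundary."""
    received = {name: 0 for name in tasks}
    worst = Fraction(0)
    for t, slot in enumerate(schedule, start=1):
        for name in slot:
            received[name] += 1
        for name, (e, p) in tasks.items():
            worst = max(worst, abs(Fraction(e, p) * t - received[name]))
    return worst
```

Checking that `max_abs_lag` stays below 1 is a quick way to test whether a produced schedule respects the P-fairness bound on a given task set.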
Answering problem 2 (volatile data) requires a better description of memory models
• Programming models differ in the way they manage memory space
  • Homogeneous models: unified memory
  • Heterogeneous models: Host / Device space
• Today only homogeneous models offer transparent memory management
• For heterogeneous models, the developer still has to do a lot of work
Programming models differ in the way they manage memory space
[Diagram: Framework, Request Queue and Task on one side; OpenCL Queue and four Compute Kernels on the other; the two sides are labeled Host Memory Space and Device Memory Space]
There is a need for a locking mechanism between the framework and the kernel
Data state | Framework Request | Task Request | Kernel Request | Kernel Execution | Task Update | Framework Update
Inserting Data | OK | OK | LOCK (if Kernel uses data) | OK | OK | OK
Data Ready | OK | OK | OK | OK | OK | OK
Data Locked | OK | OK | OK | LOCK (if Kernel accesses host memory) | OK | OK
Removing Data | OK | OK | LOCK (if Kernel uses data) | OK | OK | OK
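The lock rules in this table are sparse: only three combinations of data state and processing phase ever need a lock, each under one condition. A minimal sketch of that lookup follows; the condition keys and the `must_lock` helper are hypothetical names chosen for illustration.

```python
# (data state, processing phase) -> condition under which a lock is required.
LOCK_RULES = {
    ("Inserting Data", "Kernel Request"): "kernel_uses_data",
    ("Data Locked", "Kernel Execution"): "kernel_accesses_host_memory",
    ("Removing Data", "Kernel Request"): "kernel_uses_data",
}


def must_lock(data_state, phase, **conditions):
    """Return True when the table asks for a lock and its condition holds."""
    condition = LOCK_RULES.get((data_state, phase))
    if condition is None:
        return False  # every other cell of the table is "OK"
    return bool(conditions.get(condition, False))
```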
It also requires a better description of user data
There are 3 types of user data:
• Read Only Memory (e.g. navmesh in a static world)
  • Needs to be aware of when user data is available and when it is garbage
• Read / Write Memory (e.g. navmesh in a dynamic world)
  • Same as the Read Only approach, extended to secure the data modification stages
• Work Memory (e.g. open & closed sets for an A* solver)
  • Located where the solver is really called
Data Lifecycle States
Data Lifecycle States are introduced to handle R/O and R/W data volatility and dynamicity
[State diagram: Notifying Data To be Inserted, Data in Insertion, Data Ready, Data Locked, Data in Removal, Notifying Data Removed. Transition labels: On Dependency Insertion / Removal; Dependency Inserted / Removed]
CRITICAL when data are not owned by middleware
Problem 3 (triggering logic) requires choosing between a Pull or Push triggering mechanism
• To limit computations over time, it is important to decide whether we want a pull or push triggering model
• In a push model, the system polls over all the characters to get a new steering policy
• In a pull model, the system gets update requirements from the game engine and only performs computations on related characters
• The pull model better controls the amount of computation, but is not really compatible with a Realtime approach
• The push model offers the possibility of optimizing from a Cache and Task Grouping point of view
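The difference between the two triggering models, as defined on this slide, is which characters get computed on each update. A minimal sketch follows; the function names, the dictionary of character states and the `compute_steering` callback are hypothetical.

```python
def push_update(characters, compute_steering):
    """Push model (slide definition): poll every character on each update."""
    return {name: compute_steering(state) for name, state in characters.items()}


def pull_update(characters, requested, compute_steering):
    """Pull model: compute only for characters the game engine asked about."""
    return {name: compute_steering(characters[name]) for name in requested}
```

The push variant always touches the full, predictable set of characters, which is what makes periodic scheduling and task grouping possible; the pull variant does less work but at unpredictable times.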
MASAI: THE PREMISES OF AN AI MASSIVE PARALLEL SOLUTION
Guidelines for a new parallel programming model for realtime AI
• Extends to the full AI the rationale described in the previous slides
• Data / Message Flow based system
• Realtime P-fair Scheduling algorithm
• Compatible with heterogeneous programming models
• Push Triggering Mechanism
Introducing the concept of Working Unit
• A WU receives requests to process
• A WU communicates with another WU ONLY through strongly typed requests
• Requests are explicitly exposed in the WU interface
• A request can be synchronous or asynchronous (2 different implementations of the request)
• A WU is responsible for the Host<->Device serialization of its context
[Diagram: Working Unit, split into Host Code and Device Code. Labeled parts: Owner / Children, Event Handler, Incoming Requests Queues, Context, Context Serializer, Requests Interface, Context Accessors]
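A minimal sketch of a Working Unit that accepts only the request types declared in its interface, with a synchronous and an asynchronous form of each request. The class layout, the `SearchPathRequest` type and the method names are hypothetical, and Host<->Device serialization is left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchPathRequest:
    """Hypothetical strongly typed request."""
    start: str
    destination: str


class WorkingUnit:
    """Hypothetical WU: handles only the request types exposed in its interface."""

    def __init__(self, handlers):
        self.handlers = handlers  # request type -> handler function
        self.incoming = deque()   # incoming requests queue

    def _check(self, request):
        if type(request) not in self.handlers:
            raise TypeError(f"{type(request).__name__} is not in this WU's interface")

    def call(self, request):
        """Synchronous form: process immediately and return the result."""
        self._check(request)
        return self.handlers[type(request)](request)

    def post(self, request):
        """Asynchronous form: enqueue now, process on a later update."""
        self._check(request)
        self.incoming.append(request)

    def update(self):
        """Drain the incoming queue and return the results in arrival order."""
        results = []
        while self.incoming:
            request = self.incoming.popleft()
            results.append(self.handlers[type(request)](request))
        return results
```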
The system works on a mixture of events and requests
[Diagram components: Game Engine; Entity 1, Entity 2, Entity …; Brain 1, Brain 2, Brain …; PF 1, PF 2, PF …; World 1, World …; Entity Update WU (Entity Update Queue); Brain Update WU (Brain Update Queue); Pathfinding WU (Pathfinding Update Queue); CanGo WU (CanGo Queue); IsVisible WU (IsVisible Queue); World Update WU (World Update Queue); Pathdata Mgr; Geometry Mgr. Legend: Request, Event]
The underlying architecture would rely on an event broadcaster and communicating components
[Diagram: a Global Events Broadcaster above two Local Events Broadcasters, each serving a SearchPath CC, a SelectTarget CC, a ComputeDA CC and a Steering CC]
Communicating Component = Working Unit for parallelism
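A minimal sketch of a two-level broadcaster: components subscribe to a local broadcaster, and a global broadcaster forwards its events to every local one. The forwarding direction, the class name and its methods are assumptions made for illustration; the slides do not specify them.

```python
class EventBroadcaster:
    """Hypothetical broadcaster; a global one forwards events to its children."""

    def __init__(self, parent=None):
        self.subscribers = {}  # event name -> list of callbacks
        self.children = []     # local broadcasters attached to this one
        if parent is not None:
            parent.children.append(self)

    def subscribe(self, event, callback):
        """Register a component callback for one event name."""
        self.subscribers.setdefault(event, []).append(callback)

    def broadcast(self, event, payload=None):
        """Notify local subscribers first, then forward to child broadcasters."""
        for callback in self.subscribers.get(event, []):
            callback(payload)
        for child in self.children:
            child.broadcast(event, payload)
```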
Open challenges
• Customized Objects vs. Data / Services model
• Interruptibility
• Multi-platform
• Scheduling algorithm performance
• And many more…
Multiplatform
Too many programming languages!
C++
C for OpenCL
C for CUDA
C99 for Spurs
HLSL 5 for DirectX
…
Which standards will emerge?
Which standards will be chosen in future consoles?
GAME DEVELOPER ZONE
www.the-area.com/gamedev