# Massive Parallelism in AI

Artificial Intelligence and Robotics

7 Nov 2013

2010 Autodesk

Throughput versus Realtime

Pierre Pontevia

10th March 2010

Agenda

- Where are we today?
- The pathfinding challenge: from throughput to realtime
- MASAI: the premises of an AI massive parallel solution

WHERE ARE WE TODAY?

Where are we today?

- Parallel programming has become a reality for game developers since the arrival of "next gen" consoles (2005-2006)
- Since then, many new languages and programming models have been suggested to better tackle parallelism
- And new hardware is being announced, shaping the future of consoles
- So this is a good moment to see how parallelism could be revisited for the games of tomorrow… with a special focus on pathfinding

As a start, the 13 dwarves should help us find the right parallel pattern

- The 13 dwarves is an initiative from the University of California, Berkeley to help achieve high parallelism
- A dwarf is an algorithmic method that captures a pattern of computation and communication
- The first exercise is to identify which dwarves match the problems involved in pathfinding

As a start, the 13 dwarves should help us find the right parallel pattern (cont'd)

| Dwarf | Description |
|---|---|
| 1. Dense Linear Algebra | Data are dense matrices or vectors |
| 2. Sparse Linear Algebra | Data sets include many zero values. Data is usually stored in compressed matrices to reduce the storage and bandwidth requirements to access all of the nonzero values |
| 3. Spectral Methods | Data are in the frequency domain, as opposed to time or spatial domains |
| 4. N-Body Methods | Depends on interactions between many discrete points. Variations include particle-particle methods |
| 5. Structured Grids | Represented by a regular grid; points on the grid are conceptually updated together. It has high spatial locality |
| 6. Unstructured Grids | An irregular grid where data locations are selected, usually by underlying characteristics of the application |
| 7. Monte Carlo | Calculations depend on statistical results of repeated random trials |
| 8. Combinational Logic | Functions that are implemented with logical functions and stored state |
| 9. Graph Traversal | Visits many nodes in a graph by following successive edges. These applications typically involve many levels of indirection, and a relatively small amount of computation |
| 10. Dynamic Programming | Computes a solution by solving simpler overlapping subproblems. Particularly useful in optimization problems with a large set of feasible solutions |
| 11. Backtrack and Branch + Bound | Finds an optimal solution by recursively dividing the feasible region into subdomains, and then pruning subproblems that are suboptimal |
| 12. Construct Graphical Models | Constructs graphs that represent random variables as nodes and conditional dependencies as edges. Examples include Bayesian networks and Hidden Markov Models |
| 13. Finite State Machine | A system whose behavior is defined by states, transitions defined by inputs and the current state, and events associated with transitions or states |

Recent languages and programming models provide guidance for parallel implementation

- Data parallelism for homogeneous architectures: OpenMP, TBB, Ct
- Data parallelism for heterogeneous architectures: CUDA, OpenCL, DirectCompute, SPURS, RapidMind
- PC clusters: MPI, MapReduce
- Concurrent programming: PPL, Asynchronous Agents, Grand Central Dispatch

However, there are specific constraints in video games that impact parallel design…

- Memory resource constraints
  - How much scratch memory is required by the solver?
- Concurrent memory access
  - Computations are done on data which can change significantly from frame to frame
  - Things are volatile by nature
- Reactivity / time delay / frequency constraints
  - When do you really need the result of your computation?
- Interruptibility
  - The system can change its mind
  - 80% of the path goals are never reached

…and even more constraints when you develop middleware

- Multiple cohabitant models
  - Several middleware with several threading models
  - Not blocking is not enough -> fine tuning issues
  - SPURS everywhere?
- Multiple HW targets
  - A PC is different from an Xbox 360 console, which is different from a PlayStation® 3 (PS3) console
- Multiple exclusive programming languages

A gap analysis on existing solutions shows that no one solution fits the video game context perfectly

- No model really takes care of memory as a limiting resource in the design of parallel solutions
- No model takes into account time as a dimension of the problem
- All the approaches are very throughput oriented

THE PATHFINDING CHALLENGE: FROM THROUGHPUT TO REALTIME

Pathfinding in a nutshell

- Path Planning: LOW FREQUENCY (0.1 Hz)
  - Input: topology, current position, destination
  - Output: valid path
- Path Smoothing: MEDIUM FREQUENCY (2 Hz)
  - Input: current position, destination
  - Output: target point
- DA (*) & Steering: HIGH FREQUENCY (10 Hz)
  - Input: current position, target point
  - Output: new target point

(*) DA: Dynamic Avoidance

[Diagram: path from A to B]

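
The three frequencies above imply that each stage only needs to run on some frames. The following is a minimal sketch of that idea, assuming a fixed 30 fps game loop; the names `STAGE_HZ` and `due_stages` and the frame rate are illustrative assumptions, not part of the talk.

```python
# Stage frequencies in Hz, taken from the slide; the names are illustrative.
STAGE_HZ = {
    "path_planning": 0.1,
    "path_smoothing": 2.0,
    "da_steering": 10.0,
}


def due_stages(frame, fps=30):
    """Return the stages whose period elapses on this frame (fps is an assumption)."""
    due = []
    for name, hz in STAGE_HZ.items():
        period_frames = max(1, round(fps / hz))  # frames between two runs
        if frame % period_frames == 0:
            due.append(name)
    return due


if __name__ == "__main__":
    for frame in (0, 3, 15, 300):
        print(frame, due_stages(frame))
```

At 30 fps this runs steering every 3 frames, smoothing every 15 frames and planning every 300 frames, which is why a purely per-frame batch wastes most of its budget on stages that are not due.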

Solvers with different characteristics

3 categories of solvers:

- A*, Graph Traversal: low frequency / large input and work memory
- Trajectory Smoothing: medium frequency / optional
- DA / Steering: high frequency / critical

[Chart: frequency (axis ticks 0.2, 3, 10) against work memory requirements (from < 5 K to > 500 K) for A* / Graph Traversals, Smoothing, DA and Steering]

There are 2 natures of data parallelism in pathfinding

- Number of characters: all solver jobs increase linearly with the number of characters
- Size of graph: Graph Traversal related solvers can use a Dwarf 9 pattern solving approach

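
The first kind of data parallelism (one independent job per character) can be sketched as a parallel map. This is only an illustration: the `steer` function is a toy stand-in for a real steering solver, and `steer_all` is a hypothetical name. Note that CPython threads will not speed up CPU-bound work; the point is the decomposition, not the timing.

```python
from concurrent.futures import ThreadPoolExecutor


def steer(character):
    """Toy steering job: move one unit toward the target on each axis."""
    (x, y), (tx, ty) = character["pos"], character["target"]

    def step(a, b):
        # +1, 0 or -1 depending on where the target lies.
        return a + (b > a) - (b < a)

    return (step(x, tx), step(y, ty))


def steer_all(characters, workers=4):
    """Run one independent steering job per character; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(steer, characters))


if __name__ == "__main__":
    crowd = [
        {"pos": (0, 0), "target": (5, -2)},
        {"pos": (1, 1), "target": (1, 1)},
    ]
    print(steer_all(crowd))
```

Because no job reads another character's result, the work grows linearly with the crowd size and splits cleanly across workers.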

A first approach could be a single frame batch (throughput), compatible with most programming models

[Diagram: the Pathfinder of each entity (Entity 1, …) posts to four framework queues: Path Request Queue, Target Request Queue, DA Request Queue and Steering Request Queue. The Search Path, Select Target, Compute DA and Compute Steering solvers go through the middleware queue to the PPM (Parallel Programming Model) queue, which feeds a pool of compute kernels.]

Each task request has a context composed of character data, global data, and potentially customized objects

- Searching Path
  - Context: Start & Destination, Movement Model, Constraint, LPF (*) Shortcut, Pathdata, potentially all PathObjects
  - Output: Path
- Selecting Target
  - Context: Current Pos, Current Target, Path, Movement Model, Constraint, LPF (*) Shortcut, PathObjects of the path, Pathdata
  - Output: Target Pos
- Computing DA Target
  - Context: Current Pos, Current Target, Movement Model, Cluster of entities, Pathdata
  - Output: DA Target Pos
- Steering
  - Context: Current Pos, Current DA Target, Movement Model, Current PathObject, LPF Shortcut
  - Output: Wanted Speed & Yaw

Legend: Character Context, Global Data, Customizable, Output

(*): LPF Obstacle Avoidance

However, as the number of solvers can be limited by memory…

[Diagram: solver chain Compute Path, Compute Target Point, Compute DA Tgt Point, Compute Steering]

…a throughput maximization approach to parallelization can be capped by Amdahl's law

[Chart legend: Parallel, no memory limitation; Parallel, memory constrained environment; Serial, no memory limitation]

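
To make the cap concrete, here is a small sketch of Amdahl's law combined with a memory limit on the number of solvers that can run at once. The function names, the 90% parallel fraction and the memory figures in the usage lines are assumptions chosen for illustration, not measurements from the talk.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup of a job whose parallel part is spread over `workers`."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def max_concurrent_solvers(memory_budget_kb, work_memory_kb, cores):
    """Solvers that can run at once when each needs its own work memory."""
    return max(1, min(cores, memory_budget_kb // work_memory_kb))


if __name__ == "__main__":
    cores = 8
    # Hypothetical numbers: 1 MB of scratch memory, 500 KB per graph traversal solver.
    capped = max_concurrent_solvers(1024, 500, cores)
    print("no memory limit:", round(amdahl_speedup(0.9, cores), 2))
    print("memory capped to", capped, "solvers:", round(amdahl_speedup(0.9, capped), 2))
```

With these assumed numbers, eight cores give a speedup of about 4.7, but a memory budget that only fits two solvers drops it to about 1.8, no matter how many cores are idle.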

To avoid that, the pathfinding solution needs to exploit the time dimension

Moving from

"How to solve all the work within a frame"

to

"How to distribute work across several frames"

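
One way to picture "distributing work across several frames" is to write each solver as a resumable job and give every frame a fixed step budget. The sketch below uses Python generators for the resumable jobs; `run_frame`, `countdown_job` and the step-based budget are illustrative assumptions, not the design presented in the talk.

```python
from collections import deque


def countdown_job(name, steps):
    """Stand-in for a long solver: yields once per unit of work, then returns its result."""
    for _ in range(steps):
        yield
    return name


def run_frame(jobs, budget_steps):
    """Advance queued jobs round-robin until the frame budget is spent.

    Returns the results of the jobs that finished during this frame;
    unfinished jobs stay in `jobs` for the next frame.
    """
    finished = []
    steps = 0
    while jobs and steps < budget_steps:
        job = jobs.popleft()
        try:
            next(job)
            jobs.append(job)  # not done yet: resume it later
        except StopIteration as done:
            finished.append(done.value)
        steps += 1
    return finished


if __name__ == "__main__":
    jobs = deque([countdown_job("steering", 2), countdown_job("path", 5)])
    frame = 0
    while jobs:
        print("frame", frame, "finished:", run_frame(jobs, budget_steps=4))
        frame += 1
```

The short job completes in the second frame while the long one keeps its place in the queue, so no single frame has to absorb the whole search.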

A good illustration is describing pathfinding as a statechart with 4 orthogonal states

[Statechart: top-level states Stopped, Has Arrived and Active. Active contains four orthogonal regions:]

- Path Planning: No Path, Searching Path, Path Found (events: New Destination, New Pos, Path Found, Has arrived)
- Target Selection: No Target, Selecting Target, Target Found (events: Path Updated, New Pos, Target Found, Has arrived)
- DA Target: No DA Target, Computing DA Target, DA Target Computed (events: Target Updated, New Pos, DA Target Found, Has arrived)
- Steering: No Steering, Computing Steering, Steering Computed (events: DA Target Updated, Pos updated, Steering Computed, Has arrived)

It is still compatible with the previous approach, but multiframe (no longer capped by Amdahl's law)

[Diagram: each entity carries its own Active statechart with the four regions Path Planning, Target Selection, DA Target and Steering. The statecharts feed the framework's Path Request, Target Request, DA Request and Steering Request queues, which are served through the middleware queue by the Search Path, Select Target, Compute DA and Compute Steering solvers.]

But now we have 3 new problems

- Problem 1: How to guarantee that high frequency steering solvers return a value on time?
- Problem 2: How to deal with multiframe volatility and dynamicity of data?
- Problem 3: What computation triggering logic do we want?

Problem 1 is a scheduling problem for realtime systems

- Problem 1 can be reworded as follows: "How to guarantee a deadline for each pathfinding solver request compatible with the frequency of the solver"
- This is very close to the definition of realtime software as found on Wikipedia: "In computer science, real-time computing (RTC), or 'reactive computing', is the study of hardware and software systems that are subject to a 'real-time constraint', i.e., operational deadlines from event to system response"
- The good news is that there is a good body of literature on realtime scheduling!

For problem 1, we restate pathfinding solvers in a realtime formalism…

- Realtime formalism: a task X is defined by 4 parameters
  - X.s: starting time
  - X.d: deadline
  - X.e: execution requirement
  - X.p: execution period
- Need to assume solvers are periodic:
  - Easy for smoothing, steering or DA solvers
  - More tricky for A* and other graph traversal solvers
- Need to have an estimate of each core solver job duration:
  - Again quite simple for smoothing, steering or DA solvers
  - Much less easy for A* and other graph traversal solvers -> need to decompose the graph

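
The four task parameters map naturally onto a small record. The sketch below is an assumption-laden illustration: it treats `d` as a deadline, measures everything in abstract ticks, and adds a `weight` (e / p) and a `fits` check that are standard in the periodic scheduling literature but are not spelled out on the slide. The tick values in the usage lines are invented.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Task:
    name: str
    s: int  # starting time, in ticks
    d: int  # deadline, in ticks (assumed relative to each release)
    e: int  # execution requirement, in ticks
    p: int  # execution period, in ticks

    @property
    def weight(self):
        """Fraction of one processor the task needs: e / p."""
        return Fraction(self.e, self.p)


def fits(tasks, processors):
    """Necessary condition: the total weight must not exceed the processor count."""
    return sum((t.weight for t in tasks), Fraction(0)) <= processors


if __name__ == "__main__":
    solvers = [
        Task("steering", s=0, d=3, e=1, p=3),
        Task("smoothing", s=0, d=15, e=2, p=15),
        Task("planning", s=0, d=300, e=30, p=300),
    ]
    print("total weight:", sum(t.weight for t in solvers))
    print("fits on 1 processor:", fits(solvers, 1))
```

The execution requirement `e` is exactly the "estimate of each core solver job duration" the slide asks for, which is why graph traversal solvers need to be decomposed into bounded steps before they can be described this way.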

…and select a scheduling algorithm

- P-fairness scheduling scheme (S.K. Baruah, N.K. Cohen, C.G. Plaxton, D.A. Varvel):
  - Defines a notion of proportionate progress called P-fairness
  - Uses it to define an efficient algorithm solving the periodic scheduling problem
- Cache-aware P-fair based scheduling scheme (J.H. Anderson, J.M. Calendrino, U.M. Devi):
  - Extends the P-fairness approach to avoid co-scheduling tasks that would worsen the performance of shared caches
- Grouping P-fair based scheduling scheme (J.H. Anderson, J.M. Calendrino):
  - Extends the P-fairness approach to encourage grouping of tasks that share a common working set

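
Proportionate progress can be made concrete with the usual P-fair definitions from the scheduling literature: a task of weight e / p should have received about (e / p) · t units of processor time by time t, and the difference (the lag) must stay strictly between -1 and 1. The sketch below checks that property and computes the time window of each unit subtask; the function names are illustrative, and this is a checker, not the scheduling algorithm itself.

```python
from fractions import Fraction


def subtask_window(e, p, i):
    """Pseudo-release and pseudo-deadline of the i-th unit subtask (i starts at 1)."""
    release = ((i - 1) * p) // e      # floor((i - 1) / weight)
    deadline = (i * p + e - 1) // e   # ceil(i / weight)
    return release, deadline


def lag(e, p, t, allocated):
    """Ideal share at time t minus the processor time actually received."""
    return Fraction(e, p) * t - allocated


def is_pfair(e, p, schedule):
    """schedule[t] is 1 if the task ran in time slot t, else 0."""
    allocated = 0
    for t, ran in enumerate(schedule, start=1):
        allocated += ran
        if not -1 < lag(e, p, t, allocated) < 1:
            return False
    return True


if __name__ == "__main__":
    # A task needing 2 ticks every 5 ticks (weight 2/5).
    print([subtask_window(2, 5, i) for i in (1, 2)])
    print(is_pfair(2, 5, [1, 0, 0, 1, 0]))
    print(is_pfair(2, 5, [0, 0, 0, 1, 1]))
```

The second schedule still delivers 2 ticks in the period, but it bunches them at the end, so the task falls more than one tick behind its ideal share and the schedule is not P-fair.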

Problem 2 (volatile data) requires a better description of memory models

- Programming models differ in the way they manage memory space
  - Homogeneous models: unified memory
  - Heterogeneous models: host / device space
- Today only homogeneous models offer transparent memory management
- For heterogeneous models, the developer still has to do a lot of work

Programming models differ in the way they manage memory space

[Diagram: the framework and its request queue live in host memory space; the OpenCL queue dispatches to compute kernels that live in device memory space.]

There is a need for a locking mechanism between the framework and the kernel

[Table: lock requirements for the data stages Inserting Data, Data Locked and Removing Data across the framework request, kernel request, kernel execution and framework update phases. Most combinations are OK. A LOCK is needed when inserting or removing data (if the kernel uses the data), and for locked data (if the kernel accesses host memory).]

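
As a rough illustration of the "LOCK if the kernel uses the data" rule, the sketch below tracks how many kernels are currently using a piece of data and refuses to remove it while any are. It swaps in a non-blocking, deferred removal instead of a blocking lock, and the `GuardedData` class and its method names are hypothetical, not the framework's API.

```python
import threading


class GuardedData:
    """Data shared between the framework (which inserts and removes it) and kernels."""

    def __init__(self, payload):
        self.payload = payload
        self.removed = False
        self._users = 0
        self._lock = threading.Lock()  # protects the counters only

    def begin_kernel(self):
        """A kernel asks to use the data; refused once the data has been removed."""
        with self._lock:
            if self.removed:
                return False
            self._users += 1
            return True

    def end_kernel(self):
        """A kernel signals it no longer uses the data."""
        with self._lock:
            self._users -= 1

    def try_remove(self):
        """The framework asks to remove the data; deferred while a kernel uses it."""
        with self._lock:
            if self._users > 0:
                return False
            self.removed = True
            self.payload = None
            return True


if __name__ == "__main__":
    navmesh = GuardedData("navmesh tile")
    navmesh.begin_kernel()
    print("remove while in use:", navmesh.try_remove())
    navmesh.end_kernel()
    print("remove when idle:", navmesh.try_remove())
```

Deferring the removal keeps the framework from stalling on a running kernel, at the cost of having to retry on a later frame.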

It also requires a better description of user data

There are 3 types of user data:

- Read Only Memory (e.g. navmesh in a static world)
  - Needs to be aware of when user data is available and when it is garbage
- Read Write Memory (e.g. navmesh in a dynamic world)
  - Same as the Read Only approach, with an extension to secure data modification stages
- Work Memory (e.g. open & closed sets for an A* solver)
  - Located where the solver is really called

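
To show what "work memory" means for a graph traversal solver, here is a minimal A* sketch on a small grid. The grid stands in for read only user data, while the open heap, the cost table and the closed set are the solver's private work memory. The grid encoding ('#' for a wall), the 4-connected moves and the Manhattan heuristic are assumptions made for the example.

```python
import heapq


def astar(grid, start, goal):
    """Return the length of the shortest 4-connected path, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: never overestimates on a 4-connected unit grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Work memory: these three structures grow with the explored area.
    open_heap = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    closed = set()

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            return cost
        if cell in closed:
            continue  # stale heap entry
        closed.add(cell)
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    heapq.heappush(
                        open_heap, (new_cost + heuristic((nr, nc)), new_cost, (nr, nc))
                    )
    return None


if __name__ == "__main__":
    world = ["...",
             ".#.",
             "..."]
    print(astar(world, (0, 0), (2, 2)))
```

Because the open and closed sets scale with the searched region rather than with the path length, this is the kind of solver that lands in the "> 500 K" work memory category.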

Data Lifecycle States

Data lifecycle states are introduced to handle R/O and R/W data volatility and dynamicity

[Statechart: Data; Notifying Data To Be Inserted; Data in Insertion; Data Locked; Data in Removal; Notifying Data Removed. Transition labels: On Dependency Insertion / Removal; Dependency Inserted / Removed.]

CRITICAL when data are not owned by middleware

Problem 3 (triggering logic) requires choosing between a pull or push triggering mechanism

- To limit computations over time, it is important to decide whether we want a pull or push triggering model
- In a push model, the system polls over all the characters to get a new steering policy
- In a pull model, the system gets update requirements from the game engine and only performs computations on related characters
- The pull model better controls the amount of computation, but is not really compatible with a realtime approach
- The push model offers the capability of optimizing from a cache and task grouping point of view

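
The difference between the two triggering models, as they are defined on this slide, can be sketched in a few lines. The function names and the dictionary of characters are illustrative assumptions; `compute` stands for any steering solver.

```python
def push_update(characters, compute):
    """Push model (as defined above): visit every character on each tick."""
    return {cid: compute(state) for cid, state in characters.items()}


def pull_update(characters, requested_ids, compute):
    """Pull model (as defined above): only the characters the game engine asked about."""
    return {
        cid: compute(characters[cid])
        for cid in requested_ids
        if cid in characters
    }


if __name__ == "__main__":
    crowd = {"orc": 1, "elf": 2, "troll": 3}
    calls = []

    def solver(state):
        calls.append(state)
        return state * 10

    print(push_update(crowd, solver), "solver calls:", len(calls))
    calls.clear()
    print(pull_update(crowd, ["elf"], solver), "solver calls:", len(calls))
```

The pull variant does less work, but its load depends on what the engine happens to request, whereas the push variant walks the characters in a fixed order, which is what makes cache friendly ordering and task grouping possible.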

MASAI: THE PREMISES OF AN AI MASSIVE PARALLEL SOLUTION

Guidelines for a new parallel programming model for realtime AI

- Extends to the full AI the rationale described in previous slides
- Data / message flow based system
- Realtime P-fair scheduling algorithm
- Compatible with heterogeneous programming models
- Push triggering mechanism

Introducing the concept of Working Unit

- A WU receives requests to process
- A WU communicates with another WU ONLY through strongly typed requests
- Requests are explicitly exposed in the WU interface
- A request can be synchronous or asynchronous (2 different implementations of the request)
- A WU is responsible for the Host<->Device serialization of its context

[Diagram: a Working Unit is split into host code and device code. Its parts: Owner / Children, Event Handler, Incoming Requests Queues, Context, Context Serializer, Requests Interface, Context Accessors.]

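
A minimal sketch of the Working Unit idea follows, assuming a host-only implementation. It covers the typed request interface, the incoming queue and the synchronous / asynchronous split; the Host<->Device serialization is left out. The class layout, the method names `call`, `post` and `process`, and the `CanGoRequest` type are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class CanGoRequest:
    """Hypothetical typed request: can an entity go from start to goal?"""
    start: tuple
    goal: tuple


class WorkingUnit:
    def __init__(self, handlers):
        # handlers maps each accepted request type to the function serving it;
        # this mapping plays the role of the WU's explicit request interface.
        self._handlers = dict(handlers)
        self._queue = deque()

    def _handler_for(self, request):
        try:
            return self._handlers[type(request)]
        except KeyError:
            raise TypeError(f"unsupported request type: {type(request).__name__}")

    def call(self, request):
        """Synchronous request: served immediately."""
        return self._handler_for(request)(request)

    def post(self, request):
        """Asynchronous request: validated, then queued for a later process() call."""
        self._handler_for(request)
        self._queue.append(request)

    def process(self, max_requests=None):
        """Serve queued requests in FIFO order, up to max_requests."""
        results = []
        while self._queue and (max_requests is None or len(results) < max_requests):
            results.append(self.call(self._queue.popleft()))
        return results


if __name__ == "__main__":
    can_go = WorkingUnit({CanGoRequest: lambda r: r.start != r.goal})
    can_go.post(CanGoRequest((0, 0), (1, 1)))
    print(can_go.process())
```

Rejecting unknown request types at `post` time keeps the interface explicit: another WU cannot reach into the context, it can only send one of the declared requests.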

The system works on a mixture of events and requests

[Diagram: the Game Engine drives Entities (Entity 1, Entity 2, …), Brains (Brain 1, Brain 2, …), Pathfinders (PF 1, PF 2, …) and Worlds (World 1, …). Working Units, each with its own queue: Entity Update WU, Brain Update WU, Pathfinding WU, CanGo WU, IsVisible WU and World Update WU. Managers: Pathdata Mgr and Geometry Mgr. Arrows are either requests or events.]

The underlying architecture would rely on components

[Diagram: two chains of communicating components (CC): SearchPath CC, SelectTarget CC, ComputeDA CC, Steering CC.]

Communicating Component = Working Unit for parallelism

Open challenges

- Customized objects vs. data / services model
- Interruptibility
- Multi-platform
- Scheduling algorithm performance
- And many more

Multiplatform

- Too many programming languages!
  - C++
  - C for OpenCL
  - C for CUDA
  - C99 for SPURS
  - HLSL 5 for DirectX
- Which standards will emerge?
- Which standards will be chosen in future consoles?