Multi-User Virtual Worlds Accomplishments - CHMPR


Multi-user Extensible Virtual Worlds


Increasing complexity of objects and interactions with increasing world size, users, numbers of objects, and types of interactions.


Sheldon Brown, Site Director, CHMPR, UCSD

Daniel Tracy, Programmer, Experimental Game Lab

Erik Hill, Programmer, Experimental Game Lab

Todd Margolis, Technical Director, CRCA

Kristen Kho, Programmer, Experimental Game Lab

Current schemes using compute clusters break virtual worlds into small "shards," each with a few dozen interacting objects. Compute systems with large amounts of coherent addressable memory alleviate cluster node jumping and can create worlds with several orders of magnitude higher data complexity: tens of thousands of entities vs. dozens per shard. This takes advantage of hybrid compute techniques for richer object dynamics.


Central server manages world state changes.

Number of clients and amount of activity determine world size and shape.

City road schemes are computed for each player when they enter a new city, using hybrid multicore compute accelerators.

Each player has several views of the world:

Partial view of one city

Total view of one city

Partial view of two cities

View of entire globe

Within a city are several thousand objects. The dynamics of these objects are computed on the best available resource, balancing computability and coherency and alleviating world sharding.

Many classes of computing devices are used.


Multi-core portable devices (e.g., Snapdragon-based cell phones)

Computing cloud data storage

z10 mainframe: transaction processing and state management

Server-side compute accelerators: NVIDIA Tesla, Cell processor, and x86

Varied desktop computation, including hybrid multicore


Increasing complexity of objects and interactions with increasing world size, users, numbers of objects, and types of interactions.

Server services are distributed across cloud clusters, and redistributed across clients as performance or local workload necessitates. Coherency with the overall system is maintained by a centralized server. Virtual world components have dynamic tolerance levels for incoherency and latency (a sketch follows).
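One way to picture per-component tolerance levels is the minimal C++ sketch below; the names and thresholds are illustrative assumptions, not taken from the actual system:

    // Hypothetical per-component tolerance descriptor: each virtual world
    // component declares how much staleness and delay it can absorb
    // before the server must force a resynchronization.
    struct CoherencyTolerance {
        double maxStalenessMs;  // how old cached state may grow
        double maxLatencyMs;    // acceptable round-trip delay
    };

    // Illustrative settings: physics needs tight coherency; decorative
    // animation and slow-changing construction state tolerate much more.
    CoherencyTolerance physicsObjects  {   50.0,  100.0 };
    CoherencyTolerance roadAnimations  {  250.0,  500.0 };
    CoherencyTolerance houseBuildState { 1000.0, 2000.0 };

    // Server-side check: components nearest their limit sync first.
    bool needsSync(const CoherencyTolerance& t, double stalenessMs) {
        return stalenessMs > t.maxStalenessMs;
    }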


Cell processor, x86, and GPU compute accelerators for asset transformation, physics, and behaviors.

Multiple 10 Gb interfaces to compute accelerators, storage clusters, and the compute cloud.

Development Server Framework 5/2010

3 10 Gb interfaces to compute accelerators

z10 mainframe computer at San Diego Supercomputer Center: 2 IFLs with 128 GB RAM, z/VM virtual OS manager with Linux guests; 6 TB fast local storage (15K disks); 4 SR and 2 LR 10 Gb Ethernet interfaces

2 QS22 blades (4 Cell processors)

2 HS22 blades (4 Xeons)

1 10 Gb interface to the internet

4 QS20 blades

NVIDIA Tesla accelerator: 4 GPUs on a Linux host, external dual PCI connection

Many Clients

SDSC View


Producing a multi-user networked virtual world from a single-player environment

Multi-user Extensible Virtual Worlds



Goals

Feasibility

Transformation from single-player program to client/server multi-player networking is non-trivial

Structured methodology for transformation required

Scalability

Support large environments, massively multi-player

After a working version, iteratively tackle bottlenecks

Multi-platform server

Explore z10, x86, CellBE, Tesla accelerators

Cross-platform communication required

Evaluate "drop-in" solutions

Benefits and liabilities of client/server-side schemes such as OpenSim and Darkstar.

The (Original) Scalable City: Technology Infrastructure

ERSATZ: custom virtual reality engine

Ogre3D: real-time 3D rendering engine (OpenGL, Direct3D)

ODE, Newton: open source physics libraries

NVIDIA FX Composer, ATI RenderMonkey: IDEs for HLSL and GLSL GPU programming

Autodesk Maya, 3ds Max: procedural asset creation through our own plug-ins

Loki, Xerces, Boost: utility libraries

CGAL: computational geometry library

Intel OpenCV: real-time computer vision

FMOD: sound library

Chromium, DMX, SAGE: distributed rendering libraries

Serial pipeline: increase performance by increasing CPU speed.

Moore's law computational gains have not been achievable via faster clock speeds for the past 8 years.


Multicore computing is the tactic


New computing architectures


New algorithmic methods


New software engineering


New systems designs

Sony/Toshiba/IBM Cell BE processor: 1 PPU, 8 SPUs per chip

Intel Larrabee processor: 32 x86 cores per chip

IBM System z processor: 4 cores, 1 service processor

NVIDIA Fermi GPGPU: 16 units with 32 cores each

Ogre3D scene graph and other open source libraries need work for adding data-level parallelism

The Scalable City, Next Stage: Technology Infrastructure

Abstract physics to use multiple physics libraries (ODE, Bullet, etc.). Replace computational bottlenecks in these libraries with data-parallel operations (a sketch of the abstraction follows).
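A minimal sketch of such an abstraction layer; the class and method names are illustrative assumptions, not the actual ERSATZ API:

    struct Vec3 { float x, y, z; };

    // Engine-facing interface; concrete backends wrap ODE, Bullet, or a
    // data-parallel reimplementation of the hot loops.
    class PhysicsBackend {
    public:
        virtual ~PhysicsBackend() = default;
        virtual void addRigidBody(int id, float mass, Vec3 pos) = 0;
        virtual void step(float dtSeconds) = 0;  // the bottleneck to parallelize
        virtual Vec3 bodyPosition(int id) const = 0;
    };

    // Stub standing in for an ODE-backed implementation.
    class OdeBackend : public PhysicsBackend {
    public:
        void addRigidBody(int, float, Vec3) override {}
        void step(float) override {}  // would drive ODE's world step here
        Vec3 bodyPosition(int) const override { return Vec3{0.f, 0.f, 0.f}; }
    };

    // Game code holds only a PhysicsBackend pointer, so a bottlenecked
    // backend can be swapped out without touching behavior code.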

CGAL: computational geometry library

Intel OpenCV: real-time computer vision

FMOD: sound library

[Diagram: Cell processors compute dynamic assets. Input data flows through a data-parallel stage (n threads + SIMD, joined at a thread barrier) and returns to the ERSATZ engine as output data.]

Converting assets to data-parallel meshes after the physics transformation boosts rendering ~33% (a sketch of the pattern follows).
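A minimal sketch of the n-threads-plus-barrier pattern, with a simple vertex translation standing in for the real asset transform; none of these names come from the engine:

    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <thread>
    #include <vector>

    struct Vertex { float x, y, z; };

    // Each worker owns a disjoint slice, so the loop needs no locks and
    // is a good candidate for SIMD auto-vectorization.
    void transformSlice(std::vector<Vertex>& verts, std::size_t begin,
                        std::size_t end, float dx, float dy, float dz) {
        for (std::size_t i = begin; i < end; ++i) {
            verts[i].x += dx; verts[i].y += dy; verts[i].z += dz;
        }
    }

    void transformMesh(std::vector<Vertex>& verts, float dx, float dy, float dz) {
        unsigned n = std::max(1u, std::thread::hardware_concurrency());
        std::size_t chunk = (verts.size() + n - 1) / n;
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < n; ++t) {
            std::size_t begin = t * chunk;
            std::size_t end = std::min(verts.size(), begin + chunk);
            if (begin < end)
                workers.emplace_back(transformSlice, std::ref(verts),
                                     begin, end, dx, dy, dz);
        }
        for (auto& w : workers) w.join();  // the thread barrier: rendering resumes after this
    }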

Ogre3D Scene graph

Maxes out at about 12 clients for a world as complex as Scalable City.


DarkStar Server, OpenSim Server

These systems are not designed for the interaction of tens of thousands of dynamic objects; even a handful of complex objects overloads their dynamics computation. Extensive re-engineering would be required to provide this capability and use the hybrid multicore infrastructure, defeating their general-purpose platform.

[Diagram: ERSATZ ENGINE, OpenSim Server, RealXtend or Linden client.]

Challenges & Approach

Software Engineering Challenges:

SC: large, complex, with many behaviors.

Code consisted of tightly coupled systems not conducive to separation into client and server.

Multi-user support takes time, and features will be expanded by others simultaneously!

Basic Approach - Agile methodology:

Incrementally evolve single-user code into a system that can be trivially made multi-user in the final step.

Always have a running and testable program.

Test for unwanted behavioral changes at each step.

Allows others to expand features simultaneously.

Step by Step Conversion

1. Data-structure focused: is it client or server?

Some data structures may have to be split.

Data Structures

[Diagram: systems and data structures clustered around the BlackBoard (singleton): Player, House Piece, Landscape Manager, Camera, Audio, Clouds, Rendering, User Input, Physics, Inverse Kinematics, Road Animation, House Lots, Visual Component, MeshHandler.]

Abstracting Client & Server Object Representations

Server: Visual Component

Visual asset representation on the server side

Consolidates the task of updating clients

Used for house pieces, cyclones, landscape, roads, fences, trees, signs (animated, static, dynamic)

Dynamic, run-time properties control update behavior

Client: Mesh

Mesh properties communicated from Visual Component

Used to select rendering algorithm

Groups assets per city for quick de-allocation

(A sketch of this server/client pairing follows.)
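A minimal sketch of the pairing; the fields are illustrative assumptions, and the real classes carry far more state:

    #include <cstdint>
    #include <string>

    // Server side: one VisualComponent per visible asset. Holds the
    // authoritative transform plus run-time properties that control how
    // eagerly clients are updated.
    struct VisualComponent {
        uint32_t assetId;
        uint32_t cityId;         // routes updates to the right city queue
        float    transform[16];  // authoritative world matrix
        bool     dynamic;        // run-time property: push every frame?
        bool     dirty;          // changed since the last client update?
    };

    // Client side: a Mesh built from properties the VisualComponent
    // communicates; grouped by city so a whole city frees at once.
    struct Mesh {
        uint32_t    assetId;
        uint32_t    cityId;
        std::string renderAlgorithm;  // selected from communicated properties
        // vertex/index buffers owned by the rendering engine...
    };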

Step by Step Conversion

1. Data-structure focused: is it client or server?

Some data structures may have to be split.

2. All data access paths must be segmented into c/s

Cross-boundary calls recast as buffered communication.

Data Access Paths


Systems access world state via the Blackboard (singleton pattern).

After separating into Client & Server Blackboards, server systems must be weaned off the Client Blackboard and vice versa.

Cross-boundary calls recast as buffered communication (see the sketch below).
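A minimal sketch of the split; the names are illustrative, and the real Blackboards hold many more systems:

    #include <queue>
    #include <string>

    struct Message { std::string type; std::string payload; };

    // Before: one global Blackboard every system read and wrote directly.
    // After: two singletons; a system needing state from the other side
    // posts a buffered message instead of calling across the boundary.
    class ServerBlackboard {
    public:
        static ServerBlackboard& instance() { static ServerBlackboard bb; return bb; }
        std::queue<Message> toClient;  // buffered cross-boundary calls
        // ...authoritative world state (physics, lots, roads)
    };

    class ClientBlackboard {
    public:
        static ClientBlackboard& instance() { static ClientBlackboard bb; return bb; }
        std::queue<Message> toServer;  // e.g., buffered user input events
        // ...presentation state (meshes, camera, audio)
    };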

Step by Step Conversion

1. Data-structure focused: is it client or server?

Some data structures may have to be split.

2. All data access paths must be segmented into c/s

Cross-boundary calls recast as buffered communication.

3. Initialization & run loop separation

Dependencies on order must be resolved.

Initialization & Run-loop

Original unified order: Initialize Graphics, Initialize Physics, Init Loading Screen, Load Landscape Data, Initialize Clouds, Create Roads, Place Lots, Place House Pieces, Place Player, Get Camera Position.

After resolving order dependencies, client-side steps (Initialize Graphics, Init Loading Screen, Initialize Clouds, Get Camera Position) are grouped apart from server-side steps (Initialize Physics, Load Landscape Data, Create Roads, Place Lots, Place House Pieces, Place Player).

Step by Step Conversion

1. Data-structure focused: is it client or server?

Some data structures may have to be split.

2. All data access paths must be segmented into c/s

Cross-boundary calls recast as buffered communication.

3. Initialization & run loop separation

Dependencies on order must be resolved.

4. Unify cross-boundary comm. to one subsystem.

This will interface with network code in the end.

Unify Communication

Single buffer, common format, ordered messages.

Communicate in one stage: solve the addiction to immediate answers (a sketch of the single-stage loops follows).

[Diagram: server loop - MovePlayer, Animations, ReadClient, Physics/IK, WriteClient; client loop - Transforms, Render, ReadServer, UserInput, WriteServer.]
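A minimal sketch of the one-communication-stage-per-loop idea; the loop contents are simplified and the function names are illustrative assumptions:

    // Stubs standing in for the real subsystems.
    void readClientMessages() {}   // drain the single ordered input buffer
    void movePlayers() {}
    void updateAnimations() {}
    void stepPhysicsAndIK() {}
    void writeClientMessages() {}  // flush the single output buffer

    // Server frame: exactly one read stage and one write stage, so no
    // subsystem can demand an immediate cross-boundary answer mid-frame.
    void serverFrame() {
        readClientMessages();
        movePlayers();
        updateAnimations();
        stepPhysicsAndIK();
        writeClientMessages();
    }

    void readServerMessages() {}   // apply buffered world-state updates
    void applyTransforms() {}
    void render() {}
    void sampleUserInput() {}
    void writeServerMessages() {}  // send buffered input events

    void clientFrame() {
        readServerMessages();
        applyTransforms();
        render();
        sampleUserInput();
        writeServerMessages();
    }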

Step by Step Conversion

1. Data-structure focused: is it client or server?

Some data structures may have to be split.

2. All data access paths must be segmented into c/s

Cross-boundary calls recast as buffered communication.

3. Initialization & run loop separation

Dependencies on order must be resolved.

4. Unify cross-boundary comm. to one subsystem.

This will interface with network code in the end.

5. Final separation of client & server into two programs

Basic networking code allows communication

Separate

Two programs, plus basic synchronous networking code (a sketch of the framing follows).

Loops truly asynchronous (previously one called the other).
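A minimal sketch of the basic networking step using POSIX sockets; the length-prefix framing is an assumption for illustration, not the project's actual protocol:

    #include <cstdint>
    #include <vector>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>

    // Send one unified message buffer with a 4-byte length prefix so the
    // receiver knows where the ordered message stream ends.
    bool sendBuffer(int sock, const std::vector<uint8_t>& buf) {
        uint32_t len = htonl(static_cast<uint32_t>(buf.size()));
        if (send(sock, &len, sizeof len, 0) != (ssize_t)sizeof len) return false;
        return send(sock, buf.data(), buf.size(), 0) == (ssize_t)buf.size();
    }

    bool recvBuffer(int sock, std::vector<uint8_t>& buf) {
        uint32_t len = 0;
        if (recv(sock, &len, sizeof len, MSG_WAITALL) != (ssize_t)sizeof len) return false;
        buf.resize(ntohl(len));
        return recv(sock, buf.data(), buf.size(), MSG_WAITALL) == (ssize_t)buf.size();
    }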

Step by Step Conversion

1. Data-structure focused: is it client or server?

Some data structures may have to be split.

2. All data access paths must be segmented into c/s

Cross-boundary calls recast as buffered communication.

3. Initialization & run loop separation

Dependencies on order must be resolved.

4. Unify cross-boundary comm. to one subsystem.

This will interface with network code in the end.

5. Final separation of client & server into two programs

Basic networking code allows communication

6. Optimize!

New configuration changes behavior even for single player


Experience

Positives

Smooth transition to multi-user possible

All features/behaviors retained or explicitly disabled

Feature development continued successfully during the transition (performance, feature, and behavioral enhancements on both client and server side, CAVE support, improved visuals, machinima engine, etc.)

Negatives

Resulting code structure not ideal for a client/server application (no MVC framework, some legacy structure)

Feature development and client/server work sometimes clash, requiring reworking in client/server fashion

Initial Optimizations

Basic issues addressed in converting to a massively multi-user networked model

Multi-User Load Challenges

Communications

Graphics Rendering

Geometry Processing

Shaders

Rendering techniques

Dynamics Computation

Physics

AI or other application-specific behaviors

Animation


Communication

In a unified system, subsystems can share data and communicate quickly.

In a client/server model, subsystems on different machines have to rely on messages sent over the network:

Data marshalling overhead

Data unmarshalling overhead

Bandwidth/latency limitations

New Client Knowledge Model

Stand-alone version had all cities in memory

All clients received updates for activity in all cities

Increased memory & bandwidth use as the environment scales

Now: clients are only given cities they can see (a sketch follows this list)

City assets dynamically loaded onto the client as needed

Reduces the updates the clients need

Further challenge: dynamically loading cities without server or client hiccups
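A minimal sketch of this visibility-based interest management; the distance test and all names are illustrative assumptions:

    #include <cmath>
    #include <set>
    #include <vector>

    struct City   { int id; float x, y; };
    struct Player { int id; float x, y; std::set<int> knownCities; };

    // Each cycle the server recomputes which cities a player can see.
    // Newly visible cities get their assets streamed to the client, and
    // updates are only ever sent for cities in knownCities.
    std::vector<int> citiesToLoad(Player& p, const std::vector<City>& cities,
                                  float viewRadius) {
        std::vector<int> newlyVisible;
        for (const City& c : cities) {
            float dx = c.x - p.x, dy = c.y - p.y;
            bool visible = std::sqrt(dx * dx + dy * dy) < viewRadius;
            if (visible && p.knownCities.insert(c.id).second)
                newlyVisible.push_back(c.id);  // schedule asset streaming
        }
        return newlyVisible;
    }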

Communication Challenges

More clients leads to:

More activity

Physics object movements

Road/land animations

House construction

More communication

Per client, due to the increase in activity

More clients for the server to keep up to date

Server communication = activity x clients!

Dynamically loading large data sets (cities in this case) without server or client hiccups


Communication Subsystem

Code-generation for data marshalling

Fast data structure serialization

Binary transforms for cross-platform use (token- or text-based too slow)

Endian issues resolved during serialization (see the sketch after this list)

Tested on z10, Intel

Asynchronous reading and writing

Dedicated threads perform communication

Catch up on all messages each game cycle
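A minimal sketch of the endian handling, written by hand here where the project generates its marshalling code; function names are illustrative:

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Always serialize multi-byte integers in network (big-endian) order,
    // so z10 (big-endian) and Intel (little-endian) hosts interoperate.
    void writeU32(std::vector<uint8_t>& out, uint32_t v) {
        out.push_back((v >> 24) & 0xFF);
        out.push_back((v >> 16) & 0xFF);
        out.push_back((v >> 8) & 0xFF);
        out.push_back(v & 0xFF);
    }

    uint32_t readU32(const uint8_t* in) {
        return (uint32_t(in[0]) << 24) | (uint32_t(in[1]) << 16) |
               (uint32_t(in[2]) << 8)  |  uint32_t(in[3]);
    }

    // Floats travel as their IEEE-754 bit pattern.
    void writeF32(std::vector<uint8_t>& out, float f) {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);
        writeU32(out, bits);
    }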

Reducing Data Marshalling Time

Reduce use of per-player queues:

Common messages sent to a queue associated with the event's city

Players receive buffers of each city they see, in addition to their player-specific queue

Perform buffer allocation, data marshalling, & copy once for many players (see the sketch below)

Significantly reduces communication overhead for the server
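A minimal sketch of the per-city broadcast queue; the types and names are illustrative, not from the actual codebase:

    #include <cstdint>
    #include <map>
    #include <set>
    #include <vector>

    using Buffer = std::vector<uint8_t>;

    std::map<int, Buffer> cityQueues;            // cityId -> marshalled once
    std::map<int, Buffer> playerQueues;          // playerId -> player-specific
    std::map<int, std::set<int>> visibleCities;  // playerId -> cities seen

    // Marshal a city-wide event exactly once, regardless of audience size.
    void postCityEvent(int cityId, const Buffer& marshalled) {
        Buffer& q = cityQueues[cityId];
        q.insert(q.end(), marshalled.begin(), marshalled.end());
    }

    // Each cycle, a player's outgoing data is their own queue plus the
    // shared buffer of every city they see; no per-player re-marshalling.
    Buffer gatherOutgoing(int playerId) {
        Buffer out = playerQueues[playerId];
        for (int cityId : visibleCities[playerId]) {
            const Buffer& q = cityQueues[cityId];
            out.insert(out.end(), q.begin(), q.end());
        }
        return out;
    }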

Preventing Stutters

Send smaller chunks of data

Break up large messages

Incrementally load cities as a player approaches them (a sketch follows this list)

Space out sending assets over many cycles

Large geometry (landscape) subdivided

If the player arrives, finish all transfers

Prevent disk access on the client

Pre-load resources
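A minimal sketch of spacing a city's assets over many cycles; the byte budget and structure are illustrative assumptions:

    #include <cstddef>
    #include <deque>
    #include <vector>

    using Chunk = std::vector<unsigned char>;

    struct PendingCity {
        int id;
        std::deque<Chunk> chunks;  // large geometry pre-subdivided
    };

    std::deque<PendingCity> transfers;
    const std::size_t kBytesPerCycle = 64 * 1024;  // per-cycle send budget

    // Called once per game cycle: drip-feed chunks within a byte budget.
    // If the player has already arrived (urgent), ignore the budget and
    // finish the whole transfer now.
    void pumpTransfers(bool playerArrived,
                       void (*sendChunk)(int cityId, const Chunk&)) {
        std::size_t sent = 0;
        while (!transfers.empty()) {
            PendingCity& city = transfers.front();
            while (!city.chunks.empty()) {
                if (!playerArrived && sent >= kBytesPerCycle) return;
                sent += city.chunks.front().size();
                sendChunk(city.id, city.chunks.front());
                city.chunks.pop_front();
            }
            transfers.pop_front();
        }
    }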