

Nov 29, 2013


Cross-Platform Development Best Practices

Matt Lee, Kev Gee

Microsoft Game Technology Group

Agenda

Code Considerations

CPU Considerations

GPU Considerations

IO Considerations

Content Considerations

Data Build System

Geometry Formats

Texture Formats

Shaders

Audio Considerations

Compiler Comparison

VS 2005 front end used for both platforms

Preprocessor benefits both platforms

Debugger experience is the same

Full 2005 IDE support coming


Xbox 360 optimizing back end added with the XDK install

A single solution / MSBuild file can target both platforms

PC CPUs

Intel Pentium D / AMD Athlon64 X2

Programming Model

2 cores running at around 3.2 GHz

12K-micro-op execution trace cache

16-KB L1 cache, 1-MB L2 cache

Deep Branch Prediction

Dynamic data flow analysis

Speculative Execution

Little-endian byte ordering

SIMD instructions

Quad Core announced for early 2007

360 Custom CPU

Custom IBM Processor

3 64-bit PowerPC cores running at 3.2 GHz

Two hardware threads per core

32-KB L1 instruction cache and 32-KB L1 data cache, per core

Shared 1-MB L2 cache

128-byte cache lines on all caches

Big-endian byte ordering

VMX 128 SIMD

Lots of Registers
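The 128-byte cache lines matter most when two hardware threads write to data on the same line (false sharing). A minimal sketch, assuming a C++11 compiler with alignas (XDK-era MSVC spelled this __declspec(align(128))); PerThreadCounters and g_counters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Cache line size from the Xbox 360 CPU description (128 bytes).
// PC CPUs of the same era use 64-byte lines, so 128 is safe on both.
constexpr std::size_t kCacheLineSize = 128;

// Hypothetical per-thread counter block. Aligning each block to a full cache
// line keeps two hardware threads from writing to the same line.
struct alignas(kCacheLineSize) PerThreadCounters {
    std::uint64_t framesProcessed = 0;
    std::uint64_t itemsProcessed = 0;
    // The rest of the line is implicit padding added by alignas.
};

static_assert(alignof(PerThreadCounters) == kCacheLineSize, "must start on a cache line");
static_assert(sizeof(PerThreadCounters) == kCacheLineSize, "must fill exactly one line");

// One slot per hardware thread: 3 cores x 2 threads on Xbox 360.
constexpr int kHardwareThreads = 6;
PerThreadCounters g_counters[kHardwareThreads];

// Distance in bytes between two adjacent slots.
std::ptrdiff_t SlotStride() {
    return reinterpret_cast<const char*>(&g_counters[1]) -
           reinterpret_cast<const char*>(&g_counters[0]);
}
```

Each thread then updates only g_counters[its index], and no two threads ever touch the same line.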

Performance Tools

Profiling approaches are very similar between PC and Xbox 360

PIX for Xbox 360 & PIX for Windows

Being developed by the same team now

Use instrumented tools on Xbox 360

XbPerfView / Tracedump

Xbox 360 does not have a sampling profiler yet

Use PC profiling tools

Intel VTune / AMD CodeAnalyst / VS Team System Profiler

Attend the Performance Hands-On training!

Focus Your Efforts

Use performance tools to guide work

Areas where we have seen platform-specific efforts reap rewards

Single Data Pass engine design

High Frequency Game API Layers

Use your profiler tools to target the hot spots

Math library: bespoke vs. XGMath vs. D3DXMath


Impact on Code Design

Designing cross-platform APIs

Use of virtual functions

Parameter passing mechanisms

Pass by reference vs. pass by value

Typedef vector types and intrinsics

Math Library Design Case Study

Use of inlining



Use of Virtual Functions

Be careful when using virtual functions to hide platform differences

Virtual function performance on Xbox 360

Adds an indirect branch instruction that is always mispredicted!

The compiler is limited in optimizing these

Make a concrete implementation for Xbox 360

Avoid virtual functions in inner loops

Cross-Platform Render Example

IRenderSystem: semi-abstract base class

D3D9: overrides the virtual base

Xbox 360: concrete implementation

D3D10: overrides the virtual base

Cross-Platform Render Example (ctd.)

class IRenderSystem
{
    ……
public:
#if !defined(_XBOX)
    virtual void Draw() = 0;   // PC: D3D9 / D3D10 subclasses override
#else
    void Draw();               // Xbox 360: non-virtual, no indirect call
#endif
};

#if defined(_XBOX)
void IRenderSystem::Draw()
{
    // 360 implementation
    ……
}
#endif

D3D9 & D3D10 implementations subclass for specialization

Beware Big Constructors

Ctors can dominate execution time

Ctors are often hidden from the casual observer

Copy ctors run when objects are added to containers

Every element of an array of C++ objects is constructed

Overloaded operators may construct temporaries

Consider: should ctor init data?

Example: matrix class zeroing all data

Prefer array initialization = { … }
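To make the matrix example concrete: a type with no user-provided constructor is trivially default-constructible, so arrays and temporaries of it cost nothing to create, and values are supplied only where needed with brace initialization. A minimal sketch; Matrix44, ZeroingMatrix44 and MakeIdentity are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

// No constructor: declaring a Matrix44 (or an array of them) does no work.
struct Matrix44 {
    float m[4][4];
};

// Contrast: zeroes itself on every construction, so every temporary,
// container copy and array element pays for 16 float stores.
struct ZeroingMatrix44 {
    float m[4][4];
    ZeroingMatrix44() : m{} {}
};

static_assert(std::is_trivially_default_constructible<Matrix44>::value,
              "declaring a Matrix44 runs no code");
static_assert(!std::is_trivially_default_constructible<ZeroingMatrix44>::value,
              "every ZeroingMatrix44 runs a constructor");

// When initial values are needed, state them explicitly with = { ... }.
inline Matrix44 MakeIdentity() {
    Matrix44 r = {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
    return r;
}
```

A Matrix44 declared without an initializer holds indeterminate values, so this design relies on call sites always writing the matrix before reading it.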

Inlining

Careful inlining is in general a Good Thing

Plan to spend time ensuring the compiler is inlining the right stuff

Use perf tools such as VTune / Trace Recorder

Try the “inline any suitable” option (/Ob2)

Enable link-time code generation (/GL, /LTCG)

Consider profile-guided optimization

Use __forceinline only where necessary
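When __forceinline is justified, cross-platform code usually hides the spelling behind a macro, because GCC and Clang use an attribute instead. A minimal sketch; FORCE_INLINE and Lerp are hypothetical names.

```cpp
#include <cassert>

// MSVC (including the Xbox 360 compiler) spells the request __forceinline;
// GCC and Clang use the always_inline attribute. Other compilers get a hint.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

// A typical candidate: tiny and called from hot inner loops, applied only
// after the profiler shows the compiler is not inlining it on its own.
FORCE_INLINE float Lerp(float a, float b, float t) {
    return a + (b - a) * t;
}
```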

Consider Passing Native Types by Value

Xbox 360 has large registers

Native 64-bit PC code does too

Pass and return these types by value

int, __int64, float

Consider these types if targeting SSE / VMX

__m128 / __vector4, XMVECTOR, XMMATRIX

Pass structs by pointer or reference

Help the compiler using __restrict
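The three rules above side by side: native types by value, structs by reference, and __restrict to tell the compiler two pointers never alias. A minimal sketch; Transform, ScaledX and ScaleArray are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Native types: pass and return by value so they stay in registers.
inline float Scale(float value, float factor) { return value * factor; }

// Structs: pass by const reference instead of copying.
struct Transform {
    float position[3];
    float rotation[4];
    float scale;
};

inline float ScaledX(const Transform& t) { return t.position[0] * t.scale; }

// __restrict promises that dst and src never overlap, so the compiler can keep
// values in registers and reorder loads and stores. MSVC, GCC and Clang all
// accept this spelling. Passing overlapping buffers is undefined behaviour.
inline void ScaleArray(float* __restrict dst, const float* __restrict src,
                       std::size_t count, float factor) {
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = Scale(src[i], factor);
    }
}
```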

Math Library Header (Xbox 360)

#if defined( _XBOX )
#include <ppcintrinsics.h>
#include <vectorintrinsics.h>

typedef __vector4 XVECTOR;
typedef const XVECTOR XVECTOR_PARAM;
typedef XVECTOR& XVECTOR_OUTPARAM;

#define XMATHAPI inline
#define VMX128_INTRINSICS
#endif

Pass by value

Math Library Header (Windows)

#if defined( _WIN32 )
#include <xmmintrin.h>

typedef __m128 XVECTOR;
typedef const XVECTOR& XVECTOR_PARAM;
typedef XVECTOR& XVECTOR_OUTPARAM;

#define XMATHAPI inline
#define SSE_INTRINSICS
#endif

Pass by reference

Math Library Function

XVECTOR XMATHAPI XVectorAdd( XVECTOR_PARAM vA,
                             XVECTOR_PARAM vB )
{
#if defined( VMX128_INTRINSICS )
    return __vaddfp( vA, vB );
#elif defined( SSE_INTRINSICS )
    return _mm_add_ps( vA, vB );
#endif
}
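The function above only compiles when one of the two intrinsic macros is defined. A minimal sketch of the same pattern with a plain C++ fallback for other targets; the VMX128 branch is left out because it only builds with the XDK, and NO_INTRINSICS, XVectorSet and XVectorStore are hypothetical additions.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SSE_INTRINSICS
typedef __m128 XVECTOR;
#else
#define NO_INTRINSICS               // hypothetical scalar fallback
struct XVECTOR { float v[4]; };
#endif

typedef const XVECTOR& XVECTOR_PARAM;
#define XMATHAPI inline

// Builds a vector from four floats, lanes in x, y, z, w order.
XVECTOR XMATHAPI XVectorSet(float x, float y, float z, float w) {
#if defined(SSE_INTRINSICS)
    return _mm_setr_ps(x, y, z, w);
#else
    XVECTOR r = {{x, y, z, w}};
    return r;
#endif
}

// Writes the four lanes to memory (unaligned store, so out needs no alignment).
void XMATHAPI XVectorStore(float out[4], XVECTOR_PARAM v) {
#if defined(SSE_INTRINSICS)
    _mm_storeu_ps(out, v);
#else
    for (int i = 0; i < 4; ++i) out[i] = v.v[i];
#endif
}

XVECTOR XMATHAPI XVectorAdd(XVECTOR_PARAM vA, XVECTOR_PARAM vB) {
#if defined(SSE_INTRINSICS)
    return _mm_add_ps(vA, vB);
#else
    XVECTOR r;
    for (int i = 0; i < 4; ++i) r.v[i] = vA.v[i] + vB.v[i];
    return r;
#endif
}
```

Keeping the scalar path compiling on every platform also gives a reference implementation to test the intrinsic paths against.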



Threading

Why Multithread?

Necessary to take full advantage of modern CPUs

Attend the Multithreading talk later today

Covers synchronization primitives and lockless synchronization methods

See also:

Talks from Intel and AMD (GDC 2005 / GDC-E)

OpenMP: C-style (pragma-based, loop-level) rather than C++-oriented; useful in limited circumstances

Concur: C++, see http://microsoft.sitestream.com/PDC05/TLN/TLN309_files/Default.htm#nopreload=1&autostart=1
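A common first step is splitting a data-parallel loop across the hardware threads with no locks at all: each worker writes only its own result slot, and the main thread combines them after joining. A minimal sketch that uses C++11 std::thread as a stand-in for the native threading APIs; ParallelSum is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums data using threadCount workers. Each worker accumulates into a local
// variable and writes its partial result once, so no locking is needed.
long long ParallelSum(const std::vector<int>& data, unsigned threadCount) {
    assert(threadCount > 0);
    std::vector<long long> partial(threadCount, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + threadCount - 1) / threadCount;

    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            long long sum = 0;
            for (std::size_t i = begin; i < end; ++i) sum += data[i];
            partial[t] = sum;
        });
    }
    // join() waits for each worker and makes its write visible to this thread.
    for (std::thread& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

On Xbox 360 the natural thread count is six (3 cores x 2 hardware threads); on the PCs listed earlier it is two.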

D3D Architectural Differences

D3D9 draw call cost is higher on Windows than on Xbox 360

Xbox 360 D3D is optimized for a single GPU target

D3D10 improves draw call cost by design on Windows

It is very important to carefully manage the number of batches submitted

This can have an impact on content creation

This work will help with 360 performance too
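One common way to manage batches is to sort draws by a state key before submission. Sorting alone does not lower the draw count, but it groups draws that share state so they can be merged and so fewer state changes separate them. A minimal sketch; DrawItem, SortKey and the ID fields are hypothetical stand-ins for real shader, material and mesh handles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DrawItem {
    std::uint16_t shaderId;
    std::uint16_t materialId;
    std::uint32_t meshId;
};

// Most expensive state (shader) in the high bits, then material, then mesh.
inline std::uint64_t SortKey(const DrawItem& d) {
    return (std::uint64_t(d.shaderId) << 48) | (std::uint64_t(d.materialId) << 32) | d.meshId;
}

// Number of shader/material state changes when submitting in current order.
inline int CountStateChanges(const std::vector<DrawItem>& items) {
    int changes = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].shaderId != items[i - 1].shaderId ||
            items[i].materialId != items[i - 1].materialId) {
            ++changes;
        }
    }
    return changes;
}

inline void SortForSubmission(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return SortKey(a) < SortKey(b); });
}
```

In the test below, five interleaved draws need five state changes; after sorting they need three.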


PC GPUs

Wide variety of available Direct3D9 H/W

CAPs and shader models abstract over feature differences

GPUs with approximately equivalent performance to the Xbox 360 GPU

ATI X1900 / NVIDIA 7800 GTX

Shader Model 3.0 support


Direct3D 10 standardizes the feature set

Hardware scales on performance instead
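On D3D9, picking a render path usually means reading D3DCAPS9::PixelShaderVersion and falling back from the best supported shader model. A minimal sketch; PsVersion and PsMajor mirror the encoding of the D3DPS_VERSION and D3DSHADER_VERSION_MAJOR macros so it builds without the DirectX SDK, and RenderPath and ChooseRenderPath are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Same bit layout as D3DPS_VERSION(major, minor) in the D3D9 headers.
constexpr std::uint32_t PsVersion(std::uint32_t major, std::uint32_t minor) {
    return 0xFFFF0000u | (major << 8) | minor;
}
constexpr std::uint32_t PsMajor(std::uint32_t version) { return (version >> 8) & 0xFFu; }
constexpr std::uint32_t PsMinor(std::uint32_t version) { return version & 0xFFu; }

// Hypothetical render paths, best first.
enum class RenderPath { SM3, SM2, FixedFunction };

// Design for the top path and fall back as caps get weaker.
inline RenderPath ChooseRenderPath(std::uint32_t pixelShaderVersion) {
    if (pixelShaderVersion >= PsVersion(3, 0)) return RenderPath::SM3;
    if (pixelShaderVersion >= PsVersion(2, 0)) return RenderPath::SM2;
    return RenderPath::FixedFunction;
}
```

In an engine the argument would come from IDirect3D9::GetDeviceCaps; on D3D10 this check goes away because the feature set is fixed.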

Xbox 360 Custom GPU

Direct3D 9.0+ compatible

High-Level Shader Language (HLSL) 3.0+ support

10 MB embedded DRAM

Frame buffer with 256 GB/s bandwidth

Hardware scaling for display resolution matching

48 shader ALUs shared between pixel and vertex shading (unified shaders)

Up to 8 simultaneous contexts (threads) in flight at once

Changing shaders or render state can be cheap, since a new context can be started up easily

Hardware tessellator

N-patches, triangular patches, and rectangular patches

On PC systems, trade memory for this feature (store pre-tessellated geometry) in non-continuous / adaptive cases
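Whether a render target fits in the 10 MB of EDRAM is simple arithmetic. A minimal sketch, assuming 10 MB means 10 x 1024 x 1024 bytes and that each sample stores 32-bit color plus 32-bit depth/stencil; RenderTargetBytes and FitsInEdram are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// 10 MB of EDRAM, taken as 10 * 1024 * 1024 bytes.
constexpr std::uint64_t kEdramBytes = 10ull * 1024 * 1024;

// Bytes for color + depth/stencil; bytesPerSample is color bytes + depth bytes.
constexpr std::uint64_t RenderTargetBytes(std::uint64_t width, std::uint64_t height,
                                          std::uint64_t msaaSamples,
                                          std::uint64_t bytesPerSample) {
    return width * height * msaaSamples * bytesPerSample;
}

constexpr bool FitsInEdram(std::uint64_t bytes) { return bytes <= kEdramBytes; }
```

At 1280x720 with 8 bytes per sample, a non-MSAA target needs 7,372,800 bytes and fits, while 2x MSAA needs 14,745,600 bytes and 4x MSAA needs 29,491,200 bytes, both larger than the 10,485,760 bytes available, so MSAA at 720p has to be planned for in the engine design.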

Explicit Resolve Control

Copies surface data from EDRAM to a texture in system memory

Required for render-to-texture and presentation to the screen

Can perform MSAA sample averaging or resolve individual samples

Can perform format conversions and biasing

Cannot do rescaling or resampling of any kind

This can impact your Xbox 360 engine design, as it adds an extra step to common operations.


Use Native File I/O Routines

Only native routines support key features:

Asynchronous I/O

Completion routines

Prefer CreateFile and ReadFile

Guaranteed to be as fast as or faster than any alternative

Avoid fopen, fread, C++ iostreams

Use Asynchronous File I/O

File read/write operations block by default

Async operations allow the game to do other interesting work

CreateFile with FILE_FLAG_OVERLAPPED

Use FILE_FLAG_NO_BUFFERING, too

Guarantees no intermediate buffering

Use the OVERLAPPED struct to determine when the operation is complete

See the CreateFile docs for details
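FILE_FLAG_NO_BUFFERING comes with alignment rules: the file offset and the read size must be multiples of the volume sector size, and the buffer address must be sector-aligned. A minimal Windows sketch of one overlapped, unbuffered read, assuming the caller already knows the sector size (for example 512 or 4096 bytes); ReadUnbuffered, AlignUp and AlignDown are hypothetical names, and only the alignment helpers compile off Windows.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_WIN32)
#include <windows.h>
#endif

// sectorSize must be a power of two.
constexpr std::uint64_t AlignDown(std::uint64_t value, std::uint64_t sectorSize) {
    return value & ~(sectorSize - 1);
}
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t sectorSize) {
    return AlignDown(value + sectorSize - 1, sectorSize);
}

#if defined(_WIN32)
// Reads [offset, offset + bytes) with one overlapped, unbuffered request.
// Returns a VirtualAlloc'd buffer (release with VirtualFree) or nullptr; the
// requested data starts at buffer + (offset - AlignDown(offset, sectorSize)).
void* ReadUnbuffered(const wchar_t* path, std::uint64_t offset, std::uint64_t bytes,
                     std::uint64_t sectorSize) {
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr, OPEN_EXISTING,
                              FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING, nullptr);
    if (file == INVALID_HANDLE_VALUE) return nullptr;

    const std::uint64_t start = AlignDown(offset, sectorSize);
    const DWORD size = static_cast<DWORD>(AlignUp(offset + bytes, sectorSize) - start);
    // VirtualAlloc returns page-aligned memory, which is also sector-aligned.
    void* buffer = VirtualAlloc(nullptr, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

    OVERLAPPED ov = {};
    ov.Offset = static_cast<DWORD>(start & 0xFFFFFFFFu);
    ov.OffsetHigh = static_cast<DWORD>(start >> 32);
    ov.hEvent = CreateEventW(nullptr, TRUE, FALSE, nullptr);

    BOOL ok = buffer && ov.hEvent &&
              (ReadFile(file, buffer, size, nullptr, &ov) || GetLastError() == ERROR_IO_PENDING);
    // ... the game can do other work here while the read is in flight ...
    DWORD transferred = 0;
    if (ok) ok = GetOverlappedResult(file, &ov, &transferred, TRUE);  // TRUE = wait

    if (ov.hEvent) CloseHandle(ov.hEvent);
    CloseHandle(file);
    if (!ok && buffer) {
        VirtualFree(buffer, 0, MEM_RELEASE);
        buffer = nullptr;
    }
    return buffer;
}
#endif
```

A real streaming system would keep several requests in flight and poll HasOverlappedIoCompleted or use completion routines rather than waiting immediately.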


Memory Mapped File I/O

Fastest way to load data on Windows

However, the 32 bit address space is getting tight

This is a great 64-bit feature add!



Memory-mapped I/O is not supported on Xbox 360

No HDD-backed virtual memory management system
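On Windows, mapping a whole file takes CreateFile, CreateFileMapping and MapViewOfFile; the mapped view occupies address space equal to the file size, which is why the 32-bit address space gets tight. A minimal sketch; MappedFile, MapWholeFile and UnmapFile are hypothetical names, and the POSIX mmap branch is a swapped-in stand-in so the sketch also runs off Windows (it is not an Xbox 360 path, since 360 has no memory-mapped I/O).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#if defined(_WIN32)
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

// Read-only view of a whole file; data is nullptr if mapping failed.
struct MappedFile {
    const unsigned char* data = nullptr;
    std::size_t size = 0;
};

MappedFile MapWholeFile(const char* path) {
    MappedFile result;
#if defined(_WIN32)
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr, OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file != INVALID_HANDLE_VALUE) {
        LARGE_INTEGER size = {};
        HANDLE mapping = nullptr;
        if (GetFileSizeEx(file, &size) && size.QuadPart > 0)
            mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
        if (mapping) {
            result.data = static_cast<const unsigned char*>(
                MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
            if (result.data) result.size = static_cast<std::size_t>(size.QuadPart);
            CloseHandle(mapping);  // the view keeps the mapping alive
        }
        CloseHandle(file);
    }
#else
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size), PROT_READ,
                           MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                result.data = static_cast<const unsigned char*>(p);
                result.size = static_cast<std::size_t>(st.st_size);
            }
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
    }
#endif
    if (!result.data) std::fprintf(stderr, "could not map %s\n", path);
    return result;
}

void UnmapFile(MappedFile& f) {
    if (!f.data) return;
#if defined(_WIN32)
    UnmapViewOfFile(f.data);
#else
    munmap(const_cast<unsigned char*>(f.data), f.size);
#endif
    f = MappedFile();
}
```

Empty files are treated as a failure here because neither platform can map a zero-length file.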



Universal Gaming Controller

XInput is the same API for Xbox 360 and Windows

The Microsoft universal controller is a reference design that can be leveraged by other hardware manufacturers

XP driver available from Windows Update

Support is built into Xbox 360 and Windows Vista
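Because XInput is shared, stick handling such as the dead zone can live in common code. A minimal sketch of a radial dead zone; kLeftThumbDeadzone restates the value of XINPUT_GAMEPAD_LEFT_THUMB_DEADZONE (7849) so the sketch builds without the SDK, NormalizedStickMagnitude and PollLeftStick are hypothetical names, and the Windows-only polling function assumes linking against the XInput library.

```cpp
#include <cassert>
#include <cmath>
#if defined(_WIN32)
#include <windows.h>
#include <xinput.h>
#endif

// Same value as XINPUT_GAMEPAD_LEFT_THUMB_DEADZONE in xinput.h.
constexpr int kLeftThumbDeadzone = 7849;

// Returns 0 inside the dead zone; otherwise rescales the stick's distance from
// centre to 0..1 so motion starts smoothly at the dead zone edge.
inline float NormalizedStickMagnitude(int x, int y, int deadzone) {
    assert(deadzone >= 0 && deadzone < 32767);
    double magnitude = std::sqrt(double(x) * x + double(y) * y);
    if (magnitude <= deadzone) return 0.0f;
    if (magnitude > 32767.0) magnitude = 32767.0;  // diagonals exceed the axis range
    return float((magnitude - deadzone) / (32767.0 - deadzone));
}

#if defined(_WIN32)
// Polls one controller through the XInput API.
inline float PollLeftStick(DWORD userIndex) {
    XINPUT_STATE state = {};
    if (XInputGetState(userIndex, &state) != ERROR_SUCCESS) return 0.0f;  // not connected
    return NormalizedStickMagnitude(state.Gamepad.sThumbLX, state.Gamepad.sThumbLY,
                                    XINPUT_GAMEPAD_LEFT_THUMB_DEADZONE);
}
#endif
```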


Data Build System

Add a data build / processing phase to your production system

Compile, optimize and compress data according to multiple target platform requirements

Easier and faster to handle endianness and other format conversions offline

Data packing process can occur here too

Invest time in making the build fast

Artists need to rapidly iterate to make quality content

Incremental builds can really help reduce the build time

Try the XNA build tools

Copies of XNA build CTP are available NOW!
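Handling endianness offline means the data build writes Xbox 360 assets already in big-endian order, so the console loads them without swapping. A minimal sketch of that build step, assuming the tool runs on a little-endian PC; ByteSwap16, ByteSwap32, FloatToSwappedBits and ConvertPositionsForXbox360 are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::uint16_t ByteSwap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v >> 8) | (v << 8));
}

constexpr std::uint32_t ByteSwap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Returns the big-endian bit pattern of f as an integer. The result is kept
// as an integer, not a float, because a swapped pattern that happens to look
// like a signaling NaN can be altered if it is loaded as a float.
inline std::uint32_t FloatToSwappedBits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // memcpy avoids aliasing problems
    return ByteSwap32(bits);
}

// Build step: write vertex positions in the byte order Xbox 360 reads.
inline void ConvertPositionsForXbox360(const float* in, std::uint32_t* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) out[i] = FloatToSwappedBits(in[i]);
}
```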

Geometry Compression

Offline Compression of Geometry

Provides wins across all platforms

Disk I/O wins as well as GPU wins

The compression approach is likely to be target-specific

PC is usually a superset of the consoles in this area

D3D9 CAPs / limitations to consider

16-bit normals: D3DDECLTYPE_FLOAT16_2
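Storing normal components as 16-bit halves halves their size versus 32-bit floats; on D3D9, check that D3DCAPS9::DeclTypes reports D3DDTCAPS_FLOAT16_2 before relying on the format. A minimal sketch of the offline float-to-half conversion; FloatToHalf, HalfToFloat, Half2 and PackHalf2 are hypothetical names, and the conversion is simplified for normal data (values below 2^-14 flush to zero, NaN is not preserved).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// float -> IEEE half with round-to-nearest-even.
inline std::uint16_t FloatToHalf(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const int exponent = int((bits >> 23) & 0xFFu) - 127 + 15;
    const std::uint32_t mantissa = bits & 0x7FFFFFu;
    if (exponent <= 0) return std::uint16_t(sign);             // too small: signed zero
    if (exponent >= 31) return std::uint16_t(sign | 0x7C00u);  // too large: infinity
    std::uint32_t half = sign | (std::uint32_t(exponent) << 10) | (mantissa >> 13);
    const std::uint32_t dropped = mantissa & 0x1FFFu;           // the 13 bits lost
    if (dropped > 0x1000u || (dropped == 0x1000u && (half & 1u)))
        ++half;  // a carry into the exponent is still the correctly rounded result
    return std::uint16_t(half);
}

// IEEE half -> float (subnormal halves treated as zero, matching the above).
inline float HalfToFloat(std::uint16_t h) {
    const std::uint32_t sign = std::uint32_t(h & 0x8000u) << 16;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;
    std::uint32_t bits;
    if (exponent == 0) bits = sign;
    else if (exponent == 31) bits = sign | 0x7F800000u | (mantissa << 13);
    else bits = sign | ((exponent - 15 + 127) << 23) | (mantissa << 13);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Two halves, laid out like one D3DDECLTYPE_FLOAT16_2 vertex element.
struct Half2 { std::uint16_t x, y; };

inline Half2 PackHalf2(float x, float y) { return Half2{FloatToHalf(x), FloatToHalf(y)}; }
```

A half keeps about 11 significant bits, so unit-length components round-trip with an error below 0.001, which is usually fine for lighting normals.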

Compressing Textures

Wide variety of texture compression tools

ATI Compressonator

DirectX SDK DDS tools

NVIDIA Photoshop DDS export

Compression tools for 360 (xgraphics.lib)

Supports endian swap of texture formats


Build your own too!

Make them fit your content.

Texture Formats

DXT* / DXGI_FORMAT_BC*

BC == Block Compressed

Standard DXT* formats across all platforms


DXN / DXGI_FORMAT_BC5 / BC5u

2-component format with 8 bits of precision per component

Great for normal maps

DXT3A / DXT5A

Single-component textures made from a DXT3/DXT5 alpha block

4 bits of precision

Xbox 360 / D3D9 Only
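Block-compressed formats store every 4x4 texel block in a fixed number of bytes (8 for DXT1/BC1, 16 for DXT3, DXT5 and DXN/BC5), so texture memory can be budgeted exactly in the data build. A minimal sketch; the function and constant names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kBytesPerBlockBC1 = 8;   // DXT1
constexpr std::uint32_t kBytesPerBlockBC5 = 16;  // DXN / BC5 (also DXT3, DXT5)

// One mip level; partial blocks at the edges still occupy a full block.
inline std::uint64_t BlockCompressedLevelBytes(std::uint32_t width, std::uint32_t height,
                                               std::uint32_t bytesPerBlock) {
    const std::uint64_t blocksWide = std::max(1u, (width + 3) / 4);
    const std::uint64_t blocksHigh = std::max(1u, (height + 3) / 4);
    return blocksWide * blocksHigh * bytesPerBlock;
}

// Full mip chain down to 1x1.
inline std::uint64_t BlockCompressedChainBytes(std::uint32_t width, std::uint32_t height,
                                               std::uint32_t bytesPerBlock) {
    std::uint64_t total = 0;
    for (;;) {
        total += BlockCompressedLevelBytes(width, height, bytesPerBlock);
        if (width == 1 && height == 1) break;
        width = std::max(1u, width / 2);
        height = std::max(1u, height / 2);
    }
    return total;
}
```

For example, the top level of a 1024x1024 DXT1 texture takes 524,288 bytes, versus 4,194,304 bytes for uncompressed 32-bit texels; the same size in BC5 takes 1,048,576 bytes.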

Texture Arrays

Texture arrays: a generalized version of cube maps

D3D9: emulate using a texture atlas

Xbox 360

Up to 64 surfaces within a texture, optional MIP maps for each surface

Surface is indexed with a [0..1] z coordinate in a 3D texture fetch

D3D10 supports this as a standard feature

Up to 512 surfaces within a texture

Bindable as a render target, with per-primitive array index selection
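For the D3D9 atlas fallback, each array slice becomes a tile and slice-local texture coordinates are remapped into the tile. A minimal sketch, assuming equal-sized tiles laid out row-major in a tilesX by tilesY grid; AtlasUV and SliceToAtlasUV are hypothetical names.

```cpp
#include <cassert>

struct AtlasUV { float u, v; };

// Maps a slice-local (u, v) in [0, 1] to coordinates in the atlas.
inline AtlasUV SliceToAtlasUV(int slice, float u, float v, int tilesX, int tilesY) {
    assert(tilesX > 0 && tilesY > 0 && slice >= 0 && slice < tilesX * tilesY);
    const int column = slice % tilesX;
    const int row = slice / tilesX;
    return AtlasUV{(column + u) / float(tilesX), (row + v) / float(tilesY)};
}
```

Unlike a true texture array, an atlas cannot use wrap addressing per tile, and bilinear filtering or smaller MIP levels can bleed across tile borders, so tiles usually need padding.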

Custom Vertex Fetch / Vertex Texture

D3D9 vertex texture implementations use intrinsics

tex2Dlod()


360 supports explicit instructions for this


D3D10 supports this as a standard feature

Load() from buffer (VB, IB, etc.) at any stage

Sample() from texture at any stage

Effects

D3DX FX and FX Lite coexist easily

#define around the texture sampler differences


Preshaders are not supported on FX Lite

We advise that these be optimized to native code for D3D9 Effects

HLSL Development

Set up your engine and tools for rapid shader development and iteration

Compile shaders offline for performance; maybe allow run-time recompilation during development

Be careful with shader generation tools

Perf needs to be considered

Schedule / plan work for this

Cross-Platform HLSL Considerations

Texture access instruction considerations

Xbox 360 has native tfetch / getWeights features

Constant texel offsets (-8.0 to 7.5 in 0.5 increments)

Independent of texture size

Direct3D 10 supports integer texture offsets when fetching

Direct3D 10 supports GetDimensions() natively

Provides the texture size needed to compute getWeights equivalents

Direct3D 9 can emulate tfetch & getWeights behavior using a shader constant for texture dimensions

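HLSL Example

// Shader constant holding 1 / texture size (here a 512 x 512 texture)
float2 g_invTexSize = float2( 1/512.0f, 1/512.0f );

// Emulates getWeights: fractional position within the current texel
float2 getWeights2D( float2 texCoord )
{
    return frac( texCoord / g_invTexSize );
}

// Emulates a constant texel offset on a texture fetch
float4 tex2DOffset( sampler t, float2 texCoord, float2 offset )
{
    texCoord += offset * g_invTexSize;
    return tex2D( t, texCoord );
}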

Shader management

Find a balance between übershaders and specialized shader libraries

Dynamic/static branching versus static compilation

Small shader libraries can be built and stored inside a single Effect file

One technique per shader configuration


Larger shader libraries

Hash table populated with configurations

Streaming code can load shader groups on demand

Profile-guided content generation

Avoid compiling shaders at run time

Compiled shaders compress very well
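For larger libraries, the hash table can be keyed by a bitmask of the features a configuration needs, filled as precompiled shader groups are loaded or streamed in. A minimal sketch; ShaderFeature, CompiledShader and ShaderLibrary are hypothetical, and CompiledShader stands in for a real precompiled shader object.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical feature bits that together select one shader configuration.
enum ShaderFeature : std::uint32_t {
    kFeatureSkinning  = 1u << 0,
    kFeatureNormalMap = 1u << 1,
    kFeatureShadows   = 1u << 2,
};

// Stand-in for a compiled shader loaded from disk.
struct CompiledShader {
    std::string name;
};

class ShaderLibrary {
public:
    // Called by loading / streaming code as precompiled shader groups arrive.
    void Add(std::uint32_t features, CompiledShader shader) {
        shaders_[features] = std::move(shader);
    }
    // Returns nullptr if the configuration was never built offline, so the
    // caller can fall back instead of compiling at run time.
    const CompiledShader* Find(std::uint32_t features) const {
        auto it = shaders_.find(features);
        return it == shaders_.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<std::uint32_t, CompiledShader> shaders_;
};
```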

Audio Considerations

XACT (Microsoft Cross-Platform Audio Creation Tool)

API and authoring tool parity: author once, deploy to both platforms

Primary difference = wave compression

ADPCM on Windows vs. Xbox 360 native XMA support

XMA: controllable quality setting (varies, typically ~6-14:1)

ADPCM: static ~3.5:1 compression

Likely need to trade memory for bit rate.

On Windows, can use hard disk streaming to balance lower compression rates if needed
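The memory trade-off is easy to estimate from the ratios above. A minimal sketch; PcmBytes and CompressedBytes are hypothetical helpers, and the XMA figure assumes a 10:1 setting picked from the quoted range.

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed PCM size in bytes.
constexpr std::uint64_t PcmBytes(std::uint32_t sampleRate, std::uint32_t channels,
                                 std::uint32_t bitsPerSample, std::uint32_t seconds) {
    return std::uint64_t(sampleRate) * channels * (bitsPerSample / 8) * seconds;
}

// Approximate size after compression at the given ratio (3.5 means 3.5:1).
constexpr double CompressedBytes(std::uint64_t pcmBytes, double ratio) {
    return double(pcmBytes) / ratio;
}
```

One minute of 44.1 kHz, 16-bit stereo is 10,584,000 bytes of PCM, about 3,024,000 bytes as ADPCM at 3.5:1, and about 1,058,400 bytes as XMA at 10:1, so the same sound costs roughly three times more memory on Windows unless it is streamed.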

Call To Action!

Design your games, engines and production systems with cross-platform development in mind

(PC / Xbox 360 / Other)


Invest in making your data build system fast

Take advantage of each platform's strengths

Target a D3D10 content design point and fall back to D3D9+, D3D9, …


Provide feedback on how we can make production easier

Attend the XACT, HLSL, SM4.0 and Performance Hands-On Labs

Questions?