MANTLE FOR DEVELOPERS


2 Dec 2013


JOHAN ANDERSSON


TECHNICAL DIRECTOR

FROSTBITE

ELECTRONIC ARTS


Simplify advanced development




Improve performance




Enable developers to innovate




Challenge the status quo


Mantle?

Control

GPU performance

CPU performance

Programmability

Platforms

Developer impact areas

Traditional Model: Black Box

Middle-ground abstraction: compromise between performance & “usability”

Hidden resource memory & state

Resource CPU access tied to device context

Driver analyzes & synchronizes implicitly


Explicit Model: Mantle

Thin low-level abstraction to expose how hardware works

App explicit memory management

Resources are globally accessible

App explicit resource state transitions




Control

New model


Tell when render target will be used as a texture


And many more resource state transitions



Don’t destroy resources that GPU is using


Keep track with fences or frames



Manual dynamic resource renaming


No DISCARD for driver resource renaming



Resource memory tiling



Powerful validation layer will help!













App responsibility

Control
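A minimal sketch of the "keep track with fences" idea above, assuming the app tags each retired resource with the fence value of the last submission that used it. `DeferredDeleter`, `retire` and `collect` are hypothetical names for illustration and not part of Mantle or any other API; the fence is modelled as a plain counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical helper: defers resource destruction until the GPU has passed
// the fence value that was current when the resource was retired.
class DeferredDeleter {
public:
    // Queue `destroy` to run once the GPU has completed `fenceValue`.
    // Fence values must be retired in non-decreasing order.
    void retire(std::uint64_t fenceValue, std::function<void()> destroy) {
        assert(pending_.empty() || pending_.back().first <= fenceValue);
        pending_.emplace_back(fenceValue, std::move(destroy));
    }

    // Run every destructor whose fence the GPU has already passed.
    // `completedFence` stands in for the value read back from a real fence.
    std::size_t collect(std::uint64_t completedFence) {
        std::size_t destroyed = 0;
        while (!pending_.empty() && pending_.front().first <= completedFence) {
            pending_.front().second();
            pending_.pop_front();
            ++destroyed;
        }
        return destroyed;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::deque<std::pair<std::uint64_t, std::function<void()>>> pending_;
};
```

Calling `collect` once per frame with the latest completed fence value would be one way to use this; tracking by frame index instead of fence value works the same way.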


App high-level decisions & optimizations


Has full scene information


Easier to optimize performance & memory



Flexible & efficient memory management


Linear frame allocators


Memory pools


Pinned memory



Reduced development time


For advanced game engines & apps


Easier to get to target performance & robustness







Explicit control enables

Control
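As an illustration of the "linear frame allocators" bullet, here is a small sketch that hands out offsets into one large block the app owns. It is a CPU-side model under assumed names (`LinearFrameAllocator`, `kOutOfMemory`); how the backing memory is actually obtained from the API is left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: sub-allocates byte offsets from one large memory block
// and frees everything at once when the frame is done.
class LinearFrameAllocator {
public:
    static constexpr std::size_t kOutOfMemory = static_cast<std::size_t>(-1);

    explicit LinearFrameAllocator(std::size_t capacity) : capacity_(capacity) {}

    // Returns an offset aligned to `alignment` (a power of two), or kOutOfMemory.
    std::size_t allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t aligned = (head_ + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity_ || size > capacity_ - aligned) return kOutOfMemory;
        head_ = aligned + size;
        return aligned;
    }

    // Call only once the GPU has finished the frame that used these allocations.
    void reset() { head_ = 0; }

    std::size_t used() const { return head_; }

private:
    std::size_t capacity_;
    std::size_t head_ = 0;
};
```

Allocation is a pointer bump and freeing is a single reset, which is why this pattern suits per-frame data such as constants and dynamic vertex data.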




Light-weight driver


Easier to develop & maintain


Reduced CPU draw call overhead





Transient resources


Alias render targets within frame


Major memory savings


No need to pre-allocate everything





Explicit control enables

Control
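One way to picture "alias render targets within frame": targets whose lifetimes do not overlap can share the same memory. The sketch below is a simplified greedy assignment under stated assumptions: `TransientTarget` and `assignAliasSlots` are hypothetical names, lifetimes are given as render pass indices, and all targets are assumed to be the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a transient render target: it is live only
// between the first and last render pass (inclusive) that touches it.
struct TransientTarget {
    int firstPass;
    int lastPass;
};

// Greedy sketch: give each target the first memory slot whose previous
// occupants are no longer live. Targets that share a slot alias the same memory.
std::vector<int> assignAliasSlots(const std::vector<TransientTarget>& targets) {
    std::vector<int> slotBusyUntil;  // last pass that uses each slot
    std::vector<int> slotOf(targets.size());
    for (std::size_t i = 0; i < targets.size(); ++i) {
        assert(targets[i].firstPass <= targets[i].lastPass);
        int slot = -1;
        for (std::size_t s = 0; s < slotBusyUntil.size(); ++s) {
            if (slotBusyUntil[s] < targets[i].firstPass) {
                slot = static_cast<int>(s);
                break;
            }
        }
        if (slot < 0) {
            slot = static_cast<int>(slotBusyUntil.size());
            slotBusyUntil.push_back(0);
        }
        slotBusyUntil[static_cast<std::size_t>(slot)] = targets[i].lastPass;
        slotOf[i] = slot;
    }
    return slotOf;
}
```

With four targets live over passes 0-1, 1-2, 2-3 and 4-5, only two slots are needed instead of four, which is where the memory saving comes from.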

CPU performance

Control

CPU perf



Descriptor sets



Monolithic pipelines



Command buffers


Core concepts


Table with resource references to bind to graphics or compute pipeline






Replaces traditional resource stage binding


Major performance & flexibility advantage


Closer to how the hardware works



App managed: lots of strategies possible!

Tiny vs huge sets

Single vs multiple

Static vs semi-static vs dynamic





Example 1: Single simple dynamic descriptor set


Bind everything you need for a single draw call


Close to DX/GL model but share between stages

Descriptor sets

CPU perf

[Figure: dynamic descriptor set with slots VertexBuffer (VS), Texture0 (VS+PS), Constants (VS), Texture1 (PS), Texture2 (PS), Sampler0 (VS+PS); legend: Link, Sampler, Image, Memory]







Example 2: Reuse static set with nesting


Reduce update time & memory usage

Descriptor sets

CPU perf

[Figure: a dynamic descriptor set and a static descriptor set connected through Link entries. Entries shown: Constants (VS), Texture3 (PS), Texture4 (PS), Sampler0 (VS+PS), Texture2 (PS), Texture1 (PS), Sampler1 (PS), VertexBuffer (VS), Texture0 (VS+PS); legend: Link, Sampler, Image, Memory]
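The nesting in Example 2 can be modelled on the CPU as a table whose slots either reference a resource or link to another table. This is an illustrative sketch only: `Slot`, `DescriptorSet` and `flatten` are hypothetical names, and resources are represented by strings rather than real handles.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct DescriptorSet;

// A slot either names a resource or links to a nested descriptor set.
struct Slot {
    std::string resource;                       // used when `link` is null
    std::shared_ptr<const DescriptorSet> link;  // nested set, or null
};

struct DescriptorSet {
    std::vector<Slot> slots;
};

// Follow links depth-first and list every resource reachable from `set`.
void flatten(const DescriptorSet& set, std::vector<std::string>& out) {
    for (const Slot& slot : set.slots) {
        if (slot.link) {
            flatten(*slot.link, out);
        } else {
            assert(!slot.resource.empty() && "slot must name a resource or link a set");
            out.push_back(slot.resource);
        }
    }
}
```

The static set is built once and shared; per draw, only the small dynamic set with its link needs updating, which matches the "reduce update time & memory usage" point.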

CPU perf


Shader stages & select graphics state combined into single object


No runtime compilation or patching needed!


Significantly less runtime overhead to use



Supports parallel building & caching


Fast loading times



Usage & management up to the app


Static vs dynamic creation


Number of pipelines


State usage




Monolithic pipelines



[Figure: pipeline state spanning the stages IA, VS, HS, DS, Tessellator, GS, RS, PS, DB, CB]
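Since pipeline usage and caching are left to the app, one possible strategy is a cache keyed by the full pipeline description. The sketch below assumes a deliberately tiny description (`PipelineDesc` with two shader names and one state flag); these types and `PipelineCache` are hypothetical, and the expensive build step is reduced to assigning an id.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical, heavily reduced pipeline description.
struct PipelineDesc {
    std::string vertexShader;
    std::string pixelShader;
    bool depthTest;

    bool operator<(const PipelineDesc& o) const {
        return std::tie(vertexShader, pixelShader, depthTest) <
               std::tie(o.vertexShader, o.pixelShader, o.depthTest);
    }
};

struct Pipeline {
    int id;  // stands in for the compiled monolithic pipeline object
};

class PipelineCache {
public:
    // Returns the cached pipeline for `desc`, building it on first use.
    const Pipeline& get(const PipelineDesc& desc) {
        assert(!desc.vertexShader.empty());
        auto it = cache_.find(desc);
        if (it == cache_.end()) {
            ++builds_;
            it = cache_.emplace(desc, Pipeline{static_cast<int>(cache_.size())}).first;
        }
        return it->second;
    }

    std::size_t builds() const { return builds_; }

private:
    std::map<PipelineDesc, Pipeline> cache_;
    std::size_t builds_ = 0;
};
```

Because every input that affects the compiled result is part of the key, a cache hit needs no runtime compilation or patching; the same table could also be filled ahead of time at load.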


Issue pipelined graphics & compute commands into a command buffer


Bind graphics state, descriptor sets, pipeline


Draw calls


Render targets


Clears


Memory transfers


NOT: resource mapping



Fully independent objects


Create multiple every frame


Or pre-build up front and reuse




Command buffers

CPU perf

Automatically extracts parallelism out of most apps

Doesn’t scale beyond 2-3 cores

Additional latency

Driver thread often bottleneck, can collide with app threads

[Figure: timeline across CPU 0, CPU 1 and CPU 2 with blocks labelled Game, Render and Driver Render]

CPU perf

DX/GL parallelism

App can go fully wide with its rendering: minimal latency

Close to linear scaling with CPU cores

No driver threads: no overhead, no contention

Frostbite’s approach on all consoles, and on PC with Mantle!

[Figure: timeline across CPU 0 to CPU 4 with blocks labelled Game and Render]

CPU perf

Parallel dispatch with Mantle
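A rough sketch of going "fully wide": every worker thread records into its own command buffer, so recording needs no locks, and the buffers are then submitted in a fixed order. `CommandBuffer` is modelled as a list of strings and `recordInParallel` is a hypothetical function; a real renderer would record API commands and use a job system rather than raw threads.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Stand-in for an API command buffer: just a list of recorded command names.
using CommandBuffer = std::vector<std::string>;

// Record `drawCount` draws split across `workerCount` threads. Each worker
// owns exactly one command buffer, so no synchronization is needed while recording.
std::vector<CommandBuffer> recordInParallel(std::size_t drawCount, std::size_t workerCount) {
    assert(workerCount > 0);
    std::vector<CommandBuffer> buffers(workerCount);
    std::vector<std::thread> workers;
    const std::size_t perWorker = (drawCount + workerCount - 1) / workerCount;
    for (std::size_t w = 0; w < workerCount; ++w) {
        workers.emplace_back([&buffers, w, perWorker, drawCount] {
            const std::size_t begin = w * perWorker;
            const std::size_t end =
                begin + perWorker < drawCount ? begin + perWorker : drawCount;
            for (std::size_t i = begin; i < end; ++i) {
                buffers[w].push_back("draw " + std::to_string(i));
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return buffers;  // submit in index order to keep the frame deterministic
}
```

The draw order within the frame is fixed by the submission order of the buffers, not by which thread finishes first.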

GPU performance

CPU performance

GPU perf


Thanks to improved CPU performance

CPU will rarely be a bottleneck for the GPU

CPU could help GPU more:

Less brute force rendering

Improve culling


Shader pipeline object: driver optimizations

Can optimize with pipeline state knowledge

Can optimize across all shader stages


Resource states

Gives driver a lot more knowledge & flexibility

Apps can avoid expensive/redundant transitions, such as surface decompression


Expose existing GPU functionality

Quad & Rect-lists

HW-specific MSAA & depth data access

Programmable sample patterns

And more...



GPU optimizations



Modern GPUs are heterogeneous machines with multiple engines


Graphics pipeline


Compute pipeline(s)


DMA transfer


Video encode/decode


More…



Mantle exposes queues for the engines + synchronization primitives



Queues

GPU perf

[Figure: GPU with its queues: Graphics, Compute, DMA, ...]






Async DMA transfers

Copy resources in parallel with graphics or compute


Queue use cases

GPU perf

[Figure: Graphics and DMA queue timelines with blocks labelled Render, Other render, Use copy and Copy]


Async compute together with graphics

ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units

[Figure: Graphics and Compute queue timelines with blocks labelled GBuffer, Shadowmap 0, Shadowmap 1, Final lighting and Non-shadowed lighting]


Multiple compute kernels collaborating

Can be faster than über-kernel

Example: Compute geometry backend & compute rasterizer

[Figure: Compute 0, Compute 1 and Graphics queue timelines with blocks labelled Compute Geometry, Compute Rasterizer and Ordinary Rendering]


Compute as frontend for graphics pipeline

Compute runs asynchronously ahead and prepares & optimizes geometry for graphics pipeline


Game engines will build large GPU job graphs

Move away from single sequential submission

Just as we already have done on CPU

[Figure: Compute and Graphics queue timelines with blocks labelled Process0, Process1, Draw0, Draw1 and Draw2]
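To make the "GPU job graph" idea concrete, the sketch below orders jobs so that each one is submitted after the jobs it waits on, using Kahn's topological sort. `GpuJob`, `Queue` and `submissionOrder` are hypothetical names, and any dependencies between passes fed into it are assumptions of the caller, not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

enum class Queue { Graphics, Compute, Dma };

struct GpuJob {
    std::string name;
    Queue queue;                       // engine the job would be submitted to
    std::vector<std::size_t> waitsOn;  // indices of jobs that must finish first
};

// Kahn's algorithm: returns an order in which every job comes after the jobs
// it waits on. In a real renderer, each dependency that crosses queues would
// map to a synchronization primitive between those queues.
std::vector<std::size_t> submissionOrder(const std::vector<GpuJob>& jobs) {
    std::vector<std::size_t> remaining(jobs.size(), 0);
    std::vector<std::vector<std::size_t>> dependents(jobs.size());
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        for (std::size_t dep : jobs[i].waitsOn) {
            assert(dep < jobs.size());
            dependents[dep].push_back(i);
            ++remaining[i];
        }
    }
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        if (remaining[i] == 0) ready.push(i);
    }
    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t job = ready.front();
        ready.pop();
        order.push_back(job);
        for (std::size_t next : dependents[job]) {
            if (--remaining[next] == 0) ready.push(next);
        }
    }
    assert(order.size() == jobs.size() && "dependency cycle in job graph");
    return order;
}
```

Jobs that are ready at the same time and target different queues are the ones that can overlap on the GPU, which is the point of moving away from a single sequential submission.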

GPU performance

Programmability

Programmability


Explicit control of GPU queues and synchronization, finally!

Implement your own Alternate-Frame-Rendering

Or something more exotic...


Use case: Workstation rendering with 4-8 GPUs

Super high-quality rendering & simulation

Load balance graphics & compute job graphs across GPUs

20-40 TFlops in a single machine!


Use case: Low-latency rendering

Important for VR and competitive games

Latency optimized GPU job graph scheduling

VR: Simultaneously drive 2 GPUs (1 per eye)


Explicit Multi-GPU



Programmability


Command buffer predication & flow control


GPU affecting/skipping submitted commands


Go beyond DrawIndirect / DispatchIndirect


Advanced variable workloads


Advanced culling optimizations





Write occlusion query results into GPU buffer


No CPU roundtrip needed


Can drive predicated rendering


Or use results directly in shaders (lens flares)


New mechanisms

Programmability
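The flow from occlusion query to predicated draw can be modelled on the CPU as follows. This is only a conceptual sketch: `Command` and `execute` are hypothetical, the "GPU buffer" is a plain vector, and the loop plays the role of the GPU consuming an already submitted command buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU-side model only; in a real API these would be GPU commands and GPU memory.
struct Command {
    enum Kind { WriteOcclusionResult, Draw } kind;
    std::size_t slot;              // index into the results buffer
    std::uint32_t visibleSamples;  // used by WriteOcclusionResult
    bool predicated;               // Draw: skip when results[slot] == 0
};

// Walks the command list in order: occlusion results land in `results`, and
// later predicated draws read them back without any CPU roundtrip.
// Returns how many draws were actually executed.
std::size_t execute(const std::vector<Command>& commands,
                    std::vector<std::uint32_t>& results) {
    std::size_t drawsExecuted = 0;
    for (const Command& c : commands) {
        assert(c.slot < results.size());
        if (c.kind == Command::WriteOcclusionResult) {
            results[c.slot] = c.visibleSamples;
        } else if (!c.predicated || results[c.slot] != 0) {
            ++drawsExecuted;
        }
    }
    return drawsExecuted;
}
```

The CPU submits all draws up front and never reads the query results; whether a draw runs is decided at execution time from the buffer contents.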


Mantle supports bindless resources

Shaders can select resources to use instead of static binding from CPU

Extension of the descriptor set support


Key component that will open up a lot of opportunities!


Examples

Performance optimizations: less data to update


Logic & data structures that live fully on the GPU


Scene culling & rendering


Material representations


Deferred shading


Raytracing


Bindless resources

Programmability

Platforms


Mantle gives us strong benefits on Windows today

Console-like performance & programmability on both Windows 7 and Windows 8

For us, well worth the dev time!


DX & GL are the industry standards

Needed for platforms that do not support Mantle

Needed by devs who do not want/need more control

Have to have fallback paths for GL/DX, but not limit oneself to them



Mantle and PlayStation 4 will drive our future Frostbite designs & optimizations


PS4 graphics API has great programmability & performance as well


Share concepts, methods & optimization strategies








Today

Platforms


Want to see Mantle on Linux and Mac!


Would enable support for our full engine & rendering


Significantly easier to do an efficient renderer with Mantle than with OpenGL



Use cases:


Workstations


R&D


Not limited by WDDM


Games



Mantle + SteamOS = powerful combination!



Linux & Mac

Platforms


Mobile architectures are getting closer in capabilities to desktop GPUs



Want graphics API that allows apps to fully utilize the hardware


Power efficient


High performance


Programmable




Major opportunity with Mantle: leapfrog GL4, DX11


For mobile SoC vendors


For Google and Apple



Mobile

Platforms


Mantle is designed to be a thin hardware abstraction


Not tied to AMD’s GCN architecture


Forward compatible


Extensions for architecture- and platform-specific functionality



Mantle would be a much more efficient graphics API for other vendors as well


Most Mantle functionality can be supported on today’s modern GPUs



Want to see future version of Mantle supported on all platforms and on all modern GPUs!


Become an active industry standard with IHVs and ISVs collaborating


Enable us developers to innovate with great performance & programmability everywhere



Multi-vendor?

Platforms

Platforms


Mantle support is in development


Core renderer (closer to PS4 than DX11)


Implement all rendering techniques used in BF4 (many!)


CPU optimizations (parallel dispatch, descriptor sets)


GPU optimizations (minimize transitions, MSAA)


R&D for advanced GPU optimizations


Memory management


Multi-GPU support


~2 months of work



Update targeting late December

Battlefield 4

Frostbite


Very different rendering compared to BF4


Frostbite Mantle renderer will work out of the box



Focus on APU performance



Plants vs Zombies: Garden Warfare

Frostbite


All Frostbite games designed with Mantle


15 games in development across all of EA



Advanced Mantle rendering & use cases


Lots of exciting R&D opportunities!



Want multi-vendor & multi-platform support!

Future

Frostbite

THE END

Email: repi@dice.se

Web: http://frostbite.com

Twitter: @repi