LOD Case Study & Application




Robert Huebner

Nihilistic Software

innerloop@nihilistic.com


Speaker Bio


President and Director of Technology for
Nihilistic Software


Currently working on “Starcraft: Ghost” for
Blizzard Entertainment


Previous credits include Vampire: The
Masquerade, Jedi Knight: Dark Forces 2, Descent


International Game Developers Association (IGDA)
Board Member


www.igda.org



Game Developers Conference (GDC)
Advisory Board

Purpose of Talk


Review some of the topics and ideas
presented earlier in the course


Try to explain what worked for us, and
what didn’t


This talk is a “case study in progress”
for our current Gamecube and XBOX
work


Still tweaking and changing some LOD
schemes

Starcraft: Ghost

(needs LOD too!)

Goal of LOD


Back on pre-3D-hardware PCs, we would
spend a LOT of CPU to avoid drawing a few
triangles


The cost of rendering was much higher


We were willing to spend significant CPU to
eliminate a single triangle


Systems like ROAM, view-dependent LOD


Current hardware renders fast, so we only
spend CPU if we can discard a lot of triangles


Or if it saves us state changes, texture fetches,
memory bandwidth, or other costly processing

General Block Diagram

[Diagram labels: RAM, CPU, FIFO, GPU, Vertex Unit, Pixel Unit, Texture Mem, Frame buffer]

Data Flow Management


Managing data flow and bandwidth is
an important factor in performance


Each platform has a different
architecture


So our choice of LOD differs for each
platform


Each main data path can utilize
different LOD techniques to increase
throughput


We try to do this without wasting CPU or
memory resources, which are also scarce

Where Do We Use LOD?

[Diagram labels: RAM, CPU, FIFO, GPU, Vertex Unit, Pixel Unit, Texture Mem, Framebuffer]

Classes of Game LOD


The design of most console systems is
dominated by three data paths:


The RAM->GPU path and GPU throughput
are managed with geometric LOD


The GPU->Framebuffer path is managed
via shader LOD


The Texture->GPU path is managed with
MIP-mapping and shader LOD

Games Vs. Research


The biggest problems we run into
when adapting academic LOD
systems to game use are:


Dealing with additional properties of
meshes


Vertex normals, texture, UV coordinates, etc.


Avoiding the need for general-purpose
processing at the vertex level


Maintaining data in a format that our
hardware can process directly


Runtime Selection


In our engine, all LOD processing for
a given object is driven by a single
value


The LOD value is stored both as a float
(0.0 to 1.0) and as a discrete BYTE (1..X)


Each sub-system that wants to do LOD
can use either version of the LOD metric
to control behavior


Runtime Selection


The LOD metric is stored for each object or
“sector” (world section)


Based on many factors (highest to lowest
weight)


Estimated screen space (size / distance)


Overall performance or estimated triangle counts
for scene (scene metric)


Current player control mode (interact or cutscene,
combat or stealth)


“Importance” of the object (active AI vs. inactive
AI)


Viewing angle for terrain blocks
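
A minimal sketch of how a single per-object LOD value could be derived and stored in both forms. The weights, the `LodInputs` fields, the number of levels, and the convention that 1.0 means full detail are all assumptions for illustration; the slides only give the ordering of the factors, and only a few of them are modeled here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LodInputs:
    radius: float      # bounding-sphere radius of the object (hypothetical field)
    distance: float    # distance from the camera to the object
    scene_load: float  # 0.0 = light scene, 1.0 = scene is over its triangle budget
    importance: float  # 0.0 = inactive AI, 1.0 = active AI


def lod_metric(inp: LodInputs, levels: int = 4) -> Tuple[float, int]:
    """Return (float LOD in 0..1, discrete level in 1..levels); 1.0 / level 1 = full detail."""
    # Estimated screen space: size / distance, clamped to 0..1.
    screen = min(1.0, inp.radius / max(inp.distance, 1e-6))
    # Assumed weights, ordered from highest to lowest as in the factor list.
    value = 0.6 * screen + 0.25 * (1.0 - inp.scene_load) + 0.15 * inp.importance
    value = max(0.0, min(1.0, value))
    # Discrete form of the same metric for sub-systems that switch in steps.
    level = min(levels, 1 + int((1.0 - value) * levels))
    return value, level
```

Sub-systems that blend smoothly would read the float, while those that switch between pre-built assets would read the discrete level.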

Geometric LOD


Geometric LOD is the most interesting &
complex topic for games


There are three main goals we try to
achieve with geometric LOD:


Send less data to the GPU to avoid exceeding its
throughput


Utilize less bus bandwidth moving data into the
graphics unit


Try to achieve a constant average triangle size to
balance load between vertex and pixel units

Compiled Models


Most game engines are constructed to
load “compiled” models


Vertex data is adjusted to match native
format


Triangles are batched to minimize state
changes and fit within hardware limits


Optimum strips are constructed


DisplayLists/Pushbuffers are compiled


Compiled models are highly platform-specific

Basic LOD Choices


Based on platform specifics, we select
a simple half-edge collapse operation
as the basis of our LOD


Minimizes memory use, vertex data
remains unchanged


Minimizes dynamically changing vertex
data, which minimizes bandwidth & FIFO
space


Allows us to address problems with
property discontinuities

Calculating LOD


We perform all our LOD computation off-line
during model compilation


We offer the artists a choice of LOD metric to use
when computing automatic LOD levels


We chose an LOD scheme that is based on
half-edge collapse operations only


Less memory, more static data set


The LOD is constructed based on edge score


Each edge in the model is given a score based on
its length, curvature, or other factors


Vertices are also given scores to control which
endpoint is preserved during the edge collapse

Calculating LOD


We begin by building an augmented
“collapse vertex” structure for the model


Links to neighbor verts (edges)


Links to associated faces


Link and score of “least cost” edge


Identification of “border” or “seam” verts


Links to “paired” verts


Links to the actual “render” vertices


This process happens after vertices are split
due to texture/normal/UV changes


This means one collapse vertex can be linked to
multiple “export” vertices
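
The structure above might look roughly like the following sketch. The field names and the rule of welding render vertices by identical position are assumptions, not the engine's actual layout; border and paired-vertex detection is left as plain fields to be filled in by later passes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollapseVertex:
    position: tuple
    neighbors: List[int] = field(default_factory=list)     # links to neighbor verts (edges)
    faces: List[int] = field(default_factory=list)         # links to associated faces
    least_cost_neighbor: Optional[int] = None               # link of the "least cost" edge
    least_cost: float = float("inf")                        # score of that edge
    is_border: bool = False                                  # "border" or "seam" vert
    paired: List[int] = field(default_factory=list)         # links to "paired" verts
    render_verts: List[int] = field(default_factory=list)   # links to render vertices


def build_collapse_verts(render_positions, triangles):
    """Weld render vertices that share a position into collapse vertices.

    `triangles` index render vertices, i.e. vertices already split for
    texture/normal/UV changes, so one collapse vertex may own several of them.
    """
    cverts, by_pos, render_to_collapse = [], {}, []
    for r, pos in enumerate(render_positions):
        key = tuple(pos)
        if key not in by_pos:
            by_pos[key] = len(cverts)
            cverts.append(CollapseVertex(position=key))
        c = by_pos[key]
        cverts[c].render_verts.append(r)
        render_to_collapse.append(c)
    for f, tri in enumerate(triangles):
        corners = [render_to_collapse[r] for r in tri]
        for i, c in enumerate(corners):
            cverts[c].faces.append(f)
            for other in (corners[(i + 1) % 3], corners[(i + 2) % 3]):
                if other != c and other not in cverts[c].neighbors:
                    cverts[c].neighbors.append(other)
    return cverts, render_to_collapse
```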

Calculating LOD


We add game-specific restrictions to LOD


Either adjust the vertex score, exempt it entirely,
or link its removal to that of another vertex


Texture or UV mapping “seams” due to
composited textures


Vertex normal discontinuities (hard edge)


Unpaired edges


Artist influence (blind vertex data in Maya)


We also use domain-specific knowledge to
adjust the scoring algorithm


Terrain blocks use z (height) differential as main
score factor


Shadow/collision LOD ignores texture/UV seams

Calculating LOD


Once we have a full set of edge scores, we
select the least cost edge and remove its
least cost vertex


Half-edge collapse to the higher-cost endpoint


Record the operation in fields in our underlying
data


Remove degenerate triangles


Re-compute all edge costs in neighboring
triangles


Repeat until only non-collapsible edges remain
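
The loop above can be sketched as follows. This is a simplified illustration, not the engine's code: the edge score is plain edge length, vertex scores are passed in by the caller, `locked` stands in for exempt vertices, and all edge costs are rescanned on every step rather than only those in neighboring triangles.

```python
import math


def compute_collapses(positions, triangles, vertex_cost, locked=frozenset()):
    """Return (collapse_order, collapse_to) lists, one entry per vertex.

    Vertices that are never removed keep None in both lists.
    """
    tris = [tuple(t) for t in triangles]
    collapse_order = [None] * len(positions)  # step at which the vertex is removed
    collapse_to = [None] * len(positions)     # surviving endpoint it collapses onto
    step = 0
    while True:
        best = None
        for tri in tris:
            for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
                movable = [v for v in (a, b) if v not in locked]
                if not movable:
                    continue  # non-collapsible edge
                cost = math.dist(positions[a], positions[b])
                if best is None or cost < best[0]:
                    # Remove the lower-cost endpoint, keep the higher-cost one.
                    removed = min(movable, key=lambda v: vertex_cost[v])
                    kept = b if removed == a else a
                    best = (cost, removed, kept)
        if best is None:
            break  # only non-collapsible edges remain
        _, removed, kept = best
        collapse_order[removed] = step
        collapse_to[removed] = kept
        step += 1
        # Half-edge collapse: redirect the removed vertex, then drop degenerates.
        remapped = [tuple(kept if v == removed else v for v in t) for t in tris]
        tris = [t for t in remapped if len(set(t)) == 3]
    return collapse_order, collapse_to
```

Because the collapse only redirects indices, no vertex positions are created or moved, which is what keeps the vertex data static.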

Note on quality


Our reduction and scoring system is simple,
but accuracy suffers


Because of this, we have found that the last 10%
or so of the collapse operations are judged by
artists as being unsatisfactory


We allow the export process to specify some
control over the quality


Limit on the maximum cost collapse that will be
executed (default excludes about 10% of
operations)


Object-specific tweaks to the computed LOD
factor

Calculating LOD


The results of this operation are two new
data fields in our renderable vertex
structure


The “collapseOrder” field gives the ordering of the
collapse operation


The “collapseTo” field is the destination vertex for
the edge collapse operation that removes this
vertex from the mesh


Using these fields, we can export the LOD in
various ways in the final compilation


Since the LOD metrics are all export-side,
we can adopt improvements periodically
without affecting run-time data


Just re-export to get benefits of better reduction

Discrete LOD


Discrete LOD is still the workhorse of game
mesh LOD


Each level can undergo heavy pre-processing for
strip-ordering or displaylist creation


Artists can hand-tune the reduction for visual
accuracy


Can optionally replace both vertices and index
lists, or just indices to save memory


We represent discrete LOD by loading
multiple sets of face index lists, or separate
“index buffers”


Vertex data is unchanged

Exporting Discrete LOD


We can use our computed data to export
any number of discrete LOD steps


Pick a desired number of vertices for the LOD level


Calculate how many collapse operations will reach
this level


Build an indexed ordering for the mesh


For any vertex with a “collapseOrder” value lower
than the # of operations, replace its index with its
“collapseTo” index


Repeat until a vertex is reached that has a higher
collapseOrder field


Process each index ordering for strips &
cache coherency, create packets, etc.
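
As a rough sketch of the export steps above, the function below builds one discrete index list from the two per-vertex fields. It assumes `collapse_order` holds None for vertices that are never removed and that each `collapse_to` target is removed later than the vertex pointing at it; the function name and argument layout are illustrative.

```python
def export_discrete_lod(triangles, collapse_order, collapse_to, num_collapses):
    """Return the triangle list after applying the first `num_collapses` operations."""

    def resolve(v):
        # Follow collapseTo links until reaching a vertex that is still
        # present at this level (higher collapseOrder, or never removed).
        while collapse_order[v] is not None and collapse_order[v] < num_collapses:
            v = collapse_to[v]
        return v

    out = []
    for tri in triangles:
        mapped = tuple(resolve(v) for v in tri)
        if len(set(mapped)) == 3:  # drop triangles that became degenerate
            out.append(mapped)
    return out
```

Each returned list would then go through the usual strip and cache-coherency processing before being stored as its own index buffer.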

Discrete Blended LOD


To minimize “popping” that occurs during
the LOD switch, we can use image-space
blending


When an object needs to change between discrete
LOD levels, it is queued for blending


During blending, the object is actually rendered
twice, at both LOD levels, and the alpha values are
cross-faded


In practice, we find this is useful for larger
objects or terrain blocks, but not useful for
typical models
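
A small sketch of the cross-fade bookkeeping described above. The class name, the linear fade, and the half-second default blend time are assumptions; the actual alpha blending happens in the renderer, which is outside this example.

```python
class BlendedLod:
    """Tracks one object's discrete LOD level and any blend in progress."""

    def __init__(self, level, blend_time=0.5):
        self.level = level            # level currently shown
        self.target = level           # level being blended toward
        self.t = 0.0                  # blend progress, 0..1
        self.blend_time = blend_time  # seconds for a full cross-fade (assumed)

    def request(self, level):
        # Queue a blend only if none is already running.
        if self.target == self.level and level != self.level:
            self.target = level
            self.t = 0.0

    def update(self, dt):
        """Advance the blend and return [(level, alpha)] passes to draw this frame."""
        if self.target == self.level:
            return [(self.level, 1.0)]
        self.t += dt / self.blend_time
        if self.t >= 1.0:
            self.level = self.target
            return [(self.level, 1.0)]
        # Mid-blend: the object is rendered twice with cross-faded alpha.
        return [(self.level, 1.0 - self.t), (self.target, self.t)]
```

The doubled draw cost during the blend is one plausible reason the technique pays off mainly for large objects and terrain blocks.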



Continuous LOD


Continuous LOD can be an effective
extension to discrete LOD for games


Reductions with greater granularity can
avoid visible “popping”


It can also save memory compared to
storing a high number of discrete levels


Our continuous implementation is
based mainly on half-edge collapse


This is the best way to keep our data
static

CLOD Implementation


To implement run-time CLOD, what
we’re effectively doing is moving our
off-line creation of discrete LOD index
lists to the run-time engine


To save memory, we re-order vertices in
order of their “collapseOrder” field


We export a separate parallel array to
contain the “collapseTo” index for each
vertex

CLOD Runtime


At run-time, we select a desired number of
vertices and repeat the recursive collapse
process


Each index is replaced with its collapseTo until a
value less than the desired size is reached


For efficiency, we re-order our original index
list in reverse-collapse order


This allows us to stop when the first degenerate
triangle is detected during the collapse process


The result is a new indexing of the mesh
with the precise number of vertices
requested


Result is cached in our model instance data
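
The run-time step can be sketched as below, under two assumptions about the exported data: vertices are sorted so that the longest-surviving vertex has index 0 (so `collapse_to[v] < v` for every removable vertex), and triangles are sorted so that those destroyed first come last. The function name is illustrative.

```python
def clod_reindex(triangles, collapse_to, vertex_count):
    """Build a flat index list that uses only vertices 0..vertex_count-1.

    `vertex_count` must be at least the number of vertices that never collapse.
    """
    indices = []
    for tri in triangles:
        mapped = []
        for v in tri:
            # Follow collapseTo until the index falls inside the desired size.
            while v >= vertex_count:
                v = collapse_to[v]
            mapped.append(v)
        if len(set(mapped)) < 3:
            # Triangles are in reverse-collapse order, so every remaining
            # triangle is degenerate as well and the loop can stop here.
            break
        indices.extend(mapped)
    return indices
```

The returned list is what would be cached in the model instance and reused until the requested vertex count changes.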



CLOD Advantages


This method maps moderately well to
console needs


The vertex data remains static and
indexable


Re-indexing can be cached over multiple
frames to amortize costs


Minimal storage costs above cost of
storing basic model data


Fixed cost of 2 bytes per vert


Can actually be more memory-efficient than
discrete LOD, but not by a lot

CLOD Disadvantages


The biggest challenge with CLOD is to optimize the
index ordering


Normally we perform intense, off-line strip
generation to achieve this


With an index list that could change every frame,
we aren’t able to spend time generating strips


We can still “compile” displaylists, etc. but at
some additional cost


Skip strips and similar techniques of partial-strip
buffering can help address these concerns


Exploit the fact that most of the model remains
unchanged after each step

Non-Geometric LOD

Vertex Shader LOD


Vertex “shader” refers to the processing
path required to setup each vertex in the
scene


Newer PC and console hardware allow for
extremely complex vertex operations
including transformation, blending, and
lighting


The throughput of the GPU in verts/sec
varies by orders of magnitude depending
on the processing required


Un-textured, un-lit = 30M V/s


Dual-texture, 4 Lights = 9M V/s
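
To put the two throughput figures in per-frame terms, here is a small worked calculation. The 60 frames per second target is an assumption made for the example; the slides do not state a frame rate.

```python
def verts_per_frame(verts_per_second, fps=60):
    """Peak vertex budget for one frame at the given frame rate."""
    return verts_per_second // fps


unlit_budget = verts_per_frame(30_000_000)  # un-textured, un-lit path
lit_budget = verts_per_frame(9_000_000)     # dual-texture, 4-light path
print(unlit_budget, lit_budget)             # 500000 150000
```

Under that assumption, the more expensive vertex path cuts the per-frame budget from 500,000 to 150,000 vertices, which is why simplifying vertex processing can matter as much as removing geometry.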

Lighting LOD


One of the most costly parts of vertex
processing is lighting calculation


Generally the cost increases linearly with
the number of active lights.


All games do basic operations like
selecting the X brightest nearby lights for
each mesh


The number of lights X can be
increased/decreased based on LOD metrics
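
A minimal sketch of scaling the light count X with the LOD metric. The light tuple layout, the inverse-square style brightness estimate, and the cap of four lights are assumptions chosen for illustration.

```python
import heapq


def select_lights(lights, mesh_pos, lod, max_lights=4):
    """Pick the brightest nearby lights for one mesh.

    lights: list of (name, position, intensity) tuples.
    lod:    0.0 (lowest detail) .. 1.0 (full detail).
    """
    # The light budget X shrinks as the LOD value drops.
    budget = max(0, min(max_lights, int(lod * max_lights)))

    def brightness(light):
        _, pos, intensity = light
        d2 = sum((a - b) ** 2 for a, b in zip(pos, mesh_pos))
        return intensity / (1.0 + d2)  # assumed falloff model

    return heapq.nlargest(budget, lights, key=brightness)
```

A mesh at full detail would receive up to four lights here, while one at half detail receives the two brightest.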

Pre-lighting


Because lighting is so expensive, a
common optimization is to pre-calculate
lights when possible


A non-moving (or rarely moving) object can have
the lighting contribution from all nearby,
non-moving lights calculated offline & stored in a
per-vertex color channel


As long as certain conditions hold, the
object is rendered with a 0
-
light path



If additional moving lights come into range, the
hardware allows us to add the dynamic and
pre-calculated colors together


If the object moves, it can revert to real-time
lighting

Lighting LOD


At lower LOD levels, we can use
simpler lighting equations


Use a static envmap (spherical or cubic)
and normal-based texture projection to
approximate diffuse lighting


Switch to purely ambient lighting or
directional lighting at low LOD


At lower LOD levels, shadow
generation is reduced or disabled


Remove self-shadowing, remove accurate
projected shadow volumes or textures

Projected Lighting


A common technique in current games is to
use texture projection to simulate complex
lighting scenarios


Generally this requires an additional rendering
pass on affected meshes


At lower LOD, we attempt to replace a projected
light with a similar point or spotlight


Match color & size to approximate the texture effect


We also begin to exclude smaller objects from
projection


Light will affect walls, but not characters

Vertex Shader LOD


After lighting, the next most costly
operation is skinning or blending the vertex


Can be performed by fixed-function matrix-palette
blending, or programmable vertex shader


Our goal with LOD is to use the existing model
data but to simplify the vertex processing math


We create N versions of all active game
vertex processing functions


All accept the same input data


Selection is driven at run-time by the shared “LOD
Factor”


Essentially it’s discrete vertex LOD

Model Coordinate System


We store vertex position and normal data in
“model space”


This enables us to select between several types of
vertex processing when needed


If we ignore all bone associations and render with
a single transform, we get the “at-rest” model
pose


If we store bone influences in sorted order, we
can blend only against the first bone to get
less-accurate skinning
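
The three processing paths above can be sketched on the CPU as follows. In a real engine these would be vertex shader variants; the 3x4 matrix layout, the function names, and the way the LOD value picks a variant are assumptions. All variants accept the same input data and return a model-space position, to which the single object transform is applied afterwards.

```python
def transform(m, p):
    """Apply a 3x4 row-major affine matrix to a point."""
    return tuple(m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3]
                 for r in range(3))


def skin_full(p, influences, bones):
    # influences: [(bone_index, weight), ...] sorted by descending weight.
    out = [0.0, 0.0, 0.0]
    for bone, weight in influences:
        q = transform(bones[bone], p)
        for i in range(3):
            out[i] += weight * q[i]
    return tuple(out)


def skin_first_bone(p, influences, bones):
    # Less accurate: follow only the strongest influence.
    return transform(bones[influences[0][0]], p)


def skin_rigid(p, influences, bones):
    # Ignore all bone associations: the vertex stays in its at-rest pose.
    return tuple(p)


SKINNERS = [skin_rigid, skin_first_bone, skin_full]  # cheapest to most accurate


def pick_skinner(lod):
    """Map a 0..1 LOD value (1.0 = full detail) to one of the variants."""
    return SKINNERS[min(len(SKINNERS) - 1, int(lod * len(SKINNERS)))]
```

Sorting the influences by weight is what makes the middle variant possible without any extra per-vertex data.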

Skeleton LOD


The number of bones in a model skeleton
can also affect performance


Our vertex shader offers a fixed number of
matrices that can be loaded into hardware
registers simultaneously


This limits the number of faces we can render
before re-loading these registers (batch size)


We can replace a vertex->bone binding with
that bone’s parent to eliminate “leaf” bones


Their geometry will behave as if the removed
bones are fused in their at-rest pose


This needs to be done off-line because it affects
how we split the model into render groups
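
A sketch of the off-line rebinding step, assuming a skeleton stored as a parent-index array with -1 for the root and one bone index per vertex. Both helper names are hypothetical.

```python
def leaf_bones(parents):
    """Return the set of bones that no other bone names as its parent."""
    has_child = {p for p in parents if p != -1}
    return {b for b in range(len(parents)) if b not in has_child}


def remove_bones(parents, bindings, removed):
    """Rebind each vertex from a removed bone to its nearest surviving ancestor."""

    def surviving(bone):
        while bone in removed and parents[bone] != -1:
            bone = parents[bone]
        return bone

    return [surviving(b) for b in bindings]
```

Running `remove_bones` repeatedly with the current leaf set would peel the skeleton one layer at a time, and the model would then be re-split into render groups against the smaller bone set.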

Other Vertex LOD


At lower LOD, we replace accurate
reflected-normal vectors with
camera-space normal vectors


Requires less CPU assistance on some
platforms


We can often reduce the accuracy of
skinning/blending for normal vectors
before we do the same for position
vectors


Effects of inaccurate normals are far less
obvious

Pixel Shader LOD


Pixel shader LOD simply means having multiple
implementations of each raster-level visual effect


Alternate versions would achieve a similar visual
result with fewer render passes, texture stages, or
texture fetches


Disabling multi-pass techniques is particularly effective
because it benefits geometric LOD as well


Reducing texture stages or fetches increases pixel
fill-rate


Generally implemented simply as multiple code paths
selectable according to LOD metrics


Light-mapped walls can revert to vertex-lit


Bumpmaps, Envmaps are blended out

Imposters


The most extreme form of geometric
LOD is replacing a complex object
with an imposter


The imposter can be a flat, textured quad


Or it can be a simple geometric shell


The goal is to approximate the shape &
color of the original object at great
distances


Some game objects are always
rendered as imposters


Particles, explosions, bullets, foliage

Billboard Imposter


The billboard imposter replaces a complex shape
with a flat textured quad


Can be rotated to face the camera in 1, 2 or 3
axes, depending on object symmetry


The texture can contain multiple frames to
represent different angles or animation frames


The engine can blend between frames to improve
fidelity, or use 3D volume textures to perform
hardware blending


Typically billboard imposters use masked (1-bit
alpha) texture images so the actual quad outline
is not visible


“Z sprites” can provide imposters that z-buffer
more accurately, particularly useful in clusters of
objects
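
For a billboard whose texture holds several frames for different viewing angles, the frame choice could look like this sketch. It assumes rotation about the vertical (y) axis only, frames laid out at equal angular steps starting from the object's front, and a yaw in radians; none of these details come from the slides.

```python
import math


def billboard_frame(camera_pos, object_pos, object_yaw, num_frames):
    """Return the index of the pre-rendered frame closest to the current view angle."""
    dx = camera_pos[0] - object_pos[0]
    dz = camera_pos[2] - object_pos[2]
    # Angle of the camera around the object, relative to the object's facing.
    angle = (math.atan2(dx, dz) - object_yaw) % (2 * math.pi)
    step = 2 * math.pi / num_frames
    # Round to the nearest frame and wrap around.
    return int((angle + step / 2) // step) % num_frames
```

Blending between the two nearest frames, as mentioned above, would use the fractional part of the same angle.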

Dynamic Texture Imposter


Render-to-texture is a common &
reasonably efficient console pipeline


Non-dynamic texture imposters use valuable
texture memory


Gives better simulation of animation, lighting, and
movement of the replaced objects


We allocate a pool of textures for dynamic
imposters at startup and re-use them when
necessary


A large crowd scene might re-use each imposter
many times
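
One way to manage a fixed pool of imposter textures is sketched below. The least-recently-used eviction policy and the class interface are assumptions; the slides only say the textures are allocated at startup and re-used when necessary.

```python
from collections import OrderedDict


class ImposterPool:
    """Fixed set of render-target slots shared by all dynamic imposters."""

    def __init__(self, size):
        self.free = list(range(size))  # slots allocated once at startup
        self.in_use = OrderedDict()    # object id -> slot, least recently used first

    def acquire(self, obj_id):
        """Return (slot, needs_render) for the given object."""
        if obj_id in self.in_use:
            self.in_use.move_to_end(obj_id)  # mark as recently used
            return self.in_use[obj_id], False
        if self.free:
            slot = self.free.pop()
        else:
            # Pool exhausted: steal the slot of the least recently used object.
            _, slot = self.in_use.popitem(last=False)
        self.in_use[obj_id] = slot
        return slot, True
```

A crowd scene could go further and key the pool on character type and pose rather than object id, so that many characters share one rendered imposter.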

Geometric Imposter


A Geometric imposter uses a rigid 3D model
in place of a complex articulated 3D model


The “rigid mesh” vertex shader is usually several
times faster than skinned/blended


The imposter can use simpler shaders, fewer
textures, and larger render batches


Geometric imposters look better when viewed
from multiple angles (object rotating or camera
panning)


Can take up less memory than multi-frame texture
imposters, and can render nearly as quickly

Terrain LOD


Terrain LOD is often handled specially


Mainly because the terrain is very large
compared to the viewer (player)


Our terrain is not stored as a heightfield,
so we can do more arbitrary shapes


We break the terrain into separate blocks
according to a 2D grid overlay

Terrain LOD


Each block has discrete LOD levels
pre-computed and compiled into display lists


At run-time, an LOD factor is computed
for each block


Based on distance, viewing angle, viewer
height


Vertices that lie along the boundaries
between blocks are not subject to
removal


This avoids opening gaps and allows each block
to LOD independently


Image-space blending can help hide
switches

Image Processing Techniques


Z-Fade


Gameplay elements that are only of player
interest at close range can be alpha blended out at
increasing z-distance


Powerups, small detail models, ground cover
foliage, atmosphere objects, etc.


Depth of Field effects


If the game utilizes a depth-of-field effect to blur
distant objects, the game can use far more
aggressive distance LOD schemes
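
The Z-fade described above reduces to a simple alpha ramp over distance. The linear ramp and the `fade_start` / `fade_end` parameters are assumptions for illustration.

```python
def z_fade_alpha(z, fade_start, fade_end):
    """Alpha for an object at depth z: opaque up to fade_start, invisible past fade_end."""
    if z <= fade_start:
        return 1.0
    if z >= fade_end:
        return 0.0  # fully faded: the object can be skipped entirely
    return 1.0 - (z - fade_start) / (fade_end - fade_start)
```

Objects whose alpha reaches zero never need to be submitted, so the fade doubles as a gentle distance cull.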

Non-Visual LOD


Creating a special LOD geometry for shadow
projection


Could use more aggressive methods beyond
half-edge collapse to generate silhouettes


Because shadows don’t have texture/lighting
concerns, we can be more aggressive in choosing
algorithms


Automatic Collision geometry


Currently we create collision geometry using
simple volume shapes, or convex hull algorithms


More demanding games could use some of the
volume-based LOD reductions to create better-fit
collision geometry

Future Directions


Subdivision & curved surfaces


If future platforms increase RAM sizes and are
fast enough to render 1-tri-per-pixel, it’s unclear if
subdiv is needed


However, artists are adopting this rapidly for
cutscene work, so data-sharing is an appealing benefit


Subdivision with hardware support that was
effectively “free” would definitely find an
audience


Otherwise, we expect that next-generation projects
will continue to encode more data into textures and
use programmable shaders to simulate details

Future Directions


Vertex processing hardware is
becoming more general-purpose


Will allow more meaningful per-vertex
processing for LOD schemes


Possibly more emphasis on view-dependent schemes

References


Garland and Heckbert, "Surface Simplification Using Quadric Error Metrics",
SIGGRAPH 97.


Bischoff, "Towards Hardware Implementation of Loop Subdivision",
Proceedings 2000 SIGGRAPH/EUROGRAPHICS Workshop on Graphics
Hardware, August 2000.


Brickhill, "Practical Implementation Techniques for Multi-Resolution
Subdivision Surfaces", GDC Conference Proceedings, 2001.