LOD Case Study & Application


Robert Huebner

Nihilistic Software


Speaker Bio

President and Director of Technology for
Nihilistic Software

Currently working on “Starcraft: Ghost” for
Blizzard Entertainment

Previous credits include Vampire: The
Masquerade, Jedi Knight: Dark Forces 2, Descent

International Game Developer’s Association
Board Member (IGDA)


Game Developer’s Conference (GDC)
Advisory Board

Purpose of Talk

Review some of the topics and ideas
presented earlier in the course

Try to explain what worked for us, and
what didn’t

This talk is a “case study in progress”
for our current Gamecube and XBOX

Still tweaking and changing some LOD

Starcraft: Ghost

(needs LOD too!)

Goal of LOD

Back on pre-hardware PCs, we would
spend a LOT of CPU to avoid drawing a few
triangles

The cost of rendering was much higher

We were willing to spend significant CPU to
eliminate a single triangle

Systems like ROAM, view-dependent LOD

Current hardware renders fast, so we only
spend CPU if we can discard a LOT
of triangles

Or if it saves us state changes, texture fetches,
memory bandwidth, or other costly processing


General Block Diagram

[Diagram: Vertex Unit, Pixel Unit, Texture Mem, Frame buffer]

Data Flow Management

Managing data flow and bandwidth is
an important performance metric

Each platform has different

So our choice of LOD differs for each

Each main data path can utilize
different LOD techniques to increase

We try to do this without wasting CPU or
memory resources, which are also scarce


Where Do We Use LOD?

[Diagram: Vertex Unit, Pixel Unit, Texture Mem]


Classes of Game LOD

The design of most console systems is
dominated by three data paths:

The CPU->GPU path and GPU throughput
are managed with geometric LOD

The GPU->Framebuffer path is managed
via shader LOD

The Texture->GPU path is managed with
mip-mapping and shader LOD

Games Vs. Research

The biggest problems we run into
when adapting academic LOD
systems to game use are:

Dealing with additional properties of

Vertex normals, texture, UV coordinates, etc.

Avoid the need for general
processing at the vertex level

Maintaining data in a format that our
hardware can process directly

Runtime Selection

In our engine, all LOD processing for
a given object is driven by a single

The LOD value is stored both as a float
(0.0 to 1.0) and as a discrete BYTE (1..X)

Each sub-system that wants to do LOD
can use either version of the LOD metric
to control behavior

Runtime Selection

The LOD metric is stored for each object or
“sector” (world section)

Based on many factors (highest to lowest

Estimated screen space (size / distance)

Overall performance or estimated triangle counts
for scene (scene metric)

Current player control mode (interact or cutscene,
combat or stealth)

“Importance” of the object (active AI vs. inactive

Viewing angle for terrain blocks
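
As a rough illustration of the two forms of the metric, here is a minimal sketch that derives both from an estimated screen-space size (size / distance). The names LodMetric, compute_lod_metric and full_detail_ratio are hypothetical, as is the convention that 1.0 and level 1 mean "most detail"; the other factors listed above (scene metric, control mode, importance, viewing angle) are left out.

```python
from dataclasses import dataclass


@dataclass
class LodMetric:
    value: float  # continuous form, 0.0 (least detail) to 1.0 (most detail); direction is an assumption
    level: int    # discrete form, 1..num_levels, where 1 is the most detailed level (assumption)


def compute_lod_metric(radius, distance, num_levels=4, full_detail_ratio=0.5):
    """Derive both forms of the metric from estimated screen space (size / distance)."""
    ratio = radius / max(distance, 1e-6)          # avoid dividing by zero at the camera
    value = min(1.0, ratio / full_detail_ratio)   # clamp: anything this large gets full detail
    # Map 1.0 -> level 1 and values near 0.0 -> level num_levels.
    level = 1 + min(num_levels - 1, int((1.0 - value) * num_levels))
    return LodMetric(value, level)
```

A sub-system that switches between a few code paths would read level, while one that picks a vertex count would read value.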

Geometric LOD

Geometric LOD is the most interesting &
complex topic for games

There are three main goals we try to
achieve with geometric LOD:

Send less data to the GPU to avoid exceeding its

Utilize less bus bandwidth moving data into the
graphics unit

Try to achieve a constant average triangle size to
balance load between vertex and pixel units

Compiled Models

Most game engines are constructed to
load “compiled” models

Vertex data is adjusted to match native

Triangles are batched to minimize state
changes and fit within hardware limits

Optimum strips are constructed

DisplayLists/Pushbuffers are compiled

Compiled models are highly platform

Basic LOD Choices

Based on platform specifics, we select
a simple half-edge collapse operation
as the basis of our LOD

Minimizes memory use, vertex data
remains unchanged

Minimizes dynamically changing vertex
data, which minimizes bandwidth & FIFO

Allows us to address problems with
property discontinuities

Calculating LOD

We perform all our LOD computation off-line
during model compilation

We offer the artists a choice of LOD metric to use
when computing automatic LOD levels

We chose an LOD scheme that is based on
edge collapse operations only

Less memory, more static data set

The LOD is constructed based on edge score

Each edge in the model is given a score based on
its length, curvature, or other factors

Vertices are also given scores to control which
endpoint is preserved during the edge collapse

Calculating LOD

We begin by building an augmented
“collapse vertex” structure for the model

Links to neighbor verts (edges)

Links to associated faces

Link and score of “least cost” edge

Identification of “border” or “seam” verts

Links to “paired” verts

Links to the actual “render” vertices

This process happens after vertices are split
due to texture/normal/UV changes

This means one collapse vertex can be linked to
multiple “export” vertices
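
The structure described above could look roughly like the following sketch. The field names and the build_collapse_vertices helper are hypothetical; the sketch welds render vertices by exact position and flags border vertices as those on an edge used by only one face, which is an assumption about how "border" is detected. Scoring and "paired" links are left unfilled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CollapseVertex:
    position: Tuple[float, ...]
    neighbors: List[int] = field(default_factory=list)        # links to neighbor verts (edges)
    faces: List[int] = field(default_factory=list)            # links to associated faces
    least_cost_neighbor: Optional[int] = None                 # link of "least cost" edge (filled by scoring)
    least_cost: float = float("inf")                          # score of "least cost" edge
    is_border: bool = False                                   # "border" or "seam" vert
    paired: List[int] = field(default_factory=list)           # links to "paired" verts (not filled here)
    render_vertices: List[int] = field(default_factory=list)  # links to the actual render vertices


def build_collapse_vertices(render_positions, triangles):
    """Weld render vertices that share a position into collapse vertices."""
    index_of, remap, verts = {}, [], []
    for r, pos in enumerate(render_positions):
        c = index_of.setdefault(pos, len(verts))
        if c == len(verts):                  # first time this position is seen
            verts.append(CollapseVertex(pos))
        verts[c].render_vertices.append(r)
        remap.append(c)
    edge_use = {}
    for f, tri in enumerate(triangles):
        corners = [remap[i] for i in tri]
        for k in range(3):
            a, b = corners[k], corners[(k + 1) % 3]
            verts[a].faces.append(f)
            if b not in verts[a].neighbors:
                verts[a].neighbors.append(b)
            if a not in verts[b].neighbors:
                verts[b].neighbors.append(a)
            key = (min(a, b), max(a, b))
            edge_use[key] = edge_use.get(key, 0) + 1
    for (a, b), uses in edge_use.items():
        if uses == 1:                        # edge with a single face: treat as border (assumption)
            verts[a].is_border = verts[b].is_border = True
    return verts, remap
```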

Calculating LOD

We add game-specific restrictions to LOD

Either adjust the vertex score, exempt it entirely,
or link its removal to that of another vertex

Texture or UV mapping “seams” due to
composited textures

Vertex normal discontinuities (hard edge)

Unpaired edges

Artist influence (blind vertex data in Maya)

We also use domain-specific knowledge to
adjust the scoring algorithm

Terrain blocks use z (height) differential as main
score factor

Shadow/collision LOD ignores texture/UV seams

Calculating LOD

Once we have a full set of edge scores, we
select the least cost edge and remove its
least cost vertex

Half-edge collapse to the higher-cost endpoint

Record the operation in fields in our underlying

Remove degenerate triangles

Re-compute all edge costs in neighboring

Repeat until only non-collapsible edges remain
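
The loop above can be sketched in a few lines. This is a simplified illustration, not our exporter: the edge score is plain edge length, the whole mesh is re-scored on every pass instead of only the neighborhood, and the names compute_collapse_fields, vertex_cost and locked are hypothetical. Vertices that are never removed keep None in both output arrays.

```python
import math


def compute_collapse_fields(positions, triangles, vertex_cost=None, locked=()):
    """Greedy half-edge collapse. Returns (collapse_order, collapse_to) per vertex."""
    n = len(positions)
    vertex_cost = vertex_cost or [0.0] * n
    tris = [tuple(t) for t in triangles]
    collapse_order = [None] * n
    collapse_to = [None] * n
    step = 0
    while True:
        best = None
        for tri in tris:
            for k in range(3):
                a, b = tri[k], tri[(k + 1) % 3]
                edge_cost = math.dist(positions[a], positions[b])  # assumed score: edge length
                for u, v in ((a, b), (b, a)):                      # u is removed, v is kept
                    if u in locked:
                        continue
                    candidate = (edge_cost, vertex_cost[u], u, v)
                    if best is None or candidate < best:
                        best = candidate
        if best is None:          # no collapsible edge remains
            break
        _, _, u, v = best
        collapse_order[u] = step  # record the operation
        collapse_to[u] = v
        step += 1
        # Redirect u to v everywhere, then remove degenerate triangles.
        tris = [tuple(v if i == u else i for i in t) for t in tris]
        tris = [t for t in tris if len(set(t)) == 3]
    return collapse_order, collapse_to
```

Tuple comparison picks the cheapest edge first and, on that edge, removes the endpoint with the lower vertex score, so the collapse goes toward the higher-cost endpoint.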

Note on quality

Our reduction and scoring system is simple,
but accuracy suffers

Because of this, we have found that the last 10%
or so of the collapse operations are judged by
artists as being unsatisfactory

We allow the export process to specify some
control over the quality

Limit on the maximum cost collapse that will be
executed (default excludes about 10% of

specific tweaks to the computed LOD

Calculating LOD

The results of this operation are two new
data fields in our renderable vertex

The “collapseOrder” field gives the ordering of the
collapse operation

The “collapseTo” field is the destination vertex for
the edge collapse operation that removes this
vertex from the mesh

Using these fields, we can export the LOD in
various ways in the final compilation

Since the LOD metrics are all export-time,
we can adopt improvements periodically
without affecting run-time data

Just re-export to get benefits of better reduction

Discrete LOD

Discrete LOD is still the workhorse of game
mesh LOD

Each level can undergo heavy pre-processing for
ordering or displaylist creation

Artists can hand-tune the reduction for visual

Can optionally replace both vertices and index
lists, or just indices to save memory

We represent discrete LOD by loading
multiple sets of face index lists, or separate
“index buffers”

Vertex data is unchanged

Exporting Discrete LOD

We can use our computed data to export
any number of discrete LOD steps

Pick a desired number of vertices for the LOD level

Calculate how many collapse operations will reach
this level

Build an indexed ordering for the mesh

For any vertex with a “collapseOrder” value lower
than the # of operations, replace its index with its
“collapseTo” index

Repeat until a vertex is reached that has a higher
collapseOrder field

Process each index ordering for strips &
cache coherency, create packets, etc.
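
A minimal sketch of the index replacement step, assuming per-vertex collapse_order and collapse_to arrays in which vertices that are never removed hold None (that convention and the function name are assumptions). Strip building and cache ordering would run on the returned list afterwards.

```python
def export_discrete_lod(triangles, collapse_order, collapse_to, target_vertex_count):
    """Build the face index list for one discrete level with target_vertex_count vertices."""
    num_ops = len(collapse_order) - target_vertex_count  # collapses needed to reach this level

    def resolve(i):
        # Follow collapseTo links while the vertex was removed by one of the first num_ops collapses.
        while collapse_order[i] is not None and collapse_order[i] < num_ops:
            i = collapse_to[i]
        return i

    out = []
    for tri in triangles:
        t = tuple(resolve(i) for i in tri)
        if len(set(t)) == 3:  # skip triangles that became degenerate
            out.append(t)
    return out
```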

Discrete Blended LOD

To minimize “popping” that occurs during
the LOD switch, we can use image-space
blending
When an object needs to change between discrete
LOD levels, it is queued for blending

During blending, the object is actually rendered
twice, at both LOD levels, and the alpha values are

In practice, we find this is useful for larger
objects or terrain blocks, but not useful for
typical models
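
One way the alpha values might be driven while an object is queued for blending, assuming a simple linear cross-fade over a fixed duration (the function name, the linear ramp and the duration parameter are all assumptions):

```python
def lod_blend_alphas(elapsed, duration):
    """Return (alpha of outgoing LOD level, alpha of incoming LOD level)."""
    t = min(1.0, max(0.0, elapsed / duration))  # clamp blend progress to 0..1
    return 1.0 - t, t
```

Once the second value reaches 1.0 the object can leave the blend queue and go back to a single render.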

Continuous LOD

Continuous LOD can be an effective
extension to discrete
LOD for games

Reductions with greater granularity can
avoid visible “popping”

It can also save memory compared to
storing a high number of discrete levels

Our continuous implementation is
based mainly on half-edge collapse

This is the best way to keep our data

CLOD Implementation

To implement run-time CLOD, what
we’re effectively doing is moving our
off-line creation of discrete LOD index
lists to the run-time engine

To save memory, we re-order vertices in
order of their “collapseOrder” field

We export a separate parallel array to
contain the “collapseTo” index for each
vertex

CLOD Runtime

At run-time, we select a desired number of
vertices and repeat the recursive collapse

Each index replaced with its collapseTo until a
value less than the desired size is reached

For efficiency, we re-order our original index
list in reverse-collapse order

This allows us to stop when the first degenerate
triangle is detected during the collapse process

The result is a new indexing of the mesh
with the precise number of vertices

Result is cached in our model instance data
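
A sketch of the run-time step under some stated assumptions: vertices have been re-ordered so that the first vertex to be removed has the highest index, so collapse_to[i] < i for every removable vertex; vertices that are never removed point at themselves; and the triangle list is already sorted in reverse-collapse order. The function name is hypothetical.

```python
def build_clod_indices(triangles, collapse_to, vertex_count):
    """Re-index the mesh so it only references vertices 0..vertex_count-1."""
    out = []
    for tri in triangles:
        t = []
        for i in tri:
            # Replace the index with its collapseTo until it falls below the desired size.
            while i >= vertex_count and collapse_to[i] != i:
                i = collapse_to[i]
            t.append(i)
        if t[0] == t[1] or t[1] == t[2] or t[0] == t[2]:
            break  # first degenerate triangle: all later ones were removed even earlier
        out.extend(t)
    return out
```

The returned flat index list is what would be cached in the model instance until the desired vertex count changes.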

CLOD Advantages

This method maps moderately well to
console needs

The vertex data remains static and

indexing can be cached over multiple
frames to amortize costs

Minimal storage costs above cost of
storing basic model data

2 bytes per vert fixed

Can actually be more memory-efficient than
discrete LOD, but not by a lot

CLOD Disadvantages

The biggest challenge with CLOD is to optimize the
index ordering

Normally we perform intense, off-line strip
generation to achieve this

With an index list that could change every frame,
we aren’t able to spend time generating strips

We can still “compile” displaylists, etc. but at
some additional cost

Skip strips and similar techniques of partial
buffering can help address these concerns

Exploit the fact that most of the model remains
unchanged after each step

Geometric LOD

Vertex Shader LOD

Vertex “shader” refers to the processing
path required to setup each vertex in the

Newer PC and console hardware allow for
extremely complex vertex operations
including transformation, blending, and

The throughput of the GPU in verts/sec
varies by orders of magnitude depending
on the processing required

Textured, un-lit = 30M V/s

Textured, 4 lights = 9M V/s
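
To put the two figures in per-frame terms, a small worked example; the 60 frames per second target is an assumption, not a number from the talk.

```python
def verts_per_frame(verts_per_second, frames_per_second=60):
    """Vertex budget for one frame at a given throughput (60 fps is an assumed target)."""
    return verts_per_second // frames_per_second


unlit_budget = verts_per_frame(30_000_000)     # textured, un-lit path
four_light_budget = verts_per_frame(9_000_000) # textured, 4 lights path
```

At that frame rate the un-lit path leaves room for 500,000 vertices per frame and the 4-light path for 150,000, so switching shader paths moves the budget by more than a factor of three.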

Lighting LOD

One of the most costly parts of vertex
processing is lighting calculation

Generally the cost increases linearly with
the number of active lights.

All games do basic operations like
selecting the X brightest nearby lights for
each mesh

The number of lights X can be
increased/decreased based on LOD metrics
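
A minimal sketch of scaling X with the LOD metric. The light representation (a dict with pos and intensity), the inverse-square brightness estimate and the name select_lights are all assumptions made for illustration.

```python
def select_lights(lights, mesh_pos, lod_value, max_lights=4):
    """Pick the brightest nearby lights; the count shrinks as lod_value (0.0..1.0) drops."""
    budget = int(lod_value * max_lights)

    def brightness(light):
        d2 = sum((a - b) ** 2 for a, b in zip(light["pos"], mesh_pos))
        return light["intensity"] / (1.0 + d2)  # assumed falloff, for ranking only

    return sorted(lights, key=brightness, reverse=True)[:budget]
```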


Because lighting is so expensive, a
common optimization is to pre-calculate
lights when possible

A non-moving (or rarely-moving) object can have
the lighting contribution from all nearby, non-moving
lights calculated offline & stored in a per-vertex
color channel

As long as certain conditions hold, the
object is rendered with a 0-light path

If additional moving lights come into range, the
hardware allows us to add dynamic and pre-calculated
colors in hardware

If the object moves, it can revert to real-time
lighting

Lighting LOD

At lower LOD levels, we can use
simpler lighting equations

Use a static envmap (spherical or cubic)
and normal-based texture projection to
approximate diffuse lighting

Switch to purely ambient lighting or
directional lighting at low LOD

At lower LOD levels, shadow
generation is reduced or disabled

Remove self-shadowing, remove accurate
projected shadow volumes or textures

Projected Lighting

A common technique in current games is to
use texture projection to simulate complex
lighting scenarios

Generally this requires an additional rendering
pass on affected meshes

At lower LOD, we attempt to replace a projected
light with a similar point or spotlight

Match color & size to approximate the texture effect

We also begin to exclude smaller objects from

Light will affect walls, but not characters

Vertex Shader LOD

After lighting, the next most costly
operation is skinning or blending the vertex

Can be performed by fixed-function matrix
blending, or programmable vertex shader

Our goal with LOD is to use the existing model
data but to simplify the vertex processing math

We create N versions of all active game
vertex processing functions

All accept the same input data

Selection is driven at run-time by the shared “LOD
metric”

Essentially it’s discrete vertex LOD

Model Coordinate System

We store vertex position and normal data in
“model space”

This enables us to select between several types of
vertex processing when needed

If we ignore all bone associations and render with
a single transform, we get the “at-rest” model

If we store bone influences in sorted order, we
can blend only against the first bone to get less
accurate skinning
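
The idea of one input format feeding several skinning paths can be sketched as below. This is CPU-side illustration code, not a vertex shader; the 3x4 row-major bone matrices, the (bone, weight) pairs sorted by descending weight, and the function names are assumptions.

```python
def transform(m, p):
    """Apply a 3x4 row-major matrix to a 3D point."""
    return tuple(m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3] for r in range(3))


def skin_vertex(position, influences, bone_matrices, max_influences):
    """Blend a model-space position against at most max_influences bones.

    influences is a list of (bone_index, weight) sorted by descending weight.
    max_influences = 0 returns the at-rest position unchanged.
    """
    used = influences[:max_influences]
    if not used:
        return tuple(position)
    total = sum(w for _, w in used)  # renormalize the weights that are kept
    out = [0.0, 0.0, 0.0]
    for bone, w in used:
        q = transform(bone_matrices[bone], position)
        for k in range(3):
            out[k] += q[k] * (w / total)
    return tuple(out)
```

With max_influences driven by the discrete LOD level, the same vertex data serves full blending, single-bone skinning, and the rigid at-rest path.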

Skeleton LOD

The number of bones in a model skeleton
can also affect performance

Our vertex shader offers a fixed number of
matrices that can be loaded into hardware
registers simultaneously

This limits the number of faces we can render
before re-loading these registers (batch size)

We can replace a vertex->bone binding with
that bone’s parent to eliminate “leaf” bones

Their geometry will behave as if the removed
bones are fused in their at-rest pose

This needs to be done off-line because it affects
how we split the model into render groups
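
A small sketch of the rebinding step, assuming the skeleton is a parent-index array (-1 for the root) and each vertex carries a single bone index; both representations and the function name are assumptions.

```python
def remove_leaf_bones(parents, bindings):
    """Rebind vertices on leaf bones to the leaf's parent.

    parents[b] is the parent of bone b, or -1 for the root.
    bindings[v] is the bone that vertex v is bound to.
    Returns (new_bindings, set of removed leaf bones).
    """
    has_child = {p for p in parents if p >= 0}
    leaves = {b for b in range(len(parents)) if b not in has_child and parents[b] >= 0}
    new_bindings = [parents[b] if b in leaves else b for b in bindings]
    return new_bindings, leaves
```

Running it repeatedly strips one layer of leaves per pass, which gives a series of progressively simpler skeletons.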

Other Vertex LOD

At lower LOD, we replace accurate
normal vectors with
space normal vectors

Requires less CPU assistance on some

We can often reduce the accuracy of
skinning/blending for normal vectors
before we do the same for position

Effects of inaccurate normals are far less

Pixel Shader LOD

Pixel shader LOD simply means having multiple
implementations of each raster-level visual effect

Alternate versions would achieve a similar visual
result with fewer render passes, texture stages, or
texture fetches

Disabling multi-pass techniques is particularly effective
because it benefits geometric LOD as well

Reducing texture stages or fetches increases pixel

Generally implemented simply as multiple code paths
selectable according to LOD metrics

Light mapped walls can revert to vertex

Bumpmaps, Envmaps are blended out


The most extreme form of geometric
LOD is replacing a complex object
with an imposter

The imposter can be a flat, textured quad

Or it can be a simple geometric shell

The goal is to approximate the shape &
color of the original object at great

Some game objects are always
rendered as imposters

Particles, explosions, bullets, foliage

Billboard Imposter

The billboard imposter replaces a complex shape
with a flat textured quad

Can be rotated to face the camera in 1, 2 or 3
axes, depending on object symmetry

The texture can contain multiple frames to
represent different angles or animation frames

The engine can blend between frames to improve
fidelity, or use 3D volume textures to perform
hardware blending

Typically billboard imposters use masked (1-bit
alpha) texture images so the actual quad outline
is not visible

“Z sprites” can provide imposters that z
more accurately, particularly useful in clusters of

Dynamic Texture Imposter

Render-to-texture is a common &
reasonably efficient console pipeline

Dynamic texture imposters use valuable
texture memory

Gives better simulation of animation, lighting, and
movement of the replaced objects

We allocate a pool of textures for dynamic
imposters at startup and re-use them when

A large crowd scene might re-use each imposter
many times
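
A fixed-size pool with re-use could be sketched as follows. The least-recently-used eviction policy, the integer texture slots and the ImposterPool name are assumptions; the key would be whatever identifies an imposter image (for example a model and pose bucket), so many crowd members with the same key share one texture.

```python
from collections import OrderedDict


class ImposterPool:
    """Fixed pool of texture slots, allocated once at startup."""

    def __init__(self, size):
        self.free = list(range(size))
        self.in_use = OrderedDict()  # key -> slot, least recently used first

    def acquire(self, key):
        """Return (slot, needs_render). needs_render is False when the image can be re-used."""
        if key in self.in_use:
            self.in_use.move_to_end(key)  # mark as recently used
            return self.in_use[key], False
        if self.free:
            slot = self.free.pop()
        else:
            _, slot = self.in_use.popitem(last=False)  # evict the least recently used entry
        self.in_use[key] = slot
        return slot, True
```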

Geometric Imposter

A Geometric imposter uses a rigid 3D model
in place of a complex articulated 3D model

The “rigid mesh” vertex shader is usually several
times faster than skinned/blended

The imposter can use simpler shaders, fewer
textures, and larger render batches

Geometric imposters look better when viewed
from multiple angles (object rotating or camera

Can take up less memory than multi-frame texture
imposters, and can render nearly as quickly

Terrain LOD

Terrain LOD is often handled specially

Mainly because the terrain is very large
compared to the viewer (player)

Our terrain is not stored as a heightfield,
so we can do more arbitrary shapes

We break the terrain into separate blocks
according to a 2D grid overlay

Terrain LOD

Each block has discrete LOD levels pre-computed
and compiled into display lists

At run-time, an LOD factor is computed
for each block

Based on distance, viewing angle, viewer

Vertices that lie along the boundaries
between blocks are not subject to

This avoids opening gaps and allows each block
to LOD independently
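
Finding the boundary vertices is straightforward when the blocks come from a regular 2D grid. The sketch below assumes x and y are the grid axes, z is height, and blocks are square with side block_size; the function name and tolerance are hypothetical.

```python
def boundary_vertices(positions, block_size, eps=1e-6):
    """Indices of vertices that lie on a grid line of the 2D block overlay."""

    def on_grid_line(c):
        return abs(c / block_size - round(c / block_size)) < eps

    return {i for i, (x, y, _z) in enumerate(positions) if on_grid_line(x) or on_grid_line(y)}
```

The resulting set is the list of vertices to exempt when reducing a block, so neighboring blocks always agree along their shared edge.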

Image-space blending can help hide

Image Processing


Gameplay elements that are only of player
interest at close range can be alpha blended out at
increasing z

Powerups, small detail models, ground cover
foliage, atmosphere objects, etc.

Depth of Field effects

If the game utilizes a depth-of-field effect to blur
distant objects, the game can use far more
aggressive distance LOD schemes

Visual LOD

Creating a special LOD geometry for shadow

Could use more aggressive methods beyond half-edge
collapse to generate silhouettes

Because shadows don’t have texture/lighting
concerns, we can be more aggressive in choosing

Automatic Collision geometry

Currently we create collision geometry using
simple volume shapes, or convex hull algorithms

More demanding games could use some of the
based LOD reductions to create better
collision geometry

Future Directions

Subdivision & curved surfaces

If future platforms increase RAM sizes and are
fast enough to render 1-pixel triangles, it’s unclear if
subdiv is needed

However, artists are adopting this rapidly for
cutscene work, so data-sharing is an appealing benefit

Subdivision with hardware support that was
effectively “free” would definitely find an

Otherwise, we expect that next-generation projects
will continue to encode more data into textures and
use programmable shaders to simulate details

Future Directions

Vertex processing hardware is
becoming more general

Will allow more meaningful per-vertex
processing for LOD schemes

Possibly more emphasis on view-dependent schemes

