COMP 5411:
ADVANCED COMPUTER GRAPHICS
FALL 2013
Rendering Pipeline and Vertex Processing
Drawing 3D models on the computer screen
(the fast version!)
The rendering pipeline
GPU
• Efficient architecture
• 1000+ parallel processors!
  – Working on vertices
  – Working on pixels
• SIMD processing
  – Processors must execute the same code
  – Acceptable for graphics applications, which apply the same operations to many vertices and pixels
• Performance improving at a faster rate than CPUs
Rasterizing
Triangles
• In interactive graphics, triangles rule the world
• Two main reasons:
  – Lowest common denominator for surfaces
    • Can represent any surface with arbitrary accuracy
    • Splines, mathematical functions, volumetric isosurfaces…
  – Planarity and mathematical simplicity lend themselves to simple, regular rendering algorithms
    • Such algorithms embed well in hardware
How to render them?
• Developer specifies in OpenGL/DirectX
  – Rendering state (transformation matrices, texture mapping, etc.)
  – Triangles and vertices (with colors, etc.)
• Graphics card processing
  – Vertex processing (model, view, projection transformation)
  – Triangle scan-conversion
  – Pixel processing (set color)
• Vertex processing and pixel processing are fully programmable!
Transform and Lighting Background
• GPU transform and lighting (T&L) of vertices
  – Available on consumer graphics cards circa 1999
  – Also called the fixed-function pipeline
  – Previously normally done on the CPU
• Forced the use of simple lighting models
  – Other variations couldn’t be handled by the graphics hardware
Fixed Function Pipeline
Geometry stage:
  Vertex data (model space) → Fixed-function transform and lighting → Clipping and viewport mapping
Rasterizer stage:
  Texture stages → Fog, alpha, stencil, and depth testing
Programmable Pipeline
Geometry stage:
  Vertex data (model space) → Fixed-function transform and lighting, or vertex shader → Clipping and viewport mapping
Rasterizer stage:
  Texture stages, or pixel shader → Fog, alpha, stencil, and depth testing
Programmable Pipeline Overview
• Replaces the fixed-function stages
• At the time, you could use either one, but not both simultaneously
• Introduced on the Radeon 8500 / GeForce3
• Possible to emulate the full functionality of the fixed-function pipeline at the same speed
  – Every GPU since the ATI Radeon 9700 has only programmable units in hardware
High-level shading languages
• Programming the pipeline stages
• Languages:
  – Microsoft’s HLSL
    • Compiles for DirectX
  – OpenGL’s GLSL
    • Compiles for OpenGL
    • Fully approved by the ARB
  – Cg
    • From Nvidia
    • Similar to HLSL
Vertex Shaders
• Executed on a per-vertex basis during a rendering pass
• Vertex transformations (model-view-projection)
• Vertex lighting (Gouraud, Phong, etc.)
• Animation and vertex blending
• Object and procedural deformation
  – Twisting, bending
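To make the per-vertex model concrete, here is a minimal C++ sketch of what a simple vertex shader computes, written as a plain function that the GPU would run once per vertex. The names `vertexShader`, `VertexIn`, and `VertexOut` are hypothetical, and real shaders would be written in HLSL, GLSL, or Cg. The sketch assumes row vectors (p' = p * M), as in the LookAt code later in these notes. It also assumes the normal and light direction are unit vectors that are already in the same space.

```cpp
#include <algorithm>
#include <array>

using Matrix4 = std::array<std::array<float, 4>, 4>;  // row-major; row vectors: p' = p * M
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

struct VertexIn  { Vec3 position; Vec3 normal; };        // model-space vertex attributes
struct VertexOut { Vec4 clipPosition; float diffuse; };  // handed on to the rasterizer

// p' = p * M for a row vector p.
Vec4 transform(Vec4 p, const Matrix4& m) {
    float in[4] = {p.x, p.y, p.z, p.w};
    float out[4] = {0, 0, 0, 0};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) out[c] += in[r] * m[r][c];
    return {out[0], out[1], out[2], out[3]};
}

// Hypothetical "vertex shader": transforms the position by the combined
// model-view-projection matrix and computes a per-vertex (Gouraud) diffuse term.
// Assumes `normal` and `toLight` are unit vectors expressed in the same space.
VertexOut vertexShader(const VertexIn& in, const Matrix4& modelViewProj, Vec3 toLight) {
    VertexOut out;
    out.clipPosition = transform({in.position.x, in.position.y, in.position.z, 1.0f}, modelViewProj);
    float nDotL = in.normal.x * toLight.x + in.normal.y * toLight.y + in.normal.z * toLight.z;
    out.diffuse = std::max(0.0f, nDotL);  // Lambertian term, clamped when the light is behind
    return out;
}
```

Each vertex is processed independently of the others, which is what lets the GPU run many of these invocations in parallel.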
Rasterizer
• Vertex shader output is directed to the rasterizer
• Rasterizer interpolates per-vertex values
• Interpolated data is sent per pixel to the pixel shader program
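The interpolation step can be sketched with barycentric weights computed from signed triangle areas. This is a minimal sketch under stated assumptions: `edge` and `interpolate` are hypothetical names, and the vertices are assumed to be already in screen space. The interpolation is plain screen-space (affine). Real rasterizers also apply perspective correction and tie-breaking rules for pixels on shared edges.

```cpp
#include <array>

struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, c); positive when counterclockwise.
float edge(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Interpolates a per-vertex value at pixel position p using barycentric weights.
// Returns false if p lies outside the triangle or the triangle is degenerate.
bool interpolate(const std::array<Vec2, 3>& v, const std::array<float, 3>& value,
                 Vec2 p, float& result) {
    float area = edge(v[0], v[1], v[2]);
    if (area == 0.0f) return false;
    float w0 = edge(v[1], v[2], p) / area;  // weight of vertex 0
    float w1 = edge(v[2], v[0], p) / area;  // weight of vertex 1
    float w2 = edge(v[0], v[1], p) / area;  // weight of vertex 2
    if (w0 < 0 || w1 < 0 || w2 < 0) return false;  // pixel not covered
    result = w0 * value[0] + w1 * value[1] + w2 * value[2];
    return true;
}
```

For the triangle (0,0), (4,0), (0,4) with values 0, 4, 8, the pixel at (1,1) gets weights 0.5, 0.25, 0.25 and an interpolated value of 3.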
Pixel Shaders
• Executed on a per-pixel, per-object basis during a rendering pass
• Typically produces a color value as output
• Flexible way to create more realistic illumination models and visual effects
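The following minimal C++ sketch shows a pixel shader's job as a plain function that receives interpolated values and returns a color. Here it evaluates diffuse lighting per pixel from an interpolated normal, which is the idea behind per-pixel (Phong-style) shading. The names `pixelShader` and `Color` are hypothetical. The normal is renormalized because interpolating unit vectors yields shorter vectors. `toLight` is assumed to be a unit vector in the same space.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Color { float r, g, b; };

// Hypothetical "pixel shader": run once per covered pixel with interpolated inputs.
Color pixelShader(Vec3 normal, Vec3 toLight, Color baseColor) {
    float len = std::sqrt(normal.x * normal.x + normal.y * normal.y + normal.z * normal.z);
    float nDotL = (normal.x * toLight.x + normal.y * toLight.y + normal.z * toLight.z) / len;
    float diffuse = std::max(0.0f, nDotL);  // Lambertian term, clamped at 0
    return {baseColor.r * diffuse, baseColor.g * diffuse, baseColor.b * diffuse};
}
```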
Parallel Execution
• Both vertex and pixel shaders run in parallel
• The GPU can process hundreds of vertices or pixels at the same time
• Recall that they must execute the same code! (SIMD)
From object coordinates to screen coordinates
Vertex transformation
Geometry
• Composed of two parts
  – Vertex data in a vertex array/buffer
  – Primitives in an index/element array/buffer
• Vertices must include a position
  – Normal, color, and texture coordinates are optional
• Primitives reference one or more vertices
  – Triangles, lines, points
• Geometry transformations act on vertices
  – And, as a consequence, change primitives too
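The two-part layout can be sketched in C++ as a vertex array plus an index array. The `Vertex` layout, `Mesh`, and `transformVertices` are hypothetical illustrations, not a particular API's structures. The transform touches only vertices, and the triangles follow automatically because they refer to vertices by index.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical interleaved vertex layout: position is required; normal and
// texture coordinates are optional attributes included here for illustration.
struct Vertex {
    std::array<float, 3> position;
    std::array<float, 3> normal;
    std::array<float, 2> texCoord;
};

struct Mesh {
    std::vector<Vertex> vertices;        // vertex array/buffer
    std::vector<std::uint32_t> indices;  // index/element buffer: 3 indices per triangle
};

// Applies a transformation to every vertex position; primitives are unchanged
// as data but move with their vertices.
template <typename Transform>
void transformVertices(Mesh& mesh, Transform transform) {
    for (Vertex& v : mesh.vertices) v.position = transform(v.position);
}
```

A quad stored this way needs only 4 vertices and 6 indices (two triangles sharing an edge), rather than 6 full vertices.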
Vertex transformation
Object (model) coordinates
  → modeling transformation → World coordinates
  → viewing transformation → Eye (camera) coordinates
  → projection transformation → Clip space
  → perspective division → Device coordinates
  → viewport transformation → Window (screen) coordinates
Why this is important
• There used to be functions to do all this
  – Usually done automatically by the GPU
  – Deprecated as of OpenGL 3.0
  – Removed as of OpenGL 3.1
  – Current OpenGL is 4.4…
• You are on your own now
  – Direct3D has helper functions
• Some algorithms require custom matrices
  – It is good to understand what is going on
Positions in homogeneous coordinates
• Borrowed from projective geometry
  – To each point $(x, y, z)$ in $\mathbb{R}^3$, associate a line $(x, y, z, w)$ in $\mathbb{R}^4$
  – The line goes through the origin and $(x, y, z, 1)$
  – To go from $\mathbb{R}^4$ to $\mathbb{R}^3$, divide by $w$: $(x/w, y/w, z/w)$
• Linear transformations in this space are more powerful
  – Translation and perspective projection become 4×4 matrix multiplications
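The following minimal C++ sketch shows both directions of the mapping and the payoff: translation, which is not a linear map of $\mathbb{R}^3$, becomes a single 4×4 matrix product. The helper names are hypothetical. The sketch assumes the row-vector convention (p' = p * M, translation in the last row) used by the LookAt code in these notes.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Vec4 = std::array<double, 4>;
using Matrix4 = std::array<std::array<double, 4>, 4>;  // row-major; row vectors: p' = p * M

// R^3 -> R^4: choose the representative (x, y, z, 1) of the line.
Vec4 toHomogeneous(const Vec3& p) { return {p[0], p[1], p[2], 1.0}; }

// R^4 -> R^3: divide by w (assumes w != 0).
Vec3 fromHomogeneous(const Vec4& h) { return {h[0] / h[3], h[1] / h[3], h[2] / h[3]}; }

Vec4 transform(const Vec4& p, const Matrix4& m) {
    Vec4 out{};  // zero-initialized
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) out[c] += p[r] * m[r][c];
    return out;
}

// Translation as a linear map of R^4.
Matrix4 translation(double tx, double ty, double tz) {
    return {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {tx, ty, tz, 1}}};
}
```

Any point on the line works. For example, (2, 4, 6, 2) and (1, 2, 3, 1) both represent (1, 2, 3), and translating either one by (10, 0, 0) yields (11, 2, 3) after the division by w.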
Modeling transformations
• Translation, scaling, rotation
• Object coordinates to world coordinates
• Position, scale, and orient objects in the world
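A modeling matrix is usually built by composing these three transformations. The following is a sketch under the same row-vector assumption, where matrices apply left to right: scale, then rotate, then translate. The function names are hypothetical, and rotation is shown only about the z axis to keep the sketch short.

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<double, 4>;
using Matrix4 = std::array<std::array<double, 4>, 4>;  // row-major; row vectors: p' = p * M

Matrix4 scaling(double sx, double sy, double sz) {
    return {{{sx, 0, 0, 0}, {0, sy, 0, 0}, {0, 0, sz, 0}, {0, 0, 0, 1}}};
}
// Counterclockwise rotation by `angle` radians about the z axis.
Matrix4 rotationZ(double angle) {
    double c = std::cos(angle), s = std::sin(angle);
    return {{{c, s, 0, 0}, {-s, c, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
}
Matrix4 translation(double tx, double ty, double tz) {
    return {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {tx, ty, tz, 1}}};
}
Matrix4 multiply(const Matrix4& a, const Matrix4& b) {
    Matrix4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) r[i][j] += a[i][k] * b[k][j];
    return r;
}
Vec4 transform(const Vec4& p, const Matrix4& m) {
    Vec4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) out[c] += p[r] * m[r][c];
    return out;
}

// Object -> world: uniform scale, then rotation about z, then translation.
Matrix4 modelMatrix(double scale, double angle, double tx, double ty, double tz) {
    return multiply(multiply(scaling(scale, scale, scale), rotationZ(angle)),
                    translation(tx, ty, tz));
}
```

For example, the object-space vertex (1, 0, 0) scaled by 2, rotated 90°, and translated by (5, 0, 0) lands at (5, 2, 0) in world coordinates. Changing the multiplication order would give a different result, because matrix products do not commute.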
Viewing transformation
• World coordinates to eye/camera coordinates
• Combination of translation and rotation
• Typical way of specifying the transformation:
  – Eye position, center (the point looked at), and up vector
Viewing transformations
// Direct3D-style (left-handed) LookAt: row vectors (v' = v * M), row-major matrices.
// Assumes Vector3/Matrix4 helper types with normal(), cross(), and matrix multiplication.
Matrix4 LookAt( Vector3 eye, Vector3 center, Vector3 up )
{
    Vector3 zaxis = normal(center - eye);        // The "look-at" vector.
    Vector3 xaxis = normal(cross(up, zaxis));    // The "right" vector.
    Vector3 yaxis = cross(zaxis, xaxis);         // The "up" vector.

    // Create a 4x4 orientation matrix whose columns are the right, up, and look-at
    // vectors (the transpose, i.e. the inverse, of the camera's rotation).
    Matrix4 orientation = {
        xaxis.x, yaxis.x, zaxis.x, 0,
        xaxis.y, yaxis.y, zaxis.y, 0,
        xaxis.z, yaxis.z, zaxis.z, 0,
        0,       0,       0,       1
    };

    // Create a 4x4 translation matrix by negating the eye position.
    Matrix4 translation = {
        1,      0,      0,      0,
        0,      1,      0,      0,
        0,      0,      1,      0,
        -eye.x, -eye.y, -eye.z, 1
    };

    // Combine: first move the eye to the origin, then rotate into the camera axes.
    return ( translation * orientation );
}
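The following self-contained C++ version of the LookAt code above makes it possible to check the result numerically. The minimal `Vector3`/`Matrix4` types, `normal`, `cross`, and `transformPoint` are hypothetical stand-ins for whatever math library is in use. The code keeps the same assumptions: row vectors, row-major storage, and a left-handed camera that looks down +z.

```cpp
#include <array>
#include <cmath>

struct Vector3 { float x, y, z; };
using Matrix4 = std::array<std::array<float, 4>, 4>;  // row-major; row vectors: p' = p * M

Vector3 operator-(Vector3 a, Vector3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vector3 a, Vector3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vector3 cross(Vector3 a, Vector3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vector3 normal(Vector3 v) {
    float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}
Matrix4 operator*(const Matrix4& a, const Matrix4& b) {
    Matrix4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) r[i][j] += a[i][k] * b[k][j];
    return r;
}

Matrix4 LookAt(Vector3 eye, Vector3 center, Vector3 up) {
    Vector3 zaxis = normal(center - eye);      // look-at direction
    Vector3 xaxis = normal(cross(up, zaxis));  // right
    Vector3 yaxis = cross(zaxis, xaxis);       // up
    Matrix4 orientation = {{{xaxis.x, yaxis.x, zaxis.x, 0},
                            {xaxis.y, yaxis.y, zaxis.y, 0},
                            {xaxis.z, yaxis.z, zaxis.z, 0},
                            {0, 0, 0, 1}}};
    Matrix4 translation = {{{1, 0, 0, 0},
                            {0, 1, 0, 0},
                            {0, 0, 1, 0},
                            {-eye.x, -eye.y, -eye.z, 1}}};
    return translation * orientation;  // translate eye to origin, then rotate
}

// Transforms a point (w = 1); the view matrix is affine, so w stays 1.
Vector3 transformPoint(Vector3 p, const Matrix4& m) {
    float in[4] = {p.x, p.y, p.z, 1};
    float out[3] = {0, 0, 0};
    for (int c = 0; c < 3; ++c)
        for (int r = 0; r < 4; ++r) out[c] += in[r] * m[r][c];
    return {out[0], out[1], out[2]};
}
```

As a sanity check, with the eye at (3, 0, 0) looking at the origin, the eye itself maps to (0, 0, 0) and the center maps to (0, 0, 3), three units straight ahead.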
Orthographic projection
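The projection matrices themselves are not reproduced in these notes. The following is a sketch of one common orthographic matrix, assuming the Direct3D-style convention used by the LookAt code: left-handed, row vectors, and depth mapped to [0, 1]. It covers a w-by-h view volume centered on the viewing axis between a near plane zn and a far plane zf. The function name `orthographic` is hypothetical, and other APIs (e.g. OpenGL) use a different matrix that maps depth to [-1, 1].

```cpp
#include <array>

using Matrix4 = std::array<std::array<float, 4>, 4>;  // row-major; row vectors: p' = p * M
struct Vec4 { float x, y, z, w; };

// x in [-w/2, w/2] -> [-1, 1], y in [-h/2, h/2] -> [-1, 1], z in [zn, zf] -> [0, 1]; w stays 1.
Matrix4 orthographic(float w, float h, float zn, float zf) {
    return {{{2 / w, 0, 0, 0},
             {0, 2 / h, 0, 0},
             {0, 0, 1 / (zf - zn), 0},
             {0, 0, -zn / (zf - zn), 1}}};
}

Vec4 transform(Vec4 p, const Matrix4& m) {
    float in[4] = {p.x, p.y, p.z, p.w};
    float out[4] = {0, 0, 0, 0};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) out[c] += in[r] * m[r][c];
    return {out[0], out[1], out[2], out[3]};
}
```

Because w stays 1, distance from the camera does not change an object's projected size. This is the defining property of orthographic projection.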
Perspective projection
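The following is a matching perspective sketch under the same assumed convention (left-handed, row vectors, depth in [0, 1]), built from a vertical field of view and an aspect ratio. It is followed by the perspective division that takes clip space to device coordinates. The names `perspective` and `perspectiveDivide` are hypothetical. The key idea is that the last column copies eye-space z into w, so the later division by w makes distant points smaller.

```cpp
#include <array>
#include <cmath>

using Matrix4 = std::array<std::array<float, 4>, 4>;  // row-major; row vectors: p' = p * M
struct Vec4 { float x, y, z, w; };

// fovY in radians, aspect = width / height. After division, z / w lies in [0, 1]
// between the near plane zn and the far plane zf.
Matrix4 perspective(float fovY, float aspect, float zn, float zf) {
    float yScale = 1.0f / std::tan(fovY / 2);
    float xScale = yScale / aspect;
    return {{{xScale, 0, 0, 0},
             {0, yScale, 0, 0},
             {0, 0, zf / (zf - zn), 1},         // the 1 copies eye-space z into w
             {0, 0, -zn * zf / (zf - zn), 0}}};
}

Vec4 transform(Vec4 p, const Matrix4& m) {
    float in[4] = {p.x, p.y, p.z, p.w};
    float out[4] = {0, 0, 0, 0};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) out[c] += in[r] * m[r][c];
    return {out[0], out[1], out[2], out[3]};
}

// Perspective division: clip space -> (normalized) device coordinates.
Vec4 perspectiveDivide(Vec4 c) { return {c.x / c.w, c.y / c.w, c.z / c.w, 1.0f}; }
```

With a 90° field of view, the point (2, 2, 2) projects to x = 1, while (2, 2, 4), which is twice as far away, projects to x = 0.5. This is the familiar foreshortening.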
Viewport transformation
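The viewport transformation is a per-axis scale and offset from device coordinates to pixels. The sketch below assumes device x, y in [-1, 1] and z in [0, 1]. It also assumes a window origin at the top-left with y pointing down, as in Direct3D. OpenGL instead places the window origin at the bottom-left, so it does not flip y. `Viewport` and `viewportTransform` are hypothetical names.

```cpp
struct Vec3 { float x, y, z; };

// Viewport rectangle in pixels plus the depth range written to the depth buffer.
struct Viewport { float x0, y0, width, height, minZ, maxZ; };

Vec3 viewportTransform(Vec3 ndc, const Viewport& vp) {
    return {vp.x0 + (ndc.x + 1.0f) * 0.5f * vp.width,
            vp.y0 + (1.0f - ndc.y) * 0.5f * vp.height,  // device "up" -> screen "down"
            vp.minZ + ndc.z * (vp.maxZ - vp.minZ)};
}
```

For a 640×480 viewport, device coordinates (-1, 1) map to pixel (0, 0), (1, -1) map to (640, 480), and the center (0, 0) maps to (320, 240).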
Research topic:
Caching vertex transformations
The post-transform vertex cache
• Vertices are transformed on demand
• Transforming vertices can be costly
• Hardware uses an optimization
  – Cache transformed vertices (FIFO)
• Requires a software strategy
  – Reorder the triangle list for vertex locality
• Average Cache Miss Ratio (ACMR)
  – # transformed vertices / # triangles
  – Varies within [0.5, 3]
    • Lower bound from Euler’s formula: a closed triangle mesh has about twice as many triangles as vertices, so even transforming each vertex exactly once gives ACMR ≈ 0.5
(Figure labels: input vertices, transformed vertices)
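ACMR can be measured by simulating the cache over the index buffer. The following is a minimal C++ sketch that assumes a FIFO cache of a given size, where a hit does not change the eviction order. `acmrFifo` is a hypothetical name, and the sketch assumes at least one triangle.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Simulates a FIFO post-transform vertex cache over an indexed triangle list
// (3 indices per triangle) and returns ACMR = # transformed vertices / # triangles.
double acmrFifo(const std::vector<int>& indices, std::size_t cacheSize) {
    std::deque<int> cache;  // front = oldest entry, evicted first
    std::size_t misses = 0;
    for (int v : indices) {
        if (std::find(cache.begin(), cache.end(), v) != cache.end()) continue;  // hit
        ++misses;  // miss: the vertex must be transformed
        cache.push_back(v);
        if (cache.size() > cacheSize) cache.pop_front();  // evict the oldest vertex
    }
    return static_cast<double>(misses) / static_cast<double>(indices.size() / 3);
}
```

A single triangle costs 3 misses (ACMR = 3). A strip of 10 triangles (i, i+1, i+2) with a cache of 4 costs 3 misses for the first triangle and 1 for each of the other 9, giving ACMR = 12/10 = 1.2. That value approaches the ~1 v/t quoted for long strips.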
ACMR Minimization
• NP-complete problem
  – Garey et al. [1976]
• Optimal value is
  – Bar-Yehuda and Gotsman [1996]
• Heuristics reach near-optimal results (ACMR of 0.6 to 0.7)
  – Hardware cache sizes range from 4 to 64 entries
• Substantial impact on rendering cost
  – Everybody does it
Random triangle order: ~3 v/t (vertices per triangle)
(Figure: FIFO cache holding 6 vertices)
Long triangle strips: ~1 v/t
Trivial 3x improvement
Can we do better?
Let’s look at some research in this area
Vertex data access
(Figure legend: cache hit, cache miss; triangles shaded by # misses, 0 to 3)
• Traditional strips: transfer ~1.0 vertex/tri
• With caching: transfer ~0.5 vertex/tri
Example
(Figure: triangles shaded by # misses, 0 to 3, before optimization and after optimization [Hoppe 99])
Nearly 2x improvement!
Two reordering techniques [Hoppe 99]
• Greedy strip-growing
  – Very fast
• Local optimization
  – Improves the initial greedy solution
  – Very slow
Greedy strip-growing
• Inspired by [Chow97]
• To decide when to restart a strip, perform lookahead cache simulations
When to restart a strip?
(Figure: a strip of good length, cache size 4)
When to restart a strip?
(Figure, cache size 4: good strip length vs. strip too long; the strip that is too long causes a jump in miss rate!)
Lookahead simulations
• Perform s simulations:
  (a) restart immediately, after 0 faces
  (b) restart after i faces, 0 < i < s
• If (a) is best, restart the strip
Result
(Figure: traditional long strips; face order within strip, strip restarts)
Result
(Figure: traditional long strips vs. greedy strip-growing)
Result
(Figure: before vs. after)
Local optimization
Apply perturbations to the face ordering if the cost is lowered:
Initial order: $F = F_{1..x-1}\,F_{x..y}\,F_{y+1..m}$
$F' = \mathrm{Reflect}_{x,y}(F) = F_{1..x-1}\,F_{y..x}\,F_{y+1..m}$
$F' = \mathrm{Insert1}_{x,y}(F) = F_{1..x-1}\,F_{y}\,F_{x..y-1}\,F_{y+1..m}$
$F' = \mathrm{Insert2}_{x,y}(F) = F_{1..x-1}\,F_{y-1..y}\,F_{x..y-2}\,F_{y+1..m}$
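The three perturbations map directly onto standard algorithms: Reflect reverses $F_{x..y}$, and Insert1/Insert2 rotate the last one or two faces of the range to its front. The sketch below is a naive hill climber built on them. It uses 0-based inclusive ranges and re-simulates a FIFO cache over the whole ordering for every candidate. It is an illustration under those assumptions, not Hoppe's actual implementation, which evaluates cost changes far more efficiently. `fifoMisses`, `localOptimize`, and the `window` parameter are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <deque>
#include <vector>

using Face = std::array<int, 3>;

// Number of FIFO cache misses when rendering the faces in the given order.
std::size_t fifoMisses(const std::vector<Face>& faces, std::size_t cacheSize) {
    std::deque<int> cache;
    std::size_t misses = 0;
    for (const Face& f : faces)
        for (int v : f) {
            if (std::find(cache.begin(), cache.end(), v) != cache.end()) continue;
            ++misses;
            cache.push_back(v);
            if (cache.size() > cacheSize) cache.pop_front();
        }
    return misses;
}

// Perturbations on the 0-based inclusive range [x, y], with x < y.
void reflect(std::vector<Face>& F, std::size_t x, std::size_t y) {
    std::reverse(F.begin() + x, F.begin() + y + 1);                    // F_{y..x}
}
void insert1(std::vector<Face>& F, std::size_t x, std::size_t y) {
    std::rotate(F.begin() + x, F.begin() + y, F.begin() + y + 1);      // F_y F_{x..y-1}
}
void insert2(std::vector<Face>& F, std::size_t x, std::size_t y) {
    std::rotate(F.begin() + x, F.begin() + y - 1, F.begin() + y + 1);  // F_{y-1..y} F_{x..y-2}
}

// Tries every perturbation with y - x <= window and keeps it only if the miss
// count drops; repeats until a full pass finds no improvement.
std::size_t localOptimize(std::vector<Face>& F, std::size_t cacheSize, std::size_t window) {
    std::size_t best = fifoMisses(F, cacheSize);
    void (*ops[])(std::vector<Face>&, std::size_t, std::size_t) = {reflect, insert1, insert2};
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t x = 0; x < F.size(); ++x)
            for (std::size_t y = x + 1; y < F.size() && y - x <= window; ++y)
                for (auto op : ops) {
                    std::vector<Face> candidate = F;
                    op(candidate, x, y);
                    std::size_t cost = fifoMisses(candidate, cacheSize);
                    if (cost < best) { best = cost; F = candidate; improved = true; }
                }
    }
    return best;
}
```

Accepting only strict improvements guarantees termination, but it can stall in a local minimum. This is one reason such local search is used to refine a good greedy starting order rather than replace it.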
Result
(Figure: greedy strip-growing vs. local optimization)
~4% gain
Choice of cache size
Size 16 is sufficient for most of the gain
Cache replacement policy
(Figure, cache size 4: FIFO vs. LRU; all is OK)
Cache replacement policy
(Figure, cache size 4: FIFO vs. LRU; strips twice as long)
Comparison
(Figures: FIFO vs. LRU)
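The difference between the two policies is only what happens on a hit: LRU refreshes the entry, while FIFO leaves the eviction order unchanged. The following minimal C++ simulator shows both policies. The names `Policy` and `countMisses` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

enum class Policy { FIFO, LRU };

// Counts cache misses for a stream of vertex references. The deque runs from the
// next eviction candidate (front) to the most recently inserted or used entry (back).
std::size_t countMisses(const std::vector<int>& refs, std::size_t cacheSize, Policy policy) {
    std::deque<int> cache;
    std::size_t misses = 0;
    for (int v : refs) {
        auto it = std::find(cache.begin(), cache.end(), v);
        if (it != cache.end()) {
            if (policy == Policy::LRU) {  // a hit moves the entry to the back under LRU
                cache.erase(it);
                cache.push_back(v);
            }                              // under FIFO a hit changes nothing
            continue;
        }
        ++misses;
        cache.push_back(v);
        if (cache.size() > cacheSize) cache.pop_front();
    }
    return misses;
}
```

With a 2-entry cache and the reference stream 0, 1, 0, 2, 0, FIFO evicts vertex 0 even though it was just reused and misses 4 times. LRU keeps vertex 0 and misses only 3 times.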
Summary
• Vertex caching reduces geometry processing by a factor of 1.6 to 1.9
• Transparent to the application: simply preprocess the models (fast)
• Supports dynamic geometry
Related interesting directions
• Issue of cache size
  – Find a face ordering that is good for all sizes?
  – Standardize on size 16?
  – Reprocess the mesh at load time
• Interaction with texture caching
• Cache efficiency during runtime changes to geometry connectivity
Observations
• Bus bandwidth is not an issue (data is on the GPU)
• Fetching from video memory is not too bad
• Reprocessing a vertex is bad
  – The post-transform vertex cache is more significant
• Hoppe forces strips
  – ACMR hovers around 1 when thrashing
  – Good for legacy (very legacy!) hardware
  – But not optimal
• Sensitive to cache sizes slightly smaller than the one optimized for
(Figure: MeshReorder optimizing for different cache sizes, one curve per target size from 4 to 64; ACMR vs. cache size used to measure ACMR)
The pre-transform vertex ordering
• So far we have only sorted triangles
• Vertices have to be fetched from memory as they are referenced by triangles
  – If they are not found in the post-transform cache
• Arrange them in the order they are referenced
• Relabel triangles after reordering
• Most vertices are accessed sequentially
  – Unless evicted by the post-transform cache and needed again
• Sequential access improves bandwidth
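Reordering vertices by first reference and relabeling the triangles takes a single pass over the index buffer. The following is a minimal C++ sketch. `Vertex` and `reorderVertices` are hypothetical, the sketch assumes every index is valid, and it drops vertices that no triangle references.

```cpp
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };  // hypothetical minimal vertex layout

// Moves vertices into the order in which triangles first reference them and
// rewrites the index buffer to point at the new positions.
void reorderVertices(std::vector<Vertex>& vertices, std::vector<unsigned>& indices) {
    const unsigned kUnassigned = static_cast<unsigned>(-1);
    std::vector<unsigned> newIndex(vertices.size(), kUnassigned);
    std::vector<Vertex> reordered;
    reordered.reserve(vertices.size());
    for (unsigned& idx : indices) {
        if (newIndex[idx] == kUnassigned) {  // first reference: append the vertex
            newIndex[idx] = static_cast<unsigned>(reordered.size());
            reordered.push_back(vertices[idx]);
        }
        idx = newIndex[idx];  // relabel this triangle corner
    }
    vertices.swap(reordered);
}
```

After this pass, the fetches from vertex memory follow the triangle order. A vertex is fetched out of sequence only when it was evicted from the post-transform cache and is needed again.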