Rendering Pipeline and Vertex Processing

birdsowlSoftware and s/w Development

Dec 2, 2013

COMP 5411: ADVANCED COMPUTER GRAPHICS

FALL 2013

Drawing 3D models on the computer screen

(the fast version!)

The rendering pipeline

GPU


Efficient architecture


1000+ parallel processors!


Working on vertices


Working on pixels


SIMD processing


All processors must execute the same code


OK for graphics applications



Improving at a faster rate than CPUs

Rasterizing Triangles


In interactive graphics, triangles rule the world


Two main reasons:


Lowest common denominator for surfaces


Can represent any surface with arbitrary accuracy


Splines, mathematical functions, volumetric isosurfaces…


Planarity and mathematical simplicity lend themselves to simple, regular rendering algorithms


Such algorithms embed well in hardware

How to render them?


Developer specifies in OpenGL/DirectX


Rendering state

(transformation matrices, texture mapping, etc…)


Triangles and vertices

(with colors, etc…)



Graphics card processing


Vertex processing

(model, view, projection transformation)


Triangle scan-conversion


Pixel processing

(set color)


Rendering Triangles


Same steps as on the previous slide, with one addition: the graphics card processing is fully programmable!

Transform and Lighting Background


GPU Transform and Lighting (T&L) of vertices available on consumer graphics cards circa 1999


Also called the fixed function pipeline


Previously done on the CPU



Forced the use of simple lighting models


Other variations couldn’t be handled by the graphics hardware


Fixed Function Pipeline

Geometry Stage: Vertex Data (Model space) -> Fixed Function Transform and Lighting -> Clipping and Viewport Mapping

Rasterizer Stage: Texture Stages -> Fog, Alpha, Stencil, Depth Testing

Programmable Pipeline

Geometry Stage: Vertex Data (Model space) -> Vertex Shader (in place of Fixed Function Transform and Lighting) -> Clipping and Viewport Mapping

Rasterizer Stage: Pixel Shader (in place of Texture Stages) -> Fog, Alpha, Stencil, Depth Testing

Programmable Pipeline Overview


Replaces fixed function stages


At the time you could use both, but not simultaneously


Introduced on Radeon 8500 / GeForce3



Possible to emulate the full functionality of the fixed function pipeline at the same speed


Everything past ATI Radeon 9700 only has programmable units in hardware

High-level shading languages


Programming the pipeline stages


Languages:


Microsoft’s HLSL


Compiles for DirectX


OpenGL’s GLSL


Compiles for OpenGL


Fully approved by ARB


Cg


From Nvidia


Similar to HLSL

Vertex Shaders


Executed on a per-vertex basis during a rendering pass


Vertex Transformations (model-view-projection)


Vertex Lighting (Gouraud, Phong, etc.)


Animation and vertex blending


Object and procedural deformation


Twisting, bending

Rasterizer


Vertex shader output is directed to rasterizer


Rasterizer interpolates per-vertex values


Interpolated data sent per-pixel to the pixel shader program

Pixel Shaders


Executed on a per-pixel, per-object basis during a rendering pass


Typically produces a color value as an output


Flexible way to create more realistic illumination models and visual effects

Parallel Execution


Both vertex and pixel shaders run in parallel


GPU can process hundreds at the same time


Recall that they must all execute the same code! (SIMD)

From object coordinates to screen coordinates

Vertex transformation

Geometry


Composed of two parts


Vertex data in a vertex array/buffer


Primitives in an index/element array/buffer


Vertices must include a position


Normal, color, texture coordinates optional


Primitives reference one or more vertices


Triangles, lines, points


Geometry transformations act on vertices


And as a consequence change primitives too
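
As a rough illustration of the two-part layout described above, the following Python sketch stores a quad as a vertex list plus an index list. All names here (`vertices`, `indices`, `triangles`, `translate`) are hypothetical and do not correspond to any graphics API; the point is only that transforming the vertices changes every primitive that references them, while the index list stays untouched.

```python
# Hypothetical data layout for illustration only; not a real graphics API.

# Vertex buffer: one record per vertex. Position is mandatory, color is optional.
vertices = [
    {"position": (0.0, 0.0, 0.0), "color": (1.0, 0.0, 0.0)},
    {"position": (1.0, 0.0, 0.0), "color": (0.0, 1.0, 0.0)},
    {"position": (1.0, 1.0, 0.0), "color": (0.0, 0.0, 1.0)},
    {"position": (0.0, 1.0, 0.0), "color": (1.0, 1.0, 1.0)},
]

# Index buffer: every three indices reference the vertices of one triangle.
indices = [0, 1, 2, 0, 2, 3]


def triangles(vertex_buffer, index_buffer):
    """Yield each triangle as a tuple of three vertex positions."""
    for i in range(0, len(index_buffer), 3):
        yield tuple(vertex_buffer[j]["position"] for j in index_buffer[i:i + 3])


def translate(vertex_buffer, offset):
    """Return a new vertex buffer with every position moved by offset."""
    moved = []
    for v in vertex_buffer:
        p = tuple(a + b for a, b in zip(v["position"], offset))
        moved.append({**v, "position": p})
    return moved


# Transforming the vertices changes the primitives; the indices are reused as-is.
moved_vertices = translate(vertices, (2.0, 0.0, 0.0))
tris = list(triangles(moved_vertices, indices))
```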

Vertex transformation

Object or Model Coordinates
  -> Modeling transformation ->
World Coordinates
  -> Viewing transformation ->
Eye or Camera Coordinates
  -> Projection transformation ->
Clip Space
  -> Perspective division ->
Device Coordinates
  -> Viewport transformation ->
Window or Screen Coordinates

Why this is important


There used to be functions to do all this


Usually done automatically by GPU


Deprecated as of OpenGL 3.0


Removed as of OpenGL 3.1


Current OpenGL is 4.4…


You are on your own now


Direct3D has helper functions


Some algorithms require custom matrices


It is good to understand what is going on

Positions in homogeneous coordinates


Borrowed from projective geometry


To each point $(x, y, z)$ in $\mathbb{R}^3$, associate a line $(x, y, z, w)$ in $\mathbb{R}^4$


The line goes through the origin and $(x, y, z, 1)$


To go from $\mathbb{R}^4$ to $\mathbb{R}^3$, divide by $w$: $(x/w, y/w, z/w)$



Linear transformations in this space are more powerful
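
One way to see why linear transformations become "more powerful" in this space: a translation is not linear in 3D, but it is a plain matrix product in 4D. The sketch below assumes column vectors (the matrix multiplies from the left); the helper names are hypothetical and chosen for illustration.

```python
def to_homogeneous(p):
    """Lift a 3D point (x, y, z) to (x, y, z, 1)."""
    x, y, z = p
    return (x, y, z, 1.0)


def from_homogeneous(h):
    """Project (x, y, z, w) back to 3D by dividing by w."""
    x, y, z, w = h
    return (x / w, y / w, z / w)


def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


# Translation by (5, 0, -2) written as a 4x4 matrix (assumes column vectors).
translation = [
    [1.0, 0.0, 0.0, 5.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -2.0],
    [0.0, 0.0, 0.0, 1.0],
]

p = (1.0, 2.0, 3.0)
moved = from_homogeneous(mat_vec(translation, to_homogeneous(p)))

# Every point on the line through the origin and (x, y, z, 1) maps to the same 3D point.
same_point = from_homogeneous((2.0, 4.0, 6.0, 2.0))
```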





Modeling transformations

Translation

Scaling

Rotation


Object coordinates to World coordinates


Position, scale and orient objects in the world
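
The matrices from the original slide are not preserved here. As a stand-in, this sketch builds translation, scaling, and rotation matrices and composes them into one model matrix. It assumes column vectors and a counterclockwise rotation about the z axis; the function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scaling(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


def rotation_z(angle):
    """Counterclockwise rotation about the z axis, angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


# With column vectors the rightmost matrix applies first:
# scale by 2, then rotate 90 degrees, then translate by (10, 0, 0).
model = mat_mul(translation(10, 0, 0),
                mat_mul(rotation_z(math.pi / 2), scaling(2, 2, 2)))

# Object-space point (1, 0, 0) ends up near (10, 2, 0) in world space.
world = mat_vec(model, (1.0, 0.0, 0.0, 1.0))
```

The order of the factors matters: swapping the translation and the rotation would rotate the already-translated object around the world origin instead of around its own center.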


Viewing transformation


World coordinates to Eye/Camera coordinates


Combination of Translation and Rotation


Typical way of specifying the transformation


Eye, center, up


Viewing transformations

// Builds a view matrix for row vectors (v' = v * M) in a left-handed
// system, following the Direct3D convention.
Matrix4 LookAt( Vector3 eye, Vector3 center, Vector3 up )
{
    Vector3 zaxis = normal(center - eye);      // The "look-at" vector.
    Vector3 xaxis = normal(cross(up, zaxis));  // The "right" vector.
    Vector3 yaxis = cross(zaxis, xaxis);       // The "up" vector.

    // Create a 4x4 orientation matrix from the right, up, and at vectors.
    // The axes are stored as columns: this is the transpose (inverse) of
    // the camera's rotation, so it takes world directions into camera space.
    Matrix4 orientation = {
        xaxis.x, yaxis.x, zaxis.x, 0,
        xaxis.y, yaxis.y, zaxis.y, 0,
        xaxis.z, yaxis.z, zaxis.z, 0,
        0,       0,       0,       1
    };

    // Create a 4x4 translation matrix by negating the eye position.
    Matrix4 translation = {
         1,      0,      0,     0,
         0,      1,      0,     0,
         0,      0,      1,     0,
        -eye.x, -eye.y, -eye.z, 1
    };

    // Combine the orientation and translation to compute the view matrix:
    // move the eye to the origin first, then rotate into camera axes.
    return ( translation * orientation );
}
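
A quick sanity check for any look-at construction is that the eye lands on the origin and the target lands on the viewing axis. The Python sketch below assumes the same conventions as the listing above (row vectors, left-handed, camera looking down +z) and writes the product of translation and orientation directly as one matrix; the helper names are hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)


def look_at(eye, center, up):
    """View matrix for row vectors (v' = v * M), left-handed (assumed convention)."""
    zaxis = normalize(sub(center, eye))
    xaxis = normalize(cross(up, zaxis))
    yaxis = cross(zaxis, xaxis)
    # Axes in the columns; the last row is the eye translation already rotated.
    return [
        [xaxis[0], yaxis[0], zaxis[0], 0.0],
        [xaxis[1], yaxis[1], zaxis[1], 0.0],
        [xaxis[2], yaxis[2], zaxis[2], 0.0],
        [-dot(xaxis, eye), -dot(yaxis, eye), -dot(zaxis, eye), 1.0],
    ]


def transform(v, m):
    """Row vector times 4x4 matrix."""
    return tuple(sum(v[r] * m[r][c] for r in range(4)) for c in range(4))


eye, center, up = (0.0, 0.0, -5.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
view = look_at(eye, center, up)
eye_in_view = transform(eye + (1.0,), view)        # expected at the origin
center_in_view = transform(center + (1.0,), view)  # expected 5 units along +z
```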


Orthographic projection

Perspective projection
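
The figures for these two slides are not preserved. As an assumed illustration, the sketch below uses the classic OpenGL-style matrices (the forms used by `glOrtho` and `gluPerspective`), with column vectors and a right-handed eye space looking down the negative z axis. It also shows the perspective division that takes clip space to normalized device coordinates. The Python function names are hypothetical.

```python
import math


def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def orthographic(left, right, bottom, top, near, far):
    """OpenGL-style orthographic matrix (assumed convention)."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def perspective(fovy_degrees, aspect, near, far):
    """OpenGL-style perspective matrix (assumed convention)."""
    f = 1.0 / math.tan(math.radians(fovy_degrees) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def perspective_divide(clip):
    """Clip space to normalized device coordinates: divide by w."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


proj = perspective(90.0, 1.0, 1.0, 10.0)
ndc_near = perspective_divide(mat_vec(proj, (0.0, 0.0, -1.0, 1.0)))   # on the near plane
ndc_far = perspective_divide(mat_vec(proj, (0.0, 0.0, -10.0, 1.0)))   # on the far plane

ortho = orthographic(-2.0, 2.0, -1.0, 1.0, 1.0, 10.0)
ndc_corner = perspective_divide(mat_vec(ortho, (2.0, 1.0, -10.0, 1.0)))
```

With the orthographic matrix w stays 1, so the division is a no-op; with the perspective matrix w carries the eye-space depth, which is what produces foreshortening.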


Viewport transformation
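
The slide content for this step is not preserved. A minimal sketch of the mapping, assuming OpenGL conventions (normalized device coordinates in [-1, 1] on each axis, window origin at the lower-left corner, default depth range [0, 1]), could look like this; the function and parameter names are hypothetical.

```python
def viewport_transform(ndc, x0, y0, width, height, depth_near=0.0, depth_far=1.0):
    """Map normalized device coordinates to window coordinates (assumed OpenGL convention)."""
    x, y, z = ndc
    xw = x0 + (x + 1.0) * 0.5 * width
    yw = y0 + (y + 1.0) * 0.5 * height
    zw = depth_near + (z + 1.0) * 0.5 * (depth_far - depth_near)
    return (xw, yw, zw)


# The center of the view volume lands in the middle of an 800x600 window.
center = viewport_transform((0.0, 0.0, 0.0), 0, 0, 800, 600)

# The lower-left corner on the near plane maps to the window origin at depth 0.
corner = viewport_transform((-1.0, -1.0, -1.0), 0, 0, 800, 600)
```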

Research topic:

Caching vertex transformations

The post-transform vertex cache


Vertices are transformed on demand


Transforming vertices can be costly


Hardware uses an optimization


Cache transformed vertices (FIFO)


Requires a software strategy


Reorder triangle list for vertex locality


Average Cache Miss Ratio (ACMR)


# transformed vertices / # triangles


varies within [0.5, 3]


From Euler’s formula

[Figure: input vertices and transformed vertices]
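
To make the ACMR definition concrete, the following sketch simulates a FIFO post-transform cache and counts how many vertices have to be transformed for a given triangle order. It is a simplified model under stated assumptions (pure FIFO, a hit does not change the cache order), not a description of any particular GPU; the function name `acmr` is hypothetical.

```python
from collections import deque


def acmr(triangles, cache_size):
    """Average cache miss ratio: transformed vertices per triangle (FIFO model)."""
    cache = deque(maxlen=cache_size)  # appending to a full deque drops the oldest entry
    misses = 0
    for tri in triangles:
        for v in tri:
            if v not in cache:
                misses += 1      # the vertex must be transformed again
                cache.append(v)  # a hit leaves the FIFO order unchanged
    return misses / len(triangles)


# One long strip: each triangle shares two vertices with the previous one.
strip = [(i, i + 1, i + 2) for i in range(100)]

# Triangles that share no vertices at all: the worst case of 3 misses each.
isolated = [(3 * i, 3 * i + 1, 3 * i + 2) for i in range(100)]

strip_acmr = acmr(strip, cache_size=6)
isolated_acmr = acmr(isolated, cache_size=6)
```

The strip pays 3 misses for its first triangle and 1 for each of the others, which is where the "~1 v/t" figure for long strips comes from; getting below that requires reusing vertices from earlier strips while they are still cached.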

ACMR Minimization


NP-Complete problem


GAREY et al. [1976]


Optimal value is


BAR-YEHUDA and GOTSMAN [1996]


Heuristics reach near-optimal results [0.6, 0.7]


Hardware cache sizes range within [4, 64]


Substantial impact on rendering cost


Everybody does it

Random triangle order: ~3 v/t

FIFO cache holding 6 vertices

Long triangle strips: ~1 v/t

Trivial x3 improvement

Can we do better?

Let's look at some research in this area

Vertex data access

[Figure: the same mesh drawn as traditional strips and with caching; faces colored by # misses (0 to 3), with markers for cache hit and cache miss]

Traditional strips: transfer ~1.0 vertex/tri

With caching: transfer ~0.5 vertex/tri (assume in cache)

Example

[Figure: faces colored by # misses (0 to 3), before optimization and after optimization [Hoppe 99]]

Nearly 2x improvement!

Two reordering techniques [Hoppe 99]


Greedy strip-growing


very fast


Local optimization


improve initial greedy solution


very slow

Greedy strip-growing


Inspired by [Chow97]


To decide when to restart strip, perform lookahead cache simulations

When to restart strip?

[Figure: cache contents (cache size 4, entries numbered 1 to 4) as a strip grows. Panels: good strip length; strip too long, with a jump in miss rate!]

Lookahead simulations


Perform s simulations:

(a) restart immediately, after 0 faces

(b) restart after 0 < i < s faces


If (a) is best, restart strip







Result

[Figure: traditional long strips, showing face order within strip and strip restart]

[Figure: traditional long strips vs. greedy strip-growing]

[Figure: before vs. after]

Local optimization

Apply perturbations to face ordering if cost is lowered:

Initial order: $F = F_{1..x-1},\ F_{x..y},\ F_{y+1..m}$

$F' = \mathrm{Reflect}_{x,y}(F) = F_{1..x-1},\ F_{y..x},\ F_{y+1..m}$

$F' = \mathrm{Insert1}_{x,y}(F) = F_{1..x-1},\ F_{y},\ F_{x..y-1},\ F_{y+1..m}$

$F' = \mathrm{Insert2}_{x,y}(F) = F_{1..x-1},\ F_{y-1..y},\ F_{x..y-2},\ F_{y+1..m}$
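
The three perturbations amount to list slicing. The sketch below assumes 1-based, inclusive positions x and y as on the slide, and uses a simple FIFO miss count as a stand-in cost; the actual cost function in [Hoppe 99] is not given in these notes, so `fifo_misses` and `try_perturbation` are hypothetical helpers.

```python
def reflect(F, x, y):
    """F[1..x-1], F[y..x], F[y+1..m]: reverse the run of faces x..y."""
    return F[:x - 1] + F[x - 1:y][::-1] + F[y:]


def insert1(F, x, y):
    """F[1..x-1], F[y], F[x..y-1], F[y+1..m]: move face y to position x."""
    return F[:x - 1] + F[y - 1:y] + F[x - 1:y - 1] + F[y:]


def insert2(F, x, y):
    """F[1..x-1], F[y-1..y], F[x..y-2], F[y+1..m]: move faces y-1 and y to position x."""
    return F[:x - 1] + F[y - 2:y] + F[x - 1:y - 2] + F[y:]


def fifo_misses(faces, cache_size=3):
    """Stand-in cost (assumption): vertex cache misses under a FIFO policy."""
    cache, misses = [], 0
    for face in faces:
        for v in face:
            if v not in cache:
                misses += 1
                cache.append(v)
                if len(cache) > cache_size:
                    cache.pop(0)  # evict the oldest entry
    return misses


def try_perturbation(F, perturb, x, y, cost=fifo_misses):
    """Keep the perturbed order only if it lowers the cost."""
    candidate = perturb(F, x, y)
    return candidate if cost(candidate) < cost(F) else F


# Moving the third face next to the first lets it reuse vertices 1 and 2.
order = [(0, 1, 2), (4, 5, 6), (1, 2, 3)]
improved = try_perturbation(order, insert1, 2, 3)
```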

Result

[Figure: greedy strip-growing vs. local optimization]

~ 4% gain

Choice of cache size

size 16 sufficient for most gain

Cache replacement policy

[Figure: cache contents (cache size 4) along a strip under FIFO and LRU replacement. Annotations: all is OK; strips twice as long]

Comparison

[Figure: FIFO vs. LRU]

Comparison
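
The difference between the two policies is only in what happens on a hit: FIFO leaves the cache order alone, while LRU moves the vertex back to the most-recent position. The sketch below is a simplified model with hypothetical names, meant only to show that the same triangle order can produce different miss counts under the two policies, which is why a reordering has to target the policy the hardware actually uses.

```python
def count_misses(triangles, cache_size, policy):
    """Count vertex cache misses for policy "FIFO" or "LRU" (simplified model)."""
    cache = []  # oldest entry first
    misses = 0
    for tri in triangles:
        for v in tri:
            if v in cache:
                if policy == "LRU":
                    # A hit refreshes the entry under LRU; FIFO ignores hits.
                    cache.remove(v)
                    cache.append(v)
            else:
                misses += 1
                cache.append(v)
                if len(cache) > cache_size:
                    cache.pop(0)  # evict the oldest entry
    return misses


# A small triangle fan around vertex 0, with a cache of 3 entries.
fan = [(0, 1, 2), (0, 2, 3), (0, 3, 4)]
fifo = count_misses(fan, 3, "FIFO")
lru = count_misses(fan, 3, "LRU")
```

In this fan the shared vertex 0 is evicted under FIFO even though it is used by every triangle, while LRU keeps it; other orders favor the other policy, so neither count should be read as a general ranking.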

Summary


Vertex caching reduces geometry processing by a factor of 1.6 to 1.9


Transparent to application:


simply pre-process the models (fast)


Supports dynamic geometry

Related interesting directions


Issue of cache size


Find face ordering good for all sizes?


Standardize on size 16?


Reprocess mesh at load time


Interaction with texture caching


Cache efficiency during runtime geometry connectivity changes

Observations


Bus bandwidth not an issue (data on GPU)


Fetching from video memory not too bad


Re-processing vertex is bad


Post-transform vcache more significant


Hoppe forces strips


ACMR hovers around 1 when thrashing


Good for legacy (very legacy!) hardware


But, not optimal


Sensitive to cache sizes slightly smaller

[Figure: MeshReorder optimizing for different cache sizes. x-axis: cache size used to measure ACMR (0 to 72); y-axis: ACMR (0.5 to 3); one curve per target cache size (4 to 64 in steps of 4)]
The Pre-Transform vertex ordering


So far we only sorted triangles


Vertices have to be fetched from memory


As they are referenced by triangles


If they are not found in the post-cache


Arrange them in the order they are referenced


Re-label triangles after reordering


Most vertices are accessed sequentially


Unless evicted by the post-cache & needed again


Sequential access improves bandwidth
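
The reordering described above can be sketched in a few lines: walk the triangles in their (already optimized) order, give each vertex a new index the first time it is referenced, and rewrite the triangles with the new indices. The function name is hypothetical, and in this sketch vertices that no triangle references are simply dropped.

```python
def reorder_vertices(vertices, triangles):
    """Sort vertices by first reference and re-label the triangles to match."""
    remap = {}          # old index -> new index
    new_vertices = []
    new_triangles = []
    for tri in triangles:
        new_tri = []
        for old in tri:
            if old not in remap:
                remap[old] = len(new_vertices)
                new_vertices.append(vertices[old])
            new_tri.append(remap[old])
        new_triangles.append(tuple(new_tri))
    return new_vertices, new_triangles


vertices = ["A", "B", "C", "D"]
triangles = [(3, 1, 0), (1, 2, 0)]
new_vertices, new_triangles = reorder_vertices(vertices, triangles)
```

After the pass, first-time references walk the vertex buffer front to back, so fetches that miss the post-transform cache are mostly sequential reads.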