Procedural Graphics with DirectX 11




James P. Reid
Department of Software Engineering
University of Wisconsin-Platteville
Platteville, Wisconsin 53818
reidj@uwplatt.edu

Abstract

Over the past decade, computer graphics have become more demanding both in detail and in the size of the area being rendered. These demands are growing faster than modern graphics hardware can keep up with. In this paper I describe a popular technique for solving this problem, one that can cut memory usage by up to 90%. This technique has been around since the early 80s, but with modern hardware and the power of DirectX 11, worlds of near-infinite size can be created that maintain all the detail desired by the end user.



Introduction

As a brief introduction to this paper and the concepts it covers, I would first like to go over a few basic graphics concepts to make some of the ideas mentioned later easier to understand.

Meshes and Height-Maps

At the most basic level, computer-generated graphics are built by creating meshes out of arrays of vertices. A vertex is simply a point in three-dimensional space where two or more edges meet. By combining hundreds or even thousands of vertices, objects can be represented in a way that a computer will understand. For example, a simple box can be represented by the combination of eight vertices. The computer can then draw lines between these vertices in a triangle pattern (two triangles per side of the box) and lay colors or even textures on that grid of vertices. Now that we have a grid of vertices representing our object, we can call it a mesh.


The other main concept needed is a height-map. This is usually a square, grayscale image where each pixel represents a different height on the mesh it is being mapped to. These values normally range from 0 to 255 (8-bit values), giving the mesh 256 distinct height values. The level of detail of a height-map can be increased if the map uses all three color channels and the alpha channel. A color channel is a set of 8 bits in a 32-bit integer in the RGBA scheme: the first 8 bits of the 32-bit integer are the red channel, the second 8 bits are the green channel, and so on. This allows a 32-bit height for each pixel, giving over four billion distinct height values on a mesh.
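
As a concrete illustration, consider reconstructing the 32-bit height from an RGBA pixel. The following is a minimal C++ sketch; the helper name and the channel ordering (red as the most significant byte) are assumptions for illustration, since engines differ on channel layout.

#include <cstdint>

// Minimal sketch: rebuild a 32-bit height value from the four 8-bit RGBA
// channels of a height-map pixel. Treating red as the most significant
// byte is an assumption; real formats may order the channels differently.
uint32_t HeightFromRGBA(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (static_cast<uint32_t>(r) << 24) |
           (static_cast<uint32_t>(g) << 16) |
           (static_cast<uint32_t>(b) <<  8) |
            static_cast<uint32_t>(a);
}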


History

Procedural graphics were initially implemented in games because of the memory constraints of the early computers capable of supporting games. By forcing the game to generate content algorithmically at run time, designers were able to create massive worlds that occupied only small amounts of hard disk space and RAM.

One of the first games to use this technique was the 1982 game Elite, which was generated completely procedurally. The original design for this game suggested having a total of 2^48 galaxies, with 256 solar systems in each galaxy. This meant there would be trillions of items to explore in the game, and it was all built into a single algorithm.


Procedural generation didn't stop when computers became faster and able to handle more vertices per frame. Instead, the technique was used to create even more believable worlds in high definition, while game designers could focus on the story line instead of on how to fit all these graphics into a set amount of space. This led to a type of artificial intelligence, allowing the game to become dynamic rather than static, adapting based on where the player was and what they were doing there.

Procedural Graphics: What Is It & Why Use It?

According to Wikipedia, procedural graphics is content that is generated algorithmically rather than manually. This means the content seen in the graphics world is generated at run time (on the fly) as opposed to compile time (having the data stored and loaded from a file).

Most games today use height-maps and meshes stored in files, and load them when the application calls for them. This means the computer will spend a lot of time accessing the hard drive and parsing out new data to display to the world. This is what causes a loading screen to appear in most modern games and in 3D graphics generators such as a virtual world used to represent a location (e.g. Google Earth).

The primary use of procedural generation is to reduce the total hard disk space used by the application. Accessing data on a hard drive is very expensive for the system, especially when the application has to load new data, or needs to write data currently held in memory back to the hard drive in order to make space in RAM for the new data.

By generating everything algorithmically at run time, the application no longer needs to communicate constantly with the hard drive. This matters because the slowest part of a modern computer is usually the hard drive. The only downside to the procedural approach is the need for a fast Central Processing Unit (CPU) and Graphics Processing Unit (GPU) to generate the data required by the system at a decent frame rate. Given today's hardware, there really isn't much reason to worry about having the necessary speed: CPUs generally run in the 3.0-4.0 gigahertz range, and most modern GPUs have over one thousand stream processors. A stream processor is essentially a single core running at around 800 megahertz that works in parallel with all the other stream processors on the GPU.


Noise Algorithms

In order to generate something procedurally, an algorithm must be used. This is where noise becomes very important for generating objects in the virtual world that aren't stored directly on the hard drive. Noise comes in many different flavors, the two most popular being Perlin noise and Voronoi noise. These are the two most widely used algorithms when attempting to generate a virtual world procedurally. There is another popular type of noise known as fractal noise, but this is simply a variation on Perlin noise.

After the noise algorithm has run, the values it generates are assigned to the vertices of a grid or mesh. This deforms the grid, creating random yet realistic-looking terrain, with low areas for bodies of water and high areas that can represent mountains. This is just one simple application of noise, but it is probably the most notable.
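
A minimal C++ sketch of this assignment step, assuming some noise2D function (any of the flavors above) that returns values in [0, 1] and a maxHeight constant that scales the result into world units:

#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };

// Sketch: displace a flat vertex grid by a 2-D noise function. The noise
// function itself is assumed; y is treated as the "up" axis here.
void DisplaceGrid(std::vector<Vertex>& grid,
                  float (*noise2D)(float, float), float maxHeight)
{
    for (std::size_t i = 0; i < grid.size(); ++i)
        grid[i].y = noise2D(grid[i].x, grid[i].z) * maxHeight;
}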

Perlin Noise

Perlin noise is the most widely used algorithm when creating a procedural graphics engine. It was invented by the mathematics/computer science professor Ken Perlin in 1985. Since its invention, the algorithm has become a standard for computer graphics and movement.

The algorithm itself is actually quite simple. It consists of taking pseudorandom numbers over different amplitudes and frequencies and summing all the iterations into one final number. In one dimension, the result is a simple line that looks increasingly jagged as the frequency increases and the amplitude decreases over successive iterations. In two dimensions, this starts as several large blots on an image, with each following iteration producing more, smaller blots.

The noise function depends on five main values. The first is a random seed to be used with a pseudorandom number generator that returns values between 0 and 1. The second is the frequency of the plot over time. The third is the amplitude of the plot; this determines the maximum and minimum values the function will generate. The fourth is the number of octaves in the function, or how many times to iterate. The fifth is the persistence, which changes the amplitude over time by multiplying the current iteration's amplitude by the persistence; a persistence below one shrinks the amplitude of each successive octave in the algorithm.
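
A one-dimensional C++ sketch of this octave summation follows. The baseNoise function stands in for a seeded, smoothed pseudorandom generator returning values in [0, 1]; its implementation, and the doubling of frequency per octave, are assumptions of this sketch rather than Perlin's exact reference code.

// Sketch: sum `octaves` layers of noise, doubling the frequency and
// scaling the amplitude by `persistence` for each successive layer.
float FractalNoise(float x, float (*baseNoise)(float),
                   float frequency, float amplitude,
                   int octaves, float persistence)
{
    float total = 0.0f;
    for (int i = 0; i < octaves; ++i)
    {
        total += baseNoise(x * frequency) * amplitude;
        frequency *= 2.0f;        // each octave doubles the frequency
        amplitude *= persistence; // persistence < 1 shrinks later octaves
    }
    return total;
}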

Once the function finishes, it adds up all the octaves into one final curve. To do this, a smoothing function is used which interpolates each point depending on all the points around it. This becomes far more complex when moving into two, and even three, dimensions, but modern hardware can be exploited to speed through these calculations and return an interpolated point.

The basic smoothing equation for this interpolation has recently changed from 3t^2 - 2t^3 to 6t^5 - 15t^4 + 10t^3 [3].

Voronoi Noise

Voronoi noise comes from a mathematical decomposition known as a Voronoi diagram. This type of math was first used by Descartes in 1644, and has continued to evolve since. When using this method for computer graphics, Lloyd's algorithm is used. This algorithm starts with an initial distribution of samples and then computes the noise image. The image is computed by creating the Voronoi diagram of all the points, integrating each cell, computing the centroid of that cell, and finally moving all points within the cell to its centroid. This gives the image a cellular look, with all cells fairly random in terms of their size and distribution.
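
Computing a true Voronoi diagram and running Lloyd's relaxation is too long for a short example, but the characteristic cellular look can be sketched with the closely related Worley-style cellular noise: the value at each point is simply its distance to the nearest feature point. The pre-generated feature points, and the use of plain Euclidean distance, are assumptions of this sketch.

#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct Point2 { float x, y; };

// Sketch: cellular noise as distance to the nearest feature point. The
// feature points are assumed to come from a seeded random distribution.
float CellularNoise(float x, float y, const std::vector<Point2>& features)
{
    float best = std::numeric_limits<float>::max();
    for (const Point2& p : features)
    {
        float dx = x - p.x, dy = y - p.y;
        best = std::min(best, std::sqrt(dx * dx + dy * dy));
    }
    return best;
}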

The Quadtree Structure

Up until this point, procedural graphics may seem quite simple to create and manage. The terrain, clouds, etc. made by this method are very convincing and completely remove the need to store a mesh on the hard drive. However, there is a major downfall when generating a virtual world in this manner: its scalability.

When creating small worlds, the methods mentioned thus far will work very well, but if the world starts getting large, the computer will eventually spend so much time processing all of the vertices that it can no longer maintain sixty frames per second. When the application drops below sixty frames per second, it becomes noticeable to the human eye. The solution to this problem is to use a quadtree structure for storing the vertex data in the virtual world. This structure is used in the solutions for each of the popular techniques discussed today [1][2][4][5][6].

A quadtree is essentially a binary tree with four children per node instead of two. This means we can take a square mesh, split it into four nodes, split each of those nodes into four more, and so on down the tree. A word of caution: try not to store too much data in each node of the tree, since the memory requirement quadruples for each level of depth traveled down the tree.
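
A minimal C++ sketch of such a node follows; the field names and the choice of axis-aligned squares over the XZ plane are illustrative assumptions.

#include <array>
#include <memory>

// Sketch: a quadtree node covering a square of terrain. Each split
// produces four children, so per-node data should stay small.
struct QuadNode
{
    float minX, minZ, maxX, maxZ;                      // covered square
    std::array<std::unique_ptr<QuadNode>, 4> children; // empty for a leaf

    bool IsLeaf() const { return children[0] == nullptr; }

    void Split()
    {
        float midX = 0.5f * (minX + maxX);
        float midZ = 0.5f * (minZ + maxZ);
        children[0].reset(new QuadNode{minX, minZ, midX, midZ, {}});
        children[1].reset(new QuadNode{midX, minZ, maxX, midZ, {}});
        children[2].reset(new QuadNode{minX, midZ, midX, maxZ, {}});
        children[3].reset(new QuadNode{midX, midZ, maxX, maxZ, {}});
    }
};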

There are many uses for the quadtree, and all of them drastically increase the performance of the application. The main uses are view frustum culling, level of detail, vertex reduction, grid location, and collision detection. Only some of these apply to procedural graphics, namely frustum culling, level of detail, and vertex reduction. The others are mentioned as proof of how powerful a quadtree really is, and these are not even all the reasons for using one.

By using quadtrees, vertices can be removed or added based on the camera's current position, where it is looking, and the level of detail desired at different locations within the virtual world. Efficiency in computer graphics is all about quantity: the more vertices currently being rendered, the more work the computer has to do. Adding effects is surprisingly fast work for a modern GPU, since it can process over one thousand vertices in parallel at speeds around 1 gigahertz. The main point is that we want quality for our world and each vertex, not quantity.

The following sections will go over some popular methods to remove and add vertices based on the current state of the application. This makes the virtual world very dynamic, rather than a single static mesh that consumes processing power.

The University of Massachusetts presents several different applications of the quadtree, including the quadtree mesh, which is what will be used for the rest of the methods discussed in this paper [4].


View Frustum Culling

One of the many tasks a quadtree enables is view frustum culling. When viewing a virtual world, only certain areas of the world can be seen at any given snapshot in time. The view frustum refers to the area currently viewable by the camera; it includes a near and a far viewing plane, so objects too close or too far away are unseen by the camera. Culling refers to the removal of unwanted vertices from the meshes currently being rendered. By dividing the mesh up into quads, the application is able to traverse down the tree at run time and determine which quads are currently visible to the camera. Figure 1 shows the result of using this technique: the entire upper-left quadrant of the terrain mesh has been removed from the viewing area and is not rendered by the graphics device. Only the gray areas in Figure 1 will be sent to the graphics device for rendering.



Figure 1: View frustum culling with a quadtree.
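
A C++ sketch of the traversal, reusing the QuadNode type sketched earlier: the Frustum type and its intersection test are placeholders for whatever plane/box test the engine provides. Note that a single failed test culls an entire subtree.

#include <vector>

struct Frustum { /* six clipping planes, omitted in this sketch */ };

// Assumed helper: true if the node's square touches the frustum.
bool Intersects(const Frustum& f, const QuadNode& n);

// Sketch: collect only the leaf nodes visible to the camera.
void CollectVisible(const QuadNode& node, const Frustum& frustum,
                    std::vector<const QuadNode*>& visible)
{
    if (!Intersects(frustum, node))
        return;                       // whole subtree culled by one test
    if (node.IsLeaf())
    {
        visible.push_back(&node);
        return;
    }
    for (const auto& child : node.children)
        CollectVisible(*child, frustum, visible);
}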



Level of Detail (LOD)

Level of detail is the process of adding and removing quadtree nodes from the virtual world based on the current state of the application. At the very top of a quadtree there is a single parent node that can be traversed down; however, we may not need to traverse all the way down, because only certain parts of the tree are important to the camera's current view frustum. To solve this area selection issue, as the world is being generated for the current frame using noise algorithms and other basic constants such as the size and height of the grid, additional quadtree nodes may be added to the mesh in areas much closer to the camera. This results in having many more vertices near the camera and far fewer in the distance. When examined, the mesh will appear to have higher-resolution squares that reduce in resolution the farther from the camera they are. [1], [2], and [6] all use a similar approach to create different levels of detail in newer and more efficient ways, once again allowing much larger worlds to be rendered in even higher detail while keeping system usage at a minimum.
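
One simple way to pick the traversal depth is sketched below in C++, under the assumption that each coarser LOD band covers twice the range of the finer one, mirroring the quadtree's halving of node size per level; the threshold values are illustrative rather than taken from the cited papers.

// Sketch: map camera distance to a quadtree depth. Depth 0 is the root
// (coarsest); `maxDepth` is the finest level, used within `finestRange`.
int LodDepth(float distanceToCamera, float finestRange, int maxDepth)
{
    int depth = maxDepth;
    float range = finestRange;
    while (depth > 0 && distanceToCamera > range)
    {
        --depth;         // one level coarser...
        range *= 2.0f;   // ...covers twice the distance
    }
    return depth;
}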

This concept is where DirectX 11, and more specifically its sub-API Direct3D 11, makes a huge difference in performance and rendering capabilities. Starting from the beginning, the world may start out as a flat plane off in the distance, which can be represented by only four vertices forming two triangles for the square. As the camera moves closer, additional nodes will be generated using Direct3D 11's geometry shader. This part of the graphics pipeline allows additional vertices to be added to a set of vertices already on the GPU. Using this concept, additional vertices are added depending on the distance from the camera to the currently rendered vertex. Vertices far from the camera will only traverse down the quadtree one or two times, if at all, while a set of vertices right next to the camera will traverse several levels down the quadtree, greatly increasing the detail seen by the camera.

Now that the GPU is adding vertices based on the camera's position and viewing direction in the virtual world, we can start with a massive mesh containing only a few vertices and add additional vertices using a noise algorithm based on each vertex's distance from the camera. When completed, all the areas viewable to the camera will have a different concentration of vertices based on how far away from the camera they are. Nearby areas will have many vertices, while areas off in the distance have few, since the viewer would not be able to see the detail anyway. As the camera moves around, vertices are added as seen fit by the LOD and noise algorithms implemented on the GPU, removing a large portion of the work from the CPU so it can spend its time calculating other things such as shadows or artificial intelligence for the game.

Vertex Interpolation Error

There is one side effect of using this technique, known as the vertex interpolation error. This occurs when the level of detail changes in the quadtree mesh. When the camera moves closer to a specific point in the virtual world, the GPU adds more vertices to increase the LOD seen by the user. As those vertices are added, some visual popping may occur, because the exact location of the new vertices may be slightly different from that of the vertices they just replaced. This problem has many solutions, but the easiest is to simply take the midpoint of the new vertex and the previous vertex it replaced, and place the new vertex at that midpoint. Interpolation is then used to determine where to place the additional vertices that increase the resolution of the mesh at this location, since those vertices didn't replace anything.
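
A C++ sketch of this placement: a blend factor of 0.5 gives the midpoint described above, and easing the factor toward 1 over a few frames (often called geomorphing) hides the pop entirely. The types and the blend policy are assumptions of this sketch.

struct Float3 { float x, y, z; };

// Sketch: blend a new vertex between the coarse position it replaces
// (t = 0) and its fully detailed position (t = 1); t = 0.5 is the
// midpoint placement described in the text.
Float3 MorphVertex(const Float3& oldPos, const Float3& newPos, float t)
{
    return { oldPos.x + (newPos.x - oldPos.x) * t,
             oldPos.y + (newPos.y - oldPos.y) * t,
             oldPos.z + (newPos.z - oldPos.z) * t };
}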

Texture Morphing

Another issue that may occur when changing the level of detail of a mesh is that textures will not line up with each other, since they are being placed next to each other on vertices of different resolutions. On one side of this 'line', which can be seen in [2], the vertices will be at one level of detail, and on the other side there will be twice as many vertices, so the texture will be tiled twice as often.

To counter this issue, we first have to change the texture sample rate based on the depth of the current mesh in the quadtree. Then a method known as texture morphing can be used to smooth out the edges of the two different sides, making them appear to transition smoothly from one LOD to the next without the end user noticing.
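
A C++ sketch of the depth-dependent sample rate: halving the tiling rate per level mirrors the doubling of vertex density, and a blend factor near a boundary fades between the two rates. This is an assumed morphing policy for illustration, not the exact method of [2].

// Sketch: texture-coordinate scale for a node at the given quadtree
// depth, keeping the texture's world-space size roughly constant.
float TexCoordScale(int depth)
{
    return 1.0f / static_cast<float>(1 << depth);
}

// Sketch: near an LOD boundary, fade from the coarser node's sample rate
// (blend = 0) to the finer one's (blend = 1).
float MorphedScale(int coarseDepth, float blend)
{
    float coarse = TexCoordScale(coarseDepth);
    float fine   = TexCoordScale(coarseDepth + 1);
    return coarse + (fine - coarse) * blend;
}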



Figure 2: Textures laid on two different LODs in a terrain, showing the line created by disagreeing texture coordinates at the two different LODs.



Figure 3: The same snapshot of a terrain as in Figure 2, but with texture morphing to hide the difference in LOD.


Vertex Reduction

As discussed previously, the total number of vertices rendered per frame is what slows down a system. Having few vertices at high quality will result in a good-looking virtual world that will not stress the system to the point where it cannot render what it is being sent. To accomplish this, the application should remove redundant vertices where they are not needed. A great example of this is a large flat area within a terrain. If the application renders all vertices in the base grid after LOD is applied, there may be extra data that does not improve the visual beauty of the world but rather stresses the system more. A large flat area really only needs four vertices to make the square needed to draw that part of the terrain, so the application removes all the unneeded vertices before sending them to be rendered.

The basic idea is to traverse down the quadtree and compare the slope of each vertex vector to those around it. If the application determines the slope of a certain area is flat enough, it alters the quadtree structure by removing any nodes that are not needed at that specific place in the mesh. This must be generalized to work in all areas of the terrain, since a steep slope can be flat even though the application may not see it that way. The entire idea is to remove redundant data without sacrificing any visual quality. When done properly, up to 90% of the original vertices can be removed, which greatly improves the performance of the system, allowing it to work on other tasks such as shadows, artificial intelligence, etc.
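
One possible flatness test is sketched below in C++: comparing the centre height of a node against the height a bilinear patch through its corners would predict keeps steep-but-planar slopes eligible for reduction, as noted above. The tolerance and the choice of sample points are assumptions.

#include <cmath>

// Sketch: a node is "flat enough" to prune if its centre height deviates
// from the bilinear prediction of its four corner heights by less than
// `tolerance`. A planar slope passes this test regardless of steepness.
bool IsFlatEnough(float h00, float h10, float h01, float h11,
                  float hCenter, float tolerance)
{
    float predicted = 0.25f * (h00 + h10 + h01 + h11);
    return std::fabs(hCenter - predicted) < tolerance;
}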

Tessellation

With Direct3D 11, the programmer is able to alter how the graphics pipeline tessellates a mesh before it is sent to the output device. Simply put, tessellation is adding more vertices to an existing mesh to increase its visual detail. The easiest way to accomplish this is to add new vertices at the midpoint of each existing triangle in the mesh. This effectively doubles the overall detail of the area selected for tessellation, since there are now twice as many vertices, and this allows for bumpier, or "sharper", terrain. There are many other methods of tessellation, and some of them are quite elegant, since they increase the detail seen by the end user without reducing system performance to a noticeable level.

Tessellation is actually one of the first tasks to take place when rendering a set of vertices. The graphics device first receives all the data in the vertex shader stage. After this, it sends the vertices to the hull shader and domain shader. These two shaders are where tessellation takes place, if the programmer decides to utilize this powerful feature of Direct3D 11. After that, the new sets of vertices are sent to the geometry shader to add or remove any parts of the quadtree depending on the current state of the application. Finally, the data is rasterized and sent to the pixel shader, which colors each individual pixel on the screen. Rasterization is the process of taking a three-dimensional representation and placing it on a two-dimensional screen. This final stage is where further effects take place, such as lighting and reflections.
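
The C++ sketch below binds the stages in that order using Direct3D 11 context calls; shader compilation and creation are assumed to have happened elsewhere, and the three-control-point patch topology is one common choice for feeding triangle patches to the tessellator.

#include <d3d11.h>

// Sketch: wire up the Direct3D 11 pipeline stages described above.
void BindPipeline(ID3D11DeviceContext* ctx,
                  ID3D11VertexShader* vs, ID3D11HullShader* hs,
                  ID3D11DomainShader* ds, ID3D11GeometryShader* gs,
                  ID3D11PixelShader* ps)
{
    // Patch-list topology routes primitives through the tessellator.
    ctx->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_3_CONTROL_POINT_PATCHLIST);
    ctx->VSSetShader(vs, nullptr, 0); // per-vertex work
    ctx->HSSetShader(hs, nullptr, 0); // tessellation: hull stage
    ctx->DSSetShader(ds, nullptr, 0); // tessellation: domain stage
    ctx->GSSetShader(gs, nullptr, 0); // add/remove vertices for LOD
    ctx->PSSetShader(ps, nullptr, 0); // per-pixel coloring after rasterization
}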

Placing the tessellation step so early in the graphics pipeline allows the geometry shader to add to what the graphics device can already do very quickly, if needed, or to remove vertices when a flat area is being rendered and only a few vertices are needed to view the scene.


Conclusion

Procedural graphics may not always be what designers are looking for when creating a game or some other graphical representation, since it can be very difficult to create an algorithm for premade art such as cities. However, learning to use procedural methods for very large worlds will allow future game designers to render these virtual worlds with extreme detail without needing to run them on some sort of supercomputer with multiple graphics devices.

By running through the methods stated in this paper, a large-scale world can be rendered with relative ease on a lesser system. First, view frustum culling is used on the CPU to determine which nodes of the mesh are even sent to the graphics device for rendering, and at what level of detail each of those nodes should be rendered. This removes large amounts of data right from the start. Next, a noise algorithm is used in the graphics device's vertex and geometry shaders to generate new vertices or alter the heights of existing vertices to produce realistic terrain. After the terrain is generated, tessellation is applied to increase detail in areas desired by the application. Now that a large mesh exists, the graphics device can remove any unwanted vertices that are not adding to what is currently being viewed. Finally, textures can be morphed to cover up any level-of-detail breaks, preventing the end user from ever noticing that the distance has very little detail.

Using these methods has been proven to result in very efficient and highly detailed pieces of art. Graphics programmers today have been utilizing procedural generation more and more, since they wish to keep their games on single DVDs. This means compressing data, or finding some other way to store the same data. Stored algorithms take up almost no space and are able to generate the same visually stunning worlds at half the graphics device power needed before.

References

[1] Schneider, J., & Westermann, R. (2006). GPU-friendly high-quality terrain rendering. Computer Graphics and Visualization Group, Technische Universität München, Munich, Germany.



[2] Yusov, E., & Turlapov, V. (2007). GPU-optimized efficient quad-tree based progressive multiresolution model for interactive large scale terrain rendering. Department of Computational Mathematics and Cybernetics, Nizhny Novgorod State University, Nizhny Novgorod, Russia.



[3] Perlin, K. (2001). Improving noise. Media Research Laboratory, Dept. of Computer Science, New York University, New York City, NY.



[4] Verts, W. T., & Hill, Jr., F. S. (1989). Quadtree meshes. COINS Department, ECE Department, University of Massachusetts, Amherst, MA.




[5] Olsen, J. (2004). Realtime procedural terrain generation. Department of Mathematics and Computer Science (IMADA), University of Southern Denmark.



[6] Bernhardt, A., Maximo, A., Velho, L., Hnaidi, H., & Cani, M. (2011). Real-time terrain modeling using CPU-GPU coupled computation. INRIA, Grenoble Univ., Univ. Lyon 1, IMPA, France, Brazil.