May 27,2004 18:11 Proceedings Trim Size:9in x 6in GPU-survey
Department of Computer Science, University of Kentucky,
One Quality Street, Suite 854, Lexington, KY 40507, USA
Driven by the need for interactive entertainment, modern PCs are equipped with
specialized graphics processors (GPUs) for creation and display of images. These
GPUs have become increasingly programmable, to the point that they are now
capable of efficiently executing a significant number of computational kernels
from non-graphical applications. In this introductory paper we first present a
high-level overview of modern graphics hardware’s architecture, then introduce
several applications in scientific computing that can be efficiently accelerated
by GPUs. Finally we list programming tools available for application development
on GPUs.
1. Introduction
As the mass-market emphasis in computing has shifted from word processing
and spreadsheets to interactive entertainment, computer hardware has
evolved to better support these new applications. Most of the
performance-limiting processing today involves creation and display of images;
thus, a new entity has appeared within most computer systems. Between the
system’s general-purpose processor and the video frame buffer, there is now a
specialized Graphics Processing Unit (GPU).
Early GPUs were not really processors, but hardwired pipelines for each
of the most common rendering tasks. As more complex 3D transformations
have become common in a wide range of applications, GPUs have become
increasingly programmable, to the point that they are now capable of efficiently
executing a significant number of computational kernels from non-graphical
applications.
A GPU is simpler and more efficient than a conventional PC processor
(CPU) because a GPU only needs to perform a relatively simple set of
array processing operations (but at a very high speed). Many problems in
scientific computing, such as physically-based simulation, information
retrieval, and data mining, can boil down to relatively simple matrix
operations. This characteristic makes these problems ideal candidates for GPU
acceleration.
In this introductory paper we first present a high-level overview of modern
graphics hardware’s architecture and its phenomenal development in recent
years. Then we introduce a large array of non-graphical computational tasks, in
particular linear algebra operations, that have been successfully implemented
on GPUs and obtained significant performance improvements. Finally we list
programming tools available for application development on GPUs. Some of
them are designed to allow programming GPUs with familiar C-like constructs
and syntax, without worrying about the details of the hardware. They hold
the promise of bringing the vast computational power in GPUs to the broad
scientific computing community.
2. A Brief Overview of GPUs
In this section, we will explain the basic architecture of GPUs and the potential
advantages of using GPUs to solve scientific problems.
2.1. The Rendering Pipeline
GPUs are dedicated processors designed specifically to handle the intense
computational requirements of display graphics, i.e., rendering text or images
at over 30 frames per second. As depicted in Figure 1, a modern GPU can be
abstracted as a rendering pipeline for 3D computer graphics (2D graphics is just
a special case).
Figure 1. Rendering Pipeline (diagram labels: geometric primitives, fragment,
frame buffer).
The inputs to the pipeline are geometric primitives, i.e., points, lines, and
polygons; the output is the framebuffer, a two-dimensional array of pixels that
will be displayed on screen.
The first stage operates on geometric primitives described by vertices. In
this vertex-processing stage, vertices are transformed and lit, and primitives
are clipped to a viewing volume in preparation for the next stage,
rasterization. The rasterizer produces a series of framebuffer addresses and
color values. Each such pair is called a fragment and represents the portion of
a primitive that corresponds to one pixel in the framebuffer.
Each fragment is fed to the next stage, fragment processing, before it finally
alters the framebuffer. Operations in this stage include texture mapping, depth
test, alpha blending, etc.
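The three stages described above can be mimicked in software to make the data flow concrete. The following is a minimal sketch only, under stated assumptions: all names (`vertex_stage`, `rasterize`, `fragment_stage`, `render`, `FAR`, `NEAR`) are hypothetical, the vertex stage is reduced to a scale-and-offset transform without lighting or clipping, and the fragment stage performs only a depth test. Real GPUs implement these stages in hardware and far more generally.

```python
# Toy software model of the pipeline: vertices -> fragments -> framebuffer.
# Illustrative only; every name here is an assumption, not a GPU API.

def vertex_stage(vertices, size):
    """Map (x, y, z) vertices from [-1, 1] coordinates to framebuffer coordinates."""
    half = size / 2
    return [(x * half + half, y * half + half, z) for x, y, z in vertices]

def edge(a, b, p):
    """Signed area of the parallelogram spanned by a->b and a->p."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri, color, size):
    """Yield one fragment (x, y, depth, color) per pixel center the triangle covers."""
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return  # a degenerate triangle covers no pixels
    for y in range(size):
        for x in range(size):
            p = (x + 0.5, y + 0.5)
            w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
            all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_non_negative or all_non_positive:
                # Barycentric interpolation of the vertex depths.
                depth = (w0 * a[2] + w1 * b[2] + w2 * c[2]) / area
                yield (x, y, depth, color)

def fragment_stage(fragments, framebuffer, depthbuffer):
    """Depth test: a fragment alters the framebuffer only if it is nearer."""
    for x, y, depth, color in fragments:
        if depth < depthbuffer[y][x]:
            depthbuffer[y][x] = depth
            framebuffer[y][x] = color

def render(triangles, size=8):
    """Run every (vertices, color) triangle through the three stages."""
    framebuffer = [["." for _ in range(size)] for _ in range(size)]
    depthbuffer = [[float("inf")] * size for _ in range(size)]
    for vertices, color in triangles:
        tri = vertex_stage(vertices, size)
        fragment_stage(rasterize(tri, color, size), framebuffer, depthbuffer)
    return framebuffer

# Two overlapping triangles: a large far one ("A") and a small near one ("B").
FAR = ([(-1, -1, 0.9), (1, -1, 0.9), (-1, 1, 0.9)], "A")
NEAR = ([(-1, -1, 0.1), (0, -1, 0.1), (-1, 0, 0.1)], "B")
```

Calling `render([FAR, NEAR])` returns a grid of characters indexed as `framebuffer[y][x]`. Because of the depth test, the nearer triangle wins wherever the two overlap, regardless of the order in which they are submitted.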
2.2. Recent Trends in GPUs
Until a few years ago, commercial GPUs, such as the RealityEngine from SGI,
implemented in hardware a fixed rendering pipeline with configurable parameters.
As a result, their applications were restricted to graphical computations.
Driven by the market demand for better realism, the current generation of
commercial GPUs, such as the NVIDIA GeForce FX and the ATI Radeon 9800, offers
significant programmable functionality in both the vertex and the fragment
processing stages (stages with double lines in Figure 1). They allow developers
to write a sequence of instructions to modify the vertex or fragment output.
These programs are executed directly on the GPU and achieve performance
comparable to that of fixed-function GPUs.
In addition to the programmable functionality in modern GPUs, their support
for floating-point output has been improving. GPUs on the market today
support up to 32-bit floating-point output. Such precision is usable for many
diverse applications other than computer graphics.
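To give a feel for what 32-bit floating-point precision means in practice, the sketch below rounds Python's 64-bit floats to 32 bits using the standard `struct` module. It assumes an IEEE 754 single-precision format. The text above does not say that GPU floating point follows IEEE 754 exactly, so this is an illustration of the precision class rather than of any particular GPU. The helper names `to_float32` and `float32_sum` are hypothetical.

```python
import struct

def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit IEEE 754 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def float32_sum(values):
    """Sum values while rounding every partial sum to 32 bits."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total

# A 32-bit float has a 24-bit significand, so integers above 2**24
# can no longer all be represented exactly.
LARGEST_EXACT_INTEGER_RUN = 2 ** 24
```

For instance, `to_float32(0.1)` differs from the 64-bit `0.1` only around the ninth decimal place, while `to_float32(2**24 + 1)` rounds back to `2**24`. Roughly seven significant decimal digits survive, which is adequate for many numerical kernels but needs care in long accumulations.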
Figure 2. A graph of performance increase over time for CPUs and GPUs
(horizontal axis: date introduced, Jul-98 to Jan-04; vertical axes: SPECint2000
benchmark and millions of triangles per second; labeled data points include
GeForce 256, Radeon 8500, Radeon 9800, and P4 3.2 GHz). GPU performance has
increased at a faster rate than CPUs. (Data courtesy of Anselmo Lastra.)
GPUs have also demonstrated a rapid improvement in performance during
the past few years. In Figure 2, we plot the performance increase of both
GPUs and commodity Central Processing Units (CPUs). Similar to the number
of integer operations per second for CPUs, a typical benchmark to gauge a
GPU’s performance is the number of triangles it can process every second.
We can see that GPUs have maintained a performance improvement rate of
approximately 3X/year, which exceeds the performance improvement of CPUs
at 1.6X/year. This is because CPUs are designed for low-latency computations,
while GPUs are optimized for high throughput of vertices and fragments.
Low latency on memory-intensive applications typically requires large caches,
which use a large silicon area. Additional transistors are used to greater effect
in GPU architectures because they are applied to additional functional units
that increase throughput.
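The gap implied by the two growth rates quoted above compounds quickly. The short sketch below works through that arithmetic, assuming, purely for illustration, that both rates stay constant from year to year. The function names are hypothetical.

```python
def relative_performance(rate_per_year, years):
    """Performance relative to year 0 under a constant yearly growth factor."""
    return rate_per_year ** years

def gpu_to_cpu_gain(years, gpu_rate=3.0, cpu_rate=1.6):
    """How much the GPU/CPU performance ratio grows over the given number of years.

    The default rates are the approximate figures quoted in the text
    (3X/year for GPUs, 1.6X/year for CPUs).
    """
    return relative_performance(gpu_rate, years) / relative_performance(cpu_rate, years)
```

Under these assumptions the GPU-to-CPU ratio grows by a factor of 3 / 1.6 = 1.875 each year, so `gpu_to_cpu_gain(3)` is already above 6. The two benchmarks measure different things (integer operations versus triangles), so this comparison concerns rates of improvement, not absolute speed.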
3. Applications of GPUs for General-Purpose Computation
With the wide deployment of inexpensive yet powerful GPUs in the last several
years, we have seen a surge of experimental research in using GPUs for tasks
other than rendering. For example, Yang experimented with using GPUs to solve
computer vision problems; Holzschuch and Alonso used them to speed up
visibility queries; Hoff to compute generalized Voronoi diagrams and proximity
information; and Lok to reconstruct an object’s visual hull given live video
from multiple cameras. Each of these applications obtained significant
performance improvements by exploiting the speed and the inherent parallelism
in modern graphics hardware.
Within the scope of this paper, we introduce several representative approaches
to accelerating linear algebra operations on GPUs.
Larsen and McAllister present a technique for large matrix-matrix multiplies
using low-cost graphics hardware. The method is an adaptation of the technique
from parallel computing of distributing the computation over a logically
cube-shaped lattice of processors and performing a portion of the computation
at each processor. Graphics hardware is specialized in a manner that makes it
well suited to this particular problem, giving faster results in some cases
than using a general-purpose processor. A more complete and up-to-date
implementation of dense matrix algebra is presented by Moravánszky.
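One way to read the cube-shaped lattice is that lattice point (i, j, k) computes the single product A[i][k] * B[k][j], and the products are summed along k. The sketch below, written in plain Python as an illustration and not taken from Larsen and McAllister's implementation, organizes the work as one "pass" per value of k, with each pass adding into an accumulation buffer. The analogy to rendering passes with additive blending is an assumption of this sketch, and `matmul_by_passes` is a hypothetical name.

```python
def matmul_by_passes(a, b):
    """Multiply matrices (lists of rows) by accumulating one slice of the cube per pass."""
    n, inner, m = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    # The accumulation buffer plays the role of the framebuffer.
    framebuffer = [[0.0] * m for _ in range(n)]
    for k in range(inner):          # one pass per slice k of the cube
        for i in range(n):          # within a pass, every (i, j) entry is
            for j in range(m):      # independent and could run in parallel
                framebuffer[i][j] += a[i][k] * b[k][j]
    return framebuffer
```

The point of this arrangement is that the two inner loops carry no dependencies between entries, which is the kind of work a fragment processor applies uniformly to every pixel. Only the outer loop over k is sequential.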
The paper of Bolz et al. shows two basic, broadly useful computational
kernels implemented on GPUs: a sparse matrix conjugate gradient solver and
a regular-grid multigrid solver. Performance analysis with realistic
applications shows that a GPU-based implementation compares favorably with its
CPU counterpart. A similar framework for the implementation of linear algebra
operators on GPUs is given by Krüger and Westermann, which focuses on sparse
and banded matrices.
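For readers unfamiliar with the first of these kernels, the following is a minimal CPU-side sketch of the textbook conjugate gradient iteration for a sparse, symmetric positive definite system. It is not the GPU implementation of Bolz et al. The sparse storage (each row as a list of `(column, value)` pairs) and the names `spmv`, `conjugate_gradient`, and `poisson_1d` are assumptions chosen for brevity.

```python
def spmv(rows, x):
    """Sparse matrix-vector product; each row is a list of (column, value) pairs."""
    return [sum(value * x[col] for col, value in row) for row in rows]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(rows, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite sparse matrix A."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x, with x = 0 initially
    p = list(r)                 # current search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break               # residual norm is small enough
        ap = spmv(rows, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

def poisson_1d(n):
    """Tridiagonal test matrix with 2 on the diagonal and -1 beside it."""
    rows = []
    for i in range(n):
        row = [(i, 2.0)]
        if i > 0:
            row.append((i - 1, -1.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows
```

Each iteration consists of one sparse matrix-vector product, two inner products, and a few vector updates. These are exactly the operations a GPU linear algebra framework has to supply, which is why the solver is a natural benchmark kernel.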
There are many other algorithms for scientific computing that have been
implemented on GPUs, including FFT, level set methods, and various types
of physically-based simulations. Interested readers are referred to for other
general-purpose applications on GPUs.
4. GPU Programming Languages
While many non-graphical applications on GPUs have obtained encouraging
results by exploiting the GPU’s high speed and high bandwidth, the development
process is not trivial. Many of the existing applications are written in
low-level assembly languages that are directly executed on the GPU. Therefore,
novice developers face a steep learning curve in acquiring a thorough
understanding of the graphics hardware and its programming interfaces,
namely OpenGL and DirectX.
Fortunately, this is rapidly changing as several high-level languages become
available. The first is Cg, a system for programming graphics hardware in a
C-like language. It is, however, still a programming language geared towards
rendering tasks and tightly coupled with graphics hardware.
There are other high-level languages, such as Brook for GPUs and Sh,
which allow programming GPUs with familiar constructs and syntax, without
worrying about the details of the hardware. Brook extends C to include
simple data-parallel constructs, enabling the use of the GPU as a streaming
coprocessor. Sh is a metaprogramming language that offers the convenient
syntax of C++ and takes the burden of register allocation and other low-level
issues away from the programmer. While these languages are not fully mature
yet, they are the most promising ones to allow non-graphics researchers or
developers to tap into the vast computational power in GPUs.
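The streaming, data-parallel model mentioned above can be pictured as a small function (a kernel) applied independently to corresponding elements of one or more arrays (streams). The sketch below expresses that idea in plain Python. It is not Brook or Sh syntax, and the names `run_kernel` and `saxpy_kernel` are hypothetical.

```python
def run_kernel(kernel, first, *rest):
    """Apply kernel to corresponding elements of equal-length streams.

    No output element depends on any other, so a streaming processor
    is free to evaluate all of them in parallel.
    """
    streams = (first,) + rest
    if any(len(s) != len(first) for s in streams):
        raise ValueError("all streams must have the same length")
    return [kernel(*elements) for elements in zip(*streams)]

def saxpy_kernel(alpha):
    """Build a kernel computing alpha * x + y for one pair of elements."""
    def kernel(x, y):
        return alpha * x + y
    return kernel
```

For example, `run_kernel(saxpy_kernel(2.0), xs, ys)` computes `2.0 * x + y` for every pair of elements. The programmer writes only the per-element body, and the mapping of elements to hardware is left to the language runtime.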
5. Conclusion
The versatile programmability and improved floating-point precision now
available in GPUs make them useful coprocessors for scientific computing.
Many non-trivial computational kernels have been successfully implemented
on GPUs and have achieved significant acceleration. As graphics hardware
continues to evolve faster than CPUs and more “user-friendly” high-level
programming languages become available, we believe communities outside
computer graphics can also benefit from the fast processing speed and high
bandwidth that GPUs offer. We hope this introductory paper will encourage
further thinking along this direction.
Acknowledgments
The author would like to thank Hank Dietz for providing some of the materials
in this paper. This work is supported in part by funds from the Office of
Research at the University of Kentucky and the Kentucky Science & Engineering
Foundation.
References
1. Ádám Moravánszky. Dense Matrix Algebra on the GPU. In ShaderX2: Shader
Programming Tips & Tricks With DirectX 9. Wordware, 2003.
2. K. Akeley. RealityEngine graphics. In Proceedings of SIGGRAPH, 1993.
3. ATI Technologies Inc. ATI Radeon 9800, 2003.
4. Jeff Bolz, Ian Farmer, Eitan Grinspun, and Peter Schröder. Sparse Matrix
Solvers on the GPU: Conjugate Gradients and Multigrid. ACM Transactions on
Graphics (SIGGRAPH 2003), 22(3), 2003.
5. M. Harris. Real-Time Cloud Simulation and Rendering. PhD thesis, Department
of Computer Science, Univ. of North Carolina at Chapel Hill, 2003.
6. M. Harris, W. Baxter, T. Scheuermann, and A. Lastra. Simulation of Cloud
Dynamics on Graphics Hardware. In Proceedings of Graphics Hardware, pages
92–101, 2002.
7. Nicolas Holzschuch and Laurent Alonso. Using graphics hardware to speed-up
visibility queries. Journal of Graphics Tools, 5(2):33–47, 2000.
8. Kenneth E. Hoff III, John Keyser, Ming C. Lin, Dinesh Manocha, and Tim
Culver. Fast Computation of Generalized Voronoi Diagrams Using Graphics
Hardware. In Proceedings of SIGGRAPH 99, pages 277–286, August 1999.
9. Kenneth E. Hoff III, Andrew Zaferakis, Ming C. Lin, and Dinesh Manocha.
Fast and simple 2D geometric proximity queries using graphics hardware. In
2001 ACM Symposium on Interactive 3D Graphics, pages 145–148, March 2001.
ISBN 1-58113-292-1.
10. T. Kim and M. Lin. Visual Simulation of Ice Crystal Growth. In Proceedings
of ACM SIGGRAPH/Eurographics Symposium on Computer Animation 2003,
pages 92–101, 2003.
11. Jens Krüger and Rüdiger Westermann. Linear Algebra Operators for GPU
Implementation of Numerical Algorithms. ACM Transactions on Graphics
(SIGGRAPH 2003), 22(3), 2003.
12. E. Scott Larsen and David K. McAllister. Fast Matrix Multiplies using
Graphics Hardware. In Proceedings of Supercomputing 2001, November 2001.
13. A. E. Lefohn, J. Kniss, C. Hansen, and R. T. Whitaker. Interactive
Deformation and Visualization of Level Set Surfaces Using Graphics Hardware.
In Proceedings of IEEE Visualization, 2003.
14. E. Lindholm, M. Kilgard, and H. Moreton. A User Programmable Vertex Engine.
In Proceedings of SIGGRAPH, pages 149–158, 2001.
15. B. Lok. Online Model Reconstruction for Interactive Virtual Environments. In
Proceedings 2001 Symposium on Interactive 3D Graphics, pages 69–72, Chapel
Hill, North Carolina, March 2001.
17. K. Moreland and E. Angel. The FFT on a GPU. In SIGGRAPH/Eurographics
Workshop on Graphics Hardware 2003 Proceedings, pages 112–119, 2003.
18. NVIDIA. Cg: C for Graphics, 2002.
19. NVIDIA. GeForce FX, 2003. desktop.html.
20. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version
21. S. Tomov, M. McGuigan, R. Bennett, G. Smith, and J. Spiletic. Benchmarking
and Implementation of Probability-Based Simulations on Programmable Graphics
Cards. Computers & Graphics, 2004.
22. R. Strzodka and M. Rumpf. Level set segmentation in graphics hardware. In
Proceedings of the International Conference on Image Processing, 2001.
23. Ruigang Yang and Marc Pollefeys. Multi-Resolution Real-Time Stereo on
Commodity Graphics Hardware. In Proceedings of Conference on Computer Vision
and Pattern Recognition (CVPR), pages 211–218, 2003.
24. Ruigang Yang and Greg Welch. Fast Image Segmentation and Smoothing Using
Commodity Graphics Hardware. Journal of Graphics Tools, special issue on
Hardware-Accelerated Rendering Techniques, 7(4):91–100, 2003.