Evolution of Graphics Processing Units


Evolution of Graphics Processing Units

Dr. Zhijie Xu

z.xu@hud.ac.uk

A few words about me


Research interests: simulation, VR, games, and (back to) VR and CG.

Fascinated by R&D in CG and the progress of rendering devices

Retina displays and brain-wave control

Outline


History of the GPUs


Process Paradigm and Programming Model


Current Research Hotspots


Future Trend

Foreword from the IEEE Visualization 2005 Conference


Desktop computer architecture is at a turning point. In the last two years, CPU speeds have nearly stopped increasing and all major CPU manufacturers have announced multi-core, parallel processors.

Future performance improvements will predominantly come from parallelism rather than from an ever-increasing uni-processor clock speed.

Commodity graphics processors (GPUs), in contrast, already contain many parallel processing units and are capable of sustaining computation rates greater than ten times that of a modern CPU. The GPU programming model, however, is very different from traditional CPU models.

What is a GPU?

“A Graphics Processing Unit or GPU (also occasionally called Visual Processing Unit or VPU) is a dedicated graphics rendering device for a personal computer or game console. Modern GPUs are very efficient at manipulating and displaying computer graphics, and their highly parallel structure makes them more effective than typical CPUs for a range of complex algorithms.”

- Definition from wikipedia.org

Radeon 9800 Pro

History of GPUs


The pre-GPU era

VGAs in the 80s

4 (or even 5) generations of GPUs in the last decade

Fixed functions vs. programmability

API support

OpenGL, Direct3D (v6.0 to v9.0)

Shader models (v1.0 to v3.0)

History of GPUs: generations in functional terms

First-Generation GPUs

Up to 1998; Nvidia's TNT2, ATi's Rage, and 3dfx's Voodoo3; DX6 feature set.

Second-Generation GPUs

1999-2000; Nvidia's GeForce256 and GeForce2, ATi's Radeon7500, and S3's Savage3D; T&L; OpenGL and DX7; configurable.

Third-Generation GPUs

2001; GeForce3/4Ti, Radeon8500, MS's Xbox; OpenGL ARB, DX7/8; vertex programmability + ASM.

Fourth-Generation GPUs

2002 onwards; GeForce FX family, Radeon 9700; OpenGL + extensions, DX9; vertex/pixel programmability + HLSL; 0.13 μm process, 125M T/C, 200M T/S.

(I saw a Radeon X1900 just last Thursday.)


History of GPUs: generations in stream-processing terms

Pre-NV2x: no explicit support for stream processing. Kernel operations are usually hidden in the API and provide too little flexibility for general use.

NV2x: kernel stream operations are now explicitly under the programmer's control, but only for vertex processing (fragments still use the old paradigms). No branching support severely hampers flexibility, but some algorithms can be run (notably, low-precision fluid simulation).

R3xx: increased performance and precision, with limited support for branching/looping in both vertex and fragment processing. The model is now flexible enough to cover many purposes.

NV4x: very flexible branching support, although some limitations still exist on the number of operations to be executed and strict recursion depth. Performance is estimated to be from 20 to 44 GFLOPS.

What are GPUs capable of?

Why shift from CPU to GPU?

Why not just keep increasing the CPU speed and leave the GPU to handle what it does best?

CPU speed is reaching a bottleneck (how many transistors can be integrated on a chip).

Solutions: in the long term, nanotechnology; in the short term, dual-core machines (double CPUs), clustered CPUs, ..., even grid computing and supercomputing.

GPUs face the same problem, but still have room to press on thanks to their task-specific design and parallel paradigm.

The Hunger for More Computational Power

Volume, speed, accuracy

Supercomputing (parallel computing)

Applications: particle dynamics, network analysis, finite element analysis, ocean tide analysis, virtual universe simulation, airplane design, other military simulation, etc.

Japanese Earth Simulator, champion of 2002 (5,120 NEC CPUs)

IBM Blue Gene, winner in 2005 (65,536 dual-core PowerPC CPUs)

What's missing in the formula? COST


Process Paradigm and Programming Model

Real-time computer graphics hardware is transitioning from supporting a few fixed algorithms to being fully programmable. At the same time, the performance of graphics processors (GPUs) is increasing at a rapid rate because GPUs can effectively exploit the enormous parallelism available in graphics computations.

These improvements in GPU flexibility and performance are likely to continue in the future, and will allow developers to write increasingly sophisticated and diverse programs that execute on the GPU.

From Sequential to Parallel Paradigm


Conventional, sequential paradigm:

for (int i = 0; i < 100 * 4; i++)
    result[i] = source0[i] + source1[i];

Parallel SIMD paradigm, packed registers:

for (int el = 0; el < 100; el++)
    vector_sum(result[el], source0[el], source1[el]);

Parallel stream paradigm (SIMD/MIMD):

streamElements 100
streamElementFormat 4 numbers
elementKernel "@arg0 + @arg1"
result = kernel(source0, source1)


Stream processing is a relatively new, yet very successful paradigm that allows parallel processing at never-seen-before efficiency with minimal effort. Compared to existing architectures, stream processors are able to provide up to 20x the performance at the same power dissipation and die size.
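To make the stream model above concrete, here is a minimal CPU-side sketch in C++ (the names Element and kernel are illustrative, not part of any GPU API): the "@arg0 + @arg1" kernel is applied independently to every element of the two input streams, and it is exactly this independence that lets a stream processor run all elements in parallel.

#include <array>
#include <cstdio>
#include <vector>

// streamElementFormat: 4 numbers per element
using Element = std::array<float, 4>;

// The element kernel from the slide: "@arg0 + @arg1"
Element kernel(const Element& a, const Element& b) {
    return {a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
}

int main() {
    const int streamElements = 100;
    std::vector<Element> source0(streamElements, {1.f, 2.f, 3.f, 4.f});
    std::vector<Element> source1(streamElements, {10.f, 20.f, 30.f, 40.f});
    std::vector<Element> result(streamElements);

    // Each iteration is independent of the others, so a stream processor can
    // execute all of them in parallel; on the CPU we simply loop.
    for (int el = 0; el < streamElements; ++el)
        result[el] = kernel(source0[el], source1[el]);

    printf("result[0] = {%g, %g, %g, %g}\n",
           result[0][0], result[0][1], result[0][2], result[0][3]);
}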

GPU Rendering Pipeline

Source: nVidia



Vertex Shader Introduction


Data Flow in the Pipeline

A scene description: vertices, triangles, colors, lighting

Transformations that map the scene to a camera viewpoint (see the sketch after this list)

“Effects”: texturing, shadow mapping, lighting calculations

Rasterizing: converting geometry into pixels

Pixel processing: depth tests, stencil tests, and other per-pixel operations
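A minimal sketch of the transformation step, assuming a single combined model-view-projection matrix (the identity matrix and the vertex values below are hypothetical placeholders): each vertex position is multiplied by a 4x4 matrix into clip space and then divided by w to reach normalized device coordinates.

#include <array>
#include <cstdio>

struct Vec4 { float x, y, z, w; };
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major

// Multiply a vertex by a 4x4 transformation matrix.
Vec4 transform(const Mat4& m, const Vec4& v) {
    auto row = [&](int r) {
        return m[r][0] * v.x + m[r][1] * v.y + m[r][2] * v.z + m[r][3] * v.w;
    };
    return {row(0), row(1), row(2), row(3)};
}

int main() {
    // Identity matrix stands in for a real projection * view * model product.
    Mat4 mvp = {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
    Vec4 vertex = {1.0f, 2.0f, 3.0f, 1.0f};            // hypothetical vertex

    Vec4 clip = transform(mvp, vertex);
    // Perspective divide: clip space -> normalized device coordinates.
    printf("NDC: %f %f %f\n", clip.x / clip.w, clip.y / clip.w, clip.z / clip.w);
}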

The Motivation for High-Level Languages

Graphics hardware has become increasingly more powerful

Programming powerful hardware with assembly code is hard

GeForce FX supports programs more than 1,000 assembly instructions long

Programmers need the benefits of a high-level language:

Easier programming

Easier code reuse

Easier debugging

Assembly

DP3 R0, c[11].xyzx, c[11].xyzx;
RSQ R0, R0.x;
MUL R0, R0.x, c[11].xyzx;
MOV R1, c[3];
MUL R1, R1.x, c[0].xyzx;
DP3 R2, R1.xyzx, R1.xyzx;
RSQ R2, R2.x;
MUL R1, R2.x, R1.xyzx;
ADD R2, R0.xyzx, R1.xyzx;
DP3 R3, R2.xyzx, R2.xyzx;
RSQ R3, R3.x;
MUL R2, R3.x, R2.xyzx;
DP3 R2, R1.xyzx, R2.xyzx;
MAX R2, c[3].z, R2.x;
MOV R2.z, c[3].y;
MOV R2.w, c[3].y;
LIT R2, R2;
...

HLL

float4 cSpecular = pow(max(0, dot(Nf, H)), phongExp).xxx;
float4 cPlastic = Cd * (cAmbient + cDiffuse) + Cs * cSpecular;
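Both snippets evaluate a Blinn-Phong-style specular term, max(0, N·H)^phongExp. A plain C++ restatement (the vector values and the exponent below are hypothetical, chosen only to show what the shader math computes) might look like:

#include <algorithm>
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 normalize(const Vec3& v) {
    float inv = 1.0f / std::sqrt(dot(v, v));   // the RSQ/MUL pair in the assembly
    return {v.x * inv, v.y * inv, v.z * inv};
}

int main() {
    Vec3 N = normalize({0.0f, 1.0f, 0.2f});    // surface normal (hypothetical)
    Vec3 L = normalize({0.5f, 1.0f, 0.5f});    // direction to light (hypothetical)
    Vec3 V = normalize({0.0f, 0.0f, 1.0f});    // direction to viewer (hypothetical)
    Vec3 H = normalize({L.x + V.x, L.y + V.y, L.z + V.z});  // half vector

    float phongExp = 32.0f;
    float specular = std::pow(std::max(0.0f, dot(N, H)), phongExp);
    printf("specular = %f\n", specular);
}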

GPU Programming


Game applications:

Per-pixel lighting

Vertex displacement

Fur and shine (ATi demos)

Various shading models (Treasure Box and RenderMonkey)

Bump map creation and the virtual earth

One more reason to have a decent graphics card with a decent GPU mounted...

Microsoft Windows Vista operating system

To be released at the end of this year

Aero Glass 3D interface

More than half of all PCs (more than 63% of 203 million PCs) won't support it, because their integrated graphics adapters only support Windows 2000's and Windows XP's 2D interface

Aero Glass is part of Vista's interface

Aero requires the graphics card to support DirectX 9.0c, for example the Nvidia GeForce 5900

In 2005, over 22.3 million standalone graphics cards (market value over 10 billion dollars) were sold globally, of which more than 72% (13.4 million) can support Aero Glass

Microsoft announced last week that the next big game title to be released, Halo 2, will only run on Vista

Vista is causing legal battles with PC manufacturers

Non-Game Applications: GPGPU

Recent advances in programmability and architectural design have enabled the use of GPU processors for general-purpose computation.

Applications in:

Linear algebra

Geometric computing

Database and stream mining

GPU ray tracing

Advanced image processing

Computational fluid dynamics (CFD) and finite element analysis

Problems That Need to Be Solved

Significant barriers exist for the developer who wishes to use the inexpensive power of commodity graphics hardware, whether for in-game simulation of physics or for conventional computational science.

These chips are designed for and driven by video game development; the programming model is unusual, the programming environment is tightly constrained, and the underlying architectures are largely secret.

Potential Research Areas


GPGPU building blocks

Mapping computational concepts to the GPU

Linear algebra

Sorting and searching

Geometric computing

High-level languages and debugging tools

Computational building blocks

Math: linear algebra, finite difference, finite element

General algorithms: searching, sorting, etc.
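The sorting item above is a good example of how a general algorithm has to be reshaped for the GPU. GPGPU sorting work commonly builds on a bitonic sorting network, because every pass consists of data-independent compare-exchange operations at a fixed stride, which map naturally onto fragment-processing passes. Below is a minimal CPU sketch of the network (an illustration of the technique, not any particular GPU implementation; the input values are hypothetical and the length must be a power of two):

#include <cstdio>
#include <utility>
#include <vector>

int main() {
    std::vector<float> a = {7, 3, 8, 1, 6, 4, 5, 2};   // size must be a power of two
    const size_t n = a.size();

    for (size_t k = 2; k <= n; k <<= 1) {          // size of the bitonic sequences
        for (size_t j = k >> 1; j > 0; j >>= 1) {  // compare-exchange distance
            // Each iteration of this inner loop is independent, so on a GPU
            // one (k, j) pass can be executed as a single parallel pass.
            for (size_t i = 0; i < n; ++i) {
                size_t partner = i ^ j;
                if (partner > i) {
                    bool up = ((i & k) == 0);      // sort direction for this element
                    if (( up && a[i] > a[partner]) ||
                        (!up && a[i] < a[partner]))
                        std::swap(a[i], a[partner]);
                }
            }
        }
    }

    for (float v : a) printf("%g ", v);
    printf("\n");
}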

Progress on GPGPU


GPGPU programming libraries

GLIT, Accelerator

Increased pressure on manufacturers from "GPGPU users" to improve hardware design, usually focusing on adding more flexibility to the programming model.

Summary


The graphics processor (GPU) on today's commodity video cards has evolved into an extremely powerful and flexible processor.

The latest graphics architectures provide tremendous memory bandwidth and computational horsepower, with fully programmable vertex and pixel processing units that support vector operations up to full IEEE floating-point precision.

High-level languages have emerged for graphics hardware, making this computational power accessible. Architecturally, GPUs are highly parallel streaming processors optimized for vector operations, with both MIMD (vertex) and SIMD (pixel) pipelines.

GPUs are capable of general-purpose computation beyond the graphics applications for which they were designed, but application programming barriers need to be taken down.



Questions