CS/CoE1541 Introduction to Computer Architecture
Graphics and Computing GPUs
Sangyeun Cho
Dept. of Computer Science
University of Pittsburgh
Some terms

GPU = graphics processing unit
Integrates 2D/3D graphics, images, and video that enable window-based OSes, GUIs, video games, visual imaging applications, and video.

Visual computing
A mix of graphics processing and computing that lets you visually interact with computed objects via graphics, images, and video.

Heterogeneous system
A system combining different processor types; a PC is a heterogeneous CPU-GPU system.
GPU evolution

VGA in early 90's
A memory controller and display generator connected to some (video) RAM.

By 1997, VGA controllers were incorporating some 3D acceleration functions.

In 2000, a single-chip graphics processor incorporated almost every detail of the traditional high-end workstation graphics pipeline (1st generation GPUs).

More recently, processor instructions and memory hardware were added to support general-purpose programming languages.

Hardware has evolved to include double-precision floating-point operations and massively parallel programmable processors.
Historical PC architecture
[diagram slide]

Contemporary PC architecture
[diagram slide]
More terms

OpenGL
A standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics.

DirectX
(Microsoft) A collection of APIs for handling tasks related to multimedia, especially game programming and video.

CUDA (compute unified device architecture)
(nVIDIA) A scalable parallel programming model and language based on C/C++; it is a parallel programming platform for GPUs and multicore CPUs.
Graphics “logical” pipeline

Input assembler collects vertices and primitives.

Vertex shader executes per-vertex processing, e.g., transforming the vertex 3D position into a screen position, lighting the vertex to determine its color.

Geometry shader executes per-primitive processing.

Setup/rasterizer generates pixel fragments that are covered by a geometric primitive.

Pixel shader performs per-fragment processing, e.g., interpolating per-fragment parameters, texturing, and coloring; it makes extensive use of sampled and filtered lookups into large 1D, 2D, or 3D arrays called textures.

Raster operations processing stage performs Z-buffer depth testing and stencil testing.
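The Z-buffer depth test in the raster operations stage can be sketched in a few lines. This is a serial Python model of the per-fragment logic only (buffer sizes and names are illustrative, not from the slides):

```python
# Tiny model of the Z-buffer depth test: a fragment is written only if
# it is nearer (smaller z) than the depth already stored at its pixel.

FAR = float("inf")

def make_buffers(w, h):
    """Depth buffer initialized to 'infinitely far', color buffer to black."""
    return [[FAR] * w for _ in range(h)], [[0] * w for _ in range(h)]

def depth_test_write(zbuf, cbuf, x, y, z, color):
    """Store the fragment and return True iff it passes the depth test."""
    if z < zbuf[y][x]:
        zbuf[y][x] = z
        cbuf[y][x] = color
        return True
    return False

zbuf, cbuf = make_buffers(4, 4)
depth_test_write(zbuf, cbuf, 1, 1, 0.8, 0xFF0000)  # far red fragment: passes
depth_test_write(zbuf, cbuf, 1, 1, 0.3, 0x00FF00)  # nearer green: overwrites it
depth_test_write(zbuf, cbuf, 1, 1, 0.5, 0x0000FF)  # farther blue: rejected
print(hex(cbuf[1][1]))  # the green fragment survives
```

Real hardware performs this test (plus stencil testing) at enormous fragment rates, but the per-pixel decision is exactly this compare-and-conditionally-write.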
Graphics “logical” pipeline
[diagram slide: “fixed” hardware functions vs. “programmable” functions]
Various objects and buffers are allocated in the GPU memory hierarchy.

Basic unified GPU architecture
[diagram slide]
Pixel shader example

// called for each pixel thread
void reflection(float2 texCoord       : TEXCOORD0,
                float3 reflection_dir : TEXCOORD1,
                out float4 color      : COLOR,
                uniform float shiny,
                uniform sampler2D surfaceMap,
                uniform samplerCUBE envMap)
{
    // fetch the surface color from a texture
    float4 surfaceColor = tex2D(surfaceMap, texCoord);
    // fetch reflected color by sampling a cube map
    float4 reflectedColor = texCUBE(envMap, reflection_dir);
    // output is weighted average of the two colors
    color = lerp(surfaceColor, reflectedColor, shiny);
}
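The lerp call in the shader blends the two sampled colors component-wise as a + t*(b - a). A plain-Python sketch of that math (example color values are made up for illustration):

```python
# Component-wise linear interpolation, a + t * (b - a):
# t = 0 gives the surface color, t = 1 gives the pure reflection.

def lerp(a, b, t):
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]

surface_color   = [0.8, 0.2, 0.1, 1.0]  # RGBA from the surface texture
reflected_color = [0.2, 0.4, 1.0, 1.0]  # RGBA sampled from the cube map

print(lerp(surface_color, reflected_color, 0.5))  # halfway blend
```

In the shader, the `shiny` uniform plays the role of t, controlling how mirror-like the surface appears.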
CUDA

Developed by nVIDIA in 2007.

A data-parallel extension to the C/C++ languages for scalable parallel programming of manycore GPUs and multicore CPUs.

CUDA provides three key abstractions: a hierarchy of thread groups, shared memories, and barrier synchronization.

The programmer or compiler decomposes large computing problems into many small problems that can be solved in parallel.
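The thread-group hierarchy can be mimicked serially. The sketch below (plain Python; function names are hypothetical, not CUDA API) runs a "kernel" once per thread over a grid of blocks, computing each thread's global index the way a CUDA kernel would from blockIdx, blockDim, and threadIdx:

```python
# Serial model of CUDA's two-level decomposition: a grid of thread
# blocks, each block a group of threads. Each (block, thread) pair
# handles one element, indexed as blockIdx.x * blockDim.x + threadIdx.x.

def launch_kernel(kernel, grid_dim, block_dim, *args):
    """Run `kernel` once per thread, serially, over the whole grid."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Element-wise vector add: each thread owns one element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(out):                        # guard for the partial last block
        out[i] = a[i] + b[i]

n = 10
a = list(range(n))
b = [10 * x for x in a]
out = [0] * n
launch_kernel(add_kernel, 3, 4, a, b, out)  # 3 blocks of 4 threads cover 12 >= 10
print(out)  # → [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
```

On a real GPU the two loops disappear: every (block, thread) pair runs concurrently, and because blocks are independent the hardware may schedule them in any order across however many cores it has.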
Decomposition of result data
[diagram slide]

Nested levels and memory
[diagram slide]

Core count independence
[diagram slide]
Restrictions

Threads and thread blocks may only be created by invoking a parallel kernel, not from within a parallel kernel.

Thread blocks must be independent (no scheduling/ordering requirement).

The above two restrictions allow efficient hardware management and scheduling of threads and thread blocks.

Recursive function calls are not allowed.

CUDA programs must copy data and results between host memory and device memory.
DMA block transfer minimizes the overhead of CPU-GPU data transfer.
Compute intensive problems amortize the data transfer overheads.
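The amortization point can be made concrete with a toy cost model (all constants below are hypothetical, chosen only for illustration): the copy-in/copy-out cost grows linearly with data size, while an O(n²) kernel's work grows quadratically, so the fraction of time spent on transfers shrinks as the problem grows.

```python
# Toy cost model for the copy-compute-copy pattern: host->device
# transfer, O(n^2) kernel work, device->host transfer. The constants
# are hypothetical; only the asymptotic shape matters.

TRANSFER_COST_PER_ELEM = 1.0   # cost units per element moved (each way)
COMPUTE_COST_PER_PAIR = 0.01   # cost units per element-pair of kernel work

def transfer_fraction(n):
    """Fraction of total time spent moving data for an O(n^2) kernel."""
    transfer = 2 * n * TRANSFER_COST_PER_ELEM  # copy in + copy out
    compute = n * n * COMPUTE_COST_PER_PAIR
    return transfer / (transfer + compute)

for n in (100, 1_000, 10_000):
    print(n, round(transfer_fraction(n), 3))  # fraction falls as n grows
```

A memory-bound kernel (O(n) work) would show no such decay, which is why compute-intensive problems are the ones that amortize the CPU-GPU transfer overhead well.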