Programming Massively Parallel Processors


© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010
ECE408, University of Illinois, Urbana-Champaign

Chapter 2: GPU Computing History


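A Fixed Function GPU Pipeline

[Figure: fixed-function GPU pipeline. The Host (CPU) connects to the GPU through the Host Interface. GPU stages, in order: Vertex Control (with Vertex Cache), VS/T&L, Triangle Setup, Raster, Shader (with Texture Cache), ROP, FBI. The FBI connects to Frame Buffer Memory.]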


Texture Mapping Example

Texture mapping example: painting a world map texture image onto a globe object.
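As a rough illustration of what painting a world map onto a globe involves, the sketch below converts a point on a unit sphere into texture coordinates and fetches the nearest texel. It is not taken from the slides: it assumes an equirectangular (longitude/latitude) map layout, the y axis as the polar axis, and nearest-neighbor sampling. The names `sphere_to_uv`, `sample_nearest`, and `world_map` are hypothetical.

```python
import math

def sphere_to_uv(x, y, z):
    """Map a point on the unit sphere to (u, v), each in [0, 1].

    Assumption: equirectangular texture, u follows longitude and
    v follows latitude, with y as the polar axis.
    """
    lon = math.atan2(z, x)                    # longitude in [-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, y)))   # latitude in [-pi/2, pi/2]
    u = (lon + math.pi) / (2.0 * math.pi)
    v = (lat + math.pi / 2.0) / math.pi
    return u, v

def sample_nearest(texture, u, v):
    """Fetch the texel nearest to (u, v) from a row-major 2D list."""
    height = len(texture)
    width = len(texture[0])
    col = min(int(u * width), width - 1)      # clamp u == 1.0 to the last column
    row = min(int(v * height), height - 1)    # clamp v == 1.0 to the last row
    return texture[row][col]

# A 2x4 stand-in for the world-map image (row 0 = south, row 1 = north).
world_map = [
    ["S0", "S1", "S2", "S3"],
    ["N0", "N1", "N2", "N3"],
]
```

For example, `sample_nearest(world_map, *sphere_to_uv(0.0, -1.0, 0.0))` looks up the texel for the south pole. A real GPU performs this lookup in dedicated texture hardware, usually with filtering rather than nearest-neighbor sampling.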


Programmable Vertex and Pixel Processors

[Figure: An example of separate vertex processor and fragment processor in a programmable graphics pipeline. CPU side: 3D Application or Game, 3D API (OpenGL or Direct3D). The 3D API commands cross the CPU-GPU boundary as the GPU command and data stream. GPU stages: GPU Front End, Programmable Vertex Processor, Primitive Assembly, Rasterization & Interpolation, Programmable Fragment Processor, Raster Operations, Framebuffer. Labeled data streams: vertex index stream; pre-transformed vertices; transformed vertices; assembled polygons, lines, and points; pixel location stream; rasterized pre-transformed fragments; transformed fragments; pixel updates.]
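The division of labor in this pipeline can be sketched in software. The following is a minimal CPU-side illustration, not the hardware's actual behavior: a per-vertex program transforms each vertex independently, a fixed rasterizer turns one counter-clockwise triangle into pixel locations, and a per-fragment program computes a color from each fragment alone. All function names, the edge-function rasterizer, and the dictionary standing in for the framebuffer are assumptions made for the example.

```python
def vertex_program(vertex, scale, offset):
    """Per-vertex stage: scale and translate one 2D vertex."""
    x, y = vertex
    return (x * scale + offset[0], y * scale + offset[1])

def edge(a, b, p):
    """Edge function: >= 0 when p lies on or to the left of edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(triangle, width, height):
    """Fixed stage: yield pixels whose centers fall inside a
    counter-clockwise triangle."""
    a, b, c = triangle
    for py in range(height):
        for px in range(width):
            center = (px + 0.5, py + 0.5)
            if (edge(a, b, center) >= 0 and edge(b, c, center) >= 0
                    and edge(c, a, center) >= 0):
                yield (px, py)

def fragment_program(pixel):
    """Per-fragment stage: color depends only on this fragment's location."""
    px, py = pixel
    return (px * 10, py * 10, 0)

def run_pipeline(vertices, width, height, scale=1.0, offset=(0.0, 0.0)):
    # Vertex processor: every vertex is transformed independently.
    transformed = [vertex_program(v, scale, offset) for v in vertices]
    # Primitive assembly: group the first three vertices into one triangle.
    triangle = tuple(transformed[0:3])
    framebuffer = {}
    for pixel in rasterize(triangle, width, height):
        # Fragment processor, then a trivial raster operation (overwrite).
        framebuffer[pixel] = fragment_program(pixel)
    return framebuffer
```

Because neither programmable stage looks at any other vertex or fragment, each call could in principle run in parallel with all the others.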


Unified Graphics Pipeline

[Figure: unified graphics pipeline. Labeled blocks: Host, Data Assembler, Setup / Rstr / ZCull, Vtx Thread Issue, Geom Thread Issue, Pixel Thread Issue, Thread Processor. The processor array is drawn as eight groups, each containing SP units, an L1, and a TF unit, backed by six L2 / FB pairs.]

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010
ECE 498AL, University of Illinois, Urbana-Champaign

CUDA


General Purpose Computation using GPU



What is (Historical) GPGPU?

- General Purpose computation using GPU and graphics API in applications other than 3D graphics
  - GPU accelerates critical path of application
- Data parallel algorithms leverage GPU attributes
  - Large data arrays, streaming throughput
  - Fine-grain SIMD parallelism
  - Low-latency floating point (FP) computation
- Applications (see //GPGPU.org)
  - Game effects (FX) physics, image processing
  - Physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting


[Figure: The restricted input and output capabilities of a shader programming model. A Fragment Program takes Input Registers, Texture, and Constants as inputs, uses Temp Registers, and writes Output Registers, which go to FB Memory. Legend: per thread, per Shader, per Context.]


Previous GPGPU Constraints

- Dealing with graphics API
  - Working with the corner cases of the graphics API
- Addressing modes
  - Limited texture size/dimension
- Shader capabilities
  - Limited outputs
- Instruction sets
  - Lack of Integer & bit ops
- Communication limited
  - Between pixels
  - Scatter a[i] = p
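The scatter limitation can be made concrete with a small CPU-side sketch. A fragment program could read from arbitrary texture locations (gather) but could write only to its own output pixel, so a write to a computed address, `a[i] = p`, had to be restated as a gather. The two functions below are hypothetical illustrations of that restatement, not code from the slides. Both produce the same result, but the gather version makes every output location search all inputs.

```python
def scatter(values, indices, size):
    """Scatter: element k writes to a computed address,
    out[indices[k]] = values[k]. This is the a[i] = p pattern."""
    out = [0] * size
    for k, value in enumerate(values):
        out[indices[k]] = value
    return out

def gather_equivalent(values, indices, size):
    """Gather formulation: each output location i (one 'pixel') looks for
    the input destined for i and writes only to its own location."""
    out = [0] * size
    for i in range(size):                  # one independent 'fragment' per output
        for k, target in enumerate(indices):
            if target == i:
                out[i] = values[k]         # last matching writer wins, as in scatter()
    return out
```

The cost difference is the point: `scatter` does one write per input element, while `gather_equivalent` does work proportional to the number of outputs times the number of inputs.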



CUDA

- "Compute Unified Device Architecture"
- General purpose programming model
  - User kicks off batches of threads on the GPU
  - GPU = dedicated super-threaded, massively data parallel co-processor
- Targeted software stack
  - Compute oriented drivers, language, and tools
- Driver for loading computation programs into GPU
  - Standalone Driver - Optimized for computation
  - Interface designed for compute: graphics-free API
  - Data sharing with OpenGL buffer objects
  - Guaranteed maximum download & readback speeds
  - Explicit GPU memory management
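To make "kicking off batches of threads" and "explicit GPU memory management" more tangible, here is a pure-Python emulation of the pattern. It is a sketch only and does not use the CUDA API: `launch`, `vec_add_kernel`, and `vec_add` are hypothetical names, the list copies merely stand in for host-to-device and device-to-host transfers, and the emulated threads run one after another rather than concurrently. The global index formula mirrors the usual CUDA idiom `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def vec_add_kernel(block_idx, block_dim, thread_idx, a, b, out, n):
    """Body executed by every emulated thread."""
    i = block_idx * block_dim + thread_idx   # global index of this thread
    if i < n:                                # guard: the last block may be partly idle
        out[i] = a[i] + b[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Run the kernel once per (block, thread) pair. A GPU would run these
    concurrently; this emulation runs them sequentially."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def vec_add(host_a, host_b, block_dim=4):
    n = len(host_a)
    dev_a = list(host_a)                     # stands in for a host-to-device copy
    dev_b = list(host_b)
    dev_out = [0] * n                        # stands in for a device allocation
    grid_dim = (n + block_dim - 1) // block_dim   # enough blocks to cover n elements
    launch(vec_add_kernel, grid_dim, block_dim, dev_a, dev_b, dev_out, n)
    return list(dev_out)                     # stands in for a device-to-host copy
```

With five elements and a block size of four, two blocks are launched and three threads of the second block do nothing, which is why the bounds check inside the kernel matters.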



G80 CUDA mode: A Device Example

- Processors execute computing threads
- New operating mode/HW interface for computing

[Figure: G80 in CUDA mode. Labeled blocks: Host, Input Assembler, Thread Execution Manager, a row of Parallel Data Cache blocks (eight in the drawing) with Texture units, Load/store units, and Global Memory.]