Brook for GPUs


Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman

Pat Hanrahan

February 10th, 2003

February 11th, 2004


Brook: general purpose streaming language

developed for PCA Program/Merrimac
  compiler: RStream (Reservoir Labs)
DARPA PCA Program
  Stanford: SmartMemories
  UT Austin: TRIPS
  MIT: RAW
Brook version 0.2 spec: http://merrimac.stanford.edu
Brook for GPUs: http://brook.sourceforge.net

[Figure: block diagram with labels Stream Execution Unit, Stream Register File, Scalar Execution Unit, Memory System (DRDRAM), Network Interface (Network)]


Brook: general purpose streaming language

stream programming model
  enforce data parallel computing: streams
  encourage arithmetic intensity: kernels
C with streams


Brook for GPUs

demonstrate GPU streaming coprocessor
make programming GPUs easier
  hide texture/pbuffer data management
  hide graphics based constructs in CG/HLSL
  hide rendering passes
  virtualize resources
performance!
  … on applications that matter
highlight GPU areas for improvement
  features required for general purpose stream computing


system outline

.br: Brook source files
brcc: source to source compiler
brt: Brook run-time library


Brook language: streams

streams
  collection of records requiring similar computation
  particle positions, voxels, FEM cell, …

    float3 positions<200>;
    float3 velocityfield<100,100,100>;

  encourage data parallelism



Brook language: kernels

kernels
  functions applied to streams
  similar to for_all construct

    kernel void foo (float a<>, float b<>,
                     out float result<>) {
      result = a + b;
    }

    float a<100>;
    float b<100>;
    float c<100>;

    foo(a,b,c);

  the call behaves like the loop:

    for (i=0; i<100; i++)
      c[i] = a[i]+b[i];

  no dependencies between stream elements
  encourage high arithmetic intensity
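
The kernel/loop correspondence on this slide can be sketched in plain C. This is only an illustration of the semantics, not what brcc generates; the names `foo_element` and `foo_apply` are hypothetical. The point is that the per-element body never reads another element's result, so the iterations may run in any order or in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Per-element body, playing the role of the slide's kernel `foo`.
   It sees one element of each input stream and nothing else. */
static float foo_element(float a, float b) {
    return a + b;
}

/* Hypothetical driver (not part of Brook): applies the body to every
   element. No iteration depends on another, so order does not matter. */
void foo_apply(const float *a, const float *b, float *result, size_t n) {
    assert(a != NULL && b != NULL && result != NULL);
    for (size_t i = 0; i < n; i++) {
        result[i] = foo_element(a[i], b[i]);
    }
}
```

Calling `foo_apply(a, b, c, 100)` on three 100-element arrays would correspond to the `foo(a,b,c)` call on the slide.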




Brook language: kernels

Ray Triangle Intersection

    kernel void krnIntersectTriangle(Ray ray<>, Triangle tris[],
                                     RayState oldraystate<>,
                                     GridTrilist trilist[],
                                     out Hit candidatehit<>) {
      float idx, det, inv_det;
      float3 edge1, edge2, pvec, tvec, qvec;
      if(oldraystate.state.y > 0) {
        idx = trilist[oldraystate.state.w].trinum;
        edge1 = tris[idx].v1 - tris[idx].v0;
        edge2 = tris[idx].v2 - tris[idx].v0;
        pvec = cross(ray.d, edge2);
        det = dot(edge1, pvec);
        inv_det = 1.0f/det;
        tvec = ray.o - tris[idx].v0;
        candidatehit.data.y = dot( tvec, pvec ) * inv_det;
        qvec = cross( tvec, edge1 );
        candidatehit.data.z = dot( ray.d, qvec ) * inv_det;
        candidatehit.data.x = dot( edge2, qvec ) * inv_det;
        candidatehit.data.w = idx;
      } else {
        candidatehit.data = float4(0,0,0,-1);
      }
    }
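
To check the arithmetic in the kernel above on a CPU, the per-ray math can be sketched in plain C. The computation matches the Möller-Trumbore ray/triangle test, which is why this sketch reads `data.x` as the hit distance and `data.y`, `data.z` as barycentric coordinates; the slide itself does not name them. All type and function names here (`Vec3`, `RayC`, `TriangleC`, `HitData`, `intersect_triangle`) are hypothetical stand-ins, and the ray-state check and `trilist` lookup are left out.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;        /* stands in for float3 */
typedef struct { Vec3 o, d; } RayC;            /* origin and direction */
typedef struct { Vec3 v0, v1, v2; } TriangleC; /* three vertices */
typedef struct { float x, y, z, w; } HitData;  /* stands in for candidatehit.data */

static Vec3 sub3(Vec3 a, Vec3 b) {
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static float dot3(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static Vec3 cross3(Vec3 a, Vec3 b) {
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

/* Same sequence of operations as the kernel's "if" branch, for one ray
   and one triangle. Unlike the kernel, this sketch asserts that the ray
   is not parallel to the triangle plane instead of dividing by zero. */
HitData intersect_triangle(RayC ray, TriangleC tri, float idx) {
    Vec3 edge1 = sub3(tri.v1, tri.v0);
    Vec3 edge2 = sub3(tri.v2, tri.v0);
    Vec3 pvec = cross3(ray.d, edge2);
    float det = dot3(edge1, pvec);
    assert(det != 0.0f);
    float inv_det = 1.0f / det;
    Vec3 tvec = sub3(ray.o, tri.v0);
    Vec3 qvec = cross3(tvec, edge1);

    HitData hit;
    hit.y = dot3(tvec, pvec) * inv_det;  /* first barycentric coordinate */
    hit.z = dot3(ray.d, qvec) * inv_det; /* second barycentric coordinate */
    hit.x = dot3(edge2, qvec) * inv_det; /* distance along the ray */
    hit.w = idx;                         /* triangle index, as in the kernel */
    return hit;
}
```

Like the kernel, the sketch only produces a candidate: it does not test whether the barycentric coordinates fall inside the triangle.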



Brook language: additional features

reductions
  scalar
  stream
stride & repeat
GatherOp & ScatterOp
  a[i] += p
  p = a[i]++
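
The two expressions on this slide can be read as read-modify-write operations on an indexed array. The sketch below shows one sequential interpretation in plain C, alongside a scalar sum reduction. It assumes that `a[i] += p` describes the scatter form and `p = a[i]++` the gather form, and that elements are processed in order; neither assumption is stated on the slide, and the function names are hypothetical rather than Brook runtime calls.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reduction: collapses a stream to a single value (here, a sum). */
float reduce_sum(const float *s, size_t n) {
    float acc = 0.0f;
    for (size_t k = 0; k < n; k++) {
        acc += s[k];
    }
    return acc;
}

/* a[i] += p: each stream element p[k] is accumulated into a[index[k]].
   Repeated indices accumulate instead of overwriting. */
void scatter_add(float *a, size_t a_len,
                 const size_t *index, const float *p, size_t n) {
    for (size_t k = 0; k < n; k++) {
        assert(index[k] < a_len);
        a[index[k]] += p[k];
    }
}

/* p = a[i]++: each output element receives the old value of a[index[k]],
   and the array entry is then incremented. */
void gather_increment(float *a, size_t a_len,
                      const size_t *index, float *p, size_t n) {
    for (size_t k = 0; k < n; k++) {
        assert(index[k] < a_len);
        p[k] = a[index[k]];
        a[index[k]] += 1.0f;
    }
}
```

On parallel hardware the order in which repeated indices are visited would not be fixed, so the gathered values for duplicate indices could differ from this sequential version.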


brcc compiler: infrastructure

based on ctool
  http://ctool.sourceforge.net
parser
  build code tree
  extend C grammar to accept Brook
convert
  tree transformations
codegen
  generate cg & hlsl code
  call cgc, fxc
  generate stub function


Applications

Ray-tracer
FFT
Segmentation
Linear Algebra: BLAS, LINPACK, LAPACK



Brook Performance


GPU Gotchas

NVIDIA NV3x: Register usage vs. Time

[Figure: plot with axes labeled Time and Registers Used]


GPU Gotchas

NVIDIA:
  Register Penalty
  Render to Texture Limitation
    Requires explicit copy or heavy pbuffer solution
    Superbuffer extension needed
    http://mirror.ati.com/developer/SIGGRAPH03/Percy_OpenGL_Extensions SIG03.pdf




GPU Gotchas

ATI Radeon 9800 Pro
  Limited dependent texture lookup
  96 instructions
  24-bit floating point
    s16e7
    Integers up to 131,072 (s23e8: 16,777,216)

[Figure: four numbered stages (1 to 4), each pairing Memory Refs with Math Ops]
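
The integer limits quoted for s16e7 and s23e8 follow from the mantissa width: with m stored mantissa bits plus one implicit leading bit, every integer up to 2^(m+1) is exactly representable. The sketch below works this out in C. It assumes both formats use an implicit leading bit and that the exponent range is not the limiting factor; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest N such that every integer in [0, N] is exactly representable
   with `mantissa_bits` stored mantissa bits plus one implicit leading
   bit. Assumes the exponent range is wide enough to reach N. */
uint32_t max_exact_integer(unsigned mantissa_bits) {
    assert(mantissa_bits < 31);
    return (uint32_t)1 << (mantissa_bits + 1);
}

/* Returns 1 if n survives a round trip through a 32-bit float (s23e8).
   `volatile` forces the value to be rounded to float precision. */
int survives_float32(uint32_t n) {
    volatile float f = (float)n;
    return (uint32_t)f == n;
}
```

With these definitions, `max_exact_integer(16)` gives 131,072 and `max_exact_integer(23)` gives 16,777,216, matching the slide. The practical consequence is that a stream index stored in a 24-bit float cannot reliably address more than 131,072 elements.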


GPU Catch-Up!

Integer & Bit Ops & Double Precision
Memory Addressing
CGC/FXC Performance
  Hand code performance critical code
No native reduction support
No native scatter support
  p[i] = a (indirect write)
No programmable blend
  GatherOp / ScatterOp
Limited 4x4 output
  Brook virtualized kernel outputs
Readback still slow
  NV35 OpenGL: 600 MB/sec Download, 170 MB/sec Readback
  ATI DirectX: 550 MB/sec Download, 50 MB/sec Readback
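
To put the download and readback rates in perspective, the transfer time for a given stream size can be estimated with a few lines of C. This is a back-of-the-envelope sketch: it assumes 1 MB means 1,000,000 bytes (the slide does not say) and ignores per-transfer latency; the function names are hypothetical.

```c
#include <assert.h>

/* Seconds needed to move `bytes` at `mb_per_sec`.
   Assumption: 1 MB = 1,000,000 bytes. */
double transfer_seconds(double bytes, double mb_per_sec) {
    assert(mb_per_sec > 0.0);
    return bytes / (mb_per_sec * 1.0e6);
}

/* Time to send a stream to the GPU and read a same-sized result back. */
double round_trip_seconds(double bytes,
                          double download_mb_per_sec,
                          double readback_mb_per_sec) {
    return transfer_seconds(bytes, download_mb_per_sec)
         + transfer_seconds(bytes, readback_mb_per_sec);
}
```

For example, a 1024 x 1024 stream of four 32-bit floats is 16,777,216 bytes. At the ATI DirectX rates above, downloading it takes roughly 0.03 s while reading it back takes roughly 0.34 s, so readback dominates the round trip.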


GPUs of the future (we hope)

Complete Instruction Sets
  Integers, Bit Ops, Doubles, Mem Access
Integration
  Streaming coprocessor, not just a rendering device
Streaming architectures

[Figure: block diagram with four SDRAM blocks, a Stream Register File, and three ALU Clusters]


Brook for GPUs

Release v0.3 available on Sourceforge
Project Page
  http://graphics.stanford.edu/projects/brook
Source
  http://www.sourceforge.net/projects/brook
Over 4K downloads!

Questions?

Fly-fishing fly images from The English Fly Fishing Shop