GPU Programming


Dec 2, 2013



GPU Programming Model

Data in computer graphics or in scientific computing naturally lends itself to stream programming.
Stream data is a collection of records requiring similar manipulation or computation. The set of
commands applied to this data is referred to as a kernel. Kernels are typically implemented at the
fragment processing stage in the GPU pipeline. In the GPU, data is manipulated as vertices and textures.
The GPU is a SIMD processor with multiple pipelines that can handle a large number of arithmetic
operations. Keeping many operations in flight at once allows the GPU to hide memory latency.
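The stream and kernel terms can be made concrete with a small CPU-side sketch. It is only an illustration under stated assumptions: the Particle record and the names advance and runKernel are hypothetical and do not come from any GPU toolkit. The point is that the kernel touches one record at a time and never reads another record's result, which is the property a SIMD processor exploits.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One record of the stream (hypothetical layout, chosen for illustration).
struct Particle {
    float position;
    float velocity;
};

// The kernel: the same commands applied independently to every record.
Particle advance(const Particle& p, float dt) {
    return Particle{p.position + p.velocity * dt, p.velocity};
}

// Apply the kernel to the whole stream. No element depends on another
// element's result, so the elements could be processed in parallel.
std::vector<Particle> runKernel(const std::vector<Particle>& in, float dt) {
    assert(dt >= 0.0f);  // this sketch only steps forward in time
    std::vector<Particle> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   [dt](const Particle& p) { return advance(p, dt); });
    return out;
}
```

On a GPU of the kind described here, the body of advance would live in a fragment program and the stream would be stored in a texture.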

CPU-GPU Concepts

Most visualization and scientific computing applications can be expressed in the stream programming
model. GPU programming is inherently tied to graphics concepts such as drawing triangles and pixels.
We will use a cloud simulation example to compare the method of programming on the CPU and on the
GPU.

The figure below shows that the algorithm consists of several steps that continuously update the
computational grid in which the phenomenon is being solved.
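A minimal CPU sketch of such a repeatedly updated grid is given here. It is not the cloud simulation itself: the averaging step is a hypothetical stand-in for a real simulation step, and the names step and simulate are assumptions made for illustration. It shows the usual pattern of reading from one grid, writing to a second one, and swapping them after every step.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One update step: each cell becomes the average of itself and its two
// neighbours (a stand-in for a real simulation step; borders are clamped).
void step(const std::vector<float>& src, std::vector<float>& dst) {
    assert(src.size() == dst.size());
    const int n = static_cast<int>(src.size());
    for (int i = 0; i < n; ++i) {
        const int left  = (i > 0) ? i - 1 : i;
        const int right = (i < n - 1) ? i + 1 : i;
        dst[i] = (src[left] + src[i] + src[right]) / 3.0f;
    }
}

// Run `iterations` steps, swapping the roles of the two grids after each one
// so that the output of one step is the input of the next.
std::vector<float> simulate(std::vector<float> grid, int iterations) {
    std::vector<float> scratch(grid.size());
    for (int k = 0; k < iterations; ++k) {
        step(grid, scratch);
        std::swap(grid, scratch);
    }
    return grid;
}
```

The swap plays the role that feeding a result texture back in as input plays on the GPU.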


1. Stream and data arrays on the CPU can be modeled as textures on the GPU. A texture is a 2D
   array of pixel information such as color and location. Reading data from an array on the CPU is
   equivalent to sampling a texture on the GPU.

2. Body loop steps or computational kernels on the CPU are equivalent to fragment programs on
   the GPU.

3. Rendering to a texture is equivalent to writing back data to an array or stream.

4. To invoke a computation, you simply trigger the drawing of the geometry.

In summary, rasterization is equivalent to kernel invocation and texture coordinates are equivalent to a
computational domain. A computation invocation on the CPU amounts to drawing a quad on the GPU.
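The correspondences above can be imitated entirely on the CPU. The following is a sketch under stated assumptions, not GPU code: Texture, sample, Kernel and drawQuad are hypothetical names, the texture holds a single float channel, and sampling uses a nearest-texel lookup with clamping. The loop in drawQuad stands in for rasterization, and the coordinates passed to the kernel stand in for interpolated texture coordinates.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// A "texture": a 2D array stored row by row (one channel for brevity).
struct Texture {
    int width;
    int height;
    std::vector<float> texels;
};

// Sampling a texture = reading an array. Coordinates are normalized so that
// [0,1] spans the texture; lookups outside are clamped to the border texel.
float sample(const Texture& t, float u, float v) {
    int x = static_cast<int>(u * t.width);
    int y = static_cast<int>(v * t.height);
    x = x < 0 ? 0 : (x >= t.width ? t.width - 1 : x);
    y = y < 0 ? 0 : (y >= t.height ? t.height - 1 : y);
    return t.texels[y * t.width + x];
}

// A "fragment program": computes one output value from the input texture
// and the coordinates of the element being produced.
using Kernel = std::function<float(const Texture&, float, float)>;

// "Drawing a quad": visit every output pixel once, hand the kernel the
// texture coordinate of the pixel centre, and store the result.
Texture drawQuad(const Texture& input, const Kernel& fragmentProgram) {
    assert(input.width > 0 && input.height > 0);
    Texture output{input.width, input.height,
                   std::vector<float>(input.texels.size())};
    for (int y = 0; y < output.height; ++y) {
        for (int x = 0; x < output.width; ++x) {
            const float u = (x + 0.5f) / output.width;
            const float v = (y + 0.5f) / output.height;
            output.texels[y * output.width + x] = fragmentProgram(input, u, v);
        }
    }
    return output;
}
```

Passing a lambda such as one that returns 2.0f * sample(t, u, v) to drawQuad doubles every element, which corresponds to one kernel invocation over the whole computational domain.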

Toolkits and Languages



High-level shading languages: these languages allow the programmer to manipulate the vertex and
fragment processors.

    Cg: C for Graphics
    HLSL: The D3D Shading Language
    The OpenGL Shading Language

GPGPU languages: these map high-level concepts used in computer graphics and scientific computing to
simpler commands for the benefit of users with limited graphics background. Such constructs
include matrix algebra and ray tracing steps.

    Sh (University of Waterloo)
    Brook (Stanford University)

CUDA SDK: includes a C compiler and many libraries.


Sample Code

The following code illustrates the previous GPU programming concepts through a real-time edge
detection filter. We ran this code on an NVIDIA Quadro FX 1100.


Edge Detection

//---------------------------------------------------------------------------
// www.GPGPU.org
// Sample Code
//---------------------------------------------------------------------------
// Copyright (c) 2004 Mark J. Harris and GPGPU.org
//---------------------------------------------------------------------------
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any
// damages arising from the use of this software.
//
// Permission is granted to anyone to use this software for any
// purpose, including commercial applications, and to alter it and
// redistribute it freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you
//    must not claim that you wrote the original software. If you use
//    this software in a product, an acknowledgment in the product
//    documentation would be appreciated but is not required.
//
// 2. Altered source versions must be plainly marked as such, and
//    must not be misrepresented as being the original software.
//
// 3. This notice may not be removed or altered from any source
//    distribution.
//
//---------------------------------------------------------------------------
// Author: Mark Harris (harrism@gpgpu.org)
//---------------------------------------------------------------------------
// GPGPU Lesson 0: "helloGPGPU".
//---------------------------------------------------------------------------
//
// GPGPU CONCEPTS Introduced:
//
//      1.) Texture = Array
//      2.) Fragment Program = Computational Kernel.
//      3.) One-to-one Pixel to Texel Mapping:
//          a) Data-Dimensioned Viewport, and
//          b) Orthographic Projection.
//      4.) Viewport-Sized Quad = Data Stream Generator.
//      5.) Copy To Texture = feedback.
//
//      For details of each of these concepts, see the explanations in the
//      inline "GPGPU CONCEPT" comments in the code below.
//
// APPLICATION Demonstrated: A simple post-process edge detection filter.
//
//---------------------------------------------------------------------------

#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#define GLEW_STATIC 1
// Header paths are spelled as installed on case-sensitive file systems.
#include <GL/glut.h>
#include <Cg/cgGL.h>


// forward declarations
class HelloGPGPU;
void reshape(int w, int h);

// globals
CGcontext   g_cgContext;
CGprofile   g_cgProfile;
HelloGPGPU *g_pHello;


// This shader performs a 9-tap Laplacian edge detection filter.
// Each tap is named after its position relative to the center texel c:
// bl = bottom left, l = left, tl = top left, t = top, ur = upper right,
// r = right, br = bottom right, b = bottom.
static const char *edgeFragSource =
    "half4 edges(half2 coords : TEX0,\n"
    "            uniform sampler2D texture) : COLOR\n"
    "{\n"
    "    static const half offset = 1.0 / 512.0;\n"
    "    half4 c  = tex2D(texture, coords);\n"
    "    half4 bl = tex2D(texture, coords + half2(-offset, -offset));\n"
    "    half4 l  = tex2D(texture, coords + half2(-offset,       0));\n"
    "    half4 tl = tex2D(texture, coords + half2(-offset,  offset));\n"
    "    half4 t  = tex2D(texture, coords + half2(      0,  offset));\n"
    "    half4 ur = tex2D(texture, coords + half2( offset,  offset));\n"
    "    half4 r  = tex2D(texture, coords + half2( offset,       0));\n"
    "    half4 br = tex2D(texture, coords + half2( offset, -offset));\n"
    "    half4 b  = tex2D(texture, coords + half2(      0, -offset));\n"
    "    // scale by 8 to brighten the edges\n"
    "    return 8 * (c + -0.125 * (bl + l + tl + t + ur + r + br + b));\n"
    "}\n";


// This class encapsulates all of the GPGPU functionality of the example.
class HelloGPGPU
{
public: // methods
    HelloGPGPU(int w, int h)
    : _rAngle(0),
      _iWidth(w),
      _iHeight(h)
    {
        // Create a simple 2D texture. This example does not use
        // render to texture -- it just copies from the framebuffer to the
        // texture.

        // GPGPU CONCEPT 1: Texture = Array.
        // Textures are the GPGPU equivalent of arrays in standard
        // computation. Here we allocate a texture large enough to fit our
        // data (which is arbitrary in this example).
        glGenTextures(1, &_iTexture);
        glBindTexture(GL_TEXTURE_2D, _iTexture);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, _iWidth, _iHeight,
                     0, GL_RGB, GL_FLOAT, 0);

        // GPGPU CONCEPT 2: Fragment Program = Computational Kernel.
        // A fragment program can be thought of as a small computational
        // kernel that is applied in parallel to many fragments
        // simultaneously. Here we load a kernel that performs an edge
        // detection filter on an image.

        // Create the edge detection fragment program
        _fragmentProgram = cgCreateProgram(g_cgContext, CG_SOURCE,
                                           edgeFragSource, g_cgProfile,
                                           "edges", NULL);

        // Create the texture parameter for the fragment program
        if (_fragmentProgram != NULL)
        {
            cgGLLoadProgram(_fragmentProgram);
            _textureParam = cgGetNamedParameter(_fragmentProgram, "texture");
        }
    }



    ~HelloGPGPU()
    {
        cgDestroyProgram(_fragmentProgram);
    }



    // This method updates the texture by rendering the geometry (a teapot
    // and 3 rotating tori) and copying the image to a texture.
    // It then renders a second pass using the texture as input to an edge
    // detection filter. It copies the results of the filter to the texture.
    // The texture is used in HelloGPGPU::display() for displaying the
    // results.
    void update()
    {
        _rAngle += 0.5f;

        // store the window viewport dimensions so we can reset them,
        // and set the viewport to the dimensions of our texture
        int vp[4];
        glGetIntegerv(GL_VIEWPORT, vp);

        // GPGPU CONCEPT 3a: One-to-one Pixel to Texel Mapping: A Data-
        //                   Dimensioned Viewport.
        // We need a one-to-one mapping of pixels to texels in order to
        // ensure every element of our texture is processed. By setting our
        // viewport to the dimensions of our destination texture and drawing
        // a screen-sized quad (see below), we ensure that every texel of our
        // texture is generated and processed in the fragment program.
        glViewport(0, 0, _iWidth, _iHeight);

        // Render a teapot and 3 tori
        glClear(GL_COLOR_BUFFER_BIT);
        glMatrixMode(GL_MODELVIEW);
        glPushMatrix();
        glRotatef(-_rAngle, 0, 1, 0.25);
        glutSolidTeapot(0.5);
        glPopMatrix();
        glPushMatrix();
        glRotatef(2.1 * _rAngle, 1, 0.5, 0);
        glutSolidTorus(0.05, 0.9, 64, 64);
        glPopMatrix();
        glPushMatrix();
        glRotatef(-1.5 * _rAngle, 0, 1, 0.5);
        glutSolidTorus(0.05, 0.9, 64, 64);
        glPopMatrix();
        glPushMatrix();
        glRotatef(1.78 * _rAngle, 0.5, 0, 1);
        glutSolidTorus(0.05, 0.9, 64, 64);
        glPopMatrix();

        // copy the results to the texture
        glBindTexture(GL_TEXTURE_2D, _iTexture);
        glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, _iWidth, _iHeight);

        // run the edge detection filter over the geometry texture
        // Activate the edge detection filter program
        cgGLBindProgram(_fragmentProgram);
        cgGLEnableProfile(g_cgProfile);

        // bind the scene texture as input to the filter
        cgGLSetTextureParameter(_textureParam, _iTexture);
        cgGLEnableTextureParameter(_textureParam);

        // GPGPU CONCEPT 4: Viewport-Sized Quad = Data Stream Generator.
        // In order to execute fragment programs, we need to generate pixels.
        // Drawing a quad the size of our viewport (see above) generates a
        // fragment for every pixel of our destination texture. Each fragment
        // is processed identically by the fragment program. Notice that in
        // the reshape() function, below, we have set the frustum to
        // orthographic, and the frustum dimensions to [-1,1]. Thus, our
        // viewport-sized quad vertices are at [-1,-1], [1,-1], [1,1], and
        // [-1,1]: the corners of the viewport.
        glBegin(GL_QUADS);
        {
            glTexCoord2f(0, 0); glVertex3f(-1, -1, -0.5f);
            glTexCoord2f(1, 0); glVertex3f( 1, -1, -0.5f);
            glTexCoord2f(1, 1); glVertex3f( 1,  1, -0.5f);
            glTexCoord2f(0, 1); glVertex3f(-1,  1, -0.5f);
        }
        glEnd();

        // disable the filter
        cgGLDisableTextureParameter(_textureParam);
        cgGLDisableProfile(g_cgProfile);

        // GPGPU CONCEPT 5: Copy To Texture (CTT) = Feedback.
        // We have just invoked our computation (edge detection) by applying
        // a fragment program to a viewport-sized quad. The results are now
        // in the frame buffer. To store them, we copy the data from the
        // frame buffer to a texture. This can then be fed back as input
        // for display (in this case) or more computation (see
        // more advanced samples.)

        // update the texture again, this time with the filtered scene
        glBindTexture(GL_TEXTURE_2D, _iTexture);
        glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, _iWidth, _iHeight);

        // restore the stored viewport dimensions
        glViewport(vp[0], vp[1], vp[2], vp[3]);
    }



    void display()
    {
        // Bind the filtered texture
        glBindTexture(GL_TEXTURE_2D, _iTexture);
        glEnable(GL_TEXTURE_2D);

        // render a full-screen quad textured with the results of our
        // computation. Note that this is not part of the computation: this
        // is only the visualization of the results.
        glBegin(GL_QUADS);
        {
            glTexCoord2f(0, 0); glVertex3f(-1, -1, -0.5f);
            glTexCoord2f(1, 0); glVertex3f( 1, -1, -0.5f);
            glTexCoord2f(1, 1); glVertex3f( 1,  1, -0.5f);
            glTexCoord2f(0, 1); glVertex3f(-1,  1, -0.5f);
        }
        glEnd();

        glDisable(GL_TEXTURE_2D);
    }


protected: // data
    int          _iWidth, _iHeight; // The dimensions of our array
    float        _rAngle;           // used for animation

    unsigned int _iTexture;         // The texture used as a data array

    CGprogram    _fragmentProgram;  // the fragment program used to update
    CGparameter  _textureParam;     // a parameter to the fragment program
};


// GLUT idle function
void idle()
{
    glutPostRedisplay();
}

// GLUT display function
void display()
{
    g_pHello->update();  // update the scene and run the edge detect filter
    g_pHello->display(); // display the results
    glutSwapBuffers();
}


// GLUT reshape function
void reshape(int w, int h)
{
    if (h == 0) h = 1;

    glViewport(0, 0, w, h);

    // GPGPU CONCEPT 3b: One-to-one Pixel to Texel Mapping: An Orthographic
    //                   Projection.
    // This code sets the projection matrix to orthographic with a range of
    // [-1,1] in the X and Y dimensions. This allows a trivial mapping of
    // pixels to texels.
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluOrtho2D(-1, 1, -1, 1);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
}


// Called when Cg detects an error
void cgErrorCallback()
{
    CGerror lastError = cgGetError();

    if (lastError)
    {
        printf("%s\n\n", cgGetErrorString(lastError));
        printf("%s\n", cgGetLastListing(g_cgContext));
        printf("Cg error!\n");
    }
}


// Called at startup
void initialize()
{
    // Setup Cg
    cgSetErrorCallback(cgErrorCallback);
    g_cgContext = cgCreateContext();

    // get the best profile for this hardware
    g_cgProfile = cgGLGetLatestProfile(CG_GL_FRAGMENT);
    assert(g_cgProfile != CG_PROFILE_UNKNOWN);
    cgGLSetOptimalOptions(g_cgProfile);

    // Create the example object
    g_pHello = new HelloGPGPU(512, 512);
}


// The main function
int main(int argc, char **argv)
{
    // glutInit must run before any other GLUT call
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);
    glutInitWindowSize(512, 512);
    glutCreateWindow("Hello, GPGPU!");

    glutIdleFunc(idle);
    glutDisplayFunc(display);
    glutReshapeFunc(reshape);

    initialize();

    glutMainLoop();
    return 0;
}
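
For checking results, it can help to have a CPU version of the same filter to compare against. The following is a minimal sketch, assuming a single float channel instead of the shader's four-channel values, clamped addressing at the image border, and no clamping of the result to the displayable range. The names texel and laplacianEdges are hypothetical and are not part of the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Read one value with coordinates clamped to the image (an assumption that
// stands in for the texture clamp mode used by the listing).
float texel(const std::vector<float>& img, int w, int h, int x, int y) {
    x = x < 0 ? 0 : (x >= w ? w - 1 : x);
    y = y < 0 ? 0 : (y >= h ? h - 1 : y);
    return img[y * w + x];
}

// CPU counterpart of the edge filter for one channel:
// result = 8 * (centre - 0.125 * sum of the eight neighbours).
std::vector<float> laplacianEdges(const std::vector<float>& img, int w, int h) {
    assert(w > 0 && h > 0);
    assert(img.size() == static_cast<std::size_t>(w) * static_cast<std::size_t>(h));
    std::vector<float> out(img.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float neighbours = 0.0f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx != 0 || dy != 0) {
                        neighbours += texel(img, w, h, x + dx, y + dy);
                    }
                }
            }
            const float centre = texel(img, w, h, x, y);
            out[y * w + x] = 8.0f * (centre - 0.125f * neighbours);
        }
    }
    return out;
}
```

A uniform image produces zero everywhere, since the centre equals the average of its neighbours. Only changes in intensity survive the filter, which is why it highlights edges.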