Parallel Computing with GPUs & CUDA


Training Program on GPU Programming with CUDA

31st July, 7th Aug, 14th Aug 2011

CUDA Teaching Center @ UoM


Sanath Jayasena


Day 1, Session 1

Introduction


Outline

Training Program Description
CUDA Teaching Center at UoM
Subject Matter
    Introduction to GPU Computing
    GPU Computing with CUDA
    CUDA Programming Basics

July-Aug 2011, CUDA Training Program

Overview of Training Program

3 Sundays, starting 31st July
    Schedule and program outline
Main resource persons
    Sanath Jayasena, Jayathu Samarawickrama, Kishan Wimalawarna, Lochandaka Ranathunga
    Dept of Computer Science & Eng, Dept of Electronic & Telecom. Engineering (of Faculty of Engineering) and Faculty of IT

CUDA Teaching Center

UoM was selected as a CUDA Teaching Center (CTC)
    A group of people from multiple Depts
    http://research.nvidia.com/content/cuda-teaching-centers
Benefits
    Donation of hardware by NVIDIA (GeForce GTX480s and Tesla C2070)
    Access to other resources
Expectations
    Use of the resources for teaching/research, industry collaboration

GPU Computing: Introduction

Graphics Processing Units (GPUs)
    High-performance many-core processors that can be used to accelerate a wide range of applications
GPGPU: General-Purpose computation on Graphics Processing Units
GPUs have led the race for floating-point performance since the start of the 21st century
GPUs are being used as parallel processors

GPU Computing: Introduction

General computing, until the end of the 20th century
    Relied on advances in hardware to increase the speed of software/apps
Slowed down since then due to
    Power consumption issues
    Limited productivity within a single processor
Switch to multi-core and many-core models
    Multiple processing units (processor cores) used in each chip to increase the processing power
Impact on software developers?

GPU Computing: Introduction

A sequential program will only run on one of the cores, and that core will not become any faster with each new generation of processors
Software that will continue to enjoy performance improvement will be parallel programs
    Programs in which multiple threads of execution cooperate to achieve the functionality faster

CPU-GPU Performance Gap

[Two figures not reproduced. Source: CUDA Prog. Guide 4.0]

GPGPU & CUDA

GPU designed as a numeric computing engine
    Will not perform as well as CPUs on some tasks
    Most applications will use both CPUs and GPUs
CUDA
    NVIDIA's parallel computing architecture aimed at increasing computing performance by harnessing the power of the GPU
    A programming model

More Details on GPUs

GPU is typically a computer card, installed into a PCI Express x16 slot
Market leaders: NVIDIA, Intel, AMD (ATI)
Example NVIDIA GPUs (donated to UoM)
    GeForce GTX 480
    Tesla C2070

Example Specifications

                                                    GTX 480           Tesla C2070
Peak double precision floating point performance    650 Gigaflops     515 Gigaflops
Peak single precision floating point performance    1300 Gigaflops    1030 Gigaflops
CUDA cores                                          480               448
Frequency of CUDA cores                             1.40 GHz          1.15 GHz
Memory size (GDDR5)                                 1536 MB           6 GB
Memory bandwidth                                    177.4 GB/sec      150 GB/sec
ECC memory                                          No                Yes
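The peak figures listed for the Tesla card can be checked with a little arithmetic. This is only a sketch: it assumes that each CUDA core completes one fused multiply-add (counted as 2 floating-point operations) per clock cycle, and that double precision runs at half the single precision rate on this card. Neither assumption is stated on the slide.

```latex
\begin{align*}
P_{\text{single}} &= 448 \ \text{cores} \times 1.15 \ \text{GHz} \times 2 \ \tfrac{\text{flops}}{\text{cycle}} \approx 1030 \ \text{Gigaflops} \\
P_{\text{double}} &= \tfrac{1}{2} \, P_{\text{single}} \approx 515 \ \text{Gigaflops}
\end{align*}
```

These are theoretical peaks; real programs usually achieve less.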


CPU vs. GPU Architecture

The GPU devotes more transistors to computation

CPU-GPU Communication

CUDA Architecture

CUDA is NVIDIA's solution to access the GPU
Can be seen as an extension to C/C++
CUDA Software Stack

CUDA Architecture

There are two main parts
1. Host (CPU part): Single Program, Single Data
2. Device (GPU part): Single Program, Multiple Data
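The host/device split might look like the following in code. This is a minimal sketch, not taken from the slides: the kernel name `square`, the helper `square_on_device` and the size `N` are hypothetical. The host part is ordinary sequential code, while the device part is one program that every thread runs on a different data element.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 8   // hypothetical problem size

// Device part: every thread runs this same code on a different element.
__global__ void square(int *data)
{
    int i = threadIdx.x;          // index of this thread within its block
    data[i] = data[i] * data[i];
}

// Host part: sequential code that moves data and launches the kernel.
// Returns true on success; `values` must hold N integers.
bool square_on_device(int *values)
{
    int *dev = NULL;
    if (cudaMalloc((void **)&dev, N * sizeof(int)) != cudaSuccess)
        return false;
    cudaMemcpy(dev, values, N * sizeof(int), cudaMemcpyHostToDevice);
    square<<<1, N>>>(dev);        // 1 block of N threads
    cudaError_t err = cudaMemcpy(values, dev, N * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}

int main(void)
{
    int values[N];
    for (int i = 0; i < N; ++i) values[i] = i;
    if (!square_on_device(values)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < N; ++i) printf("%d ", values[i]);
    printf("\n");
    return 0;
}
```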


CUDA Architecture: Grid Architecture

The Grid
1. A group of threads all running the same kernel
2. Can run multiple grids at once

The Block
1. Grids are composed of blocks
2. Each block is a logical unit containing a number of coordinating threads and some amount of shared memory
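To illustrate what "coordinating threads" and "shared memory" can mean in practice, here is a hedged sketch in which the threads of each block cooperate to add up their part of an array. The names `block_sum`, `sum_per_block`, `BLOCKS` and `THREADS` are hypothetical, and the tree-style summation is just one common way to do this, not something prescribed by the slides.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCKS  2    // hypothetical grid size
#define THREADS 64   // hypothetical block size (must be a power of two here)

// Each block sums its own THREADS input elements into out[blockIdx.x].
__global__ void block_sum(const int *in, int *out)
{
    __shared__ int cache[THREADS];      // shared by the threads of one block
    int t = threadIdx.x;
    cache[t] = in[blockIdx.x * blockDim.x + t];
    __syncthreads();                    // wait until the whole block has loaded

    // Halve the number of active threads in each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (t < stride) cache[t] += cache[t + stride];
        __syncthreads();
    }
    if (t == 0) out[blockIdx.x] = cache[0];
}

// Host helper: `in` holds BLOCKS * THREADS values, `out` receives BLOCKS sums.
bool sum_per_block(const int *in, int *out)
{
    int *dev_in = NULL, *dev_out = NULL;
    if (cudaMalloc((void **)&dev_in, BLOCKS * THREADS * sizeof(int)) != cudaSuccess)
        return false;
    if (cudaMalloc((void **)&dev_out, BLOCKS * sizeof(int)) != cudaSuccess) {
        cudaFree(dev_in);
        return false;
    }
    cudaMemcpy(dev_in, in, BLOCKS * THREADS * sizeof(int), cudaMemcpyHostToDevice);
    block_sum<<<BLOCKS, THREADS>>>(dev_in, dev_out);
    cudaError_t err = cudaMemcpy(out, dev_out, BLOCKS * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_in);
    cudaFree(dev_out);
    return err == cudaSuccess;
}

int main(void)
{
    int in[BLOCKS * THREADS], out[BLOCKS];
    for (int i = 0; i < BLOCKS * THREADS; ++i) in[i] = 1;
    if (!sum_per_block(in, out)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int b = 0; b < BLOCKS; ++b) printf("block %d: sum = %d\n", b, out[b]);
    return 0;
}
```

Threads in different blocks do not share the `cache` array; each block gets its own copy.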

Some Applications of GPGPU

Computational Structural Mechanics
Bio-Informatics and Life Sciences
Computational Electromagnetics and Electrodynamics
Computational Finance
Computational Fluid Dynamics
Data Mining, Analytics, and Databases
Imaging and Computer Vision
Medical Imaging
Molecular Dynamics
Numerical Analytics
Weather, Atmospheric, Ocean Modeling and Space Sciences

CUDA Programming Basics

Accessing/Using the CUDA GPUs

You have been given access to our cluster
    User accounts on 192.248.8.13x
    It is a Linux system
CUDA Toolkit and SDK for development
    Includes CUDA C/C++ compiler for GPUs ("nvcc")
    Will need C/C++ compiler for CPU code
NVIDIA device drivers needed to run programs
    For programs to communicate with hardware
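One way to check that the toolkit and the driver are working together is a small program that asks the CUDA runtime which devices it can see. The sketch below is an illustration only; the helper name `count_devices` is hypothetical, and the output depends on the machine it runs on.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the number of CUDA devices, or -1 if the runtime reports an error
// (for example, when no driver is installed).
int count_devices(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return count;
}

int main(void)
{
    int count = count_devices();
    if (count < 0) return 1;
    printf("Found %d CUDA device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;
        printf("Device %d: %s, compute capability %d.%d, "
               "%d multiprocessors, %lu MB global memory\n",
               i, prop.name, prop.major, prop.minor,
               prop.multiProcessorCount,
               (unsigned long)(prop.totalGlobalMem >> 20));
    }
    return 0;
}
```

Assuming the file is saved as devquery.cu (a hypothetical name), it could be built with `nvcc devquery.cu -o devquery` and run as `./devquery`.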



Example Program 1

    #include <cuda.h>
    #include <stdio.h>

    /* Empty kernel: compiled for and run on the device (GPU) */
    __global__ void kernel (void)
    { }

    int main (void)
    {
        /* Launch the kernel on the device with 1 block of 1 thread */
        kernel <<< 1, 1 >>> ();

        printf ("Hello World!\n");

        return 0;
    }

A function executed on the GPU (device) is usually called a "kernel"
"__global__" says the function is to be compiled to run on a "device" (GPU), not "host" (CPU)
Angle brackets "<<<" and ">>>" are for passing params/args to the runtime

Example Program 2: Part 1

As can be seen in Part 2:
    We can pass parameters to a kernel as we would with any C function
    We need to allocate memory to do anything useful on a device, such as returning values to the host

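Example Program 2: Part 2

    int main (void) {
        int c, *dev_c;

        /* Allocate space for one int in device (GPU) memory */
        cudaMalloc ((void **) &dev_c, sizeof (int));

        /* Run the kernel; the result is written to dev_c on the device */
        add <<< 1, 1 >>> (2, 7, dev_c);

        /* Copy the result from device memory back to the host */
        cudaMemcpy (&c, dev_c, sizeof (int), cudaMemcpyDeviceToHost);

        printf ("2 + 7 = %d\n", c);

        cudaFree (dev_c);

        return 0;
    }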
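The listing above calls a kernel named `add`, whose definition is not reproduced in this transcript. The following is a minimal sketch of a complete program, assuming that `add` simply stores the sum of its first two arguments at the device address given by the third. The helper `add_on_device` and the error checks are additions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed kernel definition: stores a + b at the device address c.
__global__ void add(int a, int b, int *c)
{
    *c = a + b;
}

// Hypothetical helper: computes a + b on the GPU and stores it in *result.
bool add_on_device(int a, int b, int *result)
{
    int *dev_c = NULL;
    if (cudaMalloc((void **)&dev_c, sizeof(int)) != cudaSuccess)
        return false;
    add<<<1, 1>>>(a, b, dev_c);               // 1 block of 1 thread
    cudaError_t err = cudaMemcpy(result, dev_c, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_c);
    return err == cudaSuccess;
}

int main(void)
{
    int c = 0;
    if (!add_on_device(2, 7, &c)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("2 + 7 = %d\n", c);
    return 0;
}
```

The host cannot read `dev_c` directly, since it points to device memory; that is why the result is copied back with `cudaMemcpy` before printing.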



Example Program 3

Within host (CPU) code, call the kernel using <<< and >>>, specifying the grid size (number of blocks) and/or the block size (number of threads per block). More details later.

Note: Details on threads and thread IDs will come later
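The code for this example is not reproduced in this transcript. As a stand-in, here is a hedged sketch of a launch that sets the grid size: it adds two small vectors using one block per element. The names `add_vectors`, `add_on_gpu` and `N` are hypothetical, and the original example may have differed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 10   // hypothetical vector length

// One block per element: blockIdx.x selects the element this block handles.
__global__ void add_vectors(const int *a, const int *b, int *c)
{
    int i = blockIdx.x;
    if (i < N) c[i] = a[i] + b[i];
}

// Hypothetical host helper: c[i] = a[i] + b[i] for N elements, on the GPU.
bool add_on_gpu(const int *a, const int *b, int *c)
{
    int *dev_a = NULL, *dev_b = NULL, *dev_c = NULL;
    size_t bytes = N * sizeof(int);
    bool ok = cudaMalloc((void **)&dev_a, bytes) == cudaSuccess &&
              cudaMalloc((void **)&dev_b, bytes) == cudaSuccess &&
              cudaMalloc((void **)&dev_c, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);
        add_vectors<<<N, 1>>>(dev_a, dev_b, dev_c);   // N blocks, 1 thread each
        ok = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev_a);   // cudaFree accepts NULL, so partial failures are fine
    cudaFree(dev_b);
    cudaFree(dev_c);
    return ok;
}

int main(void)
{
    int a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = i * i; }
    if (!add_on_gpu(a, b, c)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < N; ++i) printf("%d + %d = %d\n", a[i], b[i], c[i]);
    return 0;
}
```

Launching with `<<<1, N>>>` and indexing by `threadIdx.x` instead would run the same work as one block of N threads.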

Example Program 4


Grids, Blocks and Threads

A grid of size 6 (3x2 blocks)
Each block has 12 threads (4x3)
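The configuration above could be expressed with `dim3` values at launch time. The sketch below is an illustration, with hypothetical names `record_block` and `count_threads`: every thread records which block it belongs to, and the host then counts the threads per block. With a 3x2 grid of 4x3 blocks there are 6 x 12 = 72 threads in total.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes the linear number of its block into its own slot.
__global__ void record_block(int *owner)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // global column, 0..11
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // global row, 0..5
    int width = gridDim.x * blockDim.x;              // 12 threads per row
    owner[y * width + x] = blockIdx.y * gridDim.x + blockIdx.x;
}

// Launches a 3x2 grid of 4x3 blocks and counts the threads seen per block.
bool count_threads(int counts[6])
{
    const int total = 3 * 2 * 4 * 3;                 // 72 threads
    int host[total];
    int *dev = NULL;
    if (cudaMalloc((void **)&dev, total * sizeof(int)) != cudaSuccess)
        return false;

    dim3 grid(3, 2);    // 3x2 blocks
    dim3 block(4, 3);   // 4x3 threads per block
    record_block<<<grid, block>>>(dev);

    cudaError_t err = cudaMemcpy(host, dev, total * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int b = 0; b < 6; ++b) counts[b] = 0;
    for (int i = 0; i < total; ++i) counts[host[i]]++;
    return true;
}

int main(void)
{
    int counts[6];
    if (!count_threads(counts)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int b = 0; b < 6; ++b)
        printf("block %d has %d threads\n", b, counts[b]);
    return 0;
}
```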


Conclusion

In this session we discussed
    Introduction to GPU Computing
    GPU Computing with CUDA
    CUDA Programming Basics
Next session
    Data Parallelism
    CUDA Programming Model
    CUDA Threads

References for this Session

Chapters 1 and 2 of: D. Kirk and W. Hwu, Programming Massively Parallel Processors, Morgan Kaufmann, 2010
Chapters 1-4 of: J. Sanders and E. Kandrot, CUDA by Example, Addison-Wesley, 2010
Chapters 1-2 of: NVIDIA CUDA C Programming Guide, NVIDIA Corporation, 2006-2011 (Versions 3.2 and 4.0)