ST810 Advanced Computing

ST810 Lecture 17
Lecture 17: Parallel computing – GPU part I
Eric B. Laber, Hua Zhou
Department of Statistics
North Carolina State University
Mar 13, 2013
GPU
Outline
GPU computing
- Hardware
- GPU computing – overview
- Matlab
- R
Hardware
Typical GPUs on current laptops
- E.g., my MacBook Pro has an Intel® HD Graphics 4000 (built in with the i7-3720QM CPU) and an NVIDIA® GeForce GT 650M GPU
- The NVIDIA® GT 650M has 1 GB memory, 524K L2 cache, 384 cores @ 0.9 GHz
- Theoretical throughput: 641.3 SP GFLOPS

Typical GPUs on current desktops
- My desktop (Dell Alienware) has an NVIDIA® GeForce GTX 580 GPU
- The GTX 580 has 1.5 GB memory, 786K L2 cache, 512 cores @ 1.59 GHz
- Theoretical throughput: 1581 SP GFLOPS
- Release price: $500 (Nov 2010)

Typical GPUs on current servers
- The teaching server has 4 x NVIDIA® Tesla M2070Q
- Each Tesla M2070Q has 6 GB memory (5.25 GB with ECC), 786K L2 cache, 448 cores @ 1.15 GHz
- Theoretical throughput: 4 x 1288 SP GFLOPS or 4 x 512 DP GFLOPS

Graphics Processing Units (GPUs)
- Ubiquitous in today’s hardware (PCs, laptops, servers)
- Cost effective for high performance computing
- Rapid growth in recent years
- Our department has at least two GPU servers. Many nodes in NCSU HPC henry2 are equipped with GPUs too

GPU vs CPU architecture
- GPUs contain hundreds of processing cores on a single chip; several chips can fit in a desktop PC
- Each core carries out the same operations in parallel on different input data: the single program, multiple data (SPMD) paradigm
Extremely high arithmetic intensity *if* one can transfer the data onto and the results off of the processors quickly
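
To make "arithmetic intensity" concrete, the sketch below compares floating-point operations per byte moved for two operations. It is only an illustration: it assumes single precision (4 bytes per number), counts each input as read once and each output as written once, and uses a hypothetical size n = 6000. None of these choices come from the lecture.

```matlab
% Arithmetic intensity = floating-point operations per byte moved.
% Assumptions (not from the lecture): single precision, each input read
% once, each output written once, hypothetical problem size n.
bytesPerNumber = 4;
n = 6000;

% Vector addition z = x + y: n flops, 3n numbers moved
vecAddIntensity = n / (3 * n * bytesPerNumber);

% Matrix multiplication C = A * B with n-by-n matrices:
% about 2n^3 flops, 3n^2 numbers moved
matMulIntensity = (2 * n^3) / (3 * n^2 * bytesPerNumber);

fprintf('vector add:      %.4f flops/byte\n', vecAddIntensity);
fprintf('matrix multiply: %.1f flops/byte\n', matMulIntensity);
```

Under these assumptions vector addition performs well under one flop per byte, so data transfer dominates, while matrix multiplication performs about n/6 flops per byte, which is the kind of workload where moving data to a GPU is more likely to pay off.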

[Figure: CPU vs GPU analogy, taken from Andrew Beam’s presentation in ST790]
GPU computing – overview
GPGPU: general-purpose GPU computing
My experience
- Almost always involves (new) algorithm development and/or revamping CPU code
- Research before going for GPGPU (next slide)
- Easier to develop in C/C++ (free compiler), Fortran (paid compiler), and Matlab
- Do not reinvent the wheel: use libraries

Before using GPUs
0. Frustrated by slow code...
1. Am I using the right algorithm(s)?
   Go to your ST758 notes or a numerical analysis book. E.g., for massive data (terabytes), an $O(n^2)$ algorithm vs an $O(n \log n)$ algorithm means a difference of 31710 years vs 27 seconds on a TFLOPS supercomputer
2. Repeat: profile and optimize the original code
3. Can a compiled language or optimized library (MKL, ATLAS) help?
4. Identify the bottleneck routine and research the potential gain on GPU
5. Can my data fit into GPU memory?
6. Can other routines be easily implemented on GPU? Is that necessary?
7. Decide the toolchain: Matlab, CUDA, PGI toolchain, ...
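
The arithmetic behind the figures in item 1 can be checked with a short calculation. The sketch assumes n = 10^12 data points, a constant factor of 1 in both operation counts, the natural logarithm, a machine that sustains exactly 10^12 flops per second, and a 365-day year. These assumptions are not stated in the lecture; they are simply the ones that reproduce its numbers.

```matlab
% Assumed inputs (chosen to reproduce the figures quoted in item 1)
n = 1e12;                      % number of data points
flopsPerSec = 1e12;            % 1 TFLOPS machine
secPerYear = 365 * 24 * 3600;  % 365-day year

tQuadratic = n^2 / flopsPerSec;           % seconds for n^2 operations
tLinearithmic = n * log(n) / flopsPerSec; % seconds for n*log(n) operations

fprintf('O(n^2):     %.0f years\n', tQuadratic / secPerYear);
fprintf('O(n log n): %.1f seconds\n', tLinearithmic);
```

With these inputs the quadratic algorithm needs roughly 31710 years and the n log n algorithm between 27 and 28 seconds, so no amount of hardware compensates for the wrong algorithm.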

GPGPU development
A few approaches to developing GPGPU code
- CUDA® toolchain provided by NVIDIA®
  - free
  - C/C++
  - only for NVIDIA cards
- PGI® toolchain (CUDA Fortran)
  - commercial ($$$)
  - C/C++, Fortran
  - only for NVIDIA cards
- OpenCL™ (Open Computing Language)
  - open standard
  - Specs for cross-platform, parallel programming of modern processors (PCs, servers, handheld/embedded devices)
  - Adopted by Intel, AMD, ...
- Use a higher level language such as Matlab

Which card to use? AMD vs NVIDIA
- NVIDIA cards are more widely adopted for GPGPU. E.g., GPU servers in our department and NCSU henry2 cluster all have NVIDIA
- NVIDIA has a much richer set of GPU math libraries
- Cross-platform feature of OpenCL is attractive

                    AMD                 NVIDIA
Cards               ATI Radeon          GTX, Tesla
Language            OpenCL              CUDA C/C++, PGI CUDA Fortran
GPU math libraries  APPML (BLAS, FFT)   cuBLAS, cuFFT, cuSPARSE, cuRAND, CUDA MATH, Thrust, ...
Platforms           Linux, Windows      Linux, Windows, MacOS
Matlab
GPU computing in Matlab
Getting started
- gpuDevice(): query GPU device
- methods('gpuArray'): built-in functions that support GPU
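
As a minimal sketch of a first session, the snippet below queries the device and does a rough check of whether a data set would fit in GPU memory. It assumes the Parallel Computing Toolbox is installed; the matrix dimensions n and p are hypothetical, and the comparison against total memory is optimistic because some memory is always in use.

```matlab
% Requires the Parallel Computing Toolbox.
if gpuDeviceCount > 0
    d = gpuDevice();                        % select and query the default GPU
    fprintf('Device: %s\n', d.Name);
    fprintf('Compute capability: %s\n', d.ComputeCapability);
    fprintf('Total memory: %.2f GB\n', d.TotalMemory / 2^30);

    % Hypothetical data set: an n-by-p matrix of doubles
    n = 1e6; p = 50;
    bytesNeeded = 8 * n * p;                % 8 bytes per double
    fprintf('Data need %.2f GB\n', bytesNeeded / 2^30);
    fitsOnGPU = bytesNeeded < d.TotalMemory; % rough upper bound only
else
    disp('No supported GPU device found.');
end
```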
290 built-in functions in Matlab 2012b support GPU

Scheme for GPU algorithm development in Matlab
% transfer data to GPU and initialize variables
gX = gpuArray(X);
gY = gpuArray(Y);
gBetahat = gpuArray.randn(5, 1);
...
% computation on GPU
...
% transfer result off GPU
betahat = gather(gBetahat);
Key: minimize memory transfer between host memory and GPU memory
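
One way the scheme could be filled in is sketched below for ordinary least squares. The lecture does not say which computation it has in mind, so the simulated data, the dimensions n and p, and the use of the normal equations are all assumptions made for illustration. It needs the Parallel Computing Toolbox and a supported NVIDIA GPU.

```matlab
% Hypothetical data (not from the lecture): n observations, p predictors
rng(810);
n = 1e5; p = 5;
X = randn(n, p);
Y = X * (1:p)' + randn(n, 1);

% One transfer in: the data move to the GPU once
gX = gpuArray(X);
gY = gpuArray(Y);

% All computation stays on the GPU: solve the normal equations
gBetahat = (gX' * gX) \ (gX' * gY);

% One transfer out: only the p-by-1 result returns to host memory
betahat = gather(gBetahat);
disp(betahat');
```

The two large arrays cross the bus once and only five numbers come back, which is the transfer pattern the key point asks for.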

Benchmarking
- Always benchmark the bottleneck routine before committing to a GPU implementation
- E.g., to benchmark A\b (solve linear equations) on my desktop
  - paralleldemo_gpu_backslash() in Matlab 2012b
[Figure: two panels, "Single-precision performance" and "Double-precision performance" of A\b. x-axis: matrix size; y-axis: Gigaflops; curves: GPU and CPU. Intel i7 960 CPU vs NVIDIA GTX 580 GPU]
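
A hand-rolled version of such a benchmark might look like the sketch below. It is not the code of the shipped demo: the matrix size, the test matrix, and the leading-order flop count 2/3 n^3 are assumptions, and it relies on wait(gpuDevice) being available in your release to make sure the GPU has finished before the timer stops.

```matlab
% Requires the Parallel Computing Toolbox and a supported NVIDIA GPU.
n = 4000;                                      % hypothetical matrix size
A = rand(n, 'single') + n * eye(n, 'single');  % diagonally dominant test matrix
b = rand(n, 1, 'single');

% CPU timing
tic; x = A \ b; tCPU = toc;

% GPU timing: transfer first, then time only the solve
gd = gpuDevice();
gA = gpuArray(A);
gb = gpuArray(b);
gx = gA \ gb; wait(gd);                        % warm-up run, not timed
tic; gx = gA \ gb; wait(gd); tGPU = toc;       % wait for the GPU before stopping the clock

% Leading-order operation count for an LU-based solve (assumption)
flops = 2/3 * n^3;
fprintf('CPU: %.1f GFLOPS\n', flops / tCPU / 1e9);
fprintf('GPU: %.1f GFLOPS\n', flops / tGPU / 1e9);
```

Timing the transfers separately from the solve shows whether the routine itself or the data movement is the limiting factor.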
R
GPU computing in R
- Not supported in base R (opportunity? HiPLARM package)
- A few contributed packages in specific application areas: gputools (some data-mining algorithms), cudaBayesreg (fMRI analysis), ...
- Develop in C/C++ or Fortran and call compiled code from R