Prospects of using GPU in desktop-grid systems - GRID'2012


"Distributed Computing and Grid-technologies in Science and Education"

PROSPECTS OF USING GPU IN DESKTOP-GRID SYSTEMS

Klimov Georgy

Dubna, 2012

AGENDA

Grid & GPU
GPU architecture
CUDA technologies
Grid projects using GPUs
Monotonic Basin Hopping method
CUDA implementation of MBH
Further investigations plan
Summary


Grid & GPU


Klimov G., CMC MSU 2012

GPU advantages:

~33% of all PCs are equipped with a modern GPU (~60% of them Nvidia)
Typical utilization of GPU resources is below 5% (e.g. HD film playback)
GPUs are optimized for working with huge texture arrays
Modern GPUs consist of tens or even hundreds of cores, which means high performance for some kinds of tasks

Problems solved by Grid:

effective use of existing resources
working with huge data arrays
providing high performance

GPU architecture

Scalable array of TPCs
With its own DRAM




8 Scalar Processors



2 Special Functions Units



Double Precision Unit



Register File



Shared Memory



Texture Memory Cache



Constant Memory Cache


CUDA technology

CUDA: Compute Unified Device Architecture
Supports all NVidia GPUs starting from the GeForce 8 series
Low-level access to the hardware; knowledge of graphics APIs is not required
The CUDA programming language is based on C/C++ syntax, which makes porting existing code easier
Higher performance compared to OpenCL (a 50-100% performance increase reported in various studies)


CUDA programming model


CUDA threads hierarchy


Threads are grouped into blocks (1-, 2- or 3-dimensional)
Blocks are grouped into a grid (1- or 2-dimensional)
Threads within a block:
  share data through shared memory
  synchronize their execution
Threads from different blocks operate independently
Built-in variables: threadIdx, blockIdx, etc.
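To illustrate how the built-in variables combine, the plain-Python sketch below emulates the usual one-dimensional global index computation, blockIdx.x * blockDim.x + threadIdx.x. It is not CUDA code and the slides show no kernel; the function names and the small launch sizes are assumptions chosen only for illustration.

```python
def global_thread_index(block_idx, block_dim, thread_idx):
    """Flat index of a thread in a 1-D grid of 1-D blocks."""
    return block_idx * block_dim + thread_idx


def enumerate_threads(num_blocks, threads_per_block):
    """List (blockIdx, threadIdx, global index) for every thread of a launch."""
    return [
        (b, t, global_thread_index(b, threads_per_block, t))
        for b in range(num_blocks)
        for t in range(threads_per_block)
    ]


if __name__ == "__main__":
    # Hypothetical launch: 2 blocks of 4 threads each.
    for b, t, g in enumerate_threads(num_blocks=2, threads_per_block=4):
        print(f"blockIdx={b} threadIdx={t} -> global index {g}")
```

Each thread gets a distinct global index, so it can pick its own piece of the data without coordinating with threads in other blocks.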

CUDA memory hierarchy

Memory type   Access   Level        Speed
Registers     R/W      Per-thread   High (on chip)
Local         R/W      Per-thread   Low (DRAM)
Shared        R/W      Per-block    High (on chip)
Global        R/W      Per-grid     Low (DRAM)
Constant      R/O      Per-grid     High (L1 cache)
Texture       R/O      Per-grid     High (L1 cache)

Grid projects using GPUs

GPUgrid.net - a volunteer distributed computing project for biomedical research from the Universitat Pompeu Fabra in Barcelona (Spain)
Collatz Conjecture - research in mathematics, specifically testing the Collatz Conjecture, also known as 3x+1 or HOTPO (half or triple plus one)
PrimeGrid - to bring the excitement of prime finding
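As a small illustration of the "half or triple plus one" rule mentioned for the Collatz Conjecture project, the sketch below counts the steps needed to reach 1 from a given starting number. The function name is an assumption; the project's actual GPU code is not shown in the slides.

```python
def collatz_steps(n):
    """Count HOTPO steps needed to reach 1 from a positive integer n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        # Halve even numbers; triple odd numbers and add one.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
```

Every starting value can be checked independently of the others, which is what makes this kind of search easy to spread over many GPU threads.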


Monotonic Basin Hopping method


Algorithm steps:

1. Start from point x_0
2. Repeat until the stop condition is met:
   2.1. generate point Φ(x)
   2.2. apply the local minimization algorithm* to the point Φ(x) → get point x_1
   2.3. if f(x_1) < f(x), then x = x_1
3. Return x

* Gradient descent was used as the local minimization algorithm
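The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the jump Φ(x) is taken to be a uniform random perturbation of at most r per coordinate, the stop condition is a fixed number of steps, and the gradient is estimated by central differences with a fixed step size. All function and parameter names are hypothetical.

```python
import random


def gradient_descent(f, x, step=0.01, iters=200, h=1e-6):
    """Local minimization: fixed-step gradient descent, central-difference gradient."""
    x = list(x)
    for _ in range(iters):
        grad = []
        for i in range(len(x)):
            xp, xm = x[:], x[:]
            xp[i] += h
            xm[i] -= h
            grad.append((f(xp) - f(xm)) / (2 * h))
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


def mbh(f, x0, r, n_steps, rng):
    """Monotonic Basin Hopping: accept a new local minimum only if it improves f."""
    x = list(x0)                                             # step 1
    for _ in range(n_steps):                                 # step 2 (assumed stop rule)
        jumped = [xi + rng.uniform(-r, r) for xi in x]       # step 2.1 (assumed form of Φ)
        x1 = gradient_descent(f, jumped)                     # step 2.2
        if f(x1) < f(x):                                     # step 2.3
            x = x1
    return x                                                 # step 3


if __name__ == "__main__":
    sphere = lambda p: sum(v * v for v in p)
    print(mbh(sphere, [3.0, -2.0], r=1.0, n_steps=20, rng=random.Random(0)))
```

Because a candidate is accepted only when it lowers f, the value at the returned point is never worse than at the starting point.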

CUDA implementation of MBH

[Figure: the search region from Xmin to Xmax and from Ymin to Ymax, divided into square cells indexed (i, j)]

Divide the search region into equal square cells
Each thread runs the algorithm in its own cell
Find the minimum among the results of all threads
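The decomposition described above can be sketched serially in Python: each (i, j) pair stands in for one thread, and a final reduction picks the best result. The function names are hypothetical, and the per-cell search here is a trivial placeholder that returns the cell centre, used only to keep the example short.

```python
def cell_bounds(i, j, xmin, xmax, ymin, ymax, nx, ny):
    """Bounds (x_lo, x_hi, y_lo, y_hi) of cell (i, j) in an nx-by-ny partition."""
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    return (xmin + i * dx, xmin + (i + 1) * dx,
            ymin + j * dy, ymin + (j + 1) * dy)


def center_search(f, bounds):
    """Placeholder per-cell search: returns the centre of the cell."""
    x_lo, x_hi, y_lo, y_hi = bounds
    return [(x_lo + x_hi) / 2, (y_lo + y_hi) / 2]


def search_all_cells(f, xmin, xmax, ymin, ymax, nx, ny, local_search):
    """Run local_search once per cell (one 'thread' each), then take the best point."""
    results = []
    for i in range(nx):
        for j in range(ny):
            bounds = cell_bounds(i, j, xmin, xmax, ymin, ymax, nx, ny)
            results.append(local_search(f, bounds))
    return min(results, key=f)  # reduction over all per-cell results
```

On a GPU the two loops would disappear, since every cell is handled by its own thread; only the final minimum has to be gathered across threads.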

Hardware used:

GPU1 - Tesla 10:
  max threads per block = 512
  max threads per dim = 512
  max blocks per dim = 65535
  number of multiprocessors = 30

GPU2 - GeForce GT 525M:
  max threads per block = 1024
  max threads per dim = 1024
  max blocks per dim = 65535
  number of multiprocessors = 2

CPU - Intel Core 2 Duo T6400:
  number of cores = 2
  clock speed = 2 GHz
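The limits listed for the two GPUs constrain which launch configurations are possible. The sketch below checks a one-dimensional configuration against those limits; the dictionary layout and the function name are assumptions for illustration, and only the numbers come from the slide.

```python
# Limits as listed on the slide for the two GPUs.
LIMITS = {
    "Tesla 10": {"max_threads_per_block": 512, "max_blocks_per_dim": 65535},
    "GeForce GT 525M": {"max_threads_per_block": 1024, "max_blocks_per_dim": 65535},
}


def launch_fits(device, num_blocks, threads_per_block):
    """True if a 1-D launch of num_blocks x threads_per_block respects the limits."""
    lim = LIMITS[device]
    return (1 <= threads_per_block <= lim["max_threads_per_block"]
            and 1 <= num_blocks <= lim["max_blocks_per_dim"])
```

For example, 1024 threads per block fits the GeForce GT 525M but exceeds the 512-thread limit listed for the Tesla.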

Methodology of the experiment

Four parameters: the radius of the MBH "jump", r; the maximum number of steps in the cycle, N; the number of blocks launched, Nb; and the number of threads per block, Nt
Set Nb and Nt
The radius r is calculated as half the diameter of a square cell
The number of cycle steps N is determined experimentally*
4 test functions were selected: Ackley, Griewank, Rastrigin, Shubert

* Criteria used in the experiment:
1. A result is considered valid if it differs from the tabulated value by less than 0.001
2. The result is accepted if, on average, 9 runs out of 10 give the correct answer within the specified accuracy
3. The time is averaged over 20 runs of the program
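For reference, the four test functions can be written in Python as below. The slides do not give the formulas or the dimension used, so these are the standard textbook definitions and are an assumption here; `is_valid` is a hypothetical helper mirroring the 0.001 tolerance from criterion 1.

```python
import math


def ackley(x):
    n = len(x)
    s1 = sum(v * v for v in x) / n
    s2 = sum(math.cos(2 * math.pi * v) for v in x) / n
    return -20 * math.exp(-0.2 * math.sqrt(s1)) - math.exp(s2) + 20 + math.e


def griewank(x):
    s = sum(v * v for v in x) / 4000
    p = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return s - p + 1


def rastrigin(x):
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)


def shubert(x):
    # Product over coordinates of sum_{j=1..5} j * cos((j + 1) * x_i + j).
    return math.prod(
        sum(j * math.cos((j + 1) * v + j) for j in range(1, 6)) for v in x
    )


def is_valid(found, tabulated, tol=1e-3):
    """Criterion 1: the found value differs from the tabulated one by less than tol."""
    return abs(found - tabulated) < tol
```

Ackley, Griewank and Rastrigin have their global minimum of 0 at the origin; the two-dimensional Shubert function has several global minima with a value of about -186.73.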

Results for the Ackley function

AVG execution time:
CPU: 160 sec
GeForce GT 525M: 35 sec
Tesla 10: 1.5 sec

[Two plots: minimal time of finding the extremum (sec) versus number of threads per block, one curve per number of blocks]

Results for the Griewank function

AVG execution time:
CPU: 155 sec
GeForce GT 525M: 33 sec
Tesla 10: 2.2 sec

[Two plots: minimal time of finding the extremum (sec) versus number of threads per block, one curve per number of blocks]

Results for the Rastrigin function

AVG execution time:
CPU: 125 sec
GeForce GT 525M: 28.5 sec
Tesla 10: 2.0 sec

[Two plots: minimal time of finding the extremum (sec) versus number of threads per block, one curve per number of blocks]

Results for the Shubert function

AVG execution time:
CPU: 300 sec
GeForce GT 525M: 82 sec
Tesla 10: 4.3 sec

[Two plots: minimal time of finding the extremum (sec) versus number of threads per block, one curve per number of blocks]
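The speedups implied by the reported average times can be computed directly. The sketch below uses only the numbers from the four results slides; the Rastrigin CPU time is read as 125 sec, which is an assumption about a value split in the extracted text, and the names are hypothetical.

```python
# Average execution times in seconds, as reported on the results slides.
TIMES = {
    "Ackley":    {"CPU": 160.0, "GeForce GT 525M": 35.0, "Tesla 10": 1.5},
    "Griewank":  {"CPU": 155.0, "GeForce GT 525M": 33.0, "Tesla 10": 2.2},
    "Rastrigin": {"CPU": 125.0, "GeForce GT 525M": 28.5, "Tesla 10": 2.0},  # CPU time assumed to be 125
    "Shubert":   {"CPU": 300.0, "GeForce GT 525M": 82.0, "Tesla 10": 4.3},
}


def speedups(times):
    """CPU time divided by each GPU's time, per test function."""
    return {
        func: {dev: t["CPU"] / sec for dev, sec in t.items() if dev != "CPU"}
        for func, t in times.items()
    }


if __name__ == "__main__":
    for func, by_dev in speedups(TIMES).items():
        for dev, s in by_dev.items():
            print(f"{func:10s} {dev:16s} {s:6.1f}x")
```

With these figures the GeForce GT 525M comes out roughly 4 to 5 times faster than the CPU, and the Tesla roughly 60 to 110 times faster.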

Further investigations plan

Use more sophisticated and accurate local optimization methods
Improve the parallelization method
Improve the algorithm for setting up the MBH "jump"
Build a solution for molecular cluster modeling based on the MBH method
Integrate the CUDA solution into the BNB-Grid project
Describe the class of functions that can be processed effectively on GPUs


Summary

A large share of PCs are equipped with GPUs
A GPU is a multicore system
CUDA is one of the technologies that provide high performance for GPU computations
A number of Grid projects already use CUDA
Tests show that in some cases GPUs perform 5-100 times better than a CPU


THANKS FOR YOUR ATTENTION!