
A GPU-Accelerated Explicit Finite-Volume Euler Equation Solver with Ghost-Cell Approach

F.-A. Kuo (1,2), M.R. Smith (3), and J.-S. Wu (1,*)

(1) Department of Mechanical Engineering, National Chiao Tung University, Hsinchu, Taiwan
(2) National Center for High-Performance Computing, NARL, Hsinchu, Taiwan
(3) Department of Mechanical Engineering, National Cheng Kung University, Tainan, Taiwan

* E-mail: chongsin@faculty.nctu.edu.tw

2013 IWCSE, Taipei, Taiwan, October 14-17, 2013
Session: Supercomputer/GPU and Algorithms (GPU-2)


Outline

- Background & Motivation
- Objectives
- Split HLL (SHLL) Scheme
- Cubic-Spline Immersed Boundary Method (IBM)
- Results & Discussion
  - Parallel Performance
  - Demonstrations
- Conclusion and Future Work

Background & Motivation

Parallel CFD

- Computational fluid dynamics (CFD) has played an important role in accelerating the progress of aerospace/space and other technologies.
- For several challenging 3D flow problems, parallel computing of CFD becomes necessary to greatly shorten the very lengthy computational time.
- Over the past two decades, parallel CFD has evolved from SIMD-type vectorized processing to SPMD-type distributed-memory processing, mainly because of the latter's much lower hardware cost and easier programming.

SIMD vs. SPMD

- SIMD (single instruction, multiple data), a class of parallel computers, performs the same operation on multiple data points simultaneously at the instruction level.
  - e.g., SSE/AVX instructions on CPUs and GPU computation such as CUDA.
- SPMD (single program, multiple data) is a higher-level abstraction in which copies of a program run across multiple processors and operate on different subsets of the data.
  - e.g., message-passing programming (MPI) on distributed-memory computer architectures.
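As a concrete illustration of the SIMD model on a GPU, here is a minimal CUDA sketch (illustrative only, not code from this work): every thread executes the same instructions on a different array element.

#include <cuda_runtime.h>

// Every thread performs the same operation on a different data point.
__global__ void scale_add(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = 2.0f * a[i] + b[i];   // identical instruction stream, different data
}

// Host-side launch with one thread per element, e.g.:
//   scale_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);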

MPI vs. CUDA

- Most well-known parallel CFD codes adopt SPMD parallelism using MPI.
  - e.g., Fluent (ANSYS) and CFL3D (NASA), to name a few.
- Recently, because of the potentially very high cost/performance (C/P) ratio of graphics processing units (GPUs), parallelization of CFD codes on GPUs using CUDA, developed by Nvidia, has become an active research area.
- However, a redesign of the numerical scheme may be necessary to take full advantage of the GPU architecture.

Split HLL Scheme on GPUs

- Split Harten-Lax-van Leer (SHLL) scheme (Kuo et al., 2011)
  - a highly local numerical scheme, modified from the original HLL scheme
  - Cartesian grid
  - ~60x speedup (Nvidia C1060 GPU vs. Intel Xeon X5472 CPU) with an explicit implementation
- However, it is difficult to treat objects with complex geometry accurately, especially for high-speed gas flows. One example is given on the next slide.
- Thus, retaining the easy implementation of a Cartesian grid on GPUs while improving the treatment of objects with complex geometry becomes important for further extending the applicability of the SHLL scheme in CFD simulations.

Staircase-like vs. IBM

- Spurious waves are often generated by a staircase-like representation of the solid surface in high-speed gas flows.

[Figure: staircase-like boundary vs. IBM boundary, with the shock direction indicated.]

Immersed Boundary Method

- Immersed boundary method (IBM) (Peskin, 1972; Mittal & Iaccarino, 2005)
  - easy treatment of objects with complex geometry on a Cartesian grid
  - grid computation near the objects becomes automatic, or at least very easy
  - easy treatment of moving objects in the computational domain without remeshing
- The main idea of IBM is simply to enforce the boundary conditions at computational grid points through interpolation among fluid grid points and the boundary conditions at the solid boundaries.
- The stencil of the IBM operation is local in general.
  - enables an efficient reuse of the original numerical scheme, e.g., SHLL
  - easy parallel implementation

Objectives

Goals

- To develop and validate an explicit, cell-centered, finite-volume solver for the Euler equations, based on the SHLL scheme, on a Cartesian grid with the cubic-spline IBM on multiple GPUs
- To study the parallel performance of the code on single and multiple GPUs
- To demonstrate the capability of the code with several applications

Split HLL Scheme

SHLL Scheme - 1

Original HLL flux:

    F_{i+1/2} = [ S_R F_L - S_L F_R + S_L S_R ( U_R - U_L ) ] / ( S_R - S_L )

Introduce local approximations: the new S_R and S_L terms are approximated without involving the neighbor-cell data, so every term of the interface flux can be evaluated from a single cell.

Final form (SHLL) is a highly local scheme: each interface flux is the sum of a "+Flux" contribution computed only from the left cell (F_L, U_L, a_L) and a "-Flux" contribution computed only from the right cell (F_R, U_R, a_R).

[Figure: SIMD model for 2D flux computation over cells i-1, i, i+1, with the "+Flux" and "-Flux" contributions at each interface.]

A highly local flux computation scheme: great for GPU!

SHLL Scheme - 2

Final form (SHLL): the "+Flux" and "-Flux" contributions at each interface are evaluated cell by cell from local data only.

- Flux computation is perfect for GPU application.
- Almost the same as the vector-addition case.
- A speedup of more than 60x is possible using a single Tesla C1060 GPU device, compared with a single thread of a high-performance CPU (Intel Xeon X5472).

[Figure: SIMD model for 2D flux computation over cells i-1, i, i+1, with the "+Flux" and "-Flux" contributions at each interface.]
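The value of such a local flux becomes clear when it is written as a kernel. The sketch below is only an illustration using the standard HLL interface flux for the 1-D Euler equations, not the authors' exact SHLL closure (whose local wave-speed approximations are given in Kuo et al., 2011); the point is that one thread per interface needs only the two adjacent cells, with no global coupling.

#include <cuda_runtime.h>
#include <math.h>

struct State { float rho, mom, E; };   // conserved variables per cell

// Physical flux, velocity, and sound speed from one cell's data (gamma = g).
__device__ void euler_flux(const State& q, float g, float f[3], float& u, float& a)
{
    u = q.mom / q.rho;
    float p = (g - 1.0f) * (q.E - 0.5f * q.rho * u * u);
    a = sqrtf(g * p / q.rho);
    f[0] = q.mom;
    f[1] = q.mom * u + p;
    f[2] = (q.E + p) * u;
}

// One thread per interface i+1/2; only cells i and i+1 are read.
__global__ void hll_flux(const State* q, float* F, int ncell, float g)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= ncell - 1) return;

    float fL[3], fR[3], uL, uR, aL, aR;
    euler_flux(q[i],     g, fL, uL, aL);
    euler_flux(q[i + 1], g, fR, uR, aR);

    float SL = fminf(uL - aL, uR - aR);   // wave-speed estimates
    float SR = fmaxf(uL + aL, uR + aR);

    float qL[3] = { q[i].rho,     q[i].mom,     q[i].E     };
    float qR[3] = { q[i + 1].rho, q[i + 1].mom, q[i + 1].E };

    for (int k = 0; k < 3; ++k) {
        float f;
        if      (SL >= 0.0f) f = fL[k];
        else if (SR <= 0.0f) f = fR[k];
        else f = (SR * fL[k] - SL * fR[k] + SL * SR * (qR[k] - qL[k])) / (SR - SL);
        F[3 * i + k] = f;
    }
}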

Cubic-Spline IBM

Two Critical Issues of IBM

- How to approximate solid boundaries?
  - Local cubic splines reconstruct the solid boundaries with far fewer points.
  - Easier calculation of surface normal/tangent vectors.
- How to apply IBM in a cell-centered FVM framework?
  - Ghost-cell approach:
    - Obtain ghost-cell properties by interpolation of data among neighboring fluid cells.
    - Enforce the boundary conditions at the solid boundaries on the ghost cells through data mapping from the image points.

Cell Identification

1. Define a cubic-spline function for each segment of boundary data to best fit the solid boundary geometry.
2. Identify all the solid cells, fluid cells, and ghost cells.
3. Locate the image points corresponding to the ghost cells.

[Figure: Cartesian grid showing solid cells, fluid cells, ghost cells, and the solid boundary curve.]
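As a reminder of what step 1 involves (this is the standard cubic-spline definition, not anything specific to this work), each boundary segment between knots s_i and s_{i+1} is represented by a cubic polynomial, e.g. for the x-coordinate:

X_i(s) = a_i + b_i\,(s - s_i) + c_i\,(s - s_i)^2 + d_i\,(s - s_i)^3, \qquad s \in [s_i, s_{i+1}],

with the coefficients fixed by requiring X, X', and X'' to be continuous at the interior knots, plus one end condition at each boundary (e.g., natural or clamped). The y-coordinate is treated in the same way.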

Cubic-Spline Reconstruction (Solid Boundary)

The cubic-spline method provides several advantages:
1. a high-order curve fit of the boundary;
2. the ghost cells can be identified easily;
3. the normal vector to the body surface can be computed directly from the spline.
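The third advantage follows directly from the spline derivatives. Assuming the boundary is parametrized as x = X(s), y = Y(s) by the splines above (standard calculus, not a formula taken from the slides), the unit tangent and normal are

\mathbf{t}(s) = \frac{\bigl(X'(s),\, Y'(s)\bigr)}{\sqrt{X'(s)^2 + Y'(s)^2}},
\qquad
\mathbf{n}(s) = \frac{\bigl(Y'(s),\, -X'(s)\bigr)}{\sqrt{X'(s)^2 + Y'(s)^2}},

with the sign of n chosen so that it points from the body surface into the fluid.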

BCs of Euler Eqns.

Inviscid wall boundary conditions, where n is the unit normal of the body surface:

    V_n = 0,    ∂V_t/∂n = 0,    ∂T/∂n = 0

Approximated form (ghost cell mirrored through the image point):

    V_n,ghost = -V_n,image,    V_t,ghost = V_t,image,    T_ghost = T_image

IBM Procedures

- Approximate the properties of the image points using bilinear interpolation among the neighboring fluid cells, then map them onto the ghost cells.

[Figure: a ghost point, its image point, and the surrounding fluid cells used for the interpolation.]
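A hedged CUDA sketch of this ghost-cell update is given below. It assumes the four fluid cells surrounding each image point, their bilinear weights, and the local unit normal were identified in a host-side preprocessing step; the structure and kernel names are illustrative, not the authors' code.

#include <cuda_runtime.h>

struct GhostCell {
    int   cell;      // index of the ghost cell itself
    int   nb[4];     // indices of the four fluid cells around its image point
    float w[4];      // bilinear interpolation weights (sum to 1)
    float nx, ny;    // unit normal of the body surface at the boundary point
};

__global__ void apply_ibm(float* rho, float* u, float* v, float* T,
                          const GhostCell* gc, int nghost)
{
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    if (g >= nghost) return;
    const GhostCell c = gc[g];

    // Bilinear interpolation of primitive variables at the image point.
    float ri = 0.f, ui = 0.f, vi = 0.f, Ti = 0.f;
    for (int k = 0; k < 4; ++k) {
        ri += c.w[k] * rho[c.nb[k]];
        ui += c.w[k] * u[c.nb[k]];
        vi += c.w[k] * v[c.nb[k]];
        Ti += c.w[k] * T[c.nb[k]];
    }

    // Mirror through the boundary: reflect the normal velocity component and
    // keep the tangential component, temperature, and density, which enforces
    // V_n,ghost = -V_n,image, V_t,ghost = V_t,image, T_ghost = T_image.
    float vn = ui * c.nx + vi * c.ny;
    u[c.cell]   = ui - 2.0f * vn * c.nx;
    v[c.cell]   = vi - 2.0f * vn * c.ny;
    T[c.cell]   = Ti;
    rho[c.cell] = ri;
}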

SHLL/IBM Scheme on GPU

22

Nearly All
-
Device Computation

22

Initialize

Flux calculation

State
calculation

CFL calculation



new
dt


new
dt
Set GPU device ID

and
flowtime

T >
flowtime

flowtime

+=
dt




Output the result

True

False

Device

Host

Start

IBM
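The sketch below is a hedged host-side illustration of this loop, not the authors' actual API. The kernel names and arguments are illustrative prototypes (the flux, state-update, IBM, and per-cell time-step kernels would be the ones sketched on earlier slides); the point is that only the reduced time step crosses back to the host inside the loop.

#include <cuda_runtime.h>
#include <cfloat>
#include <thrust/device_ptr.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

// Illustrative kernel prototypes; bodies are defined elsewhere.
__global__ void flux_kernel (const float* state, float* flux, int n);
__global__ void state_update(float* state, const float* flux, float dt, int n);
__global__ void ibm_kernel  (float* state, int n);
__global__ void local_dt    (const float* state, float* dt_cell, float cfl, int n);

// Minimum over the per-cell time steps, reduced on the device with Thrust.
static float reduce_min(const float* d_vals, int n)
{
    thrust::device_ptr<const float> p(d_vals);
    return thrust::reduce(p, p + n, FLT_MAX, thrust::minimum<float>());
}

void run_solver(float* d_state, float* d_flux, float* d_dt, int n,
                float flow_time, float cfl, int device_id)
{
    cudaSetDevice(device_id);                           // set GPU device ID
    const int B = 256, G = (n + B - 1) / B;
    float t = 0.0f;                                     // flowtime
    while (t < flow_time) {
        flux_kernel <<<G, B>>>(d_state, d_flux, n);     // flux calculation
        local_dt    <<<G, B>>>(d_state, d_dt, cfl, n);  // CFL calculation
        float dt = reduce_min(d_dt, n);                 // new dt (to host)
        state_update<<<G, B>>>(d_state, d_flux, dt, n); // state calculation
        ibm_kernel  <<<G, B>>>(d_state, n);             // ghost-cell IBM update
        t += dt;                                        // flowtime += dt
    }
    // Copy the final state back to the host and output the result (omitted).
}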

Results & Discussion (Parallel Performance)

Parallel Performance - 1

- Also known as Schardin's problem.
- Test conditions:
  - moving shock with Mach number 1.5
  - resolution: 2000x2000 cells
  - CFL_max = 0.2
  - physical time: 0.35 s, requiring 9843 time steps using one GPU

[Figure: problem setup with L = 1, H = 1, and the moving shock located at x = 0.2 at t = 0.]

Parallel Performance - 2

- Resolution: 2000x2000 cells
- GPU cluster:
  - GPU: GeForce GTX 590 (2x 512 cores, 1.2 GHz, 3 GB GDDR5)
  - CPU: Intel Xeon X5472
- Overhead of IBM: only 3%
- Speedup:
  - GPU/CPU: ~60x
  - GPU/GPU: 1.9 with 2 GPUs (~95% parallel efficiency)
  - GPU/GPU: 3.6 with 4 GPUs (~90% parallel efficiency)

[Chart: compute time (s) and speedup vs. number of GPUs (1, 2, and 4).]

Results & Discussion (Demonstrations)

Shock over a finite wedge - 1

- In the case of 400x400 cells without IBM, the staircase solid boundary generates spurious waves, which destroy the accuracy of the surface properties.
- By comparison, the case with IBM shows a clear improvement in the surface properties.

[Figure: results with IBM vs. without IBM.]

Shock over a finite wedge - 2

- All important physical phenomena are well captured by the solver with IBM, without spurious wave generation.

[Figure: density contour comparison at t = 0.35 s, with IBM vs. without IBM.]

Transonic Flow past a NACA Airfoil

- With the staircase boundary (no IBM), spurious waves appear near the solid boundary; with the cubic-spline IBM, the boundary representation is smooth and the spurious waves are removed.

[Figure: pressure contours, staircase boundary without IBM vs. IBM result.]

Transonic Flow past a NACA Airfoil

- The pressure distributions around the airfoil surface (upper and lower surfaces) from the present cubic-spline IBM and from the ghost-cell method of J. Liu et al. (2009) are very close to each other.

[Figure: surface pressure distribution, new approach vs. ghost-cell method (J. Liu et al., 2009).]

Transonic Flow past a NACA Airfoil

Top-side shock wave comparison: new approach vs. Furmánek* (2008).

* Petr Furmánek, "Numerical Solution of Steady and Unsteady Compressible Flow", Czech Technical University in Prague, 2008.

Transonic Flow past a NACA Airfoil

Bottom-side shock wave comparison: new approach vs. Furmánek* (2008).

* Petr Furmánek, "Numerical Solution of Steady and Unsteady Compressible Flow", Czech Technical University in Prague, 2008.

Conclusion & Future Work

Summary

- A cell-centered, 2-D finite-volume solver for the inviscid Euler equations, which can easily treat objects with complex geometry on a Cartesian grid by using the cubic-spline IBM on multiple GPUs, has been completed and validated.
- The addition of the cubic-spline IBM increases the computational time by only about 3%, which is negligible.
- The GPU/CPU speedup generally exceeds 60x on a single GPU (Nvidia Tesla C1060) compared with a single thread of an Intel Xeon X5472 CPU.
- The multi-GPU speedup reaches 3.6 on 4 GPUs (GeForce) for a simulation with 2000x2000 cells.

Future Work

- To extend the Cartesian grid to an adaptively refined mesh.
- To simulate moving-boundary problems and real-life problems with this immersed boundary method.
- To replace the SHLL solver with a true-direction finite-volume solver, such as QDS.

Thanks for your patience. Questions?