A GPU-Accelerated Explicit Finite-Volume Euler Equation Solver with Ghost-Cell Approach

F.-A. Kuo 1,2, M.R. Smith 3, and J.-S. Wu 1*

1 Department of Mechanical Engineering, National Chiao Tung University, Hsinchu, Taiwan
2 National Center for High-Performance Computing, NARL, Hsinchu, Taiwan
3 Department of Mechanical Engineering, National Cheng Kung University, Tainan, Taiwan
*E-mail: chongsin@faculty.nctu.edu.tw

2013 IWCSE, Taipei, Taiwan, October 14-17, 2013
Session: Supercomputer/GPU and Algorithms (GPU-2)
Outline

- Background & Motivation
- Objectives
- Split HLL (SHLL) Scheme
- Cubic-Spline Immersed Boundary Method (IBM)
- Results & Discussion: Parallel Performance, Demonstrations
- Conclusion and Future Work
Background & Motivation
Parallel CFD

Computational fluid dynamics (CFD) has played an important role in accelerating the progress of aerospace/space and other technologies.

For several challenging 3D flow problems, parallel computing of CFD becomes necessary to greatly shorten the very lengthy computational time.

Over the past two decades, parallel CFD has evolved from SIMD-type vectorized processing to SPMD-type distributed-memory processing, mainly because of the latter's much lower hardware cost and easier programming.
SIMD vs. SPMD

SIMD (single instruction, multiple data), a class of parallel computers, performs the same operation on multiple data points simultaneously at the instruction level.
- e.g., SSE/AVX instructions on CPUs, and GPU computation (CUDA)

SPMD (single program, multiple data) is a higher-level abstraction in which copies of a program run across multiple processors and operate on different subsets of the data.
- e.g., message-passing programming (MPI) on distributed-memory architectures
MPI vs. CUDA

Most well-known parallel CFD codes adopt SPMD parallelism using MPI, e.g., Fluent (Ansys) and CFL3D (NASA), to name a few.

Recently, because of the potentially very high cost/performance (C/P) ratio offered by graphics processing units (GPUs), parallelization of CFD codes on GPUs has become an active research area, based on CUDA, developed by Nvidia.

However, a redesign of the numerical scheme may be necessary to take full advantage of the GPU architecture.
Split HLL Scheme on GPUs

Split Harten-Lax-van Leer (SHLL) scheme (Kuo et al., 2011)
- a highly local numerical scheme, modified from the original HLL scheme
- Cartesian grid
- ~60x speedup (Nvidia C1060 GPU vs. Intel X5472 Xeon CPU) with explicit implementation

However, it is difficult to treat objects with complex geometry accurately, especially for high-speed gas flow; one example is given on the next page.

Thus, taking advantage of the easy implementation of a Cartesian grid on GPUs, while improving the capability of treating objects with complex geometry, becomes important in further extending the applicability of the SHLL scheme in CFD simulations.
Staircase-like vs. IBM

Spurious waves are often generated by a staircase-like solid surface in high-speed gas flows.

[Figure: shock direction over a staircase-like boundary vs. an IBM boundary]
Immersed Boundary Method

Immersed boundary method (IBM) (Peskin, 1972; Mittal & Iaccarino, 2005)
- easy treatment of objects with complex geometry on a Cartesian grid
- grid computation near the objects becomes automatic, or very easy
- easy treatment of moving objects in the computational domain w/o remeshing

The major idea of IBM is simply to enforce the B.C.'s at computational grid points through interpolation among the fluid grid data and the B.C.'s at solid boundaries.

The stencil of the IBM operation is local in general:
- enabling efficient use of the original numerical scheme, e.g., SHLL
- easy parallel implementation
Objectives

- To develop and validate an explicit cell-centered finite-volume solver for the Euler equations, based on the SHLL scheme, on a Cartesian grid with cubic-spline IBM on multiple GPUs
- To study the parallel performance of the code on single and multiple GPUs
- To demonstrate the capability of the code with several applications
Split HLL Scheme
SHLL Scheme - 1

Original HLL:

$F^{HLL} = \frac{S_R F_L - S_L F_R + S_L S_R (U_R - U_L)}{S_R - S_L}$

Introduce local approximations: the new $S_R$ and $S_L$ terms are approximated as $S_L \approx u - a$ and $S_R \approx u + a$, each evaluated within a single cell, without involving neighbor-cell data.

Final form (SHLL) is a highly local scheme:

$F_{i+1/2} = F_L^+ + F_R^-$

$F_L^+ = \frac{u_L + a_L}{2 a_L} F_L + \frac{a_L^2 - u_L^2}{2 a_L} U_L, \qquad
F_R^- = \frac{a_R - u_R}{2 a_R} F_R - \frac{a_R^2 - u_R^2}{2 a_R} U_R$

A highly local flux computation scheme: great for the GPU!

[Figure: SIMD model for 2D flux computation across cells i-1, i, i+1 (+Flux / -Flux)]
SHLL Scheme - 2

Final form (SHLL):

$F_L^+ = \frac{u_L + a_L}{2 a_L} F_L + \frac{a_L^2 - u_L^2}{2 a_L} U_L, \qquad
F_R^- = \frac{a_R - u_R}{2 a_R} F_R - \frac{a_R^2 - u_R^2}{2 a_R} U_R$

Flux computation is perfect for GPU application: it is almost the same as the vector-addition case.

A speedup of more than 60x is possible using a single Tesla C1060 GPU device, compared to a single thread of a high-performance CPU (Intel Xeon X5472).

[Figure: SIMD model for 2D flux computation across cells i-1, i, i+1 (+Flux / -Flux)]
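The split flux above can be sketched in a few lines for the 1-D Euler equations. This is a hedged illustration, not the authors' GPU code: the split coefficients follow from substituting $S_L = u - a$, $S_R = u + a$ (evaluated per cell) into the HLL flux, and all function names are ours.

```python
import math

GAMMA = 1.4  # ratio of specific heats for air

def primitives(U):
    """Recover (rho, u, p, a) from the conserved state U = (rho, rho*u, E)."""
    rho, mom, E = U
    u = mom / rho
    p = (GAMMA - 1.0) * (E - 0.5 * rho * u * u)
    a = math.sqrt(GAMMA * p / rho)
    return rho, u, p, a

def euler_flux(U):
    """1-D Euler flux F(U) = (rho*u, rho*u^2 + p, u*(E + p))."""
    _, u, p, _ = primitives(U)
    return (U[1], U[1] * u + p, u * (U[2] + p))

def shll_flux(UL, UR):
    """SHLL interface flux F = F_L^+ + F_R^-.
    Each half uses data from ONE cell only -- the locality that makes
    the scheme map so well onto a GPU thread per face."""
    _, uL, _, aL = primitives(UL)          # wave speeds from the left cell
    FL = euler_flux(UL)
    Fp = tuple((uL + aL) / (2 * aL) * f + (aL * aL - uL * uL) / (2 * aL) * q
               for f, q in zip(FL, UL))
    _, uR, _, aR = primitives(UR)          # wave speeds from the right cell
    FR = euler_flux(UR)
    Fm = tuple((aR - uR) / (2 * aR) * f - (aR * aR - uR * uR) / (2 * aR) * q
               for f, q in zip(FR, UR))
    return tuple(p + m for p, m in zip(Fp, Fm))
```

For a uniform state the split halves recombine to the exact flux, a quick sanity check on the coefficients.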
Cubic-Spline IBM
Two Critical Issues of IBM

How to approximate solid boundaries?
- Local cubic splines for reconstructing solid boundaries with far fewer points
- Easier calculation of surface normals/tangents

How to apply IBM in a cell-centered FVM framework?
- Ghost-cell approach
- Obtain ghost-cell properties by interpolation of data among neighboring fluid cells
- Enforce BCs at solid boundaries on ghost cells through data mapping from image points
Cell Identification

1. Define a cubic-spline function for each segment of boundary data to best fit the solid boundary geometry
2. Identify all the solid cells, fluid cells, and ghost cells
3. Locate the image points corresponding to the ghost cells

[Figure: solid cells, fluid cells, and ghost cells around the solid boundary curve]
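Step 2 can be sketched as a simple tagging pass. A minimal sketch, assuming an `inside_solid(x, y)` test is already available (e.g., from the cubic-spline boundary); the function name and grid layout are illustrative, not the authors' implementation:

```python
def classify_cells(nx, ny, dx, dy, inside_solid):
    """Tag each Cartesian cell as 'fluid', 'solid', or 'ghost'.
    A ghost cell is a solid cell with at least one fluid neighbour,
    so it participates in the ghost-cell boundary treatment."""
    solid = [[inside_solid((i + 0.5) * dx, (j + 0.5) * dy) for j in range(ny)]
             for i in range(nx)]
    tag = [['solid' if solid[i][j] else 'fluid' for j in range(ny)]
           for i in range(nx)]
    for i in range(nx):
        for j in range(ny):
            if solid[i][j]:
                # promote to ghost if any 4-neighbour is a fluid cell
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < nx and 0 <= nj < ny and not solid[ni][nj]:
                        tag[i][j] = 'ghost'
                        break
    return tag
```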
Cubic-Spline Reconstruction (Solid Boundary)

The cubic-spline method provides several advantages:
1. A high-order curve fitting the boundary
2. Easy identification of the ghost cells
3. Straightforward calculation of the vector normal to the body surface
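A minimal natural-cubic-spline fit in pure Python, to make the reconstruction idea concrete. This is a sketch under assumptions: it fits y(x) with strictly increasing knots and natural end conditions, whereas a closed body would need a parametric/periodic spline; the derivative gives the surface tangent, from which the normal follows.

```python
def natural_cubic_spline(xs, ys):
    """Fit a natural cubic spline through boundary points (xs, ys).
    Returns (evaluate, derivative) callables."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the second derivatives M (natural BCs: M0 = Mn = 0)
    a = [0.0] * (n + 1); b = [1.0] * (n + 1)
    c = [0.0] * (n + 1); d = [0.0] * (n + 1)
    for i in range(1, n):
        a[i] = h[i - 1]; b[i] = 2 * (h[i - 1] + h[i]); c[i] = h[i]
        d[i] = 6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n + 1):              # Thomas algorithm: forward sweep
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]; d[i] -= w * d[i - 1]
    M = [0.0] * (n + 1)
    M[n] = d[n] / b[n]
    for i in range(n - 1, -1, -1):         # back substitution
        M[i] = (d[i] - c[i] * M[i + 1]) / b[i]

    def _segment(x):
        return max(0, min(n - 1, next((k for k in range(n) if x <= xs[k + 1]), n - 1)))

    def evaluate(x):
        i = _segment(x); t = xs[i + 1] - x; s = x - xs[i]
        return ((M[i] * t**3 + M[i + 1] * s**3) / (6 * h[i])
                + (ys[i] / h[i] - M[i] * h[i] / 6) * t
                + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6) * s)

    def derivative(x):                     # tangent slope -> surface normal
        i = _segment(x); t = xs[i + 1] - x; s = x - xs[i]
        return ((-M[i] * t**2 + M[i + 1] * s**2) / (2 * h[i])
                - (ys[i] / h[i] - M[i] * h[i] / 6)
                + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6))

    return evaluate, derivative
```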
BCs of Euler Eqns.

At a solid wall for the inviscid Euler equations, with $\mathbf{n}$ the unit normal of the body surface:

$V_n = 0, \qquad \frac{\partial V_t}{\partial n} = 0, \qquad \frac{\partial T}{\partial n} = 0$

Approximated form (ghost cell mapped from its image point):

$V_{n,\mathrm{ghost}} = -V_{n,\mathrm{image}}, \qquad
V_{t,\mathrm{ghost}} = V_{t,\mathrm{image}}, \qquad
T_{\mathrm{ghost}} = T_{\mathrm{image}}$
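The mapping above amounts to reflecting the normal velocity component while copying the tangential component and temperature. A minimal sketch (function and argument names are ours):

```python
def ghost_state_from_image(rho_im, u_im, v_im, T_im, nx, ny):
    """Map image-point data to the ghost cell for an inviscid (slip) wall.
    (nx, ny) is the unit outward normal of the body surface."""
    vn = u_im * nx + v_im * ny        # normal velocity component at the image point
    u_g = u_im - 2.0 * vn * nx        # V_n,ghost = -V_n,image
    v_g = v_im - 2.0 * vn * ny        # V_t,ghost =  V_t,image
    return rho_im, u_g, v_g, T_im     # dT/dn = 0  ->  T_ghost = T_image
```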
IBM Procedures

Approximate the properties of the image points using bilinear interpolation among neighboring fluid cells.

[Figure: interpolation from surrounding fluid cells to an image point, mapped to its ghost point near the solid cells]
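The bilinear interpolation step can be sketched as follows. A simplification for illustration: it assumes all four surrounding cells are fluid cells, whereas the actual scheme must handle the case where some neighbors are solid.

```python
def bilinear(x, y, x0, y0, dx, dy, q):
    """Bilinearly interpolate cell-centred data q at point (x, y).
    q[i][j] holds the value at cell centre (x0 + i*dx, y0 + j*dy)."""
    i = int((x - x0) // dx)               # lower-left cell index
    j = int((y - y0) // dy)
    s = (x - x0 - i * dx) / dx            # local coordinates in [0, 1)
    t = (y - y0 - j * dy) / dy
    return ((1 - s) * (1 - t) * q[i][j] + s * (1 - t) * q[i + 1][j]
            + (1 - s) * t * q[i][j + 1] + s * t * q[i + 1][j + 1])
```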
SHLL/IBM Scheme on GPU
Nearly All-Device Computation

Flowchart:
1. Host: start; set GPU device ID and the target flow time
2. Device: initialize
3. Device loop: flux calculation -> state calculation -> IBM -> CFL calculation (new dt); flowtime += dt
4. When flowtime exceeds T: output the result (host); otherwise repeat step 3
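The control flow of the flowchart can be sketched as a host-side loop. Only the control flow is real here; the four callables stand in for the device kernels (flux, state update, IBM ghost-cell update, CFL reduction) and are our placeholders:

```python
def run_solver(flux_step, state_step, apply_ibm, max_wave_speed, dx,
               flow_time, cfl_max=0.2):
    """Skeleton of the solver loop: kernels run until flowtime reaches T."""
    t, steps = 0.0, 0
    while t < flow_time:
        flux_step()                            # per-face SHLL fluxes (device)
        state_step()                           # conservative update (device)
        apply_ibm()                            # ghost-cell / image-point update
        dt = cfl_max * dx / max_wave_speed()   # new dt from the CFL condition
        dt = min(dt, flow_time - t)            # do not overshoot the output time
        t += dt
        steps += 1
    return t, steps
```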
Results & Discussion (Parallel Performance)
Parallel Performance - 1

Also known as "Schardin's problem".

Test conditions:
- Moving shock with Mach number 1.5, located at x = 0.2 at t = 0 (domain: L = 1, H = 1)
- Resolution: 2000x2000 cells
- CFL_max = 0.2
- Physical time: 0.35 s, taking 9843 time steps using one GPU
Parallel Performance - 2

- Resolution: 2000x2000 cells
- GPU cluster: GeForce GTX 590 (2x 512 cores, 1.2 GHz, 3 GB GDDR5); CPU: Intel Xeon X5472
- Overhead with IBM: only ~3%
- Speedup, GPU/CPU: ~60x
- Speedup, GPU/GPU: 1.9 with 2 GPUs; 3.6 with 4 GPUs

[Figure: compute time (sec.) and speedup vs. number of GPUs (1, 2, 4)]
Results & Discussion (Demonstrations)
Shock over a finite wedge - 1

In the case of 400x400 cells w/o IBM, the staircase solid boundary generates spurious waves, which destroy the accuracy of the surface properties. By comparison, the case w/ IBM shows substantially improved surface properties.

[Figure: results w/ IBM vs. w/o IBM]
Shock over a finite wedge - 2

Density contour comparison at t = 0.35 s (with IBM vs. w/o IBM): all important physical phenomena are well captured by the solver with IBM, without spurious wave generation.
Transonic Flow past a NACA Airfoil

Pressure contours: in the staircase-boundary case (w/o IBM), spurious waves appear near the solid boundary; in the other case, the boundary is modified using the IBM and the spurious waves disappear.

[Figure: pressure contours, staircase boundary w/o IBM vs. IBM result]
Transonic Flow past a NACA Airfoil

Distribution of pressure around the surface of the airfoil (upper and lower surfaces): the result of the present cubic-spline IBM agrees closely with that of the ghost-cell method of J. Liu et al. (2009).
Transonic Flow past a NACA Airfoil

Top-side shock wave comparison: present approach vs. Furmánek* (2008).

* Petr Furmánek, "Numerical Solution of Steady and Unsteady Compressible Flow", Czech Technical University in Prague, 2008.
Transonic Flow past a NACA Airfoil

Bottom-side shock wave comparison: present approach vs. Furmánek (2008).
Conclusion & Future Work
Summary

- A cell-centered 2-D finite-volume solver for the inviscid Euler equations, which can easily treat objects with complex geometry on a Cartesian grid by using the cubic-spline IBM on multiple GPUs, has been completed and validated.
- The addition of the cubic-spline IBM increases the computational time by only ~3%, which is negligible.
- The GPU/CPU speedup generally exceeds 60x on a single GPU (Nvidia Tesla C1060) compared to a single thread of an Intel X5472 Xeon CPU.
- The multi-GPU speedup reaches 3.6 on 4 GPUs (GeForce) for a simulation with 2000x2000 cells.
Future Work

- To extend the Cartesian grid to an adaptive mesh
- To simulate moving-boundary and real-life problems with this immersed boundary method
- To replace the SHLL solver with a true-direction finite-volume solver, such as QDS
Thank you for your patience. Questions?