Algorithm and Scaling (Issues) for Aerospace (CFD) Codes
Sukumar Chakravarthy
src@metacomptech.com
www.metacomptech.com
Scope of Presentation
Range of aerospace CFD and related applications
Hierarchy of simulation approaches
Hierarchy of algorithmic approaches
Algorithm and scalability issues and considerations
Presentation Approach & Goals
A picture is worth a thousand words
We will use ten thousand words and 1 picture == eleven thousand word equivalents
Catalog, serve as collective conscience
Discuss relationship between application needs, algorithms, modeling approaches, and HPC issues and possibilities
CFD++ Aerospace Applications
External aerodynamics
Propulsion integration
Component integration
Systems
Cabin airflow
FADEC
Icing
Fuel tank purge
Thrust reverser
Propulsion
Nozzle design
Jet noise
Plumes
Trajectory
Aerodynamic coefficients
Drag polar
Dynamic derivatives
Store separation
Canopy separation
Sabot separation
Stage separation
Pilot seat ejection
Projectiles
Spinning projectiles
Synthetic jets
Turbomachinery
Blade design
Blade cooling
Pulsed detonation
Flapping wings
Flexible wings
Entomopters
Helicopters
Propellers, rotors
Parachutes
Parachutists, skydiving
Spacecraft launch
Reentry vehicles
Rocket assisted landings (Earth, Mars, Venus)
X-Prize vehicles
Land speed record vehicles
Bullets, artillery rounds
Liquid fuel breakup
Liquid fuel sloshing, feed
Acceleration, deceleration effects
Aeroacoustics
Flow Structure Interaction (FSI)
What’s special about Aerospace CFD?
Extremes of scales, operating conditions, physics and chemistry, speeds, application-specific needs (extraction of useful information)
Nonlinearity is most often inherent
It is not just the simulation itself that counts
If there is no information output required, no need to do the simulation
Hierarchy of problem classes
Steady state/unsteady problems
Small, medium and large scale problems
Entire configurations as well as analysis of components
Engineering analysis, scientific analysis, troubleshooting
All speeds, atmospheric conditions, diverse fluids and their properties
Physics (nature)
Math Model of Physics
Numerical Model of Math Model
Computational Model
Human(s) in the loop
Simulation Results
Common Elements of Simulations
Common Underlying Physical Processes
Convection: $u_k \, \partial (u_i u_j) / \partial x_k$
Production: $P_{ij}$
Dissipation: $\varepsilon_{ij}$
Redistribution: $\Phi^{*}_{ij}$
Diffusion: $d_{ij}$
Evolution: $\partial (u_i u_j) / \partial t$
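The six labelled processes are conventionally assembled into a single transport balance for the stress $u_i u_j$ (read as an averaged quantity in a Reynolds-averaged setting). The form below is a sketch based on standard turbulence-modeling usage. The sign convention and the symbols $\varepsilon_{ij}$ and $\Phi^{*}_{ij}$ are assumptions, not something the slide states.

```latex
\underbrace{\frac{\partial (u_i u_j)}{\partial t}}_{\text{evolution}}
+ \underbrace{u_k \frac{\partial (u_i u_j)}{\partial x_k}}_{\text{convection}}
= \underbrace{P_{ij}}_{\text{production}}
- \underbrace{\varepsilon_{ij}}_{\text{dissipation}}
+ \underbrace{\Phi^{*}_{ij}}_{\text{redistribution}}
+ \underbrace{d_{ij}}_{\text{diffusion}}
```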
Summary of some HPC issues
Loading the problem, saving final results
Checkpointing
Computational vs. communications performance (scalability)
Data extraction issues
Robustness (10000-way parallel should be as robust as serial algorithm)
Data-center issues (throughput, storage)
Visualization, interaction with running case
Modeling Hierarchy
Potential flow assumption
Small-disturbance approaches
Inviscid flows taken separately, and hybridized with boundary layer theory
Reynolds/Favre-averaged N-S equations with phenomenological turbulence models
LES and hybrid RANS-LES approaches
Special equations and models
Mesh possibilities
Surface mesh only (panel methods)
Cartesian mesh, almost Cartesian mesh
Structured mesh – hex (3D) & quad (2D)
Unstructured – all cell types
Hybrid structured and unstructured meshes, hex-core meshes
Patched and overset meshes
Moving (dynamic) meshes
Flexible boundaries and meshes
“Extreme Grids”
Aspect ratios of 10000 to 1 or more (boundary layer resolution with Y+ < 1)
Mesh sizes of hundreds of millions of cells and more
Extreme grid spacings present in mesh
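To see where aspect ratios of 10000 to 1 come from, the sketch below estimates the wall-normal height of the first cell for a target Y+ and compares it with a streamwise spacing. It assumes a flat-plate turbulent skin-friction correlation, Cf = 0.026 / Re_x^(1/7), which is a common rule of thumb and not something taken from the slides. The flow conditions and the 0.1 m streamwise spacing are illustrative values only.

```python
import math

def first_cell_height(u_inf, rho, mu, length, y_plus=1.0):
    """Estimate the first-cell wall-normal height for a target y+.

    Assumption (not from the slides): flat-plate turbulent correlation
    Cf = 0.026 / Re_x**(1/7).
    """
    re_x = rho * u_inf * length / mu           # Reynolds number based on length
    cf = 0.026 / re_x ** (1.0 / 7.0)           # assumed skin-friction coefficient
    tau_w = 0.5 * cf * rho * u_inf ** 2        # wall shear stress
    u_tau = math.sqrt(tau_w / rho)             # friction velocity
    return y_plus * mu / (rho * u_tau)         # y = y+ * nu / u_tau

def aspect_ratio(streamwise_spacing, wall_normal_height):
    """Cell aspect ratio: streamwise length over wall-normal height."""
    return streamwise_spacing / wall_normal_height

# Illustrative air-like values: 70 m/s over a 5 m reference length.
dy = first_cell_height(u_inf=70.0, rho=1.225, mu=1.8e-5, length=5.0)
ar = aspect_ratio(0.1, dy)
print(f"first cell height ~ {dy:.2e} m, aspect ratio ~ {ar:.0f}")
```

Under these assumptions the first cell is a few micrometres tall, so even a modest streamwise spacing gives an aspect ratio above 10000.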
Numerical approaches
Explicit and implicit
Fractional steps and factored schemes
Finite volume, finite difference schemes
Finite element schemes
Spectral and spectral element schemes
“Local” schemes and “global” schemes
Some HPC algorithmic challenges
Challenges of making implicit schemes be really implicit on multi-CPU computations
Ensure insensitivity of results to variations in number of parallel processes used
How to make the 10000-way parallel computation as robust as the serial algorithm
How to make the 10000-way parallel computation converge as well but in much less time
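One source of sensitivity to the number of processes is that floating-point addition is not associative, so a global sum (a residual norm, say) can change when the data is split differently. The sketch below emulates a block decomposition in serial and compares a naive partial-sum reduction with one that accumulates exactly. The exact accumulation uses Python's `fractions.Fraction` purely for illustration. It is an assumption chosen for clarity, not a technique the slides describe, and production codes typically rely on fixed-order or extended-precision reductions instead.

```python
from fractions import Fraction

def chunks(values, nproc):
    """Split values into nproc contiguous chunks, like a block decomposition."""
    n = len(values)
    bounds = [(n * r) // nproc for r in range(nproc + 1)]
    return [values[bounds[r]:bounds[r + 1]] for r in range(nproc)]

def naive_parallel_sum(values, nproc):
    """Sum each chunk in floating point, then add the partial sums in rank order."""
    total = 0.0
    for chunk in chunks(values, nproc):
        partial = 0.0
        for v in chunk:          # explicit loop: additions happen exactly as written
            partial += v
        total += partial
    return total

def reproducible_parallel_sum(values, nproc):
    """Accumulate exactly with rationals; round to float only once at the end."""
    total = Fraction(0)
    for chunk in chunks(values, nproc):
        partial = Fraction(0)
        for v in chunk:
            partial += Fraction(v)   # every float converts to a Fraction exactly
        total += partial
    return float(total)

data = [1e16, 1.0, -1e16, 1.0]   # exact sum is 2.0
for p in (1, 2, 4):
    print(p, naive_parallel_sum(data, p), reproducible_parallel_sum(data, p))
```

The naive result depends on the process count for this data, while the exact accumulation returns the same value for every decomposition.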
Adaptive meshes
Adaptive elements (cells)
Adaptive grids
H-adaptation, P-adaptation, H-P-adaptation
Classification of Algorithms
Low information density schemes – expand stencil to improve accuracy
High information density schemes – expand information content per cell (e.g. use values and derivatives, or values at multiple collocation points)
Homogeneity (or lack thereof) of discretization and solution methodology
Homogeneity (or lack thereof) of underlying physics models
The usual scalability considerations
Computation and communication
Computation versus communication
Overlap of computation and communication
Bulk of communication for local schemes can follow pattern of one-to-a-few connectivity
Global operations – global reductions often determine scalability
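The interplay between local computation, neighbor communication, and global reductions can be illustrated with a toy cost model. Everything in the sketch below is an assumption made for illustration: the three-term form, the surface-like scaling of halo traffic, the log2 cost of a reduction, and all coefficient values. None of it is measured data or the cost model of any particular code.

```python
import math

def modeled_iteration_time(ncores, ncells, t_cell, t_halo_cell, t_reduce, n_reductions):
    """Wall time of one iteration under an assumed three-term cost model."""
    cells_per_core = ncells / ncores
    compute = t_cell * cells_per_core
    if ncores == 1:
        return compute  # serial run: no halo exchange, no reductions
    # Halo traffic assumed to scale like the surface of a roughly cubic subdomain.
    halo = t_halo_cell * cells_per_core ** (2.0 / 3.0)
    # Each global reduction modeled as a tree with log2(ncores) latency steps.
    reductions = n_reductions * t_reduce * math.log2(ncores)
    return compute + halo + reductions

def parallel_efficiency(ncores, **model):
    """Serial time divided by (cores x parallel time) under the same model."""
    t1 = modeled_iteration_time(1, **model)
    return t1 / (ncores * modeled_iteration_time(ncores, **model))

# Hypothetical coefficients, chosen only to exercise the model.
model = dict(ncells=16_000_000, t_cell=1e-6, t_halo_cell=5e-6,
             t_reduce=2e-5, n_reductions=50)
for p in (64, 512, 4096):
    print(p, round(parallel_efficiency(p, **model), 3))
```

In this model the compute and halo terms shrink as cores are added while the reduction term grows, which is one way to read the remark that global reductions often determine scalability.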
Recent Scalability Improvements
CFD++ now scales well to very large numbers of cores
The scalability improvements are universal – they apply to all modern HPC platforms from all vendors
Tests have shown effective performance all the way up to 4096 cores
Even relatively small grids (e.g. 16 million cells) scale well to 2048 or even 4096 cores, depending on computer and type of case run
Goal – to demonstrate similar performance on 10000 to 40000 cores
[Figure: two plots of scaling performance versus # of CPU cores. Ex 1: 33M cells, Computer 1, Case 1. Ex 2: 16M cells, Computer 2, Case 2.]
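Two small calculations help when reading scaling results like these: how many cells each core owns, and how a measured speedup compares with the ideal. The helper names below are hypothetical, and the timings in the usage lines are made-up placeholders that only exercise the functions. They are not CFD++ measurements.

```python
def cells_per_core(ncells, ncores):
    """Average number of cells owned by each core."""
    return ncells / ncores

def relative_speedup(base_time, time):
    """Speedup relative to a baseline run (which need not be serial)."""
    return base_time / time

def parallel_efficiency(base_cores, base_time, cores, time):
    """Measured speedup divided by the ideal speedup cores / base_cores."""
    return relative_speedup(base_time, time) / (cores / base_cores)

# The 16 million cell grid from the slide, spread over 2048 and 4096 cores.
print(cells_per_core(16_000_000, 2048))   # 7812.5
print(cells_per_core(16_000_000, 4096))   # 3906.25

# Placeholder timings (NOT measurements): 120 s on 256 cores, 10 s on 4096 cores.
print(parallel_efficiency(256, 120.0, 4096, 10.0))   # 0.75
```

At 4096 cores the 16 million cell case leaves fewer than 4000 cells per core, which is why good scaling on such a grid is notable.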
Some Influences on Scalability
Effect of physics – increased sophistication means more computation, often more scalability
Effect of numerics – increased accuracy means more computation, and more communication, often more scalability
Effect of grid – more grid means more computation and relatively less communication for “local” algorithms
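The grid effect for "local" algorithms is a surface-to-volume argument: work scales with the cells a process owns, while neighbor communication scales with the cells on its subdomain boundary. The sketch below assumes an idealized cubic subdomain with one layer of face-neighbor halo cells. That geometry is an assumption for illustration, and real partitions of unstructured meshes are less regular.

```python
def halo_fraction(cells_per_core):
    """Face-neighbor halo cells per owned cell for an idealized cubic subdomain."""
    n = cells_per_core ** (1.0 / 3.0)     # cells along one edge of the cube
    return 6.0 * n * n / cells_per_core   # six faces of n*n cells each

# Larger subdomains exchange proportionally less data per unit of work.
for cells in (4096, 125_000, 1_000_000):
    print(cells, round(halo_fraction(cells), 3))
```

The ratio falls as 6 / n, so multiplying the cells per core by eight only halves the relative communication.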
Additional thoughts on Parallel Processing
Two ways of using multiple compute engines
Parallel computations
Pipelined computations
Pipelined algorithms have not been exploited too much at the HPC level
Process level and thread level parallelism beginning to be combined (e.g. to exploit GPGPUs)
Load balancing issues
Structured vs. unstructured grids (usually solved by weighted domain decomposition)
Adaptive algorithms and adaptive meshes
Different physics in different regions
Moving meshes and overset meshes
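A minimal sketch of weighted domain decomposition: cells are given a cost weight (for example, higher where extra physics is active) and split so that each part carries a similar total weight rather than a similar cell count. The version below cuts a one-dimensional ordering of cells at prefix-sum targets. The function name, the ordering, and the weights are illustrative assumptions, and production codes generally use graph partitioners that also account for communication.

```python
from bisect import bisect_left
from itertools import accumulate

def weighted_partition(weights, nparts):
    """Split an ordered list of cell weights into contiguous parts of similar total weight.

    Returns a list of (start, end) index ranges, one per part.
    """
    prefix = list(accumulate(weights))      # running total of weights
    total = prefix[-1]
    cuts = [0]
    for r in range(1, nparts):
        target = total * r / nparts         # ideal cumulative weight at this cut
        cut = bisect_left(prefix, target) + 1
        cuts.append(max(cut, cuts[-1]))     # keep cuts non-decreasing
    cuts.append(len(weights))
    return [(cuts[r], cuts[r + 1]) for r in range(nparts)]

# Eight cheap cells followed by four cells that cost four times as much.
weights = [1] * 8 + [4] * 4
parts = weighted_partition(weights, 3)
loads = [sum(weights[a:b]) for a, b in parts]
print(parts, loads)
```

For this example an equal-count split would give loads of 4, 4 and 16, whereas the weighted split gives each part a load of 8.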
Optimization considerations
Parallel algorithms for optimization
How to use large numbers of processors
E.g. Do many cases in parallel
Pre-compute cases matrix, sensitivity, etc. and then train neural networks or tabulate sensitivity before applying optimization procedure
Multi-physics considerations
Communications between non-homogeneous simulation tools
Communications between diverse hardware platforms
Tight coupling vs. loose coupling considerations
Need for Parallel I/O and File systems
Very large scale problems
Very large number of processors
Initial load and final save + intermediate data output
Asymmetric data extraction needs
Typical “post-processing” needs
Global information (forces and moments, lift, drag, torque)
Semi-global information (forces and moments along wing span, along fuselage)
Reduced subsets – iso-surfaces, surface data, cut-planes
Time-averages versus instantaneous values
In-situ “post”-processing can be very useful
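Time averages are a natural candidate for in-situ processing, because a running mean can be updated every time step without ever writing instantaneous snapshots to disk. The sketch below keeps an incremental mean of a per-cell field. The class name and the list-of-floats field representation are assumptions for illustration.

```python
class RunningFieldAverage:
    """Incremental time average of a per-cell field, updated once per time step."""

    def __init__(self, ncells):
        self.count = 0
        self.mean = [0.0] * ncells

    def update(self, field):
        """Fold one instantaneous field into the running mean."""
        if len(field) != len(self.mean):
            raise ValueError("field size does not match the number of cells")
        self.count += 1
        for i, value in enumerate(field):
            # mean_new = mean_old + (value - mean_old) / count
            self.mean[i] += (value - self.mean[i]) / self.count
        return self.mean

avg = RunningFieldAverage(ncells=2)
avg.update([1.0, 2.0])
avg.update([3.0, 4.0])
print(avg.mean)   # [2.0, 3.0]
```

Only the current mean and a step counter are stored, so memory and I/O stay constant however long the unsteady run is.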
Single and Distributed File Parallel I/O
Parallel I/O (PIO) can be accomplished in two ways
In Single-File mode, PIO reads and writes from the current full-mesh/full-solution files.
In Distributed-File mode, PIO reads and writes from a set of files (e.g. placed in subdirectories) associated with each parallel process
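The difference between the two modes can be sketched with ordinary file operations. In the serial emulation below, single-file mode places each process's block at its own byte offset inside one shared file, while distributed-file mode gives each process its own file in its own subdirectory. The file names, directory layout, and raw double-precision format are hypothetical and are not the CFD++ file format. In a real parallel run each process would write its block concurrently rather than in a loop.

```python
import os
import struct
import tempfile

def write_single_file(path, rank_data):
    """Single-file mode: every rank's block sits at its own offset in one file."""
    counts = [len(block) for block in rank_data]
    offsets = [8 * sum(counts[:r]) for r in range(len(counts))]  # 8 bytes per double
    with open(path, "wb") as f:
        for offset, block in zip(offsets, rank_data):
            f.seek(offset)
            f.write(struct.pack(f"<{len(block)}d", *block))
    return offsets

def write_distributed(root, rank_data):
    """Distributed-file mode: one file per rank, each in its own subdirectory."""
    paths = []
    for rank, block in enumerate(rank_data):
        subdir = os.path.join(root, f"rank_{rank:05d}")   # hypothetical layout
        os.makedirs(subdir, exist_ok=True)
        path = os.path.join(subdir, "solution.bin")
        with open(path, "wb") as f:
            f.write(struct.pack(f"<{len(block)}d", *block))
        paths.append(path)
    return paths

def read_block(path, offset, count):
    """Read `count` doubles starting at byte `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return list(struct.unpack(f"<{count}d", f.read(8 * count)))

rank_data = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]   # three emulated ranks
with tempfile.TemporaryDirectory() as tmp:
    full = os.path.join(tmp, "full_solution.bin")
    offsets = write_single_file(full, rank_data)
    paths = write_distributed(tmp, rank_data)
    print(offsets, read_block(full, offsets[2], 3), read_block(paths[1], 0, 1))
```

Single-file mode keeps one file that any process count can read back but needs coordinated offsets, whereas distributed-file mode avoids that coordination at the cost of tying the files to a particular decomposition.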
Interactive massively parallel computing
Steady state versus Transient (unsteady) computations
Links with front-end and graphical processing
Even post processing of large scale problems may require substantial parallel computing resources
One should not just focus on the “batch” computing model
Some elements of the balancing act
Computation
Communication
Memory requirements
I/O requirements
Accuracy requirements
Robustness requirements
In-situ solution processing requirements
Bandwidths to consider
Number of cores vs. number of I/O channels
Memory bandwidth from core to memory
Memory access conflicts
Some old ideas revisited
Paying more attention to connectivity architecture
Minimization of hops
Domain decomposition that minimizes traffic between switches
How many switches or hops (groups of nodes), how many nodes, how many processors in a node, how many cores per processor
Final thoughts
The challenge of producing codes that work in the user’s hands and computing facilities
Ease of use
Scalability and effectiveness vs. just scalability
Resource maximization versus minimization
What can be done with less
What can be done with more
What more can be done with less
Thank you