RPIatSuperComputing2011_01x - SCoReC



NYS High Performance Computation Consortium funded by NYSTAR at $1M/year for 3 years

Goal is to provide NY State users support in the application of HPC technologies in:

Research and discovery

Product development

Improved engineering and manufacturing processes


The HPC² is a distributed activity. Participants:

Rensselaer, Stony Brook/Brookhaven, SUNY Buffalo, NYSERNET



Xerox


Corning


ITT Fluid Technologies: Goulds Pumps


Global Foundries

Objectives


Demonstrate end-to-end solution of two-phase flow problems.

Couple with structural mechanics boundary condition.

Provide interfaced, efficient and reliable software suite for guiding design.

Tools


Simmetrix SimAppS Graphical Interface: mesh generation and problem definition

PHASTA: two-phase level set flow solver

PhParAdapt: solution transfer and mesh adaptation driver

Kitware ParaView: visualization

Systems



CCNI BG/L, CCNI Opterons Cluster


Fluid ejected into air.

Ran on 4000 CCNI BG/L cores.


Six iterations of mesh adaptation on two-phase simulation.

Autonomously ran on 128 cores of CCNI Opterons for approximately 4 hours.



Initial work interfaces simulations through serial file formats for displacement and pressure data.

Structural mechanics simulation runs in serial; PHASTA simulation runs in parallel.

Distribute serial displacement data to partitioned PHASTA mesh.

Aggregate partitioned PHASTA nodal pressure data to serial input file (a data-exchange sketch follows this list).

Modifications to automated mesh adaptation Perl script.
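The serial-to-parallel exchange above can be pictured as a scatter of the structural code's displacement data across the PHASTA partitions and a gather of nodal pressures back to one rank. Below is a minimal sketch using plain MPI; the node counts, node-to-part mapping, and file handling are hypothetical placeholders, not the project's actual code.

```cpp
// Minimal sketch (not the project's code) of the serial<->parallel exchange
// described above, using plain MPI.  Node counts, the node-to-part mapping,
// and file I/O are hypothetical placeholders.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, nparts = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nparts);

  // Number of coupled-face nodes owned by this part of the PHASTA mesh
  // (hypothetical placeholder; would come from the partitioned mesh).
  int localNodes = 0;

  // Rank 0 reads the serial displacement file written by the structural code
  // and builds per-part counts/offsets (3 displacement components per node).
  std::vector<double> serialDisp;                       // filled on rank 0 only
  std::vector<int> dispCounts(nparts, 0), dispOffsets(nparts, 0);
  std::vector<int> presCounts(nparts, 0), presOffsets(nparts, 0);
  // ... rank 0: fill serialDisp and the count/offset arrays from the map ...

  // Distribute serial displacement data to the partitioned PHASTA mesh.
  std::vector<double> localDisp(3 * localNodes);
  MPI_Scatterv(serialDisp.data(), dispCounts.data(), dispOffsets.data(),
               MPI_DOUBLE, localDisp.data(), 3 * localNodes, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

  // ... PHASTA runs in parallel with localDisp applied on the input face ...

  // Aggregate partitioned nodal pressure data back to rank 0 for the serial
  // structural-mechanics input file.
  std::vector<double> localPres(localNodes, 0.0);
  std::vector<double> serialPres;                       // sized on rank 0 only
  MPI_Gatherv(localPres.data(), localNodes, MPI_DOUBLE,
              serialPres.data(), presCounts.data(), presOffsets.data(),
              MPI_DOUBLE, 0, MPI_COMM_WORLD);
  // ... rank 0: write serialPres in the structural code's input format ...

  MPI_Finalize();
  return 0;
}
```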


Structural Mechanics Mesh of Input Face

PHASTA Partitioned Mesh of Input Face

Objectives


Demonstrate capability of available computational tools/resources for parallel simulation of highly viscous sheet flows.

Solve a model sheet flow problem relevant to the actual process/geometry.

Develop and define processes for high-fidelity twin screw extruder parallel CFD simulation.

Investigated Tools (to date)

ACUSIM AcuConsole and AcuSolve, Simmetrix MeshSim, Kitware ParaView

Systems



CCNI Opterons Cluster


High Aspect Ratio Sheet

Aspect ratio: 500:1

Element count: 1.85 million

7 mins on 512 cores

300 mins on 8 cores




Mesh generation in Simmetrix SimAppS graphical interface.

Gaps that are ~1/180 of the large feature dimension.


* http://en.wikipedia.org/wiki/Plastics_extrusion
** https://sites.google.com/site/oscarsalazarcespedescaddesign/project03

Single Screw Extruder CAD**

Conceptual Rendering of Single Screw Extruder Assembly*

Objectives


Apply HPC systems and software to set up and run 3D pump flow simulations in hours instead of days.

Provide automated mesh generation for fluid geometries with rotating components.

Tools



ACUSIM Suite, PHASTA, ANSYS CFX, FMDB, Simmetrix MeshSim, Kitware ParaView

Systems

CCNI Opterons Cluster


AcuConsole Interface

Problem definition, mesh generation, runtime monitor, and data visualization



Simmetrix provided a customized mesh generation and problem definition GUI after iterating with the industrial partner.

Supports automated identification of pump geometric model features and application of attributes.

Problem definition with support for exporting data for multiple CFD analysis tools.

Reduced mesh generation time frees engineers to focus on simulation and design optimizations, leading to improved products.

Goal: Develop simulation technologies that allow practitioners to evaluate systems of interest.


To meet this goal we:

Develop adaptive methods for reliable simulations

Develop methods to do all computation on massively parallel computers

Develop multiscale computational methods

Develop interoperable technologies that speed simulation system development

Partner on the construction of simulation systems for specific applications in multiple areas



Software available (http://www.scorec.rpi.edu/software.php)

Some tools not yet linked

Email shephard@scorec.rpi.edu with any questions


Simulation Model and Data Management


Geometric model interface to interrogate CAD models


Parallel mesh topological representation


Representation of tensor fields


Relationship manager


Parallel Control


Neighborhood aware message packing


Iterative mesh partition improvement with multiple criteria


Processor mesh entity reordering to improve cache performance


Adaptive Meshing


Adaptive mesh modification


Mesh curving


Adaptive Control


Support for executing parallel adaptive unstructured mesh flow simulations with PHASTA

Adaptive multimodel simulation infrastructure


Analysis



Parallel Hierarchic Adaptive Stabilized Transient Analysis (PHASTA) software for compressible or incompressible, laminar or turbulent, steady or unsteady flows on 3D unstructured meshes (with U. Colorado)

Parallel hierarchic multiscale modeling of soft tissues

Interoperable Technologies for Advanced Petascale Simulations (ITAPS)

[ITAPS diagram: Petascale Integrated Tools (AMR front tracking, shape optimization, solution adaptive loop, solution transfer, petascale mesh generation) are unified by Common Interfaces (mesh, geometry, relations, field) and build on Component Tools (mesh adapt, interpolation kernels, swapping, dynamic services, geometry/mesh services, smoothing, front tracking).]


Excellent strong scaling


Implicit time integration


Employs the partitioned mesh for system formulation and solution

Specific number of ALL-REDUCE communications also required

#Proc.    El./core    t (sec)    Scale
512       204,800     2120       1
1,024     102,400     1052       1.01
2,048     51,200      529        1.00
4,096     25,600      267        0.99
8,192     12,800      131        1.02
16,384    6,400       64.5       1.03
32,768    3,200       35.6       0.93

105M vertex mesh (CCNI Blue Gene/L)
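The Scale column is consistent with the usual strong-scaling factor, scale = (t_base * p_base) / (t * p), measured against the 512-core run. A minimal check of that arithmetic using the tabulated values follows; small differences from the printed column come from rounding of the reported times.

```cpp
// Strong-scaling factor relative to the 512-core baseline in the table above:
// scale = (t_base * p_base) / (t * p).
#include <cstdio>

int main() {
  const double pBase = 512.0, tBase = 2120.0;           // baseline row
  const double procs[] = {1024, 2048, 4096, 8192, 16384, 32768};
  const double times[] = {1052, 529, 267, 131, 64.5, 35.6};
  for (int i = 0; i < 6; ++i) {
    const double scale = (tBase * pBase) / (times[i] * procs[i]);
    std::printf("%6.0f cores: scale = %.2f\n", procs[i], scale);
  }
  return 0;
}
```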

1 billion element anisotropic mesh on Intrepid Blue Gene/P

# of cores    Region imb    Vertex imb    Time (s)    Scaling
16k           2.03%         7.13%         222.03      1
32k           1.72%         8.11%         112.43      0.987
64k           1.6%          11.18%        57.09       0.972
128k          5.49%         17.85%        31.35       0.885

Without ParMA partition improvement, strong scaling factor is 0.88 (time is 70.5 secs).

Can yield 43 CPU-years savings for production runs!


AAA 5B elements: full-system scale on Jugene (IBM BG/P system)




Requires functional support for:

Mesh distribution

Mesh-level inter-processor communications

Parallel mesh modification

Dynamic load balancing

Have parallel implementations for each; focusing on increasing scalability

Initial mesh: uniform, 17 million mesh regions

Adapted mesh: 160 air bubbles, 2.2 billion mesh regions

Multiple predictive load balance steps used to make the adaptation possible (see the sketch after this list)

Larger meshes possible (not out of memory)
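A minimal sketch of what a predictive load-balance step does, under the assumption (not stated on the slide) that each part estimates its post-refinement element count from the mesh size field and repartitions when the predicted imbalance exceeds a tolerance. estimateRefinedElements() and repartition() are hypothetical stubs, not SCoReC's implementation.

```cpp
// Minimal sketch of a predictive load-balance step run before adaptation:
// balance the *predicted* post-refinement work, not the current mesh.
// estimateRefinedElements() and repartition() are hypothetical stubs.
#include <mpi.h>
#include <cstdio>

long estimateRefinedElements() { return 1; }   // stub: from the mesh size field
void repartition(long /*predictedWeight*/) {}  // stub: migrate elements

void predictiveBalance(MPI_Comm comm, double tolerance) {
  int nparts = 1;
  MPI_Comm_size(comm, &nparts);

  long predicted = estimateRefinedElements();
  long maxPredicted = 0, totalPredicted = 0;
  MPI_Allreduce(&predicted, &maxPredicted, 1, MPI_LONG, MPI_MAX, comm);
  MPI_Allreduce(&predicted, &totalPredicted, 1, MPI_LONG, MPI_SUM, comm);

  const double avg = double(totalPredicted) / nparts;
  const double imbalance = double(maxPredicted) / avg;  // 1.0 is perfect balance
  if (imbalance > tolerance)
    repartition(predicted);                             // rebalance before refining
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  predictiveBalance(MPI_COMM_WORLD, 1.05);
  std::printf("predictive balance check done\n");
  MPI_Finalize();
  return 0;
}
```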

Initial and adapted mesh (zoom of a bubble), colored by magnitude of mesh size field

Mesh size field of air bubbles distributing in a tube (segment of the model; 64 bubbles total)


Test strong scaling of uniform refinement on Ranger, 4.3M to 2.2B elements

Nonuniform field-driven refinement (with mesh optimization) on Ranger, 4.2M to 730M elements (time for dynamic load balancing not included)

Nonuniform field-driven refinement (with mesh optimization operations) on Blue Gene/P, 4.2M to 730M elements (time for dynamic load balancing not included)

Uniform refinement on Ranger:

# of Parts    Time (s)    Scaling
2048          21.5        1.0
4096          11.2        0.96
8192          5.67        0.95
16384         2.73        0.99

Nonuniform field-driven refinement on Ranger:

# of Parts    Time (s)    Scaling
2048          110.6       1.0
4096          57.4        0.96
8192          35.4        0.79

Nonuniform field-driven refinement on Blue Gene/P:

# of Parts    Time (s)    Scaling
4096          173         1.0
8192          105         0.82
16384         66.1        0.65
32768         36.1        0.60


Tightly coupled

Adv: Computationally efficient

Disadv: More complex code development

Example: Explicit solution of cannon blasts


Loosely coupled

Adv: Ability to use existing analysis codes

Disadv: Overhead of multiple structures and data conversion

Example: Implicit high-order active flow control modeling

[Snapshots at t = 0.0, t = 2e-4, t = 5e-4]

Adaptive Loop Construction


Adaptive Loop Driver

C++

Coordinates API calls to execute the solve-adapt loop

phSolver

Fortran 90

Flow solver scalable to 288k cores of BG/P; Field API

phParAdapt

C++

Invokes parallel mesh adaptation

SCOREC FMDB and MeshAdapt, Simmetrix MeshSim and MeshSimAdapt
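A minimal sketch of the coordination role the C++ driver plays; the function names here (runPhSolver, needsAdaptation, runPhParAdapt) are hypothetical stand-ins, not the real phSolver / phParAdapt APIs.

```cpp
// Minimal sketch of a solve-adapt loop driver.  The three functions are
// hypothetical stubs standing in for the real phSolver / phParAdapt APIs.
#include <cstdio>

bool runPhSolver(int step)     { std::printf("solve step %d\n", step); return true; }
bool needsAdaptation(int step) { return step % 10 == 0; }   // e.g. error-indicator check
void runPhParAdapt()           { std::printf("  adapt mesh, rebalance, transfer fields\n"); }

int main() {
  const int maxSteps = 50;
  for (int step = 1; step <= maxSteps; ++step) {
    if (!runPhSolver(step))    // advance the flow solution
      break;
    if (needsAdaptation(step)) // size field / error indicator triggers adaptation
      runPhParAdapt();         // parallel mesh adaptation + solution transfer
  }
  return 0;
}
```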


[Component diagram: the Adaptive Loop Driver controls phSolver and phParAdapt; both access compact mesh and solution data (mesh data base, solution fields) and exchange field data through the Field API.]


Mesh close-up before and after correcting invalid mesh regions (marked in yellow)

Mesh curving applied to 8-cavity cryomodule simulations

2.97 million curved regions

1,583 invalid elements corrected

Leads to stable simulation and executes 30% faster


FETD for short-range wakefield calculations

Adaptively refined meshes have 1-1.5 million curved regions

Uniformly refined mesh using a small mesh size has 6 million curved regions


Electric fields on the three refined curved meshes

Initial mesh has 7.1 million regions

Initial mesh is isotropic outside the boundary layer

The adapted mesh: 42.8 million regions

7.1M -> 10.8M -> 21.2M -> 33.0M -> 42.8M

Boundary layer based mesh adaptation

Mesh is anisotropic


Multiscale simulation linking a microscale network model to a macroscale finite element continuum model.

Collaborating with experimentalists at the University of Minnesota



Macroscale

Model

Microscale Model

Nano-indentation of a thin film. Concurrent model configuration at the 60th load step (3 Å indentation displacement). Colors represent the sub-domains in which various models are used.

Nano-void subjected to hydrostatic tension. Finite element discretization of the problem domain and dislocation structures.

[Diagram: parallel computing methods and simulation automation components mapped against size scale (circuits, devices, atoms/carriers) and product stages (design, manufacture, use/performance). Components span modeling/simulation development and technology development: device simulation, super-resolution lithography tools, reactive ion etching, variation-aware circuit design, 1st-principles CMOS modeling, and mechanics of damage nucleation in devices.]


[Diagram: Poisson and Schrödinger solvers iteratively exchange potential (U) and carrier density (N); energy E relative to the Fermi level sets occupation, yielding a U-I characteristic.]

Input to circuit level from atomic level physics


As Si CMOS devices shrink, nanoelectronic effects emerge.

Fermi-function based analysis gives way to quantum energy-level analysis.

Poisson and Schrödinger equations reconciled iteratively, allowing for current predictions (see the sketch after this list).

Carrier dynamics respond to strain in increasingly complex ways, from mobility changes to tunneling effects.

New functionalities might be exploited:

Single-electron transistors

Graphene semiconductors

Carbon nanotube conductors

Spintronics: encoding information into a charge carrier's spin
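A minimal sketch of the self-consistent iteration named above (Poisson and Schrödinger reconciled iteratively). Only the fixed-point loop with under-relaxation is shown; the solver bodies are hypothetical stubs, not a real device simulator.

```cpp
// Minimal sketch of the self-consistent Poisson-Schrödinger iteration:
// Poisson maps carrier density -> potential, Schrödinger maps potential
// (via energy levels and Fermi-level occupation) -> carrier density.
// Both solver bodies are hypothetical stubs.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

using Field = std::vector<double>;

Field solvePoisson(const Field& density)       { return Field(density.size(), 0.0); }   // stub
Field solveSchrodinger(const Field& potential) { return Field(potential.size(), 0.0); } // stub

int main() {
  const int n = 100;          // grid points
  const double tol = 1e-8;    // convergence tolerance on the potential
  const double mix = 0.3;     // under-relaxation to stabilize the fixed point
  Field density(n, 0.0), potential(n, 0.0);

  for (int it = 0; it < 200; ++it) {
    const Field newPotential = solvePoisson(density);
    double change = 0.0;
    for (int i = 0; i < n; ++i) {
      const double mixed = (1.0 - mix) * potential[i] + mix * newPotential[i];
      change = std::max(change, std::fabs(mixed - potential[i]));
      potential[i] = mixed;
    }
    density = solveSchrodinger(potential);
    std::printf("iteration %d: max potential change = %.3e\n", it, change);
    if (change < tol) break;  // self-consistent; current (U-I) predictions follow
  }
  return 0;
}
```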



Motivation:

Reducing feature size has made the modeling of the underlying physics critical.

In projective lithography, simple biases are not adequate

In holographic lithography, near-field phenomena are predominant

Modeling approach must be based on Maxwell's equations

Goal:

Develop unified computational algorithms for the design and analysis of super-resolution lithographic processes that model the underlying physics with high fidelity

Projective Lithography

Holographic Lithography



To handle SRAM-scale systems, we expect much larger computational systems, e.g., 10^5-10^6 surface elements.

Transport tracking scales O(n^2) with the number of surface elements n.

Parallelizes well: every view factor can be computed completely independently of every other view factor, giving almost linear speedup (see the sketch after this list).

Computational complexity of the chemistry solver depends upon the particular chemical mechanisms associated with the etch recipe. These tend to be O(n^2).
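A minimal sketch of why the view-factor (transport tracking) step parallelizes almost linearly: every entry of the n-by-n view-factor matrix depends only on the geometry of two surface elements, so rows can be split across ranks with a single reduction at the end. The viewFactor kernel below is a hypothetical placeholder, not the actual etch simulator's kernel.

```cpp
// Minimal sketch of embarrassingly parallel view-factor computation:
// each entry depends only on two surface elements, so rows are
// block-distributed across ranks with no communication until the end.
#include <mpi.h>
#include <cstdio>

// Hypothetical stub: geometric view factor between surface elements i and j.
double viewFactor(int i, int j) { return (i == j) ? 0.0 : 1.0 / (1.0 + i + j); }

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, size = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int n = 1000;                       // number of surface elements
  double localSum = 0.0;                    // e.g. a row-sum diagnostic
  for (int i = rank; i < n; i += size)      // this rank's rows
    for (int j = 0; j < n; ++j)
      localSum += viewFactor(i, j);

  double totalSum = 0.0;
  MPI_Reduce(&localSum, &totalSum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0)
    std::printf("sum of all view factors: %g\n", totalSum);
  MPI_Finalize();
  return 0;
}
```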

Cut-away view of a reactive ion etch simulation of an aspect ratio 1.4 via into a dielectric substrate with 7% porosity, and complete selectivity with respect to the underlying etch stop. A generic ion-radical etch model was used. ~10^3 surface elements. [Bloomfield et al., SISPAD 2003, IEEE.]



At 90 nm and below, devices have come to rely on increased carrier mobility produced by strained silicon.

As devices scale down, the relative importance of scattering centers increases.

Can we have our cake and eat it too? How much strain can be built into a given device before processing variations and thermo-mechanical load during use cause critical dislocation shedding?

Continuum FEM calculations automatically identify critical high-stress regions.

A local atomistic problem is constructed and an MD simulation is run, looking for criticality. Results feed back to the continuum (see the sketch below).
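A minimal sketch of that feedback loop; every function below is a hypothetical stub standing in for the real FEM, MD, and coupling codes, and the thresholds are illustrative only.

```cpp
// Minimal sketch of the continuum-to-atomistic feedback loop described above;
// all functions and thresholds are hypothetical stubs, not the actual coupling code.
#include <vector>
#include <cstdio>

struct Region { int id; double stress; };

std::vector<Region> solveContinuumFEM()      { return {{0, 1.0}, {1, 3.5}}; } // stub
bool isCritical(const Region& r)             { return r.stress > 3.0; }       // stress threshold
bool runLocalMD(const Region& r)             { return r.stress > 3.2; }       // dislocation shed?
void updateContinuumModel(const Region& r)   { std::printf("feedback from region %d\n", r.id); }

int main() {
  for (int loadStep = 0; loadStep < 3; ++loadStep) {
    // Continuum FEM pass identifies high-stress regions automatically.
    for (const Region& r : solveContinuumFEM()) {
      if (!isCritical(r)) continue;
      // Build a local atomistic problem around the region and run MD.
      if (runLocalMD(r))
        updateContinuumModel(r);   // criticality found: feed results back
    }
  }
  return 0;
}
```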



Advanced meshing tools and expertise exist at RPI and an associated spin-off

Leverage tools to support CCNI projects such as the advanced device modeling.

Local refinement and adaptivity can help carry the computation resources further. “More bang for the buck.”

