

Special Problem Report

Investigation of a New Parallel Algorithm for Turbomachinery Calculations and New Results for Self-Recirculation Control

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Aerospace Engineering

by
Joseph Gillman
gtg638s@mail.gatech.edu

under the guidance of
Dr. Lakshmi N. Sankar
School of Aerospace Engineering
Georgia Institute of Technology
Atlanta, GA

December 2004

Joseph Gillman
MS Aerospace Special Problem
Fall 2004
Georgia Institute of Technology


Introduction
Parallel computing has given the world of Computational Fluid Dynamics a new method for decreasing the time required to obtain a solution. Three-dimensional viscous, turbulent Navier-Stokes solvers for turbomachinery, such as GT-TURBO-3D, are very time-intensive: a run of 35,000 iterations normally takes a day or more on a single-processor PC. Adding parallel directives to the code, such as Message Passing Interface (MPI) or OpenMP, is the easiest way to take advantage of parallel computing, instead of rewriting the code in High Performance Fortran. The computing facilities at the Georgia Institute of Technology include a High Performance Computing (HPC) Center where such parallel codes can be implemented and tested by the Georgia Tech research community.
“The OpenMP Application Program Interface (API) supports multi-platform shared-memory parallel programming in C/C++ and Fortran on all architectures, including Unix platforms and Windows NT platforms. Jointly defined by a group of major computer hardware and software vendors, OpenMP is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications for platforms ranging from the desktop to the supercomputer.” [1]
In this parallelization study, OpenMP is used to parallelize GT-TURBO-3D. MPI is concurrently being used by other researchers working with this code; however, OpenMP is attractive because less additional programming is required to implement it, compared to MPI. On the other hand, OpenMP gives the programmer much less control over important aspects of parallelization, most notably the division of the computational domain among the processors.

Before the author commenced working on this project, the GT-TURBO-3D code with an Alternating-Direction Implicit (ADI) scheme was already implemented. During the course of this research, it was desired to move away from the ADI scheme towards a Lower-Upper Symmetric Gauss-Seidel (LU-SGS) scheme because the latter has improved numerical stability characteristics. However, the LU-SGS scheme is inherently sequential (not parallelizable), so parallelization is not as straightforward as it was for the ADI scheme. The DP-LUR scheme [2] was chosen because, of all parallelizable variants of the LU-SGS scheme, it requires the least increase in memory and CPU work [3,4]. However, the results with DP-LUR were not as good as with the original ADI scheme, so either other, more computationally expensive schemes can be investigated in the future, or the ADI scheme should be used if parallelization is necessary.
Towards the end of the research period, the author began working on analysis of a “self-recirculation” control concept for Rotor 67. It is desired to increase the stable operating range of an axial compressor at the low-mass-flow-rate end, but the design mass flow rate of Rotor 67 is very close to its stall/surge limit. For the full results of that research the reader is referred to [5], but some results not in that work are presented in this paper.
Implicit Time Marching Schemes
The 3-D unsteady compressible Reynolds-averaged Navier-Stokes equations are solved in strong conservation law form. The governing equations are solved in control volume form and can be written as:

$$\frac{\partial}{\partial t}\iiint_V q\,dV + \oint_S \left(E\,\hat{i} + F\,\hat{j} + G\,\hat{k}\right)\cdot\hat{n}\,dS = \oint_S \left(R\,\hat{i} + S\,\hat{j} + T\,\hat{k}\right)\cdot\hat{n}\,dS \qquad (1)$$
Here, S and V refer to the control surface and control volume respectively, and $\hat{n}$ is the outward normal vector to surface S. The inviscid fluxes E, F, and G are calculated to third-order spatial accuracy using Roe's approximate Riemann solver, whereas the viscous fluxes R, S, and T are computed explicitly. The time derivatives are computed with a two- or three-point forward difference scheme, which is thus first- or second-order accurate in time.
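For illustration, the standard two- and three-point one-sided differences of this type (written here in generic form, not copied from GT-TURBO-3D) are:

```latex
\frac{\partial q}{\partial t} \approx \frac{q^{n+1} - q^{n}}{\Delta t}
\quad \text{(two-point, first-order accurate)}, \qquad
\frac{\partial q}{\partial t} \approx \frac{3\,q^{n+1} - 4\,q^{n} + q^{n-1}}{2\,\Delta t}
\quad \text{(three-point, second-order accurate)}.
```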
LU-SGS Scheme:
The conservative differential form of the three-dimensional Navier-Stokes equations in a generalized curvilinear coordinate system can be written as:

$$\mathbf{q}'_t + \mathbf{E}'_\xi + \mathbf{F}'_\eta + \mathbf{G}'_\zeta = \mathbf{R}'_\xi + \mathbf{S}'_\eta + \mathbf{T}'_\zeta \qquad (2)$$
where the boldface type indicates that this is a vector equation representing the five equations of continuity, u-, v-, and w-momentum, and energy respectively, and the primes denote that the inviscid and viscous fluxes have been transformed from Cartesian (x, y, z) coordinates to computational space $(\xi, \eta, \zeta)$. In the remainder of this report, this will be understood and the boldface and primes will be omitted.
By applying a first-order approximation to the time derivative of the state vector and linearizing the inviscid fluxes in time, (2) can be rewritten in the linearized delta form as

$$\left[I + \Delta t\,\delta_\xi \hat{A} + \Delta t\,\delta_\eta \hat{B} + \Delta t\,\delta_\zeta \hat{C}\right]\Delta q = \Delta t\,RHS^n$$
where $\hat{A} = \partial \hat{E}/\partial q$, $\hat{B} = \partial \hat{F}/\partial q$, and $\hat{C} = \partial \hat{G}/\partial q$ are 5x5 matrices describing the inviscid flux Jacobians. $RHS^n$ denotes the right-hand side calculated by the “roeresi” and “stress” subroutines at time step “n” and is described by
$$RHS^n = -\left(\delta_\xi \hat{E}^n + \delta_\eta \hat{F}^n + \delta_\zeta \hat{G}^n\right) + \left(\delta_\xi \hat{R}^n + \delta_\eta \hat{S}^n + \delta_\zeta \hat{T}^n\right)$$
and $\delta_\xi$, $\delta_\eta$, $\delta_\zeta$ are difference operators which operate on $\Delta q$ in the computational directions $(\xi, \eta, \zeta)$.
The required matrix inversions are extremely computationally expensive, so Jameson and Yoon [12] proposed that the inviscid flux Jacobians be split along the spectral radius as

$$\hat{A}^{\pm} = \frac{\hat{A} \pm \varepsilon\,\rho_{\hat{A},\max}\,I}{2}$$

where $\rho_{\hat{A},\max}$ is the normal velocity through the face (the i-constant face for the $\hat{A}$ case) plus the local speed of sound. $\hat{B}^{\pm}$ and $\hat{C}^{\pm}$ are found by replacing $\rho_{\hat{A},\max}$ with $\rho_{\hat{B},\max}$ and $\rho_{\hat{C},\max}$, respectively. $\varepsilon$ is a dissipation term that enhances stability if $\varepsilon > 1$ by increasing the diagonal dominance of the system of equations; however, the rate of convergence is then reduced. In this implementation, $\varepsilon$ is left at 1 [6].
It should be noted that, with this approximation, the eigenvalues of these split fluxes are no longer consistent with the true propagation speeds of the flow, which may introduce additional error in time accuracy and in wave propagation. The eigenvalues of the plus-matrices are now greater than or equal to zero, while those of the minus-matrices are less than or equal to zero. Now the system of equations to solve becomes
$$\left[I + \Delta t\,\nabla_\xi \hat{A}^{+} + \Delta t\,\nabla_\eta \hat{B}^{+} + \Delta t\,\nabla_\zeta \hat{C}^{+} + \Delta t\,\Delta_\xi \hat{A}^{-} + \Delta t\,\Delta_\eta \hat{B}^{-} + \Delta t\,\Delta_\zeta \hat{C}^{-}\right]\Delta q = \Delta t\,RHS^n$$
where $\nabla$ represents backward differences and $\Delta$ represents forward differences as follows:

$$\nabla_\xi \hat{A}^{+}\Delta q = \hat{A}^{+}_{i,j,k}\,\Delta q_{i,j,k} - \hat{A}^{+}_{i-1,j,k}\,\Delta q_{i-1,j,k}$$
$$\Delta_\xi \hat{A}^{-}\Delta q = \hat{A}^{-}_{i+1,j,k}\,\Delta q_{i+1,j,k} - \hat{A}^{-}_{i,j,k}\,\Delta q_{i,j,k}$$

and similarly for $\hat{B}$ and $\hat{C}$ in the j- and k-directions, respectively.
Written out in terms of the generic metrics $(k_t, k_x, k_y, k_z)$, the flux Jacobians $\hat{A}$, $\hat{B}$, $\hat{C}$ all take the form

$$\hat{A} = \begin{bmatrix}
k_t & k_x & k_y & k_z & 0 \\
k_x \phi^2 - u\theta & k_t + \theta - (\gamma-2)k_x u & k_y u - (\gamma-1)k_x v & k_z u - (\gamma-1)k_x w & (\gamma-1)k_x \\
k_y \phi^2 - v\theta & k_x v - (\gamma-1)k_y u & k_t + \theta - (\gamma-2)k_y v & k_z v - (\gamma-1)k_y w & (\gamma-1)k_y \\
k_z \phi^2 - w\theta & k_x w - (\gamma-1)k_z u & k_y w - (\gamma-1)k_z v & k_t + \theta - (\gamma-2)k_z w & (\gamma-1)k_z \\
\theta\left(\phi^2 - a_1\right) & k_x a_1 - (\gamma-1)u\theta & k_y a_1 - (\gamma-1)v\theta & k_z a_1 - (\gamma-1)w\theta & \gamma\theta + k_t
\end{bmatrix}$$
In the above formulation,

$$\phi^2 = \frac{\gamma-1}{2}\left(u^2 + v^2 + w^2\right), \qquad \theta = k_x u + k_y v + k_z w, \qquad a_1 = \gamma E_t - \phi^2,$$

$k_t$ is the time metric, and the only difference between $\hat{A}$, $\hat{B}$, $\hat{C}$ is that “k” is replaced by the $\xi$, $\eta$, or $\zeta$ metrics, respectively [7].
The Lower-Upper factorization proceeds as follows:

$$\left(L + D + U\right)\Delta q = RHS$$

where U is an upper triangular matrix with null blocks on the diagonal, D is a block diagonal matrix, and L is a lower triangular matrix with null blocks on the diagonal. For the case where D is a non-singular matrix, this can be rewritten as:
$$D\left(I + D^{-1}L\right)\left(I + D^{-1}U\right)\Delta q = RHS$$
$$\left(D + L\right)D^{-1}\left(D + U\right)\Delta q = RHS$$

where

$$D + L = I + \Delta t\left(\nabla_\xi \hat{A}^{+} - \hat{A}^{-} + \nabla_\eta \hat{B}^{+} - \hat{B}^{-} + \nabla_\zeta \hat{C}^{+} - \hat{C}^{-}\right)$$
$$D + U = I + \Delta t\left(\hat{A}^{+} + \Delta_\xi \hat{A}^{-} + \hat{B}^{+} + \Delta_\eta \hat{B}^{-} + \hat{C}^{+} + \Delta_\zeta \hat{C}^{-}\right)$$

and

$$D = I + \Delta t\left(\hat{A}^{+} - \hat{A}^{-}\right) + \Delta t\left(\hat{B}^{+} - \hat{B}^{-}\right) + \Delta t\left(\hat{C}^{+} - \hat{C}^{-}\right).$$
Since $\hat{A}^{\pm} = \frac{1}{2}\left(\hat{A} \pm \rho_{\hat{A},\max}\,I\right)$, we have $\hat{A}^{+} - \hat{A}^{-} = \rho_{\hat{A},\max}\,I$, and therefore the diagonal becomes

$$D = \left[1 + \Delta t\left(\rho_{\hat{A},\max} + \rho_{\hat{B},\max} + \rho_{\hat{C},\max}\right)\right] I.$$
The diagonal and plus/minus matrices are arranged into a large block system as follows:

$$D_{ijk}\,\Delta q_{ijk} + \Delta t\left(-\hat{A}^{+}_{i-1,j,k}\Delta q_{i-1,j,k} - \hat{B}^{+}_{i,j-1,k}\Delta q_{i,j-1,k} - \hat{C}^{+}_{i,j,k-1}\Delta q_{i,j,k-1} + \hat{A}^{-}_{i+1,j,k}\Delta q_{i+1,j,k} + \hat{B}^{-}_{i,j+1,k}\Delta q_{i,j+1,k} + \hat{C}^{-}_{i,j,k+1}\Delta q_{i,j,k+1}\right) = \Delta t\,J\,RHS^n$$
and the Lower-Upper Symmetric Gauss-Seidel sweeps solve for the change in the state vector by
$$\Delta q^{*}_{ijk} = D^{-1}_{ijk}\left[\Delta t\,J\,RHS^n + \Delta t\left(\hat{A}^{+}_{i-1,j,k}\Delta q^{*}_{i-1,j,k} + \hat{B}^{+}_{i,j-1,k}\Delta q^{*}_{i,j-1,k} + \hat{C}^{+}_{i,j,k-1}\Delta q^{*}_{i,j,k-1}\right)\right]$$

(lower sweep, solve for $\Delta q^{*}$)

$$\Delta q_{ijk} = \Delta q^{*}_{ijk} - D^{-1}_{ijk}\,\Delta t\left(\hat{A}^{-}_{i+1,j,k}\Delta q_{i+1,j,k} + \hat{B}^{-}_{i,j+1,k}\Delta q_{i,j+1,k} + \hat{C}^{-}_{i,j,k+1}\Delta q_{i,j,k+1}\right)$$

(upper sweep, solve for $\Delta q$).
Since both sweeps update the state vector continuously, the algorithm is sequential and not immediately parallelizable.
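The sequential character of the two sweeps can be seen on a scalar model problem. The following C sketch is illustrative only: ap, am, and d are hypothetical scalar stand-ins for the split Jacobians A+/A- and the diagonal operator D, not the actual 5x5 blocks of GT-TURBO-3D.

```c
#include <assert.h>
#include <math.h>

#define N 8

/* Scalar model of the LU-SGS sweeps.  The lower sweep reads dq[i-1]
 * just after overwriting it, and the upper sweep reads dq[i+1] just
 * after overwriting it: neither loop can be split across threads
 * without changing the answer. */
static void lusgs_sweeps(const double rhs[N], double dq[N],
                         double ap, double am, double d)
{
    for (int i = 0; i < N; ++i) {              /* lower sweep: dq*   */
        double lower = (i > 0) ? ap * dq[i - 1] : 0.0;
        dq[i] = (rhs[i] + lower) / d;          /* uses updated dq[i-1] */
    }
    for (int i = N - 1; i >= 0; --i) {         /* upper sweep: dq    */
        double upper = (i < N - 1) ? am * dq[i + 1] : 0.0;
        dq[i] -= upper / d;                    /* uses updated dq[i+1] */
    }
}
```

This recurrence is exactly what a simple OpenMP directive cannot parallelize, which motivates the pointwise relaxation of the next section.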
DP-LUR Scheme
It is desired to implement the LU-SGS scheme on parallel computers in order to increase the speed of the calculations and decrease the turn-around time for a CFD solution. As described earlier, the data dependencies that arise from updating the solution vector ($\Delta q_{i,j,k}$) during both the L- and U-sweeps cause LU-SGS to pose many more parallelization problems than the original ADI time-marching scheme, which solves the system of equations with the Thomas algorithm.

The Data-Parallel Lower-Upper Relaxation (DP-LUR) scheme uses Yoon and Jameson's LU-SGS scheme as a starting point and makes some adjustments to remove the data dependencies and make the scheme parallelizable.
The symmetric Gauss-Seidel sweeps are replaced by a series of pointwise relaxation steps using the following scheme. First, the off-diagonal terms are neglected and the right-hand side $RHS^n$ is divided by the diagonal operator from the LU-SGS scheme to obtain $\Delta U^{(0)}$, the initial sub-iteration of the state vector solution:

$$\Delta U^{(0)}_{i,j,k} = \left[I + \Delta t\left(\rho^{\,n}_{\hat{A}} + \rho^{\,n}_{\hat{B}} + \rho^{\,n}_{\hat{C}}\right)\right]^{-1}\Delta t\,J\,RHS^n_{i,j,k}.$$

Then a series of $k_{\max}$ relaxation steps is made as follows:
$$\Delta U^{(k)}_{i,j,k} = \left[I + \Delta t\left(\rho^{\,n}_{\hat{A}} + \rho^{\,n}_{\hat{B}} + \rho^{\,n}_{\hat{C}}\right)\right]^{-1}\left[\Delta t\,J\,RHS^n_{i,j,k} + \Delta t\left(\hat{A}^{+,n}_{i-1,j,k}\Delta U^{(k-1)}_{i-1,j,k} - \hat{A}^{-,n}_{i+1,j,k}\Delta U^{(k-1)}_{i+1,j,k} + \hat{B}^{+,n}_{i,j-1,k}\Delta U^{(k-1)}_{i,j-1,k} - \hat{B}^{-,n}_{i,j+1,k}\Delta U^{(k-1)}_{i,j+1,k} + \hat{C}^{+,n}_{i,j,k-1}\Delta U^{(k-1)}_{i,j,k-1} - \hat{C}^{-,n}_{i,j,k+1}\Delta U^{(k-1)}_{i,j,k+1}\right)\right]$$
Then, after the $k_{\max}$ sub-iterations are completed, the state vector update is taken as

$$\Delta q^n_{i,j,k} = \Delta U^{(k_{\max})}_{i,j,k}.$$

This was presented originally for two dimensions by [2] and extended to three dimensions in [3].
Parallel Implementation using OpenMP
Reference [8] is a comprehensive guide to using OpenMP and is an excellent reference describing the benefits, methods, and techniques for proper parallelization on shared memory computers. The advantage of using OpenMP over, say, explicit message passing (as in MPI) is that much less programming is required to add the parallelization routines. Some adjustments might need to be made to the code, such as changing the order of nested loops to improve parallel performance, but in general the structure of the parallel code is identical to the serial version of the code. However, there is no fine control over how little or how much data is communicated. Also, in MPI the programmer can manually divide the computational domain among the processors in whatever manner he or she decides is most efficient. Each processor can then do its work serially, for example using the sequential LU-SGS scheme, and communicate the necessary data across the processor-domain borders. In this manner good parallel speedup can be achieved even though the main scheme is itself not parallel. Since this is not possible with OpenMP, a new time-marching scheme must replace LU-SGS, such as the DP-LUR described previously or DPLR [9].
In the most basic form, OpenMP directives are used to parallelize do loops; other portions of the program run serially. The best parallel performance is achieved for a large loop with many calculations, because parallelization requires overhead, and the desired parallel speedup would not be reached for a small loop. Contrary to programming for vector computers, where the innermost loop is put through the vector pipe, when using OpenMP parallelization it is the outermost loop that is parallelized. This saves some of the overhead that would be required for thread creation and destruction if the innermost loop were parallelized [10]. The compiler decides which portion of the computational domain (in general, imax x jmax x kmax) each processor works with.
In implementing OpenMP in GT-TURBO-3D, the parallelization of the outermost loop in a set of nested do loops is done by the directive “C$OMP PARALLEL DO”. Since the C is in the first column, a non-parallel compiler will ignore the line completely; therefore the same code can be compiled on a PC to be executed serially. Virtually every subroutine has a parallel set of nested loops. Sometimes extra statements must be inserted after the “C$OMP PARALLEL DO” line in order to maintain good parallel performance.
By default, all variables within the parallel do loop are shared between all processors. Anything on the left-hand side of an equation, however, must be declared as private, so that each processor updates the variable independently in its section of the computational domain. These variables are listed under the C$OMP PARALLEL DO line as “C$OMP+PRIVATE(privatevar1,privatevar2,...)”. These private variables have multiple storage locations, one within the execution context of each thread, for the duration of the parallel construct. Reads or writes to such a variable refer to each thread's separate, private instance in that processor's memory allocation and are inaccessible to the other threads. In contrast, a shared variable has only one memory location, which all threads can read and write, and this facilitates communication between the OpenMP threads [8].

Yet another clause is the reduction clause, which has storage properties resembling both private and shared variables. The reduction clause denotes a scalar variable which is the result of some operation (e.g., addition or multiplication) on the components of a vector, such as the result of a dot product [10]. The reduction attribute is the most efficient way for the compiler to handle these “mixed” private and shared variables [8]. It is implemented with slightly different syntax, “C$OMP+REDUCTION(+:reductionvar)” (note the plus sign and colon in front of the variable name, no spaces).
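For readers more familiar with C, the same clauses look as follows. This is a minimal sketch, not taken from GT-TURBO-3D; in C the pragma plays the role of the “C$OMP” comment line and is likewise ignored by a compiler without OpenMP support:

```c
#include <assert.h>
#include <math.h>

#define N 1000

/* C analogue of "C$OMP PARALLEL DO", "C$OMP+PRIVATE(term)", and
 * "C$OMP+REDUCTION(+:sum)": term gets one copy per thread, while the
 * per-thread partial values of sum are combined when the loop ends. */
static double dot_product(const double a[N], const double b[N])
{
    double sum = 0.0;
    double term;
    #pragma omp parallel for private(term) reduction(+:sum)
    for (int i = 0; i < N; ++i) {
        term = a[i] * b[i];   /* private: each thread writes its own copy */
        sum += term;          /* reduction: partial sums merged at the end */
    }
    return sum;
}
```

Compiled without an OpenMP flag, the pragma is ignored and the loop simply runs serially, giving the same result.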
The parallel codes described in this report were run on the High Performance Computing (HPC) resources available at Georgia Tech. The specific machine used is called “buteo” and its specifications are listed on the Georgia Tech HPC website [11] as follows:

SGI Origin 3800, machine name: buteo
Front end (vireo): 8-processor R14000-based Origin 300, 8 GB memory
32 MIPS R14000 processors in the NUMAflex architecture, answering to the name buteo
600 MHz clock speed
16 GBytes 64-bit shared memory (2 GBytes per node board)
8 MB cache per node board
1.6 GB/sec, each direction, NUMAflex connectivity between C- and R-bricks
OS: IRIX 6.5
The code is compiled on the vireo front-end machine using the SGI Fortran 77 compiler, which is accessed through SecureCRT (a secure Telnet application supported by the Georgia Tech Office of Information Technology) at hostname vireo.gatech.edu. Once logged into this host, WinSCP3 is used (with a separate login) to upload the source code, grid file, and other input files to the vireo server. The compiler command used is “f77 -O2 -mp code.f -o code.exe”. “-O2” is a scalar optimizer, and the HPC website notes that “-O2 performs global optimizations, but does not affect numerical accuracy.” A different optimizer, “-O3”, is recommended for specifying “automatic parallelization,” in which no OpenMP directives are added to the code and the compiler decides the best way to parallelize. This was not used, as it is inefficient with a large CFD code such as GT-TURBO-3D.
Once compiled, an .exe executable file is created, but it cannot be executed immediately on the buteo machine. A batch file must be written that tells the vireo front-end machine to use the buteo computer, specifies how many processors are desired, etc. An example of a batch file is shown here:

#PBS -N DPAxial_par8_b
#PBS -l ncpus=8
#PBS -q buteo
#PBS -k oe
#PBS -m abe
export MP_SET_NUMTHREADS=8
cd /scratch/buteo/gtg638s/DP-LUR
limit stacksize unlimited
./LUaxial-par4test5.exe
This batch file is saved in the user's vireo.gatech.edu home directory. The first line gives the name of the batch job, in this case “DPAxial_par8_b.” The second line specifies that 8 processors are desired for this job. The next line specifies which computer will run the program (buteo is used exclusively in this study). The fourth line, “-k oe”, specifies that both output (o) and error (e) files will be saved in the user's home directory (which is where the batch file is located). For example, when GT-TURBO-3D runs serially on a PC it outputs “dq1max”, theta, and the iteration number to the screen, so this “screen output” will be saved as a text file. The line “-m abe” specifies that an email notification will be sent upon code abortion (a), beginning (b), and ending (e). The line “export MP_SET_NUMTHREADS=8” tells the IRIX operating system that 8 CPUs will be utilized. The previous line told the program scheduler how many processors are used, while this line is used by the operating system; it is imperative that the two numbers match (8 in this case). Lastly, the “cd ...” line describes how to get from the location of the batch file (normally the home directory, as described above) to the executable file. The scratch directory is often used because GT-TURBO-3D outputs the grid and solution files after a specified number of iterations, and since these files can be quite large, the scratch folder is the best place from which to execute the program, as disk space there is unlimited. However, an automated system deletes older files from time to time, so every file, including grid and input files, must have a “new” timestamp. These files can be opened in a text editor and re-saved without changing the numbers, which prevents them from being deleted before the program can execute.

After the code is compiled and all input files are placed with the executable in the scratch directory, the batch job is submitted from the vireo.gatech.edu command prompt (using SecureCRT) by “qsub DPAxial_par8_b”, and the batch job will execute once it reaches the front of the queue.
Parallelization Results
As mentioned previously, it was desired to parallelize the LU-SGS scheme in a manner such that the speedup gain would be similar to that of the original ADI scheme. The ADI speedup factor was measured as approximately 6 when using 8 CPUs. When OpenMP routines were added directly to the LU-SGS scheme, the speedup factor was only 3 on 8 processors. This is because much of the computational work in GT-TURBO-3D is done in the SLPS subroutine (which calls the implicit time-marching scheme, be it ADI, LU-SGS, or DP-LUR). Therefore the data dependencies of the LU-SGS scheme have a large impact on parallel performance; simply adding the OpenMP directives to a non-parallelizable LU-SGS scheme yields an unsatisfactory result.
The references which discuss the parallelization of the LU-SGS scheme all use Message Passing Interface instead of OpenMP. It was hoped that the algorithmic adjustments to the LU-SGS scheme would be the same whether using OpenMP or MPI. Some schemes mentioned in these papers, such as the “Hybrid DP-LUR” [2,3], can only use message-passing routines because precise control is required over which data is communicated between processors. In the Hybrid DP-LUR, for example, LU-SGS is performed on each individual processor, and on the data-domain border between two processors DP-LUR is used because of its parallelizable nature.
Two implementations of the parallelized LU-SGS scheme were attempted in this study. In one, the L-elimination and U-elimination routines were kept intact, but sub-iterations on $\Delta q$, the updated state vector, were added. The state vector in the k-direction was based on the previous sub-iteration, so the data dependency in the k-direction was removed and the scheme should therefore be parallelizable. For this case the speed-up factor improved to match that of the ADI scheme. In addition, the results output by the code for mass flow rate and stagnation pressure ratios mirrored the LU-SGS scheme very closely. However, the code became unstable after only 1600 iterations, and this happened despite using a large number of sub-iterations (16). Using more and more sub-iterations was necessary to increase the stability (measured by the number of iterations the code was able to complete), which increased the CPU effort required.
The second attempt more closely resembled the DP-LUR scheme of [2]. Instead of L and U sweeps, both with sub-iterations, the symmetric Gauss-Seidel sweeps were replaced by the pointwise relaxation scheme described earlier. There are drawbacks to such a scheme, however. Much of the increased stability from using LU-SGS is lost because the Gauss-Seidel sweeps are replaced by Jacobi sweeps, which are less stable and take an increased number of iterations to achieve convergence. When this was implemented in GT-TURBO-3D, the program ran for 16000 iterations without the need to increase the number of sub-iterations all the way up to 16, as was the case earlier. However, the state vector update was so small that there was hardly any noticeable change from one time step to the next when examining the results file.
For further investigation, it is recommended that a hybrid scheme be investigated with MPI routines. The schemes that could be implemented with OpenMP take much more CPU time than the ADI-parallel version of the code due to the requirement of sub-iterations. Therefore it is recommended to continue use of the parallel ADI scheme when quick turnarounds are desired, as the ADI scheme is sufficiently stable for the design mass flow rate condition of Rotor 67 and Rotor 37. The self-recirculation control code, described in the next section, uses the ADI scheme, and it is parallelized with the speedup shown in Figure 1 as the ADI curve.
Figure 1: Speed-up chart for the original ADI scheme, sequential LU-SGS scheme, and DP-LUR scheme (speedup factor vs. number of CPUs; curves labeled ADI, LUSGS, DPLUR).
Although the speed-up rate for DP-LUR is the best, the speedup factor calculated is relative to performing DP-LUR on one processor, which is much more computationally expensive than the ADI scheme, as discussed earlier. Therefore, even with the improved speed-up, the fact that DP-LUR currently gives less than optimal results suggests that even if the DP-LUR code is improved further so that a converged solution is achieved, it is much more efficient to use the parallel ADI scheme for parallel computation.
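For reference, the speedup factors plotted in Figure 1 follow the usual definitions, sketched below. The timing numbers in the comment are hypothetical; only the ratio S(8) = 6 for ADI comes from the measurements reported above.

```c
#include <assert.h>
#include <math.h>

/* Speedup S(N) = T(1)/T(N); parallel efficiency E(N) = S(N)/N.
 * E.g. if a serial run took 24 h and the 8-CPU run took 4 h
 * (hypothetical times), the speedup is 6 and the efficiency 0.75,
 * matching the measured ADI behaviour on 8 CPUs. */
static double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

static double efficiency(double t_serial, double t_parallel, int ncpu)
{
    return speedup(t_serial, t_parallel) / (double)ncpu;
}
```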
Self-Recirculation Control Concept
It has been shown previously in numerical studies by Niazi [13] and Stein [14] that open- and closed-loop bleeding and injection of air at the case can both improve the stability of a compressor and increase its operating range at lower mass flow rates. The author contributed to a study, using GT-TURBO-3D, of a unique “self-recirculation” control concept first developed by Hathaway [15].

Active control approaches, while effective, have two drawbacks. Firstly, since the air is being bled, there is a loss of high-pressure air, which hinders the total pressure rise from the compressor. Secondly, there is still the need for an active controller and actuator [16]. Hathaway's concept has already been applied to the Rotor 67 configuration. The concept is shown in Figure 2.
Figure 2: ‘Self-recirculation’ control treatment concept as proposed by Hathaway [15] (schematic labels: Hub, Case, Injection, Bleed, Rotor, In Flow, mass flow rates m1 and m2).
Hathaway describes a main barrier to increased performance and operating range: the airflow at the tip. Referred to as “tip-blockage effects,” the vortices and reversed-flow regions in the small clearance region between the spinning blade tip and the stationary case are detrimental to overall compressor stage performance. Therefore it is desired to alleviate this problem with the bleed-injection control concept.
Airflow is bled downstream of the blade trailing edge and then re-injected upstream of the leading edge near the blade tip. The concept requires the bleed and injection mass flow rates to be the same. The injected fluid is directed to lie along the casing end-wall to energize the local low-momentum fluid; the injection port is located just upstream of the region of low-momentum fluid [16]. The necessary adjustments to the GT-TURBO-3D code and boundary conditions are described fully in [16] and [5], and the latest results from the numerical study are given in [5]. In this work, some extra descriptions of the work that went into the study, which were not described in the final paper [5], are provided.
Figure 3: Tip blockage vortices shown at mid-pitch, 30 kg/sec, 1/4 rotor revolution (annotations: low relative static pressure, higher relative static pressure, shock location, reversed-flow regions near blade tip).
Figure 4: The stagnation pressure at the bleed port is higher than that at the injection port.
At the beginning of the author's contributions to the self-recirculation concept study, it was desired to show that the self-recirculation was indeed a passive concept by ensuring that the stagnation pressure at the bleed port was higher than that at the injection port. This difference in stagnation pressure is the only mechanism driving the bled airflow from the bleed port back to the injection port. Figure 4 shows that this is indeed the case: normalized stagnation pressure is shown in the chart in the upper right-hand corner, and the stagnation pressure at the bleed port is higher than that at the injection port.
The next series of figures shows an uncontrolled case at low mass flow rate. There is no bleed or injection, and even after many rotor revolutions, where in a stable case a converged solution would result, it is clearly seen that vortices are still being shed in the unstable case. The pictures show velocity vectors and pressure contours as computed by Fieldview. The velocity vectors also show reversed flow near the blade tip when there is a vortex. The case shown is for a specified mass flow rate through the compressor of 31 kg/sec.
Figure 5: Uncontrolled case, (a): theta = 1017°, (b): theta = 1089°, (c): theta = 1162°, (d): theta = 1235°. This is at k=21 (mid-pitch).
For the controlled case, these unstable phenomena are stabilized by the bleeding and injection, and the vortices and reversed-flow regions disappear two revolutions before the above pictures were taken. The initial instabilities near the blade tip are no longer present at a theta of 363°, as shown in the following set of pictures.

Figure 6: Recirculation case, (a): theta = 72°, (b): theta = 145°, (c): theta = 218°, (d): theta = 290°. This is at k=21 (mid-pitch).
Figure 7: theta = 363°, also at mid-pitch. Note that the blockage vortex at the tip is no longer present, and it did not arise again for the remainder of the computational simulation (up to 2 rotor revolutions).
Conclusions
In this study, a parallel version of LU-SGS was attempted. The results are not as good as with the original ADI parallel scheme, and more CPU time is required for the parallelized LU schemes compared to ADI schemes. Therefore it is recommended that current design needs for a rapid turnaround time use the ADI version of the code. Further work could also investigate DPLR as an alternative to DP-LUR. However, DPLR requires even more memory and processor power than the DP-LUR scheme. In a recent attempt to implement the DPLR scheme on a wind-rotor code for a grid of approximately 2 million grid points, the parallel SGI machines would not run the program due to the excessive memory required.
New results for the recirculation concept applied to Rotor 67 using GT-TURBO-3D were also presented. The operating range of the compressor was successfully extended to lower mass flow rates with this passive control concept. The full paper [5] gives a detailed analysis of the results and is due to be presented at the 43rd Annual Aerospace Sciences Meeting and Exhibit in Reno, Nevada, in January 2005.
References
[1] www.openmp.org
[2] Wright, Michael J., Candler, Graham V., and Prampolini, Marco, “Data-Parallel Lower-Upper Relaxation Method for the Navier-Stokes Equations,” AIAA Journal, Vol. 34, No. 7, pp. 1371-1377, July 1996.
[3] Wissink, Andrew M., Lyrintzis, Anastasios S., and Strawn, Roger C., “Parallelization of a Three-Dimensional Flow Solver for Euler Rotorcraft Aerodynamics Predictions,” AIAA Journal, Vol. 34, No. 11, Nov. 1996.
[4] Wissink, Andrew M., “Efficient Parallel Implicit Methods for Rotary-Wing Aerodynamics Calculations,” Ph.D. Dissertation, University of Minnesota, May 1997.
[5] Iyengar, Gillman, Niazi, and Sankar, “Numerical Studies of Two Approaches for Surge Control in Axial Compressors,” AIAA Conference, Reno, NV, 2005.
[6] Tomaro, Robert, “An Implicit Higher-Order Spatially Accurate Scheme for Solving Time Dependent Flows on Unstructured Meshes,” Ph.D. Dissertation, Georgia Institute of Technology, School of Aerospace Engineering, December 1997.
[7] Wang, Gang, “Study of a Low-Dispersion Finite Volume Scheme in Rotorcraft Noise Prediction,” Ph.D. Dissertation, Georgia Institute of Technology, School of Aerospace Engineering, April 2002.
[8] Chandra, Rohit, et al., “Parallel Programming in OpenMP,” Morgan Kaufmann Publishers, 2001.
[9] Wright, Michael J., Candler, Graham V., and Bose, Deepak, “Data-Parallel Line Relaxation Method for the Navier-Stokes Equations,” AIAA Journal, Vol. 36, No. 9, pp. 1603-1609, Sept. 1998.
[10] Shonkwiler, R., “Introduction to Parallel and Scientific Computing,” Georgia Institute of Technology, 2004.
[11] http://www.hpc.gatech.edu/facilities/o3800.html
[12] Yoon, S., and Jameson, A., “Lower-Upper Symmetric-Gauss-Seidel Method for the Euler and Navier-Stokes Equations,” AIAA Journal, Vol. 26, No. 9.
[13] Niazi, S., “Numerical Simulation of Rotating Stall and Surge Alleviation in Axial Compressors,” Ph.D. Dissertation, Georgia Institute of Technology, Aerospace Engineering, July 2000.
[14] Stein, A., “Computational Analysis of Stall and Separation Control in Centrifugal Compressors,” Ph.D. Dissertation, Georgia Institute of Technology, Aerospace Engineering, May 2000.
[15] Hathaway, M.D., “Self-Recirculating Casing Treatment Concept for Enhanced Compressor Performance,” Turbo Expo 2002, Amsterdam, The Netherlands.
[16] Iyengar, Vishwas, “Advanced Control Techniques for Modern Compressor Rotors,” MS Special Problem Report, Georgia Institute of Technology, School of Aerospace Engineering, July 2004.
Appendix A: DPLUR code
Appendix B: Parallelized Lower and Upper Sweeps
Appendix C: Composition of A, B, and C matrices