Peking University, 政学者论文集 (Scholars' Paper Collection), 2001
The Parallel Computation of a 3D Steady Conduction Problem with the Gauss-Seidel Method
Cheng MuLin
Mechanical and Engineering Science Department, Peking University
Abstract
In this paper, I use MPICH to implement the parallel computation of a 3D steady conduction problem. Cases with different meshes and numbers of processes are run to test the parallel performance of the program. The tests show a linear speedup, which indicates that the program has high parallel efficiency.
Introduction
History
Single-processor supercomputers achieved unprecedented speeds beyond 100 million instructions per second and pushed hardware technology to the physical limits of chip building. This progress cannot continue indefinitely, because physical and architectural bounds limit the computational power that can be achieved with a single-processor system. Meanwhile, computing tasks from scientific fields such as CFD (Computational Fluid Dynamics) and nuclear physics are becoming more and more complex, and they demand huge memory and high computing speed. Parallel computer systems are designed to meet this need. The whole task is split into smaller pieces or steps, and each processor runs one or more of them. Because different pieces are processed at the same time, the whole task can be finished more quickly than on a single-processor computer. In most cases, however, the processors in a parallel computation are not independent of each other, so data and message exchange is unavoidable, and it is very slow compared with CPU speed. This data and message passing is the most important factor limiting the speed of parallel computers.
In recent years, different paradigms of parallelism have been developed to suit different application fields. Table 1 shows a classification system which is not complete but includes the major approaches taken by scientists, engineers and researchers who apply parallel computing in a variety of fields. Vector/array processing was the parallelism paradigm in the early period of parallel computation research. Now MIMD (Multiple-Instruction Multiple-Data) is the most general form, while the SIMD (Single-Instruction Multiple-Data) and SPMD (Single-Program Multiple-Data) forms of parallelism appear to be appropriate for scientific problems whose data are regular and whose calculations are uniform and repetitious.
Table.1
During this summer holiday, I studied MPI and MPICH and then developed a parallel program with MPICH for a 3D steady conduction problem under the guidance of Prof. Lin. This paper covers most of that work.
Basic Idea of Parallel Computation
MPI and MPICH
Message passing is a paradigm widely used on certain classes of parallel machines, especially those with distributed memory. To reduce the repetitive work of vendors and practitioners of parallel computing, MPI (Message Passing Interface) was defined. It specifies both the syntax and the semantics of a core of library routines that are useful to a wide range of users and efficiently implementable on a wide range of computers. MPI describes all of its functions in a language-independent notation, and versions of the same functions are provided for ANSI C and FORTRAN 77. MPICH is a portable implementation of the full MPI specification for a wide variety of parallel and distributed computing environments.
Measure of Performance
For a single-processor computer, MIPS (Million Instructions Per Second) and MFLOPS (Million Floating-Point Operations Per Second) are the traditional measures of performance. For a parallel computer system, speedup is an often-quoted measure of parallel performance, although it is also a controversial one. Speedup is defined as follows:

$$\mathrm{Speedup} = \frac{T_0}{T(N)} \tag{1}$$

where $T_0$ is the time to compute a certain problem using a serial program on one processor, and $T(N)$ is the time to compute the same problem using a parallel program on $N$ processors. That is, speedup is the solution time on one processor divided by the solution time on $N$ processors. In practice, $T(1)$ is used instead of $T_0$ for simplicity, so speedup is computed as follows:

$$\mathrm{Speedup} = \frac{T(1)}{T(N)} \tag{2}$$

However, we should remember the slight difference between $T(1)$ and $T_0$: $T(1)$ is measured with the parallel program running on one processor, whereas $T_0$ is measured with a serial program.
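As a small illustration of Eq. (1) and Eq. (2), the sketch below computes speedup from two timings. The function names and the timings are hypothetical and chosen only for illustration. The per-processor efficiency (speedup divided by N) is a commonly used companion measure that is not defined in this paper.

```python
def speedup(t_ref: float, t_n: float) -> float:
    """Speedup as in Eq. (1)/(2): reference time divided by the N-processor time."""
    if t_ref <= 0 or t_n <= 0:
        raise ValueError("times must be positive")
    return t_ref / t_n


def efficiency(t_ref: float, t_n: float, n: int) -> float:
    """Speedup per processor (an added companion measure, not from the paper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return speedup(t_ref, t_n) / n


# Hypothetical timings in seconds, for illustration only.
t1 = 120.0  # parallel program on one processor, T(1)
t4 = 40.0   # parallel program on four processors, T(4)

print(speedup(t1, t4))        # 3.0
print(efficiency(t1, t4, 4))  # 0.75
```

Passing T(1) as `t_ref` gives Eq. (2); passing the serial time T_0 gives Eq. (1).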
Parallel Computation
Problem Description
A 3D steady conduction problem is considered in this paper, as shown in figure 1. The length (L) of the bar is 0.4 m, and its width (D) and height (H) are both 0.1 m. Aluminum is selected as the material of the bar, and the material is homogeneous throughout the bar. The parameters used for aluminum are as follows:

Density: ρ = 2702 kg/m^3
Specific heat at constant pressure: C_p = 903 J/(kg·K)
Thermal conductivity: k = 237 W/(m·K)

There is a temperature difference between the two ends of the bar: the left end is heated to 100 K and the right end is kept at 0 K, so heat flows from left to right and the temperature reaches a steady distribution throughout the bar. On the other four faces of the bar an adiabatic boundary condition is set, that is, no heat escapes through these faces. A heat source S is also considered, where S is a function of temperature T. S(T) can represent many cases in which the bar gains or loses heat through non-mechanical processes, such as radiation or chemical reaction.
fig.1 Problem description
Equations
Because this is a conduction problem without fluid motion, the governing equation is a Poisson equation:

$$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} + S(T) = 0 \tag{3}$$
At the two ends, the boundary conditions are:

$$T = T_0 = 100\ \mathrm{K} \quad \text{at } x = 0 \tag{4}$$

$$T = T_1 = 0\ \mathrm{K} \quad \text{at } x = L \tag{5}$$
On the four side faces of the bar, the adiabatic condition can be expressed as follows:

$$\frac{\partial T}{\partial y} = 0 \quad \text{at } y = 0 \text{ or } y = D \tag{6}$$

$$\frac{\partial T}{\partial z} = 0 \quad \text{at } z = 0 \text{ or } z = H \tag{7}$$
Equations (3) to (7) determine the distribution of temperature T in the bar. Because my focus is on parallel computation performance, the boundary conditions are designed so that the problem can be solved analytically when S(T) is set to zero. In that case the solution is linear in x:

$$T(x, y, z) = T_0 + \frac{T_1 - T_0}{L}\,x \tag{8}$$

This equation is used for comparison with the numerical result from the parallel computation.
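The step from the boundary-value problem to the linear profile can be made explicit. The following is a short worked derivation, assuming S(T) = 0 and a temperature field that does not vary in y or z (which is compatible with the adiabatic conditions (6) and (7)); the integration constants a and b are introduced here only for the derivation.

```latex
\begin{aligned}
\frac{d^2 T}{dx^2} &= 0
  &&\Rightarrow\quad T(x) = a\,x + b, \\
T(0) = T_0 &= 100\ \mathrm{K}
  &&\Rightarrow\quad b = T_0, \\
T(L) = T_1 &= 0\ \mathrm{K}
  &&\Rightarrow\quad a = \frac{T_1 - T_0}{L}
     = \frac{0 - 100}{0.4}\ \mathrm{K/m} = -250\ \mathrm{K/m}, \\
T(x) &= 100 - 250\,x
  &&\text{(T in K, x in m), e.g. } T(0.2) = 50\ \mathrm{K}.
\end{aligned}
```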
Discretization and Solution Method
A structured mesh is used, as shown in figure 1, together with the finite-difference method. First, S(T) is linearized as

$$S(T) = S_C + S_P\,T \tag{9}$$

where $S_C$ and $S_P$ are in general not constants but vary with temperature T. Second, Eq. (3) is integrated over the control volume around each grid point. Finally, the temperatures at the grid points are substituted into the integrated equation, and the finite-difference equation can be expressed in this form:
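$$
\begin{aligned}
a_P T_P &= a_E T_E + a_W T_W + a_N T_N + a_S T_S + a_T T_T + a_B T_B + b \\
a_E &= \frac{k\,\Delta y\,\Delta z}{(\delta x)_e}, \qquad
a_W = \frac{k\,\Delta y\,\Delta z}{(\delta x)_w} \\
a_N &= \frac{k\,\Delta z\,\Delta x}{(\delta y)_n}, \qquad
a_S = \frac{k\,\Delta z\,\Delta x}{(\delta y)_s} \\
a_T &= \frac{k\,\Delta x\,\Delta y}{(\delta z)_t}, \qquad
a_B = \frac{k\,\Delta x\,\Delta y}{(\delta z)_b} \\
b &= S_C\,\Delta x\,\Delta y\,\Delta z \\
a_P &= a_E + a_W + a_N + a_S + a_T + a_B - S_P\,\Delta x\,\Delta y\,\Delta z
\end{aligned}
$$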
where $T_P$, $T_E$, $T_W$, $T_N$, $T_S$, $T_T$ and $T_B$ are the values at the center point and at its east, west, north, south, top and bottom neighbors respectively, and $\Delta x$, $\Delta y$, $\Delta z$ are the dimensions of the control volume. The boundary conditions need some careful treatment, but their discretization does not differ fundamentally from the above.
Although the line-by-line Gauss-Seidel method makes the iteration converge more quickly than the point-by-point Gauss-Seidel method, we still use the point-by-point method because it is easier to program in parallel.
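The point-by-point Gauss-Seidel sweep can be sketched in serial form as follows. This is a minimal illustration, not the author's program: it assumes S(T) = 0, a uniform mesh with grid points on the two fixed-temperature ends, adiabatic side faces handled by dropping the neighbor coefficient across the face, and a stopping test on the largest change per sweep (the paper's residual definition is not given). All function and parameter names are hypothetical.

```python
def analytic_temperature(x, length=0.4, t_left=100.0, t_right=0.0):
    """Linear profile of Eq. (8)."""
    return t_left + (t_right - t_left) * x / length


def gauss_seidel_bar(nx=10, ny=3, nz=3, length=0.4, width=0.1, height=0.1,
                     conductivity=237.0, t_left=100.0, t_right=0.0,
                     tol=1e-8, max_iter=100_000):
    """Point-by-point Gauss-Seidel for steady conduction in a bar, S(T) = 0."""
    dx = length / (nx - 1)   # points i = 0 and i = nx - 1 lie on the two ends
    dy = width / ny
    dz = height / nz
    a_x = conductivity * dy * dz / dx   # a_E = a_W on a uniform mesh
    a_y = conductivity * dz * dx / dy   # a_N = a_S
    a_z = conductivity * dx * dy / dz   # a_T = a_B

    # Initial guess: t_right everywhere, then fix the heated left end.
    T = [[[t_right] * nz for _ in range(ny)] for _ in range(nx)]
    for j in range(ny):
        for m in range(nz):
            T[0][j][m] = t_left

    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(1, nx - 1):          # the end planes stay fixed
            for j in range(ny):
                for m in range(nz):
                    a_p = 2.0 * a_x
                    rhs = a_x * (T[i - 1][j][m] + T[i + 1][j][m])
                    # Adiabatic faces: no coefficient across the boundary.
                    if j > 0:
                        a_p += a_y
                        rhs += a_y * T[i][j - 1][m]
                    if j < ny - 1:
                        a_p += a_y
                        rhs += a_y * T[i][j + 1][m]
                    if m > 0:
                        a_p += a_z
                        rhs += a_z * T[i][j][m - 1]
                    if m < nz - 1:
                        a_p += a_z
                        rhs += a_z * T[i][j][m + 1]
                    new_value = rhs / a_p   # uses already-updated neighbors
                    max_change = max(max_change, abs(new_value - T[i][j][m]))
                    T[i][j][m] = new_value
        if max_change < tol:
            return T, iteration
    raise RuntimeError("Gauss-Seidel did not converge within max_iter sweeps")


T, sweeps = gauss_seidel_bar()
dx = 0.4 / (10 - 1)
worst = max(abs(T[i][j][m] - analytic_temperature(i * dx))
            for i in range(10) for j in range(3) for m in range(3))
print("sweeps:", sweeps, "largest deviation from Eq. (8):", worst)
```

The demonstration uses a coarse 10*3*3 mesh to keep the pure-Python loops fast; a finer mesh such as 20*5*5 can be passed through `nx`, `ny` and `nz`.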
For parallel computing, the mesh is split by several planes perpendicular to the x direction into blocks of approximately equal size. Each processor handles the computation on one block, and the values at the grid points on the splitting faces must be passed between processors. The computing process is split while the data resource is not. That is, at the beginning of the parallel computation all the processors complete initialization at the same time, each then computes its own block, and finally each computing node sends its result to node 0. Node 0 collects the results and writes them to a file.
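The splitting of the mesh into approximately equal blocks along x can be sketched as below. This only illustrates the index bookkeeping under the assumption that whole x-planes are assigned to processes in rank order; the function names are hypothetical, and the actual message passing (done with MPICH in the paper) is not shown.

```python
def split_along_x(nx: int, nprocs: int):
    """Return one (start, stop) range of x-planes per process, sizes differing by at most 1."""
    if not 1 <= nprocs <= nx:
        raise ValueError("need 1 <= nprocs <= nx")
    base, extra = divmod(nx, nprocs)
    ranges = []
    start = 0
    for rank in range(nprocs):
        size = base + (1 if rank < extra else 0)  # first `extra` ranks get one more plane
        ranges.append((start, start + size))
        start += size
    return ranges


def neighbours(rank: int, nprocs: int):
    """Ranks owning the blocks to the left and right (None at the ends of the bar)."""
    left = rank - 1 if rank > 0 else None
    right = rank + 1 if rank < nprocs - 1 else None
    return left, right


# Example: 20 planes in x shared by 3 processes.
for rank, (lo, hi) in enumerate(split_along_x(20, 3)):
    print(rank, (lo, hi), neighbours(rank, 3))
```

Each process would exchange the values on its first and last plane with the neighbors returned by `neighbours` once per sweep.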
Result
Three computers, each with two CPUs, are used to build a parallel computer. Using a mesh with 20*5*5 grid points, the iteration converges to a numerical solution with a residual of 1.0E-6 after 3142 iterations. Figure 2 shows the temperature distribution in the bar.
fig.2 Temperature Distribution
The figure shows that the temperature is constant over each plane of constant x and varies linearly along the x direction. This numerical result agrees with the analytical result (Eq. 8), which confirms the correctness of the parallel program.
To test the parallel performance of my program, more cases with different meshes and numbers of processors were run on the same parallel computer. The iteration count and the solution time on each processor were recorded for every case. We find that the iteration count increases only very slightly when the number of processors increases from 1 to 5 (figure 3).
fig.3 Relative increase in iteration count versus number of processors, for meshes 20*5*5, 60*15*15 and 120*30*30 (vertical axis from 0% to 6%)
Comparing the parallel program with the serial one readily reveals the reason for the increase in iteration count.
The solution time on node 0 is slightly larger than that on the other nodes, which is caused by the reduction operation in the last step of the parallel computation. The solution time on node 0 is therefore taken as the overall solution time. For a fixed residual, the solution time increases with the number of grid points, as shown in figure 4.
Computation time in seconds, by mesh (rows) and number of processors (columns):

Mesh         1            2            3            4            5
20*5*5       2.0268E+00   4.3419E+00   5.0346E+00   6.6446E+00   8.1674E+00
60*15*15     5.7377E+02   3.3798E+02   2.5750E+02   2.2130E+02   2.0940E+02
90*25*25     3.3529E+03   2.0993E+03   1.5585E+03   1.2754E+03   1.0859E+03
120*30*30    1.6379E+04   9.0460E+03   6.5874E+03   5.3099E+03   4.4930E+03

Fig.4 Computation Time
fig.5 Speedup versus number of processors, for meshes 20*5*5, 60*15*15, 90*25*25 and 120*30*30
Figure 5 shows the speedup curves. Each curve corresponds to one mesh with a different number of grid points. When the number of grid points is small, such as 20*5*5 in figure 5, the speedup is less than 1 and decreases as the number of processors increases. Because a relatively large amount of data is passed between processors compared with the number of grid points, the parallel computation is slowed down so much that it is slower than the single-processor computation. When the number of grid points is large enough, such as 60*15*15, 90*25*25 or 120*30*30, the speedup is greater than 1 and increases as more processors are added to the parallel computation. Moreover, the speedup curve approaches a straight line when the number of grid points is large enough, such as 120*30*30. A linear speedup curve with a slope of approximately 0.66 shows that the program has good parallel computation performance.
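The speedup values and the quoted slope can be recomputed from the computation times reported with figure 4. The sketch below applies Eq. (2) to those times and fits a least-squares line to the 120*30*30 curve; the fitting method is an assumption, since the paper does not say how the slope was obtained.

```python
# Computation times in seconds for 1..5 processors, as reported with Fig. 4.
times = {
    "20*5*5":    [2.0268e+00, 4.3419e+00, 5.0346e+00, 6.6446e+00, 8.1674e+00],
    "60*15*15":  [5.7377e+02, 3.3798e+02, 2.5750e+02, 2.2130e+02, 2.0940e+02],
    "90*25*25":  [3.3529e+03, 2.0993e+03, 1.5585e+03, 1.2754e+03, 1.0859e+03],
    "120*30*30": [1.6379e+04, 9.0460e+03, 6.5874e+03, 5.3099e+03, 4.4930e+03],
}
processors = [1, 2, 3, 4, 5]


def speedups(ts):
    """Eq. (2): T(1) / T(N) for every N in the list."""
    return [ts[0] / t for t in ts]


def least_squares_slope(xs, ys):
    """Slope of the least-squares straight line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


for mesh, ts in times.items():
    print(mesh, [round(s, 3) for s in speedups(ts)])

slope = least_squares_slope(processors, speedups(times["120*30*30"]))
print("slope for 120*30*30:", round(slope, 2))
```

Under this fit the 120*30*30 curve has a slope of about 0.66, and every speedup for the 20*5*5 mesh beyond one processor is below 1, in line with the description above.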
Because the speedup is a function of both the mesh and the number of processors, another set of speedup curves is given in figure 6, which shows the relation between the number of grid points and the speedup.
fig.6 Speedup versus grid points, one curve for each number of processors (1 to 5)
Discussion
Another kind of mesh partition was also tried, but it gave less speedup because more data needs to be passed between processors. All the results show that the time spent on communication between processors greatly limits the speed of parallel computation. There are three traditional ways to overcome this limitation. The first is to improve the hardware of the parallel computer, but this is always expensive. The second is to change the interconnection network (IN) topology of the parallel computer. The last is to develop new algorithms that differ from the present ones for serial programs and are suited to parallel computation.
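How the choice of partition changes the amount of data to be exchanged can be estimated by counting the grid values that lie on the cutting planes. The sketch below compares a split along x with a hypothetical split along y; the paper does not say which alternative partition was tried, so the y split is only an assumed example, and the function name is illustrative.

```python
def interface_points(nx, ny, nz, nprocs, axis="x"):
    """Grid values on the internal cutting planes of a slab partition along `axis`."""
    sizes = {"x": nx, "y": ny, "z": nz}
    if axis not in sizes:
        raise ValueError("axis must be 'x', 'y' or 'z'")
    if not 1 <= nprocs <= sizes[axis]:
        raise ValueError("need 1 <= nprocs <= number of planes along axis")
    plane = (nx * ny * nz) // sizes[axis]   # points in one plane normal to `axis`
    return (nprocs - 1) * plane             # one cutting plane between adjacent blocks


for mesh in [(20, 5, 5), (120, 30, 30)]:
    total = mesh[0] * mesh[1] * mesh[2]
    for axis in ("x", "y"):
        shared = interface_points(*mesh, nprocs=5, axis=axis)
        print(mesh, axis, shared, round(shared / total, 3))
```

For the 120*30*30 mesh on 5 processors, cutting along x puts 3600 values on the cutting planes, while cutting along y puts 14400 there. For the 20*5*5 mesh cut along x, the cutting planes hold 100 values against only 500 grid points in total, so communication weighs much more heavily on the small mesh.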
Acknowledgement
During this summer holiday, I came to Taiwan for research and exchange. My teacher Prof. Lin not only gave me much useful guidance but also helped me overcome some difficulties in daily life. My lab mates, including LoWei, Weng PeiShen, Lin ZhengWei, Li NongMing and other students, also gave me a lot of help, and I could not have finished this paper without them. Finally, I would like to express my most sincere thanks to Prof. Shen JunShan and Prof. Li ZhengDao for giving me this opportunity.
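Advisor: 林昭安 (Prof. Lin), male, professor in the Department of Power Mechanical Engineering, Tsing Hua University, Hsinchu, Taiwan. His main research is the numerical simulation of turbulence and the parallel computation of large eddy simulation (LES).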