Parallel Programming With MPI

LAB THREE
SMD150
Lp4 2004

1 Objectives


The purpose of this lab is to explore processor-level parallelism to optimize the FIR filter algorithm used in labs #1 & 2. This will be accomplished using parallel programming of a multi-computer/cluster system. There are multiple methods that can be used for this implementation. This lab will involve the use of MPI (Message-Passing Interface). MPI is not a programming language but rather a library of functions which can be used for inter-processor communication when implementing a multiprocessor solution. The programming language for this lab will be C, as in the first lab. However, in this case the lab will be done on the Linux cluster Seth at Umeå (www.hpc2n.umu.se) rather than in the PC lab (as was the case for labs #1 & 2). It is easy to port the MPI programs to other machines as well, including other PC clusters or dedicated computers such as the IBM SP system.

A significant difference between instruction-level parallelism and processor-level parallelism is what takes responsibility for implementing the parallelism. In most instruction-level parallel designs, the processor or even the compiler takes responsibility for implementing the parallelism. MMX/SSE instructions can be considered more the exception than the norm, as they require the programmer to partition the operations. In most superscalar processors, parallelism is automated. In contrast, most processor-level parallelism requires the programmer to partition the problem across the various resources, in this case the various processors. This will be the case for MPI.
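
As a reminder of what this looks like in practice, here is a minimal sketch of the structure every MPI program shares (generic MPI, not the provided lab skeleton):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);               /* start the MPI runtime        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id, 0..size-1 */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes    */

    printf("Process %d of %d is alive\n", rank, size);

    MPI_Finalize();                       /* shut the MPI runtime down    */
    return 0;
}

Each process runs the same program; the rank is what lets you give each processor its own share of the work.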


2 Prelab


When you do this lab it is expected that you read/go through the documentation listed below.

- Labs 1 & 2 should be complete.
- You should have attended the lecture that covered MPI or reviewed the notes from that lecture.
- You should review “The Message Passing Interface (MPI) Standard” at http://www-unix.mcs.anl.gov/mpi/ (many informative links are available here).
- Check HPC2N (High Performance Computing Center North): www.hpc2n.umu.se

3 Setting up the environment


When logging in on seth.hpc2n.umu.se, use your account according to the lab status list found on the homepage. E.g.

> ssh seth.hpc2n.umu.se -l student21

You will end up in the /afs file system. The first time you log in, change your password using:

> passwd

To receive mails sent to your account, edit the ./Public/.forward file.

Edit your “./Public/.bashrc” file. Replace “smd150” to set the aliases to fit your username (home directory).


# My own bash settings
PATH=${PATH}::/opt/scali/bin/:.
alias CD="cd /kfs/home/s/smd150"
alias rm="rm -i"
alias sq="showq|grep smd150"
alias qs="qstat|grep smd150"
alias lss="ls -lt|head -10"


Then you can use CD to go to the /kfs file system. Notice, all runs must take place on the /kfs file system!

Copy the archive lab3_2004.tar.gz from the lab homepage using:

> wget http://www.sm.luth.se/csee/courses/smd/150/labs/lab3_2004.tar.gz

Unpack with:

> tar -xvzf lab3_2004.tar.gz

The tar file contains everything you need to compile and run the application except for the implementation of the FIR algorithm (which you will do as the primary component of lab 3). For this lab you will be provided the .c & .h files which form the framework for the lab. Here is a short description of the files.




main.c: Contains the main function. It reads in the file “source.wav” and writes out to the “dest.wav” file. Review the code to understand its operation. This is where the FIR algorithm function call and the MPI function calls may be placed.

wave.h/.c: Methods for reading/writing wave files to/from disk. For simplicity we use the read/write16mono() methods, since they check that the .wav file really is 16-bit mono and return the sound data buffer as a signed short array. There is no need to change anything in these files; they are only used to parse the actual sound samples from the standard .wav header information, which is also contained within the file.

fir.h/.c: As a starting point you should use your lab 1 C implementation. The fir.h file and a skeleton for fir.c will be available for you.

Makefile: The Makefile to compile the C code for MPI. Use make to compile.


Use the filter “filter_lp.bin” and the sound file “200_4000.wav” for your experiments. Notice, the filenames are hardcoded into your application to allow batch processing. It may be necessary for you to make your own filters and sound files using Matlab in order to make the program scale better.

3.1 Testing the environment


When you have edited your environment you should test that it works as expected. Compile the system using the Makefile with the command:

> make

During the development stage you can test-run your code with:

> mpimon ./filter -- k214 2

This command runs your program locally on the login node (k214) using 2 processors. Notice, each node is a dual-processor machine.

Use sftp to transfer files from seth back to sigma.sm.luth.se, where you can verify your results with Matlab as done in lab 1.


4 Assignment

4.1 Step one - Parallelize the code


First implement the algorithm with straightforward C code, so you can verify that the algorithm works. When the implementation has been verified you can start to parallelize the code. It doesn't have to be efficient, just as long as it works. The requirement is that you have an implementation that spreads the calculation over the processors and that you can run it on a generic number of nodes. It should work on at least 1 to 32 nodes and it has to distribute the calculations evenly over the processors.
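
To make the requirement concrete, below is a hedged sketch of one possible partitioning; it is not the provided skeleton, and all identifiers (do_fir, n_taps, n_out and so on) are placeholders for whatever your fir.h actually declares. It assumes every rank already knows n_out and n_taps, that n_out is divisible by the number of processes, and that the input buffer on rank 0 carries n_taps - 1 samples of trailing padding. Rank 0 owns the whole signal, each rank filters one contiguous slice, and the slices are gathered back.

#include <stdlib.h>
#include <string.h>
#include <mpi.h>

/* hypothetical prototype for your lab 1 FIR kernel */
void do_fir(const short *in, short *out, int n, const float *h, int taps);

void parallel_fir(short *input, short *output, int n_out,
                  float *filter, int n_taps)
{
    int rank, size, p, chunk, in_len;
    short *my_in, *my_out;
    MPI_Status status;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    chunk  = n_out / size;            /* output samples per process    */
    in_len = chunk + n_taps - 1;      /* input slice including overlap */
    my_in  = malloc(in_len * sizeof(short));
    my_out = malloc(chunk * sizeof(short));

    /* every rank needs the filter coefficients */
    MPI_Bcast(filter, n_taps, MPI_FLOAT, 0, MPI_COMM_WORLD);

    /* rank 0 hands each rank its slice; output sample i needs n_taps
       input samples, so neighbouring slices overlap by n_taps - 1 */
    if (rank == 0) {
        for (p = 1; p < size; p++)
            MPI_Send(input + p * chunk, in_len, MPI_SHORT, p, 0,
                     MPI_COMM_WORLD);
        memcpy(my_in, input, in_len * sizeof(short));
    } else {
        MPI_Recv(my_in, in_len, MPI_SHORT, 0, 0, MPI_COMM_WORLD, &status);
    }

    do_fir(my_in, my_out, chunk, filter, n_taps);

    /* collect the output slices back on rank 0 */
    MPI_Gather(my_out, chunk, MPI_SHORT, output, chunk, MPI_SHORT,
               0, MPI_COMM_WORLD);

    free(my_in);
    free(my_out);
}

MPI_Scatterv/MPI_Gatherv would let you drop the divisibility assumption; explicit send/receive is used here only because the overlapping slices make it the most transparent choice.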


Edit your submit file “sub” to match your tests, i.e. the number of nodes and CPUs. To submit your performance test use:

> qsub sub
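
The “sub” file that ships with the lab archive is the authoritative template. Purely as an illustration of which fields you typically adjust, a PBS-style submit file might look like the following; the job name, node counts and times below are hypothetical, not copied from the course file:

#!/bin/bash
#PBS -N lab3_group21         # job name; pick something easy to spot in showq
#PBS -l nodes=4:ppn=2        # 4 nodes, 2 CPUs per node (the nodes are duals)
#PBS -l walltime=00:10:00    # requested run time; short jobs queue faster

cd $PBS_O_WORKDIR            # run from the directory the job was submitted in
# launch line: use the exact form given in the provided sub file here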



The result files are stored in the same directory as the executable. Standard out is stored in a file named "res". A good suggestion is to name your jobs in an intelligent manner for easier monitoring; this is useful if more than one job is submitted. For each test you perform, make sure that the result is correct.


To view all queued jobs use:

> showq | less

or:

> qstat | less

4.2 Step two - Optimize the code


Optimize the code to run faster. The goal should be to make the routine scale with the number of processor elements, so that more processors give faster execution.

Some generic hints & issues:

- Tweak compiler options. Change the parameters for optimization in the "Makefile"; on the webpage www.hpc2n.umu.se there are some suggestions about useful optimization flags.
- Try different communication strategies to see what gives the best performance (broadcast, send, scatter etc.).
- The latency of the messages might also be a limiting factor, so try to perform calculations while waiting for communication to finish (see the sketch after this list).
- The performance might be limited by the network bandwidth, so try to minimize communication.
- Make sure you have fast code for the inner loop: single-processor optimizations.
- Revisit http://www-unix.mcs.anl.gov/mpi/ for additional hints.
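
Regarding the latency hint above, here is one sketch of how computation can be overlapped with communication, using a non-blocking receive and double buffering. This is a fragment only; buf, BLOCK, n_blocks, out and do_fir are illustrative placeholders, and it assumes rank 0 streams the blocks with the block index as message tag:

MPI_Request req;
MPI_Status  status;
int b;

/* assume block 0 has already been received into buf[0] */
for (b = 0; b < n_blocks; b++) {
    req = MPI_REQUEST_NULL;
    if (b + 1 < n_blocks)        /* prefetch block b+1 while filtering b */
        MPI_Irecv(buf[(b + 1) % 2], BLOCK, MPI_SHORT, 0, b + 1,
                  MPI_COMM_WORLD, &req);

    /* useful work while the next message is in flight */
    do_fir(buf[b % 2], out + b * BLOCK, BLOCK, filter, n_taps);

    if (req != MPI_REQUEST_NULL) /* block only when the data is needed */
        MPI_Wait(&req, &status);
}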






4.3 Step three - Scaling: time the code


Time the execution with different numbers of processors applied, to get a nice plot of how your implementation scales. The tests should be done using at least the following numbers of nodes: 1, 2, 4, 8, 16, 32, and you are free to experiment with other numbers of nodes (don't use more than 64, as your jobs might be queued for long periods).


You will use the function MPI_Wtime() to get the wall clock time passed; it returns the current time as a double, see "main.c". The data points that you get from the test runs should be plotted in a jpg file; you may use Matlab or Excel to generate the plot.
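
A minimal timing sketch, assuming rank and size are already set up as in "main.c": the barrier keeps the ranks from starting their clocks at different times, and reducing with MPI_MAX reports the slowest process, which is the time that actually matters for scaling.

double t0, elapsed, slowest;

MPI_Barrier(MPI_COMM_WORLD);     /* line all processes up first */
t0 = MPI_Wtime();                /* wall clock time, in seconds */

/* ... the parallel filtering being measured ... */

elapsed = MPI_Wtime() - t0;
MPI_Reduce(&elapsed, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
if (rank == 0)
    printf("%d processes: %f seconds\n", size, slowest);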


Having achieved the desired results in steps 1, 2 & 3, the mandatory programming portion of this lab is now complete. Please feel free to experiment further with MPI.


5 Goal


In order to get this lab passed you must: parallelize the code; it has to work on 1 to 32 nodes and distribute the calculations evenly. Make a plot (jpg) of the timing for the different numbers of processors and add it to the .tar file.

See the web page for additional bonus assignments.

6 Submission


For submission of the lab, write an email in the format shown below.

- mkdir lab3
- Copy the files main.c, wave.c/h, fir.c/h, plot.jpg, group.txt and the Makefile to that directory.
- tar -cvf lab3_$GROUPNUMBER.tar lab3 (where $GROUPNUMBER is your lab group number)
- gzip -9 lab3_$GROUPNUMBER.tar


Attach the lab3_$GROUPNUMBER.tar.gz file to an email. Send the email to: smd150@sm.luth.se. Note! For the due date, refer to the lab page. Make sure to submit the lab before this deadline for full credit.



Email format

Subject: lab3 smd150
Attachments: lab3_$GROUPNUMBER.tar.gz
Body of mail:
Firstname1 Lastname1 email address
Firstname2 Lastname2 email address


This is how we will test and verify your lab: we will compile your "package" and then run the program and compare the output with our result file. If the output is correct, then you have passed the lab.