An Introduction to Parallel
Programming with MPI

March 22, 24, 29, 31, 2005

David Adams

daadams3@vt.edu

http://research.cs.vt.edu/lasca/schedule


Outline

Disclaimers

Overview of basic parallel programming on a cluster with the goals of MPI

Batch system interaction

Startup procedures

Quick review

Blocking message passing

Non-blocking message passing

Collective communications

Review

Functions we have covered in detail:

MPI_INIT, MPI_FINALIZE
MPI_COMM_SIZE, MPI_COMM_RANK
MPI_SEND, MPI_RECV
MPI_ISEND, MPI_IRECV
MPI_WAIT, MPI_TEST

Useful constants:

MPI_COMM_WORLD
MPI_ANY_SOURCE, MPI_ANY_TAG
MPI_SUCCESS, MPI_REQUEST_NULL
MPI_TAG_UB

Collective Communications

Transmit data to all processes within a communicator domain
(all processes in MPI_COMM_WORLD, for example).

Called by every member of a communicator, but cannot be relied
on to synchronize the processes (except MPI_BARRIER).

Come only in blocking versions, with standard-mode semantics.

Collective communications are SLOW, but they are a convenient
way of passing the optimization of data transfer to the vendor
instead of the end user.

Everything accomplished with collective communications could
also be done using the functions we have already covered. They
are simply shortcuts and implementer optimizations for
communication patterns that parallel programmers use often.

BARRIER

MPI_BARRIER(COMM, IERROR)

IN INTEGER COMM
OUT IERROR

Blocks the caller until all processes in the group have
entered the call to MPI_BARRIER.

Allows for process synchronization and is the only
collective operation that guarantees synchronization at
the call, even though others could synchronize as a side
effect.
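
A minimal sketch of its use, assuming the Fortran bindings from
these slides and an MPI installation providing mpif.h (the program
name and printed messages are purely illustrative):

    program barrier_demo
      include 'mpif.h'
      integer :: ierror, rank
      call MPI_INIT(ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      print *, 'Process', rank, 'working before the barrier'
      ! No process continues past this line until all have reached it.
      call MPI_BARRIER(MPI_COMM_WORLD, ierror)
      print *, 'Process', rank, 'released from the barrier'
      call MPI_FINALIZE(ierror)
    end program barrier_demo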

Broadcast

MPI_BCAST(BUFFER, COUNT, DATATYPE, ROOT, COMM, IERROR)

INOUT <type> BUFFER(*)
IN INTEGER COUNT, DATATYPE, ROOT, COMM
OUT IERROR

Broadcasts a message from the process with rank ROOT to all
processes of the communicator group.

Serves as both the blocking send and the blocking receive for
message completion and must be called by every process in the
communicator group.

Conceptually, this can be viewed as sending a single message from
root to every process in the group, but MPI implementations are
free to make this more efficient.

On return, the contents of the root process's BUFFER have been
copied to all processes.
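
A minimal sketch, under the same assumptions as the barrier example
(the value 42 and the variable name n are arbitrary): root sets a
value, and MPI_BCAST gives every process a copy.

    program bcast_demo
      include 'mpif.h'
      integer :: ierror, rank, n
      call MPI_INIT(ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      n = 0
      if (rank == 0) n = 42        ! only root holds the value so far
      call MPI_BCAST(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierror)
      print *, 'Process', rank, 'has n =', n   ! every process prints 42
      call MPI_FINALIZE(ierror)
    end program bcast_demo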

Broadcast

Before MPI_BCAST          After MPI_BCAST
P0:  A0                   P0:  A0
P1:                       P1:  A0
P2:                       P2:  A0
P3:                       P3:  A0
P4:                       P4:  A0
P5:                       P5:  A0

Gather

MPI_GATHER(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF,
RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)

OUT <type> RECVBUF(*)
IN <type> SENDBUF(*)
IN INTEGER SENDCOUNT, RECVCOUNT, SENDTYPE, RECVTYPE, ROOT, COMM
OUT IERROR

Each process (including the root) sends the contents of its send
buffer to the root process.

The root process collects the messages in rank order and stores
them in RECVBUF.

If there are n processes in the communicator group, then RECVBUF
must be n times larger than SENDBUF.

Note that RECVCOUNT = SENDCOUNT: RECVCOUNT is the number of items
of type RECVTYPE received from each process, not the total.
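
A sketch of gathering one integer from each process to rank 0. The
fixed-size vals array (and its cap of 64 processes) and the computed
values are assumptions of this example, not part of the API:

    program gather_demo
      include 'mpif.h'
      integer :: ierror, rank, nprocs, i
      integer :: myval, vals(64)   ! assumes at most 64 processes
      call MPI_INIT(ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)
      myval = rank * rank          ! each process contributes one value
      ! RECVCOUNT = 1: one integer is received from EACH process.
      call MPI_GATHER(myval, 1, MPI_INTEGER, vals, 1, MPI_INTEGER, &
                      0, MPI_COMM_WORLD, ierror)
      if (rank == 0) print *, 'Gathered:', (vals(i), i = 1, nprocs)
      call MPI_FINALIZE(ierror)
    end program gather_demo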

Gather

Before MPI_GATHER         After MPI_GATHER (ROOT = P0)
P0:  A0                   P0:  A0 A1 A2 A3 A4 A5
P1:  A1                   P1:  A1
P2:  A2                   P2:  A2
P3:  A3                   P3:  A3
P4:  A4                   P4:  A4
P5:  A5                   P5:  A5

Scatter

MPI_SCATTER(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF,
RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)

OUT <type> RECVBUF(*)
IN <type> SENDBUF(*)
IN INTEGER SENDCOUNT, RECVCOUNT, SENDTYPE, RECVTYPE, ROOT, COMM
OUT IERROR

MPI_SCATTER is the inverse of MPI_GATHER.

The outcome of this function is for root to take its SENDBUF and
split it into n equal segments, 0 through (n-1), where the i-th
segment is delivered to the i-th process in the group.
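
A sketch that mirrors the gather example (again, the 64-element
buffer and the values 0, 100, 200, ... are assumptions of this
illustration): root fills one slot per process, and each process
receives its own segment.

    program scatter_demo
      include 'mpif.h'
      integer :: ierror, rank, nprocs, i
      integer :: myval, vals(64)   ! assumes at most 64 processes
      call MPI_INIT(ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)
      if (rank == 0) then          ! only root's SENDBUF is read
         do i = 1, nprocs
            vals(i) = 100 * (i - 1)
         end do
      end if
      ! Segment i of root's vals is delivered to process i-1.
      call MPI_SCATTER(vals, 1, MPI_INTEGER, myval, 1, MPI_INTEGER, &
                       0, MPI_COMM_WORLD, ierror)
      print *, 'Process', rank, 'received', myval
      call MPI_FINALIZE(ierror)
    end program scatter_demo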

Scatter

Before MPI_SCATTER (ROOT = P0)    After MPI_SCATTER
P0:  A0 A1 A2 A3 A4 A5            P0:  A0
P1:                               P1:  A1
P2:                               P2:  A2
P3:                               P3:  A3
P4:                               P4:  A4
P5:                               P5:  A5

ALLGATHER

MPI_ALLGATHER is like MPI_GATHER except that there is no root:
every process in the group receives the complete gathered result.

Before MPI_ALLGATHER      After MPI_ALLGATHER
P0:  A0                   P0:  A0 B0 C0 D0 E0 F0
P1:  B0                   P1:  A0 B0 C0 D0 E0 F0
P2:  C0                   P2:  A0 B0 C0 D0 E0 F0
P3:  D0                   P3:  A0 B0 C0 D0 E0 F0
P4:  E0                   P4:  A0 B0 C0 D0 E0 F0
P5:  F0                   P5:  A0 B0 C0 D0 E0 F0
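
MPI_ALLGATHER takes the same arguments as MPI_GATHER, minus ROOT. A
minimal sketch, under the same buffer-size assumptions as the
earlier examples:

    program allgather_demo
      include 'mpif.h'
      integer :: ierror, rank, nprocs, i
      integer :: myval, vals(64)   ! assumes at most 64 processes
      call MPI_INIT(ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)
      myval = rank + 1
      ! Like MPI_GATHER, but every process ends up with all nprocs values.
      call MPI_ALLGATHER(myval, 1, MPI_INTEGER, vals, 1, MPI_INTEGER, &
                         MPI_COMM_WORLD, ierror)
      print *, 'Process', rank, 'sees:', (vals(i), i = 1, nprocs)
      call MPI_FINALIZE(ierror)
    end program allgather_demo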

ALLTOALL

MPI_ALLTOALL sends a distinct segment of every process's SENDBUF
to every other process: process j receives the j-th segment from
each sender, in effect transposing the data across the group.

Before MPI_ALLTOALL           After MPI_ALLTOALL
P0:  A0 A1 A2 A3 A4 A5        P0:  A0 B0 C0 D0 E0 F0
P1:  B0 B1 B2 B3 B4 B5        P1:  A1 B1 C1 D1 E1 F1
P2:  C0 C1 C2 C3 C4 C5        P2:  A2 B2 C2 D2 E2 F2
P3:  D0 D1 D2 D3 D4 D5        P3:  A3 B3 C3 D3 E3 F3
P4:  E0 E1 E2 E3 E4 E5        P4:  A4 B4 C4 D4 E4 F4
P5:  F0 F1 F2 F3 F4 F5        P5:  A5 B5 C5 D5 E5 F5
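
Its signature is MPI_ALLTOALL(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF,
RECVCOUNT, RECVTYPE, COMM, IERROR). A minimal sketch of the exchange
pictured above (the 10*rank encoding is just to make the transpose
visible in the output):

    program alltoall_demo
      include 'mpif.h'
      integer :: ierror, rank, nprocs, i
      integer :: sendbuf(64), recvbuf(64)   ! assumes at most 64 processes
      call MPI_INIT(ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)
      do i = 1, nprocs
         sendbuf(i) = 10 * rank + (i - 1)   ! segment i goes to process i-1
      end do
      ! Block j of process i's SENDBUF arrives in block i of process j's RECVBUF.
      call MPI_ALLTOALL(sendbuf, 1, MPI_INTEGER, recvbuf, 1, MPI_INTEGER, &
                        MPI_COMM_WORLD, ierror)
      print *, 'Process', rank, 'received:', (recvbuf(i), i = 1, nprocs)
      call MPI_FINALIZE(ierror)
    end program alltoall_demo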

Global Reductions

MPI can perform a global reduction operation
across all members of a communicator group.

Reduction operations include:

Maximum
Minimum
Sum
Product
ANDs and ORs

MPI_REDUCE

MPI_REDUCE(SENDBUF, RECVBUF, COUNT, DATATYPE, OP,
ROOT, COMM, IERROR)

OUT <type> RECVBUF(*)
IN <type> SENDBUF(*)
IN INTEGER COUNT, DATATYPE, OP, ROOT, COMM
OUT IERROR

Combines the elements provided in the input buffer of each process
in the group, using the operation OP, and returns the combined
value in the output buffer of the process with rank ROOT.

Predefined operations include:

MPI_MAX, MPI_MIN
MPI_SUM, MPI_PROD
MPI_LAND, MPI_BAND
MPI_LOR, MPI_BOR
MPI_LXOR, MPI_BXOR
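
A minimal sketch using MPI_SUM, under the same assumptions as the
earlier examples (summing the ranks themselves is arbitrary; any
per-process value works the same way):

    program reduce_demo
      include 'mpif.h'
      integer :: ierror, rank, total
      call MPI_INIT(ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      ! Sum every process's rank; only ROOT (0) receives the result.
      call MPI_REDUCE(rank, total, 1, MPI_INTEGER, MPI_SUM, &
                      0, MPI_COMM_WORLD, ierror)
      if (rank == 0) print *, 'Sum of all ranks =', total
      call MPI_FINALIZE(ierror)
    end program reduce_demo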



Helpful Online Information

Man pages for MPI:
http://www-unix.mcs.anl.gov/mpi/www/

MPI homepage at Argonne National Lab:
http://www-unix.mcs.anl.gov/mpi/

Some more sample programs:
http://www-unix.mcs.anl.gov/mpi/usingmpi/examples/main.htm

Other helpful books:
http://fawlty.cs.usfca.edu/mpi/
http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=3614

Some helpful UNIX commands:
http://www.ee.surrey.ac.uk/Teaching/Unix/