Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers, 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.
Synchronous
Computations
Chapter 6
Synchronous Computations

In a (fully) synchronous application, all the processes are synchronized at regular points.

MPI_Barrier()
A basic mechanism for synchronizing processes. It is called by each process in the group, blocking until all members of the group have reached the barrier call, and only returning then.
Barrier Implementation

Centralized counter implementation (a linear barrier):
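The counter idea can be sketched in C with POSIX threads (a hypothetical `barrier_t`, not MPI's internal implementation): each arriving thread increments a shared counter under a lock, and the last arrival releases the whole group.

```c
#include <pthread.h>

/* Linear (centralized counter) barrier: a sketch only. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_here;
    int count;   /* threads that have arrived so far */
    int n;       /* group size */
    int phase;   /* distinguishes successive barrier episodes */
} barrier_t;

void barrier_init(barrier_t *b, int n) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_here, NULL);
    b->count = 0; b->n = n; b->phase = 0;
}

void barrier_wait(barrier_t *b) {
    pthread_mutex_lock(&b->lock);
    int my_phase = b->phase;
    if (++b->count == b->n) {            /* last arrival: release all */
        b->count = 0;
        b->phase++;
        pthread_cond_broadcast(&b->all_here);
    } else {
        while (b->phase == my_phase)     /* wait for this episode to end */
            pthread_cond_wait(&b->all_here, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

/* Demo: n threads each register arrival, pass the barrier, then check
   that every arrival is visible.  Returns how many threads saw all n. */
typedef struct {
    barrier_t *b; int n; int *arrivals; pthread_mutex_t *alock; int ok;
} worker_arg;

static void *worker(void *p) {
    worker_arg *w = (worker_arg *)p;
    pthread_mutex_lock(w->alock); (*w->arrivals)++; pthread_mutex_unlock(w->alock);
    barrier_wait(w->b);
    pthread_mutex_lock(w->alock);
    w->ok = (*w->arrivals == w->n);      /* must see all n after barrier */
    pthread_mutex_unlock(w->alock);
    return NULL;
}

int run_demo(int n) {
    if (n > 64) return -1;
    barrier_t b; barrier_init(&b, n);
    pthread_mutex_t alock = PTHREAD_MUTEX_INITIALIZER;
    int arrivals = 0, passed = 0;
    pthread_t t[64]; worker_arg a[64];
    for (int i = 0; i < n; i++) {
        a[i] = (worker_arg){ &b, n, &arrivals, &alock, 0 };
        pthread_create(&t[i], NULL, worker, &a[i]);
    }
    for (int i = 0; i < n; i++) { pthread_join(t[i], NULL); passed += a[i].ok; }
    return passed;
}
```

Because the single counter serializes all arrivals, the time grows linearly with the number of processes, which is why tree and butterfly barriers are preferred at scale.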
Tree barrier
Butterfly Barrier
Fully Synchronized Computation Examples

Data Parallel Computations
Same operation performed on different data elements simultaneously, i.e., in parallel.
Particularly convenient because:
• Ease of programming (essentially only one program).
• Scales easily to larger problem sizes.
• Many numeric and some non-numeric problems can be cast in a data parallel form.
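A minimal data-parallel sketch (the function name `add_k` is illustrative): the same operation is applied to every element, and with OpenMP the loop iterations can execute simultaneously.

```c
#include <stddef.h>

/* Apply the same operation (add k) to every element of a[].
   The pragma is a no-op when compiled without OpenMP support. */
void add_k(double *a, size_t n, double k) {
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] + k;
}
```

There is essentially one program; the parallelism comes entirely from the data decomposition.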
Prefix Operations

• Given a list of numbers x_0, …, x_{n-1}, compute all the partial summations, i.e.:
  x_0
  x_0 + x_1
  x_0 + x_1 + x_2
  x_0 + x_1 + x_2 + x_3
  …
• Any associative operation (e.g. ‘+’, ‘*’, bitwise AND, etc.) can be used.
• Practical applications in areas such as sorting, recurrence relations, and polynomial evaluation.
Parallel Prefix Sum
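The classic data-parallel prefix sum takes ⌈log2 n⌉ sweeps: in the sweep with stride 2^k, every element at index i ≥ 2^k adds the value 2^k positions to its left. A serial C sketch (double-buffering stands in for the simultaneous PRAM updates; `prefix_sum` is an illustrative name):

```c
#include <string.h>

/* Data-parallel prefix sum sketch.  Each sweep doubles the stride;
   on a PRAM all additions within a sweep happen simultaneously,
   which the scratch copy simulates here.  Assumes n <= 1024. */
void prefix_sum(int *x, int n) {
    int tmp[1024];
    if (n > 1024) return;
    for (int step = 1; step < n; step *= 2) {
        memcpy(tmp, x, n * sizeof(int));
        for (int i = step; i < n; i++)   /* all i "in parallel" */
            x[i] = tmp[i] + tmp[i - step];
    }
}
```

With n processors this runs in O(log n) parallel steps, versus n − 1 additions sequentially.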
MIMD Powershift on a Hypercube

Shift-by-2^k (in Gray-code order) can be done in two routing steps on a hypercube.
Example: shift-by-4 on a 16-PE hypercube.

id (leading bit 0): 000 001 011 010 110 111 101 100  →  A B C D E F G H
id (leading bit 1): 100 101 111 110 010 011 001 000  →  I J K L M N O P

1) Shift-by-2 in reverse order:
   P O B A D C F E  |  H G J I L K N M

2) Shift-by-2 again (in reverse order):
   M N O P A B C D  |  E F G H I J K L

T_par = 2 routing steps = O(1)
Solving a General System of Linear Equations

Ax = b, where x_0, x_1, x_2, …, x_{n-1} are the unknowns:

\[
\begin{pmatrix}
a_{0,0} & a_{0,1} & \cdots & a_{0,n-1} \\
a_{1,0} & a_{1,1} & \cdots & a_{1,n-1} \\
\vdots  &         &        & \vdots    \\
a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,n-1}
\end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix}
=
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \end{pmatrix}
\]

By rearranging the i-th equation

\[
a_{i,0}x_0 + a_{i,1}x_1 + \cdots + a_{i,n-1}x_{n-1} = b_i
\]

as

\[
x_i = \frac{1}{a_{i,i}}\Bigl( b_i - \sum_{j \ne i} a_{i,j}\, x_j \Bigr)
\]

• x_i is expressed in terms of the other unknowns.
• This can be used as an iteration formula for each of the unknowns to obtain better approximations.
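The rearranged formula can be sketched serially in C (the fixed size N = 3 and the function names are purely illustrative):

```c
#include <math.h>

#define N 3   /* illustrative fixed problem size */

/* One Jacobi-style sweep: new_x[i] = (b[i] - sum_{j!=i} a[i][j]*x[j]) / a[i][i] */
void jacobi_sweep(double a[N][N], double b[N], double x[N], double new_x[N]) {
    for (int i = 0; i < N; i++) {
        double sum = 0.0;
        for (int j = 0; j < N; j++)
            if (j != i) sum += a[i][j] * x[j];
        new_x[i] = (b[i] - sum) / a[i][i];
    }
}

/* Repeat sweeps until successive iterates agree to within tol (max-norm).
   Returns the number of sweeps taken, or -1 if 'limit' is exhausted. */
int jacobi_solve(double a[N][N], double b[N], double x[N], double tol, int limit) {
    double nx[N];
    for (int iter = 0; iter < limit; iter++) {
        jacobi_sweep(a, b, x, nx);
        double err = 0.0;
        for (int i = 0; i < N; i++) {
            double d = fabs(nx[i] - x[i]);
            if (d > err) err = d;
            x[i] = nx[i];
        }
        if (err < tol) return iter + 1;
    }
    return -1;
}
```

Each sweep reads only the previous iterate, so all n component updates are independent, which is what makes the method attractive for parallelization.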
Jacobi Iterative Method

Split A into its diagonal D and the off-diagonal remainder A − D:

\[
A = \begin{pmatrix}
a_{0,0} & a_{0,1} & \cdots & a_{0,n-1} \\
a_{1,0} & a_{1,1} & \cdots & a_{1,n-1} \\
\vdots  &         &        & \vdots    \\
a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,n-1}
\end{pmatrix},
\qquad
D = \begin{pmatrix}
a_{0,0} & 0 & \cdots & 0 \\
0 & a_{1,1} &  & \vdots \\
\vdots &  & \ddots & 0 \\
0 & \cdots & 0 & a_{n-1,n-1}
\end{pmatrix}
\]

Then Ax = b becomes

\[
Dx = b - (A - D)x
\qquad\Longrightarrow\qquad
x = D^{-1}\bigl[\, b - (A - D)\,x \,\bigr]
\]

which gives the Jacobi iteration

\[
x^{(t)} = D^{-1}\bigl[\, b - (A - D)\,x^{(t-1)} \,\bigr]
\]

i.e., component-wise, each new iterate multiplies the residual b_i − Σ_{j≠i} a_{i,j} x_j^{(t−1)} by 1/a_{i,i}.
Parallel Jacobi iterations

Assign one unknown per process (P_0, P_1, …, P_{n-1}); in each iteration, process P_i computes

\[
x_i^{(t)} = \frac{1}{a_{i,i}}\Bigl( b_i - \sum_{j \ne i} a_{i,j}\, x_j^{(t-1)} \Bigr)
\]

The Jacobi method will converge if each diagonal value a_{ii} (for all i, 0 ≤ i < n) has an absolute value greater than the sum of the absolute values of the other a_{ij}'s on the same row. The matrix A is then called diagonally dominant:

\[
|a_{i,i}| > \sum_{\substack{j \ne i \\ 0 \le j < n}} |a_{i,j}|, \qquad \text{for all } i,\; 0 \le i < n.
\]

If P << n, how do you do the data & task partitioning?
Termination

Compare the values computed in one iteration to the values obtained from the previous iteration. Terminate the computation when all values are within a given tolerance:

\[
\sum_{i=0}^{n-1} \bigl| x_i^{(t)} - x_i^{(t-1)} \bigr| < \text{error tolerance}
\]

or:

\[
\sum_{i=0}^{n-1} \bigl( x_i^{(t)} - x_i^{(t-1)} \bigr)^2 < \text{error tolerance}
\]

In either case, you need a ‘global sum’ (MPI_Reduce) operation.
Q: Do you need to execute it after each and every iteration?
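The two criteria can be sketched in C (function names are illustrative; in an MPI program each process would compute its local partial sum and a reduction such as MPI_Allreduce would form the global sum):

```c
#include <math.h>

/* L1 criterion: sum of absolute changes between iterates. */
double l1_change(const double *x_new, const double *x_old, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += fabs(x_new[i] - x_old[i]);
    return s;
}

/* L2 criterion, kept squared: compare against tolerance^2
   to avoid taking a square root every iteration. */
double l2_change_sq(const double *x_new, const double *x_old, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        double d = x_new[i] - x_old[i];
        s += d * d;
    }
    return s;
}
```

Since the global reduction is itself a synchronizing, communication-heavy step, one common answer to the question above is to test for convergence only every few iterations rather than after every one.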
Parallel Code

Process P_i could be of the form:

    x[i] = b[i];                         /* initialize unknown */
    for (iter = 0; iter < limit; iter++) {
        sum = -a[i][i] * x[i];
        for (j = 0; j < n; j++)          /* compute summation */
            sum = sum + a[i][j] * x[j];
        new_x[i] = (b[i] - sum) / a[i][i];   /* compute unknown */
        All-to-All-Broadcast(&new_x[i]);     /* bcast/recv values */
        Global_barrier();                    /* wait for all procs */
    }

(Initializing sum to -a[i][i]*x[i] lets the inner loop run over all j; the i = j term cancels, leaving the summation over j ≠ i.)
Other fully synchronous problems

Cellular Automata
• The problem space is divided into cells.
• Each cell can be in one of a finite number of states.
• Cells are affected by their neighbors according to certain rules, and all cells are affected simultaneously in a “generation.”
• Rules are re-applied in subsequent generations so that cells evolve, or change state, from generation to generation.
• The most famous cellular automaton is the “Game of Life,” devised by John Horton Conway, a Cambridge mathematician.
The Game of Life

Board game: a theoretically infinite 2-dimensional array of cells. Each cell can hold one “organism” and has eight neighboring cells. Initially, some cells are occupied. The following rules were derived by Conway after a long period of experimentation:

1. Every organism with two or three neighboring organisms survives for the next generation.
2. Every organism with four or more neighbors dies from overpopulation.
3. Every organism with one neighbor or none dies from isolation.
4. Each empty cell adjacent to exactly three occupied neighbors will give birth to an organism.
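Conway's four rules can be sketched as one generation step in C on a small grid with dead borders (the board is theoretically infinite; `life_step` and the 8×8 size are illustrative):

```c
#include <string.h>

#define W 8
#define H 8

/* One Game of Life generation.  All cells must update simultaneously,
   so the new state is built in a copy before overwriting the grid. */
void life_step(int grid[H][W]) {
    int next[H][W];
    for (int r = 0; r < H; r++)
        for (int c = 0; c < W; c++) {
            int n = 0;                       /* count the 8 neighbors */
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++) {
                    if (dr == 0 && dc == 0) continue;
                    int rr = r + dr, cc = c + dc;
                    if (rr >= 0 && rr < H && cc >= 0 && cc < W)
                        n += grid[rr][cc];
                }
            /* rules 1-4: live cells survive on 2 or 3 neighbors,
               empty cells give birth on exactly 3 */
            next[r][c] = grid[r][c] ? (n == 2 || n == 3) : (n == 3);
        }
    memcpy(grid, next, sizeof(next));
}
```

The double-buffered update is the fully synchronous pattern of this chapter: in a parallel version, the implicit "all cells at once" step becomes a barrier between generations.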
Simple Fun Examples of Cellular Automata

“Sharks and Fishes”
An ocean could be modeled as a 3-dimensional array of cells. Each cell can hold one fish or one shark (but not both).

Fish
Might move around according to these rules:
1. If there is one empty adjacent cell, the fish moves to this cell.
2. If there is more than one empty adjacent cell, the fish moves to one cell chosen at random.
3. If there are no empty adjacent cells, the fish stays where it is.
4. If the fish moves and has reached its breeding age, it gives birth to a baby fish, which is left in the vacating cell.
5. Fish die after x generations.
Sharks
Might be governed by the following rules:
1.
If one adjacent cell is occupied by a fish, the shark moves to this cell and
eats the fish.
2.
If more than one adjacent cell is occupied by a fish, the shark chooses one
fish at random, moves to the cell occupied by the fish, and eats the fish.
3.
If no fish are in adjacent cells, the shark chooses an unoccupied adjacent
cell to move to in a similar manner as fish move.
4.
If the shark moves and has reached its breeding age, it gives birth to a baby
shark, which is left in the vacating cell.
5.
If a shark has not eaten for
y
generations, it dies.