# Lecture 6 - Synchronous Computations

AI and Robotics

Dec 1, 2013

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers, 2nd ed., by B. Wilkinson

Synchronous Computations

Chapter 6


Synchronous Computations

In a (fully) synchronous application, all processes are synchronized at regular points.

MPI_Barrier()

A basic mechanism for synchronizing processes. It is called by each process in the group, blocking until all members of the group have reached the barrier call, and only returning then.
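A real MPI_Barrier call needs an MPI runtime, but the blocking semantics can be sketched in a single process with Python's threading.Barrier (the thread count and the event log here are illustrative assumptions, not part of the slides):

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
order = []                      # log of events, to observe the barrier's effect
lock = threading.Lock()

def worker(rank):
    with lock:
        order.append(("before", rank))
    barrier.wait()              # blocks until all NUM_WORKERS threads arrive
    with lock:
        order.append(("after", rank))

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because no thread passes `barrier.wait()` until every thread has reached it, all "before" events precede all "after" events in the log.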


Barrier Implementation

Centralized counter implementation (a linear barrier):
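One way to sketch the centralized-counter (linear) barrier, assuming a shared counter guarded by a condition variable; the `generation` field, which makes the barrier safely reusable across iterations, is an addition beyond the plain counter:

```python
import threading

class LinearBarrier:
    """Centralized-counter (linear) barrier: one shared counter, O(p) arrivals."""
    def __init__(self, parties):
        self.parties = parties
        self.count = 0
        self.generation = 0              # lets the barrier be reused safely
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            gen = self.generation
            self.count += 1
            if self.count == self.parties:   # last arriver releases everyone
                self.count = 0
                self.generation += 1
                self.cond.notify_all()
            else:
                while gen == self.generation:  # wait for this generation to end
                    self.cond.wait()

results = []
bar = LinearBarrier(3)
lock = threading.Lock()

def worker(r):
    for phase in range(2):               # two synchronized phases
        bar.wait()
        with lock:
            results.append((phase, r))

ts = [threading.Thread(target=worker, args=(r,)) for r in range(3)]
for t in ts:
    t.start()
for t in ts:
    t.join()
```

All phase-0 appends complete before any phase-1 append, because every thread must pass the second barrier call first.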


Tree barrier


Butterfly Barrier
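The butterfly pattern can be sketched via its pairing rule: at stage s, process i pairs with process i XOR 2^s, so after log2(p) stages every process has (transitively) synchronized with every other. A small sketch of the schedule, assuming p is a power of two:

```python
from math import log2

def butterfly_schedule(p):
    """Partner of process i at stage s is i XOR 2**s (p must be a power of two)."""
    stages = int(log2(p))
    return [[i ^ (1 << s) for i in range(p)] for s in range(stages)]

def synced_sets(p):
    """Track which processes each process has (transitively) synchronized with."""
    known = [{i} for i in range(p)]
    for stage in butterfly_schedule(p):
        # Each pair exchanges everything it knows, simultaneously
        known = [known[i] | known[stage[i]] for i in range(p)]
    return known
```

After log2(p) stages every set contains all p processes, which is why the butterfly needs no final broadcast.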


Fully Synchronized Computation Examples

Data Parallel Computations

The same operation is performed on different data elements simultaneously, i.e., in parallel.

Particularly convenient because of:

- Ease of programming (essentially only one program).
- Easy scaling to larger problem sizes.

Many numeric and some non-numeric problems can be cast in a data-parallel form.
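A minimal sketch of the data-parallel pattern, assuming Python threads stand in for processing elements: one operation (adding a constant) is applied to every element of the data:

```python
from concurrent.futures import ThreadPoolExecutor

def data_parallel_add(xs, k, workers=4):
    """Apply the same operation (add k) to every element 'in parallel'."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda x: x + k, xs))
```

The essentially-one-program property shows up directly: every worker runs the identical operation, only on different elements.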


Prefix Operations

Given a list of numbers, $x_0, \dots, x_{n-1}$, compute all the partial summations, i.e.:

$$x_0$$
$$x_0 + x_1$$
$$x_0 + x_1 + x_2$$
$$x_0 + x_1 + x_2 + x_3$$
$$\dots$$

Any associative operation (e.g. '+', '*', bitwise AND, etc.) can be used.

Practical applications in areas such as sorting, recurrence relations, and polynomial evaluation.


Parallel Prefix Sum
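A common data-parallel prefix-sum scheme (the Hillis-Steele scan, simulated sequentially here) takes ceil(log2 n) steps: at step of stride d, every element i >= d adds in the value d positions to its left, all elements updating simultaneously from the previous step's values:

```python
def parallel_prefix_sum(xs):
    """Hillis-Steele scan: log2(n) data-parallel steps (simulated sequentially)."""
    x = list(xs)
    n = len(x)
    d = 1
    while d < n:
        # One parallel step: every i >= d reads x[i-d] from the *previous* step
        x = [x[i] + x[i - d] if i >= d else x[i] for i in range(n)]
        d *= 2
    return x
```

Building the new list before replacing the old one is what models the synchronous update: no element sees a partially updated neighbor.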


MIMD Powershift on a Hypercube

Shift-by-2^k (in Gray-code order) can be done in two routing steps on a hypercube.

Example: shift-by-4 on a 16-PE hypercube, with PEs in 4-bit Gray-code order (prefix bit 0, then prefix bit 1):

    0: 000 001 011 010 110 111 101 100
    1: 100 101 111 110 010 011 001 000

    Initial data:                          A B C D  E F G H  I J K L  M N O P
    1) shift-by-2 in reverse order:        P O B A  D C F E  H G J I  L K N M
    2) shift-by-2 again (in reverse order): M N O P  A B C D  E F G H  I J K L

T_par = 2 routing steps = O(1)
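The Gray-code ordering assumed above can be sketched as follows: in reflected Gray code, successive PE labels differ in exactly one bit, so PEs adjacent in the order are also neighbors on the hypercube (which is what makes each routing step a single hop):

```python
def gray_code(bits):
    """Reflected binary Gray code: successive codes differ in exactly one bit."""
    if bits == 0:
        return [0]
    prev = gray_code(bits - 1)
    # Prefix 0 (implicit) over prev, then prefix 1 over prev reversed
    return prev + [(1 << (bits - 1)) | g for g in reversed(prev)]

def one_bit_apart(a, b):
    """True when labels a and b are hypercube neighbors (Hamming distance 1)."""
    return bin(a ^ b).count("1") == 1
```

The ordering is also cyclic: the last code differs from the first in one bit, matching the wrap-around shifts in the example.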


Solving a General System of Linear Equations

$$\begin{pmatrix} a_{0,0} & a_{0,1} & \cdots & a_{0,n-1} \\ a_{1,0} & a_{1,1} & \cdots & a_{1,n-1} \\ \vdots & & \ddots & \vdots \\ a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,n-1} \end{pmatrix} \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix} = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \end{pmatrix}$$

where $x_0, x_1, x_2, \dots, x_{n-1}$ are the unknowns.

By rearranging the $i$th equation, $x_i$ is expressed in terms of the other unknowns:

$$x_i = \frac{1}{a_{i,i}}\Big(b_i - \sum_{j \ne i} a_{i,j}\,x_j\Big)$$

This can be used as an iteration formula for each of the unknowns to obtain better approximations.


Jacobi Iterative Method

Split $A$ into its diagonal and off-diagonal parts, $A = D + (A - D)$, where

$$D = \begin{pmatrix} a_{0,0} & 0 & \cdots & 0 \\ 0 & a_{1,1} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & a_{n-1,n-1} \end{pmatrix}$$

From $Ax = b$:

$$Dx = b - (A - D)x \quad\Longrightarrow\quad x = D^{-1}\big[b - (A - D)x\big]$$

which gives the Jacobi iteration

$$x^{(k+1)} = D^{-1}\big[b - (A - D)\,x^{(k)}\big]$$

Since $D^{-1} = \mathrm{diag}(1/a_{0,0},\, 1/a_{1,1},\, \dots,\, 1/a_{n-1,n-1})$, componentwise this is

$$x_i^{(k+1)} = \frac{1}{a_{i,i}}\Big(b_i - \sum_{j \ne i} a_{i,j}\,x_j^{(k)}\Big)$$


Parallel Jacobi Iterations

Assign one unknown per process: process $P_i$ ($0 \le i < n$) computes

$$x_i^{(k+1)} = \frac{1}{a_{i,i}}\Big(b_i - \sum_{j \ne i} a_{i,j}\,x_j^{(k)}\Big) \qquad P_0,\ P_1,\ \dots,\ P_{n-1}$$

The Jacobi method will converge if each diagonal value $a_{i,i}$ (for all $i$, $0 \le i < n$) has an absolute value greater than the sum of the absolute values of the other $a_{i,j}$'s on the same row. The matrix $A$ is then called diagonally dominant:

$$|a_{i,i}| > \sum_{\substack{j=0 \\ j \ne i}}^{n-1} |a_{i,j}|, \qquad 0 \le i < n$$

If $P \ll n$, how do you do the data partitioning?


Termination

Compare the values computed in one iteration to the values obtained from the previous iteration. Terminate the computation when all values are within a given tolerance:

$$\sum_{i=0}^{n-1} \big|x_i^{(t)} - x_i^{(t-1)}\big| < \text{tolerance (error)}$$

or:

$$\sum_{i=0}^{n-1} \big(x_i^{(t)} - x_i^{(t-1)}\big)^2 < \text{tolerance (error)}$$

In either case, you need a global sum (MPI_Reduce) operation.

Q: Do you need to execute it after each and every iteration?
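A sequential sketch of the Jacobi iteration combined with the first termination test above (sum of absolute changes against a tolerance); the matrix, right-hand side, and tolerance are illustrative assumptions, with the matrix chosen diagonally dominant so the method converges:

```python
def jacobi(a, b, tol=1e-10, limit=1000):
    """Jacobi iteration: x_i <- (b_i - sum_{j != i} a_ij * x_j) / a_ii."""
    n = len(b)
    x = list(b)                          # initialize unknowns, as in the slides
    for _ in range(limit):
        new_x = [(b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i))
                 / a[i][i]
                 for i in range(n)]
        # Termination test: sum of absolute changes within tolerance
        if sum(abs(new_x[i] - x[i]) for i in range(n)) < tol:
            return new_x
        x = new_x
    return x

# Diagonally dominant system (|a_ii| > sum of other |a_ij| on each row):
A = [[4.0, 1.0, 1.0],
     [1.0, 5.0, 2.0],
     [1.0, 2.0, 6.0]]
b = [6.0, 8.0, 9.0]
```

Note that `new_x` is built entirely from the previous iterate `x`, mirroring the synchronous (all-processes-update-together) structure of the parallel version.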

Parallel Code

Process P_i could be of the form:

```c
x[i] = b[i];                            /* initialize unknown */
for (iter = 0; iter < limit; iter++) {
    sum = -a[i][i] * x[i];
    for (j = 0; j < n; j++)             /* compute summation */
        sum = sum + a[i][j] * x[j];
    new_x[i] = (b[i] - sum) / a[i][i];  /* compute unknown */
    All_to_All(&new_x[i]);              /* bcast/recv values */
    Global_barrier();                   /* wait for all procs */
}
```


Other Fully Synchronous Problems

Cellular Automata

The problem space is divided into cells. Each cell can be in one of a finite number of states. Cells are affected by their neighbors according to certain rules, and all cells are affected simultaneously in a "generation." The rules are re-applied in subsequent generations so that cells evolve, or change state, from generation to generation.

The most famous cellular automaton is the "Game of Life," devised by John Horton Conway, a Cambridge mathematician.


The Game of Life

A board game: a theoretically infinite 2-dimensional array of cells. Each cell can hold one "organism" and has eight neighboring cells. Initially, some cells are occupied.

The following rules were derived by Conway after a long period of experimentation:

1. Every organism with two or three neighboring organisms survives for the next generation.
2. Every organism with four or more neighbors dies from overpopulation.
3. Every organism with one neighbor or none dies from isolation.
4. Each empty cell adjacent to exactly three occupied neighbors will give birth to an organism.
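The rules above can be sketched as one synchronous update step, assuming a set-of-live-cells representation (the new generation is computed entirely from the old one, so all cells update simultaneously, as the slides require):

```python
from collections import Counter

def life_step(live):
    """One synchronous generation of Conway's Game of Life.

    'live' is a set of (row, col) cells; the whole new generation is
    derived from the old set, so all cells update at once."""
    # Count live neighbors for every cell adjacent to at least one live cell
    counts = Counter((r + dr, c + dc)
                     for (r, c) in live
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0))
    # Rule 4: birth on exactly 3 neighbors; rule 1: survive on 2 or 3
    # (rules 2 and 3, death, follow by omission)
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}
```

The classic "blinker" (three cells in a row) oscillates with period 2 under these rules, which makes a handy sanity check.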


Simple Fun Examples of Cellular Automata

"Sharks and Fishes"

An ocean could be modeled as a 3-dimensional array of cells. Each cell can hold one fish or one shark (but not both).

Fish

Might move around according to these rules:

1. If there is one empty adjacent cell, the fish moves to this cell.
2. If there is more than one empty adjacent cell, the fish moves to one cell chosen at random.
3. If there are no empty adjacent cells, the fish stays where it is.
4. If the fish moves and has reached its breeding age, it gives birth to a baby fish, which is left in the vacating cell.
5. Fish die after x generations.

Sharks

Might be governed by the following rules:

1. If one adjacent cell is occupied by a fish, the shark moves to this cell and eats the fish.
2. If more than one adjacent cell is occupied by a fish, the shark chooses one fish at random, moves to the cell occupied by the fish, and eats the fish.
3. If no fish are in adjacent cells, the shark chooses an unoccupied adjacent cell to move to in a similar manner as fish move.
4. If the shark moves and has reached its breeding age, it gives birth to a baby shark, which is left in the vacating cell.
5. If a shark has not eaten for y generations, it dies.
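Fish rules 1-3 can be sketched as a single move function on a 2-D grid; the grid encoding and the 4-cell neighborhood are assumptions for illustration (the slides do not fix the neighborhood):

```python
import random

def fish_move(grid, r, c, rng=random):
    """Apply fish movement rules 1-3 on a 2-D grid.

    grid[r][c] values: '.' empty, 'F' fish, 'S' shark.
    Returns the fish's new (row, col); uses a 4-cell neighborhood."""
    rows, cols = len(grid), len(grid[0])
    empty = [(r + dr, c + dc)
             for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
             if 0 <= r + dr < rows and 0 <= c + dc < cols
             and grid[r + dr][c + dc] == "."]
    if not empty:
        return (r, c)                    # rule 3: no empty cell, stay put
    new_r, new_c = rng.choice(empty)     # rule 1 (one choice) / rule 2 (random)
    grid[r][c], grid[new_r][new_c] = ".", "F"
    return (new_r, new_c)
```

Rules 4-5 (breeding and death) would be bookkeeping on top of this move, tracking each fish's age; they are omitted to keep the sketch focused on movement.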