Parallel Programming Models

1 Dec 2013

Parallel Programming Models

Jihad El

These slides are based on:

Introduction to Parallel Computing,
Blaise Barney,
Lawrence Livermore National Laboratory


Parallel programming models in common use:

Shared Memory

Threads

Message Passing

Data Parallel


Parallel programming models are abstractions above
hardware and memory architectures.

Shared Memory Model

Tasks share a common address space, which they
read and write asynchronously.

Various mechanisms such as locks / semaphores
may be used to control access to the shared
memory.
An advantage of this model from the
programmer's point of view is that the notion of
data "ownership" is lacking, so there is no need
to specify explicitly the communication of data
between tasks.

Program development can often be simplified.
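As a minimal sketch of the lock/semaphore mechanism above (hypothetical code, not from the slides), Python's `threading.Lock` can control access to a shared variable so that concurrent updates are not lost:

```python
import threading

counter = 0                     # shared state: all threads see one address space
lock = threading.Lock()         # controls access to the shared data

def work(n):
    global counter
    for _ in range(n):
        with lock:              # only one thread updates the counter at a time
            counter += 1

threads = [threading.Thread(target=work, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4000: the lock prevents lost updates
```

Note there is no explicit communication: every thread simply reads and writes `counter`, which is exactly the "no data ownership" property described above.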


An important disadvantage is that it becomes
difficult to understand and manage data locality.

Keeping data local to the processor that works on
it conserves memory accesses, cache refreshes
and bus traffic that occurs when multiple
processors use the same data.

Unfortunately, controlling data locality is hard to
understand and is beyond the control of the average
user.

The native compilers translate user program
variables into actual memory addresses,
which are global.

No common distributed memory platform
implementations currently exist.

However, some implementations have provided a
shared memory view of data even though the
physical memory of the machine was distributed,
implemented as virtual shared memory.

Threads Model

A single process can have
multiple, concurrent
execution paths.

The main program loads and
acquires all of the necessary
system and user resources.

It performs some serial work,
and then creates a number of
tasks (threads) that run
concurrently.
Threads Cont.

The work of a thread can be described as a subroutine
within the main program.

All threads share the process's memory space.

Each thread has local data.

They save the overhead of replicating the program's
resources.
Threads communicate with each other through global
memory.
Threads require synchronization constructs to ensure that
no more than one thread is updating the same global
address at any time.

Threads can come and go, but main thread remains present
to provide the necessary shared resources until the
application has completed.
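The properties above can be sketched in Python (a hypothetical example; the variable names are my own). The main thread does serial setup, spawns worker threads that each compute with local data, and the workers communicate results back through global memory:

```python
import threading

results = [0] * 4               # global memory, visible to every thread

def worker(tid):
    # each thread has local data (tid and local exist per thread) ...
    local = sum(range(tid * 10, (tid + 1) * 10))
    # ... and communicates through the shared global structure
    results[tid] = local

# the main program performs serial work, then creates the threads
threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                    # main thread remains until the work completes

print(sum(results))  # 780, i.e. sum(range(40))
```

Because each worker writes to a distinct slot of `results`, no lock is needed here; overlapping writes to the same slot would require one.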

Message Passing Model

This model demonstrates the following
characteristics:

A set of tasks that use
their own local memory
during computation.

Multiple tasks can reside
on the same physical
machine as well as across
an arbitrary number of
machines.

Message Passing Model Cont.

Tasks exchange data through communications
by sending and receiving messages.

Data transfer usually requires cooperative
operations to be performed by each process.

The communicating processes may exist on
the same machine or on different machines.
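A minimal sketch of cooperative send/receive (hypothetical code; Python's `multiprocessing.Pipe` stands in for a message-passing library such as MPI). Each process has its own local memory, and data moves only through explicit messages:

```python
from multiprocessing import Process, Pipe

def worker(conn):
    data = conn.recv()          # a receive must match the parent's send
    conn.send(sum(data))        # cooperative: reply with the result
    conn.close()

def run():
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))  # task with its own local memory
    p.start()
    parent.send([1, 2, 3, 4])   # data transfer is an explicit operation
    result = parent.recv()
    p.join()
    return result

if __name__ == "__main__":
    print(run())  # 10
```

Note the symmetry: every `send` on one side has a matching `recv` on the other, which is the "cooperative operations" requirement described above.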

Data Parallel Model

Most of the parallel work
focuses on performing
operations on a data set.

The data set is typically
organized into a common
structure, such as an array.
A set of tasks work
collectively on the same
data structure; each task
works on a different
partition of the same data
structure.
Data Parallel Model Cont.

Tasks perform the same operation on their
partition of work.

On shared memory architectures, all tasks
may have access to the data structure through
global memory. On distributed memory
architectures the data structure is split up and
resides as "chunks" in the local memory of
each task.
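As a sketch of the data parallel model (hypothetical code; `multiprocessing.Pool` plays the role of the task set, and this version assumes the data length divides evenly among tasks), every task applies the same operation to its own chunk of the data structure:

```python
from multiprocessing import Pool

def square_chunk(chunk):
    # every task performs the same operation on its own partition
    return [x * x for x in chunk]

def run(data, ntasks=2):
    size = len(data) // ntasks
    # split the data structure into "chunks", one per task
    chunks = [data[i * size:(i + 1) * size] for i in range(ntasks)]
    with Pool(ntasks) as pool:
        parts = pool.map(square_chunk, chunks)
    # reassemble the partial results in partition order
    return [x for part in parts for x in part]

if __name__ == "__main__":
    print(run(list(range(8))))  # [0, 1, 4, 9, 16, 25, 36, 49]
```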

Designing Parallel Algorithms

The programmer is typically responsible for
both identifying and actually implementing
parallelism.
Manually developing parallel codes is a time-
consuming, complex, error-prone and iterative
process.
Currently, the most common type of tool used
to automatically parallelize a serial program is
a parallelizing compiler or pre-processor.
A parallelizing compiler

Fully Automatic

The compiler analyzes the source code and identifies
opportunities for parallelism.

The analysis includes identifying inhibitors to parallelism and
possibly a cost weighting on whether or not the parallelism
would actually improve performance.

Loops (do, for) are the most frequent target for automatic
parallelization.
Programmer Directed

Using "compiler directives" or possibly compiler flags, the
programmer explicitly tells the compiler how to parallelize the
code.
May be able to be used in conjunction with some degree of
automatic parallelization also.

Automatic Parallelization Limitations

Wrong results may be produced

Performance may actually degrade

Much less flexible than manual parallelization

Limited to a subset (mostly loops) of code

May actually not parallelize code if the
analysis suggests there are inhibitors or the
code is too complex

The Problem & The Program
Determine whether or not the problem is one that can actually be
parallelized.
Identify the program's hotspots.
Know where most of the real work is being done.

Profilers and performance analysis tools can help here

Focus on parallelizing the hotspots and ignore those sections of the
program that account for little CPU usage.


Identify bottlenecks in the program.

Identify areas where the program is slow or acts as a
bottleneck (I/O, for example).

It may be possible to restructure the program or use a different
algorithm to reduce or eliminate unnecessary slow areas.

Identify inhibitors to parallelism. One common class of inhibitor is
data dependence, as demonstrated by the Fibonacci sequence.

Investigate other algorithms if possible. This may be the single most
important consideration when designing a parallel application.
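The Fibonacci example mentioned above can be made concrete (a hypothetical sketch, not from the slides). Each loop iteration reads values written by the previous one, so the iterations cannot be distributed among tasks:

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        # loop-carried data dependence: this iteration needs the
        # a and b produced by the previous iteration, so the loop
        # cannot be parallelized as written
        a, b = b, a + b
    return a

print([fib(i) for i in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```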


Partitioning

Break the problem into discrete "chunks" of
work that can be distributed to multiple tasks.

domain decomposition

functional decomposition.

Domain Decomposition

The data associated with
a problem is decomposed.

Each parallel task then
works on a portion of the
data.
This partition could be
done in different ways.

Row, Columns, Blocks,
Cyclic, etc.
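Two of the partitioning schemes named above can be sketched as index maps (hypothetical helper functions, not from the slides): a block partition gives each task a contiguous range, while a cyclic partition deals indices out round-robin.

```python
def block_partition(n, p):
    """Split indices 0..n-1 into p contiguous blocks (block style)."""
    size, extra = divmod(n, p)
    out, start = [], 0
    for i in range(p):
        # the first `extra` tasks absorb the remainder, one index each
        end = start + size + (1 if i < extra else 0)
        out.append(list(range(start, end)))
        start = end
    return out

def cyclic_partition(n, p):
    """Deal indices 0..n-1 out round-robin (cyclic style)."""
    return [list(range(i, n, p)) for i in range(p)]

print(block_partition(10, 3))   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(cyclic_partition(10, 3))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Block partitions favor locality within each task; cyclic partitions spread uneven work more evenly.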

Functional Decomposition

The problem is
decomposed according
to the work that must
be done. Each task
then performs a
portion of the overall
work.

Communications: Factors to Consider

Cost of communications

Latency vs. Bandwidth

Visibility of communications

Synchronous vs. asynchronous communications

Scope of communications



Efficiency of communications

Overhead and Complexity



Types of Synchronization

Lock / semaphore

Synchronous communication operations

Data Dependencies

A dependence exists between program
statements when the order of statement
execution affects the results of the program.

A data dependence results from multiple use
of the same location(s) in storage by different
tasks.
Dependencies are important to parallel
programming because they are one of the
primary inhibitors to parallelism.

Load Balancing

Load balancing refers to the practice of
distributing work among tasks so that all tasks
are kept busy all of the time. It can be
considered a minimization of task idle time.

Load balancing is important to parallel
programs for performance reasons. For
example, if all tasks are subject to a barrier
synchronization point, the slowest task will
determine the overall performance.
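The barrier argument above is simple arithmetic, sketched here with hypothetical per-task work amounts (arbitrary units, my own numbers):

```python
# unbalanced distribution: four tasks, one with far more work
task_times = [2, 2, 2, 10]

# at a barrier synchronization point, the slowest task
# determines the overall time
print(max(task_times))          # 10: three tasks sit idle for 8 units

# distributing the same total work evenly keeps every task busy
total = sum(task_times)
balanced = [total / len(task_times)] * len(task_times)
print(max(balanced))            # 4.0: all tasks finish together
```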