Dec 1, 2013


Finding Concurrency


Harry R. Erwin

University of Sunderland


Design models for concurrent algorithms

Patterns for finding design models

Task decomposition

Data decomposition

What’s not parallel


Feedback opportunity


Breshears, C (2009) The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications, O'Reilly Media, 304 pp.

Mattson, T G; Sanders, B A; and Massingill, B L (2005) Patterns for Parallel Programming, Addison-Wesley.

Ben-Ari, M (2006) Principles of Concurrent and Distributed Programming, Addison-Wesley.

Kreutzer, W (1986) System Simulation: Programming Styles and Languages, Addison-Wesley.

Gamma, E; Helm, R; Johnson, R; and Vlissides, J (1995) Design Patterns, Addison-Wesley.

The Portland Pattern Repository:

Resources on Parallel Patterns

Visual Studio 2010 and the Parallel Patterns Library

Alexander, C (1977) A Pattern Language: Towns, Buildings, Construction, Oxford University Press. (For historical interest.)

Flynn’s Taxonomy (from Wikipedia)

Single Instruction, Single Data (SISD): a sequential computer with no parallelism in either the instruction or the data streams.

Multiple Instruction, Single Data (MISD): multiple instruction streams operate on a single data stream. Unusual; used for reliability. The best-known example is the space shuttle flight control computer, where the results of each instruction stream must agree.

Single Instruction, Multiple Data (SIMD): a parallel computer where a single stream of instructions operates on multiple data streams, such as an array processor or GPU.

Multiple Instruction, Multiple Data (MIMD): a distributed system with multiple CPUs.

Finding Concurrency

Chapter 2 of Breshears (2009) begins by mentioning Mattson et al. (2005). That book defines a pattern language for parallel programming and explores four design spaces where patterns provide solutions. These are:

Finding Concurrency

Algorithm Structure

Supporting Structures

Implementation Mechanisms

Finding Concurrency Design Space

This contains six patterns:

Task Decomposition

Data Decomposition

Group Tasks

Order Tasks

Data Sharing

Design Evaluation


Breshears (2009) explores the first two patterns in detail. I will summarise all six here and add some slides after the first two to cover …

Task Decomposition

What tasks can execute concurrently to solve the problem?

The programmer starts by investigating the
computationally intensive parts of the problem, the
key data structures, and how the data are used.

The tasks may be clear. Your concerns are flexibility,
efficiency, and simplicity.

Identify lots of tasks: they can be merged later, or threads can perform multiple tasks. Look at function calls and loops.

Finally, look at the data.
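As a minimal Python sketch of task decomposition based on function calls, assume two hypothetical analysis functions, count_vowels and longest_word, that read the same text but do not depend on each other. Each call becomes a separate task submitted to a concurrent.futures.ThreadPoolExecutor.

```python
from concurrent.futures import ThreadPoolExecutor

def count_vowels(text):
    # Hypothetical task 1: reads the text, writes nothing shared.
    return sum(ch in "aeiou" for ch in text.lower())

def longest_word(text):
    # Hypothetical task 2: independent of task 1.
    return max(text.split(), key=len)

def analyse(text):
    # The two calls share only read-only input and have no dependency on
    # each other, so each one can run as a concurrent task.
    with ThreadPoolExecutor(max_workers=2) as pool:
        vowels = pool.submit(count_vowels, text)
        longest = pool.submit(longest_word, text)
        return vowels.result(), longest.result()
```

For example, analyse("Finding concurrency in programs") returns (8, "concurrency").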

Possible Ways to Organise Your Tasks

Have the main method create and start the
threads. It then waits for all tasks to complete
and finally generates the results.
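The first organisation above (main creates and starts the threads, waits for them, then generates the results) might look like this with Python's threading module. The run_all name, and passing each task as a zero-argument callable, are assumptions made for illustration.

```python
import threading

def run_all(jobs):
    results = [None] * len(jobs)

    def run(index, job):
        results[index] = job()   # each thread writes only its own slot

    # 1. Create and start one thread per task.
    threads = [threading.Thread(target=run, args=(i, job)) for i, job in enumerate(jobs)]
    for t in threads:
        t.start()
    # 2. Wait for every task to complete.
    for t in threads:
        t.join()
    # 3. Generate the results.
    return sum(results)
```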

You can also create and start threads as needed.
This is preferred if the need for threads is not
clear until the program has been running. A
recursive or binary search is an example.

It’s cheaper not to start a thread if it’s likely you will have to stop it.
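Creating threads only as the need appears can be sketched with a recursive search over an unsorted list. The contains function below is hypothetical: it hands the left half of its range to a new thread while searching the right half itself. The depth parameter caps how many threads are created, because small ranges are cheaper to search directly than to give to a new thread.

```python
import threading

def contains(data, target, lo, hi, depth):
    """Return True if target occurs in data[lo:hi].

    A helper thread is created for the left half only while depth > 0;
    below that, the range is searched sequentially in the current thread.
    """
    if depth == 0 or hi - lo < 2:
        return target in data[lo:hi]
    mid = (lo + hi) // 2
    found = {}

    def search_left():
        found["left"] = contains(data, target, lo, mid, depth - 1)

    helper = threading.Thread(target=search_left)
    helper.start()                     # a thread is created only when we split
    right = contains(data, target, mid, hi, depth - 1)
    helper.join()
    return found["left"] or right
```

With depth=3 at most seven helper threads are started, however long the list is.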

Content of your Task Decomposition

What are the tasks?

What are the dependencies between tasks?

How are tasks assigned to threads?


Explore concurrent execution of your threads.

Do a desk (manual) simulation, or

Program a simulation.

Look for correctness: you want to avoid race conditions and ensure data are shared when required.

Look for efficiency: all threads running parallel tasks should be able to run at the same time. If threads are blocked, the design is inefficient. Balance the work across your threads.

Focus on the resource-intensive parts of the program. Often the limiting resources are a surprise.

Have at least one task per thread or core, and each task should do enough useful work to justify its existence.
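For the correctness check, the classic race condition is a shared counter updated by several threads. This Python sketch (the SharedCounter class and run_counter function are hypothetical) protects the read-modify-write with a threading.Lock; without the lock, two threads could read the same old value and one update would be lost.

```python
import threading

class SharedCounter:
    """Hypothetical counter shared by several threads."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, n):
        # Reading, adding to, and writing self.value is a critical section.
        with self._lock:
            self.value += n

def run_counter(num_threads=4, increments=10_000):
    counter = SharedCounter()

    def work():
        for _ in range(increments):
            counter.add(1)

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value   # num_threads * increments, because of the lock
```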

Data Decomposition

Look for parallelism in the problem’s data.

If the most computationally intensive part of the
problem involves a large data structure, and the data in
the structure can be manipulated in parallel, consider
organising your tasks around that manipulation.

Consider flexibility, efficiency, and simplicity in your design.

Chunk the data so it can be operated on in parallel. Look for array-based processing and recursion. Plan for scalability and efficiency.

Finally, look at the tasks.
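A minimal data decomposition sketch: split an array into contiguous chunks and give each chunk to a task. The chunk_bounds and sum_of_squares names are hypothetical. In CPython the global interpreter lock prevents pure-Python threads from speeding up CPU-bound work, so treat this as a picture of the structure; ProcessPoolExecutor in the same module runs tasks in separate processes instead.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_bounds(n, num_chunks):
    # Split range(n) into contiguous (start, stop) pairs whose sizes differ by at most one.
    base, extra = divmod(n, num_chunks)
    bounds, start = [], 0
    for c in range(num_chunks):
        stop = start + base + (1 if c < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def sum_of_squares(data, num_chunks=4):
    def work(bounds):
        start, stop = bounds
        return sum(x * x for x in data[start:stop])   # each task touches only its chunk

    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        partials = list(pool.map(work, chunk_bounds(len(data), num_chunks)))
    return sum(partials)                               # combine the per-chunk results
```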

Possible Ways to Organise your Data

Consider the structure of your data.

Consider restructuring your data to support parallel processing.
Arrays are good for data parallelism. Divide them along
one or more of their dimensions.

Fixed format tables are also good for data parallelism.
Statistical data frames lend themselves to parallel processing.

Lists are good, but only if you have random access to their elements.
Load balancing is


How do you divide your data into chunks?

How do you ensure that the task responsible
for a chunk has access to the data it needs to
do its job?

How are data chunks assigned to threads?
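For a two-dimensional array, one answer to these questions is to divide along the first dimension. In this hypothetical scale_rows sketch each task owns a contiguous band of rows, so the tasks write to disjoint parts of the result and need no lock.

```python
from concurrent.futures import ThreadPoolExecutor

def scale_rows(matrix, factor, num_workers=3):
    """Hypothetical example: multiply every element of a 2-D list by factor."""
    n = len(matrix)
    if n == 0:
        return []
    step = -(-n // num_workers)                       # rows per band, rounded up
    bands = [(s, min(s + step, n)) for s in range(0, n, step)]
    result = [None] * n

    def work(band):
        start, stop = band
        for r in range(start, stop):
            # Each task writes only to its own rows, so no lock is needed.
            result[r] = [v * factor for v in matrix[r]]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(work, bands))                   # list() surfaces any exception
    return result
```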

Content of a Data Decomposition

Chunking the data:

Individual elements




What do the boundaries between chunks look like? They should have a small ‘area’ to minimise interference.
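The ‘small area’ advice can be made concrete with a little arithmetic. Assume a hypothetical n x n grid split among p chunks, either as p horizontal strips or as sqrt(p) x sqrt(p) square blocks, and count the total length of the internal boundaries in cells. Strips give (p - 1)n; blocks give 2(sqrt(p) - 1)n. For n = 1000 and p = 16 that is 15,000 versus 6,000 cells, so blocks leave less boundary data to exchange.

```python
import math

def boundary_cells(n, p, layout):
    """Total internal boundary length, in cells, for an n x n grid split into p chunks."""
    if layout == "strips":
        # p strips are separated by p - 1 cut lines, each n cells long.
        return (p - 1) * n
    if layout == "blocks":
        q = math.isqrt(p)
        if q * q != p:
            raise ValueError("the blocks layout needs p to be a perfect square")
        # q - 1 cut lines in each of the two directions, each n cells long.
        return 2 * (q - 1) * n
    raise ValueError(f"unknown layout: {layout}")
```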

Data Synchronisation

Consider efficiency. There are two options:

Copy the data over before it is needed. (Storage is required, and the data need to be frozen after copying.)

Share the data when it is needed. (Time is required, both to move the data and to wait for the transfer to complete. Locking may be required while the data are used.)

Consider how often copying will be needed.
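The two options can be contrasted in a small Python sketch. The demo function and its config dictionary are hypothetical: one thread works from a deep copy taken before the work starts, the other reads the shared dictionary under a lock when it needs it. Because the shared data are updated after the copy is made, the two threads see different values.

```python
import copy
import threading

def demo():
    config = {"rate": 2, "offsets": [1, 2, 3]}   # hypothetical shared data
    lock = threading.Lock()
    out = [None, None]

    def with_copy(snapshot):
        # Option 1: use a private copy made earlier. It costs storage and is
        # frozen: later updates to config are not seen.
        out[0] = sum(snapshot["offsets"]) * snapshot["rate"]

    def with_sharing():
        # Option 2: read the shared data when needed, holding the lock while using it.
        with lock:
            out[1] = sum(config["offsets"]) * config["rate"]

    snapshot = copy.deepcopy(config)   # the copy is taken before the work starts
    with lock:
        config["rate"] = 10            # a later update to the shared data
    threads = [threading.Thread(target=with_copy, args=(snapshot,)),
               threading.Thread(target=with_sharing)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Here demo() returns [12, 60]: the copy still holds the old rate.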

Data Scheduling

You can assign data to specific threads
statically or dynamically.

Static assignment is easier to implement.

Dynamic assignment allows load balancing and supports scalability.

Your task may have to wait for another thread
to run, so you need to consider dynamic
scheduling of tasks, which is messy…
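Static and dynamic assignment of data items to threads might be sketched as follows. Both functions are hypothetical illustrations built on Python's threading and queue modules: the static version fixes each thread's items in advance (round-robin), while the dynamic version lets a free thread pull the next item from a shared queue.Queue.

```python
import queue
import threading

def _run_threads(targets):
    threads = [threading.Thread(target=t) for t in targets]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def static_schedule(items, num_threads, work):
    results = [None] * len(items)

    def make_worker(k):
        def worker():
            # Static: thread k owns items k, k + num_threads, ..., fixed in advance.
            for i in range(k, len(items), num_threads):
                results[i] = work(items[i])
        return worker

    _run_threads([make_worker(k) for k in range(num_threads)])
    return results

def dynamic_schedule(items, num_threads, work):
    todo = queue.Queue()
    for pair in enumerate(items):
        todo.put(pair)
    results = [None] * len(items)

    def worker():
        # Dynamic: a free thread claims the next unprocessed item, so a thread
        # that draws cheap items simply processes more of them.
        while True:
            try:
                i, item = todo.get_nowait()
            except queue.Empty:
                return
            results[i] = work(item)

    _run_threads([worker] * num_threads)
    return results
```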

Group Tasks

How can tasks be grouped to simplify managing dependencies?

This is done after the task decomposition.

If tasks share constraints or are closely related,
consider grouping them so that one feeds another or
they form a larger task. You want an organised team of
tasks, not a large number of individual tasks.

Consider the following possibilities: order dependency,
simultaneous execution, free concurrency.

Look at various possible groupings and organisations.
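A sketch of grouping, assuming two hypothetical stages, load and summarise, where summarise needs load's output. Tasks with that order dependency are grouped into one larger task that runs them back to back, while separate groups, which are independent of each other, run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def load(name):
    # Hypothetical first stage.
    return [len(name), 2 * len(name)]

def summarise(values):
    # Hypothetical second stage: needs load's output, so it must run after load.
    return sum(values)

def grouped_task(name):
    # One group: the order-dependent stages run back to back inside one larger task.
    return summarise(load(name))

def run_groups(names):
    # Different groups are independent (free concurrency), so they run concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(grouped_task, names))
```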

Order Tasks

Given a collection of tasks, in what order must
they run?

You will need to find and enforce the order
dependencies of the system.

The order needs to be restrictive enough that the
order dependencies are enforced, but no more
restrictive than that for maximum efficiency.

Consider data ordering and limitations imposed
by external services.
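One way to find and enforce order dependencies is a topological sort of the task graph, as provided by Python's graphlib.TopologicalSorter (Python 3.9 or later). In this hypothetical run_in_order sketch, a task is submitted as soon as all its predecessors have finished, so the order is no more restrictive than the dependencies require.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def run_in_order(deps, action):
    """deps maps each task to the set of tasks that must finish before it starts."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                      # raises CycleError if the order is impossible
    finished = []
    with ThreadPoolExecutor() as pool:
        running = {}
        while sorter.is_active():
            # Start every task whose predecessors are all done.
            for task in sorter.get_ready():
                running[pool.submit(action, task)] = task
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                task = running.pop(fut)
                fut.result()              # re-raise any exception from the task
                finished.append(task)
                sorter.done(task)         # may make successors ready
    return finished
```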

Data Sharing

How should data be shared among the tasks you have identified?

Classify data into task-local data and shared data, and then define a protocol for data sharing.

Consider race conditions and synchronisation
overhead. Avoid joins if the threads involved have very
different resource requirements or timing.

Data can be read-only, effectively local, or read-write. Look at replication for read-only data. Some read-write data summarise information collected by individual tasks, or are modified by only a single task; look at using local copies of these data.
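A Python sketch of this advice, using a hypothetical word count: the stop-word set is read-only and shared by every task, while each task accumulates counts in its own local Counter. The local copies are combined only after the tasks finish, so no lock is needed while counting.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

STOP_WORDS = frozenset({"the", "a"})   # read-only data: safe to share

def count_chunk(lines):
    local = Counter()                   # task-local accumulator
    for line in lines:
        for word in line.split():
            if word not in STOP_WORDS:
                local[word] += 1
    return local

def word_counts(lines, num_tasks=2):
    chunks = [lines[k::num_tasks] for k in range(num_tasks)]
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partials = list(pool.map(count_chunk, chunks))
    total = Counter()
    for part in partials:               # combine the local copies at the end
        total += part
    return total
```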

Design Evaluation

Time to ask yourself, am I done?

Iterate over possible designs to choose the
best one.

Perhaps prototype the design to gain an
understanding of where the time and
resources are going.

Check each possible design for correctness
and efficiency. Consider the hardware.

Four Key Factors

Efficiency **

Simplicity *

Portability *

Scalability ***

What’s Not Parallel

Having a baby

Algorithms, functions, or procedures with persistent state.

Recurrence relations using data from loop t in loop t+1. If it’s loop …, you can ‘unwind’ the loop for some …

Induction variables incremented non-linearly with each loop pass.

Reductions transforming a vector to a value.

Loop-carried dependence, where data generated in a previous loop iteration are used in the current iteration.
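Some of these loop patterns can be contrasted in a few small Python functions (all hypothetical). The first two loops cannot simply be split across threads, because each pass needs a value produced by the previous one. The last has a closed form, so every pass is independent.

```python
def recurrence(a):
    # Loop-carried dependence: iteration t reads x[t - 1], which iteration t - 1
    # writes, so the iterations cannot simply be handed to different threads.
    x = [0] * len(a)
    for t in range(len(a)):
        x[t] = (2 * x[t - 1] if t > 0 else 0) + a[t]
    return x

def nonlinear_induction(n):
    # j is updated non-linearly on each pass, so pass k cannot work out its own j
    # without replaying every earlier update.
    j, out = 1, []
    for _ in range(n):
        out.append(j)
        j = 3 * j + 1
    return out

def linear_induction(n):
    # By contrast, j = 1 + 4k has a closed form, so each pass could compute its j
    # independently and the loop could be split across threads.
    return [1 + 4 * k for k in range(n)]
```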

Modelling Massive Parallelism

Eventually, you’ll be asked to model a massively
parallel system, consisting of about 10,000
workstations communicating with a flight database.

You may be tempted to define 10,000 threads, each
modelling a workstation. Don’t go there.

Why? Because each thread takes up storage and has scheduling overhead. Also, operating systems limit the number of threads a process can have (on UNIX-like systems the limit is set by the system configuration), and scheduling thousands of threads at once is costly.

There’s a better way, called ‘event-stepped simulation’.


Treat each workstation thread as a task. For each, keep
track of what is next to be done and when.

Define a simulation thread that works with a priority
queue. It also keeps track of a clock. The priority queue
maintains task actions in time order.

The simulation thread asks the priority queue for the
next action, updates the clock to the time of that
action, performs any associated commands, and files
the next task action(s) in the priority queue, scheduled
for its next action time.
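The event-stepped loop described above can be sketched in Python, using the standard heapq module as the priority queue. The Workstation class, its fixed query period, and the simulate function are hypothetical illustrations: each workstation only counts the queries it would send to the (unmodelled) flight database.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Workstation:
    """Hypothetical workstation task: queries the flight database every `period` time units."""
    name: str
    period: float
    queries: int = 0

    def act(self, now):
        # Perform the action due at time `now` and return the time of the next action.
        self.queries += 1
        return now + self.period

def simulate(stations, end_time):
    """Event-stepped simulation in a single thread; returns the clock when it stops."""
    clock = 0.0
    # Priority queue entries: (action time, tie-breaker, station index).
    # The tie-breaker keeps actions due at the same time in first-come order.
    pending = [(0.0, i, i) for i in range(len(stations))]
    heapq.heapify(pending)
    seq = len(stations)
    while pending:
        time, _, i = heapq.heappop(pending)            # next action in time order
        if time > end_time:
            break
        clock = time                                   # advance the clock to that action
        next_time = stations[i].act(clock)             # perform the associated commands
        heapq.heappush(pending, (next_time, seq, i))   # file the task's next action
        seq += 1
    return clock

if __name__ == "__main__":
    stations = [Workstation(f"ws{k}", period=1.0 + k % 5) for k in range(10_000)]
    print(simulate(stations, end_time=10.0), sum(ws.queries for ws in stations))
```

All 10,000 workstations are handled by one thread and one priority queue. With these hypothetical periods the run ends with the clock at 10.0 after 54,000 queries.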

We will explore this next week in the tutorial.


Take out a piece of paper.

Write down:

What’s working.

What isn’t.

What you would do differently.

Hand it in.

I’ll go over the comments next lecture.

Note that the next lecture looks at some code and has no slides.