
Dec 1, 2013


Finding Concurrency


Harry R. Erwin

University of Sunderland


Design models for concurrent algorithms

Patterns for finding design models

Task decomposition

Data decomposition

What’s not parallel


Feedback opportunity


Clay Breshears (2009) The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications, O'Reilly Media, 304 pages.

Mattson, T G; Sanders, B A; and B L Massingill (2005) Patterns for Parallel Programming, Addison-Wesley.

Mordechai Ben-Ari (2006) Principles of Concurrent and Distributed Programming, Addison-Wesley.

Wolfgang Kreutzer (1986) System Simulation: Programming Styles and Languages, Addison-Wesley.

Gamma, Helm, Johnson, and Vlissides (1995) Design Patterns, Addison-Wesley.

The Portland Pattern Repository:

Resources on Parallel Patterns

Visual Studio 2010 and the Parallel Patterns Library



Alexander, 1977, A Pattern Language: Towns/Buildings/
Construction, Oxford University Press. (For historical interest.)

Flynn’s Taxonomy (from Wikipedia)

SISD (Single Instruction, Single Data): a sequential computer with no parallelism in either instructions or data.

MISD (Multiple Instruction, Single Data): multiple instruction streams operate on a single data stream. Unusual; used for reliability. The best known example is the space shuttle flight control computer, where the results of each instruction stream must agree.

SIMD (Single Instruction, Multiple Data): a parallel computer where sequential instructions operate on multiple data streams. An array processor or GPU.

MIMD (Multiple Instruction, Multiple Data): a distributed system with multiple CPUs.

Finding Concurrency

Chapter 2 of Breshears (2009) begins by mentioning Mattson, et al. (2005). That book defines a pattern language for parallel programming and explores four design spaces where patterns provide solutions. These include:

Finding Concurrency

Algorithm Structure

Supporting Structures

Implementation Mechanisms

Finding Concurrency Design Space

This contains six patterns:

Task Decomposition

Data Decomposition

Group Tasks

Order Tasks

Data Sharing

Design Evaluation


Breshears (2009) explores the first two patterns in detail. I will summarise all six here and add some slides after the first two to cover them in more depth.

Task Decomposition

What tasks can execute concurrently to solve the problem?

The programmer starts by investigating the computationally intensive parts of the problem, the key data structures, and how the data are used.

The tasks may be clear. Your concerns are flexibility, efficiency, and simplicity.

Identify lots of tasks; they can be merged later, or threads can perform multiple tasks. Look at function calls and loops.

Finally, look at the data.

Possible Ways to Organise Your Tasks

Have the main method create and start the
threads. It then waits for all tasks to complete
and finally generates the results.

You can also create and start threads as needed.
This is preferred if the need for threads is not
clear until the program has been running. A
recursive or binary search is an example.

It’s cheaper not to start a thread if it’s likely you will have to stop it.
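The first organisation above, where the main routine creates and starts the threads, waits for all tasks to complete, and finally generates the results, can be sketched in Python. This is a minimal illustration; the `square` task is a stand-in for real computational work:

```python
import threading

def square(n, results, index):
    """A hypothetical task; a real program would do intensive work here."""
    results[index] = n * n

def main():
    data = [1, 2, 3, 4]
    results = [None] * len(data)
    threads = [
        threading.Thread(target=square, args=(n, results, i))
        for i, n in enumerate(data)
    ]
    for t in threads:      # main creates and starts the threads...
        t.start()
    for t in threads:      # ...waits for all tasks to complete...
        t.join()
    return sum(results)    # ...and finally generates the result

print(main())  # prints 30
```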

Content of your Task Decomposition

What are the tasks?

What are the dependencies between tasks?

How are tasks assigned to threads?


Explore concurrent execution of your threads.

Do a desk (manual) simulation, or

Program a simulation.

Look for correctness: you want to avoid race conditions and ensure data are shared when required.

Look for efficiency: all threads with parallel tasks should be sharing the computer. If threads are blocked, the design is inefficient. Balance your threads.

Focus on the resource-intensive parts of the program. Often the limiting resources are a surprise.

Use at least one task per thread or core, and each task should actually do enough useful work to justify its existence.
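The race-condition concern can be illustrated with a small Python sketch (illustrative only): four threads increment a shared counter, and the lock makes each read-modify-write atomic, so the final count is deterministic.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        # Without the lock, this read-modify-write could race with
        # other threads and lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # prints 40000
```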

Data Decomposition

Look for parallelism in the problem’s data.

If the most computationally intensive part of the
problem involves a large data structure, and the data in
the structure can be manipulated in parallel, consider
organising your tasks around that manipulation.

Consider flexibility, efficiency, and simplicity in your design.

Chunk the data so it can be operated on in parallel. Look for array-based processing and recursion. Plan for scalability and efficiency.

Finally, look at the tasks.

Possible Ways to Organise your Data

Consider the structure of your data.

Consider restructuring your data to support parallel processing.

Arrays are good for data parallelism. Divide them along one or more of their dimensions.

Fixed-format tables are also good for data parallelism. Statistical data frames lend themselves to parallel processing.

Lists are good, but only if you have random access to their elements.

Load balancing is also a concern.


How do you divide your data into chunks?

How do you ensure that the task responsible
for a chunk has access to the data it needs to
do its job?

How are data chunks assigned to threads?
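These questions can be sketched for a one-dimensional array in Python. The `chunk` helper is hypothetical (not from the lecture); each chunk becomes one task, and a thread pool assigns chunks to threads:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(data, n_chunks):
    """Divide a list into near-equal contiguous chunks."""
    k, r = divmod(len(data), n_chunks)
    out, start = [], 0
    for i in range(n_chunks):
        end = start + k + (1 if i < r else 0)  # spread the remainder
        out.append(data[start:end])
        start = end
    return out

chunks = chunk(list(range(10)), 3)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Each task (here, summing a chunk) sees only the data it needs.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_sums = list(pool.map(sum, chunks))
print(partial_sums)  # [6, 15, 24]
```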

Content of a Data Decomposition

Chunking the data:

Individual elements

Rows or columns

Blocks
What do the boundaries between chunks look
like? They should have small ‘area’ to
minimise interference.

Data Synchronisation

Consider efficiency. There are two approaches:

Copy the data over before it is needed. (Storage is required, and the data need to be frozen after copying.)
Share the data when it is needed. (Time is
required, both to move the data and to wait for
the transfer to complete. Locking may be required
while the data are used.)

Consider how often copying will be needed.
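The two options can be sketched in Python (illustrative names; `shared` stands in for whatever structure the tasks share):

```python
import copy
import threading

shared = {"table": [1, 2, 3]}
lock = threading.Lock()

# Option 1: copy the data over before it is needed. The copy is taken
# while the source is frozen (held under the lock); afterwards the task
# can read `private` with no further synchronisation.
with lock:
    private = copy.deepcopy(shared["table"])

# Option 2: share the data when it is needed, locking around each use.
# Cheaper in storage, but every access pays synchronisation time.
with lock:
    total = sum(shared["table"])

print(private, total)  # prints [1, 2, 3] 6
```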

Data Scheduling

You can assign data to specific threads
statically or dynamically.

Static is easier to implement.

Dynamic allocation allows load balancing and supports scalability.

Your task may have to wait for another thread
to run, so you need to consider dynamic
scheduling of tasks, which is messy…
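Dynamic scheduling can be sketched with a shared work queue in Python (a minimal illustration, names invented here): each thread pulls the next available item, so faster threads automatically take on more work.

```python
import queue
import threading

tasks = queue.Queue()
for item in range(8):
    tasks.put(item)

results = []
results_lock = threading.Lock()

def worker():
    # Dynamic scheduling: pull work until the queue is empty.
    while True:
        try:
            item = tasks.get_nowait()
        except queue.Empty:
            return
        with results_lock:
            results.append(item * item)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49]
```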

Group Tasks

How can tasks be grouped to simplify managing them?
This is done after the task decomposition.

If tasks share constraints or are closely related,
consider grouping them so that one feeds another or
they form a larger task. You want an organised team of
tasks, not a large number of individual tasks.

Consider the following possibilities: order dependency,
simultaneous execution, free concurrency.

Look at various possible groupings and organisations.

Order Tasks

Given a collection of tasks, in what order must
they run?

You will need to find and enforce the order
dependencies of the system.

The order needs to be restrictive enough that the
order dependencies are enforced, but no more
restrictive than that for maximum efficiency.

Consider data ordering and limitations imposed
by external services.
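One way to enforce an order dependency between two tasks is sketched below with Python's `threading.Event` (a minimal illustration; joins or condition variables would also work). The consumer must not run until the producer's data are ready, and nothing more restrictive is imposed:

```python
import threading

log = []
produced = threading.Event()

def producer():
    log.append("produced")
    produced.set()      # signal that the data are ready

def consumer():
    produced.wait()     # enforce the order dependency, nothing more
    log.append("consumed")

t1 = threading.Thread(target=consumer)
t2 = threading.Thread(target=producer)
t1.start(); t2.start()
t1.join(); t2.join()
print(log)  # prints ['produced', 'consumed']
```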

Data Sharing

How should data be shared among the tasks you have identified?

Classify data into task-local data and shared data, and then define a protocol for data sharing.

Consider race conditions and synchronisation
overhead. Avoid joins if the threads involved have very
different resource requirements or timing.

Data can be read-only, effectively local, or read-write. Look at replication for read-only data. Some read-write data summarise information collected by individual tasks, or data may be modified by a single task. Look at using local copies of these data.
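The local-copies idea can be sketched in Python (illustrative names): each task accumulates into a task-local variable, and the shared result is only combined at the end, so no locking is needed while the tasks run.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(100))

def partial_sum(chunk):
    # Task-local accumulator: nothing is shared while the task runs.
    local = 0
    for x in chunk:
        local += x
    return local

chunks = [data[i:i + 25] for i in range(0, 100, 25)]
with ThreadPoolExecutor() as pool:
    # The shared summary is produced only when the tasks are combined.
    total = sum(pool.map(partial_sum, chunks))
print(total)  # prints 4950
```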

Design Evaluation

Time to ask yourself, am I done?

Iterate over possible designs to choose the
best one.

Perhaps prototype the design to gain an
understanding of where the time and
resources are going.

Check each possible design for correctness and efficiency. Consider the hardware you are targeting.

Four Key Factors

Efficiency **

Simplicity *

Portability *

Scalability ***

What’s Not Parallel

Having a baby

Algorithms, functions, or procedures with persistent state.

Recurrence relations using data from iteration t in iteration t+1. If the dependence spans more than one iteration, you can ‘unwind’ the loop for some parallelism.

Induction variables incremented non-linearly with each loop pass.

Reductions transforming a vector to a value.

Loop-carried dependence, where data generated in a previous loop iteration is used in the current iteration.
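A tiny Python illustration of a loop-carried dependence (a prefix-sum recurrence, chosen here as an example): each iteration uses the value produced by the previous one, so the iterations cannot run in parallel as written.

```python
def prefix_sums(xs):
    out, running = [], 0
    for x in xs:
        running += x        # loop-carried dependence on `running`
        out.append(running)
    return out

print(prefix_sums([1, 2, 3, 4]))  # prints [1, 3, 6, 10]
```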

Modelling Massive Parallelism

Eventually, you’ll be asked to model a massively
parallel system, consisting of about 10,000
workstations communicating with a flight database.

You may be tempted to define 10,000 threads, each
modelling a workstation. Don’t go there.

Why? Because threads take up storage and have
overhead. Also, operating systems cannot deal with
that many threads simultaneously. UNIX, for example,
is limited to 32 threads.

There’s a better way, called ‘event-stepped simulation’.


Treat each workstation thread as a task. For each, keep
track of what is next to be done and when.

Define a simulation thread that works with a priority
queue. It also keeps track of a clock. The priority queue
maintains task actions in time order.

The simulation thread asks the priority queue for the
next action, updates the clock to the time of that
action, performs any associated commands, and files
the next task action(s) in the priority queue, scheduled
for its next action time.
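The loop described above can be sketched in Python using the standard-library `heapq` module as the priority queue. This is a minimal sketch; the two 'workstation' tasks and their action times are invented for illustration. Each event is a (time, station, action) entry, and an action returns the delay until that station's next event, or None when it is done.

```python
import heapq

def simulate(initial_events, horizon=100.0):
    """Minimal event-stepped simulation loop (a sketch)."""
    clock = 0.0
    pq = list(initial_events)
    heapq.heapify(pq)                 # priority queue in time order
    trace = []
    while pq and clock < horizon:
        time, station, action = heapq.heappop(pq)  # next action
        clock = time                  # update the clock to its time
        trace.append((clock, station))
        delay = action()              # perform the associated commands
        if delay is not None:         # file the next task action
            heapq.heappush(pq, (clock + delay, station, action))
    return trace

def make_station(delays):
    """A stand-in workstation: acts once per listed delay, then stops."""
    it = iter(delays)
    return lambda: next(it, None)

events = [(1.0, "A", make_station([2.0])),
          (1.5, "B", make_station([0.25]))]
trace = simulate(events)
print(trace)  # prints [(1.0, 'A'), (1.5, 'B'), (1.75, 'B'), (3.0, 'A')]
```

Only one simulation thread runs, however many stations are modelled, which is what makes this approach scale to thousands of workstations.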

We will explore this next week in Tutorial.


Take out a piece of paper.

Write down:

What’s working.

What isn’t.

What you would do differently.

Hand it in.

I’ll go over the comments next lecture.

Note that the next lecture looks at some code, and there are no slides.