
Finding Concurrency

CET306

Harry R. Erwin

University of Sunderland

Roadmap


Design models for concurrent algorithms


Patterns for finding design models


Task decomposition


Data decomposition


What’s not parallel


Conclusions


Feedback opportunity

Texts


Clay Breshears (2009) The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications, O'Reilly Media, 304 pages.


Mattson, T. G.; Sanders, B. A.; and Massingill, B. L. (2005) Patterns for Parallel Programming, Addison-Wesley.


Mordechai Ben-Ari (2006) Principles of Concurrent and Distributed Programming, Addison-Wesley.


Wolfgang Kreutzer (1986) System Simulation: Programming Styles and Languages, Addison-Wesley.

Resources


Gamma, Helm, Johnson, and Vlissides (1995) Design Patterns, Addison-Wesley.


The Portland Pattern Repository:
http://c2.com/ppr/


Resources on Parallel Patterns
http://www.cs.uiuc.edu/homes/snir/PPP/



Visual Studio 2010 and the Parallel Patterns Library:
http://msdn.microsoft.com/en-us/magazine/dd434652.aspx
http://www.microsoft.com/download/en/details.aspx?id=19222
http://msdn.microsoft.com/en-us/library/dd492418.aspx



Alexander, 1977, A Pattern Language: Towns/Buildings/
Construction, Oxford University Press. (For historical interest.)

Flynn’s Taxonomy (from Wikipedia)

SISD (Single Instruction, Single Data): a sequential computer with no parallelism in instructions or data.

MISD (Multiple Instruction, Single Data): multiple instruction streams operate on a single data stream. Unusual; used for reliability. The best known example is the space shuttle flight control computer, where the results of each instruction stream must agree.

SIMD (Single Instruction, Multiple Data): a parallel computer where sequential instructions operate on multiple data streams. An array processor or GPU.

MIMD (Multiple Instruction, Multiple Data): a distributed system with multiple CPUs.

Finding Concurrency


Chapter 2 of Breshears (2009) begins by mentioning Mattson et al. (2005). That book defines a pattern language for parallel programming and explores four design spaces where patterns provide solutions. These include:


Finding Concurrency


Algorithm Structure


Supporting Structures


Implementation Mechanisms

Finding Concurrency Design Space


This contains six patterns:


Task Decomposition


Data Decomposition


Group Tasks


Order Tasks


Data Sharing


Design Evaluation


Breshears (2009) explores the first two patterns in detail. I will summarise all six here and add some slides after the first two to cover Breshears' points.

Task Decomposition


What tasks can execute concurrently to solve the
problem?


The programmer starts by investigating the
computationally intensive parts of the problem, the
key data structures, and how the data are used.


The tasks may be clear. Your concerns are flexibility,
efficiency, and simplicity.


Identify lots of tasks; they can be merged later, or threads can perform multiple tasks. Look at function calls and loops (see the sketch below).


Finally look at the data.
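
A rough sketch of this step (not Breshears' code; the function and variable names are made up for illustration), assuming a loop whose iterations are independent. Each task covers a contiguous range of iterations, so two tasks can run concurrently:

    #include <cstddef>
    #include <functional>
    #include <thread>
    #include <vector>

    // Hypothetical per-iteration work: anything with no shared mutable
    // state between iterations would do.
    void process(std::vector<double>& data, std::size_t i) {
        data[i] = data[i] * data[i];
    }

    // One task handles a contiguous range of loop iterations.
    void task(std::vector<double>& data, std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i)
            process(data, i);
    }

    int main() {
        std::vector<double> data(1000, 2.0);
        std::size_t mid = data.size() / 2;

        // Two tasks found inside one loop; they can execute concurrently
        // because they touch disjoint elements.
        std::thread t1(task, std::ref(data), std::size_t(0), mid);
        std::thread t2(task, std::ref(data), mid, data.size());
        t1.join();
        t2.join();
    }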

Possible Ways to Organise Your Tasks


Have the main method create and start the
threads. It then waits for all tasks to complete
and finally generates the results.


You can also create and start threads as needed.
This is preferred if the need for threads is not
clear until the program has been running. A
recursive or binary search is an example (see the sketch below).


It's cheaper not to start a thread if it's likely you will have to stop it.
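
A minimal sketch of the second approach (creating threads as needed), assuming a divide-and-conquer sum; the threshold is an arbitrary illustration, not a recommended value. A helper thread is started only when the sub-problem is large enough to justify it:

    #include <cstddef>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Recursive sum that spawns a helper thread only for large halves,
    // so threads are created as the need becomes clear at run time.
    double parallel_sum(const double* first, const double* last) {
        const std::ptrdiff_t threshold = 10000;        // arbitrary cut-off
        if (last - first <= threshold)
            return std::accumulate(first, last, 0.0);  // small: stay sequential

        const double* mid = first + (last - first) / 2;
        double right = 0.0;
        std::thread t([&] { right = parallel_sum(mid, last); });
        double left = parallel_sum(first, mid);        // this thread takes the left half
        t.join();                                      // wait before reading 'right'
        return left + right;
    }

    int main() {
        std::vector<double> v(100000, 1.0);
        double total = parallel_sum(v.data(), v.data() + v.size());
        (void)total;
    }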

Content of your Task Decomposition


What are the tasks?


What are the dependencies between tasks?


How are tasks assigned to threads?

Consider


Explore concurrent execution of your threads.


Do a desk (manual) simulation, or


Program a simulation.


Look for correctness: you want to avoid race conditions and ensure data are shared when required.


Look for efficiency: all threads with parallel tasks should be sharing the computer. If threads are blocked, the design is inefficient. Balance your threads.


Focus on the resource-intensive parts of the program. Often the limiting resources are a surprise.


Provide at least one task per thread or core, and each task should do enough useful work to justify its existence.

Data Decomposition


Look for parallelism in the problem’s data.


If the most computationally intensive part of the
problem involves a large data structure, and the data in
the structure can be manipulated in parallel, consider
organising your tasks around that manipulation.


Consider flexibility, efficiency, and simplicity in your
design.


Chunk the data so it can be operated on in parallel. Look for array-based processing and recursion. Plan for scalability and efficiency (see the sketch below).


Finally, look at the tasks.
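
As a sketch of data decomposition (the array and the doubling operation are placeholders), the single dimension of an array can be chunked so that each hardware thread owns one contiguous block:

    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<int> data(1000000, 1);

        // Chunk the array along its one dimension: one block per hardware thread.
        unsigned n = std::thread::hardware_concurrency();
        if (n == 0) n = 2;                                // fall back if unknown
        std::size_t chunk = (data.size() + n - 1) / n;

        std::vector<std::thread> workers;
        for (unsigned t = 0; t < n; ++t) {
            std::size_t begin = t * chunk;
            std::size_t end   = std::min(begin + chunk, data.size());
            workers.emplace_back([&data, begin, end] {
                for (std::size_t i = begin; i < end; ++i)
                    data[i] *= 2;                         // the task is defined by its chunk
            });
        }
        for (auto& w : workers) w.join();
    }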

Possible Ways to Organise your Data


Consider the structure of your data.


Consider restructuring your data to support parallel
operations.


Arrays are good for data parallelism. Divide them along
one or more of their dimensions.


Fixed format tables are also good for data parallelism.
Statistical data frames lend themselves to parallel
algorithms.


Lists are good, but only if you have random access to sublists.


Load balancing is important.

Consider


How do you divide your data into chunks?


How do you ensure that the task responsible
for a chunk has access to the data it needs to
do its job?


How are data chunks assigned to threads?

Content of a Data Decomposition


Chunking the data:


Individual elements


Rows


Columns


Blocks


What do the boundaries between chunks look
like? They should have small ‘area’ to
minimise interference.
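
For a concrete illustration of boundary ‘area’ (the numbers are only an example): splitting a 1000×1000 grid among four tasks as 250×1000 row strips gives each interior strip up to 2 × 1000 = 2000 boundary cells, while splitting it into four 500×500 blocks gives each block at most 2 × 500 = 1000 boundary cells for the same 250,000 elements, so the block decomposition has the smaller ‘area’ and less interference between tasks.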

Data Synchronisation


Consider efficiency. There are two
approaches:


Copy the data over before it is needed. (Storage is
required, and the data need to be frozen after
copying.)


Share the data when it is needed. (Time is
required, both to move the data and to wait for
the transfer to complete. Locking may be required
while the data are used.)


Consider how often copying will be needed.
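
A minimal sketch of the two approaches (the shared table, its producer, and the readers are hypothetical):

    #include <cstddef>
    #include <mutex>
    #include <vector>

    std::vector<int> table;        // data produced by one task and used by others
    std::mutex       table_mutex;  // protects 'table' while it is shared live

    // Approach 1: copy the data over before it is needed. The copy costs
    // storage, and 'table' must not change after the snapshot is taken.
    std::vector<int> snapshot() {
        std::lock_guard<std::mutex> lock(table_mutex);
        return table;              // a private copy: no further locking needed
    }

    // Approach 2: share the data when it is needed. Every access pays for the lock.
    int read_element(std::size_t i) {
        std::lock_guard<std::mutex> lock(table_mutex);
        return table.at(i);
    }

    int main() {
        table.assign(100, 7);                // filled before any reader runs
        std::vector<int> copy = snapshot();  // approach 1
        int x = read_element(5);             // approach 2
        (void)copy; (void)x;
    }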

Data Scheduling


You can assign data to specific threads
statically or dynamically.


Static is easier to implement.


Dynamic allows load balancing and supports scalability (see the sketch below).


Your task may have to wait for another thread
to run, so you need to consider dynamic
scheduling of tasks, which is messy…
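
A sketch of the dynamic option (the chunk size and thread count are arbitrary): idle threads claim the next chunk from a shared atomic counter, which balances the load without a fixed assignment of data to threads:

    #include <algorithm>
    #include <atomic>
    #include <cstddef>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<double> data(100000, 1.0);
        const std::size_t chunk = 1000;
        std::atomic<std::size_t> next(0);    // start index of the next unclaimed chunk

        auto worker = [&] {
            for (;;) {
                std::size_t begin = next.fetch_add(chunk);   // claim a chunk dynamically
                if (begin >= data.size()) break;
                std::size_t end = std::min(begin + chunk, data.size());
                for (std::size_t i = begin; i < end; ++i)
                    data[i] *= 2;                            // the per-chunk work
            }
        };

        std::vector<std::thread> pool;
        for (int t = 0; t < 4; ++t) pool.emplace_back(worker);
        for (auto& th : pool) th.join();
    }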

Group Tasks


How can tasks be grouped to simplify managing dependencies?


This is done after the task decomposition.


If tasks share constraints or are closely related,
consider grouping them so that one feeds another or
they form a larger task. You want an organised team of
tasks, not a large number of individual tasks.


Consider the following possibilities: order dependency,
simultaneous execution, free concurrency.


Look at various possible groupings and organisations.

Order Tasks


Given a collection of tasks, in what order must
they run?


You will need to find and enforce the order
dependencies of the system.


The order needs to be restrictive enough that the
order dependencies are enforced, but no more
restrictive than that for maximum efficiency.


Consider data ordering and limitations imposed
by external services.
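
A tiny sketch of enforcing a single order dependency (produce and consume are hypothetical stages): the consumer is released only after the producer has finished, and no stronger ordering is imposed.

    #include <future>
    #include <vector>

    std::vector<double> produce() { return std::vector<double>(100, 1.0); }
    double consume(const std::vector<double>& v) { return v.empty() ? 0.0 : v.front(); }

    int main() {
        // The only dependency is "consume after produce"; the future enforces exactly that.
        std::future<std::vector<double>> fut = std::async(std::launch::async, produce);
        double result = consume(fut.get());   // get() blocks until produce has completed
        (void)result;
    }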

Data Sharing


How should data be shared among the tasks you have
defined?


Classify data into task-local data and shared data, and then define a protocol for data sharing.


Consider race conditions and synchronisation
overhead. Avoid joins if the threads involved have very
different resource requirements or timing.


Data can be read-only, effectively-local, or read-write. Look at replication for read-only data. Some read-write data summarise information collected by individual tasks, or data may be modified by a single task. Look at using local copies of these data.
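
One common case from the last bullet, read-write data that merely summarise what each task collects, can be handled with task-local copies combined once at the end. A minimal sketch (the partial-sum layout is an assumption, not code from the texts):

    #include <cstddef>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<int> data(100000, 1);
        const unsigned n = 4;
        std::vector<long long> partial(n, 0);   // one task-local accumulator per thread

        std::vector<std::thread> workers;
        for (unsigned t = 0; t < n; ++t) {
            workers.emplace_back([&, t] {
                std::size_t begin = t * data.size() / n;
                std::size_t end   = (t + 1) * data.size() / n;
                for (std::size_t i = begin; i < end; ++i)
                    partial[t] += data[i];      // no locking: each task writes only its own copy
            });
        }
        for (auto& w : workers) w.join();

        // The shared total is formed only after every task has finished.
        long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
        (void)total;
    }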

Design Evaluation


Time to ask yourself: am I done?


Iterate over possible designs to choose the
best one.


Perhaps prototype the design to gain an
understanding of where the time and
resources are going.


Check each possible design for correctness
and efficiency. Consider the hardware
environment.

Four Key Factors


Efficiency **


Simplicity *


Portability *


Scalability ***

What’s Not Parallel


Having a baby


Algorithms, functions, or procedures with persistent
state.


Recurrence relations using data from loop t in loop t+1. If it’s loop t+k, you can ‘unwind’ the loop for some parallelism.


Induction variables incremented non-linearly with each loop pass.


Reductions transforming a vector to a value.


Loop-carried dependence, where data generated in a previous loop iteration are used in the current iteration.
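
A small fragment illustrating the last point (the arrays are hypothetical and assumed to be the same length):

    #include <cstddef>
    #include <vector>

    void illustrate(std::vector<double>& a,
                    const std::vector<double>& b,
                    std::vector<double>& c) {
        std::size_t n = a.size();

        // Loop-carried dependence: iteration i needs the value produced by
        // iteration i-1, so the iterations cannot simply be split across threads.
        for (std::size_t i = 1; i < n; ++i)
            a[i] = a[i - 1] + b[i];

        // No dependence between iterations: each reads and writes only its own
        // elements, so this loop can be chunked across threads.
        for (std::size_t i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }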



Modelling Massive Parallelism


Eventually, you’ll be asked to model a massively
parallel system, consisting of about 10,000
workstations communicating with a flight database.


You may be tempted to define 10,000 threads, each
modelling a workstation. Don’t go there.


Why? Because threads take up storage and have
overhead. Also, operating systems cannot deal with
that many threads simultaneously. UNIX, for example,
is limited to 32 threads.


There’s a better way, called ‘event-stepped simulation’.

Approach


Treat each workstation thread as a task. For each, keep
track of what is next to be done and when.


Define a simulation thread that works with a priority
queue. It also keeps track of a clock. The priority queue
maintains task actions in time order.


The simulation thread asks the priority queue for the
next action, updates the clock to the time of that
action, performs any associated commands, and files
the next task action(s) in the priority queue, scheduled
for its next action time.


We will explore this next week in Tutorial.
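
A rough sketch of the idea (the event layout and time step are assumptions, not the tutorial code): one simulation loop repeatedly pops the earliest pending action from a priority queue, advances the clock to that time, performs the action, and files the task's next action.

    #include <cstdio>
    #include <queue>
    #include <vector>

    // One pending action: which simulated workstation acts, and when.
    struct Event {
        double time;
        int    workstation;
    };

    // Order events so that the earliest time is served first.
    struct Later {
        bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
    };

    int main() {
        std::priority_queue<Event, std::vector<Event>, Later> pending;
        double clock = 0.0;

        // Seed one initial action per simulated workstation (10 here, not 10,000).
        for (int w = 0; w < 10; ++w)
            pending.push({0.0, w});

        const double end_time = 100.0;
        while (!pending.empty()) {
            Event e = pending.top();
            pending.pop();
            if (e.time > end_time) break;
            clock = e.time;                         // advance the clock to this action

            // ... perform the workstation's command against the flight database ...

            // File the workstation's next action; the interval is a placeholder.
            pending.push({clock + 1.0, e.workstation});
        }
        std::printf("simulation ended at t = %f\n", clock);
    }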

Conclusion


Take out a piece of paper.


Write down:


What’s working.


What isn’t.


What you would do differently.


Hand it in.


I’ll go over the comments next lecture.


Note that the next lecture looks at some code, and there are no slides.