# Task Parallelism using Agent Library


Worawan Diaz Carballo, PhD
Department of Computer Science
Faculty of Science and Technology
Thammasat University

## Outline

- Parallel Pattern Library (cont.)
- Recap: Task-based Programming Models
- Asynchronous Agents Library
- Concurrent Containers
- Demo and laboratory

CS427 Introduction to Parallel Computing

## Chapter Objectives

After finishing this module, students should:

- understand the core concepts of the task-based programming model
- identify the suitable programming model (data- or task-parallel) for a given problem
- be able to develop task-parallel applications using the Asynchronous Agents Library
- understand the concepts of concurrent containers and be able to use appropriate containers for developing a parallel application.

## PARALLEL PATTERN LIBRARY (Cont.)

http://msdn.microsoft.com/en-us/library/dd728066.aspx

```cpp
#include <ppl.h>
using namespace Concurrency;
```

## Visual C++ 10 Parallel Algorithms

- `parallel_for(x, y, step, λ);`
- `parallel_for_each(it, λ);`
- `parallel_invoke(λ, λ);`
- `task_group::run(λ);`

All built on the native concurrency runtime.

## Example: Bitonic Sort

A bitonic sorting network sorts n elements (of a bitonic sequence) in Θ(log₂ n).

A bitonic sequence ⟨a₀, a₁, …, aₙ₋₁⟩ is a sequence with the property that either

1. there exists an index i, 0 ≤ i ≤ n − 1, such that ⟨a₀, …, aᵢ⟩ is monotonically increasing and ⟨aᵢ₊₁, …, aₙ₋₁⟩ is monotonically decreasing, or
2. there exists a cyclic shift of indices so that (1) is satisfied.

For example, ⟨1, 2, 4, 7, 6, 0⟩ and ⟨8, 9, 2, 1, 0, 4⟩ are bitonic sequences.

## Example of log 16 bitonic splits

Original sequence:

```
3 5 8 9 10 12 14 20 95 90 60 40 35 23 18 0
```

Bitonic split & merge:

```
3 5 8 9 10 12 14 0  | 95 90 60 40 35 23 18 20
3 5 8 0 | 10 12 14 9 | 35 23 18 20 | 95 90 60 40
3 0 | 8 5 | 10 9 | 14 12 | 18 20 | 35 23 | 60 40 | 95 90
0 3 5 8 9 10 12 14 18 20 23 35 40 60 90 95
```

## Example: Parallel Bitonic Sort

1. Create a new function, called `parallel_bitonic_merge`, which uses the `parallel_invoke` algorithm to merge the sequences in parallel when there is a sufficient amount of work to do; otherwise, it calls `bitonic_merge` to merge the sequences serially.
2. Perform a process that resembles the one in the previous step, but for the `bitonic_sort` function.
3. Create an overloaded version of the `parallel_bitonic_sort` function that sorts the array in increasing order.

## parallel_invoke Wrapped Up

To reduce overhead, the `parallel_invoke` algorithm performs the last of the series of tasks on the calling context.

For example, in the `parallel_bitonic_sort` function:

- the first task runs on a separate context,
- and the second task runs on the calling context.

```cpp
// Sort the partitions in parallel.
parallel_invoke(
    [&items, lo, m] { parallel_bitonic_sort(items, lo,     m, INCREASING); },
    [&items, lo, m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); });
```

## TASK-BASED PROGRAMMING MODELS

Steve Teixeira

[Figure: work distributed across CPU0–CPU3 — dynamic scheduling improves performance by distributing work efficiently at runtime.]

[Figure: the work-stealing scheduler — the program feeds a lock-free global queue; each worker also has a local work-stealing queue; the runtime adds starvation detection and hill-climbing.]

source: http://az8714.vo.msecnd.net/presentations/FT52-Teixeira.pptx

## Messaging & Dataflow patterns

- We are all trained to think like machines, in terms of a sequential flow of operations on data.
- Task-based programming asks us to think in terms of chunks of work rather than execution flow.
- Tasks, however, still require coordination of state around shared data, and complexity increases with the size of the code base.
- Writing to actor-message or dataflow patterns enables you to design around data flow and avoid shared state; the value grows as system size and parallelism scale up.

## ASYNCHRONOUS AGENTS LIBRARY

Data Flow & Message Passing using Visual C++ 10

## Messaging and Agents

Not all patterns map to loops or tasks:

- pipelines, state machines, producer/consumer.

Agent: an asynchronous object that communicates through message passing.

Message blocks: participants in message passing, which transport messages from source to target.

Message: encapsulates state that is transferred between message blocks.

## Asynchronous Agents Library

Message blocks for storing data (core message blocks):

- `unbounded_buffer<T>`
- `overwrite_buffer<T>`
- `single_assignment<T>`

Message blocks for pipelining (execute a function asynchronously when work arrives):

- `transformer<T,U>`
- `call<T>`

Message blocks for joining data (waiting efficiently on a set of message blocks):

- `choice`
- `join`

Cooperatively send messages: `send` (synchronous), `asend` (asynchronous).

## Simple Agents Example

[Diagram: "glorp" is passed to `send`, propagates through an `unbounded_buffer` to a `transformer` (reverse), and propagates out as "prolg".]

## Simple Agents Example: ReverserAgent

```cpp
class ReverserAgent : public Concurrency::agent
{
private:
    transformer<string, string> reverser;

public:
    unbounded_buffer<string> inputBuffer;

    ReverserAgent()
        : reverser([](string in) -> string {
              string reversed(in);
              reverse(reversed.begin(), reversed.end());
              return reversed;
          })
    {
        // Forward input messages to the transformer.
        inputBuffer.link_target(&reverser);
    }

protected:
    virtual void run();
};
```


## Simple Agents Example: ReverserAgent::run

```cpp
void ReverserAgent::run() {
    for (;;) {
        // Take the next reversed string produced by the transformer.
        string s = receive(reverser);
        if (s == "pots") {   // "stop" reversed: shut the agent down
            done();
            return;
        }
        cout << "Received message : " << s << endl;
    }
}
```

## Simple Agents Example: Sending messages

```cpp
int main()
{
    ReverserAgent reverseAgent;
    reverseAgent.start();

    for (;;) {
        string s;
        cin >> s;
        send(reverseAgent.inputBuffer, s);
        if (s == "stop")
            break;
    }

    agent::wait(&reverseAgent);
}
```

## CONCURRENT CONTAINERS

## Concurrent Containers

Thread-safe, lock-free containers provided:

- `concurrent_vector<T>`: lock-free `push_back`, element access, and iteration; no deletion!
- `concurrent_queue<T>`: lock-free push and pop
- `concurrent_unordered_map<T,U>`
- `concurrent_set<T>`

## `concurrent_vector<T>`

```cpp
#include <ppl.h>
#include <concurrent_vector.h>
using namespace Concurrency;

concurrent_vector<int> carmVec;

parallel_for(2, 5000000, [&carmVec](int i) {
    if (is_carmichael(i))
        carmVec.push_back(i);
});
```

## `concurrent_queue<T>`

```cpp
#include <ppl.h>
#include <concurrent_queue.h>
using namespace Concurrency;

concurrent_queue<int> itemQueue;

parallel_invoke(
    [&itemQueue] { // Produce 1000 items
        for (int i = 0; i < 1000; ++i)
            itemQueue.push(i);
    },
    [&itemQueue] { // Consume 1000 items
        for (int i = 0; i < 1000; ++i) {
            int result = -1;
            while (!itemQueue.try_pop(result))
                Context::Yield();
            ProcessItem(result);
        }
    });
```

## Take-aways

- The "many core shift" is happening.
- VS2010 with the Concurrency Runtime can help.
- Use PPL & Agents to express your *potential* concurrency.
- Let the runtime figure out the *actual* concurrency.
- Asynchronous agents provide isolation from shared state.
- Concurrent collections are scalable and lock-free.

## Summary

- Dev10 brings parallel computing to the mainstream:
  - Managed code: TPL and PLINQ
  - Native code: PPL and asynchronous agents
  - Debugging and profiling
- More work ongoing in the Parallel Computing Platform team:
  - TPL Dataflow
  - Visual Studio Async
  - CTP available now!

## References

- Northeast: Parallel Programming with .NET 4, by Chris Bowen (http://channel9.msdn.com/Blogs/dpeeast/Northeast--Parallel-Programming-with-NET-4)
- Parallel Programming for Managed Developers with the Next Version of MS VS (VS2010), by Daniel Moth, at PDC 2008 (http://channel9.msdn.com/blogs/pdc2008/tl26)