Task Parallelism using Agent Library


Worawan Diaz Carballo, PhD

Department of Computer Science
Faculty of Science and Technology
Thammasat University

Outline

• Parallel Pattern Library (cont.)
• Recap: Task-based Programming Models
• Asynchronous Agents Library
• Concurrent Containers
• Demo and laboratory


Chapter Objectives

After finishing this module, students should:

o understand the core concepts of the task-based parallel programming model (task parallelism)
o identify the suitable programming model (data or task parallelism) for various problems
o be able to develop task parallel applications using the Asynchronous Agents Library
o understand the concepts of concurrent containers and be able to use appropriate containers for developing a parallel application.





PARALLEL PATTERN LIBRARY (Cont.)

http://msdn.microsoft.com/en-us/library/dd728066.aspx

#include <ppl.h>
using namespace Concurrency;



Visual C++ 10 Parallel Algorithms

• parallel_for(x, y, step, λ);
• parallel_for_each(it, λ)
• parallel_invoke(λ, λ)
• task_group / task_handle
• task_group::run(λ)
• Native concurrency runtime
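
The snippet below is a minimal sketch (not from the original slides) of how the algorithms listed above are invoked; it assumes Visual C++ 2010 or later with the Concurrency Runtime headers available.

#include <ppl.h>
#include <iostream>
#include <vector>
using namespace Concurrency;

int main()
{
    // parallel_for(x, y, step, λ): iterate [0, 10) with step 2; iterations may run concurrently.
    parallel_for(0, 10, 2, [](int i) { std::cout << i << ' '; });

    // parallel_for_each(it, λ): apply a lambda to every element of a range.
    std::vector<int> v = { 1, 2, 3, 4 };
    parallel_for_each(v.begin(), v.end(), [](int& x) { x *= 2; });

    // parallel_invoke(λ, λ): run independent work items concurrently.
    parallel_invoke(
        [] { std::cout << "task A" << std::endl; },
        [] { std::cout << "task B" << std::endl; });

    // task_group::run(λ): schedule tasks explicitly, then wait for all of them.
    task_group tg;
    tg.run([] { std::cout << "task C" << std::endl; });
    tg.wait();

    return 0;
}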



Example: Bitonic Sort

• A bitonic sorting network sorts n elements (of a bitonic sequence) in Θ(log₂ n).
• A bitonic sequence ⟨a₀, a₁, …, aₙ₋₁⟩ is a sequence with the property that either
  1. there exists an index i, 0 ≤ i ≤ n − 1, such that ⟨a₀, …, aᵢ⟩ is monotonically increasing and ⟨aᵢ₊₁, …, aₙ₋₁⟩ is monotonically decreasing, or
  2. there exists a cyclic shift of indices for which (1) is satisfied.
• For example, ⟨1, 2, 4, 7, 6, 0⟩ and ⟨8, 9, 2, 1, 0, 4⟩ are bitonic sequences.


Example of log 16 bitonic splits

Original sequence:
3 5 8 9 10 12 14 20 95 90 60 40 35 23 18 0

Bitonic split & merge:
3 5 8 9 10 12 14 0 | 95 90 60 40 35 23 18 20
3 5 8 0 | 10 12 14 9 | 35 23 18 20 | 95 90 60 40
3 0 | 8 5 | 10 9 | 14 12 | 18 20 | 35 23 | 60 40 | 95 90
0 3 5 8 9 10 12 14 18 20 23 35 40 60 90 95
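
To make the worked example above concrete, here is a minimal serial sketch of the split-and-merge step; compare_and_swap, bitonic_merge, and the boolean direction flags are illustrative helpers introduced here, not names taken from the slides.

#include <algorithm>

// Sort directions, used again by the parallel version later.
const bool INCREASING = true;
const bool DECREASING = false;

// Put items[i] and items[j] into the requested order.
void compare_and_swap(int items[], int i, int j, bool increasing)
{
    if ((items[i] > items[j]) == increasing)
        std::swap(items[i], items[j]);
}

// Sort a bitonic sub-sequence of length n (a power of two) starting at lo:
// one bitonic split, then merge both halves recursively, as in the table above.
void bitonic_merge(int items[], int lo, int n, bool increasing)
{
    if (n <= 1)
        return;
    int m = n / 2;
    for (int i = lo; i < lo + m; ++i)
        compare_and_swap(items, i, i + m, increasing);
    bitonic_merge(items, lo, m, increasing);
    bitonic_merge(items, lo + m, m, increasing);
}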




Example: Parallel Bitonic Sort

• Create a new function, called parallel_bitonic_merge, which uses the parallel_invoke algorithm to merge the sequences in parallel when there is a sufficient amount of work to do. Otherwise, call bitonic_merge to merge the sequences serially. (A minimal sketch follows this list.)
• Perform a process that resembles the one in the previous step, but for the bitonic_sort function.
• Create an overloaded version of the parallel_bitonic_sort function that sorts the array in increasing order.
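
As referenced in the first step above, a minimal sketch of parallel_bitonic_merge could look like the following; it reuses the compare_and_swap and bitonic_merge helpers from the earlier serial sketch, and the threshold value is purely illustrative.

#include <ppl.h>
using namespace Concurrency;

void parallel_bitonic_merge(int items[], int lo, int n, bool increasing)
{
    const int threshold = 500;   // below this size, serial merging is cheaper than spawning tasks
    if (n > threshold)
    {
        int m = n / 2;
        for (int i = lo; i < lo + m; ++i)
            compare_and_swap(items, i, i + m, increasing);

        // Merge both halves in parallel; with two lambdas, parallel_invoke runs
        // the second one on the calling context (see the next slide).
        parallel_invoke(
            [=] { parallel_bitonic_merge(items, lo, m, increasing); },
            [=] { parallel_bitonic_merge(items, lo + m, m, increasing); });
    }
    else
    {
        bitonic_merge(items, lo, n, increasing);
    }
}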


Parallel_invoke Wrapped Up

• The parallel_invoke algorithm reduces overhead by performing the last of the series of tasks on the calling context.
• For example, in the parallel_bitonic_sort function,
  o the first task runs on a separate context,
  o and the second task runs on the calling context.


// Sort the partitions in parallel.
parallel_invoke(
    [&items, lo, m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
    [&items, lo, m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); });

RECAP: TASK-BASED PROGRAMMING MODELS

Steve Teixeira


Load-Balancing of Tasks

[Diagram: tasks distributed dynamically across CPU0–CPU3]

Dynamic scheduling improves performance by distributing work efficiently at runtime.

source: http://az8714.vo.msecnd.net/presentations/FT52-Teixeira.pptx

Work-Stealing Scheduler

[Diagram: the program thread submits tasks to a lock-free global queue; worker threads 1..p each own a local work-stealing queue of tasks]

Thread Management:
• Starvation Detection
• Idle Thread Retirement
• Hill-climbing

Messaging & Dataflow patterns

• We are all trained to think like machines, in terms of a sequential flow of operations on data.
• Tasks are better than threads because tasks enable you to think in terms of chunks of work rather than execution flow.
• Tasks, however, still require coordination of state around shared data.
  o And complexity increases with the size of the code base.
• Writing to actor-message or dataflow patterns enables you to design around data flow and avoid shared state.
  o Value grows as system size and parallelism scale up.

ASYNCHRONOUS AGENTS LIBRARY

Data Flow & Message Passing using Visual C++ 10


Messaging and Agents

• Not all patterns map to loops or tasks.
  o Pipelines, state machines, producer/consumer
• Agent: an asynchronous object that communicates through message passing.
• Message Blocks: participants in message passing that transport messages from source to target.
• Message: encapsulates state that is transferred between message blocks.

Asynchronous Agents Library

• Message blocks for storing data (core message blocks):
  o unbounded_buffer<T>
  o overwrite_buffer<T>
  o single_assignment<T>
• Message blocks for pipelining (execute a function asynchronously when work is received):
  o transformer<T,U>
  o call<T>
• Message blocks for joining data (waiting efficiently on a set of message blocks):
  o choice
  o join
• Cooperatively send & receive messages:
  o send, asend
  o receive
  o try_receive
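
The short sketch below (not from the slides) wires a few of these blocks together: an unbounded_buffer for storing data, a transformer for pipelining, and send/receive for cooperative message passing.

#include <agents.h>
#include <iostream>
using namespace Concurrency;

int main()
{
    unbounded_buffer<int> input;                                  // stores incoming messages
    transformer<int, int> doubler([](int x) { return 2 * x; });   // transforms each message
    unbounded_buffer<int> output;

    // Build a small pipeline: input -> doubler -> output.
    input.link_target(&doubler);
    doubler.link_target(&output);

    for (int i = 1; i <= 3; ++i)
        send(input, i);                              // blocks until the message is accepted

    for (int i = 0; i < 3; ++i)
        std::cout << receive(output) << std::endl;   // prints 2, 4, 6

    return 0;
}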

Simple Agents Example

[Diagram: send "glorp" → unbounded_buffer → propagate "glorp" → transformer (reverse) → propagate "prolg" → receive]

Simple Agents Example: ReverserAgent

class ReverserAgent : public Concurrency::agent
{
private:
    transformer<string, string> reverser;

public:
    unbounded_buffer<string> inputBuffer;

    ReverserAgent()
        : reverser([] (string in) -> string {
              string reversed(in);
              reverse(reversed.begin(), reversed.end());
              return reversed;
          })
    {
        inputBuffer.link_target(&reverser);
    }

protected:
    virtual void run();
};


Simple Agents Example: ReverserAgent::run

void ReverserAgent::run() {
    for (;;) {
        string s = receive(&reverser);
        if (s == "pots") {
            done();
            return;
        }
        cout << "Received message : " << s << endl;
    }
}

Simple Agents Example: Sending messages

void main()
{
    ReverserAgent reverseAgent;
    reverseAgent.start();

    for (;;) {
        string s;
        cin >> s;
        send(reverseAgent.inputBuffer, s);
        if (s == "stop")
            break;
    }

    agent::wait(&reverseAgent);
}

CONCURRENT CONTAINERS


Concurrent Containers

• Two thread-safe, lock-free containers are provided:
  o concurrent_vector<T>:
    - Lock-free push_back, element access, and iteration
    - No deletion!
  o concurrent_queue<T>:
    - Lock-free push and pop
• The sample pack adds:
  o concurrent_unordered_map<T,U>
  o concurrent_set<T>

concurrent_vector<T>

#include <ppl.h>
#include <concurrent_vector.h>
using namespace Concurrency;

concurrent_vector<int> carmVec;

parallel_for(2, 5000000, [&carmVec](int i) {
    if (is_carmichael(i))
        carmVec.push_back(i);
});

concurrent_queue<T>

#include <ppl.h>
#include <concurrent_queue.h>
using namespace Concurrency;

concurrent_queue<int> itemQueue;

parallel_invoke(
    [&itemQueue] { // Produce 1000 items
        for (int i = 0; i < 1000; ++i)
            itemQueue.push(i);
    },
    [&itemQueue] { // Consume 1000 items
        for (int i = 0; i < 1000; ++i) {
            int result = -1;
            while (!itemQueue.try_pop(result))
                Context::Yield();
            ProcessItem(result);
        }
    });
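
For the sample-pack containers mentioned earlier, a minimal sketch might look like the following; note that the include shown is the header name used when concurrent_unordered_map later shipped with Visual Studio, so the exact header for the sample pack may differ.

#include <ppl.h>
#include <concurrent_unordered_map.h>
#include <utility>
using namespace Concurrency;

int main()
{
    concurrent_unordered_map<int, int> squares;

    // Insertion and lookup are concurrency-safe on this container; erasing is not.
    parallel_for(0, 1000, [&squares](int i) {
        squares.insert(std::make_pair(i, i * i));
    });

    return 0;
}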

Take-aways

• The "Many Core Shift" is happening
• VS2010 with the Concurrency Runtime can help
• Use PPL & Agents to express your potential concurrency
• Let the runtime figure out the actual concurrency
• Parallel iteration can help your application scale
• Asynchronous Agents provide isolation from shared state
• Concurrent collections are scalable and lock-free

Summary

• Dev10 brings parallel computing to the mainstream
  o Managed code: TPL and PLINQ
  o Native code: PPL and asynchronous agents
  o Debugging and profiling
• More work ongoing in the Parallel Computing Platform team
  o TPL Dataflow
  o Visual Studio Async
  o CTP available now!

Interesting Links

• Northeast Roadshow: Parallel Programming with .NET 4 by Chris Bowen (http://channel9.msdn.com/Blogs/dpeeast/Northeast-Roadshow-Parallel-Programming-with-NET-4)
• Parallel Programming for Managed Developers with the Next Version of MS VS (VS2010) by Daniel Moth at PDC 2008 (http://channel9.msdn.com/blogs/pdc2008/tl26)
• MSDN Northeast Roadshow (http://code.msdn.microsoft.com/northeast)
