Parallel Programming in .NET


Kevin Luty

Department of Software Engineering

University of Wisconsin



As hardware trends change from increasing the clock speeds of processors to fitting multiple central processing units, or cores, into one processor, developers now have to take into account the efficiency of their code. Software engineers will have to understand the design patterns and tools available to them in order to create fast, responsive applications that continuously appease customers on the widely accepted business platform, Microsoft Windows. The most recent framework released by Microsoft has made parallel programming easier to design and code than ever before. To take advantage of it, the software engineer will have to understand the difference between data parallelism and task parallelism, and when to apply the appropriate software design to a given problem. This discussion will define parallelism, differentiate parallel loops from parallel tasks, discuss how and when to use the appropriate design patterns, and briefly describe supporting frameworks and tools for parallel programming.


History of Parallel Hardware

In the late 1960's and 1970's, computer scientists, with the help of hardware advances, made it possible to implement parallel computing in supercomputers. During the 1980's, continuous development allowed scientists to build a supercomputer using 64 Intel 8086/8087 microprocessors. This proved that extreme performance could be obtained using mass-market chipsets, or massively parallel processors (MPPs), so research and development efforts would continue.

In the 1980's, clusters came about to replace applications built using MPPs. A cluster is essentially a parallel computing machine built from many off-the-shelf computers connected by a network. Modern day clusters are now the dominating architecture of data centers around the world.

As clock speeds have increased due to the decrease in size of transistors, the number of cores in a processor is now the main focus in developing today's processors. The reason for this shift is that the efficiency benefits gained in creating multicore processors outweigh the costs of further increasing the clock speeds of the processors. So, because of increasing restrictions and standards on energy efficiency for electronic devices, developing multicore processors will remain the primary focus.


History of Parallel Software

In the early ages of parallel computing, many computers were shipped with a single-core, single-processor architecture. Because of this, sequential programs have always been easier to write than parallel programs, as they still are today. Now that the number of cores in processors has been increasing, it is the job of software architects to recognize the changes in the environment and adapt to them.

As the number of cores in a processor started to increase quickly, the lack of APIs (application programming interfaces that let the developer reuse already developed code, in this case, parallel programming code) supporting parallel programming made it hard for developers to create parallel applications. The reason for this was that the industry had not created standards for the parallel architectures. In the 1990's standards began to emerge for concurrent programming; by the year 2000, Message Passing Interface (MPI), POSIX threads (pthreads), and Open Multiprocessing (OpenMP) had all been created to help software developers become successful parallel program developers.

Most recently, the newest libraries on the scene of Microsoft Windows parallel programming arrived with the release of .NET Framework 4. Included in this release were the Parallel Patterns Library (PPL), the Task Parallel Library (TPL), and Parallel Language-Integrated Query (PLINQ), which are most commonly used with C++ and C#. These libraries have made it easier to put design patterns, discussed later, into practice.

Benefits of Understanding Parallel Programming

As multicore devices become more prominent, an understanding of parallel programming allows a software developer to become a much more powerful and needed resource in the workplace.

The main benefit of using parallel programming is to make efficient use of the cores in a processor. Many software developers still write sequential programs because they have no knowledge of parallel programming, so they are not making use of the other cores. For instance, if a program is to initialize an array of size 1,000,000, the software will loop through that many times. Using the TPL, a developer can use a parallel loop that will automatically create multiple tasks, on separate threads, dividing the initialization process evenly among cores. Dividing the work among the cores means a significant decrease in the amount of time it takes to complete the task. This, however, is a simple application of parallel programming; later discussion will cover how and when to apply parallel programming techniques.
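The array-initialization scenario above can be sketched with the TPL's Parallel.For; the array size and the per-element work here are illustrative placeholders:

```csharp
using System;
using System.Threading.Tasks;

class ArrayInitSketch
{
    static void Main()
    {
        double[] data = new double[1000000];

        // Parallel.For partitions the index range among tasks on
        // separate threads, so each core initializes a slice of the array.
        Parallel.For(0, data.Length, i =>
        {
            data[i] = Math.Sqrt(i); // each element is independent
        });

        Console.WriteLine(data[4]); // prints 2
    }
}
```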


In addition, parallel programming in .NET was written to automatically handle the hardware's capabilities. This means that whether there is one core or many cores, a parallel program in .NET will handle every situation and will make use of the hardware when it is readily available. In other words, if a parallel program is run on a processor with one core, it will run the same as a sequential program. Since .NET handles this, it takes a great amount of pressure off of the developer, allowing him or her to focus strictly on the software instead of worrying about how the hardware handles the code.

Another benefit is that understanding parallel programming allows the software developer to successfully debug their software using tools available through the organizations that provide parallel programming libraries. In .NET, Visual Studio 2010 allows the user to run the Performance Profiler. The Performance Profiler outputs visual representations of concurrency, CPU usage, and other useful information that allows the developer to implement more efficient software.

The number of benefits is endless; however, it is important to address the most significant ones. Later in this discussion it will be easy to pick out numerous advantages of understanding parallel programming.

Parallel Programming in .NET Defined

Although it is easy to see the benefits of parallel programming from the reading above, there is a whole new aspect to software design patterns and practices and when they should be applied. This discussion will first define data parallelism versus task parallelism, then extend into the programming techniques used for each type, as illustrated in Figure 1.


Figure 1: Parallel programming design patterns for each type of parallelism


Identifying Data Parallelism and Task Parallelism

Data Parallelism

Data parallelism is the process of making changes to many data elements in a set simultaneously. Traditionally, data parallelism can be understood as performing the same operation on a set of data elements at the same time. For instance, if an array of 50 strings needed to be reversed, it is reasonable to use parallel programming techniques to call the "stringName.Reverse()" function, because all of the data elements are independent of each other. On the opposite side of the spectrum, it would be useless to try to concatenate pairs of strings in the array, because that eliminates the independence of the data set.
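As a minimal sketch of the string-reversal example (the array contents are illustrative), the same operation is applied to every element and no element depends on another:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class ReverseSketch
{
    static void Main()
    {
        string[] words = { "parallel", "data" }; // stands in for the 50-string array

        // Same operation applied to every independent element.
        Parallel.For(0, words.Length, i =>
        {
            words[i] = new string(words[i].Reverse().ToArray());
        });

        Console.WriteLine(string.Join(", ", words)); // lellarap, atad
    }
}
```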

Task Parallelism

Task parallelism is the process of executing a number of tasks concurrently. Since tasks run on separate threads, we can use task parallelism to complete operations in parallel and then later join the tasks to work together, using a "fork/join" structure to achieve an overall result. In general, task parallelism is much more difficult to design for due to the extensive scheduling that needs to take place when executing tasks in parallel; however, it can be done.
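A minimal fork/join sketch using the TPL (the computations are placeholders): two tasks are forked onto separate threads, then joined before their results are combined.

```csharp
using System;
using System.Threading.Tasks;

class ForkJoinSketch
{
    static void Main()
    {
        // Fork: two independent operations run concurrently.
        Task<int> sum     = Task.Factory.StartNew(() => 2 + 3);
        Task<int> product = Task.Factory.StartNew(() => 2 * 3);

        // Join: block until both tasks finish, then combine their results.
        Task.WaitAll(sum, product);
        Console.WriteLine(sum.Result + product.Result); // prints 11
    }
}
```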


Design Patterns

The most important part of choosing the correct design to use when implementing a parallel program is taking into account the potential parallelism. Potential parallelism is the notion of correctly identifying when it is acceptable to use parallel programming techniques so the software runs faster when hardware is readily available. The design patterns covered in this discussion are parallel loops, parallel tasks, parallel aggregation, futures, and pipelines.

Parallel Loops

The most important part of the Parallel Loops pattern is to make absolutely certain that the elements of the data set being operated on are independent of each other; in other words, the steps of a loop must not change shared variables or another iteration's data element. Additionally, a developer should identify the problem at hand, and the opportunities in it, before implementing parallel loops. The two types of parallel loops are the parallel for loop and the parallel foreach loop.

Due to the simplicity of converting a for loop into a parallel for loop, a common misunderstanding is that they perform the same. The only guarantee parallel loops make is that all data elements will be processed by the end of the loop, meaning loops that have a loop-body dependency will fail. The most prevalent case of this is when a developer tries to create a loop that sums up the total value of an array. The best way to identify a loop-body dependency is to check whether a variable declared outside the scope of the loop is modified inside it. Lastly, it is also safe to assume that loops with a step size other than one are data dependent.
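The summing pitfall described above can be sketched as follows; the shared variable `total` is the loop-body dependency that makes a naive parallel conversion unsafe:

```csharp
using System;
using System.Threading.Tasks;

class DependencySketch
{
    static void Main()
    {
        int[] values = { 1, 2, 3, 4 };
        int total = 0;

        // Sequential version: correct, but it carries a loop-body dependency
        // because "total" is declared outside the loop and updated inside it.
        for (int i = 0; i < values.Length; i++)
            total += values[i];
        Console.WriteLine(total); // prints 10

        // Naive parallel conversion: UNSAFE. Concurrent "+=" on the shared
        // variable can lose updates, so the result is unpredictable.
        // Parallel.For(0, values.Length, i => total += values[i]);
    }
}
```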


A benefit of using parallel loops is the usefulness of the exception handling that comes with them. Exception handling as used in sequential code can be used the same way in parallel programming, with one difference: when an exception is thrown, it becomes part of a set of exceptions, and this set is of type "System.AggregateException." Within this set of exceptions it is easy to see what exceptions have occurred during the loop; additionally, it will show the developer what operation was being executed at the time of the exception.
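A sketch of the behavior described above (the thrown exception and loop bounds are illustrative): exceptions from loop iterations surface in a single System.AggregateException.

```csharp
using System;
using System.Threading.Tasks;

class LoopExceptionSketch
{
    static void Main()
    {
        try
        {
            Parallel.For(0, 50, i =>
            {
                if (i == 0)
                    throw new InvalidOperationException("bad element " + i);
            });
        }
        catch (AggregateException ae)
        {
            // Every exception thrown by the loop body arrives in one set.
            foreach (Exception e in ae.InnerExceptions)
                Console.WriteLine(e.Message);
        }
    }
}
```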



As an added bonus, parallel loops also come with a safety harness. Although parallel loops are partition focused, meaning they split up iterations among cores, they still communicate at the system level. For instance, if a parallel loop were to throw an exception in the first iteration of 50, the TPL will halt all other iterations on each core before the software becomes overwhelmed with exception handling. This is particularly good for loops with a large range to iterate through.


An added advantage of using the Parallel Loops design pattern is the ability to customize the performance of the loops. Microsoft has now made it easy enough to change a few fields to increase or decrease the performance of our loops. This would be useful in making products more valuable, assuming that the customer has the hardware to support the implemented parallelism. The TPL exposes options, namely through the ParallelOptions parameter of the parallel loops, such as MaxDegreeOfParallelism, which allows the developer to determine how many cores the software runs on. Making use of these options would allow the business to throttle the speed of the software, ultimately increasing profits so long as the customer is willing to pay.
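The throttling described above can be sketched with the TPL's ParallelOptions; here the loop is capped at two concurrent iterations regardless of how many cores exist (the loop body is a placeholder):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ThrottleSketch
{
    static void Main()
    {
        ParallelOptions options = new ParallelOptions
        {
            // At most two iterations run at once.
            MaxDegreeOfParallelism = 2
        };

        Parallel.For(0, 8, options, i =>
        {
            Console.WriteLine("iteration {0} on thread {1}",
                i, Thread.CurrentThread.ManagedThreadId);
        });
    }
}
```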

One last common problem with parallel loops is oversubscription and undersubscription. Oversubscription occurs when there are too many threads for the number of logical cores, so the tasks on the threads take longer than normal to run. Simply put, if there are eight threads created with only four cores available, the cores have more threads subscribed to them than they can take care of. On the other hand, undersubscription is when cores aren't being used even though they are free to work. So, if there are four threads created with four cores available, and the developer sets the degree of parallelism to two, that would mean two cores are doing all of the work when it could be split up evenly among the four cores. In short, the optimum number of threads for a parallel loop is equal to the number of logical cores divided by the average fraction of core utilization per task.

Figure 2 represents this calculation for a processor with four cores where each task uses 10% of a single core's resources: 4 cores divided by 0.10 utilization per task gives 40, the optimum number of threads to run.

Figure 2: Calculation to find the optimum number of threads per processor


Parallel Aggregation

The Parallel Aggregation design pattern is somewhat similar to Parallel Loops. Parallel Aggregation, or the Map/Reduce pattern, is designed specifically for cases like the "computing sum" example discussed in Parallel Loops. The key difference is that in Parallel Aggregation the partial sums are computed using unshared, local variables. In short, Parallel Aggregation takes the input from multiple elements and combines it into a single output.

An example that uses Parallel Aggregation would be: a developer is given N arrays, each containing similar data elements. The developer must accumulate the subtotal of each array and add all of the subtotals together, resulting in one final total. Since there are multiple inputs and one final output, it is obvious we will want to use the Parallel Aggregation pattern. Using the parallel for or parallel foreach loop with the added PLINQ merge command will allow the developer to get the proper result.
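The N-array subtotal example can be sketched with PLINQ (the array contents are illustrative): each subtotal is computed independently, then the subtotals are merged into one final total.

```csharp
using System;
using System.Linq;

class AggregationSketch
{
    static void Main()
    {
        int[][] arrays =
        {
            new[] { 1, 2, 3 },
            new[] { 4, 5, 6 },
            new[] { 7, 8, 9 }
        };

        // Map: subtotal each array with unshared, local state.
        // Reduce: merge the subtotals into a single output.
        int total = arrays.AsParallel()
                          .Select(a => a.Sum())
                          .Sum();

        Console.WriteLine(total); // prints 45
    }
}
```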

Parallel Aggregation is most effective when using PLINQ. Although the details of PLINQ syntax are out of the scope of this discussion, it is a helpful library to learn in order to cut costs while parallel programming.

Parallel Tasks

The Parallel Tasks design pattern has made asynchronous programming much easier due to the intricate design of the Task class. Now, in .NET it is easy to create new threads to complete tasks using the Task class, and this allows software developers to write asynchronous code. Using these resources allows Windows to use its built-in scheduling to automatically handle threads on different cores, thus increasing the speed of the software.

It is helpful to remember what a task is. A task is a single operation that can run asynchronously, on a different thread, without any noticeable changes happening in the software that creates it. With that being said, the Parallel Tasks pattern can now be applied to an example.

A situation where the Parallel Tasks pattern should be used would be an application where multiple operations should run concurrently. A prime example would be implementing a chart that trends data being read asynchronously from an external I/O card. Assuming the chart is tracking multiple variables (frequency, voltage, current, etc.) and has a separate axis for each variable, it should update all the information as fast as it can, at the same time, for each variable. To do this, it would be required to use the Task class for each task that needs to be started. In this case, the Tasks would be to read a value for each variable that is going to be trended. Starting a Task for each read inside a while loop, followed by a call that waits for all of the Tasks, would create a new thread for each Task, execute the communication to retrieve each value, and then wait for all of the Tasks to complete. When that is done, the values that result from the completed Tasks can be used to update the charts, and the while loop can be executed again.
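A stripped-down sketch of the trending example, assuming hypothetical ReadFrequency/ReadVoltage helpers standing in for the I/O card reads:

```csharp
using System;
using System.Threading.Tasks;

class TrendSketch
{
    // Hypothetical stand-ins for reads from the external I/O card.
    static double ReadFrequency() { return 60.0; }
    static double ReadVoltage()   { return 120.0; }

    static void Main()
    {
        // One Task per trended variable, each on its own thread.
        Task<double> freq = Task.Factory.StartNew(ReadFrequency);
        Task<double> volt = Task.Factory.StartNew(ReadVoltage);

        // Wait for every read to finish before updating the chart.
        Task.WaitAll(freq, volt);
        Console.WriteLine("f={0} Hz, V={1} V", freq.Result, volt.Result);
    }
}
```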

Like the Parallel Loops design pattern, the TPL handles exceptions for the Task class the same way. The added feature is that the developer can now use the contents of the AggregateException set to determine where specifically in the code the exception is occurring. This is important because the AggregateException holds key information, like which thread had thrown the exception, along with any exceptions that occurred in the function calls inside the Task.


Futures

Futures is a design pattern that can be compared with household activities, for example, "while brushing teeth, put slippers on and let the dog out." A Futures software design is based on how the developer forks the flow of control in a program. The fork in the previous example would be at brushing teeth, and it would fork into two other tasks: putting slippers on and letting the dog out. In the end, it will result in one overall outcome: the dog will be let out, teeth will be brushed, and the slippers will be put on.

To better understand the analogy, the tasks in the Futures pattern are also described as continuation tasks. The name, Futures, means that a task can be started on a separate thread while the software continues to run. Then, another function can use the Future's result as a parameter. If the task has not yet completed and returned the promised, literally "future," result, the function will wait for the task to finish; otherwise, it will begin immediately because the task has already returned the result.


Figure 3 helps to identify when a Future design could be ported from sequential code. When there is code that depends on the previous result of a task, implementing a Future design would be a great idea. In Figure 3, it is noticeable that one variable cannot be computed until an earlier result is computed.

Figure 3: Sequential code.

Figure 4 is the parallel version of Figure 3. The future task calculates in parallel with c and d, and its result can later be used to calculate the final value. The advantage of using the Future is to prevent the user from having to poll to check whether the task is done running. If the future has not finished, the TPL knows to automatically wait for the result before the dependent value is calculated. This demonstrates the powerful potential of the TPL, because the .NET framework handles these cases automatically, whereas years ago everything would have had to be done by the developer.

Figure 4: Parallel code implementation of a Future design code snippet.
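In the spirit of Figure 4, a minimal Futures sketch (the variable names and arithmetic are illustrative): the future runs while the main thread computes c and d, and reading its result waits only if needed.

```csharp
using System;
using System.Threading.Tasks;

class FutureSketch
{
    static void Main()
    {
        int a = 2;

        // Fork: b is computed on another thread.
        Task<int> futureB = Task.Factory.StartNew(() => a * 10);

        // Meanwhile the main thread keeps working.
        int c = a + 1;
        int d = c + 1;

        // Reading .Result blocks only if the future has not finished yet.
        int f = futureB.Result + d;
        Console.WriteLine(f); // prints 24
    }
}
```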


Pipelines

The pipeline design is used when there is a specific process of tasks that will be completed in order every time. For instance, consider the process of preparing a bowl of cereal. One possible process would be: get bowl, open cereal container, pour cereal into bowl, open milk, and pour milk into cereal. Following this process every time will create the same expected outcome, and because of this the software can be designed using the pipeline design.

The pipeline design will usually use the collection class called BlockingCollection. With this type, the developer has the ability to limit the capacity of items, or tasks, in the collection and has a degree of control over how fast tasks are processed. Since BlockingCollection belongs to .NET's concurrent collections, it can automatically release and accept tasks that are to be removed or added.

Applying Figure 5 to a structure similar to what was mentioned above helps visualize the usefulness of BlockingCollection. BlockingCollections can be thought of as queues. When an object is added to the collection, the collection can run the task for that object automatically, and when it is done, it will release it to the next blocking collection. What this does is eliminate the need to poll the thread to see if it is done with the task. This significantly increases the efficiency of a core, ultimately creating faster response times.


Figure 5: BlockingCollections acting as buffers for a pipeline design
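A two-stage sketch of the pipeline in Figure 5, using BlockingCollection as the bounded buffer between stages (the capacity and items are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PipelineSketch
{
    static void Main()
    {
        // Bounded buffer between the two stages.
        BlockingCollection<int> buffer = new BlockingCollection<int>(10);

        Task stage1 = Task.Factory.StartNew(() =>
        {
            for (int i = 1; i <= 5; i++)
                buffer.Add(i * i);      // stage 1: produce items
            buffer.CompleteAdding();    // tell the next stage we're done
        });

        Task stage2 = Task.Factory.StartNew(() =>
        {
            // Blocks while the buffer is empty; exits when adding completes.
            foreach (int item in buffer.GetConsumingEnumerable())
                Console.WriteLine(item);
        });

        Task.WaitAll(stage1, stage2);
    }
}
```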



Supportive Tools

A swift Google search for "Performance Profiler" will turn up a list of 3rd-party profilers available for multiple operating systems; however, the best profiler for the .NET developer is the native tool built into Visual Studio 2010.

Visual Studio 2010

Visual Studio 2010 Ultimate and Premium include the tools to analyze .NET software written both sequentially and in parallel. This tool is called the Performance Profiler. Here a profile can be created to see thread contentions during the runtime of software. The output displays a chart as seen in Figure 6.

Figure 6: Output from the thread contention tool in VS2010


Using a tool like this allows the developer to look at each thread created by the software and, by using the x-axis as a time reference, see if any threads are in contention. Furthermore, the developer can use the zoom/pan options to fit a given window and determine how long threads are in contention. The tool will also say specifically which and how many threads are in contention.
This tool is important for debugging purposes because it gets down to the nitty-gritty of the software. With this tool, the developer can not only see thread contentions, but also tell which pieces of code within the thread are causing the problems. This is a significant factor in determining whether the developer is correctly using parallel programming techniques.

Supportive Libraries

Although they are not required, Microsoft has introduced libraries that have made parallel programming in .NET easier to do and faster to write. The Rx and PLINQ libraries will be briefed in more detail to give a conceptual understanding.


PLINQ

Parallel Language-Integrated Query, or PLINQ, is an integrated query language that was built with the same intentions as LINQ, which was introduced with .NET Framework 3.5. The use of Parallel LINQ makes it easy for the software developer to retrieve a collection of objects with syntax similar to that used in database queries. Although it is not required to use PLINQ to implement a powerful parallel program, it does have advantages.

The most advantageous aspect of PLINQ is the fact that a developer can write a query to retrieve custom objects. With over 200 different extensions, PLINQ, used with the correct syntax at the right time, can make or break the speed of the software. For example, it would be more efficient to write a loop for a small set of known data than to write a query statement; it simply goes faster. There is no set way to calculate whether your query is faster than your loop without physically timing it; however, PLINQ will also make use of multiple cores when executing the query. In short, it is up to the discretion of the developer to make correct use of PLINQ.
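A minimal PLINQ sketch (the data set and query are illustrative): AsParallel() opts a LINQ query into parallel execution across cores.

```csharp
using System;
using System.Linq;

class PlinqSketch
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1000).ToArray();

        // AsParallel() partitions the source so the Where/Select
        // stages run on multiple cores.
        int[] evenSquares = numbers.AsParallel()
                                   .Where(n => n % 2 == 0)
                                   .Select(n => n * n)
                                   .ToArray();

        Console.WriteLine(evenSquares.Length); // prints 500
    }
}
```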


Reactive Extensions

Reactive Extensions, or Rx, is a highly supportive library used by some parallel programmers. Introduced in .NET Framework 3.5 SP1, it provides additional LINQ to Objects queries to the developer. The main role of Rx in the parallel programming world is not only to make use of the additional queries, but also to implement the push or pull model in parallel programs. For example, Rx could be implemented to push the results of one task, Task A, to another task, Task B. An example of the pull method would be: Task B implements a subscriber that watches for data to be ready in Task A and pulls it from Task A when it has a valid result. To remind the reader, Reactive Extensions would be highly valuable when implementing the Futures parallel design pattern.
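A minimal sketch of the push model, assuming the Rx library is referenced (Subject&lt;int&gt; here plays the role of Task A's output stream, and the pushed value is illustrative):

```csharp
using System;
using System.Reactive.Subjects; // requires the Rx (Reactive Extensions) library

class RxPushSketch
{
    static void Main()
    {
        // "Task A" pushes results through the subject;
        // "Task B" subscribes and reacts as values arrive.
        Subject<int> results = new Subject<int>();
        results.Subscribe(value => Console.WriteLine("received " + value));

        results.OnNext(42);     // push a result to every subscriber
        results.OnCompleted();  // signal that no more results are coming
    }
}
```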


Conclusion

Parallel computing has been around for decades, and not until recently has it been made easy to implement parallel software. With the most recent release of .NET Framework 4, Microsoft has eased the pressure put on software developers to take into account the way their software uses the hardware. Due to the research put into parallel programming in .NET, software engineers can now implement design patterns like Parallel Loops, Parallel Tasks, and Futures to successfully develop a software application. From old parallel libraries to new, it is always important to remember that timely response is a golden rule of software design; and parallel programming is now in the eye of the developer.



