Parallel Programming in .NET






Kevin Luty
Department of Software Engineering
University of Wisconsin-Platteville
E-mail: lutyk@uwplatt.edu



Abstract



As hardware trends shift from increasing the clock speeds of processors to fitting multiple central processing units, or cores, into one processor, developers must now take into account the efficiency of their code. Software engineers will have to understand the design patterns and tools available to them in order to create fast, responsive applications that continuously satisfy customers on the widely accepted business platform, Microsoft Windows. The most recent .NET Framework 4 released by Microsoft has made parallel programming easier to design and code than ever before. To take advantage of this, the software engineer will have to understand the difference between data parallelism and task parallelism, and when to apply the appropriate software design to a given problem. This discussion will define parallelism, differentiate parallel loops from parallel tasks, discuss how and when to use the appropriate design patterns, and briefly describe supporting frameworks and tools for parallel programming.





History



History of Parallel Hardware


In the late 1960s and 1970s, computer scientists, aided by advances in hardware architecture, made it possible to implement parallel computing in supercomputers. During the 1980s, continued development allowed scientists to build a supercomputer using 64 Intel 8086/8087 microprocessors. This proved that extreme performance could be obtained using mass-market chipsets, or massively parallel processors (MPPs), so research and development efforts would continue. [6]


In the 1980s, clusters came about to replace applications built using MPPs. A cluster is essentially a parallel computing machine built by connecting many off-the-shelf computers over a network. Modern-day clusters are now the dominant architecture of data centers around the world. [2]


As clock speeds have increased due to the shrinking size of transistors, the number of cores in a processor has become the main focus in developing today's processors. The reason for this shift of focus is that the efficiency benefits gained in creating multi-core processors outweigh the costs of further increasing processor clock speeds. Moreover, because of increasing restrictions and standards on energy efficiency for electronic devices, developing multi-core processors will remain the primary focus. [1]



History of Parallel Software


In the early ages of parallel computing, most computers shipped with a single-core, single processor. Because of this, sequential programs have always been easier to write than parallel programs, as they still are today. Now that the number of cores in processors has been increasing, it is the job of software architects to understand the changes in the environment and adapt to them.


As the number of cores in a processor started to increase quickly, the shortage of APIs (application programming interfaces, which allow the developer to reuse already developed code, in this case parallel programming code) that supported parallel programming made it hard for developers to create parallel applications.


The reason for this was that the industry had not created standards for parallel architectures [6]. In the 1990s, standards for concurrent programming began to emerge; by the year 2000, the Message Passing Interface (MPI), POSIX threads (pthreads), and Open Multiprocessing (OpenMP) had all been created to help software developers become successful parallel program developers. [1]


Most recently, the newest libraries on the scene of Microsoft Windows parallel programming arrived with the release of .NET Framework 4 and Visual Studio 2010: the Parallel Patterns Library (PPL) for C++, and, for C# and the other .NET languages, the Task Parallel Library (TPL) and Parallel Language-Integrated Query (PLINQ). These libraries have made it easier to put the design patterns discussed later into practice.



Benefits of Understanding Parallel Programming


As multi-core devices become more prominent among computing devices, an understanding of parallel programming allows a software developer to become a much more capable and needed resource in the workplace.


The main benefit of using parallel programming is making efficient use of the cores in a processor. Many software developers still write sequential programs because they have no knowledge of parallel programming, so they make no use of the other cores. For instance, if a program is to initialize an array of size 1,000,000, sequential software will loop through that many times on a single core. Using the TPL, a developer can instead use a parallel loop that automatically creates multiple tasks, on separate threads, dividing the initialization work evenly among the cores. Dividing the work among the cores means a significant decrease in the time it takes to complete the task. This, however, is a simple application of parallel programming; later discussion will cover when to apply parallel programming techniques.
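As a minimal sketch of the example above, a TPL parallel loop can replace the sequential initialization loop (the per-element computation here is illustrative):

```csharp
using System;
using System.Threading.Tasks;

class InitExample
{
    static void Main()
    {
        double[] data = new double[1000000];

        // Each index is independent, so iterations can run on any core.
        Parallel.For(0, data.Length, i =>
        {
            data[i] = Math.Sqrt(i);  // illustrative per-element work
        });

        Console.WriteLine("First: {0}, Last: {1}", data[0], data[data.Length - 1]);
    }
}
```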



In addition, parallel programming in .NET was written to automatically account for the hardware's capabilities. This means that whether there is one core or many, parallel programming in .NET will handle the situation and make use of the hardware when it is readily available. In other words, if a parallel program is run on a processor with one core, it will run much like a sequential program [1]. Since .NET handles this, it takes a great amount of pressure off the developer, allowing him or her to focus strictly on the software instead of worrying about how the hardware handles the code.


Another benefit of understanding parallel programming is that it allows the software developer to successfully debug their software using tools available from the organizations that provide parallel programming libraries. In .NET, Visual Studio 2010 allows the user to run the Performance Profiler, which outputs visual representations of concurrency problems, CPU usage, and other useful information that helps the developer implement more productive software.


The number of benefits is nearly endless; however, it is important to address the most significant ones. Later in this discussion it will be easy to pick out numerous reasons why one should understand parallel programming.



Parallel Programming in .NET Defined


Although it is easy to see the benefits of parallel programming from the reading above, there is a whole new aspect to software design patterns and practices and when they should be applied. This discussion will first define data parallelism versus task parallelism as design practices, then extend into the programming techniques used for each type, as illustrated in Figure 1.


Figure 1: Parallel programming design patterns for each type of parallelism [1].



Identifying Data Parallelism and Task Parallelism



Data Parallelism


Data parallelism is the process of making changes to many data elements in a set simultaneously [3]. Additionally, data parallelism can be understood as performing the same operation on a set of data elements at the same time [4]. For instance, if an array of 50 strings needs to be reversed, it is reasonable to use parallel programming techniques to call the "stringName.Reverse()" function on each element, because all of the data elements are independent of each other. On the opposite side of the spectrum, it would be useless to try to concatenate pairs of strings in the array in parallel, because doing so creates dependencies between elements and eliminates the independence of the data set.
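A brief sketch of that string-reversal example, assuming a small illustrative word list:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class DataParallelExample
{
    static void Main()
    {
        string[] words = { "parallel", "data", "task" };  // illustrative data

        // The same operation is applied to every independent element.
        Parallel.For(0, words.Length, i =>
        {
            words[i] = new string(words[i].Reverse().ToArray());
        });

        Console.WriteLine(string.Join(", ", words));
    }
}
```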



Task Parallelism


Task parallelism is the process of executing a number of tasks concurrently [4]. Since tasks run on separate threads, task parallelism can be used to complete operations in parallel and then link the tasks together, using "fork/join", to achieve an expected outcome [1]. In general, task parallelism is much more difficult to design for, due to the extensive scheduling that must take place when executing tasks in parallel; however, it can be done.




Design Patterns


The most important part of choosing the correct design to use when implementing a parallel program is taking into account the potential parallelism. Potential parallelism is the notion of correctly identifying when it is acceptable to use parallel programming techniques so the software runs faster when hardware is readily available [1]. The design patterns covered in this discussion are parallel loops, parallel tasks, parallel aggregation, futures, and pipelines.




Parallel Loops


The most important part of the Parallel Loops pattern is making absolutely certain that the elements of the data set being operated on are independent of each other; in other words, the steps of a loop must not change shared variables or another iteration's data element. Additionally, a developer should identify the problem at hand, or the opportunities, before implementing parallel loops. The two types of parallel loops are the parallel for-loop and the parallel for-each loop.


Because converting a for-loop into a parallel for-loop is so simple, a common misunderstanding is that the two perform the same. The only guarantee a parallel loop makes is that all data elements will have been processed by the end of the loop, meaning loops that have a loop-body dependency will fail. The most prevalent case of this is a developer trying to create a loop that sums up the total value of an array. The best way to identify a loop-body dependency is to check whether a variable declared outside the scope of the loop is modified inside it. Lastly, it is also safe to assume that loops with a step size other than one are data dependent. [1]
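A hedged sketch contrasting the two cases: the first loop parallelizes safely because each iteration touches only its own element, while the second shows the summing loop the text warns about, where the shared total variable is a loop-body dependency:

```csharp
using System;
using System.Threading.Tasks;

class LoopDependencyExample
{
    static void Main()
    {
        int[] values = new int[100000];

        // Safe: each iteration writes only its own element.
        Parallel.For(0, values.Length, i => { values[i] = i * 2; });

        // Unsafe: 'total' is shared loop-body state, so parallel
        // iterations race on it and the result is unpredictable.
        int total = 0;
        Parallel.For(0, values.Length, i => { total += values[i]; });

        Console.WriteLine(total);  // frequently wrong; use aggregation instead
    }
}
```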


A benefit of using parallel loops is the useful exception handling that comes with them. Exception handling as practiced in sequential code can be used the same way in parallel programming, with one difference: when an exception is thrown, it becomes part of a set of exceptions of type "System.AggregateException." Within this set of exceptions it is easy to see which exceptions occurred on which loop iterations. Additionally, it provides the developer with the operation that was being executed at the time of the exception. [1] [4] [10]
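As a brief sketch of that behavior, the division below throws on the zero element, and the loop surfaces it wrapped in an AggregateException (the data is illustrative):

```csharp
using System;
using System.Threading.Tasks;

class LoopExceptionExample
{
    static void Main()
    {
        int[] data = { 1, 2, 0, 4 };
        try
        {
            Parallel.ForEach(data, item =>
            {
                int result = 10 / item;  // throws on the zero element
            });
        }
        catch (AggregateException ae)
        {
            // Every exception thrown by loop iterations is collected here.
            foreach (Exception inner in ae.InnerExceptions)
                Console.WriteLine(inner.Message);
        }
    }
}
```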



As an added bonus, parallel loops also come with a safety harness. Although parallel loops are partition focused, meaning they split iterations up among cores, they still communicate at the system level. For instance, if a parallel loop throws an exception in the first iteration of 50, the TPL will halt the other iterations on each core before the software becomes overwhelmed with exception handling. This is particularly good for loops with a large range to iterate through. [1] [4]


Another advantage of the Parallel Loops design pattern is the ability to customize the performance of the loops. Microsoft has made it easy to change a few fields to increase or decrease the performance of a loop, which is useful for making products more valuable, assuming the customer has the hardware to support the implemented parallelism. The TPL exposes loop options through the ParallelOptions class, most notably MaxDegreeOfParallelism, which (together with thread-pool settings such as ThreadPool.SetMaxThreads) lets the software developer control how many cores the software runs on [1] [4]. Making use of these options would allow a business to throttle the speed of the software, ultimately increasing profits so long as the customer is willing to pay.
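A minimal sketch of throttling a loop with ParallelOptions (the iteration body is illustrative):

```csharp
using System;
using System.Threading.Tasks;

class OptionsExample
{
    static void Main()
    {
        // Cap the loop at two concurrent tasks, regardless of core count.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

        Parallel.For(0, 10, options, i =>
        {
            Console.WriteLine("Iteration {0} on thread {1}",
                i, System.Threading.Thread.CurrentThread.ManagedThreadId);
        });
    }
}
```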


One last common problem with parallel loops is oversubscription and undersubscription. Oversubscription occurs when there are too many threads for the number of logical cores, so the tasks on those threads take longer than normal to run [9]. Simply put, if eight threads are created with only four cores available, the cores have more threads subscribed to them than they can take care of. On the other hand, undersubscription is when cores are not being used even though they are free to work [9]. So, if four threads are created with four cores available and the developer sets MaxDegreeOfParallelism to two, two cores end up doing all of the work when it could be split evenly among the four. In short, the optimum number of threads for a parallel loop is equal to the number of logical cores divided by the average fraction of core utilization per task. Figure 2 shows the calculation for a processor with four cores where each task spends 10% of its time blocked (90% utilization of a core), where Pt is the optimum number of threads for the loop.

Pt = 4 / (1 - 0.1) ≈ 4.44

Figure 2: Calculation to find the optimum number of threads for a parallel loop.



Parallel Aggregation


The Parallel Aggregation design pattern is somewhat similar to Parallel Loops. Parallel Aggregation, or the Map/Reduce pattern, addresses precisely the "computing a sum" example discussed in Parallel Loops. The difference is that in Parallel Aggregation the sum is computed from data elements using unshared, local variables [1]. In short, Parallel Aggregation takes the input from multiple elements and combines it into a single output [1].


An example that uses Parallel Aggregation would be: a developer is given N arrays, each containing similar data elements. The developer must accumulate a subtotal for each array and then add all of the subtotals together, producing one final total. Since there are multiple inputs and one final output, the Parallel Aggregation pattern is the obvious choice [10]. This can be done with a parallel for-loop or parallel for-each loop, combined with a PLINQ merge of the partial results, to produce the proper result.


Parallel Aggregation is most effective when using PLINQ. Although the details of PLINQ syntax are out of the scope of this discussion, it is a helpful library to learn in order to cut costs while parallel programming.

Parallel Tasks


The Parallel Tasks design pattern has made asynchronous programming much easier, thanks to the intricate design of the System.Threading.Tasks implementation. In .NET it is now easy to create new threads to complete tasks using Task.Factory, which lets software developers write asynchronous code. Using these resources allows .NET to use its built-in task scheduler to automatically distribute threads across cores, thus increasing the speed of programs.


It is helpful to remember what a task is: a single operation that can run asynchronously, on a different thread, without any noticeable changes happening in the software that creates it [5]. With that said, the Parallel Tasks pattern can now be applied to an example.


A situation where the Parallel Tasks pattern should be used is an application in which multiple operations must run concurrently. A prime example would be implementing a chart that trends real-time data collected asynchronously from an external hardware I/O card. Assuming the chart tracks multiple variables (frequency, voltage, current, etc.) and has an axis for each variable, it should update all of the information as fast as it can, at the same time for each variable. To do this, the developer would call Task.Factory.StartNew() for each task that needs to be started; in this case, the tasks would be ReadFrequency, ReadVoltage, and ReadCurrent, one for each variable being trended. Calling the StartNew function for each task in a while-loop, followed by a WaitAll() call, creates a new thread for each task and executes the communications to retrieve each value asynchronously; the software then waits for all of the tasks to complete. When that is done, the values returned by the completed tasks can be used to update the charts, and the while-loop can execute again, as sketched below.
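A hedged sketch of that loop follows; ReadFrequency, ReadVoltage, and ReadCurrent are the hypothetical I/O-card reads from the example and are stubbed out here, and a bounded for-loop stands in for the while-loop:

```csharp
using System;
using System.Threading.Tasks;

class TrendingExample
{
    // Stubs standing in for the asynchronous I/O-card reads.
    static double ReadFrequency() { return 60.0; }
    static double ReadVoltage()   { return 120.0; }
    static double ReadCurrent()   { return 1.5; }

    static void Main()
    {
        for (int pass = 0; pass < 3; pass++)  // stands in for the while-loop
        {
            // Fork: start one task per variable on its own thread.
            Task<double> freq = Task.Factory.StartNew(() => ReadFrequency());
            Task<double> volt = Task.Factory.StartNew(() => ReadVoltage());
            Task<double> curr = Task.Factory.StartNew(() => ReadCurrent());

            // Join: block until every read has completed.
            Task.WaitAll(freq, volt, curr);

            // Update the chart with the results (console output stands in).
            Console.WriteLine("f={0} V={1} I={2}",
                freq.Result, volt.Result, curr.Result);
        }
    }
}
```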


Like the Parallel Loops design pattern, the TPL handles exceptions for the Task class in the same way. The added feature is that the developer can now use the InnerException of the AggregateException exception set to determine where specifically in the code the exception occurred. This is important because the InnerException holds key information, such as which thread threw the exception, along with any exceptions that occurred in the function calls inside the task.



Futures


Futures is a design pattern that can be compared with household activities, for example: "while brushing teeth, put slippers on and let the dog out." A Futures software design is based on how the developer forks the flow of control in a program [1]. The fork in the previous example happens at brushing teeth, splitting into two other tasks: putting slippers on and letting the dog out. In the end there is one overall outcome: the dog has been let out, the teeth are brushed, and the slippers are on.


To better understand the analogy, the tasks in the Futures pattern are also described as continuation tasks [9]. The name Futures means that a task can be started on a separate thread while the software continues to run; another function can later use the future's result as a parameter. If the task has not yet completed and returned the promised future result, the function will wait for the task to finish; otherwise it will begin immediately, since the task has already returned the result.



Figure 3 helps to identify when a Futures design could be ported from sequential code. When there is code that depends on the previous result of a task, implementing a Futures design would be a great idea. In Figure 3, it is noticeable that variable f cannot be computed until variables b and d have been computed.



Figure 3: Sequential code.
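The figure itself is an image and is not reproduced here; a minimal sketch of sequential code with this dependency structure, using hypothetical functions F1 through F4, might look like:

```csharp
using System;

class SequentialExample
{
    static int F1(int x) { return x + 1; }   // stand-in workloads
    static int F2(int x) { return x * 2; }
    static int F3(int x) { return x - 3; }
    static int F4(int x, int y) { return x + y; }

    static void Main()
    {
        int a = 10;
        int b = F1(a);
        int c = F2(a);
        int d = F3(c);
        int f = F4(b, d);   // f depends on both b and d
        Console.WriteLine(f);
    }
}
```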


Figure 4 is the parallel version of Figure 3. The task, named futureB, calculates in parallel with c and d, and its result can later be used to calculate f. The advantage of using the Result property is that it saves the user from having to poll to check whether the task is done running. If futureB isn't finished, the TPL knows to wait automatically for the result before f is calculated. This demonstrates the powerful potential of the TPL, because the .NET Framework handles these cases automatically, whereas years ago everything would have had to be done by the developer.



Figure 4: Parallel code implementation of a Future design code snippet.
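Again with the same hypothetical F1 through F4, a sketch of the parallel version starts futureB on a separate thread and reads its Result when computing f:

```csharp
using System;
using System.Threading.Tasks;

class FutureExample
{
    static int F1(int x) { return x + 1; }   // stand-in workloads
    static int F2(int x) { return x * 2; }
    static int F3(int x) { return x - 3; }
    static int F4(int x, int y) { return x + y; }

    static void Main()
    {
        int a = 10;

        // futureB computes F1(a) on another thread while F2/F3 run here.
        Task<int> futureB = Task.Factory.StartNew(() => F1(a));

        int c = F2(a);
        int d = F3(c);

        // Reading Result blocks only if futureB has not finished yet.
        int f = F4(futureB.Result, d);
        Console.WriteLine(f);
    }
}
```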



Pipelines


The Pipeline design pattern is used when there is a specific sequence of tasks that will be completed in order every time. For instance, consider the process of preparing a bowl of cereal. One possible process would be: get bowl, open cereal container, pour cereal into bowl, open milk, and pour milk into cereal. Following this process every time will produce the same expected outcome, and because of this the software can be designed using the Pipeline pattern.


The pipeline design will usually use the collection class called BlockingCollection. With this type, the developer can limit the capacity of items, or tasks, in the collection and thereby has a degree of control over how fast tasks are processed. Since BlockingCollection comes from .NET's concurrent collections (System.Collections.Concurrent), it can automatically accept and release items as they are added or removed by different threads.


Applying Figure 5 to a structure similar to the one just described helps visualize the usefulness of BlockingCollection. BlockingCollections can be thought of as queues: when an object is added to a collection, the consuming task can process it automatically and, when it is done, release it to the next blocking collection. This eliminates the need to poll a thread to see whether it is done with a task, which significantly increases the efficiency of a core, ultimately creating faster response times.




Figure 5: BlockingCollections acting as buffers for a pipeline design [1].
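A minimal two-stage pipeline sketch using BlockingCollection as the buffer between a producer and a consumer (the stage logic is illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PipelineExample
{
    static void Main()
    {
        // Bounded buffer between the producer and consumer stages.
        var buffer = new BlockingCollection<int>(boundedCapacity: 10);

        Task producer = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 20; i++)
                buffer.Add(i);            // blocks if the buffer is full
            buffer.CompleteAdding();      // signal the end of the stream
        });

        Task consumer = Task.Factory.StartNew(() =>
        {
            // Blocks waiting for items; ends when adding is complete.
            foreach (int item in buffer.GetConsumingEnumerable())
                Console.WriteLine(item * item);
        });

        Task.WaitAll(producer, consumer);
    }
}
```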



Tools


A swift Google search for "Performance Profiler" will return a list of third-party profilers available for multiple operating systems; however, the best profiler for the .NET developer is the native tool built into Visual Studio 2010.



Visual Studio 2010


Visual Studio 2010 Ultimate and Premium include the tools to analyze .NET software written both sequentially and in parallel; this tool is called the Performance Profiler. With it, a profile can be created to see thread contention during the runtime of software. The output displays a chart like the one in Figure 6.




Figure 6: Output from the thread contention tool in VS2010 [1].


Using a tool like this allows the developer to look at each thread created by the software and, using the x-axis as a time reference, see whether any threads are in contention. Furthermore, the developer can use the zoom and pan options to fit a given window and determine how long threads remain in contention. The tool will also state specifically which, and how many, threads are in contention.


This tool is important for debugging purposes because it gets down to the nitty-gritty of the software. With it, the developer can not only see thread contention but also tell which specific pieces of code within a thread are causing the problems. This is a significant factor in determining whether the developer is correctly using parallel programming techniques.



Supportive Libraries


Although they are not required, Microsoft has introduced libraries that make parallel programming in .NET easier and faster to write. The Rx and PLINQ libraries are described briefly below to give a conceptual understanding of each.



PLINQ


Parallel Language-Integrated Query, or PLINQ, is an integrated query language built with the same intentions as LINQ, which was introduced with .NET Framework 3.5. Parallel LINQ makes it easy for the software developer to retrieve a collection of objects with syntax similar to that used in database queries. Although PLINQ is not required to implement a powerful parallel program, it does have advantages [8]. The most advantageous aspect of PLINQ is that a developer can write a query to retrieve custom objects.
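A small sketch of a PLINQ query over custom objects; the Measurement type and its data are illustrative:

```csharp
using System;
using System.Linq;

class Measurement
{
    public string Channel;
    public double Value;
}

class PlinqExample
{
    static void Main()
    {
        Measurement[] samples =
        {
            new Measurement { Channel = "voltage", Value = 118.2 },
            new Measurement { Channel = "voltage", Value = 121.7 },
            new Measurement { Channel = "current", Value = 1.4 }
        };

        // AsParallel() lets PLINQ spread the filter across cores.
        var highVoltage = samples.AsParallel()
                                 .Where(m => m.Channel == "voltage" && m.Value > 120)
                                 .ToList();

        foreach (var m in highVoltage)
            Console.WriteLine(m.Value);
    }
}
```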



With over 200 different extensions in PLINQ, using the correct syntax at the right time can make or break the speed of the software [5]. For example, it is often more efficient to write a for-loop over a small set of known data than to write a query statement; the loop simply runs faster. There is no set way to determine whether a query is faster than a loop without actually timing it; however, PLINQ will also make use of multiple cores when executing the query. In short, it is up to the discretion of the developer to make correct use of PLINQ.



Rx


Reactive Extensions, or Rx, is a highly supportive library used by some parallel programmers. Introduced for .NET Framework 3.5 SP1, it provides additional LINQ-to-Objects queries to the developer [7]. The main value of Rx in the parallel programming world is not only its additional queries but also its support for the push and pull models in parallel programs. For example, Rx could be used to push the results of one task, Task A, to another task, Task B. An example of the pull model would be: Task B implements a subscriber that watches for data to be ready in Task A and pulls it from Task A when a valid result exists. As a reminder, Reactive Extensions would be highly valuable when implementing the Futures parallel design pattern.
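A hedged sketch of the push model, assuming the Rx assemblies (System.Reactive) are referenced; a Subject plays the role of Task A's output, and the subscription plays the role of Task B:

```csharp
using System;
using System.Reactive.Subjects;
using System.Threading.Tasks;

class RxPushExample
{
    static void Main()
    {
        var results = new Subject<int>();

        // "Task B": subscribes and reacts whenever a value is pushed.
        results.Subscribe(value => Console.WriteLine("Received {0}", value));

        // "Task A": produces values and pushes them to subscribers.
        Task producer = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 5; i++)
                results.OnNext(i * i);
            results.OnCompleted();
        });

        producer.Wait();
    }
}
```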



Conclusion


Parallel computing has been around for decades, but only recently has it become easy to implement parallel software. With the most recent release of .NET Framework 4, Microsoft has eased the pressure on software developers to take into account the way their software affects the hardware. Thanks to the research put into parallel programming in .NET, software engineers can now implement design patterns such as Parallel Loops, Parallel Tasks, and Futures to successfully develop a software application. From old parallel libraries to new, it is always important to remember that timely response is a golden rule of software design, and parallel programming now puts that goal within the developer's reach.


References


[1] Campbell, Colin, et al. Parallel Programming with Microsoft .NET: Design Patterns for Decomposition and Coordination on Multicore Architectures. Microsoft. 2010. Print.

[2] Computer cluster (n.d.). FL: Wikimedia Foundation, Inc. Retrieved October 31, 2012, from http://en.wikipedia.org/wiki/Computer_cluster

[3] Data Parallelism (n.d.). FL: Wikimedia Foundation, Inc. Retrieved October 31, 2012, from http://en.wikipedia.org/wiki/Data_parallelism

[4] Hillar, Gaston C. Professional Parallel Programming with C#: Master Parallel Extensions with .NET 4. Indiana: Wiley. 2011. Print.

[5] J. Albahari and B. Albahari. C# 4 in a Nutshell. O'Reilly, fourth edition, 2010. Print.

[6] MSDN. THE MANYCORE SHIFT: Microsoft Parallel Computing Initiative Ushers Computing into the Next Era. (2007, November). Retrieved October 31, 2012, from http://www.intel.com/pressroom/kits/upcrc/ParallelComputing_backgrounder.pdf

[7] Rx Extensions (n.d.). In MSDN. Retrieved October 31, 2012, from http://msdn.microsoft.com/en-us/data/gg577609.aspx

[8] Skeet, Jon. C# In Depth. Connecticut: Manning. 2011. Print.

[9] T. G. Mattson, B. A. Sanders, and B. L. Massingill. Patterns for Parallel Programming. Addison-Wesley, 2004. Print.

[10] Toub, Stephen. (2010, July 10). Patterns of Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4 and Visual C#. Retrieved October 30, 2012, from http://www.microsoft.com/en-us/download/details.aspx?id=19222