Scalability of DryadLINQ
2001 E. Lingelbach Lane, Apt. #440, Bloomington, IN
DryadLINQ is a system that enables a new programming model for large-scale computing by providing a set of language extensions. This paper describes the implementation of DryadLINQ and evaluates it on several programs implemented with DryadLINQ: a DNA sequence assembly program, a High Energy Physics data analysis, and a parallel seed mapping application. We compare their performance with the same applications implemented with Hadoop.
We describe the system architecture of DryadLINQ and show the scalability of DryadLINQ by comparing programs implemented with DryadLINQ against those implemented with Hadoop and CGL-MapReduce.
Keywords: scalability, DryadLINQ, Hadoop
The goal of DryadLINQ is to provide an easy way to write applications that run on a large number of computers to process large amounts of data, and to make this accessible to a wide range of developers. DryadLINQ transparently and automatically compiles programs written in an imperative, high-level language into distributed computations that scale to large computing clusters.
As traditional parallel databases and, more recently, cloud runtimes such as CGL-MapReduce have shown, it is possible to implement high-performance execution engines even on a limited financial budget.
Section 2 presents DryadLINQ's architecture in an HPC cluster environment and gives a brief introduction to Hadoop. Section 3 evaluates the scalability of the DryadLINQ implementation by comparing its performance with Hadoop and CGL-MapReduce. Section 4 presents the work related to this research, and Section 5 presents our conclusions.
DryadLINQ and Hadoop
The DryadLINQ system is designed so that a wide array of developers can process large amounts of data effectively and, with modest effort, suffer little performance loss when scaling up. A program written in LINQ is compiled into a distributed execution plan by DryadLINQ and then run on a Dryad cluster.
A Dryad job is a directed acyclic graph where each vertex is a program and each edge represents a data channel. The execution of a Dryad job is coordinated by a centralized "job manager." The job manager is responsible for instantiating the job's dataflow graph; scheduling processes on cluster computers; monitoring the job and collecting statistics; and dynamically transforming the job graph according to the rules defined by users.
Figure 1 shows the Dryad system architecture. The duty of the job manager is to create the job's vertices (V) on appropriate computers with the help of a remote execution and monitoring daemon (PD), while a name server keeps track of cluster membership. Vertices exchange data through files, TCP pipes, or shared-memory channels. The grey shape shows the vertices in the job that are currently running and their correspondence with the job execution graph.
DryadLINQ Execution Overview
The DryadLINQ system consists of two main components: a runtime library and a parallel compiler. The former provides an implementation of the DryadLINQ operators, and the latter compiles DryadLINQ programs into distributed execution plans. Figure 2 illustrates the flow of execution when a program is executed by DryadLINQ.
In Steps 1 and 2, a .NET user application runs and, because of LINQ's deferred execution, its expressions are accumulated in a DryadLINQ expression object until the application invokes a method that materializes the output, triggering their actual execution. DryadLINQ then takes over and compiles the LINQ expression into a distributed execution plan that can be run by Dryad (Step 3). This step performs several tasks: decomposing the expression into sub-expressions, each to be run in a separate Dryad vertex; generating the code and static data for the remote Dryad vertices; and generating the serialization code for the required data types. At Step 4, a custom, DryadLINQ-specific, Dryad job manager is invoked, and it creates the job graph at Step 5. Using the plan created in Step 3, it generates the job graph and schedules the vertices as resources become available. Each Dryad vertex executes the program created specifically for it in Step 3 (Step 6). When the Dryad job completes, it writes the data to the output table(s) and returns control back to DryadLINQ at Step 8.
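To make the deferred-execution mechanism above concrete, the following minimal C# sketch uses ordinary LINQ-to-Objects rather than the DryadLINQ table types, so it is runnable on its own; the data is illustrative, and the comments only indicate where the corresponding DryadLINQ steps would take place.

```csharp
// Minimal, runnable sketch of LINQ deferred execution using LINQ-to-Objects.
// DryadLINQ relies on the same mechanism: the query below is only an
// expression description until a materializing call forces evaluation;
// in DryadLINQ, that call is the point where the expression is compiled
// into a Dryad job and executed on the cluster (Steps 2-8).
using System;
using System.Linq;

class DeferredExecutionSketch
{
    static void Main()
    {
        int[] values = { 5, -1, 7, 3, -4 };

        // Building the query does not execute it (Steps 1-2).
        var positives = values.Where(v => v > 0).Select(v => v * v);

        // Materialization triggers the actual execution
        // (in DryadLINQ: compilation to a Dryad job and cluster execution).
        int[] result = positives.ToArray();
        Console.WriteLine(string.Join(", ", result));   // 25, 49, 9
    }
}
```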
Hadoop and HDFS
When comparing to Google’s MapReduce runtime,
Apache Hadoop has a similar architecture to
data via HDFS,
which maps all the local disks of the compute nodes
to a single file system hierarchy
, and the file system
the data to be disp
ersed to all the nodes.
order to improve the overall I/O bandwidth,
Hadoop takes the data locality into
the MapReduce computation tasks.
The outputs of the map tasks
would be accessed by
the reduce tasks via HTTP connections, before this,
they would be stored in local disks.
Although this approach
chanism in Hadoop,
intermediate data transformation, especially for the
cases that executing the applications, which
intermediate results frequently.
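For readers more familiar with LINQ than with Hadoop, the map/shuffle/reduce pattern described above can be sketched with ordinary LINQ-to-Objects operators. This is only a conceptual, in-memory analogy (word count is used as a hypothetical example), not how Hadoop or DryadLINQ actually execute the computation.

```csharp
// Conceptual, in-memory analogy of the MapReduce pattern using LINQ-to-Objects.
// Map     -> SelectMany (emit key/value records)
// Shuffle -> GroupBy    (group records by key)
// Reduce  -> Select over each group (aggregate the values)
using System;
using System.Linq;

class MapReduceAnalogy
{
    static void Main()
    {
        string[] lines = { "the quick brown fox", "the lazy dog" };

        var counts = lines
            .SelectMany(line => line.Split(' '))                       // map
            .GroupBy(word => word)                                     // shuffle
            .Select(g => new { Word = g.Key, Count = g.Count() });     // reduce

        foreach (var c in counts)
            Console.WriteLine($"{c.Word}: {c.Count}");
    }
}
```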
Table 1. Hardware configuration (Intel Xeon compute nodes).
Figure 2. The flow of execution when a program is executed by DryadLINQ.
Figure 3. Comparison of features supported by Dryad and Hadoop.
CAP3 is a DNA sequence assembly program that performs several major assembly steps on a given set of gene sequences.
The program reads gene sequences from an input file, each of which needs to be processed by the CAP3 program separately, and writes its output to several output files and to the standard output. Our application therefore only needs to know the input file names and their locations, since it executes the CAP3 executable as an external program.
Each node of the cluster stores roughly the same number of input data files; the input data is divided by creating on each node a partition containing the names of the original data files available on that node. The partitions stored in each node are pointed to by a Dryad partitioned table created later. A DryadLINQ program is then created to read the data file names from the provided partitioned table and execute the CAP3 program on each file.
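A minimal sketch of this driver pattern is shown below. The file paths, the cap3 executable location, and the use of a plain text file for the partition are hypothetical, and the DryadLINQ partitioned-table access is replaced by an in-memory file list so the sketch stays self-contained; it only illustrates reading file names and invoking CAP3 as an external process.

```csharp
// Sketch of the CAP3 driver pattern: read input file names and run the
// external cap3 executable once per file. Paths are hypothetical; a real
// DryadLINQ program would read the names from a partitioned table and
// apply the same function inside each vertex.
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

class Cap3DriverSketch
{
    static int RunCap3(string inputFile)
    {
        var psi = new ProcessStartInfo
        {
            FileName = @"C:\tools\cap3.exe",   // hypothetical location of the executable
            Arguments = inputFile,             // CAP3 writes its own output files
            UseShellExecute = false
        };
        using (var p = Process.Start(psi))
        {
            p.WaitForExit();
            return p.ExitCode;
        }
    }

    static void Main()
    {
        // In the real application these names come from the data partition
        // stored on the local node.
        string[] inputFiles = File.ReadAllLines(@"C:\data\partition0.txt");

        var exitCodes = inputFiles.Select(RunCap3).ToArray();
        Console.WriteLine($"Completed {exitCodes.Count(c => c == 0)} of {inputFiles.Length} files.");
    }
}
```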
However, optimal utilization of the CPU cores is highly unlikely for CAP3 under this scheme. A trace of job scheduling in the HPC cluster revealed that the utilization of CPU cores resulting from the scheduling of individual CAP3 executions on a given node is not optimal.
The reason why the utilization of CPU cores is not optimal is that, when an application is scheduled, DryadLINQ uses the number of data partitions as a guideline to schedule vertices on the nodes rather than on individual CPU cores, expecting that the underlying PLINQ runtime will handle the further parallelism available at each vertex and utilize all the CPU cores by chunking the input data. Since the input for DryadLINQ is only the names of the original data files, it has no way to determine how much time the CAP3 executable will take to process a file, and hence the chunking of records by PLINQ does not lead to an optimal schedule of tasks.
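The vertex-level parallelism mentioned above corresponds to PLINQ's AsParallel operator. The runnable sketch below uses hypothetical per-record costs to illustrate the point: PLINQ distributes the records across the local cores, but when a record's cost is not known in advance (here, a file whose processing time varies widely), chunking by record alone can leave some cores idle while one core finishes a long record.

```csharp
// Minimal PLINQ sketch: AsParallel() spreads the input records across the
// local cores. With uniform records this keeps all cores busy, but when
// each record hides a very different amount of work, static chunking can
// leave cores idle while others finish late.
using System;
using System.Linq;
using System.Threading;

class PlinqChunkingSketch
{
    static void Main()
    {
        // Hypothetical per-record costs in milliseconds (unknown to PLINQ).
        int[] costs = { 50, 50, 50, 50, 2000, 50, 50, 50 };

        var results = costs
            .AsParallel()                                   // vertex-level parallelism via PLINQ
            .Select(ms => { Thread.Sleep(ms); return ms; }) // stand-in for per-file work
            .ToArray();

        Console.WriteLine($"Processed {results.Length} records.");
    }
}
```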
Figures 4 and 5 show comparisons of the performance and scalability of all three runtimes for the CAP3 application. DryadLINQ does not schedule multiple concurrent vertices to a given node, but one vertex at a time. A vertex, which uses PLINQ to schedule homogeneous parallel tasks, therefore has a running time equal to that of the task that takes the longest time to complete.
In contrast, we can set the maximum and minimum number of map and reduce tasks to execute concurrently on a given node in Hadoop, so that it can utilize all the CPU cores.
The performance and scalability graphs show that the DryadLINQ application and the Hadoop and CGL-MapReduce versions of the CAP3 application work almost equally well.
Figure 4. Performance of different implementations of the CAP3 application.
HEP is short for High Energy Physics. The input data is a collection of a large number of binary files which are not directly accessed by the DryadLINQ program, so we have to assign to each compute node of the cluster a division of the input data manually and create a data partition which stores only the names of the files available on that node. The first step of the analysis requires applying a function coded in a ROOT script to all the data files. The DryadLINQ "Apply" operation allows a function to be applied to an entire data set and to produce multiple output values, so in each vertex the program can access the data partition available on that node.
Inside this applied method, the program iterates over the data set, groups the input data files, and executes the ROOT script, passing these file names along with other necessary parameters. The program also saves the output, that is, a binary file containing a histogram of identified features of the input data, in a predefined shared directory and produces its location as the return value. In the next step of the program, we perform a combining operation on these partial histograms: the collections of histograms in a given data partition are first combined by using another ROOT script, and finally the output partial histograms produced by the previous step are combined by the main program. This last combination produces the final histogram of identified features.
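The two-phase pattern above (per-partition partial histograms, then a final merge in the main program) can be sketched in plain C# as below. The ROOT script invocation is replaced by a stand-in function and the histograms are plain dictionaries, so the sketch only illustrates the data flow, not the actual DryadLINQ or ROOT calls.

```csharp
// Sketch of the two-phase histogramming data flow: each partition produces
// a partial histogram, and the main program merges the partial results.
// The ROOT-scripted feature extraction is replaced by a stand-in function.
using System;
using System.Collections.Generic;
using System.Linq;

class HistogramPipelineSketch
{
    // Stand-in for "run the ROOT script over one partition and return a
    // partial histogram" (feature bin -> count).
    static Dictionary<int, long> ProcessPartition(IEnumerable<int> partition) =>
        partition.GroupBy(v => v % 10)
                 .ToDictionary(g => g.Key, g => (long)g.Count());

    static Dictionary<int, long> Merge(IEnumerable<Dictionary<int, long>> partials)
    {
        var final = new Dictionary<int, long>();
        foreach (var h in partials)
            foreach (var kv in h)
                final[kv.Key] = final.TryGetValue(kv.Key, out var c) ? c + kv.Value : kv.Value;
        return final;
    }

    static void Main()
    {
        // Hypothetical partitions of input "events".
        var partitions = new[] { Enumerable.Range(0, 1000), Enumerable.Range(500, 1000) };

        var partialHistograms = partitions.Select(ProcessPartition);   // phase 1 (per vertex)
        var finalHistogram = Merge(partialHistograms);                  // phase 2 (main program)

        Console.WriteLine($"Bins: {finalHistogram.Count}");
    }
}
```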
A comparison of the performance of the three runtime implementations for this analysis is shown in Figure 6.
The results in Figure 6 show that, compared to the DryadLINQ and CGL-MapReduce implementations, the Hadoop implementation has a considerable overhead, which is mainly due to differences in the storage mechanisms used in these frameworks. HDFS can only be accessed using C++ or Java clients, and the ROOT data analysis framework is not capable of reading the input from HDFS.
In contrast, the performance of both the Dryad and CGL-MapReduce implementations is improved significantly by the ability to read the input directly from the local disks. Moreover, in the DryadLINQ implementation, the intermediate partial histograms are stored in a shared directory and are combined during the second phase as a separate analysis. In the CGL-MapReduce implementation, the partial histograms are directly transferred to the reducers, where they are saved in local file systems and combined. This difference can explain the performance difference between the CGL-MapReduce implementation and the DryadLINQ implementation.
CloudBurst is a Hadoop application that performs a parallel, seed-based read mapping algorithm against the human genome and other reference genomes. CloudBurst parallelizes execution by seed: reference and query seeds are grouped together and sent to a reducer for further analysis if they share the same seed. CloudBurst is composed of a two-stage MapReduce workflow that first computes the alignments for each read with at most k differences and then reports the best unambiguous alignment for each read rather than the full catalog of all possible alignments. An important characteristic of the application is the variable amount of time it spends in the reduction phase. This characteristic can be a limiting factor for scaling, depending on the scheduling policies of the framework running the algorithm.
In DryadLINQ, the same workflow is expressed as a single query, and DryadLINQ runs the whole computation as one job rather than as two separate steps executed one after the other. The reduce function produces one or more alignments after receiving as input a set of reference and query seeds sharing the same key. For each input record, the query seeds are grouped into batches, and in order to reduce the memory load, each batch is sent to an alignment function sequentially.
Figure 5. Scalability of different implementations of CAP3.
Figure 6. Performance of different implementations of HEP data analysis.
We developed another DryadLINQ implementation that processes the batches in parallel, assigning them to separate threads running at the same time using the .NET Parallel Extensions.
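A minimal sketch of that batched reduction is shown below; the batch size, the alignment stand-in, and the data are hypothetical, and only the pattern (sequential batching to bound memory versus Parallel.ForEach over the batches) reflects the description above.

```csharp
// Sketch of processing query-seed batches: sequentially (to bound memory)
// versus in parallel with .NET Parallel Extensions (Parallel.ForEach).
// The alignment function and the data are stand-ins.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class BatchedAlignmentSketch
{
    const int BatchSize = 64;                       // hypothetical batch size

    static int Align(IReadOnlyList<int> batch) =>   // stand-in for the alignment function
        batch.Count(q => q % 7 == 0);

    static void Main()
    {
        var querySeeds = Enumerable.Range(0, 10_000).ToList();
        var batches = Enumerable.Range(0, (querySeeds.Count + BatchSize - 1) / BatchSize)
                                .Select(i => querySeeds.Skip(i * BatchSize).Take(BatchSize).ToList())
                                .ToList();

        // Sequential version: one batch at a time keeps the memory load low.
        int sequentialTotal = batches.Sum(b => Align(b));

        // Parallel version: each batch handled by a separate worker thread.
        var partials = new ConcurrentBag<int>();
        Parallel.ForEach(batches, b => partials.Add(Align(b)));

        Console.WriteLine($"{sequentialTotal} == {partials.Sum()}");
    }
}
```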
The results in Figure 7 show that all three implementations follow a similar pattern, although the DryadLINQ implementation is slower, especially when the number of nodes is small.
The major difference between the DryadLINQ and Hadoop implementations is that in DryadLINQ, even though PLINQ assigns records to separate threads running concurrently, the cores were not utilized completely; conversely, in Hadoop each node starts a number of tasks equal to its number of cores, and each task runs independently, doing a fairly even amount of work.
Another difference between the DryadLINQ and Hadoop implementations is the number of partitions created before the reduce step. Since Hadoop creates more partitions, it balances the workload among the reducers better. If the PLINQ scheduler worked as expected, it would keep the cores busy and thus yield a workload balance similar to Hadoop's.
In order to achieve this, one can try starting the computation with more partitions, aiming to schedule multiple vertices per node. However, DryadLINQ runs the vertices in order, so it would wait for one vertex to finish before scheduling the second vertex, and the first vertex may be busy with only one record, thus holding the rest of the cores idle. The final effort to reduce this gap is therefore to use the .NET Parallel Extensions to keep the idle cores busy, although this does not provide an identical level of parallelism.
Figure 7 shows the performance comparison of the runtime implementations with increasing data size. Both the DryadLINQ and Hadoop implementations scale linearly, and the time gap between them is mainly related to what has been explained above: the current limitations of PLINQ and of DryadLINQ's job scheduling policies.
Figure 7. Performance comparison of DryadLINQ and Hadoop for CloudBurst.
There has been a wide range of activity in architectures for distributed computation. One of the earliest commercial generic platforms for distributed computation was the Teoma Neptune platform, which introduced a map-reduce style computation model inspired by MPI's Reduce operator. The Hadoop open-source port of MapReduce slightly extended the computation model, separated the execution layer from storage, and virtualized the execution environment. Google MapReduce [7] uses the same architecture. Dryad provides a DAG-based architecture for a generic execution layer, and DryadLINQ has a richer set of operators and better language support than any of these systems. At the storage layer a variety of very simple databases have appeared, including Google's BigTable [9] and Microsoft's SQL Server Data Services. Architecturally, DryadLINQ is just an application running on top of Dryad, generating Dryad jobs, and one can envisage it interoperating with any of these storage layers.
This paper has shown the scalability of DryadLINQ by applying DryadLINQ to a series of applications with unique requirements. The applications range from simple map-style operations such as CAP3 to applications requiring multiple stages of MapReduce jobs. We showed that all these applications can be implemented using the DAG-based programming model of DryadLINQ, and that their performance is comparable to that of the corresponding Hadoop and CGL-MapReduce implementations.
References
[1] X. Huang and A. Madan, "CAP3: A DNA Sequence Assembly Program," Genome Research, vol. 9, no. 9, pp. 868-877, 1999.
[2] M. Schatz, "CloudBurst: highly sensitive read mapping with MapReduce," Bioinformatics, vol. 25, no. 11, pp. 1363-1369, June 2009.
[3] J. Ekanayake and S. Pallickara, "MapReduce for Data Intensive Scientific Analyses," Fourth IEEE International Conference on eScience, 2008, pp. 277-284.
[4] Apache Hadoop, http://hadoop.apache.org/.
[5] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, "Dryad: Distributed data-parallel programs from sequential building blocks," in Proceedings of the European Conference on Computer Systems (EuroSys), 2007.
[6] L. Chu, H. Tang, T. Yang, and K. Shen, "Optimizing data aggregation for cluster-based Internet services," in Symposium on Principles and Practice of Parallel Programming (PPoPP), 2003.
[7] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," in Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004.
[8] M. Beck, J. Dongarra, and J. S. Plank, "NetSolve/D: A massively parallel grid execution system for scalable data intensive collaboration," in International Parallel and Distributed Processing Symposium (IPDPS).
[9] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, "Bigtable: A distributed storage system for structured data," in Symposium on Operating Systems Design and Implementation (OSDI), 2006.