Efficient Data Placement Algorithm for Cloud-based Workflow


Peng Zhang 1,2,3, Guiling Wang 3, Yanbo Han 3, Jing Wang 3

1 Institute of Computing Technology, Chinese Academy of Sciences, 100049, China
2 Graduate School of the Chinese Academy of Sciences, 100190, China
3 North China University of Technology, 100041, China

Abstract

While cloud-based workflow shows the potential of inherent scalability and expenditure reduction, issues such as data transfer and its efficiency have emerged as major concerns. When one task needs several pieces of data distributed in the cloud, the workflow engine must intelligently select the locations in which the data will reside to avoid redundant data transfers. In this paper, an efficient data placement algorithm for cloud-based workflow is proposed. The algorithm uses an affinity graph to group datasets so as to minimize data transfers while keeping a polynomial time complexity. By integrating the algorithm into a cloud-based workflow engine, we can expect efficiency improvements in data transfer. Experiments and analysis support our approach.

Keywords: data placement, affinity graph, cloud computing, workflow, data transfer



1 This work was partially supported by the National Science Foundation of China under Grant Nos. 61033006 and 60970131. Peng Zhang is a male PhD candidate; his research interests include service composition and workflow management. Email: zhangpeng@software.ict.ac.cn

1. INTRODUCTION

Recently, cloud computing has been gaining popularity as a technology that promises to provide the next generation of Information Technology (IT) platforms based on the concept of utility computing [1-3]. Key features of cloud computing, such as high performance, massive storage and the relatively low cost of infrastructure construction, have attracted much attention. By taking advantage of cloud computing, a Workflow Management System (WfMS) shows the potential of inherent scalability and expenditure reduction, and can gain wider utilization in cross-organizational e-business applications. However, it also faces new challenges, and data transfer is one of them.



On the one hand, in a cloud computing system the datasets may be distributed and the communication between nodes is based on the Internet. When one task needs to process data from different locations, data transfer over the Internet is inevitable [4].

On the other hand, the infrastructure of the cloud computing system is hidden from its users, who do not know the exact physical locations where their data are placed. This model is very convenient for users, but random data placement can lead to unnecessary data transfers [5]. So we need an efficient data placement strategy to respond to this challenge.
Concerning the data placement problem, Kosar and colleagues [6] propose a data placement scheduler for distributed computing systems. It guarantees reliable and efficient data transfer with different protocols. Cope et al. [7] propose a data placement strategy for urgent computing environments to guarantee the data's robustness. At the infrastructure level, NUCA [8] is a data placement and replication strategy for distributed caches that can reduce data access latency. These works mainly focus on how to transfer application data, and they cannot minimize the total data transfers. As cloud computing has become more and more popular, new data management systems have also appeared, such as the Google File System [9] and Hadoop [10], but they are designed mainly for Web search applications, which are different from workflow applications. The data placement research closest to ours is the workflow system that proposed data placement strategies based on the k-means clustering algorithm [5], but that algorithm requires an objective function and shows exponential cost growth.

This paper treats the cloud as an umbrella term referring to Internet-based interconnected computing and storage nodes, which partition the Internet-based cyberspace for controlled sharing and collaboration, and reports a practice of the data placement algorithm for cloud-based workflow that has the following specific features:

- Efficient data placement. The data placement works in tight combination with task scheduling to place data in a graphical partition manner, upon which data transfers can be minimized. Moreover, the data placement algorithm is polynomial and involves no arbitrary empirical objective functions to evaluate candidate partitions.

- We evaluate the performance of our proposed algorithm and demonstrate that our data placement algorithm achieves significant performance improvements.


2. MOTIVATING EXAMPLE

We provide an example scenario to explain a typical problem that can be addressed by the reported work. Emergency Materials Management (EMM) is an entire lifecycle management that includes key tasks such as demand analysis, raising, storage, transport and consumption. EMM is also a typical cross-organizational e-business application and represents the kind of system that can be used as the motivational example for the reported work. Since an EMM system is usually quite complicated, we describe only a simplified episode of an EMM to illustrate the problem that we address. As shown in Figure 1, the episode involves the investigation office, the materials management office and the transport office, which are required to act in the following scenario.

[Figure 1 depicts the EMM episode as control flow and data flow across the Investigation Office (Analyze demand, Plan sourcing), the Materials Management Office (Source, Allocation, Receive Order, Make Shipping Order, with in-stock/out-of-stock branches) and the Transport Office (Delivery, Distribution, Sign in), with the "Order Form", "Shipping Order" and other data randomly placed on cloud nodes A-D, so that task execution requires data transfers between nodes.]

Figure 1. The random data placement

To support the abovementioned relief management workflow scenario, several kinds of data are required. An analysis task produces an "Order Form" reporting the required emergency materials. The materials management office receives the "Order Form" and outputs the "Shipping Order". The local transport office delivers the emergency materials as soon as the "Order Form" and "Shipping Order" are received.

All the data required to support this workflow are placed on different nodes in the cloud. Figure 1 shows that the "Order Form" in the WfMS is placed on node A and the "Shipping Order" is placed on node C; then there is at least one data transfer in each workflow instance, regardless of whether the delivery task is scheduled to node A or node C. If a large number of concurrent instances are needed to support the workflow, we can anticipate a very high cost of data transfer even if the nodes are connected with Ethernet. Let us suppose that there are 10 transport offices, a 1 Mbit order form and 100 Mbit/s Ethernet bandwidth, and that each transport office runs 1000 delivery instances; the total data transfer time is then more or less 1×10^2 seconds, which can be a significant amount of time in disaster relief scenarios.
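As a quick sanity check, the estimate above is simply (offices × instances per office × order form size) / bandwidth; a minimal sketch using the illustrative figures assumed in the text:

```python
# Back-of-the-envelope check of the estimate above (illustrative figures only).
offices = 10                 # transport offices
instances_per_office = 1000  # delivery instances per office
order_form_mbit = 1          # size of one order form, in Mbit
bandwidth_mbit_s = 100       # Ethernet bandwidth, in Mbit/s

total_mbit = offices * instances_per_office * order_form_mbit
print(total_mbit / bandwidth_mbit_s)   # 100.0 seconds, i.e. roughly 1e2 s
```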

It is noteworthy that it is usually impossible to place all data on the same node because of storage limits. In addition, placing all data on the same node would also turn that node into a super node with a high workload.

3. DATA PLACEMENT MODEL

For e-business applications, processes need to be executed a large number of times, sequentially within a very short time period or concurrently with a large number of instances [11], so when one task needs data located at different nodes, these concurrent instances make the data transfer even more critical. To place these data effectively, the workflow engine must intelligently select the nodes on which the data will reside. In this paper, a data scheduling engine is proposed, as shown in Figure 3. The data scheduling engine mainly does two things: it analyzes the dependencies between tasks and data, and it places the data based on the analysis results. After that, the data scheduling engine notifies the task scheduling engine to schedule the tasks. As is well known, in workflows both tasks and data can be numerous and make up a complicated many-to-many relationship: one task might need many data items and one data item might be used by many tasks. So the data placement should be based on these dependencies between tasks and data.

In order to facilitate
understanding, some related
definitions are given as follows.

DEFINITION 1 Workflow Specification: A workflow specification can be represented by a directed acyclic graph WSpc = (T, E), where T is a finite set of tasks t_i (1 ≤ i ≤ n, n = |T|) and E is a finite set of directed edges e_ij (1 ≤ i ≤ n, 1 ≤ j ≤ n). The tasks T = (ActivityTask, ControlTask) are the basic building blocks of a workflow. ActivityTask defines the abstract functional requirements; ControlTask defines the control logic. ActivityTask = (Name, In, Out), where the three elements represent the task's name, input parameters and output parameters. ControlTask = {Start, AndSplit, AndJoin, OrSplit, OrJoin, End}.

DEFINITION 2 Data Set: The data set D is a finite set of data. Every data item has a size, so every data item d_i ∈ D has attributes denoted as <i, s_i>, where i denotes the identifier and s_i denotes the size of d_i. The data are used to represent the parameters of ActivityTask, so each t_i ∈ ActivityTask (1 ≤ i ≤ |ActivityTask|) has attributes denoted as <j, D_j>, where j is the identifier and D_j ⊆ D.

DEFINITION 3 Data Affinity: The data affinity is defined as aff_ij = Σ_{t_k ∈ ActivityTask} acc_kij, where acc_kij is the number of references by task t_k to both data d_i and d_j. The summation is over the ActivityTask set in WSpc. This definition of data affinity measures the strength of an imaginary bond between the two data items, predicated on the fact that data are used together by activity tasks.

DEFINITION 4 Data Affinity Matrix: Based on this definition of data affinity, the data affinity matrix DA is defined as follows: it is an n × n matrix for the n-data problem whose (i, j) element equals aff_ij.
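To make Definitions 3 and 4 concrete, the sketch below builds DA from the task-to-data references; it is an illustration only (the helper name and the example task and data identifiers are ours, not part of the engine):

```python
from itertools import combinations

def affinity_matrix(task_data, data_ids):
    """Build the data affinity matrix DA of Definition 4.

    task_data maps each activity task to the set of data identifiers it
    references (its D_j); data_ids is the ordered list of all identifiers.
    DA[i][j] counts the tasks referencing both d_i and d_j; the diagonal
    DA[i][i] counts every task that uses d_i (its overall "strength").
    """
    index = {d: k for k, d in enumerate(data_ids)}
    n = len(data_ids)
    DA = [[0] * n for _ in range(n)]
    for refs in task_data.values():
        for d in refs:                              # diagonal: total usage of d
            DA[index[d]][index[d]] += 1
        for a, b in combinations(sorted(refs), 2):  # off-diagonal: co-usage
            i, j = index[a], index[b]
            DA[i][j] += 1
            DA[j][i] += 1
    return DA

# Tiny example: two tasks that share d1.
print(affinity_matrix({"t1": {"d1", "d2"}, "t2": {"d1", "d3"}}, ["d1", "d2", "d3"]))
```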

DEFINITION 5 Data Block Set: The data block set B is a finite set of data blocks. Each data block has a set of data and is denoted as <m, D_m>, where m is the identifier and D_m ⊆ D.

DEFINITION 6 Node Set: The node set R is a finite set of nodes. Each node r_n has a size and a set of data blocks, and is denoted as <n, s_n, B_n>, where n is the identifier, s_n denotes the storage limit of r_n, and B_n ⊆ B.
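In code, Definitions 2, 5 and 6 amount to a few record types; a possible sketch, with field names chosen by us to mirror the notation <i, s_i>, <m, D_m> and <n, s_n, B_n>:

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Data:                  # Definition 2: <i, s_i>
    ident: str               # identifier i
    size: int                # size s_i

@dataclass
class DataBlock:             # Definition 5: <m, D_m>
    ident: str               # identifier m
    data: Set[str] = field(default_factory=set)            # identifiers of D_m, a subset of D

@dataclass
class Node:                  # Definition 6: <n, s_n, B_n>
    ident: str               # identifier n
    storage_limit: int       # s_n
    blocks: List[DataBlock] = field(default_factory=list)  # B_n, a subset of B
```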

From workflow deployment to workflow execution, this paper mainly discusses two data placement cases. The first case is initial data placement: when workflow specifications are deployed in the cloud, the data used by activity tasks need to be placed. The second case is runtime data placement: when workflow specifications are executed, the data generated by activity tasks need to be placed. Let us introduce the two cases in detail.

3.1 Initial Placement

Vertical partitioning is the process that divides a global object, which may be a single relation or something more like a universal relation, into groups of its attributes, called vertical fragments. It is used during the design of a distributed database to enhance the performance of transactions. Vertical partitioning has a variety of applications wherever the match between data and transactions can affect performance [12]. In this paper, the match between data and tasks directly affects the data placement, which in turn affects the data transfer performance, so we can apply vertical partitioning to the data placement, where tasks are seen as transactions and data are seen as attributes in a relation.

We shall use the following notation and terminology in the description of our initial data placement.

- Primitive cycle denotes any cycle in the affinity graph.
- Affinity cycle denotes a primitive cycle that contains a cycle node.
- Cycle completing edge denotes a "to be selected" edge that would complete a cycle.
- Cycle node is the node of the cycle completing edge which was selected earlier.
- Former edge denotes an edge that was selected between the last cut and the cycle node.
- Cycle edge is any of the edges forming a cycle.
- Extension of a cycle refers to a cycle being extended by pivoting at the cycle node.

The above definitions are used in the proposed data placement to process the affinity graph and to generate possible cycles from the graph. An intuitive explanation can be found in [12].

Let us take the materials management and transport process as an example. First, the workflow specification shown in Figure 2(a) is transformed into a data affinity matrix, as shown on the left of Figure 2(b). A diagonal element DA(i, i) equals the sum of the usage of data d_i. This is reasonable since it shows the "strength" of that data item in terms of its use by all activity tasks. The procedure for generating the data blocks from the affinity graph is described below. Each partition of the graph generates a data block.

[Figure 2(a): the workflow specification of the materials management and transport process, with activity tasks S1-S8 (including an AndSplit/AndJoin pair) of the Materials Management Office and the Transport Office and the data d1-d5 they use. Figure 2(b): the corresponding data affinity matrix (left) and the affinity graph obtained after excluding zero-valued edges (right):

        d1  d2  d3  d4  d5
   d1    2   1   2   0   0
   d2    1   3   1   2   1
   d3    2   1   2   0   0
   d4    0   2   0   2   1
   d5    0   1   0   1   3

Figure 2(c): the partitioned affinity graph. Figure 2(d): the resulting data blocks and their placement B1 -> R1, B2 -> R2, B3 -> R3.]

Figure 2. The data placement process

Step 1. Construct the affinity graph from the workflow specification being considered. Note that the data affinity matrix is itself an adequate data structure to represent this graph; no additional physical storage of data is necessary. Figure 2(b) shows an affinity graph.

Step 2. Start from any node. In Figure 2(b), we start from the node d1.

Step 3. Select an edge which satisfies the following conditions:

- It should be linearly connected to the tree already constructed.
- It should have the largest value among the possible choices of edges at each end of the tree.

In Figure 2(b), we first select the edge (d1, d3), and then the edge (d3, d2). Next, we select the edge (d2, d4) and the edge (d4, d5). Finally, we select the edge (d5, d2). This iteration ends when all nodes are used for tree construction.

Step 4. When the next selected edge forms a primitive cycle:

- If a cycle node does not exist, check for the "possibility of a cycle", and if the possibility exists, mark the cycle as an affinity cycle. The possibility of a cycle results from the condition that no former edge exists, or p(former edge) <= p(all the cycle edges). Consider this cycle as a candidate partition. Go to step 3.

In Figure 2(b), when we select the edge (d5, d2), there is no cycle node and no former edge exists, so the primitive cycle d2-d4-d5 constitutes a candidate partition, where the cycle node is d2.

- If a cycle node exists already, discard this edge and go to step 3.

Step 5. When the next selected edge does not form a cycle and a candidate partition exists:

- If no former edge exists, check for the possibility of extension of the cycle by this new edge. The possibility of extension results from the condition p(edge being considered or cycle completing edge) >= p(any one of the cycle edges). If there is no possibility, cut this edge and consider the cycle as a partition. Go to step 3.

In Figure 2(b), because there is no possibility of extension of the cycle d2-d4-d5, we cut the edge (d3, d2) and consider d2-d4-d5 as a partition.

- If a former edge exists, change the cycle node and check for the possibility of extension of the cycle by the former edge. If there is no possibility, cut the former edge and consider the cycle as a partition. Go to step 3.

In our present approach, however, we consider the data affinity matrix as a complete graph, called the affinity graph, in which an edge value represents the affinity between the two data items. Then, forming a linearly connected spanning tree, the procedure generates all meaningful data blocks iteratively by considering a cycle as a data block. A linearly connected tree has only two ends. The right of Figure 2(b) shows the affinity graph corresponding to the data affinity matrix after excluding zero-valued edges. Note that the data affinity matrix serves as a data structure for the affinity graph.
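The edge-selection rule of step 3 (grow a linearly connected tree, always extending at one of its two ends with the largest-affinity edge) can be sketched as follows. This is a minimal illustration that operates directly on the DA matrix of Figure 2(b) and deliberately omits the cycle bookkeeping of steps 4 and 5:

```python
def linear_spanning_order(DA):
    """Grow a linearly connected tree over the data (step 3).

    DA is the symmetric data affinity matrix (a list of lists).
    Returns the list of edges in the order they are selected; the tree is
    kept linear, so it can only be extended at one of its two ends.
    """
    n = len(DA)
    left = right = 0                 # start from node d1 (index 0), as in step 2
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None                  # (affinity, tree end, new node)
        for end in (left, right):
            for v in range(n):
                if v not in in_tree and (best is None or DA[end][v] > best[0]):
                    best = (DA[end][v], end, v)
        _, end, v = best
        edges.append((end, v))
        in_tree.add(v)
        if end == left:              # the new node becomes the new end
            left = v
        else:
            right = v
    return edges

# For the matrix of Figure 2(b) this selects (d1,d3), (d3,d2), (d2,d4), (d4,d5);
# the cycle-completing edge (d5,d2) is then handled by steps 4 and 5.
DA = [[2,1,2,0,0],[1,3,1,2,1],[2,1,2,0,0],[0,2,0,2,1],[0,1,0,1,3]]
print(linear_spanning_order(DA))
```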

The major advantages of the proposed method are that:

- There is no need for iterative binary partitioning. The major weakness of iterative binary partitioning is that at each step two new problems are generated, increasing the complexity; furthermore, termination of the algorithm depends on the discriminating power of the objective function.

- The method requires no objective function. The empirical objective functions were selected after some trial-and-error experimentation to find out whether they possess a good discriminating power. Although reasonable, they constitute an arbitrary choice. This arbitrariness has been eliminated in the proposed methodology.

Now we consider the computational complexity. Step 1 does not affect the computational complexity because the data affinity matrix can be used as a symmetric matrix. The repeat loop in the detailed description is executed n-1 times, where n denotes the number of data items. At each iteration, the selection of the next edge takes time O(n). Whether a cycle exists or not can also be determined in time O(n). Thus, the algorithm takes time O(n^2). The partition results are shown in Figure 2(c), where each broken line delimits a data block and each data block includes a group of data. For example, the data d1 and d3 constitute a data block <1, {d1, d3}>, and the data d2, d4 and d5 constitute a data block <2, {d2, d4, d5}>. Next, these data blocks are placed on different nodes. The procedure is introduced as follows:

Step 6. Get the ActivityTask set from WSpc.

Step 7. Calculate the estimated data transfer time if the data block B_k is scheduled to the node R_j according to formula (1). The bandwidth set is denoted as BW, and BW_ij denotes the bandwidth between node R_i and node R_j. Here the sum of the storage limits of all nodes is greater than the sum of the sizes of all data.

    min_{R_j} Σ_{d_k ∈ B_k, d_k ∈ R_i, d_k ∉ R_j} ( s_k / BW_ij )        (1)

Step 8. Sort the data transfer times and then select the node R_j with the least data transfer time.

Step 9. If the storage limit of R_j is greater than the size of the data block B_k, then place B_k on the node R_j; else try other nodes and go to step 9.

Step 10. If the size of the data block B_k is greater than the storage limit of every node, then the data block will be partitioned into two data blocks with the maximum sum of affinity weight until the size constraint is satisfied.

As shown in Figure 2(d), the data block B1 is placed on node R1, and the data block B2 is partitioned into two data blocks because the total size of d2, d4 and d5 is greater than the storage limit of every node. Since the partition {d2, d4} and the partition {d5} have the maximum sum of affinity weight, d2 and d4 constitute one data block that is placed on node R2, and d5 constitutes another data block that is placed on node R3. There are only two data transfers after S1, S2, S3, S4 and S5 are executed.

Suppose that the numbers of data items and nodes are both n; then the time complexity from step 6 to step 10 is O(n^2) + O(n log n).
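A possible sketch of steps 7 to 10 is given below. It assumes each data item's current node R_i is known, uses plain dictionaries for sizes, locations, remaining capacities and a symmetric bandwidth table BW, and simplifies step 10 by splitting an oversized block greedily rather than by the maximum sum of affinity weight; it is an illustration, not the engine's implementation:

```python
def block_transfer_time(block, target, location, sizes, BW):
    """Formula (1): time to move the block's data that are not yet on `target`.

    location[d] is the node currently holding data d (R_i), sizes[d] is s_d,
    BW[(a, b)] is the bandwidth between nodes a and b (symmetric table).
    """
    total = 0.0
    for d in block:
        src = location[d]
        if src != target:
            total += sizes[d] / BW[(src, target)]
    return total

def place_block(block, nodes, location, sizes, BW, capacity):
    """Steps 7-9: try nodes in order of increasing transfer time;
    step 10 (simplified): split an oversized block and place the halves."""
    block_size = sum(sizes[d] for d in block)
    ranked = sorted(nodes, key=lambda n: block_transfer_time(block, n, location, sizes, BW))
    for node in ranked:
        if capacity[node] >= block_size:          # step 9: storage check
            capacity[node] -= block_size
            for d in block:
                location[d] = node                # the block now resides on `node`
            return [(node, set(block))]
    # step 10 (simplified): split the block and place the two halves recursively
    if len(block) <= 1:
        raise RuntimeError("no node has enough storage for %r" % sorted(block))
    items = sorted(block, key=lambda d: sizes[d], reverse=True)
    half1, half2 = set(items[0::2]), set(items[1::2])
    return (place_block(half1, nodes, location, sizes, BW, capacity)
            + place_block(half2, nodes, location, sizes, BW, capacity))
```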

3.2 Runtime Placement

During workflow execution, there are two situations that change the data placement. One situation is that new workflows are deployed to the cloud computing environment; together with them, new data and activity tasks are added to the system. The other situation is that a ready activity task has been executed and has produced output data, which might be used by several later activity tasks or later workflows. For the former situation, we simply repeat the initial placement. For the latter situation, our idea includes three steps:

Step 11. Calculate the data affinity weights between the newly generated data and the existing nodes: the data affinity weight between data d_i and node R_j is denoted as dR_ij, which is the sum of the data affinity weights of d_i with all the data on R_j. Based on the data affinity weights and the storage limits of the nodes, we select one node on which to place the newly generated data.

Step 12. Place the newly generated data: the newly generated data is placed on the node that has the greatest data affinity weight and enough storage; otherwise, try other nodes as in steps 9 and 10.

Step 13. The task scheduling engine periodically monitors the states of all the workflow's tasks and schedules each ready task to the node that holds the most required data.
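Steps 11 and 12 can be sketched in the same spirit; a minimal illustration assuming the affinity matrix DA already has a row for the newly generated data item, and using our own names for the bookkeeping maps:

```python
def place_generated_data(d_new, size_new, DA, index, node_data, capacity):
    """Runtime placement (steps 11-12): put the newly generated data item on
    the node with the greatest affinity weight dR and enough free storage.

    index maps data identifiers to their row/column in DA;
    node_data maps node -> set of data identifiers already placed there;
    capacity maps node -> remaining storage.
    """
    def dR(node):
        # step 11: sum of affinities between d_new and the data already on `node`
        return sum(DA[index[d_new]][index[d]] for d in node_data[node])

    # step 12: prefer high affinity, but only nodes with enough free storage
    for node in sorted(node_data, key=dR, reverse=True):
        if capacity[node] >= size_new:
            node_data[node].add(d_new)
            capacity[node] -= size_new
            return node
    raise RuntimeError("no node has enough storage for %s" % d_new)
```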

[Figure 3 depicts the workflow engine, consisting of a task scheduling engine and a data scheduling engine, on top of the cloud nodes A, B and C; after runtime placement the "Order Form" and "Shipping Order" reside on the same node as the Delivery task.]

Figure 3. The data placement architecture

Let us review our example. After the runtime data placement, the newly generated "Shipping Order" and the newly generated "Order Form" will be placed on the same node, as shown in Figure 3.

4. DATA PLACEMENT ALGORITHM

In order to generalize the above described steps, the algorithm is given below. According to our analysis of the above-mentioned steps, the time complexity of the algorithm is no more than O(n^2) + O(n^2) + O(n log n), and it produces a near-optimal solution while keeping a polynomial time complexity.

Function DP4WF(Msg, WSpc, R, DA)
BEGIN
    if (Msg == 1) then                            // initial stage
        Set dataSet = {}                          // generate the data set
        Set taskSet = WSpc.T.ActivityTask
        for i = 0 to taskSet.size() - 1
            ActivityTask t = taskSet.get(i)
            dataSet.add(t.D)
        DA = Affinity(dataSet, taskSet)           // data affinity matrix (Definition 4)
        Graph g = AffinityGraph(DA)               // affinity graph (steps 1-5)
        Set blockSet = GraphPartition(g)          // partition into data blocks
        for j = 0 to blockSet.size() - 1
            B block = blockSet.get(j)
            result += scheduleBlock(block, R)     // place each block (steps 6-10)
    if (Msg == 0) then                            // runtime stage
        dataSet = engine.fetchNewData()
        for each d_m in dataSet
            for each node R_j in R
                calculate dR_mj                   // data-to-node affinity (step 11)
            R_n = the node with the maximum dR_mj
            result += scheduleData(d_m, R_n)      // place the generated data (step 12)
    Return result
END

To our knowledge, little effort has been devoted to investigating data placement for workflow using a graphical algorithm. This graphical algorithm can be used effectively for data partitioning because it overcomes the shortcomings of binary partitioning and it does not need any complementary algorithms such as the BEA procedure described in the work reported in [12]. Furthermore, the algorithm involves no arbitrary empirical objective functions to evaluate candidate partitions.

5. EXPERIMENTAL RESULTS

We have adopted a pub/sub message-oriented middleware, ActiveMQ (http://activemq.apache.org/), to enhance our prior tool named VINCA Personal Workflow [13], and have developed a cloud-based workflow platform that has been used for the experimental evaluation of our approach in this section.

To evaluate performance, we run workflow instances under three simulation strategies:

Random: In this simulation, we randomly place the existing data during the initial stage and store the generated data in the local node (i.e., where they were generated) at runtime. This simulation represents the traditional data placement strategies in older distributed computing systems. At that time, data were usually stored on the local node naturally or on the nodes that had available storage. The temporary intermediate data, i.e. generated data, were also naturally stored where they were generated, waiting for the tasks to retrieve them.

K-Means: This simulation shows the overall performance of the k-means algorithm. The strategy needs an objective function to cluster the generated data and place them on the appropriate nodes.

DP4WF: This simulation shows the overall performance of our algorithms, which are specifically designed for cloud workflows. The strategy is based on data dependency and can automatically partition the existing data. Comparisons with the other strategies will be made from different aspects to show the performance of our algorithms.

The traditional way to evaluate the performance of a workflow system is to record and compare the execution time. However, in our work, we count the total data transfers instead, since the execution time can be influenced by other factors, such as bandwidth, scheduling strategy and I/O speed. Our data placement strategy aims to reduce the data transfers between nodes over the Internet. So we directly take the number of datasets that are actually transferred during the workflow's execution as the measurement to evaluate the performance of the algorithms. In a cloud computing environment with limited Internet bandwidth, if the total data transfers are reduced, the execution time will be reduced accordingly. Furthermore, the cost of data transfers will also decrease.

[Figure 4(a) compares the data transfers of Random, K-Means and DP4WF for 30, 50, 80 and 120 datasets; Figure 4(b) compares them on 5, 10, 15 and 20 nodes.]

Figure 4. The data transfers without storage limit

To make the evaluation as objective as possible, we generate test workflows randomly to run on our platform. This makes the evaluation results independent of any specific application. As we need to run the Random, K-Means and DP4WF algorithms separately, we set the number of existing datasets and generated datasets to be the same for every test workflow. That means we have the same number of existing datasets and tasks for every test workflow, and we assume that each task will only generate one dataset. We can control the complexity of the test workflow by changing the number of datasets. Every dataset will be used by a random number of tasks, and the tasks that use the generated datasets must be executed after the task that generates their input. We can control the complexity of the relationships between the datasets and tasks by changing the range of this random number.
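A possible sketch of such a generator, under the stated assumptions (equal numbers of existing datasets and tasks, one generated dataset per task, each task reading a random number of datasets, and readers of a generated dataset running after its producer); all names are ours:

```python
import random

def generate_test_workflow(n_datasets, max_readers=3, seed=None):
    """Randomly generate a test workflow description.

    Returns (tasks, uses, produces):
      tasks    - task identifiers t1..tn (one per existing dataset),
      uses     - mapping task -> set of datasets it reads,
      produces - mapping task -> the single dataset it generates.
    A task may only read the outputs of earlier tasks, which guarantees that
    consumers of a generated dataset run after its producer.
    """
    rng = random.Random(seed)
    existing = ["d%d" % i for i in range(1, n_datasets + 1)]
    tasks = ["t%d" % i for i in range(1, n_datasets + 1)]
    uses, produces = {}, {}
    for k, t in enumerate(tasks):
        produces[t] = "g%d" % (k + 1)                           # one generated dataset per task
        readable = existing + [produces[p] for p in tasks[:k]]  # earlier outputs only
        n_inputs = rng.randint(1, max_readers)
        uses[t] = set(rng.sample(readable, min(n_inputs, len(readable))))
    return tasks, uses, produces
```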

Another factor that has an impact on the algorithms is the storage limit of the nodes. We can randomly set the storage limit for the nodes, and we run new simulations to show its impact on performance. Here, we have only included graphs of the simulation results.

In Figure 4(a), we ran the test workflows with different complexities on 15 nodes, using 4 types of test workflows with different numbers of datasets. In Figure 4(b), we fixed the test workflows' dataset count to 50 and ran them on different numbers of nodes. From the results, we can conclude that both the K-Means and DP4WF algorithms can effectively reduce the total data transfers of the workflow's execution.

However, in the simulation described above, we did not limit the amount of storage that the nodes had available at runtime. In a cloud computing environment, nodes normally have limited storage, especially in storage-constrained systems. When one node is overloaded, we need to reallocate the data to other nodes. The reallocation not only causes extra data transfers, but also delays the execution of the workflow. To count the reallocated datasets, we ran the same test workflows with a storage limit on every node (Figure 5).

From Figure 5, we can see that as the number of nodes and datasets increases, the performance of the random strategy decreases. This is because the datasets and tasks gather on one node, which triggers the adjustment process more frequently and costs extra data transfers.

[Figure 5(a) compares the data transfers of Random, K-Means and DP4WF for 30, 50, 80 and 120 datasets; Figure 5(b) compares them on 5, 10, 15 and 20 nodes.]

Figure 5. The data transfers with storage limit

During the execution of every test workflow instance, we recorded the number of datasets transferred to each node, as well as the tasks scheduled to that node. The objective was to see how the tasks and datasets were distributed, which indicates the workload balance among nodes. We also calculated the standard deviation of the nodes' usage.
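The balance measure itself is straightforward; a minimal sketch assuming the per-node counts have already been recorded:

```python
from statistics import pstdev

def workload_deviation(per_node_counts):
    """Standard deviation of the nodes' usage, e.g. the datasets transferred to
    (or tasks scheduled on) each node; lower means a more balanced workload."""
    return pstdev(per_node_counts)

print(workload_deviation([12, 9, 11, 10]))   # fairly balanced
print(workload_deviation([40, 1, 0, 1]))     # one super node
```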


Figure 6 shows the average standard deviation for running 1000 test workflows on 15 nodes, each workflow having 80 existing datasets and 80 tasks. From Figure 6, we can see relatively high deviations in the nodes' usage in the two simulations without the runtime algorithm. This means that the tasks and the datasets are allocated to one node more frequently, which leads to a node becoming a super node with a high workload. By contrast, in the other two simulations that use the runtime algorithm to pre-allocate the generated data to other nodes, the deviation of the node usage is low. This demonstrates that the runtime algorithm can achieve a more balanced distribution of the workload among nodes.

[Figure 6 plots, for Random, K-Means and DP4WF, the standard deviation of the nodes' usage for data transfer and for task scheduling.]

Figure 6. The standard deviation of workload

The results in Figure 7 indicate that varying the number of datasets has a great effect on the running cost. When the number of datasets is less than 40, the running cost of K-means is less than ours. However, as the number of datasets grows, the running cost of K-means shows an exponential increase, while ours increases only linearly.

[Figure 7 compares the running cost, in seconds, of K-Means and DP4WF for 30, 50, 80 and 120 datasets.]

Figure 7. The running cost comparison

We thus conclude that our data placement algorithm achieves significant performance improvements while keeping a polynomial time complexity.

6. CONCLUSIONS

With cloud-based workflow, users can reduce their IT expenditure and enjoy full-fledged business automation based upon the cloud fabric with high availability and scalability. However, to our knowledge, only a few works give preliminary research results on data placement. In this paper, we propose an efficient data placement algorithm for cloud-based workflow, which works in tight combination with task scheduling to place data in a graphical partition manner, upon which data transfers can be minimized. Moreover, the data placement algorithm is polynomial and involves no objective function.

REFERENCES

[1] Weiss, A (2007). Computing in the Cloud. ACM netWorker, 11: 18-25.
[2] Han, Y, Sun, JY, Wang, GL, Li, HF (2010). A Cloud-Based BPM Architecture with User-End Distribution of Non-Compute-Intensive Activities and Sensitive Data. Journal of Computer Science and Technology, 25(6): 1157-1167.
[3] Armbrust, M, Fox, A, Griffith, R, et al (2009). Above the clouds: A Berkeley view of cloud computing. University of California, Berkeley. http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html
[4] Deelman, E, Chervenak, A (2008). Data management challenges of data-intensive scientific workflows. IEEE International Symposium on Cluster Computing and the Grid, 687-692.
[5] Yuan, D, Yang, Y, Liu, X, Chen, JJ (2010). A data placement strategy in scientific cloud workflows. Future Generation Computer Systems, 26(8): 1200-1214.
[6] Kosar, T, Livny, M (2005). A framework for reliable and efficient data placement in distributed computing systems. Journal of Parallel and Distributed Computing, 65(10): 1146-1157.
[7] Cope, JM, Trebon, N, Tufo, HM, Beckman, P (2009). Robust data placement in urgent computing environments. IEEE International Symposium on Parallel and Distributed Processing (IPDPS'09), Rome, Italy, IEEE Computer Society.
[8] Hardavellas, N, Ferdman, M, Falsafi, B, Ailamaki, A (2009). Reactive NUCA: Near-optimal block placement and replication in distributed caches. 36th Annual International Symposium on Computer Architecture (ISCA'09), Austin, Texas, USA, IEEE Computer Society.
[9] Ghemawat, S, Gobioff, H, Leung, ST (2003). The Google File System. SIGOPS Oper. Syst. Rev., 37: 29-43.
[10] Borthakur, D (2007). The Hadoop distributed file system: Architecture and design. Hadoop Project Website, 11, 21.
[11] Liu, K, Jin, H, Chen, J, Liu, X, Yuan, D, Yang, Y (2010). A Compromised-Time-Cost Scheduling Algorithm in SwinDeW-C for Instance-Intensive Cost-Constrained Workflows on Cloud Computing Platform. International Journal of High Performance Computing Applications, 24(4): 445.
[12] Navathe, SB, Ra, M (1989). Vertical partitioning for database design: a graphical algorithm. SIGMOD Record, 18(2): 440-450.
[13] Wang, J, Zhang, LY, Han, YB (2006). Client-Centric Adaptive Scheduling of Service-Oriented Applications. Journal of Computer Science and Technology, 21(4): 537-546.

An Efficient Data Placement Algorithm for Cloud-based Workflow

Abstract: While cloud-based workflow brings the benefits of scalability and cost savings, data transfer and its efficiency have gradually become major concerns. When one task needs to process data distributed at different locations at the same time, the workflow engine must intelligently select where the data are stored to avoid unnecessary data transfers. To this end, this paper proposes an efficient data placement algorithm for cloud-based workflow. The algorithm uses an affinity graph to partition datasets so as to reduce the number of data transfers while keeping a polynomial time complexity. Experiments show that the algorithm not only improves the data transfer efficiency of cloud-based workflow but also reduces the running cost of the algorithm.

Keywords: data placement; affinity graph; cloud computing; workflow; data transfer