Dynamic Controlling of Data Streaming Applications for Cloud Computing



Junwei Cao 1,2,* and Wen Zhang 3

1 Research Institute of Information Technology, Tsinghua University, Beijing 100084, China
2 Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, China
3 Chongqing Military Delegate Bureau, General Armament Department of PLA, Chongqing 400060, China

* Corresponding email: jcao@tsinghua.edu.cn


Abstract

Performance of data streaming applications is co-determined by both networking and computing resources, which should be allocated in an integrated and cooperative way. Dynamic control of resource allocation is required, since unilateral redundancy in networking or computing resources may result in underutilization without necessarily yielding high performance, while insufficiency of either resource may become a bottleneck. In this paper, a virtualized cloud platform is utilized to implement data streaming: fuzzy logic controllers are designed to allocate required CPU resources, and an iterative, processing- and storage-aware bandwidth allocation algorithm is applied to guarantee on-demand data provision. Experimental results show that our approach leads to high performance of applications as well as high utilization of resources, which is also justified by comparison with other resource allocation methods.


1. Introduction


Data streaming applications are special in that (1) they are continuous and long running in nature; (2) they require efficient transmission of data from/to distributed sources in an end-user-pulling way; (3) it is often not feasible to store all the data in entirety for subsequent processing because of limited storage and high volumes of data to be processed; and (4) they need to make efficient use of computational resources to carry out processing in a timely manner. New computing paradigms, e.g. cloud computing [1], provide better support for data streaming since virtualized resources can be allocated in a more fine-grained and on-demand way.

For example, LIGO (Laser Interferometer Gravitational-wave Observatory) [2] is aimed at direct detection of gravitational waves emitted from space sources. LIGO data analysis streams terabytes of data per day from observatories for real-time processing using scientific workflows [3]. This is a typical scenario in many scientific applications where data processing is continuously executed over remote streams as if data were always available from local storage.

Great challenges have to be addressed to provide sufficient but not redundant resources, mainly computing resources and networking bandwidth, so that application requirements can be met while maintaining high resource utilization. Virtualization technology has been applied for resource management [4].
Owing to the progress of virtualization technology, such as the open source solution Xen [5], it is possible to allocate fine-grained computational resources to applications. Virtual machines (VMs) are able to instantiate multiple independently configured guest environments on a host resource at the same time, providing performance isolation. With the ability of dynamic configuration, VMs make it possible to allocate computing resources to applications on demand with finer granularity.

In this work, VMs are set up for data streaming applications for resource isolation and performance guarantees. As shown in our previous work [6], CPU and bandwidth must be allocated in a cooperative and integrated way. In such scenarios, unilateral redundancy of CPU or bandwidth will not necessarily lead to high data throughput; on the other hand, insufficiency of either will become a bottleneck of the ultimate throughput.
It is required to allocate computing and networking resources to reach a balance between obtaining high data throughput and maintaining high resource utilization. Actually, data throughput and resource utilization conflict with each other in nature.

Due to stochastic dynamics of computational systems and time-variant workloads, an automatic allocation mechanism that can react to a dynamic environment quickly is required. In this work, closed-loop feedback control is applied. Feedback control has been applied to computational systems [7] and some promising results obtained.

A closed-loop controller observes the error between the set-point (called the reference in control theory terminology) and the output of the target system (called the plant) to activate its control algorithm (such as proportional-integral-derivative, PID in short) and produce the control variable of the plant, so as to adjust the output of the plant until the system reaches the target state, even in the presence of unexpected disturbance. In traditional feedback control, it is indispensable to establish mathematical models in which control systems are described using one or more differential equations that define the system response to inputs. This is rather challenging in some cases, especially for data streaming applications, due to variable coupling and the heavily nonlinear properties of the system.

Fortunately, fuzzy control offers an alternative, which provides formal techniques to represent, manipulate and implement human experts' heuristic knowledge for controlling a plant via IF-THEN rules. Fuzzy control does not rely on mathematical modeling of the plant but establishes a direct nonlinear mapping between its inputs and outputs, which significantly reduces the difficulty of control system design.

In this work, an integrated approach is implemented for fine-grained and on-demand resource allocation for data streaming applications on a virtualization-based cloud platform: fuzzy logic control is applied for CPU allocation by configuring VMs dynamically according to resource utilization; an iterative algorithm, which is processing-, congestion- and storage-aware, is adopted for bandwidth allocation. Actually, allocation of CPU and bandwidth is tightly coupled, and experimental results included in this work show good performance of our approach.

The rest of this paper is organized as follows: Section 2 formulates the problem in detail, and the following two sections illustrate fuzzy allocation of CPU resources and iterative bandwidth allocation, respectively. Experimental results are presented in Section 5 to evaluate the performance of our approach using an example of gravitational wave data analysis. Section 6 discusses related work and Section 7 concludes the paper.


2. Problem Formulation


In this work, virtualization technology makes it possible to provide a predictable and controllable run-time environment for each application; fuzzy control is applied for on-demand allocation of resources for each VM. Two performance metrics, data throughput and resource utilization, are defined, co-determined by CPU and bandwidth allocation.

At any time t, for a data streaming application i, if the amount of data in local storage, denoted as Q_i(t), is higher than a certain level (e.g., a block, as explained later), the processing will be running; otherwise it will just be idle. Q_i(t) is co-determined by both data provision and data processing, since new data will be streamed to local storage while processed data will be cleaned up afterwards. The amount of data in storage varies over time and can be described using the following differential equation:







dQ_i(t)/dt = transpeed_i(t) - d_i(t),  Q_i(0) = 0    (1)

where dQ_i(t)/dt, transpeed_i(t) and d_i(t) stand for the derivative of Q_i(t), the assigned transfer bandwidth and the processing speed for data stream i, respectively.

As demonstrated in [8], data sets to be processed are composed of lots of small files (LOSF). Data are processed block by block, where a block consists of a certain number of small files. If there are blocks available in the local storage, an indicator, denoted as Ready_i for application i, is set to 1; otherwise Ready_i is 0. So d_i(t) can be described as:

d_i(t) = 0,                      if Ready_i = 0
d_i(t) = procspeed_i(t, C_i(t)), if Ready_i = 1




 




Some definitions must be provided here for a clearer problem statement.

Realistic Processing Speed (RPS): the processing speed given a data streaming scheme, d_i(t) here.

Theoretic Processing Speed (TPS): the processing speed the allocated CPU resources could generate if there were always enough data locally, denoted as procspeed_i(t, C_i(t)), where C_i(t) stands for the CPU resource allocated to application i at time t. The relationship between procspeed_i(t, C_i(t)) and C_i(t) must be determined with system identification, and it is safe to say that procspeed_i(t, C_i(t)) is a non-decreasing function of C_i(t), where C_i(t) mainly refers to a proportion of CPU cycles, as explained later.

Realistic Throughput (RTP): the amount of data processed in a given period of time, given a data supply scheme.

Theoretic Throughput (TTP): the amount of data processed in a given period of time if there were always enough data locally with the allocated CPU resources.

Scheduling of CPU and bandwidth resources is carried out periodically to cope with the dynamic status of resources and applications. Suppose the length of a scheduling period is M; then for the h-th scheduling period, the following formulas are explicit:












d_i(t) = 0,                      if Ready_i(t) = 0
d_i(t) = procspeed_i(t, C_i(t)), if Ready_i(t) = 1

TTP_{i,h} = ∫_{(h-1)M}^{hM} procspeed_i(t, C_i(t)) dt

RTP_{i,h} = ∫_{(h-1)M}^{hM} d_i(t) dt

From (1):

RTP_{i,h} = ∫_{(h-1)M}^{hM} transpeed_i(t) dt + Q_i((h-1)M) - Q_i(hM)

Define the utilization of compute resources (UC in short) as

UC_{i,h} = RTP_{i,h} / TTP_{i,h}    (2)

i.e.,

UC_{i,h} = [ ∫_{(h-1)M}^{hM} transpeed_i(t) dt + Q_i((h-1)M) - Q_i(hM) ] / ∫_{(h-1)M}^{hM} procspeed_i(t, C_i(t)) dt    (3)

denoting to what extent the allocated compute resource is utilized.

RTP_{i,h} can be defined in another form as:

RTP_{i,h} = ∫_{Ω_{i,h}} procspeed_i(t, C_i(t)) dt    (4)

where Ω_{i,h} stands for the time fragments during which processing is going on, and then the utilization can be redefined in another way as:

UC_{i,h} = |Ω_{i,h}| / M    (5)


Note that (5) implies that TPS_{i,h} remains constant within a scheduling period for given CPU resources. UC can thus also be defined as the ratio of RPS to TPS.
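These definitions can be made concrete with a small discrete-time simulation of (1) and (2). This is only an illustrative sketch: the speeds, block size and period length below are made-up values, not taken from the paper.

```python
def simulate_period(q0, transpeed, procspeed, block, steps):
    """Simulate one scheduling period of `steps` unit time steps.

    q0        -- initial amount of data in local storage, Q_i(0)
    transpeed -- constant streaming bandwidth (data units per step)
    procspeed -- theoretic processing speed (TPS) of the allocated CPU
    block     -- processing only runs when at least one block is buffered
    Returns (final queue, RTP, TTP, UC).
    """
    q, rtp, ttp = q0, 0.0, 0.0
    for _ in range(steps):
        ready = q >= block               # the Ready indicator
        d = procspeed if ready else 0.0  # realistic processing speed (RPS)
        q = max(0.0, q + transpeed - d)  # Eq. (1): dQ/dt = transpeed - d
        rtp += d                         # data actually processed
        ttp += procspeed                 # data the CPU could have processed
    return q, rtp, ttp, rtp / ttp        # UC = RTP/TTP, Eq. (2)

# scarce bandwidth starves the CPU and drags UC well below 1
_, _, _, uc_starved = simulate_period(0, transpeed=2.0, procspeed=5.0,
                                      block=4.0, steps=100)
# ample bandwidth keeps blocks available and UC close to 1
_, _, _, uc_fed = simulate_period(0, transpeed=6.0, procspeed=5.0,
                                  block=4.0, steps=100)
```

The two runs illustrate the conflict discussed above: with scarce bandwidth the allocated CPU is mostly idle (low UC), while ample bandwidth pushes UC toward 1.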

The problem is to allocate a proper amount of CPU resource to generate an RPS approaching TPS as closely as possible, given the data supply scheme. It is obvious that redundant CPU resources will make TPS much larger than RPS, which implies underutilization of computing resources. If available bandwidth is limited, RPS will be zero most of the time, even with redundant CPU cycles, for lack of data to process. This dependency between data provision and processing makes it necessary to allocate compute resources on demand so as to make RPS equal to TPS.

Our goal is to get high throughput while maintaining high utilization of resources, even when system characteristics are vague, as discussed before. A low utilization implies that most of the time CPUs are idle for lack of data to be processed. On the other hand, an extremely high utilization indicates that CPU resources are overloaded and more resources are required. In both cases, data throughput will be hampered.

Intuitively, applications with extremely high utilization should be allocated more resources to increase throughput, while the resources of applications with low utilization should be decreased to increase utilization and avoid waste of resources. Actually, this intuition is nearly sufficient to design a fuzzy controller that dynamically configures the clock speed of VMs for each data streaming application and implements fine-grained allocation of CPU resources.

The iterative bandwidth allocation algorithm must be processing- and storage-aware. That is to say, data are streamed according to processing capacity, since insufficient CPU resources may lead to waste of available bandwidth. Also, a streaming speed relatively higher than the processing speed will lead to data accumulation, requiring more available local storage, which is unnecessary and should be avoided. On the other hand, redundant CPU resources may make full use of bandwidth and result in a high throughput, at the cost of CPU underutilization. It is required to achieve a balance between CPU and bandwidth resources for each data streaming application.

The coupling between data provision and processing gives the system heavily nonlinear properties, and system modeling is not a trivial task. This is why fuzzy logic control is applied instead. Allocation of CPU and bandwidth is carried out in an integrated and cooperative way. A resource container which holds appropriate CPU and bandwidth resources, calculated with our proposed algorithm, is implemented for each data streaming application, similar to that demonstrated in [9].


3. Fine-grained CPU Allocation


As mentioned above, fine-grained CPU resource allocation is implemented with a combination of virtualization technology and fuzzy control, where the former provides an isolated run-time environment and the latter is responsible for appropriate resource configuration.


3.1. Virtualization with Xen

Recent progress in virtualization technology makes resource isolation and performance guarantees possible for each data streaming application. Virtualization provides a powerful new layer of abstraction in distributed computing environments, which separates physical hardware from the operating system so as to improve resource utilization and flexibility. Virtualization allows multiple VMs with different operating systems and software configurations to run on a single physical machine. Each VM holds its own virtual hardware, such as CPU and memory, on which an operating system and applications can be loaded.

Xen is a virtualization technology that incurs negligible performance penalties through a technique called paravirtualization, which exposes a set of clean and simple device abstractions to allow most instructions to run at their native speed, so the overall capacity is very close to raw physical performance. Xen provides a VM management mechanism called the hypervisor, or in other words the virtual machine monitor (VMM), to share and access hardware at a lower level.

With Xen, the configuration of VMs can be dynamically adjusted to optimize performance. The amount of memory of each VM can be changed easily. The CPU of a VM is called a virtual CPU, often abbreviated as VCPU. The quota of physical CPU cycles a VCPU will get is determined by two parameters, i.e., weight and cap, where the former is a relative value and the latter is an absolute one. A VCPU with a weight of 128 can obtain twice as many CPU cycles as one whose weight is 64, while a cap value of 50 indicates that the VCPU will obtain 50% of a physical CPU's cycles. What's more, binding a VCPU to a physical CPU will provide better performance.

Each data streaming application is provided with a VM which it occupies exclusively. Xen is applied to create a VM with configurable clock speed for each application. Cap, one of Xen's interfaces regarding CPU allocation, is adjusted dynamically according to the measured utilization and pre-defined fuzzy rules, as described below.
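For illustration, the weight and cap of a running domain can be set through the credit scheduler's command-line interface. The sketch below only composes the `xl sched-credit` invocation; the domain name and weight are hypothetical, and the command is not executed here.

```python
def xl_sched_credit_cmd(domain, cap, weight=256):
    """Compose the `xl sched-credit` call that limits `domain` to `cap`
    percent of one physical CPU; `weight` is the relative share."""
    return ["xl", "sched-credit", "-d", domain,
            "-w", str(weight), "-c", str(cap)]

# e.g. restrict the VM of one streaming application to half a CPU;
# on a real Xen host this would be passed to subprocess.run(cmd, check=True)
cmd = xl_sched_credit_cmd("ligo-stream-1", cap=50)
```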


3.2. Fuzzy Control Overview

A fuzzy control system is based on fuzzy logic, which deals with fuzzy concepts that cannot be expressed as true or false but rather as partially true. Inputs of a fuzzy control system are values in terms of logical variables that take on continuous values between 0 and 1, indicating a degree of truth, rather than the classical or digital logic operating on discrete values of either 0 or 1 (true and false).

As an obvious advantage, fuzzy logic can express the solution to problems in terms that human operators can understand, so as to make use of their experience in the design of the controller. This makes it easier to mechanize tasks that are already performed successfully by humans but for which it is hard to establish mathematical models. So fuzzy control has special merit when it is difficult, if not impossible, to apply traditional control methods.

The input variables in a fuzzy control system are in general mapped into fuzzy sets, where an input variable may be mapped into several fuzzy sets with corresponding truth values determined by the membership functions. This process is called fuzzification. All the rules that apply are invoked, using the membership functions and truth values obtained from the inputs, to determine the result of each rule. This result in turn is mapped into a membership function and truth value controlling the output variable. These results are combined to give a specific answer by a procedure known as defuzzification.

A fuzzy logic controller (FLC) can be depicted as the diagram in Figure 1, consisting of an input stage, a processing stage, and an output stage.

Figure 1. A fuzzy logic controller: UC and its derivative ΔUC are scaled by Ke and Kec, fuzzified, processed by the inference engine over the rule base, defuzzified, and scaled by Ku to yield the control signal u.

The input stage maps inputs, including UC and ΔUC, to the appropriate membership functions (as shown in Figures 2 and 3, respectively) and truth values; this is known as fuzzification. These mappings are then fed into the rules in the processing stage, which, based on the inference mechanism, invokes each relevant rule in the rule base, generates a result for each, and then combines the results of the rules. Here a rule specifies an AND relationship between the mappings of the two input variables, so the minimum of the two is used as the combined truth value. Finally, in the output stage, the appropriate output state is selected and assigned a membership value at the truth level of the premise. The truth values are then defuzzified through centroid defuzzification.

In our scenario, the output is a proportional factor, PF in short, which will be used to calculate the allocated CPU quota of each VM, in terms of cap.

Some basic concepts are given below to help construct an elementary understanding of fuzzy logic controllers and their mechanism.

Universe of discourse is the domain of an input (output) to (from) the FLC. Inputs and outputs must be mapped to the universe of discourse by quantization factors (Ke and Kec in Figure 1) and a scaling factor (Ku in Figure 1), respectively, which helps to migrate the fuzzy control logic to different problems without any modification.

Linguistic variables describe the inputs and output(s) of a fuzzy controller. These linguistic variables are a natural way, resembling human thought, to handle the uncertainties created by the stochastics present in most computer systems. Linguistic variables involved in this work include UC, ΔUC and PF.

Linguistic values are used to describe characteristics of the linguistic variables. Very low, low, medium, high and very high are the linguistic values for UC, while those for ΔUC and PF are NB, NM, NS, ZE, PS, PM and PB, where N, P, B, M, S and ZE are abbreviations of negative, positive, big, medium, small and zero, respectively, and a combination of them just takes on a degree of truth. Different from classical mathematics, in the fuzzy world this is represented as a continuous value between 0 and 1, where 0.5 indicates we are halfway certain. The mapping from a numeric value to a degree of truth for a linguistic value is done by the membership function.

Linguistic rules form a set of IF premise THEN consequent rules to map the inputs to the output(s) of a fuzzy controller, i.e., to guide the fuzzy controller's actions. These rules are defined in terms of linguistic variables, different from the numerical input or output of a classical controller. An example linguistic rule is: IF UC is high AND ΔUC is NB THEN PF is NB.
Rule-base holds a set of IF-THEN rules as a part of the controller, dictating how to achieve PF according to the fuzzified linguistic values of UC and ΔUC.

Membership functions quantify the certainty with which a UC or ΔUC value is associated with a certain linguistic value. Except for the membership function of the linguistic value very low for UC, we use symmetric triangles of equal base and 50% overlap with adjacent MFs.

Unlike traditional set theory, in the fuzzy set theory underlying fuzzy control theory, set membership is not binary but continuous, to deal with uncertainties. Thus, a fuzzy input or output may belong to more than one set (at most two adjacent sets in our MFs) with different certainty values.

The inference mechanism in Figure 1 determines which rules will be applied at the k-th sampling point, based on the fuzzified UC and ΔUC. To compute the certainty value of the premise in the corresponding IF premise THEN consequent rule(s), we take the minimum between the certainty values of UC and ΔUC, since the consequent cannot be more certain than the premise.

Fuzzification transforms precise values of inputs into fuzzy sets with corresponding membership functions, which is indispensable for fuzzy inference. Outputs of fuzzy inference are fuzzy sets, which are not suitable to drive the controlled system; they must be transformed into a crisp value by defuzzification. The centroid method is applied to compute the control signal.
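A minimal numeric sketch of fuzzification with triangular membership functions and centroid defuzzification follows. The triangle break-points and the sampled universe below are illustrative, not the exact ones of Figures 2 to 4.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

uc = 0.7  # a measured utilization reading
# fuzzification: uc belongs to two adjacent sets (50% overlap of MFs)
mu = {"medium": tri(uc, 0.4, 0.6, 0.8), "high": tri(uc, 0.6, 0.8, 1.0)}

def centroid(universe, membership):
    """Centroid defuzzification over a sampled universe of discourse."""
    num = sum(u * membership(u) for u in universe)
    den = sum(membership(u) for u in universe)
    return num / den

universe = [0.6 + 0.01 * i for i in range(81)]   # PF sampled over [0.6, 1.4]
# Mamdani-style clipping: the rule strength caps the output membership
pf = centroid(universe, lambda u: min(mu["high"], tri(u, 0.9, 1.0, 1.1)))
```

With a symmetric clipped output set centered at 1.0, the centroid lands at 1.0, i.e., no cap adjustment.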


3.3. Linguistic Variables and Fuzzy Rules

As for the FLC proposed in this paper, the inputs are the observed resource utilization UC, as defined in (2) or (5), and its derivative ΔUC, unlike most existing FLCs whose inputs are usually the observed error between the set point and the actually measured value and the derivative of that error. Although it has two inputs, essentially it is a single-input controller, for ΔUC can be derived from UC as:

ΔUC(k) = UC(k) - UC(k-1)

The output of the fuzzy controller for application i in the h-th scheduling period, called the control signal in control terminology, is a proportional factor, denoted as PF_{i,h}. At the h-th scheduling period, given inputs UC and ΔUC, suppose the relevant fuzzy sets of the output form a set denoted as M_{i,h} with membership denoted as M_{i,h}(u), where u ∈ U_{i,h} and U_{i,h} is the universe of discourse; then the output can be calculated as

PF_{i,h} = ∫_{U_{i,h}} M_{i,h}(u) · u du / ∫_{U_{i,h}} M_{i,h}(u) du    (6)

Suppose the initial cap of each application is C_{i,0}; in the h-th scheduling period, the cap will be

C_{i,h} = C_{i,h-1} · (1 + CapScale · (PF_{i,h-1} - 1))    (7)

with initially PF_{i,0} = 1,


where CapScale is the varying scale of the allocated cap. PF_{i,h} is adjusted every scheduling period, so it adapts to varying situations. The relationship between the allocated cap and procspeed can be obtained with system identification [7], as described later in Section 5.2, and a linear model (an approximately proportional model within a certain scope) is adopted as














procspeed_i(h) = Σ_{l=1}^{p} a_l · procspeed_i(h-l) + Σ_{m=1}^{q} b_m · Cap_i(h-m)    (8)
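The cap adaptation of (7) can be sketched as a one-line update, under the convention that PF = 1 (linguistic value ZE) leaves the cap unchanged. CapScale, the bounds and the PF sequence below are illustrative values, not the paper's settings.

```python
def next_cap(prev_cap, pf, cap_scale=0.5, lo=10, hi=100):
    """One step of the cap adaptation: PF > 1 grows the cap, PF < 1
    shrinks it and PF == 1 keeps it unchanged.  cap_scale damps the
    adjustment; the bounds keep the cap a valid Xen percentage."""
    cap = prev_cap * (1 + cap_scale * (pf - 1))
    return max(lo, min(hi, cap))

cap = 50
for pf in (1.2, 1.2, 1.0, 0.8):   # PFs emitted by the FLC, one per period
    cap = next_cap(cap, pf)
```

Damping with cap_scale < 1 is what lets the controller converge without oscillating around the 80% target.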

Linguistic values of UC include very low, low, medium, high and very high, indicating the utilization status of CPU resources. Triangular membership functions are adopted, as shown in Figure 2.

Figure 2. Triangular membership functions of UC

Our goal is to keep the utilization at a high level (80% in this scenario); a low or extremely high utilization is not desired. Xen allows most instructions to run at their native speed, but due to the I/O intensive characteristics of data streaming applications, there is indeed some overhead, i.e., performance loss. In order to guarantee high processing efficiency, we set the target utilization to 80% rather than 100%, which means that the allocated CPU quotas are more than actually needed, to compensate for the overhead, as inferred from (2) or (5).


Both the input ΔUC and the output PF adopt triangular membership functions with linguistic values of NB, NM, NS, ZE, PS, PM and PB, as shown in Figures 3 and 4, respectively. It can be seen that the universe of discourse of ΔUC falls within the scope of -0.4 to 0.4, which is based on our empirical observation. It is also the case for PF, where the universe of discourse is set to 0.6 to 1.4.

Figure 3. Triangular membership functions of ΔUC

Figure 4. Triangular membership functions of PF

Table 1. Fuzzy rules: PF as a function of UC (columns) and ΔUC (rows)

ΔUC \ UC   very low   low   medium   high   very high
NB         NB         NB    NB       NB     PB
NM         NB         NB    NB       NM     PB
NS         NB         NB    NB       NS     PB
ZE         NB         NB    NB       ZE     PB
PS         NB         NB    NB       PS     PB
PM         NB         NB    NB       PM     PB
PB         NB         NB    NB       PB     PB

As mentioned above, a low utilization implies that the allocated CPU quota should be decreased to release redundant compute resources without reducing the ultimate throughput; meanwhile, an extremely high utilization indicates that more CPU resources are required to increase processing efficiency. When the utilization is very low, low or medium, the generated PF should be less than 1, while a very high utilization requires a PF larger than 1. To avoid oscillation, ΔUC should also be taken into account. When the utilization is far away from the set point (80% here), the adjustment can be big. Hence the PF in the 1st, 2nd and 3rd columns of Table 1 is NB (negatively big), while in the last column it is PB (positively big).
When the utilization falls in the high area, more careful adjustment is required, as shown in the 4th column of Table 1. For example, when ΔUC is NS, PF is also NS; and when ΔUC is ZE, PF is also ZE, which means that no adjustment is required, so as to keep a stable status. Note that in Figure 4, ZE means the value of PF is 1, not 0.

From Table 1, it can be seen that these fuzzy rules are simple but robust, guaranteeing rapid convergence to the set point without steady-state errors, which is a required characteristic for control systems, as shown in the experimental results included in Section 5.
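Table 1 can be encoded as a plain lookup over the linguistic values. The sketch below assumes the pattern described above: constant NB columns for very low to medium utilization, a constant PB column for very high, and ΔUC mirrored in the high band.

```python
UC_LEVELS = ("very low", "low", "medium", "high", "very high")
DUC_LEVELS = ("NB", "NM", "NS", "ZE", "PS", "PM", "PB")

def pf_rule(uc_level, duc_level):
    """IF UC is <uc_level> AND dUC is <duc_level> THEN PF is <returned>."""
    assert uc_level in UC_LEVELS and duc_level in DUC_LEVELS
    if uc_level == "very high":
        return "PB"          # starved CPU: raise the cap boldly
    if uc_level == "high":
        return duc_level     # near the 80% target: mirror the trend
    return "NB"              # under-utilized: cut the cap boldly
```

For instance, pf_rule("high", "ZE") yields ZE, i.e., no cap adjustment at a stable 80% utilization.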


3.4. Hybrid Resource Allocation

The control system for resource allocation for data streaming applications is illustrated in Figure 5. CPU and bandwidth resources are allocated by the actuator, abbreviated as ACT, which performs the integrated allocation scheme. The transfer function from allocated CPU and bandwidth resources to the utilization UC, denoted as G, is not available, for the two inputs are tightly coupled and interact with each other. Fortunately, this transfer function is not indispensable for our fuzzy allocation scheme.

The FLC receives UC and ΔUC and outputs the cap of CPU for each application, and then procspeed is determined with (8). Iterative bandwidth allocation (IBA) is implemented as described in Section 4 to decide transpeed. UC at the next scheduling period is obtained with (2) or (5). In such a way, the control system works and hybrid resource allocation is implemented for cloud computing.
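The closed loop just described can be summarized in a skeleton such as the following, where flc, proc_model, iba and measure are placeholders standing in for the fuzzy controller, the linear model (8), the iterative bandwidth allocation of Section 4 and the measurement step; the cap actuation is simplified to a plain multiplication by PF, whereas (7) additionally damps it with CapScale.

```python
def control_loop(apps, periods, flc, proc_model, iba, measure):
    """One possible shape of the hybrid allocation loop of Figure 5."""
    uc = {a: 0.8 for a in apps}      # start at the target utilization
    duc = {a: 0.0 for a in apps}
    caps = {a: 50 for a in apps}     # illustrative initial caps
    for _ in range(periods):
        pf = {a: flc(uc[a], duc[a]) for a in apps}
        caps = {a: caps[a] * pf[a] for a in apps}         # ACT: set caps
        procspeed = {a: proc_model(caps[a]) for a in apps}
        transpeed = iba(procspeed)   # processing-aware bandwidth split
        new_uc = measure(procspeed, transpeed)
        duc = {a: new_uc[a] - uc[a] for a in apps}
        uc = new_uc
    return caps
```

With stub functions that keep UC at the target (so the FLC returns PF = 1), the caps stay put, which is the intended equilibrium of the loop.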

Figure 5. The control system: the FLC and the IBA drive the actuator (ACT), whose allocations act on the plant G to produce the measured UC.


4. Iterative Bandwidth Allocation


For data streaming applications, e.g. LIGO data analysis, the corresponding data sources are usually located on sites remote from the processing cluster, so it is required to transfer data to local storage through the Internet. The total bandwidth from the cluster to the Internet is limited, denoted as I, and is shared by multiple data streams.

The data streams, called sessions and denoted as s, form a set S. Each session will be assigned a bandwidth x_s (i.e., transpeed in Section 2), where x_s ∈ X_s, X_s = [b_s, B_s], b_s > 0 and B_s < ∞. b_s stands for the least bandwidth required for session s, while B_s is the highest bandwidth of the link from the corresponding data source to session s. Session s has a utility function U_s(x_s), which is assumed to be concave, continuous, bounded and increasing in the interval [b_s, B_s]. We try to maximize the sum of the utilities of all the sessions while maintaining fairness among them. The problem can be described as follows.

P:  max Σ_{s∈S} U_s(x_s)
    s.t. Σ_{s∈S} x_s ≤ I,  x_s ∈ X_s for all s ∈ S

A repository policy is applied here, and variables U_s and L_s are the settled upper and lower limits of the data amount in storage for session s, used to control the pausing and resuming of data transfers: when the amount of data in storage of s reaches the upper limit, the data transfer is halted, and when this amount reaches the lower limit, data transfer is resumed. So data transfer may be intermittent rather than continuous, so as to be storage-aware, avoiding data overflow while guaranteeing adequate data provision.

Depending on the amount of data in storage, there are two possible transfer states for each s at any time, i.e., active and inactive, which indicate whether a data transmission is on or off. All active sessions form a set, called S_A, and it is obvious that this set varies over time because the transfer states of sessions are changing. At every sampling time k, the status of transfer s can be determined by the data amount in storage and the previous status:



















status_{s,k} = 0,               if amount_{s,k} ≥ U_s
status_{s,k} = status_{s,k-1},  if L_s < amount_{s,k} < U_s
status_{s,k} = 1,               if amount_{s,k} ≤ L_s

where status_{s,k} and amount_{s,k} stand for the status of transfer and the data amount in storage for session s at the k-th sampling time, respectively. The initial conditions are status_{s,0} = 1 and amount_{s,0} = 0.
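The repository policy translates directly into a small state machine with a hysteresis band. The limits and the sample trace below are illustrative values.

```python
def next_status(prev_status, amount, lower, upper):
    """Repository policy: halt streaming at the upper limit, resume at
    the lower limit, keep the previous state inside the band between."""
    if amount >= upper:
        return 0              # inactive: transfer halted
    if amount <= lower:
        return 1              # active: transfer (re)started
    return prev_status        # hysteresis: no change

# session starts active with empty storage: status_{s,0}=1, amount_{s,0}=0
status, trace = 1, []
for amount in (0, 30, 80, 100, 60, 20, 5):   # illustrative readings
    status = next_status(status, amount, lower=10, upper=100)
    trace.append(status)
```

The hysteresis band between the two limits is what makes the transfer intermittent rather than toggling on every sample.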

Then at the k-th sampling point, status_{s,k} = 1 if s ∈ S_A, and status_{s,k} = 0 if s ∉ S_A. We allocate bandwidth only for active transmissions, so the bandwidth constraint may be rewritten as

Σ_{s∈S_A} x_s ≤ I

A
n iterative optimization algorithm is proposed
as
foll
ows
.


W
hile
s


S
A

































A
S
s
I
k
s
x
if
s
X
k
s
x
k
A
S
s
I
k
s
x
if
s
X
k
s
x
U
k
k
s
x
k
s
x




'
1

o
therwise



A
k
s
S
s
x




,
0
1

Here, x_s(k) is the bandwidth for session s ∈ S at the k-th sampling time. {α_k} and {β_k} are two positive sequences with β_k ∈ (0, 1). [·]_{X_s} denotes a projection onto the set X_s:

[y]_{X_s} = min(B_s, max(b_s, y))





U' is the sub-gradient of

U = Σ_{s∈S_A} U_s(x_{s,k})

i.e., U'_s(x_{s,k}) = ∂U / ∂x_{s,k}.

And ρ is a so-called safety coefficient to avoid bandwidth excess, where ρ ∈ (0, 1).

Essentially, this allocation algorithm is processing-aware, since status_{s,k} and amount_{s,k} are affected by processing, i.e., S_A is associated with data processing. Bandwidth allocation is thus coupled with CPU allocation. On the other hand, the allocated bandwidth will also result in a utilization that triggers the FLC to adjust the allocation of CPU resources. In this way, allocations of bandwidth and CPU resources are integrated and interactive, and they cooperate to allocate resources for applications on demand.

Parameters in this iterative bandwidth allocation scenario, such as α_k, β_k and ρ, can be adjusted according to different allocation principles, such as relative fairness, the-most-needed-the-first, etc. The most widely applied utility function is logarithmic:

U_s(x_s) = w_s ln(1 + x_s),

where w_s is the weight of session s, and a larger w_s implies a bigger quota of the total available bandwidth at each iterative step.
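Under this utility, the sub-gradient used in the iteration is U'_s(x_s) = w_s/(1 + x_s), so a session with a larger weight sees a steeper utility everywhere and claims a bigger share of each bandwidth increment. A minimal illustration (the weight values are hypothetical):

```python
import math

def utility(x, w):
    # U_s(x_s) = w_s * ln(1 + x_s): concave and increasing in x_s
    return w * math.log(1.0 + x)

def gradient(x, w):
    # Sub-gradient (here simply the derivative) w_s / (1 + x_s)
    return w / (1.0 + x)

# At the same rate, a heavier-weighted session has a steeper utility,
# so the iterative scheme hands it a bigger quota of bandwidth.
assert gradient(2.0, 3.0) > gradient(2.0, 1.0)
assert utility(4.0, 2.0) > utility(4.0, 1.0)
```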

In the data streaming scenario, due to the coupling of data processing and provision, i.e., the fact that the ultimate throughput (in other words, the ultimate processing efficiency) is co-determined by the allocated CPU and bandwidth and the two interact with each other, the relationship between processing efficiency and allocated CPU or bandwidth is heavily non-linear and hard, if not impossible, to express in a precise mathematical formula. This makes it very difficult to apply traditional feedback control for the absence of a plant model; fortunately, the model-free nature of fuzzy control provides a feasible alternative.


5. Performance Evaluation

In this section, the approach described in Sections 3 and 4 is implemented and experiments are carried out using a gravitational wave data analysis case study.

5.1 Experiment Design

VMs are set up on an HP DL580G5 server with 4 Intel Xeon E7310 CPUs (16 cores in total) and 8 GB of memory for LIGO data streaming applications. Data items are streamed to the VMs from remote data sources.

A LIGO data analysis application reads in two data streams from remote LIGO data archives and calculates correlation coefficients that can be used to characterize the similarity of two data curves. If two signals from two observatories occur simultaneously with similar curves, it is likely that a gravitational wave candidate is detected. Note that this is only a simplified case study, since the actual LIGO data analysis pipeline is much more sophisticated, requiring many pre- and post- signal processing steps.

LIGO data streams are composed of numerous small data files, each containing observational data acquired over 16 seconds. Data files used in this experiment are the LIGO level 3 data with reduced sizes that only include data from the gravitational wave channel.

According to the complexity of processing (denoted in terms of proportional coefficients obtained by system identification in the following subsection), applications can be divided into two categories: light and heavy ones. For light applications, data items can be processed with a small amount of computation, while heavy applications perform more complex processing on data items and require a relatively large amount of computational resources. Heavy and light groups of applications are used as benchmarks, each with 5 applications. The total available bandwidth, I, is set to 5, 10 and 15 Mbps.

Three values of CapScale, 3, 5 and 8, are evaluated. Note that heavy and light applications are defined only relative to each other. Our allocation algorithm is evaluated for 100 scheduling periods.

Performance metrics include the output of the FLC (PF), the utilization of each allocated computational resource (as defined in (1), UC), the allocated cap for each application, and the resource usage (US) of CPU and bandwidth. Note that the usage of CPU means the sum of allocated caps over applications, which is different from the utilization defined in (1).


5.2 System Identification

To reveal the mathematical relationship between the processing speed and the allocated computing resources (mainly the quota of CPU), system identification is carried out. Here 1,188 pairs of LIGO data files from two observatories, with a total amount of 4,354 MB, are used in the experiment.

Three VMs are set up with 512 MB, 256 MB and 128 MB of memory, respectively. The allocated caps for each VM range from 5% to 100%. The makespans during which all the data pairs are processed are shown in Figure 6, from which it seems that allocated memory has little to do with the makespan for this application. In the following experiments, the memory size for each VM is set to 128 MB.


Figure 6. Makespans with different caps and memory sizes (512 MB, 256 MB and 128 MB)

The processing speed is shown in Figure 7 with the solid line. Polynomial curve fitting is applied to generate a mathematical function from the cap to the processing speed, using the Least Squares Method (LSM).
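Such a fit can be reproduced with ordinary least squares, e.g. via numpy.polyfit. The sketch below uses illustrative measurements shaped like Figure 7 (fast growth up to roughly a 50% cap, diminishing returns after), not the actual experimental data:

```python
import numpy as np

# Illustrative cap (%) vs. processing speed (MB/s) samples,
# not the measured values from Figure 7.
caps = np.array([5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
speeds = np.array([1.5, 3.0, 6.0, 9.0, 12.0, 14.0, 15.5, 16.5, 17.2, 17.8, 18.0])

# Least-squares fit of a cubic polynomial from cap to speed.
coeffs = np.polyfit(caps, speeds, deg=3)
model = np.poly1d(coeffs)

# The fitted curve should track the measurements closely.
residual = np.max(np.abs(model(caps) - speeds))
assert residual < 1.0
```

The fitted polynomial then serves as the cap-to-speed map produced by system identification.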

Figure 7. Processing speeds with different caps (measured vs. fitted)

From Figures 6 and 7, it can be inferred that once the allocated cap exceeds 50%, the same increase will not bring as obvious a benefit as it does in the range of 5% to 50%. For our application, CPU allocation for VMs is essential to share CPUs among multiple data streaming applications so that higher performance can be achieved with less resource usage.


5.3. Resource Allocation and Utilization

As shown in Figures 8 and 9, light and heavy applications get appropriate amounts of CPU resources using our approach, where the total bandwidth is 5 Mbps. In these cases, the required CPU resources of each application are far less than a whole physical CPU, for their allocated caps are under 20% or even 10%. The total CPU utilization is also far less than 100%. Initial caps for each application are 10%. All the allocation schemes converge to a steady state, and no steady-state error exists in any allocation scheme. But in the presence of sudden changes of available resources, they can make a rapid response, as shown in Figure 9.

Figure 8. Performance of heavy tasks: (a) proportional factors, (b) utilization, (c) allocated cap, (d) resource usage

Figure 9. Performance of light tasks: (a) proportional factors, (b) utilization, (c) allocated cap, (d) resource usage

5.4. Robustness and Adaptability

Adaptability of the allocation algorithm is also tested. For example, the total available bandwidth jumps to 8 and 10 Mbps at the 30th and 60th scheduling periods, respectively, and the algorithm reacts to this situation very fast, as shown in Figure 10. From the viewpoint of control, this variance in bandwidth can be considered a disturbance, so the robustness, or the anti-disturbance performance, is good.

Figure 10. Responsiveness to bandwidth: (a) proportional factors, (b) utilization, (c) allocated cap, (d) resource usage


5.5. Parametric Convergence and Stability

As a routine in control system design, stability analysis is indispensable because only stable systems can be applied. We find that some parameters, e.g. CapScale in (7), have a bearing on system stability. For some applications, a larger CapScale results in a rapid convergence to the steady state. But a large CapScale may also lead to performance oscillation and instability. For example, when CapScale is set to 8, the output of the FLC, the resource utilization and the allocated CPU quota for each light application will not converge to a stable value. In this case, the control system cannot work in a stable state, as shown in Figure 11.

Figure 11. Performance of light tasks with a large CapScale: (a) proportional factors, (b) utilization, (c) allocated cap

Heavy tasks are not as sensitive to a large CapScale as light ones. In Figure 12, CapScale is also set to 8. While some oscillation occurs, its magnitude is very small. So CapScale is essential for the approach proposed in this work; we set it empirically to values from 3 to 5.

Figure 12. Performance of heavy tasks with a large CapScale: (a) proportional factors, (b) utilization, (c) allocated cap


5.6. Performance Comparison

Several other resource allocation schemes are developed for comparison, as shown in Table 2: iterative and even stand for the way bandwidth is allocated among applications, either in the iterative way described in Section 4 or by dividing the total bandwidth evenly; dynamic stands for the CPU allocation manner described in Section 3, while fixed means that the allocated CPU resources for each application are constant. Obviously, our approach corresponds to Case 1.

Table 2. Algorithm settings

         Bandwidth   CPU
Case 1   iterative   dynamic
Case 2   iterative   fixed
Case 3   even        dynamic
Case 4   even        fixed

Some results are provided in Table 3, where the performance metrics from top to bottom are the final throughput (the sum of data processed during the evaluation), CPU usage and bandwidth (BD for short) usage in percentage. The characters H and L are abbreviations of heavy and light, indicating the type of applications.

Table 3. Performance comparison

Index       Type  Case   Total Bandwidth (Mbps)
                         5        10       15
Final TP     H    1      49951    97080    117140
                  2      46542    77107    77269
                  3      46745    87495    104660
                  4      48566    80004    82296
             L    1      49999    99994    134980
                  2      46569    94019    123458
                  3      47576    95018    124990
                  4      48967    94948    104458
CPU Usage    H    1      40.17    77.76    98.81
(%)               2      50       50       50
                  3      39.16    66.09    91.88
                  4      50       50       50
             L    1      9.79     20.74    31.93
                  2      50       50       50
                  3      9.76     19.90    28.93
                  4      50       50       50
BD Usage     H    1      93.49    91.50    81.77
(%)               2      93.08    77.11    51.51
                  3      92.90    87.08    80.10
                  4      90.13    80.00    54.86
             L    1      92.14    92.02    89.99
                  2      93.14    93.14    90.00
                  3      91.41    90.50    83.33
                  4      93.24    91.99    80.33

Our algorithm prevails in all the scenarios included in Table 3. For example, with fewer resources, our algorithm obtains a higher throughput. On the contrary, in some cases, for instance Case 4, where the bandwidth is evenly allocated irrespective of the requirements of applications and the CPU resources are fixed, the situation deteriorates, even though the applications occupy a large quota of resources. Cases 2 and 3 improve a little, but their performance is not as ideal as that of our approach. This reflects that unilateral adjustment of bandwidth or CPU resources is not powerful enough to reach the goals of high throughput and high utilization of resources simultaneously, which from another aspect justifies our assertion that bandwidth and CPU resources should be allocated in an integrated and cooperative way.


6. Related Work

Stream processing [10] has become one of the major focuses of database research in recent years, and some tools and techniques have been developed to cope with the efficient handling of continuous queries on data streams, e.g. Aurora [11] and TelegraphCQ [12]. Our work focuses on scheduling data streaming applications on virtualized cloud resources.

Distributed computing techniques evolve from cluster and grid to cloud computing [13]. Resource management and allocation has been a key issue in these areas [14]. For cloud computing, with virtualization technology as its kernel, virtual machines [15] or virtual clusters [16] are basic elements for management, scheduling and optimization [17]. Some existing management tools include Eucalyptus [18], VMPlants [19], Usher [20], etc. Some schedulers are developed to support data streaming applications, e.g. GATES [21] and Streamline [22], but they mainly concentrate on computing resource allocation.

Several existing projects, EnLIGHTened [23], G-lambda [24] and PHOSPHORUS [25], put emphasis on networking resources. They hold full control over an optical network so that a deterministic lightpath can be obtained with advance reservation or on demand, while our network is the public Internet, based on "best effort" TCP/IP protocols, which approaches the required bandwidth as closely as possible.

Control theory has been successfully applied to control performance or quality of service (QoS) for various computing systems. An extensive summary of related work can be found in the first chapter of [7]. Some control types, such as proportional, integral, and derivative (PID) control [26][27], pole placement [28], linear quadratic regulator (LQR) [28] and adaptive control [29][30], have been proposed. Most of them require precise models of the controlled objects.

The first application of fuzzy control was in industry [31]. Fuzzy control [32][33] is also a topic of research in computing systems, but it is mainly focused on admission control to obtain a better quality of service. Adaptive fuzzy control is applied for utilization management of periodic tasks [34], where the utilization is defined as the ratio of the estimated execution time to the task period. Fuzzy inference is carried out on fuzzy rules to decide the threshold, a point over which the quality of service (QoS) of tasks should be degraded or even an admission control must be performed to reject further tasks. In that work, an execution time estimation must be provided, which may not be feasible for some applications.

The latest relevant work [35] is focused on providing predictable execution so as to meet the deadlines of tasks. Virtualization technology is applied to implement the so-called performance container and compute throttling framework, to realize the controlled time-sharing of high performance compute resources, i.e., fine-grained CPU allocation. System identification is carried out to establish the model of the controlled object, and a proportional and integral (PI) controller is applied. This work has a similar motivation to ours, but in the data streaming scenario it is hard to derive a precise model from the allocated bandwidth and CPU resources to the corresponding utilization or throughput, as explained before. So we adopt the model-free fuzzy control approach.


7. Conclusions and Future Work

In this work we provide a new approach to allocating virtualized cloud resources for data streaming applications. From the experimental results included in Section 5, some detailed conclusions can be inferred as follows:

• Speeds of data streaming and processing reach a balance at a high level, guaranteeing that applications run with high throughput while consuming only a reasonable amount of resources on demand; in particular, the usage of CPU resources is fairly economic;

• High CPU utilization can be achieved, e.g., 80% in Figure 9, and there is no steady-state error, so resources are used efficiently while overloading is avoided;



• Heavy applications consume more compute resources than light ones. Higher total available bandwidth leads to higher CPU resource usage, while lighter tasks result in higher bandwidth usage, which reflects the interrelation between bandwidth and CPU resources, as also shown in Table 3;

• Data throughput is co-determined by the total available bandwidth and CPU resources. Unilateral redundancy in either resource will not necessarily lead to a higher throughput but may result in under-utilization, so integrated resource allocation is required.

Fuzzy logic control of CPU resources and iterative bandwidth allocation are implemented for resource scheduling of grid data streaming applications. Virtualization technology is applied to enable fine-grained and on-demand resource allocation. Experimental results show the good performance of the approach in resource utilization, data throughput and robustness, in comparison with several other methods with less adaptability to dynamic environments.

In further work, virtualization of network and storage will be implemented. More complex scenarios will be considered, e.g. workflows or pipelines, where VMs can be established for each stage and controlled to achieve balance among the different stages in one pipeline, so as to avoid performance bottlenecks and achieve overall high performance.


Acknowledgement


This work is supported by the National Science Foundation of China (grant No. 60803017) and the Ministry of Science and Technology of China under the National 973 Basic Research Program (grants No. 2011CB302505 and No. 2011CB302805).


References


[1]. M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "A View of Cloud Computing", Communications of the ACM, Vol. 53, No. 4, pp. 50-58, 2010.

[2]. A. Abramovici, W. E. Althouse, et al., "LIGO: The Laser Interferometer Gravitational-Wave Observatory", Science, Vol. 256, No. 5055, pp. 325-333, 1992.

[3]. D. A. Brown, P. R. Brady, A. Dietz, J. Cao, B. Johnson, and J. McNabb, "A Case Study on the Use of Workflow Technologies for Scientific Analysis: Gravitational Wave Data Analysis", in I. J. Taylor, D. Gannon, E. Deelman, and M. S. Shields (Eds.), Workflows for eScience: Scientific Workflows for Grids, Springer Verlag, pp. 39-59, 2007.

[4]. M. Migliardi, J. Dongarra, A. Geist and V. Sunderam, "Dynamic Reconfiguration and Virtual Machine Management in the Harness Metacomputing System", Computing in Object-Oriented Parallel Environments, Lecture Notes in Computer Science, Vol. 1505, pp. 127-134, 1998.

[5]. P. Barham, B. Dragovic, K. Fraser, S. Hand, T. L. Harris, A. Ho, R. Neugebauer, I. Pratt and A. Warfield, "Xen and the Art of Virtualization", Proc. ACM Symp. on Operating Systems Principles, 2003.

[6]. W. Zhang, J. Cao, Y. Zhong, L. Liu and C. Wu, "An Integrated Resource Management and Scheduling System for Grid Data Streaming Applications", Proc. 9th IEEE/ACM Int. Conf. on Grid Computing, pp. 258-265, Tsukuba, Japan.

[7]. J. L. Hellerstein, Y. Diao, S. Parekh, and D. M. Tilbury, Feedback Control of Computing Systems, Wiley-IEEE Press, August 2004.

[8]. W. Zhang, J. Cao, Y. Zhong, L. Liu and C. Wu, "Block-Based Concurrent and Storage-Aware Data Streaming for Grid Applications with Lots of Small Files", Proc. 1st Int. Workshop on Service-Oriented P2P Networks and Grid Systems, conj. 9th IEEE Int. Symp. on Cluster Computing and the Grid, pp. 538-543, Shanghai, China, 2009.

[9]. X. Liu, X. Zhu, S. Singhal, and M. Arlitt, "Adaptive Entitlement Control to Resource Containers on Shared Servers", Proc. 9th IFIP/IEEE Int. Symp. on Integrated Network Management (IM 2005), Nice, France, 2005.

[10]. L. Golab and M. T. Ozsu, "Issues in Data Stream Management", SIGMOD Record, Vol. 32, No. 2, pp. 5-14, 2003.

[11]. D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik, "Aurora: A New Model and Architecture for Data Stream Management", VLDB Journal, Vol. 12, No. 2, pp. 120-139, 2003.

[12]. S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. R. Madden, F. Reiss, and M. A. Shah, "TelegraphCQ: Continuous Dataflow Processing", Proc. ACM SIGMOD Int'l. Conf. on Management of Data (SIGMOD '03), 2003.

[13]. I. Foster, Y. Zhao, I. Raicu, and S. Lu, "Cloud Computing and Grid Computing 360-Degree Compared", Proc. IEEE Grid Computing Environments, conj. IEEE/ACM Supercomputing Conf., Austin, 2008.

[14]. R. Buyya, C. S. Yeo, S. Venugopala, J. Broberga, and I. Brandicc, "Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility", Future Generation Computer Systems, Vol. 25, No. 6, pp. 599-616, 2009.

[15]. M. Rosenblum and T. Garfinkel, "Virtual Machine Monitors: Current Technology and Future Trends", IEEE Computer, Vol. 38, No. 5, pp. 39-47, 2005.

[16]. I. Foster, T. Freeman, K. Keahey, D. Scheftner, B. Sotomayer and X. Zhang, "Virtual Clusters for Grid Communities", Proc. IEEE Int. Symp. on Cluster Computing and the Grid, May 2006.

[17]. F. Zhang, J. Cao, L. Liu, and C. Wu, "Redundant Virtual Machine Management in Virtualized Cloud Platform", Int. J. Modeling, Simulation, and Scientific Computing, 2011. (to appear)

[18]. D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov, "The Eucalyptus Open-Source Cloud-Computing System", Proc. 9th IEEE/ACM Int. Symp. on Cluster Computing and the Grid, 2008.

[19]. I. Krsul, A. Ganguly, J. Zhang, J. A. B. Fortes, and R. J. Figueiredo, "VMPlants: Providing and Managing Virtual Machine Execution Environments for Grid Computing", Proc. ACM/IEEE SC2004 Conference, 2004.

[20]. M. McNett, D. Gupta, A. Vahdat, and G. M. Voelker, "Usher: An Extensible Framework for Managing Clusters of Virtual Machines", Proc. 21st Conf. on Large Installation System Administration, 2007.

[21]. L. Chen and G. Agrawal, "A Static Resource Allocation Framework for Grid-based Streaming Applications", Concurrency and Computation: Practice and Experience, Vol. 18, pp. 653-666, 2006.

[22]. B. Agarwalla, N. Ahmed, D. Hilley, and U. Ramachandran, "Streamline: Scheduling Streaming Applications in a Wide Area Environment", Multimedia Systems, Vol. 13, No. 1, pp. 69-85, 2007.

[23]. L. Battestilli, et al., "EnLIGHTened Computing: An Architecture for Co-allocating Network, Compute, and Other Grid Resources for High-End Applications", Proc. Int'l. Symp. on High Capacity Optical Networks and Enabling Technologies, pp. 1-8, 2007.

[24]. A. Takefusa, et al., "G-lambda: Coordination of a Grid Scheduler and Lambda Path Service over GMPLS", Future Generation Computer Systems, Vol. 22, No. 8, pp. 868-875, October 2006.

[25]. S. Figuerola, et al., "PHOSPHORUS: Single-step On-demand Services across Multi-domain Networks for e-Science", Proc. SPIE, Vol. 6784, 67842X, 2007.

[26]. T. F. Abdelzaher, K. G. Shin, and N. Bhatti, "Performance Guarantees for Web Server End-systems: A Control-theoretical Approach", IEEE Trans. on Parallel and Distributed Systems, Vol. 13, 2002.

[27]. S. Parekh, N. Gandhi, J. L. Hellerstein, D. Tilbury, T. S. Jayram, and J. Bigus, "Using Control Theory to Achieve Service Level Objectives in Performance Management", Real Time Systems Journal, Vol. 23, No. 1-2, 2002.

[28]. Y. Diao, N. Gandhi, J. L. Hellerstein, S. Parekh, and D. M. Tilbury, "MIMO Control of an Apache Web Server: Modeling and Controller Design", American Control Conf., 2002.

[29]. A. Kamra, V. Misra, and E. M. Nahum, "Yaksha: A Self-tuning Controller for Managing the Performance of 3-tiered Web Sites", Proc. 12th IEEE Int. Workshop on Quality of Service, June 2004.

[30]. Y. Lu, C. Lu, T. Abdelzaher, and G. Tao, "An Adaptive Control Framework for QoS Guarantees and its Application to Differentiated Caching Services", Proc. 10th IEEE Int. Workshop on Quality of Service, May 2002.

[31]. P. J. King and E. H. Mamdani, "Application of Fuzzy Algorithms for Control of Simple Dynamic Plant", IEE Proc., Control Theory App., Vol. 121, pp. 1585-1588, 1974.

[32]. Y. Diao, J. L. Hellerstein, and S. Parekh, "Using Fuzzy Control to Maximize Profits in Service Level Management", IBM Systems Journal, Vol. 41, No. 3, 2002.

[33]. B. Li and K. Nahrstedt, "A Control-based Middleware Framework for Quality of Service Adaptations", Communications, Vol. 17, pp. 1632-1650, 1999.

[34]. M. H. Suzer and K. D. Kang, "Adaptive Fuzzy Control for Utilization Management", IEEE Int. Symp. on Object/Component/Service-oriented Real-time Distributed Computing, 2008.

[35]. S. Park and M. Humphrey, "Feedback-controlled Resource Sharing for Predictable eScience", Proc. ACM/IEEE Conf. on Supercomputing, 2008.