An economic approach for scalable and highly-available distributed applications


Oct 31, 2013 (4 years and 13 days ago)



Nicolas Bonvin, Thanasis G. Papaioannou and Karl Aberer

School of Computer and Communication Sciences
École Polytechnique Fédérale de Lausanne (EPFL)
1015 Lausanne, Switzerland
firstname.lastname@epfl.ch


2010 IEEE 3rd International Conference on Cloud Computing


Outline


- INTRODUCTION
- MOTIVATION - RUNNING EXAMPLE
- SCARCE: THE QUEST OF AUTONOMIC APPLICATIONS
- EVALUATION
- CONCLUSIONS
- COMMENT

INTRODUCTION


- A successful online application should
  - be able to handle traffic spikes and flash crowds efficiently
  - be resilient to all kinds of failures (software, hardware, rack or even datacenter failures)
- A naive solution against load variations would be static over-provisioning of resources
  - this results in resource underutilization most of the time
- As the size of the cloud increases, its administrative overhead becomes unmanageable
  - the cloud resources for an application should be self-managed and adaptive to load variations or failures

INTRODUCTION


- We propose a middleware (“Scattered Autonomic Resources”, referred to as Scarce) that
  - avoids underutilized computational resources
  - dynamically adapts to changing conditions (failures or load variations)
- It simplifies the development of online applications composed of multiple independent components (e.g. web services)
  - following the Service Oriented Architecture (SOA) principles

INTRODUCTION


- Components are treated as individually rational entities that rent computational resources from servers
- Components migrate, replicate or stop according to their economic fitness
  - fitness expresses the difference between the utility offered by a specific application component and the cost of retaining it in the cloud
- Components of a certain application are dynamically replicated to geographically diverse servers according to the availability requirements of the application

INTRODUCTION



Our approach combines the following unique characteristics:
- Adaptive component replication for accommodating load variations
- Geographically diverse placement of clone component instances
- Cost-effective placement of service components
- Decentralized self-management of the cloud resources for the application

RUNNING EXAMPLE


- Building an application that both provides robust guarantees against failures (hardware, network, etc.) and handles load spikes dynamically is a non-trivial task
- Example: a simple web application for selling e-tickets, composed of 4 independent components
  - A web front-end, which is the entry point of the application and serves the HTML pages to end users
  - A user manager for managing the profiles of the customers
  - A ticket manager for managing the number of available tickets of an event
  - An e-ticket generator that produces e-tickets in PDF format

RUNNING EXAMPLE


- Each component can be regarded as a standalone and self-contained web service
- This application
  - is highly sensitive to traffic spikes
  - is business-critical (deployed in different geographical regions)
- A token (or a session ID) is assigned to each customer’s browser by the web front-end and is passed to each component along with the requests
- This token is used as a key in the key-value database to store the details of the client’s shopping cart (number of tickets ordered)

SCARCE: THE QUEST OF AUTONOMIC APPLICATIONS

A. The approach
B. Server agent
C. Routing table
D. Economic model
E. Maintaining high-availability

The approach

- We consider applications formed by many independent components
  - the components interact to provide a service to the end user
- A component
  - is self-managing and self-healing, and is hosted by a server (a server is allowed to host many different components)
  - may stop, migrate or replicate to a new server according to its load or availability

Server agent

- The server agent is a special component that resides at each server and is responsible for
  - starting and stopping the components of applications
  - checking the health of the services by
    1. verifying that the service process is still running
    2. firing a test request and checking that the corresponding reply is correct
- The agent knows the properties of every service that composes the application
  - e.g. the path of the service executable
- This knowledge is acquired when the agent starts, by contacting another agent (referred to as the bootstrap agent)

Server agent

- During the startup phase, the agent also retrieves the current routing table from the bootstrap agent
- Assume that a server belongs to a rack, a room, a datacenter, a city, a country and a continent
  - its location is described by a label of the form “continent-country-city-datacenter-room-rack-server”
  - for example, a possible label for a server located in a data center in London could be “EU-UK-LON-D1-C03-R11-S07”
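The location label lends itself to a small parsing helper. The following is a minimal sketch, assuming the seven fields are separated by hyphens and never contain a hyphen themselves; the `ServerLabel` type and the `parse_label` function are hypothetical names used for illustration, not part of Scarce.

```python
from typing import NamedTuple


class ServerLabel(NamedTuple):
    """Hypothetical container for the seven location levels of a server."""
    continent: str
    country: str
    city: str
    datacenter: str
    room: str
    rack: str
    server: str


def parse_label(label: str) -> ServerLabel:
    """Split a 'continent-country-city-datacenter-room-rack-server' label."""
    fields = label.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {label!r}")
    return ServerLabel(*fields)


london = parse_label("EU-UK-LON-D1-C03-R11-S07")
print(london.city, london.rack)  # LON R11
```

Keeping the levels as named fields makes later comparisons between two servers (same rack, same datacenter, and so on) straightforward.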

Routing table


- Each agent locally keeps a mapping between components and servers
- It is maintained by a gossiping algorithm, where each agent
  - contacts a random subset of $\log(N)$ remote agents, where $N$ is the total number of servers
  - exchanges information about the services running on its server
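One gossip round can be sketched as follows. This is only an illustration under stated assumptions: the logarithm is taken as the natural logarithm and rounded up, the routing table is modelled as a dictionary from component name to a set of servers, and tables are merged by plain union (a real protocol would also have to propagate stopped components). The names `pick_gossip_peers` and `merge_tables` are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Set

RoutingTable = Dict[str, Set[str]]  # component name -> servers hosting it


def pick_gossip_peers(me: str, servers: List[str],
                      rng: Optional[random.Random] = None) -> List[str]:
    """Choose about log(N) random remote agents, N being the number of servers."""
    rng = rng or random.Random()
    others = [s for s in servers if s != me]
    # Assumption: natural logarithm, rounded up, and at least one peer.
    fanout = max(1, math.ceil(math.log(len(servers)))) if others else 0
    return rng.sample(others, min(fanout, len(others)))


def merge_tables(local: RoutingTable, remote: RoutingTable) -> RoutingTable:
    """Union the component-to-servers mappings learned from a peer."""
    merged = {comp: set(hosts) for comp, hosts in local.items()}
    for comp, hosts in remote.items():
        merged.setdefault(comp, set()).update(hosts)
    return merged
```

With 20 servers, for instance, each agent would contact 3 peers per round under these assumptions.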

Choosing the replica

- We consider 4 different policies that a server may use for choosing the replica of a component
  - proximity-based policy: the geographically nearest replica is chosen
  - rent-based policy: the least loaded server is chosen, based on the rent price of the servers
  - random-based policy: a random replica is chosen
  - net benefit-based policy: the geographically closest and least loaded replica is chosen
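The four policies can be compared side by side in a short sketch. It assumes that closeness is measured by the number of leading location levels two server labels share, that rents are strictly positive, and that the net benefit trades closeness against rent as a simple ratio. That ratio is an assumption made here for illustration, and `Replica`, `shared_levels` and `choose_replica` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    label: str   # location label of the hosting server
    rent: float  # current virtual rent of the hosting server (assumed > 0)


def shared_levels(a: str, b: str) -> int:
    """Number of leading location levels two labels have in common."""
    count = 0
    for x, y in zip(a.split("-"), b.split("-")):
        if x != y:
            break
        count += 1
    return count


def choose_replica(policy: str, origin_label: str, replicas: List[Replica],
                   rng: Optional[random.Random] = None) -> Replica:
    """Pick a replica for a request issued from the server at origin_label."""
    rng = rng or random.Random()
    if policy == "proximity":
        return max(replicas, key=lambda r: shared_levels(origin_label, r.label))
    if policy == "rent":
        # The rent price serves as a proxy for the load of the server.
        return min(replicas, key=lambda r: r.rent)
    if policy == "random":
        return rng.choice(replicas)
    if policy == "net_benefit":
        # Assumed trade-off: closeness divided by rent; not taken from the source.
        return max(replicas,
                   key=lambda r: (1 + shared_levels(origin_label, r.label)) / r.rent)
    raise ValueError(f"unknown policy: {policy}")
```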

Economic model


- Service replication should be highly adaptive to the processing load and to failures
- The server agent acts as an individual optimizer for each component, in order
  1. to ascertain the pre-specified availability guarantees
  2. to balance its economic fitness
- At each epoch, a service
  - pays a virtual rent to the servers where it is running
    - the virtual rent corresponds to the usage of the server resources (CPU, memory, network, disk)
  - may be replicated, migrated or stopped by the server agent
    - based on the service demand, the renting cost and the maintenance of high availability

Economic model


- The actions performed by the server agent are directly related to the economic viability of a component, i.e. the balance between its utility and the virtual rent it pays
- The utility of a component corresponds to the value it offers to the application
  - one factor is computed using the utilization of the server resources by the component
  - a threshold determines when a component should be considered fit enough to replicate (currently, this is set to 25% of server usage)
- Some components may be more business-critical than others

Economic model


- The virtual rent paid by the component to the server that hosts it depends on
  - a subjective estimation of the server quality and reliability
  - a factor that expresses the resource utilization of the server
- Other utility and rent functions could be used, as long as they are both increasing in the resource usage and result in comparable values

Migrate or Stop

- At the beginning of a new epoch, a component may migrate or stop if it has had a negative balance for a given number of the most recent epochs
  - if the availability is satisfactory, the component stops
  - otherwise, it tries to find a less expensive server (migration)
- To avoid oscillations of a replica among servers, migration is only allowed if the following conditions hold:
  - the minimum availability is still satisfied using the new server
  - the absolute price difference between the current and the new server is sufficiently large
  - the resource utilization of the current server is above a soft limit

Replicate


- A component may replicate if it has had a positive balance for a given number of the most recent epochs
- For replication, a component also has to verify that it can afford the replication, i.e. that its balance stays positive for consecutive epochs when computed against the current virtual rent of the candidate server for replication
  - this rent is multiplied by a factor of one plus a margin, which accounts for a possible increase of the rent price at the candidate server in the next epoch
  - an upper bound of 0.2 (i.e. a 20% increase) can typically be assumed for the margin
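The per-epoch decision described on the last two slides can be summarized in a short sketch. It assumes that the balance of an epoch is the utility minus the virtual rent, that the same window length is used for the negative and the positive test, and that the affordability test is applied once with the current utility rather than over several epochs. The migration conditions are left out for brevity. The function name `decide` and all parameter names are hypothetical.

```python
from typing import Sequence


def decide(balances: Sequence[float], window: int, availability_ok: bool,
           utility: float, candidate_rent: float, margin: float = 0.2) -> str:
    """Return 'replicate', 'stop', 'migrate' or 'stay' for the new epoch.

    balances holds utility minus virtual rent for past epochs, newest last.
    availability_ok tells whether availability stays satisfactory without
    this replica.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = list(balances[-window:])
    if len(recent) < window:
        return "stay"  # not enough history yet
    if all(b < 0 for b in recent):
        # Unprofitable replica: drop it if availability allows, else move it.
        return "stop" if availability_ok else "migrate"
    if all(b > 0 for b in recent):
        # Affordability check against a rent that may rise by `margin`.
        if utility - candidate_rent * (1 + margin) > 0:
            return "replicate"
    return "stay"
```

For example, a component with utility 5.0 and a candidate server renting at 4.0 can afford a replica with the default margin, while a candidate renting at 4.5 is too expensive.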

Availability

- The availability of a component should always be kept above a required minimum level
  - estimating the probability of each server to fail necessitates access to a large set of historical data and to private information of the server
- Instead, the availability of a service is expressed by the geographical diversity of the servers that host its replicas
  - $S = (s_1, s_2, \ldots, s_n)$ is the set of servers hosting replicas of the service
  - the confidence levels of the servers take values in $[0, 1]$

Availability



- The diversity function returns a number calculated based on the geographical distance between each pair of servers
- This distance can be represented as a 7-bit number
  - one bit per location level (continent, country, city, data center, room, rack, server)
  - when two servers have the same location at a level, their corresponding proximity bit is set to 1, otherwise to 0
  - a binary NOT operation is then applied to the proximity to get the diversity value
- Having more replicas on distinct servers, even in the same location, always results in increased availability
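The bit manipulation is easier to follow in code. The sketch below assumes hyphen-separated seven-field labels, treats the continent as the most significant bit, and counts a level as shared only if all higher levels match as well. The availability formula is not legible on the slide, so the pairwise sum used here (product of the two confidence levels times the diversity of the pair) is an assumption, as are all function names.

```python
from itertools import combinations
from typing import Dict

LEVELS = 7  # continent, country, city, datacenter, room, rack, server


def proximity(label_a: str, label_b: str) -> int:
    """7-bit number; the continent is the most significant bit."""
    bits = 0
    same_so_far = True
    for a, b in zip(label_a.split("-"), label_b.split("-")):
        # Assumption: a level only counts as shared if all higher levels match.
        same_so_far = same_so_far and a == b
        bits = (bits << 1) | int(same_so_far)
    return bits


def diversity(label_a: str, label_b: str) -> int:
    """Binary NOT of the proximity, restricted to 7 bits."""
    return ~proximity(label_a, label_b) & ((1 << LEVELS) - 1)


def availability(confidence: Dict[str, float]) -> float:
    """Assumed form: sum over server pairs of conf_i * conf_j * diversity."""
    return sum(confidence[a] * confidence[b] * diversity(a, b)
               for a, b in combinations(confidence, 2))
```

Two servers in the same city but in different datacenters have proximity `1110000` and therefore diversity `0001111`. Under the assumed sum, adding a replica on a different server of an already used rack still contributes a diversity of 1 per pair, which matches the last bullet of the slide.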

Candidate server

- When the availability of a component falls below the required minimum threshold, a new service instance should be started (i.e. the component is replicated)
- The candidate server is chosen so as to maximize the net benefit between the diversity of the resulting set of replica locations for the service and the virtual rent of the new server
  - the rent term is the virtual rent of the candidate server
  - the diversity term is weighted by the proximity of the server location to the geographical distribution of the client requests for the service (cf. [3])
  - as a result, components tend to replicate closer to the components that heavily rely on their services

[3] N. Bonvin, T. G. Papaioannou, and K. Aberer, “Cost-efficient and differentiated data availability guarantees in data clouds,” in Proc. of the ICDE, Long Beach, CA, USA, 2010.

Candidate server


- The components rank servers according to their net benefit (6)
  - and randomly choose the target for replication among the top-k ones (to avoid overloading the same destination server at an epoch)
- Thus, the availability tends to be increased as much as possible at the minimum cost, while the network latency for the query reply also decreases
- Note that the same approach according to (6) is used for choosing the candidate server for component migration
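The ranking and the randomized choice among the best candidates might look as follows. This is a sketch under assumptions: the net benefit is taken as the weighted diversity gain minus the virtual rent, which is only one plausible reading of (6), and `net_benefit` and `choose_target` are hypothetical names.

```python
import random
from typing import Dict, List, Optional


def net_benefit(weight: float, diversity_gain: float, rent: float) -> float:
    """Assumed form: weighted diversity gain minus virtual rent."""
    return weight * diversity_gain - rent


def choose_target(scores: Dict[str, float], k: int,
                  rng: Optional[random.Random] = None) -> str:
    """Pick one server at random among the k with the highest net benefit."""
    if not scores or k < 1:
        raise ValueError("need at least one candidate and k >= 1")
    rng = rng or random.Random()
    ranked: List[str] = sorted(scores, key=scores.get, reverse=True)
    return rng.choice(ranked[:k])
```

Choosing at random among the top-k servers, instead of always taking the best one, spreads simultaneous replications of different components over several destinations within the same epoch.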

Experimental Setup


- We employ two different testbed settings:
  - a single-application setup consisting of 7 servers
  - a multi-application setup consisting of 15 servers
- The hardware specification of each server is Intel Core i7 920 @ 2.67 GHz, 8 GB RAM, Linux 2.6.32-trunk-amd64
- We run two databases (MySQL 5.1 and Cassandra 0.5.0)
- One generator of client requests for each application (FunkLoad 1.10, http://funkload.nuxeo.org/) on their own dedicated servers

Experimental Setup


- Each client performs the following actions:
  1) request the main page that contains the list of entertainment events
  2) request the details of an event A
  3) request the details of an event B
  4) request again the details of the event A
  5) log in to the application and view the user account
  6) update some personal information
  7) buy a ticket for the event A
  8) download the corresponding ticket in PDF
- A client continuously performs this list of actions over a period of 1 minute

Experimental Setup


- An epoch is set to 15 seconds
- An agent sends gossip messages every 5 seconds
- We consider two different placements of the components:
  - a static approach: each component is assigned to a server by the system administrator
  - a dynamic approach: all components are started on a single server and dynamically migrate / replicate / stop according to the load or the hardware failures

Dynamic vs Static Replica Placement

- The response time is lower bounded by that of the slowest component (in our case, the one generating PDF tickets)

Scalability

- We employ the multi-application experimental setup
- Assume that all 10 servers reside at 1 datacenter
- Increase the number of concurrent users from 150 to 1500
- Requests are randomly routed among the replicas of a component

High-availability

- A single-application setup consisting of 7 servers
- Assume that each component has 2 replicas that reside at separate servers
- 10 concurrent clients continuously send requests for 1 minute
- After 30 seconds, one random server among those hosting the replicas of a component fails

Adaptation to New Cloud Resources

- We employ the single-application experimental setup, but the number of available servers in the cloud ranges from 1 to 10

Evaluation of Routing Policies

- In this case, we employ the single-application setup
- The 4 servers of the cloud are located in 2 datacenters (2 servers per datacenter)
- The round-trip time between the datacenters is 50 ms
- The minimum availability (i.e. number of replicas per component) is set to 2

CONCLUSIONS


- We proposed an economic approach for dynamic accommodation of load spikes for composite web services deployed in clouds
- Server agents act as individual optimizers and autonomously replicate, migrate or stop components based on their economic fitness
- This approach also offers high availability guarantees by maintaining a certain number of replicas of the various components on geographically diverse servers

COMMENT


- Rents vs. Resources
  - Using rents is a more flexible approach
- Distributed vs. Centralized