





UNSW S1 2009 Research Project

Version: draft

Evaluating Cloud Application Development and Runtime Platforms


Project Team:

Supervisors - A/Prof. Anna Liu (annaliu@cse) and Dr. Helen Paik (hpaik@cse)

Students: Fei Teng (ften303@cse), Liang Zhao (lzha077@cse) and Xiaomin Wu (xmwu432@cse)



1 INTRODUCTION


2 EVALUATION METHODOLOGY

2.1 OVERVIEW

Cloud platforms are still evolving, and no widely accepted public standard exists yet. This has not stopped companies from eagerly promoting their products in order to be among the first to try the cloud.

In this evaluation, the cloud platforms of three large companies are involved: Microsoft Windows Azure, Amazon Elastic Compute Cloud and Google App Engine. Since there is no widely accepted public standard at the moment, each platform comes with its own featured technologies and models, which makes cross-platform evaluation very difficult. To keep the measurements comparable, timing, throughput and error rates are the main focus of the qualitative evaluation, while user experience is the key consideration in the quantitative evaluation.


2.2 QUALITATIVE EVALUATION METHODOLOGY

2.2.1 EVALUATION TERMINOLOGY

The qualitative evaluation is mainly based on various timings and requests that are measured and observed on the client side and on the cloud hosting server side. Before introducing the evaluation methodologies, it is necessary to clarify the timing-related and request-related terminology.

2.2.1.1 TIMING

Figure 1 shows a full round trip among a client, a cloud hosting server and a cloud database.

Figure 1: Time-relevant Terminologies




Client Response Time: the network round-trip time between the client and the cloud host/database, plus the time required to process the client's request on the cloud host/database. The client response time is observed directly on the client.



Processing Time: the amount of time the server needs to process a block of logic code. To get an accurate processing time, a timer is started right before the code block and stopped right after it. If a database transaction is involved in the code block, the processing time also includes the database operation time and the network round-trip time between the cloud hosting server and the cloud database. The processing time is attached to the response and returned to the client.



Database Processing Time: it is practically impossible to measure the exact time that a cloud database takes to finish a database transaction. As an alternative, the database processing time between the cloud hosting server and the cloud database is measured as a processing time, by setting a timer before and after the database API calls. This database processing time is returned in the response of the web application. For the database processing time directly between the client and the cloud database (which only applies to Azure Storage and Amazon S3), the client response time between the client and the database is used instead. This database processing time is monitored directly on the client.
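The timing definitions above can be illustrated with a small sketch. It is not the project's code: `fake_database_write`, `handle_request` and `client_call` are hypothetical names, the database call is simulated with a short sleep, and client and server run in one process, so no real network round trip is included.

```python
import time


def timed(block):
    """Run block() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = block()
    return result, time.perf_counter() - start


def fake_database_write(payload):
    # Hypothetical stand-in for a cloud database API call.
    time.sleep(0.01)
    return len(payload)


def handle_request(payload):
    """Server side: time only the database call and attach it to the response."""
    stored, db_time = timed(lambda: fake_database_write(payload))
    return {"stored_bytes": stored, "database_processing_time": db_time}


def client_call(payload):
    """Client side: time the whole call, i.e. the client response time."""
    response, response_time = timed(lambda: handle_request(payload))
    response["client_response_time"] = response_time
    return response


if __name__ == "__main__":
    print(client_call(b"x" * 100))
```

Because the client timer encloses the server timer, the client response time in this sketch is never smaller than the database processing time, which mirrors the relationship described in the definitions.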


2.2.1.2 REQUEST

To make all sent requests easier to identify, they are categorized into four types according to their response results.



Incomplete Request: a request that the client fails to send, or for which it fails to receive a response.

Completed Request: a request that the client sends successfully and for which it eventually receives a response from the cloud host/database.

Failed Request: a completed request whose response contains an error message.

Successful Request: a completed request whose response contains no error.
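The four request types can be expressed as a small classification routine. This is only a sketch: the response format (a dict with an optional "error" entry, or None when nothing came back) is an assumption made for illustration, not the format used in the project.

```python
from enum import Enum


class RequestType(Enum):
    INCOMPLETE = "incomplete"
    FAILED = "failed"
    SUCCESSFUL = "successful"


def classify(response):
    """Classify one request from its response.

    `response` is None when nothing came back from the cloud host/database;
    otherwise it is a dict with an optional "error" entry (a hypothetical
    response format used only for this sketch).
    """
    if response is None:
        return RequestType.INCOMPLETE
    if response.get("error"):
        return RequestType.FAILED
    return RequestType.SUCCESSFUL


def is_completed(request_type):
    """Completed requests are the failed and the successful ones together."""
    return request_type in (RequestType.FAILED, RequestType.SUCCESSFUL)
```

Completed is modelled as a derived property rather than a fourth enum value, since every completed request is either failed or successful.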


2.2.2 TEST CASES

To maximize the coverage of the evaluation, the following scenarios are used to build test cases.



Client - Cloud Host Evaluations: a user visits a web application on the cloud from an end-user application. The client response time is the user's first concern with the cloud.

Cloud Host - Cloud Database Evaluations: a user may send an article or a form to, or receive one from, the cloud database through the web application, so the database processing time is a main factor to take into consideration. The performance of the cloud database when thousands of users take the same action concurrently is also of interest.

Client - Cloud Database Evaluations: for large file transfers, the question is whether a user can make a direct (peer-to-peer) connection between the client and the cloud database without going via the cloud host, and how well this connection performs.

More details are discussed later in the section for each test.


2.2.3 TEST STRATEGY

To perform all of the test cases above, three test strategies are adopted in the qualitative evaluation.



Stress Test Strategy: to obtain architectural insights, such as performance and potential errors, concurrent requests are sent to the cloud platforms in a specific configuration for each test case.

Singleton Request Test Strategy: to take numerical measurements of timing and throughput for cloud databases, requests are sent one after another, avoiding the effects of stress and network traffic.

Singleton Transferring Test Strategy: a revised version of the singleton request test strategy, adapted to large file transfers. The timing and throughput of RESTful cloud databases are measured.


2.2.3.1 STRESS TEST STRATEGY

Some implementation details of the stress test strategy are worth discussing: sending three requests within every thread, varying the number of concurrent threads, and repeating tests.


Figure 2: The Flow Chart of the Stress Test Strategy


The stress test strategy is implemented with multi-threaded programming. Within every thread, three requests are sent in succession to ensure that there is a period, mostly in the middle of the test, during which requests are kept concurrent. The number of concurrent threads can be changed after every round to suit the test: it can either be increased to put more stress on the cloud platforms, or kept fixed to verify results repeatedly.

Furthermore, because of high network fluctuations over time, outlier results are likely to be encountered during the evaluation. This is addressed by running tests for multiple rounds and by scheduling tests repeatedly in different time slots. A cron job is used to schedule tests over a 24-hour period.

This test strategy is used in the Client - Cloud Host Evaluations and the Cloud Host - Cloud Database Evaluations.
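The round structure of the stress test could be sketched as follows. This is a minimal illustration under stated assumptions, not the project's client: `send_request` is a hypothetical stub that stands in for a real web service call, and the function and parameter names are invented for the example.

```python
import threading

REQUESTS_PER_THREAD = 3  # from the text: three requests within every thread


def send_request(payload):
    # Hypothetical stand-in for a real web service call.
    return {"echo": len(payload)}


def run_round(num_threads, payload=b"x"):
    """Start num_threads threads; each sends its requests back to back."""
    responses = []
    lock = threading.Lock()

    def worker():
        for _ in range(REQUESTS_PER_THREAD):
            response = send_request(payload)
            with lock:
                responses.append(response)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return responses


def run_stress_test(initial_threads, step, rounds):
    """Run several rounds, raising the thread count by `step` each round."""
    results = {}
    for i in range(rounds):
        num_threads = initial_threads + i * step
        results[num_threads] = len(run_round(num_threads))
    return results


if __name__ == "__main__":
    print(run_stress_test(initial_threads=10, step=20, rounds=3))
```

Setting `step` to zero keeps the thread count fixed across rounds, which corresponds to the "repeatedly verify results" mode described above.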


2.2.3.2 SINGLETON REQUEST TEST STRATEGY


The flow chart of the singleton request test strategy is a modified version of the stress test flow chart: multi-threading is disabled and the number of concurrent threads no longer increases. In the chart, the number of concurrent threads is fixed at one, and within every thread only one request is sent, with no follow-up requests.

Because there is only one request in every thread and only one thread at any time, the number of rounds also equals the total number of requests sent to a cloud platform. Since only one thread runs at any time, requests are sent one after another, which avoids the effects of stress.


Figure 3: The Flow Chart of the Singleton Request Test Strategy


This test strategy is adopted in the Cloud Host - Cloud Database Evaluations.


2.2.3.3 SINGLETON TRANSFERRING TEST STRATEGY

Figure 4 shows another revised version of the singleton request strategy. It is specifically designed for storage services that support large data, and tests throughput directly from the client application to the cloud databases. Data of three sizes (1 megabyte, 10 megabytes and 15 megabytes) are sent via a RESTful protocol. To address outliers that may occur in the tests, multiple rounds are used in this test as well.

This test strategy is adopted in the Client - Cloud Database Evaluations.
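A singleton transfer loop of this kind might look like the sketch below. The three sizes come from the text; everything else is assumed for illustration: `put_object` is a hypothetical stub instead of a real RESTful PUT, and a megabyte is taken to be 1024 * 1024 bytes.

```python
import time

MEGABYTE = 1024 * 1024  # assumption: binary megabytes
SIZES = [1 * MEGABYTE, 10 * MEGABYTE, 15 * MEGABYTE]  # sizes named in the text


def put_object(data):
    # Hypothetical stand-in for a RESTful PUT to a blob store.
    return len(data)


def measure_transfers(sizes, rounds, upload=put_object):
    """Send one payload at a time and record (size, seconds) per transfer."""
    samples = []
    for size in sizes:
        payload = bytes(size)
        for _ in range(rounds):
            start = time.perf_counter()
            upload(payload)
            samples.append((size, time.perf_counter() - start))
    return samples


def average_seconds(samples, size):
    """Average transfer time over all rounds for one payload size."""
    times = [t for s, t in samples if s == size]
    return sum(times) / len(times)
```

Running several rounds per size and averaging is one simple way to dampen the outliers mentioned above; the text does not say how the project aggregated its rounds beyond reporting averages.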





Figure 4: The Flow Chart of the Singleton Transferring Test Strategy


2.3 EVALUATION APPLICATION ARCHITECTURES

2.3.1 APPLICATION ARCHITECTURES ON CLIENT

This section describes the architectures of the two client applications used in the evaluation. The first, a Contract-First Web Service based client, is designed for the stress test strategy and the singleton request test strategy. The second, a RESTful based client, is implemented for the singleton transferring test strategy.


2.3.1.1 CONTRACT-FIRST WEB SERVICE BASED CLIENT

Figure 5 illustrates the testing model used for cross-platform evaluations based on a Contract-First Web Service.

The three platforms offer different programming languages for web application development. In this evaluation, the Microsoft Windows Azure application is implemented in C#, the Google App Engine application uses Python, and the Amazon application runs Java on an Ubuntu-based instance.

To keep as much commonality as possible among the three cloud platforms, a Contract-First Web Service is used in the evaluation, following these guidelines:

A WSDL file is built first.

The three hosting servers implement all functions defined in the WSDL file.

A unified client application is created from the WSDL file, so it can communicate with the different platforms via the same protocol.



Figure 5: Contract-First Web Service Based Client


2.3.1.2 RESTFUL BASED CLIENT

This application architecture is implemented for the singleton transferring test strategy, specifically for Azure Blob Storage and Amazon S3. App Engine Datastore is not in the test list, since it does not offer any protocol for direct client to cloud database communication. When performing a RESTful PUT, GET or DELETE action, the client reads binary data from the local machine and sends it directly to Azure Blob Storage or Amazon S3.


Figure 6: RESTful Based Client

2.3.2 APPLICATION ARCHITECTURES ON CLOUD PLATFORMS

This section illustrates the application architectures implemented on each cloud platform. Since no public standard for cloud platforms has been established yet, each platform applies its own featured technologies and models to suit the WSDL/REST-based evaluation.

2.3.2.1 MICROSOFT WINDOWS AZURE

Figure 7 illustrates the web application architecture used on a Microsoft Windows Azure instance. Microsoft Windows Azure provides a Windows-based environment for running applications and storing data on servers in Microsoft data centres in a distributed manner.

As can be seen in the figure, runtime environments, framework libraries, software development kits and other Microsoft libraries are already encapsulated in Windows Azure. Developers can simply focus on the design of business logic.

In the evaluation, Windows Communication Foundation was selected from the various models in the framework to work as the web role, with the service code written in C# and implemented as a web application. Through the Azure SDK, Windows Communication Foundation communicates with multiple Azure data storage services, which sit on the cloud, via a RESTful protocol.


2.3.2.2 AMAZON ELASTIC COMPUTE CLOUD

Amazon Elastic Compute Cloud, known as Amazon EC2, is a highly configurable cloud. Developers are free to set up their preferred software and applications on any operating system that Amazon instances support.

In the evaluation, an Ubuntu-based instance hosts Tomcat as the servlet container on top of the Java 1.6 SDK. The third-party framework Apache CXF provides the SOAP protocol. Using JPA 1.0, the Apache CXF application communicates with PostgreSQL, which is installed on the same instance as the web application. Through the Amazon API, the Apache CXF application can invoke Amazon's cloud databases, Amazon SimpleDB and Amazon S3, via SOAP or a RESTful protocol.


Figure 7: Web Application on Microsoft Windows Azure Instance

Figure 8: Web Application on Amazon Instance



2.3.2.3 GOOGLE APP ENGINE CLOUD

Google has made two kinds of programming languages available on Google App Engine. Of these, Python has been supported since the first release of Google App Engine and is much more stable in practice at the moment. Therefore, in the evaluation, Python and its third-party frameworks ZSI and Zope Interface, which offer SOAP protocol support in Python, are selected to develop and deploy the web application code.

In addition, the Google App Engine SDK is used to connect the server, via an internal protocol, to the Google stateful services App Engine Datastore and App Engine Memcache. The former provides the cloud database behind web applications, while the latter provides storage for data caching.





Figure 9: Web Application on Google App Engine


2.4 QUANTITATIVE EVALUATION METHODOLOGY


3 QUALITATIVE EVALUATION

3.1 SUMMARY OF EVALUATION

3.2 STRESS ROUND-TRIP TEST


The three figures above indicate changes of average client response time when cloud hosting servers are under varying amounts of stress, at various times and dates.

From the figures, latencies increase dramatically beyond 2100 concurrent requests. Two observations can be made. Firstly, it could be difficult for a limited number of test machines to challenge the entire set of cloud hosting servers. Secondly, even if the latencies were due to the load placed on the cloud hosting servers, a capacity of 2100 concurrent requests is sufficient for today's enterprises. As an example, the ticket booking system of the 2008 Beijing Olympic Games crashed when its load reached 2200 requests per second.


3.3 SINGLETON DATABASE WRITE/READ TEST

3.3.1 TEST CONFIGURATION

This is a test case of the Cloud Host - Cloud Database Evaluations, based on the singleton request test strategy. It follows the scenario in which a user sends an article or a form to, or receives one from, the cloud database through the web application.

Based on the scenario, requests of various sizes, simulating a character (1 byte), a message (100 bytes), an article (1 kilobyte) and a small file (1 megabyte), are sent one by one to the web application hosted on the cloud. The database processing time is measured on the cloud hosting server and sent back to the client. For each test, the number of requests is fixed.

In terms of specifications, Amazon SimpleDB and Azure Table Storage are advertised as stores for structured data, while Amazon S3 and Azure Blob Storage are aimed at storing binary data. In the evaluation, a request no larger than 1 kilobyte is stored in the structured-data database, and a request larger than 1 kilobyte is put into the binary-data database. App Engine Datastore makes no separation between structured data and binary data at the database level; all data are supported as string, text and blob types at the property level.


3.3.2 SINGLETON DATABASE WRITE TEST

Figure 10: Average Singleton Writing Time on Cloud Databases


In terms of average database processing time, the singleton database processing time for writing small requests (1 byte, 100 bytes and 1 kilobyte) on each cloud database varies within a small range. This suggests that the size of small requests does not have much effect on the database processing time of any of the cloud databases.

The figure also shows that Amazon LocalDB performs best from 1 byte to 1 kilobyte. This is mainly due to the stress-free test environment, in which a local database without any optimization can still handle requests normally. In addition, running the local database and the web application in the same Amazon instance may shorten the database processing time compared with the other cloud hosting servers and their cloud databases, which may not sit in the same cloud, and this leads to the smallest time.

When the request size goes to 1 megabyte, Amazon S3 has almost the same performance as App Engine Datastore, while Azure Blob Storage takes less time than the others.

By dividing the request size by the database processing time, the speed of every database transaction can be calculated to help build the CDF of the singleton write operation. It reflects the different write speeds on each cloud database for different request sizes.
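The throughput and CDF calculation described above can be sketched in a few lines. The function names are hypothetical and the sample times in the demo are made up purely for illustration; they are not measurements from this evaluation.

```python
from bisect import bisect_right


def throughput(size_bytes, processing_seconds):
    """Speed of one database transaction in bytes per second."""
    return size_bytes / processing_seconds


def empirical_cdf(values):
    """Return sorted (value, fraction of samples <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, bisect_right(ordered, v) / n) for v in ordered]


if __name__ == "__main__":
    # Made-up processing times for 1 kilobyte requests, for illustration only.
    times = [0.020, 0.010, 0.040, 0.025]
    speeds = [throughput(1024, t) for t in times]
    for speed, fraction in empirical_cdf(speeds):
        print(f"{speed:10.1f} B/s  {fraction:.2f}")
```

Reading the CDF at a given speed gives the share of transactions that ran at that speed or slower, so a curve further to the right indicates a faster database.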

Overall, as the size of the request increases, the transfer speed gets progressively faster. This indicates that the connection between the cloud hosts and the cloud databases is fast and stable on all three cloud platforms. Comparatively, Amazon SimpleDB has the slowest speed, slower than App Engine Datastore and Azure Table Storage.

In the first three tests (1 byte, 100 bytes and 1 kilobyte), App Engine Datastore, Azure Table Storage, Amazon LocalDB and Amazon SimpleDB are tested. The order is quite stable, with Amazon LocalDB performing much faster than the others.

Figure 11: CDF of Singleton Write Throughput on Cloud Databases


As for the 1 megabyte test, the three cloud platforms perform similarly. Approximately 80% of their requests approach a speed of 10 megabytes per second.


3.3.3 SINGLETON DATABASE READ TEST

These two diagrams show the database processing time of read requests and the CDF of read throughput on the different cloud databases.

Figure 12: Average Singleton Reading Time on Cloud Databases

Figure 13: CDF of Singleton Read Throughput on Cloud Databases


One interesting point is that, compared with the singleton database write test, the database processing time decreases dramatically for all cloud platforms except Amazon S3, which takes longer than it does in the singleton write test.

Another observation is that Amazon SimpleDB moves from last place to second last: Azure Table Storage performs better than Amazon SimpleDB in write, but worse in read.


3.4 STRESS DATABASE WRITE/READ TEST

3.4.1 TEST CONFIGURATION

Based on the stress test strategy, another case of the Cloud Host - Cloud Database Evaluations is performed to simulate a scenario in which multiple users take the write/read action concurrently.

In this test case, the number of requests varies, but the size of each request is fixed. Among the three platforms, Google App Engine has a strict quota limitation for free use: the incoming bandwidth is limited to a maximum of 56 megabytes per minute. Since the number of concurrent requests in the test varies from 300 to 3300, the size of each request has to be set to 1 kilobyte to fit within the incoming bandwidth limitation.
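The reasoning behind the 1 kilobyte request size can be checked with simple arithmetic. The sketch below makes conservative assumptions that are not stated in the text: all requests of a round arrive within one minute, units are binary (1 kilobyte = 1024 bytes), and protocol overhead is ignored.

```python
KILOBYTE = 1024
MEGABYTE = 1024 * KILOBYTE
QUOTA_BYTES_PER_MINUTE = 56 * MEGABYTE  # free-use limit quoted in the text


def fits_quota(concurrent_requests, request_bytes):
    """True if all requests of a round fit in one minute of incoming quota."""
    return concurrent_requests * request_bytes <= QUOTA_BYTES_PER_MINUTE


if __name__ == "__main__":
    for size in (KILOBYTE, MEGABYTE):
        print(size, fits_quota(3300, size))
```

Under these assumptions, 3300 requests of 1 kilobyte amount to roughly 3.2 megabytes and stay well inside the limit, whereas even the smallest round of 300 requests at 1 megabyte each would exceed it.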

A cron job is scheduled to run the stress database tests repeatedly over 24 hours, and every test produces a copy of test results for further analysis. For each platform, 6 copies of write test results and 5 copies of read test results from different time periods are collected, giving 44 copies of results in all.

By analyzing all these results, error rate tables, error detail lists and a CDF of throughput are established, and the high network fluctuations over time are observed.


3.4.2 ERROR DETAILS LIST

A variety of errors occurred during the stress database test. All faults are sorted into three categories according to the phase in which the error is thrown.

Connection Error: the error is encountered if a request does not reach the cloud hosts because of network connection problems, such as packet loss or a temporarily unavailable proxy gateway. Such a request is also called an incomplete request, according to the terminology definition.

Server Error: the fault occurs within the cloud hosting servers, for instance when the web application is not able to allocate resources to the current request. The request eventually goes back to the client with an error message. Such a request falls into the category of failed requests.

Database Error: the error comes from the cloud database during the database processing time. Such a request is also classed as a failed request.

Error details in each category are listed in Table 1.


3.4.3 DATABASE AND SERVER ERROR PERCENTAGES OF REQUESTS

On a client application, the number of rounds is set to 6 and the initial number of concurrent threads is 100, according to the stress test strategy. The six rounds therefore start 100, 300, 500, 700, 900 and 1100 concurrent threads in turn, with three consecutive requests in each thread. In total, 10800 requests are sent by each client application.

Table 1: Error Details Table

| Error Category | Error Message | Reason | Happened on |
|---|---|---|---|
| Database Errors | datastore_errors: Timeout | Multiple actions are performed on the same entry; one is processed and the others fail. The request takes too much time to process. | Google App Engine Datastore |
| Database Errors | datastore_errors: TransactionFailedError | An error occurred for the API request datastore_v3.RunQuery() | Google App Engine Datastore |
| Database Errors | apiproxy_errors: Error | Too much contention on these datastore entities | Google App Engine Datastore |
| Database Errors | Amazon SimpleDB is currently unavailable | Too many concurrent requests | Amazon SimpleDB |
| Server Errors | Unable to read data from the transport connection | WCF failed to open connection | Microsoft Windows Azure |
| Server Errors | 500 Server Error | HTTP 500 ERROR : Internal Error | Google App Engine |
| Server Errors | Zero Sized Reply | | Amazon EC2 |
| Connection Errors | Read timed out | HTTP time out | Microsoft Windows Azure, Amazon EC2 |
| Connection Errors | Access Denied | HTTP 401 ERROR | Microsoft Windows Azure, Google App Engine, Amazon EC2 |
| Connection Errors | Too many open files | Java IO exception: due to the machine limit, too many concurrent requests (too many threads) have been launched | Microsoft Windows Azure, Amazon EC2 |
| Connection Errors | Network Error (tcp_error) | Local proxy connection error | Microsoft Windows Azure, Google App Engine |
| Connection Errors | Unknown Host Exception | | Microsoft Windows Azure |


To maximize the stress from the client side, three test machines are deployed to run client applications together at the scheduled time. In all, 32400 requests are sent in each scheduled test.
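The request totals quoted above follow directly from the round configuration, as the short calculation below shows. The step of 200 threads per round is inferred from the sequence 100, 300, ..., 1100 given in the text; the function names are illustrative only.

```python
REQUESTS_PER_THREAD = 3
TEST_MACHINES = 3


def threads_per_round(initial, step, rounds):
    """Thread counts for each round, e.g. 100, 300, ... for a step of 200."""
    return [initial + i * step for i in range(rounds)]


def requests_per_client(initial, step, rounds):
    """Total requests one client application sends over all rounds."""
    return sum(threads_per_round(initial, step, rounds)) * REQUESTS_PER_THREAD


if __name__ == "__main__":
    per_client = requests_per_client(initial=100, step=200, rounds=6)
    print(per_client)                  # 10800
    print(per_client * TEST_MACHINES)  # 32400
```

The six rounds start 3600 threads in total, which gives 10800 requests per client application and 32400 requests across the three test machines.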

The overall write error percentages chart is generated from the average results of the data retained from scheduled tests over 24 hours. The chart illustrates the performance of cloud hosts and cloud databases in terms of error rates.

Considering the total number of requests sent, although App Engine Datastore and Amazon SimpleDB threw on average 31.67 and 111.17 faults respectively in each scheduled test, the overall performance of all cloud databases is still acceptable, with database correct rates kept at a high level. Even the worst one achieves a rate of more than 99.67% of completed requests.

Among all cloud platforms, Google App Engine produces the largest number of server errors, which contain "500 Server Error" messages. The largest server error rate and database error rate occurred after May 21 16:30 EST 2009, which was May 20 23:30 PST 2009 for Google App Engine. According to its host and database status diagram for that day, there were some large latencies on both host and database around half an hour to one hour before the scheduled test. This could be a cause. However, since it is hard to prove that the earlier latencies affected the later test, it is not certain that they were the main reason for the high error rates.


Figure 14: Overall Error Percentages of Writing Requests


In the overall read error percentages chart, the correct rates of cloud databases and cloud hosts are even higher, at almost 99.99% of completed requests. All cloud platforms perform well over the different time periods.

For both the stress database write and read tests, the percentages of connection errors among all requests on Amazon LocalDB and App Engine Datastore vary in a range of 15% to 20%. Amazon SimpleDB has the lowest rate, less than 10% in both tests and almost 0% in the stress database read test. By contrast, Azure Table Storage has the largest rate in the stress database read test, at more than 30%.

More details about connection errors are discussed in the next section.


3.4.4 CONNECTION ERROR PERCENTAGES OF ROUNDS

Besides database errors and server errors, there are connection errors: incomplete requests that fail to reach the cloud hosts, mainly because of high network fluctuations, cloud hosting capabilities or security issues.


Figure 15: Overall Error Percentages of Reading Requests

Table 2: Average Connection Error Percentage of All Requests of Rounds in Stress Database Write Test

| | Round 0 (300) | Round 1 (900) | Round 2 (1500) | Round 3 (2100) | Round 4 (2700) | Round 5 (3300) |
|---|---|---|---|---|---|---|
| App Engine Datastore | 4.61% | 11.83% | 23.46% | 22.30% | 26.67% | 28.72% |
| Azure Table Storage | 0.00% | 0.00% | 0.21% | 2.98% | 19.62% | 30.54% |
| Amazon SimpleDB | 1.15% | 0.12% | 0.97% | 6.81% | 11.01% | 11.13% |
| Amazon LocalDB | 0.00% | 0.53% | 6.35% | 16.02% | 19.88% | 23.93% |


For each platform, in both the read and write tests, the average connection error percentage per round tends to rise as the number of concurrent requests increases. However, App Engine Datastore and Amazon SimpleDB show lower percentages in the read test than in the write test, while Azure Table Storage and Amazon LocalDB show the opposite, with higher percentages in the read test than in the write test.

Amazon SimpleDB keeps the lowest percentage in both the write and read tests, approaching 0% in the read test. But Amazon LocalDB, which shares the same hosting server instance and cloud network environment with Amazon SimpleDB, starts receiving high connection error percentages from Round 2. The reason for this phenomenon is that the local database running in the instance takes a lot of computing resources from the host, pushing the host to the limit of its capability and causing packets to be dropped.

For Azure Table Storage, the connection error percentages leap from less than 1% in Round 2 to more than 50% in the read test and more than 30% in the write test in Round 5. Connection errors also account for almost one third of all read requests overall. Most of the connection errors are raised by "Read timed out". It occurs 9728.20 times on average in each scheduled test, varying in a range of 8156 to 12775 times.

The "Read timed out" message means that a client application does not receive any response to a request it has sent, leading to a time-out error. Because there is no way to log into a Microsoft Windows Azure instance, as there is with Amazon, this issue cannot be identified directly. Some possible causes can be assumed:

The cloud hosting server of the web application reaches its capacity.

The network from Australia to the USA is not as well connected as expected, due to insufficient external connections and geography.

Peers in the same network have some effect on the test.

App Engine Datastore keeps its connection error percentages around 25% from Round 2 to Round 5 in the write test, and below 25% in the read test. Most connection errors from Google App Engine contain the "Access Denied" message, which is a standard HTTP 401 error. However, there is no HTTP 401 error in the web application access logs, which means these requests are blocked before reaching the web application. It can be presumed that access is restricted by a firewall: when thousands of requests go into Google App Engine concurrently from the same IP address, a firewall rule may be triggered.

Table 3: Average Connection Error Percentage of All Requests of Rounds in Stress Database Read Test

| | Round 0 (300) | Round 1 (900) | Round 2 (1500) | Round 3 (2100) | Round 4 (2700) | Round 5 (3300) |
|---|---|---|---|---|---|---|
| App Engine Datastore | 2.11% | 11.08% | 9.83% | 10.74% | 23.17% | 21.75% |
| Azure Table Storage | 0.00% | 0.00% | 0.03% | 30.24% | 48.65% | 52.53% |
| Amazon SimpleDB | 0.00% | 0.08% | 0.00% | 0.05% | 0.32% | 0.20% |
| Amazon LocalDB | 0.00% | 0.48% | 9.44% | 17.09% | 23.57% | 29.97% |



3.4.5 CDF OF STRESS WRITE/READ THROUGHPUT ON CLOUD DATABASES

Every response to a successful request has a database processing time attached. This time can be used together with the request size to calculate the speed of every database transaction and to draw the CDF of stress write/read throughput. All data from Round 3 of the stress tests are used to draw this diagram.

According to the diagram, the first thing to note is that, instead of taking first place as in the singleton write and read tests, Amazon LocalDB performs the worst among all these platforms, which implies a poor capability for handling concurrent requests. Moreover, App Engine Datastore, Amazon SimpleDB and Azure Storage all show higher speed in the read test than in the write test, while Amazon LocalDB performs similarly in both.

Compared with the local database, all cloud databases show impressive scalability in write and read to some extent.


Figure 16: CDF of Stress Write/Read Throughput on Cloud Databases

3.5 SINGLETON LARGE FILE WRITE/READ/DELETE TEST

3.5.1 TEST CONFIGURATION

Singleton large file tests are based on the singleton transferring test strategy. They are implemented as cases of the Client - Cloud Database Evaluations to simulate a scenario in which a user transfers large files of different sizes directly to the cloud databases.


3.5.2 SINGLETON LARGE FILE WRITE TEST

This diagram illustrates the average time taken to upload large binary files to the cloud databases directly from the client application. In the figure, the average write times of Azure Blob Storage and Amazon S3 are exactly the same. This is probably due to the upload limit of the local network environment: the test reaches the threshold of the local network before it can give insight into the cloud databases.

Figure 17: Large File Average Write Time Directly from Client to Different Platforms

3.5.3 SINGLETON LARGE FILE READ TEST

Figure 18: Large File Average Read Time Directly from Client to Different Platforms


This line chart shows the average time taken to retrieve binary files from the cloud databases to the client application. Compared with the figure of large file average write time, it can be seen that Amazon S3 is faster at writing than reading, while Azure Blob Storage's read speed is faster than its write speed.

3.5.4 SINGLETON LARGE FILE DELETE TEST

This diagram shows the average time taken to process a delete action on the cloud databases triggered directly by the client application. It is confirmed that neither Amazon S3 nor Azure Blob Storage deletes data entries on the fly when they receive the request. Both of them first mark the entry as "to be deleted" and reply to the client with a "successfully deleted" message. The actual delete action is performed later.





Figure 19: Large File Average Delete Time Directly from Client to Different Platforms


4 QUANTITATIVE EVALUATION

4.1 PRODUCTIVITY SUPPORTS FOR THE DEVELOPERS

4.1.1 DEVELOPMENT UTILITIES

Looking at Microsoft Windows Azure, its heavily equipped frameworks and environments are its highlight. Almost all existing Microsoft web development frameworks and runtime environments are supported in Microsoft Windows Azure. With the aid of these toolkits, developers can simply focus on implementing the business logic in C# or PHP. But the downside is obvious as well: they have to stick with Microsoft programming environments, for instance Microsoft Visual Studio.

As for Amazon EC2, developers are granted an administrator role when using a virtual machine instance. They are allowed to install whatever programming environments they want in the instance. In other words, there is no restriction at all on selecting programming environments on Amazon EC2. But on the other hand, extra work needs to be done manually, for example, uploading and installing application runtime environments, and setting up and connecting to cloud databases from the instance.

Different from Microsoft Windows Azure, which offers fully functional frameworks, and Amazon EC2, which provides a highly configurable environment, Google App Engine re-implements programming languages to suit its own platform. Up to the present, Google has enabled Python and JVM-supported languages on its cloud platform. Developers are free to choose frameworks based on Python and JVM-supported languages to improve productivity. But in practice, some limitations of Google App Engine restrict the choices, for instance, no multiple threads, no local I/O access, and a limit of 30 seconds for a request handler. Besides these, Google offers other Google APIs to integrate Google App Engine with other Google services.


4.1.2 LEARNING RESOURCES

Microsoft provides several official channels to help developers.

"How Do I?" Videos for the Azure Services Platform
http://msdn.microsoft.com/en-us/azure/dd439432.aspx

Channel9
http://channel9.msdn.com/

Azure Service Platform Resources
http://www.microsoft.com/azure/resources.mspx

Steve Marx Blog
http://blog.smarx.com/




Amazon EC2


Google App Engine



Google App Engine on Google Code

http://code.google.com/appengine/



Google App Engine Blog

http://googleappengine.blogspot.com/



Google Developers Channel on YouTube

http://www.youtube.com/user/GoogleDeveloper


4.2 IMPLEMENT

4.2.1 WEB HOSTING SERVICE

Microsoft Windows Azure provides the web role pattern for creating front-end web applications. A web role is a web application which is accessible via an HTTP/HTTPS endpoint. Developers are also allowed to modify the number of instances in order to run more web roles and so scale resource usage.

Google App Engine supports native Python and all JVM-supported languages, and the endpoint is exposed via a unique URL. Choosing a proper third-party web framework is generally enough to build a web application. However, Google App Engine cannot process a single request for longer than 30 seconds.

Amazon EC2 offers virtualised hardware resources in an instance. Compared with Microsoft Windows Azure and Google App Engine, users are able to choose their own runtime environment from what is available in the market, ranging from open source to purchased products. Users have the flexibility to follow their own preferences and customise the instance for their on-demand use.


4.2.2 COMPUTING SERVICE

The worker role pattern in Microsoft Windows Azure is designed to provide a back-end processing application which can communicate with Azure Storage services and other Internet-based services. But it is not allowed to expose any external endpoints to users, which means it has no listener for incoming requests over HTTP/HTTPS. However, a web role and a worker role can live in the same instance, handling listening and computing separately. As with web roles, the number of worker roles can also be modified to scale resource usage.



As for Amazon EC2, developers can run any computing software to take advantage of the computing resources on an instance. Moreover, multiple instances can be created and deleted whenever needed in order to provide scalable resources for the computing service.

Essentially, Google App Engine is not designed for long-running processing. Any request that takes longer than 30 seconds is terminated automatically. But by scheduling cron jobs in Google App Engine, web applications on the cloud host can still be used to process some lightweight computing tasks within 30 seconds.
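One way to fit a larger job into repeated short invocations is to stop before the limit and hand the unfinished work to the next scheduled run. The following is a minimal sketch of that idea in plain Python, not App Engine code: the function name, the safety margin and the injectable clock are all assumptions made for illustration, and only the 30-second limit comes from the text.

```python
import time

REQUEST_LIMIT_SECONDS = 30.0   # request limit stated in the text
SAFETY_MARGIN_SECONDS = 5.0    # hypothetical margin, not a platform value


def process_batch(items, handle, clock=time.monotonic,
                  limit=REQUEST_LIMIT_SECONDS, margin=SAFETY_MARGIN_SECONDS):
    """Handle items from a list until the time budget is nearly used up.

    Returns the items that were not processed, so that the next
    scheduled invocation can continue from there.
    """
    deadline = clock() + limit - margin
    for index, item in enumerate(items):
        if clock() >= deadline:
            return list(items[index:])
        handle(item)
    return []
```

A scheduled job would call `process_batch` with the pending items and store whatever it returns for the following run.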

4.2.3 DATA SERVICE

Because the cloud databases used by the three vendors behave in a distributed manner, some relational database logic does not fit the new situation. The most significant example is that none of them supports SQL join statements. Also, because these cloud databases are based on different theories, their terminology differs when describing relational-database-like concepts.

A relation in a relational database is called a kind in App Engine Datastore, a domain in Amazon SimpleDB, or a bucket in Amazon S3. An attribute is named a property in App Engine Datastore, while an entity in Google App Engine is used to describe a tuple.
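The correspondences listed above can be collected into a small lookup table. The sketch below only restates what the paragraph says; the structure and the `translate` helper are illustrative, and a mapping the text does not mention is returned as `None` rather than guessed.

```python
# Relational term -> platform-specific term, as listed in the paragraph above.
TERMINOLOGY = {
    "relation": {
        "App Engine Datastore": "kind",
        "Amazon SimpleDB": "domain",
        "Amazon S3": "bucket",
    },
    "attribute": {"App Engine Datastore": "property"},
    "tuple": {"App Engine Datastore": "entity"},
}


def translate(relational_term, platform):
    """Return the platform's name for a relational concept, or None if unstated."""
    return TERMINOLOGY.get(relational_term, {}).get(platform)
```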

Microsoft provides two kinds of data services in the cloud, Windows Azure Storage and Microsoft SQL Data Service; however, neither of them supported transactions at the time this report was written. Windows Azure Storage is a REST-based, HTTP 1.1 only, dynamic cloud storage service. It contains three storage services, Azure Blob Storage, Azure Table Storage and Azure Queue Storage, aimed at binary data, structured data and communication between web roles and worker roles respectively.

For Azure Blob Storage, the maximum allowed file size is 50 gigabytes. But a file that is uploaded to the storage in a single request must be no larger than 64 megabytes. When a file exceeds 64 megabytes, developers must chunk it into blocks of at most 4 megabytes each, up to 50 gigabytes in total.
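The size rules above can be turned into a small planning function that decides whether a file goes up in one piece or as a sequence of blocks. This is a sketch under stated assumptions: the function name is hypothetical, megabytes and gigabytes are read as binary units, and no real storage call is made.

```python
MB = 1024 * 1024
GB = 1024 * MB

SINGLE_UPLOAD_LIMIT = 64 * MB  # largest file uploadable in one request
BLOCK_SIZE = 4 * MB            # block size used for larger files
MAX_BLOB_SIZE = 50 * GB        # overall limit per blob


def plan_upload(file_size):
    """Return a list of (offset, length) pieces for a file of file_size bytes."""
    if file_size < 0 or file_size > MAX_BLOB_SIZE:
        raise ValueError("file size outside the supported range")
    if file_size <= SINGLE_UPLOAD_LIMIT:
        return [(0, file_size)]
    return [(offset, min(BLOCK_SIZE, file_size - offset))
            for offset in range(0, file_size, BLOCK_SIZE)]
```

For a 65-megabyte file this yields sixteen full blocks and a final block of 1 megabyte; each piece would then be uploaded separately and committed as one blob.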

For Azure Table Storage, all properties put into one row must be no larger than 1 megabyte in total, and the number of properties must not exceed 255. Furthermore, every row must be assigned a partition key, unique under the account, and a row key, unique within its partition. Importantly, Azure Table Storage stores data in a distributed manner based on the partition key.
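A client can check these row limits before sending data. The sketch below is a rough pre-flight check only: the function name is hypothetical, the row is assumed to be a dictionary of string values, the size is approximated by the UTF-8 length of names and values, and whether the 255-property limit counts the key properties is not stated in the text.

```python
MAX_PROPERTIES = 255
MAX_ROW_BYTES = 1024 * 1024  # "1 megabyte" read as a binary unit; an assumption


def check_row(row):
    """Return a list of problems found in a row given as a dict of strings."""
    problems = []
    for key in ("PartitionKey", "RowKey"):
        if not row.get(key):
            problems.append("missing " + key)
    if len(row) > MAX_PROPERTIES:
        problems.append("too many properties")
    # Approximate the stored size by the encoded names and values.
    size = sum(len(name.encode("utf-8")) + len(str(value).encode("utf-8"))
               for name, value in row.items())
    if size > MAX_ROW_BYTES:
        problems.append("row too large")
    return problems
```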

The primary role of Azure Queue Storage is to provide a way for web roles to communicate with worker roles. Within Azure Queue Storage, every message put into a queue cannot be larger than 8 kilobytes, but the number of messages is not limited.

Regardless of whether data is stored in Azure Blob Storage, Azure Table Storage or Azure Queue Storage, all data held in Windows Azure Storage is replicated three times for fault tolerance. The cloud database guarantees consistency, so a read returns the data that is expected.

Microsoft also provided another kind of cloud storage service, named SQL Data Service, supporting SOAP and REST. But these protocols were later deprecated. Instead, Microsoft announced that a new protocol, called Tabular Data Stream, would be released in the future. Since SQL Data Service was deprecated in the middle of the evaluation, and the development documents about Tabular Data Stream are still incomplete, the evaluation of cloud databases on the Microsoft Azure cloud mainly focuses on Windows Azure Storage.

Amazon has two kinds of data services too: Amazon SimpleDB for structured data and Amazon S3 for binary data.

Amazon SimpleDB provides the core database functions of data indexing and querying. Compared with a traditional relational database, Amazon SimpleDB imposes less administrative burden for data modelling, index maintenance and performance tuning. However, there are some limitations of Amazon SimpleDB for users to consider. For instance, an attribute value is limited to 1024 bytes, and the number of attributes is limited to 255 in one table.

Amazon S3 claims to offer unlimited storage. With its simple design, developers can write, read and delete objects ranging from 1 byte to 5 gigabytes each. However, before an object can be created in the cloud, a bucket, whose name must be globally unique, has to be explicitly created.

Google App Engine packs its database functions into App Engine Datastore, another implementation of Google BigTable, to index and store data. It is equipped with some unique features: support for transactions, automatic index building according to the queries defined in web applications, reference properties for cross-kind queries, and data models that support kind inheritance and dynamically added properties. But up to now, App Engine Datastore can only be reached through the cloud hosting server via an internal protocol.

In App Engine Datastore, an entry is limited to 1 megabyte. Unlike other cloud databases, it does not offer separate storage for different types of data at the database level. Instead, dedicated property types are provided to store binary data, text data, string data and list data.

4.2.4 DEPLOYMENT AND MANAGEMENT SUPPORT

Microsoft Windows Azure provides a web portal, Azure Services Developer Portal, for developers to deploy applications onto the cloud host and manage them. Developers can choose between two deployment methods, staging deployment and production deployment. The staging deployment is used only for testing purposes. By default, every deployed application is placed in staging deployment first, and developers swap it to production deployment for release after testing. After a successful deployment, developers can transfer application logs to Azure Blob Storage and adjust the number of running instances.

Within Azure Services Developer Portal, analytics are provided to monitor the status of the application, for example, virtual machine usage over a date range, and in-bound and out-bound bandwidth for both the hosting server and the storage service.

Amazon EC2 provides a set of command line tools to help developers interact with the cloud platform to create, reboot and terminate an instance. Besides the command line tools, another common tool used by most developers is the AWS Management Console, which gives a quick, global picture of the cloud platform so that developers can access and manage web services. But a graphical interface like the AWS Management Console is only available for Amazon EC2 and Amazon Elastic MapReduce up to now; other infrastructure services will be supported in the console in the future.

The Google App Engine SDK comes along with command line tools to deploy applications onto Google App Engine. Before submitting, a string value, called the version, can be defined to separate the submitted version from the hosted versions. Although only one version can be set as the released one, multiple other versions can still be hosted in the cloud and accessed through their corresponding endpoints for testing and maintenance purposes.

A rich web portal is provided to manage and monitor the web application in Google App Engine. Access logs, resource usage, performance charts and error rates can all be viewed through this web portal. But the accuracy and update frequency of the usage figures vary. Access logs are updated on the fly when a new request comes in, but the usage of stored data is only updated once a day; the figures shown in the intervening period are just estimates.


4.2.5 COSTS

Microsoft Windows Azure is in Beta version. It is free to use at the moment with some restrictions: up to 2000 virtual machine hours, a maximum cloud storage capacity of 50 gigabytes, and a total bandwidth of 20 gigabytes per day. For future charging, Microsoft has proposed a consumption-based model, which charges by the computing resources that applications use. The charging details are expected to be released in July 2009.

Amazon Web Services has been commercially available since early 2006. Pricing details can be obtained from its web site. Amazon also provides a tool, Simple Monthly Calculator, to help estimate the monthly fee.

Charging in Amazon EC2 varies depending on instances, bandwidth and storage. For instances, developers can rent either reserved instances, which are only available for Linux/Unix-based instances, for a certain period at a fixed amount, or on-demand instances, which are paid for according to the geographic location of the instances, the installed operating system, the consumed resources and other criteria.

Bandwidth is charged separately for incoming and outgoing traffic. The incoming rate is fixed, but the outgoing rate depends on usage: the more bandwidth used, the higher the discount received for every gigabyte.
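A usage-dependent discount of this kind is often implemented as tiered pricing, where each slice of usage is billed at the rate of the tier it falls into. The sketch below assumes that scheme; the tier sizes and prices are made up for illustration and are not Amazon's actual rates, and the function name is hypothetical.

```python
# (tier size in GB, price per GB); None means "all remaining usage".
# These numbers are made up for illustration and are NOT Amazon's prices.
HYPOTHETICAL_TIERS = [(10_000, 0.20), (40_000, 0.15), (None, 0.10)]


def outgoing_cost(gigabytes, tiers=HYPOTHETICAL_TIERS):
    """Charge each slice of usage at the rate of the tier it falls into."""
    remaining = gigabytes
    total = 0.0
    for size, price in tiers:
        used = remaining if size is None else min(remaining, size)
        total += used * price
        remaining -= used
        if remaining <= 0:
            break
    return total
```

With these example tiers the average price per gigabyte falls as usage grows, which is the behaviour the paragraph describes.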

Storage is billed differently on Amazon SimpleDB and Amazon S3. Amazon SimpleDB charges for machine utilization, data transfer and data storage, while Amazon S3 bills according to requests, data transfer and data storage. Unlike Amazon SimpleDB, Amazon S3 does not offer the first 1 gigabyte of throughput free every month.



Google App Engine allocates a certain free quota to use every day. When a web application runs out of quota, Google either suspends it until the next reset or charges for extra quota, according to the preset configuration. Google App Engine intends to charge for seven resources: requests, datastore, mail, URL fetch, image manipulation, memcache and deployments, but up to now only requests, datastore and mail are available for billing. In particular, Google App Engine provides a budget function to set a daily maximum budget and adjust the budget balance among five resources.

Hosting is free on Google App Engine. There is no charge for hosted files, but the number of hosted files is limited to 1000, with a maximum of 10 megabytes for a single file.

4.2.6 SECURITY

When using Microsoft Windows Azure, a Windows Live ID is needed for developers to provide credentials. When accessing a web application which has been deployed on the cloud host, requests can be made through HTTP/HTTPS. When accessing the cloud databases, a storage account and a 256-bit shared key, which is generated by the Azure Services Developer Portal, are needed for authentication. The exception is a container in Azure Blob Storage that has been granted public access rights, which can be visited without any authentication.
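The report does not detail how the shared key is applied to each request. A common pattern for shared-key schemes is to sign a canonical description of the request with HMAC, and the sketch below assumes that pattern. The function names and the example string to sign are hypothetical; the exact request format expected by the storage service is not reproduced here.

```python
import base64
import hashlib
import hmac
import os


def new_shared_key():
    """Generate a random 256-bit key (32 bytes), base64-encoded for transport."""
    return base64.b64encode(os.urandom(32)).decode("ascii")


def sign(shared_key, string_to_sign):
    """Return a base64 HMAC-SHA256 signature of string_to_sign."""
    key = base64.b64decode(shared_key)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(shared_key, string_to_sign, signature):
    """Server side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(shared_key, string_to_sign), signature)
```

The client would send the account name and the signature with each request, so the key itself never travels over the network.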

To access Amazon EC2, a public and private X.509 key pair is used to access instances which are created from public machine images provided by Amazon EC2. As for Amazon S3, so that developers can grant exactly the permissions they wish, data in Amazon S3 is secured at bucket level and object level. Permission in SimpleDB is controlled at domain level: only the domain creator and authenticated developers are allowed access. Besides, Amazon SimpleDB also provides SSL-encrypted endpoints for data access. However, the data in Amazon SimpleDB is not encrypted; developers have to encrypt it before sending it to Amazon SimpleDB.

In Google App Engine, a Google account is required for deploying web applications onto the cloud host and for accessing the web portal to manage cloud applications. Unlike the cloud databases hosted by Microsoft and Amazon, App Engine Datastore is not exposed through endpoints. It can only be visited from web applications hosted on Google App Engine via an internal protocol. For secure access from clients, Google App Engine also allows requests to be sent through HTTPS.