Cloud Computing: Technologies and Opportunities


Cloud Computing:

Technologies and Opportunities

H. T. Kung

Harvard School of Engineering and Applied Sciences

January 21, 2010 at NTHU



Copyright © 2010 by H. T. Kung



2

“Cloud Computing” Is Hot, At Least the Term


(This chart and much of the other material presented in this talk are based on literature openly available from Google)

[Figure: search-term popularity over the years 2004-2008 for "cloud computing", "grid computing", and "utility computing"]

3

Server Racks in Google Data Center




Source: Edward Chang, Google (MMDS 08)

4

First, Some Definitions



Cloud computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services

A cloud is a specific instance of a cloud computing service, e.g., Amazon's Elastic Compute Cloud (EC2)

"Data centers" can hold tens to hundreds of thousands of servers that concurrently support a large number of cloud computing services (e.g., search, email, and utility computing)



5

Advantages of Cloud Computing


Utility computing


Reduced capital expenditure for users


Resource sharing and scaling


Simplified IT management


Device and location independence in access


Reliability through multiple redundant sites


Security improvement via centralization

6

Cloud Computing Can Potentially Offer Some
Solutions to Computer Attacks on the Internet

7

Nine Goals of Computer Utility

1. Convenient remote terminal access as the normal mode of system usage

2. A view of continuous operation analogous to that of the electric power and telephone companies

3. A wide range of capacity to allow growth or contraction without either system or user reorganization

4. An internal file system so reliable that users trust their only copy of programs and data to be stored in it

5. Sufficient control of access to allow selective sharing of information

6. The ability to structure hierarchically both the logical storage of information as well as the administration of the system

7. The capability of serving large and small users without inefficiency to either

8. The ability to support different programming environments and human interfaces within a single system

9. The flexibility and generality of system organization required for evolution through successive waves of technological improvements and the inevitable growth of user expectations


8

These Nine Goals Were Actually Articulated
More Than 40 Years Ago


The nine goals listed on the preceding slide were excerpted from a 1972 paper: "Multics - The First Seven Years," by F. J. Corbato and J. H. Saltzer, MIT, and C. T. Clingen, Honeywell Information Systems

Thus visions similar to the cloud computing vision have been around for a long time

It is the incredible scalability of cloud computing implementations in recent years that is new

Therefore, to understand cloud computing, it is instructive to focus on its enabling technologies, beyond just concept discussions

9

Traditional Examples of Cloud Computing

This "everything X as a Service" (XaaS) formulation can still be difficult to understand, and is in fact a little boring

So what is really the essence of cloud computing?

10

The Essence: In Cloud Computing, We Program or Run a Cloud as a System

For example, we program a cloud to allow parallel execution of tasks or run a cloud to attain load balancing

Thus a cloud is much more than merely a collection of servers

More precisely, a cloud takes on a set of tasks expressed in a certain abstraction ("virtual tasks") and assigns them to physical resources for execution. These tasks can be, for example:

Virtual machines (for IaaS)

Virtual services (for SaaS)

Virtual platforms (for PaaS)

The result is that hardware and software in a cloud work in concert to deliver scalable performance

11

Outline for the Rest of the Presentation


Challenges


Enabling Technologies (or Best Current Practices)


Opportunities

12

Some Areas of Challenges

User-experienced performance

Scalable implementation of services

Privacy

Distributed modular deployment for the network edge (to meet future needs)

Data lock-in


13

Challenge: Programming a Massive Number of Servers

Parallel programming, while managing non-uniform inter-node communications with high latency disparity

14

Challenge: Programming under High Workload Churn

Because Internet services are still a relatively new field, new products and services frequently emerge (e.g., the YouTube and iPhone phenomena) and fade away (e.g., lately, people have been saying P2P is dying as far as getting new investments is concerned)

Google's front-end Web server binaries are released on a weekly cycle, with nearly a thousand independent code changes checked in by hundreds of developers; the core of Google's search services has been reimplemented nearly from scratch every 2 to 3 years

15

Challenge: Managing Power Consumption

We need to drive servers as well as the high-speed and redundant networks connecting servers, racks, and clusters

Large data centers are literally both gigantic heaters and air conditioners at the same time

Due to the high heat concentration, power-consuming heat-exchange mechanisms are needed, consuming about 40% of the total center's power


16

Power Consumption Can Be Very Large Just for Data Storage Alone

Consider, e.g., a large-scale email service:

140M users

10GB storage per user, assuming 10X online redundancy

1TB disk drives with 8W standby power consumption each (e.g., Seagate ST31000340AS)

Server power consumption: 10X

Datacenter power and cooling: 1.4X (chillers/pumps, power distribution, UPS, etc.)

This means 140M * .001 * 8 * 10 * 1.4 = 15.7 megawatts
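The product can be checked directly. A minimal sketch, under one reading of the slide's factors (users, GB per user, drives per GB, watts per drive, cooling overhead); which of the slide's 10X figures maps to the 10 in the product is our interpretation:

    # Sanity check of the slide's power estimate (one reading of the factors).
    users = 140e6            # 140M users
    gb_per_user = 10         # 10GB stored per user
    drives_per_gb = 0.001    # a 1TB drive holds 1000GB -> 0.001 drives per GB
    watts_per_drive = 8      # 8W standby power per 1TB drive
    cooling_overhead = 1.4   # 1.4X datacenter power distribution and cooling

    watts = users * gb_per_user * drives_per_gb * watts_per_drive * cooling_overhead
    print(watts / 1e6, "megawatts")   # -> 15.68, i.e., ~15.7 MW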

17

Data Center Power Consumption: A Very Serious Issue

In 2007, the U.S. Environmental Protection Agency estimated that data centers use 7 gigawatts of electricity during peak loads

These data centers consume about 1.5% of the total electricity used on the planet, and this amount is expected to grow rapidly. Environmental agencies worldwide are considering standards to regulate server and data center power

We will need to rethink our use of data centers, and their deployment, for the sake of the environment. The analogy: even if our income allows us to drive big cars, it does not mean that we should abuse our use of resources




Challenge: Distributed Cloud Computing at the Network Edge

Processing at the network edge is advantageous when backhaul networking for reaching back is seriously constrained and/or low-latency responses are essential

Local enterprise computing may be needed for control and security/privacy reasons

Sensor data sets may be too large to move

Heat is dissipated over distributed edge data centers, no longer highly concentrated

Someone should do a serious calculation on the energy consumption implications when most desktops are migrated into clouds

[Figure: a central cloud connected to multiple edge data centers]

19

Challenge: Handling Failures

Disk drives, for example, can exhibit annualized failure rates higher than 4%

In addition, there are application bugs, operating system bugs, human errors, and the failures of memory, connectors, networking, and power supplies

Different deployments have reported between 1.2 and 16 average server-level restarts per year

Component failures are the norm rather than the exception

Therefore, constant monitoring, error detection, fault tolerance, and automatic recovery must be integral to the system
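To see why failures are the norm at this scale, a back-of-the-envelope estimate (the fleet size here is an assumed, illustrative number, not from the talk):

    # With a 4% annualized failure rate (AFR), a large fleet fails daily.
    afr = 0.04               # annualized disk failure rate from the slide
    drives = 100_000         # assumed fleet size, for illustration only
    failures_per_year = afr * drives
    print(failures_per_year / 365)   # ~11 disk failures per day on average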



20

Challenge: Scheduling

In parallel computing, a single slower worker can determine the response time of a huge parallel task. To deal with "stragglers," toward the end of a job we may need to identify such situations and speculatively start redundant workers only for those slower jobs (see the sketch below)

We may also need to preempt a running job in order to accommodate higher-priority tasks such as a real-time interactive task. Preemption actually is a major source of system instability
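A minimal sketch of the straggler idea; the 75%-complete threshold and the lag test are illustrative choices of our own, not any particular scheduler's policy:

    def schedule_backups(tasks, threshold=0.75):
        """Near the end of a job, pick slow tasks for redundant backup copies.
        tasks: dict task_id -> fraction of work completed."""
        done = sum(tasks.values()) / len(tasks)
        if done < threshold:            # only act toward the end of the job
            return []
        # A task is a straggler if it lags well behind the average progress.
        return [t for t, p in tasks.items() if p < 0.5 * done]

    # Example: with most tasks nearly finished, "t4" gets a speculative copy.
    print(schedule_backups({"t1": 0.98, "t2": 0.96, "t3": 0.97, "t4": 0.3}))
    # -> ['t4']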



21

Challenge: Migration

We migrate machines for balancing or consolidating load

Note that less loaded servers normally don't decrease their power proportionally. Thus consolidating their loads onto a single server will save power (see the worked example below)

In addition, migration facilitates fault management, load balancing, and low-level system maintenance

But migration should minimize its disruption to normal operations

Furthermore, when a server CPU is put to sleep, we may still need to access the data on its local disks (what is your solution?)
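A small worked example of the non-proportional-power point above; the idle and peak wattages are assumed for illustration, since real servers differ:

    # Servers are not energy-proportional: an idle server still draws much
    # of its peak power. Assumed figures for illustration only.
    idle_w, peak_w = 150.0, 250.0

    def server_power(util):
        """Simple linear model between idle and peak power."""
        return idle_w + (peak_w - idle_w) * util

    two_servers = 2 * server_power(0.3)   # two servers at 30% load: 360 W
    one_server = server_power(0.6)        # one consolidated server at 60%: 210 W
    print(two_servers, one_server)        # consolidating saves ~150 W here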



22

Challenge: Performance Assurance in Virtualization

When virtual machines of uncooperative users share the same resources, these virtual machines must be isolated from one another

In particular, the execution of one virtual machine should not adversely affect the performance of another

23

Challenge: Data Center Networking

Layer-3 IP networking is scalable, but cloud-computing services and end-host virtualization cannot be tied to layer-3 IP addresses. This is because we want to allow a virtual machine/service/application to be run on any server

This means that we need to use a flat layer-2 network to enable any server to be assigned to any service

However, layer-2 networks are generally not scalable, and they do not allow the multiple paths required for high-bandwidth traffic patterns

24

Challenge: Data Lock-in

Can you imagine the day when you need to move tons of your emails to a new email provider?

Can you see Apple's strategy behind its MobileMe service?

Data lock-in is likely an even more serious issue than application/software lock-in

Data lock-in over the cloud has become one of the most important control points for many businesses

25

Outline


Challenges


Enabling Technologies (or Best Current Practices)


Opportunities

26

Current Practice: High-level Programming Models and Implementation

Provide a programming model layer to facilitate application development

The most famous programming model in cloud computing is MapReduce. Thousands of data-parallel applications have been programmed in MapReduce

Run-time implementation is the key. It takes care of scheduling and fault tolerance for the application

[Figure: layered view with Applications (e.g., page ranking) on top of Programming Models (e.g., MapReduce)]
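To make the model concrete, a minimal word count in the MapReduce style. This is a single-process sketch of the programming model only; a real runtime distributes the map and reduce phases and supplies the scheduling and fault tolerance noted above:

    from itertools import groupby

    def map_fn(doc):                      # map: emit (word, 1) pairs
        for word in doc.split():
            yield word, 1

    def reduce_fn(word, counts):          # reduce: sum the counts per word
        return word, sum(counts)

    def mapreduce(docs):
        # The sort plays the role of the shuffle phase.
        pairs = sorted(kv for d in docs for kv in map_fn(d))
        return [reduce_fn(k, [v for _, v in g])
                for k, g in groupby(pairs, key=lambda kv: kv[0])]

    print(mapreduce(["the cloud", "the edge and the cloud"]))
    # -> [('and', 1), ('cloud', 2), ('edge', 1), ('the', 3)]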

27

Current Practice: Master-slave Model in MapReduce Execution

The Master server dispatches map and reduce tasks to worker servers, monitors their progress, and reassigns workers when failures occur
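A toy sketch of the master's bookkeeping; class and method names are our own, and a real master also tracks worker heartbeats, task phases, and data locality:

    class Master:
        """Toy master: dispatch tasks, monitor workers, reassign on failure."""
        def __init__(self, tasks):
            self.pending = list(tasks)      # tasks not yet assigned
            self.running = {}               # worker -> task

        def dispatch(self, worker):
            if self.pending:
                self.running[worker] = self.pending.pop()
                return self.running[worker]

        def on_complete(self, worker):
            self.running.pop(worker, None)  # worker becomes free again

        def on_failure(self, worker):
            task = self.running.pop(worker, None)
            if task is not None:
                self.pending.append(task)   # reassign the failed worker's task

    m = Master(["map-0", "map-1", "reduce-0"])
    m.dispatch("w1"); m.dispatch("w2")
    m.on_failure("w1")                      # w1 dies; its task is requeued
    print(m.pending)                        # -> ['map-0', 'reduce-0']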

28

Current Practice: Software Infrastructure To Support Programming Model Implementation

For example, with the GFS support, a MapReduce implementation can focus on its Master-based scheduling, by leaving system issues, such as file fault tolerance and data locality support, to GFS

That is, MapReduce can now write files to a "persistent" file system distributed over racks of servers in a data center

[Figure: layered view with Applications (e.g., page ranking), Programming Models (e.g., MapReduce), and Software Infrastructure (e.g., Google File System)]

29

Current Practice: Google File System (GFS)

In GFS, every file chunk is replicated three times automatically

The master server can potentially be a performance bottleneck. In principle, it could be replaced with a set of distributed master servers using, e.g., a distributed hash table (DHT). But this could complicate management due to lack of centralized knowledge
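A sketch of the rack-aware, three-way replication idea. The policy below is a simplification of our own (GFS's actual placement also weighs disk utilization and recent creation activity):

    import random

    def place_replicas(racks, n=3):
        """Pick n servers for a chunk, spreading replicas across racks so a
        single rack failure cannot destroy all copies.
        racks: dict rack_id -> list of servers (assumes at least n racks)."""
        chosen, pool = [], dict(racks)
        while len(chosen) < n and pool:
            rack = random.choice(list(pool))          # a not-yet-used rack
            chosen.append(random.choice(pool.pop(rack)))
        return chosen

    racks = {"r1": ["s1", "s2"], "r2": ["s3"], "r3": ["s4", "s5"]}
    print(place_replicas(racks))   # e.g. ['s2', 's3', 's5'], one per rack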

30

Current Practice: Data Center Network Wiring

[Figure: multi-rooted tree interconnect topology]

31

Current Practice: Data Center Network Protocols

We could use virtual layer-2 networks, which may actually be implemented with layer-3 or even layer-4 (yes, TCP!) protocols and nodes

Note that normal layer-3 and layer-4 protocols will be on top of this virtual layer 2 in supporting applications!

[Figure: protocol stack, top to bottom: Applications (e.g., page ranking); Layer 4 (TCP/UDP); Layer 3 (IP); Virtual Layer 2, implemented with Layer 3 and Layer 4; Layer 1 (e.g., Ethernet)]
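One concrete way a virtual layer 2 can ride on layers 3 and 4 is to tunnel Ethernet frames inside UDP datagrams. A minimal sketch; the 8-byte header layout and port number are modeled loosely on later VXLAN-style overlays and are our assumption, not something described in this talk:

    import socket, struct

    def encapsulate(vnet_id, frame):
        """Wrap a raw Ethernet frame in an 8-byte virtual-network header
        (flags byte + 24-bit virtual network id) for transport over UDP."""
        header = struct.pack("!B3xI", 0x08, vnet_id << 8)  # id in top 24 bits
        return header + frame

    # Send a (dummy) frame on virtual network 42 across the layer-3 fabric.
    frame = b"\x00" * 14 + b"payload"                 # fake Ethernet frame
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(encapsulate(42, frame), ("10.0.0.2", 4789))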

32

Current Practice: Virtualization Technologies

For example, Xen's virtual machine hypervisor (used by Amazon's EC2) can multiplex physical resources at the granularity of an entire operating system

Virtualization allows individual users to run unmodified binaries, or collections of binaries, in a resource-controlled fashion (for instance, an Apache server along with a PostgreSQL backend)

Furthermore, it provides an extremely high level of flexibility, since the user can dynamically create the precise execution environment their software requires. Unfortunate configuration interactions between various services and applications are avoided (for example, each Windows instance maintains its own registry)

33

Fast Live Migration with Pre-copy

Pages of memory are iteratively copied from the source machine to the destination host, all without ever stopping the execution of the virtual machine being migrated. Page-level protection hardware is used to ensure a consistent snapshot is transferred, and a rate-adaptive algorithm is used to control the impact of migration traffic on running services

The final phase pauses the virtual machine, copies any remaining pages to the destination, and resumes execution there

Migrating entire OS instances on a commodity cluster can incur service downtimes as low as 60ms
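The loop structure can be summarized in a few lines. A sketch only: the vm and send interfaces are hypothetical stand-ins, and the stopping heuristic is simplified:

    def live_migrate(vm, send):
        """Iterative pre-copy: keep re-copying pages dirtied since the last
        round, then stop-and-copy the small remainder."""
        dirty = set(vm.all_pages())             # round 0: copy everything
        while len(dirty) > 50:                  # simplified stop condition
            for page in dirty:
                send(page)                      # VM keeps running meanwhile
            dirty = vm.pages_dirtied_since_last_round()  # via page protection
        vm.pause()                              # brief downtime begins
        for page in dirty:
            send(page)                          # final remainder
        vm.resume_on_destination()              # downtime can be ~tens of ms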

Current Practice: Container-based Datacenter

Placing the server racks (thousands of servers) into a standard shipping container and integrating heat exchange and power distribution into the container

Air handling is similar to in-rack cooling and typically allows higher power densities than regular raised-floor datacenters

The container-based facility has achieved extremely high energy efficiency ratings compared with typical datacenters today

Microsoft Data Center Near Chicago (9/30/2009)

Source: http://www.datacenterknowledge.com/archives/2009/09/30/microsoft-unveils-its-container-powered-cloud
35

Packaging Related Issues

A container-based datacenter will likely be prefabricated in a factory. It is rather difficult, if not impossible, for humans to service it once it is deployed in the field, due to operational and space constraints

"Data center in a shipping container" is analogous to "system on a chip" built with low-power transistors which may fail

Suppose we want to stack 4x4x4 units together. How should we network them together?


36

Opportunities in Using A Wireless Network as A Backplane

Ubiquity of wireless networking

Wireless LAN such as 802.11 (aka "Wi-Fi") has become widely available on mobile devices

Inherent advantages of the wireless medium

No wires! So it can support flexible and rapid deployment

Convenience in broadcast/multicast

37

Use of Wireless Connections

By definition, edge computing is more ad hoc and less enterprise-oriented; wireless networking will give us the required flexibility

Potentially, by using wireless, we can get rid of most of these wires at the bottom of the interconnection hierarchy without significant performance impact

[Figure: rack interconnection hierarchy, with the bottom-level wiring labeled "May use wireless instead"]

38


Related Work at Harvard for "Wireless Computing at the Edge"

We are developing two systems:

Wireless Ad-Hoc File System (AHFS)

TCP over multiple coded wireless links

These systems incorporate advanced technologies such as localization, clustering, network coding, and geographic routing


39

Wireless MapReduce Implementation

We believe that wireless broadcast is natural in distributing data to multiple map worker nodes, whereas wireless remote procedure calls can efficiently facilitate communications for reduce workers

As a proof of concept, we have a preliminary MapReduce implementation of the distributed speaker ID application we built in the past

Testbeds to Support Wireless Networking Research at Harvard

Indoor testbed at Harvard (Maxwell Dworkin and Pierce buildings)

Outdoor testbed ("cloud computing in the air"!), with four MIDs on an airplane

Two Architecture Primitives of AHFS

Cluster-oriented file operation

Nodes in a cluster can talk to each other well, so they can provide file redundancy

Model use: "Write a file into a cluster"

A file is associated with a cluster of nodes, where clients read from/write to the file. That is, the cluster is a "rendezvous" point for users of the file

Location-oriented file operation

Put files in the proximity of their expected users in order to minimize transmission distance

Model use: "An airplane writes a file to ground nodes at location X"

TCP over Multiple Network-coded Wireless Links: Exploiting Space-Time Redundancy

A generation of four packets is encoded into twelve network-coded packets, transmitted from source S to destination D over three links (space: TX-1/RX-1, TX-2/RX-2, TX-3/RX-3) and over time

It is sufficient for D to receive just four out of the twelve packets

[Figure: coded packet streams [E4][E3][E2][E1], [F4][F3][F2][F1], [G4][G3][G2][G1] flowing over the three links over time; rate = 1/2]
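A toy demonstration of the rank condition behind "any four of the twelve": with random linear coding, the destination can decode once the received coefficient vectors have full rank. The sketch uses GF(2) for brevity and mixes coefficients only, not payloads; practical network coding typically uses a larger field such as GF(256), which makes four receptions suffice with high probability:

    import random

    def rank_gf2(vectors):
        """Rank over GF(2) of coefficient vectors encoded as small ints."""
        basis = {}                          # highest set bit -> basis vector
        for v in vectors:
            while v:
                msb = v.bit_length() - 1
                if msb not in basis:
                    basis[msb] = v
                    break
                v ^= basis[msb]             # eliminate the leading bit
        return len(basis)

    # A generation of 4 packets sent as 12 random GF(2) combinations,
    # spread over three links (space) and successive time slots.
    coded = [random.randint(1, 15) for _ in range(12)]  # 4-bit coefficient rows

    # D collects packets until the coefficients have rank 4; Gaussian
    # elimination then recovers the original generation.
    needed = next((k for k in range(4, 13) if rank_gf2(coded[:k]) == 4), None)
    print("packets needed to decode:", needed)  # often 4; a few more over
                                                # GF(2), reliably 4 over GF(256)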

43

Outline


Challenges


Enabling Technologies (or Best Current
Practices)


Opportunities


44

General Comments on Opportunities

Cloud computing is here to stay: it is one of the most efficient ways to do processing on a large scale

E.g., a driver of further power-efficient development

Clouds are expanding into new domains

E.g., the network edge: smart phones, set-top boxes, netbooks

Cloud computing drives demand for ever-increasing bandwidth and access flexibility

Data center network infrastructure

Wireless networks to provide cloud access

Faster storage technologies such as flash

It is not unlike the dot-com period. There are many business models and applications being proposed. It is likely that it will take several years to sort through these ideas


45

Opportunities (1/2)

1. Cloud end-devices with services/applications/data from the cloud, while being able to use resources in local environments (e.g., TVs and desktops) via, e.g., 300 Mbps Wi-Fi wireless links

2. Fabric computing rather than traditional rack-based blade servers (e.g., network- or switch-centric servers rather than CPU-centric servers, for better power management and space use)

3. Programming models beyond MapReduce, e.g., synchronized message passing
46

Opportunities (2/2)

4. GPU-based clouds for large scientific computing to complement x86-based multicore clouds

5. Cloud services capable of making use of private storage. For example, run Google Docs on Pogoplug servers at home

6. Fault-tolerant file systems for flash storage under $$ that can survive the faulty blocks present in flash storage (this is different from the usual wear-leveling file system)

7. Cloud Computing for Everything (cc-for-x), where x can be e-commerce, healthcare, sensor networks, mobile phones, etc.

47

Conclusion (1/2)

1. Today's cloud computing results from decades of technology advances in areas such as server CPUs, operating systems, programming models, networks, fault-tolerant software, virtualization, data center management, power management, etc.

2. In fact, similar visions were actually known some 40 years ago, e.g., from Project MAC at MIT

3. It is the implementation that has made the difference. In particular, it is the highly scalable technologies developed in recent years that have suddenly made cloud computing a hot area