CEG7380 Cloud Computing Lecture 1


Nov 3, 2013



Keke Chen

Outline


Syllabus


Scope of this course


Tentative schedule


Prerequisites


Resources


Assignments



Introduction

Scope of this course


Understand the basic ideas of cloud computing


Get familiar with


Tools


Systems


Get exposed to some research topics



Two major parts:


Processing large data with the cloud


Scaling up/down web applications with the cloud


Note: some programming parts require self-study



Prerequisites


Some programming skills


Java, Python, shell


Comfortable with learning new programming frameworks


Sufficient knowledge about


Data structures and databases


Operating systems


Distributed systems


Assignments and Grading


Reading papers (~3) (10%)


Some miniprojects (4~5) (60%)


Help you master the concepts


Learn to use tools and systems


Self-motivated research projects are strongly encouraged!


Final exam (20%)


Class attendance and discussion (10%)



Resources


updated reference list


In-house Hadoop cluster


AWS access


coupon code for each student


Pilot


Submitting reading assignments and projects


Tentative Schedule


Parallel data processing


Distributed file systems (GFS, HDFS)


MapReduce


High-level distributed data management



Cloud infrastructures


Virtualization


AWS and Eucalyptus


Interactive front-end


Google App Engine



Cloud security and privacy


Research topics


In projects, we will learn to use


Hadoop


MapReduce, Pig Latin


AWS


Google App Engine

Cloud Computing

Lecture 1-2

Some slides are borrowed from UC Berkeley RAD Lab








Keke Chen



Outline


What is cloud computing?


Why now?


Cloud killer applications


Cloud economics


Challenges and opportunities


“Above the Clouds”


“Claremont Report”



What is Cloud Computing?


Old idea: Software as a Service (SaaS)


Def: delivering applications over the
Internet


Recently: “[Hardware, Infrastructure, Platform] as a service”



Utility Computing: pay-as-you-use computing


Illusion of infinite resources


No up-front cost


Fine-grained billing (e.g., hourly)
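
A minimal sketch of fine-grained, pay-as-you-use billing. The rate, the instance count, and the policy of rounding partial hours up are illustrative assumptions, not any provider's actual pricing.

```python
import math

def hourly_bill(hours_used: float, instances: int, rate_per_hour: float) -> float:
    """Pay-as-you-use cost with no up-front fee.

    Assumption: every started hour is billed in full (partial hours round up).
    """
    billed_hours = math.ceil(hours_used)
    return billed_hours * instances * rate_per_hour

# Hypothetical job: 10 instances for 2.5 hours at $0.10 per instance-hour.
cost = hourly_bill(2.5, 10, 0.10)
print(f"${cost:.2f}")  # billed as 3 hours x 10 instances
```

The user pays only while the instances run; when the job ends, the cost stops.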


Cloud computing vs. grid computing


Cloud computing = virtualization + grid + services + utility computing


Grid computing: resource provisioning, load balancing, parallel processing


Views of different users


System admins/Hadoop users: grid


Application owners/service users: service, utility


Users and cloud providers

Why Now?


Experience with very large datacenters


profitable for cloud providers


economies of scale


Pervasive broadband Internet


Fast x86 virtualization


Pay-as-you-go billing model


Large user base


Online payment


Online Ads


Content distribution



Web 2.0 lowers the entry point to e-business


more small e-business owners



Large user base of clouds



Spectrum of Clouds


Instruction Set VM (Amazon EC2, 3Tera)


Bytecode VM (Microsoft Azure)


Framework VM


Google AppEngine, Force.com

[Figure: spectrum of clouds in the order EC2, Azure, AppEngine, Force.com. The EC2 end is lower-level with less management; the Force.com end is higher-level with more management.]

Cloud Killer Apps


Mobile and web applications


Batch processing / MapReduce


Data analytics (big data)


E.g., OLAP, data mining, machine learning


Extensions of desktop software


Matlab, Mathematica



Cloud Economics


Pay by use instead of provisioning for peak

[Figure: two plots of resources versus time, each showing demand and capacity. Static data center: capacity is fixed, leaving unused resources. Data center in the cloud: capacity follows demand.]
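
The comparison between paying by use and provisioning for peak can be sketched with a small calculation. The demand curve and the unit rate below are hypothetical, and the sketch assumes the same price per server-hour in both cases, which need not hold in practice.

```python
def peak_provisioned_cost(demand, rate):
    """Static data center: capacity equals peak demand in every period."""
    return max(demand) * len(demand) * rate

def pay_per_use_cost(demand, rate):
    """Cloud: pay only for the servers actually used in each period."""
    return sum(demand) * rate

# Hypothetical demand (servers needed) for each hour of one day.
demand = [20] * 8 + [100] * 8 + [40] * 8
rate = 1  # hypothetical cost units per server-hour

static_cost = peak_provisioned_cost(demand, rate)  # 100 servers x 24 hours
cloud_cost = pay_per_use_cost(demand, rate)        # sum of hourly demand
print(static_cost, cloud_cost)                     # 2400 1280
print(f"utilization of static capacity: {cloud_cost / static_cost:.0%}")
```

Under these assumptions the static data center sits idle almost half the time. Pay-per-use can therefore stay cheaper even when the cloud's unit price is somewhat higher.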


Economics of Cloud Users


Risk of over-provisioning: underutilization

[Figure: static data center, resources versus time. Fixed capacity sits above demand; the gap is unused resources.]

Economics of Cloud Users


Heavy penalty for under-provisioning

[Figure: three plots of resources versus time (days 1 to 3), each showing demand and capacity, with annotations "Lost revenue" and "Lost users".]
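
Both risks for cloud users can be quantified with the same small sketch: given a fixed capacity, count the capacity left idle (over-provisioning) and the demand that cannot be served (under-provisioning). The demand numbers are hypothetical.

```python
def provisioning_gap(demand, capacity):
    """Return (unused capacity, unmet demand) summed over all periods."""
    unused = sum(max(capacity - d, 0) for d in demand)
    unmet = sum(max(d - capacity, 0) for d in demand)
    return unused, unmet

# Hypothetical demand per period, in server units.
demand = [30, 60, 90, 120, 90, 60]

print(provisioning_gap(demand, capacity=120))  # (270, 0): provisioned for peak
print(provisioning_gap(demand, capacity=60))   # (30, 120): under-provisioned
```

Unmet demand is what turns into lost revenue at first, and into lost users if it persists.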

Economics of Cloud Providers


5-7x economies of scale [Hamilton 2008]







Extra benefits


Amazon: utilize off
-
peak capacity


Microsoft: sell .NET tools


Google: reuse existing infrastructure

Resource         Cost in Medium DC      Cost in Very Large DC    Ratio
Network          $95 / Mbps / month     $13 / Mbps / month       7.1x
Storage          $2.20 / GB / month     $0.40 / GB / month       5.7x
Administration   ≈140 servers/admin     >1000 servers/admin      7.1x

Adoption Challenges

Challenge                                         Opportunity
Availability                                      Multiple providers & DCs
Data lock-in                                      Standardization
Data confidentiality, auditability, and privacy   Encryption, VLANs, firewalls; geographical data storage; privacy-preserving data outsourcing

Growth Challenges

Challenge                           Opportunity
Data transfer bottlenecks           FedEx-ing disks, data backup/archival
Performance unpredictability        Improved VM support, flash memory, scheduling VMs
Scalable storage                    Invent scalable store
Bugs in large distributed systems   Invent debugger that relies on distributed VMs
Scaling quickly                     Invent auto-scaler that relies on ML; snapshots
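
To make the "scaling quickly" row concrete, here is a minimal rule-based sketch of an auto-scaler decision. It is not the ML-based auto-scaler the slide calls for: it simply sizes the fleet so each instance runs near a target utilization. All names, capacities, and thresholds are hypothetical.

```python
import math

def desired_instances(load, per_instance_capacity,
                      target_utilization_pct=70, min_instances=1):
    """Instances needed so each one runs at or below the target utilization.

    load and per_instance_capacity use the same unit (e.g. requests/second).
    """
    usable_capacity = per_instance_capacity * target_utilization_pct
    needed = math.ceil(load * 100 / usable_capacity)
    return max(needed, min_instances)

# Hypothetical service: each instance handles 100 requests/second.
for load in (0, 350, 360, 1400):
    print(load, desired_instances(load, 100))
```

An ML-based scaler would replace the current `load` with a forecast of future load, so that instances are started before demand arrives.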

Policy and Business Challenges

Challenge                 Opportunity
Reputation fate sharing   Offer reputation-guarding services like those for email
Software licensing        Pay-for-use licenses; bulk use sales


Research Challenges Mentioned by Database Community (Claremont Report)

Functionality and operational cost


Background: compare massive-scale data-intensive computing systems with today’s DBMS


Limited functionality


Simple APIs (e.g., MapReduce)


Pushes more burden on developers


Benefits


Easier to manage


Lower operational cost


Service Level Agreement (SLA) that is hard to provide for a SQL DBMS

P.S. DB systems are notorious for their installation and maintenance costs.
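
To illustrate what a "simple API" such as MapReduce looks like, and how much it leaves to the developer, here is a single-process word-count sketch of the map, shuffle, and reduce steps. It does not use Hadoop or any real framework; all function names are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["the cloud scales", "the cloud bills hourly"])
print(counts)  # {'the': 2, 'cloud': 2, 'scales': 1, 'bills': 1, 'hourly': 1}
```

The developer writes only the map and reduce functions. Anything a DBMS would express declaratively, such as a join or a multi-step query, has to be composed by hand from such jobs, which is the extra burden noted above.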

Manageability


Features of cloud systems


Limited human intervention


High variance workloads


A variety of shared infrastructures


No DBAs or Administrators to assist developers



Systems need to do work automatically


Self-managing


Adaptive (autonomous) computing

Data security and privacy


Users sharing physical resources in a cloud


Protect from each other (security)


Protect from curious cloud providers (privacy)


Successes may depend on specific target usage scenarios


Examples


Query based services


Mining based services

Datasets over multiple clouds


Interesting datasets might be available in different clouds


Different cloud providers


Private or public clouds



Services mashing up datasets


Inevitably crossing clouds



Federated cloud architectures

Algorithms on Big Data


Working on “Big Data”


Data mining


Machine learning


Visualization


Traditionally assume data is in flat files or relational databases


Distributed data organization poses new challenges


Redesign algorithms


Redesign frameworks