Bill Howe - eScience Institute


31 Jan 2013


eScience Open Mic:

Cloud Computing

Bill Howe, PhD

eScience Institute, UW

http://escience.washington.edu

3/18/2013

Bill Howe, eScience Institute


eScience is about data


Old model: "Query the world"
(Data acquisition coupled to a specific hypothesis)

New model: "Download the world"
(Data acquisition supports many hypotheses)


Astronomy: high-resolution, high-frequency sky surveys (SDSS, LSST, PanSTARRS)

Biology: lab automation, high-throughput sequencing

Oceanography: high-resolution models, cheap sensors, satellites

40TB / 2 nights

~1TB / day

100s of devices


eScience is married to the Cloud:

Scalable computing and storage for everyone


Generator


[Slide source: Werner Vogels]


"... computing may someday be organized as a public utility just as the telephone system is a public utility... The computer utility could become the basis of a new and important industry."

-- John McCarthy, 1961
Emeritus at Stanford
Inventor of LISP


Economies of Scale

src: Armbrust et al., Above the Clouds: A Berkeley View of Cloud Computing, 2009



Economies of Scale

src: James Hamilton, Amazon.com


Elasticity

Provisioning for peak load

src: Armbrust et al., Above the Clouds: A Berkeley View of Cloud Computing, 2009
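A minimal Python sketch of the peak-provisioning argument: a fixed fleet sized for the busiest hour is paid for around the clock, while elastic capacity can track demand hour by hour. The hourly demand figures are hypothetical, chosen only for illustration (they are not data from Armbrust et al.), and demand is counted in whole servers.

```python
# Hypothetical hourly demand over one day, in whole servers' worth of load.
# Illustrative numbers only; not data from Armbrust et al.
HOURLY_DEMAND = [2, 2, 1, 1, 1, 2, 4, 6, 8, 9, 10, 10,
                 9, 9, 8, 8, 9, 10, 9, 7, 5, 4, 3, 2]


def elastic_server_hours(demand: list[int]) -> int:
    """Server-hours paid for if capacity tracks demand exactly each hour."""
    return sum(demand)


def peak_provisioned_server_hours(demand: list[int]) -> int:
    """Server-hours paid for when a fixed fleet is sized for the peak hour."""
    return max(demand) * len(demand)


def peak_utilization(demand: list[int]) -> float:
    """Fraction of the peak-sized fleet's capacity that is actually used."""
    return elastic_server_hours(demand) / peak_provisioned_server_hours(demand)


if __name__ == "__main__":
    print(f"elastic: {elastic_server_hours(HOURLY_DEMAND)} server-hours")
    print(f"peak-provisioned: {peak_provisioned_server_hours(HOURLY_DEMAND)} server-hours")
    print(f"utilization of peak fleet: {peak_utilization(HOURLY_DEMAND):.0%}")
```

With these made-up numbers the elastic case uses 139 server-hours against 240 for the peak-sized fleet, so the fixed fleet is only about 58% utilized; the remaining idle capacity is what elasticity avoids paying for.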



Elasticity

Underprovisioning

src: Armbrust et al., Above the Clouds: A Berkeley View of Cloud Computing, 2009



Elasticity

Underprovisioning, more realistic

src: Armbrust et al., Above the Clouds: A Berkeley View of Cloud Computing, 2009
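The two underprovisioning slides can be sketched the same way. Below, a fixed capacity serves at most `capacity` users per period. With `churn_rate = 0`, turned-away users keep coming back; a positive `churn_rate` follows the Berkeley report's argument that some turned-away users desert a site permanently (our reading of the "more realistic" case), which lowers demand in every later period. The function name, parameters, and numbers are hypothetical.

```python
def simulate_underprovisioning(base_demand: list[float],
                               capacity: float,
                               churn_rate: float) -> tuple[float, float, float]:
    """Return (served, unserved, permanently_lost) user totals.

    churn_rate is a hypothetical fraction of turned-away users who never
    return; each period's demand is reduced by everyone lost so far.
    """
    lost = 0.0
    served_total = 0.0
    unserved_total = 0.0
    for base in base_demand:
        demand = max(base - lost, 0.0)   # churned users no longer show up
        served = min(demand, capacity)   # the fixed fleet caps what is served
        unserved = demand - served
        served_total += served
        unserved_total += unserved
        lost += churn_rate * unserved    # some turned-away users leave for good
    return served_total, unserved_total, lost


if __name__ == "__main__":
    demand = [100, 150, 200, 200]
    print(simulate_underprovisioning(demand, capacity=120, churn_rate=0.0))
    print(simulate_underprovisioning(demand, capacity=120, churn_rate=0.5))
```

In this toy run the churn case reports fewer unserved requests (127.5 vs. 190) only because 63.75 users' worth of demand has stopped arriving, which is the hidden, lasting cost of underprovisioning.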



Animoto


[Werner Vogels, Amazon.com]


Periodic


[Deepak Singh, Amazon.com]

Growth


Growth



Amazon


[Werner Vogels, Amazon.com]


[Werner Vogels, Amazon.com]

History

History


[Figure: timeline from 2000 (Application Service Providers) through 2001, 2004, 2005+, 2006, 2008, and 2009]


Exemplars

Software as a Service

Platform as a Service

Infrastructure as a Service


Grid Computing

Grid vs. Cloud

WAN vs. centralized

Heterogeneous vs. data center

Physical vs. virtualized

Fewer, larger, dedicated allocations vs. more, smaller, shared allocations

Foster 2002


Cloud Services

[Figure: cloud services arranged across Infrastructure-aaS, Platform-aaS, and Software-aaS, on a spectrum labeled "Automation" and "Constrained". Examples: Google Docs, SalesForce.com, Force.com, Google App Engine, Windows Azure, EC2, S3, Elastic MapReduce, SQL Azure]

Microsoft Azure



Highly available

Fabric Controller (FC)


Azure FC Owns this Hardware

[Roger Barga, Microsoft]

History


[Roger Barga, Microsoft]


At Minimum

CPU: 1.5-1.7 GHz x64
Memory: 1.7 GB
Network: 100+ Mbps
Local Storage: 500 GB

Up to

CPU: 8 cores
Memory: 14.2 GB
Local Storage: 2+ TB


[Roger Barga, Microsoft]


[Figure: Azure roles. The Fabric manages VMs, each running an Agent: a Web Role (IIS hosting ASP.NET, WCF, etc.) receives HTTP traffic through a Load Balancer, and a Worker Role runs main() { … }]

[Roger Barga, Microsoft]




[Figure: Azure platform. An Application uses Compute and Storage on top of the Fabric; Storage offers Blobs, Queues, Tables, and Drives, accessed over HTTP]

[Roger Barga, Microsoft]
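The Web Role / Worker Role split and the Queues storage service suggest a common decoupling pattern: the front end accepts a request, drops a message on a queue, and returns immediately, while worker `main()` loops drain the queue. The Python sketch below uses the standard-library `queue.Queue` and threads as an in-process stand-in; it does not call any Azure SDK, and every function name in it is hypothetical.

```python
import queue
import threading

# In-process stand-in for a cloud queue service (hypothetical; real Azure
# Queue storage is a separate HTTP service and is not modeled here).
work_queue: "queue.Queue[str | None]" = queue.Queue()
results: list[str] = []
results_lock = threading.Lock()


def web_role_handle_request(payload: str) -> str:
    """Hypothetical front-end handler: enqueue the work and return at once."""
    work_queue.put(payload)
    return "202 Accepted"


def worker_role_main() -> None:
    """Hypothetical worker main(): pull messages until a stop sentinel arrives."""
    while True:
        item = work_queue.get()
        try:
            if item is None:                  # stop sentinel, used only by this sketch
                return
            with results_lock:
                results.append(item.upper())  # placeholder for real work
        finally:
            work_queue.task_done()


def run_demo(jobs: list[str], n_workers: int = 2) -> list[str]:
    """Start workers, submit jobs through the 'web role', then shut down."""
    results.clear()
    workers = [threading.Thread(target=worker_role_main) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for job in jobs:
        web_role_handle_request(job)
    # One sentinel per worker; FIFO order guarantees every job is taken first.
    for _ in workers:
        work_queue.put(None)
    for w in workers:
        w.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_demo(["render frame 1", "render frame 2", "render frame 3"]))
```

Because the web role only enqueues, a burst of requests lengthens the queue rather than timing out, and more worker instances can be added to drain it, which is where elasticity pays off.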


AzureScope


http://azurescope.cloudapp.net/


Performance measurements

[Roger Barga, Microsoft]

MY 2 FAVORITE USE CASES


Use Case 1: Google Docs for developers

The cloud is the ultimate collaborative development environment

A shared environment outside the jurisdiction of over-protective (or otherwise non-responsive) sysadmins

No bugs closed as "can't replicate"

Example: new software for serving oceanographic model results, requiring collaboration between UW, OPeNDAP.org, and OOI

Bill Howe



Waited two weeks for credentials to be established

Gave up, spun up an EC2 instance, and was up and running within an hour

Similarly, Seattle's Institute for Systems Biology uses EC2/S3 for sharing computational pipelines

Use Case 2: Reproducible Research

Protocols, assays, experiments, and workflows are increasingly computational

Paradoxically, these activities are often harder to reproduce than "manual" protocols

Why?


Software dependencies

[Figure: account management, OpenGL, 3D drivers, Mesa, Java 1.5, SAX, mod_python, Tomcat, config, security, PostGIS, Proj4, VTK, PostgreSQL, EJB, Python 2.5, SOAP libs, XML-RPC libs, Apache, S3/EC2, SQL Server, Data Services, Google App Engine, MATLAB]
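Part of the reproducibility problem is simply knowing which versions were used. A minimal Python sketch, using only the standard library (`importlib.metadata`, `platform`, `sys`), records the interpreter, OS, and installed Python packages for a run; the output filename `environment.json` is a hypothetical choice. It is only a partial remedy: system-level pieces such as OpenGL, Mesa, or PostGIS are not captured, which is one argument for distributing a whole virtual machine image instead.

```python
import importlib.metadata
import json
import platform
import sys


def environment_snapshot() -> dict:
    """Record interpreter, OS, and installed Python package versions."""
    packages: dict[str, str] = {}
    for dist in importlib.metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            packages[name] = dist.version
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    # Hypothetical output file, saved alongside results so the run can be re-created.
    with open("environment.json", "w", encoding="utf-8") as f:
        json.dump(environment_snapshot(), f, indent=2)
```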

Division of Responsibility

Q: Where should we place the division of responsibility between developers and users?

Need to consider skillsets:

Can they install packages?
Can they compile code?
Can they write DDL statements?
Can they configure a web server?
Can they troubleshoot network problems?
Can they troubleshoot permissions problems?

Frequently the answer is "No."

Plus: tech support is hard. It is usually easier to fix it yourself.


Division of Responsibility

Is there anything busy users are willing to do?

Example in the classroom:

Dr. Randy LeVeque, AMATH 574, Winter 2009

Virtual machines with Clawpack software pre-installed, along with data, models, and analysis tools

See a how-to at http://escience.washington.edu/ (search for "virtual machine"), or go here: http://bit.ly/eMOcle




Use Case 3: Data Sharing

The days of FTP are over

It takes days to transfer 1TB over the Internet, and the transfer isn't likely to succeed.

Need to push the computation to the data, rather than push the data to the computation

Cloud is perfect:

Globally shared storage

Equipped with arbitrary, on-demand computation by anyone
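The "days to transfer 1TB" claim follows from throughput arithmetic. This Python sketch computes an idealized lower bound (size in bits divided by sustained throughput), ignoring protocol overhead, retries, and contention, all of which only make real transfers slower. The link speeds are example values, not measurements.

```python
def transfer_time_hours(size_bytes: float, throughput_bits_per_s: float) -> float:
    """Idealized lower bound on transfer time, in hours.

    Ignores protocol overhead, retries, and shared-link contention,
    so real transfers take longer than this.
    """
    return size_bytes * 8 / throughput_bits_per_s / 3600


ONE_TB = 10**12  # decimal terabyte, in bytes

# Example sustained rates (assumptions, not measurements).
LINKS = {"10 Mbit/s": 10e6, "100 Mbit/s": 100e6, "1 Gbit/s": 1e9}

if __name__ == "__main__":
    for label, bps in LINKS.items():
        hours = transfer_time_hours(ONE_TB, bps)
        print(f"{label:>10}: {hours:6.1f} h ({hours / 24:.1f} days)")
```

At a sustained 10 Mbit/s, 1 TB needs about 222 hours, a little over 9 days, before any failure forces a restart; shipping a small analysis script to where the data already lives avoids that cost entirely.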


CASE STUDIES


FoldIt

Database, fileserver, multiple webservers

< $30k for a 3-year term

Database replicated in multiple zones

Web servers scale automatically with usage

Includes 1TB of storage
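The slide does not say how FoldIt's web tier scales. One common approach is a target-tracking rule; the Python sketch below, with hypothetical names and thresholds, sizes the fleet to current load and clamps it between a floor (kept for availability) and a ceiling (kept for budget).

```python
import math


def desired_web_servers(requests_per_s: float,
                        capacity_per_server: float,
                        min_servers: int = 2,
                        max_servers: int = 20) -> int:
    """Hypothetical scaling rule: enough servers for the current load,
    never fewer than min_servers and never more than max_servers."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    needed = math.ceil(requests_per_s / capacity_per_server)
    return max(min_servers, min(max_servers, needed))


if __name__ == "__main__":
    for load in (0, 260, 5000):  # requests per second (example values)
        print(load, desired_web_servers(load, capacity_per_server=50))
```

In practice such a rule is evaluated periodically against a smoothed load metric, so that brief spikes do not cause the fleet to thrash.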



Many more

Computational Fluid Dynamics

Astronomy

GPGPUs

HIPAA-protected applications

National Security applications

It's mainstream!
