The Datacenter Needs an Operating System

UC Berkeley
Anthony D. Joseph

LASER Summer School
September 2013

My Talks at LASER 2013

1. AMP Lab introduction
2. The Datacenter Needs an Operating System
3. Mesos, part one
4. Dominant Resource Fairness
5. Mesos, part two
6. Spark

2

Collaborators



Matei Zaharia
Benjamin Hindman
Andy Konwinski
Ali Ghodsi
Randy Katz
Scott Shenker
Ion Stoica

3

Machines: Background

Clusters of commodity servers have become a
major computing platform in industry and
academia (100’s – 10,000’s of machines)

Driven by data volumes outpacing the processing
capabilities of single machines – big data and
science

Democratized by cloud computing

4

Machines: Background

Some have declared that “the datacenter is the
new computer”

Our claim: this new computer increasingly needs
an operating system

Not necessarily a new host OS, but a common
software layer that manages resources and
provides shared services for the whole datacenter,
like an OS does for one host

5

Why Datacenters need an OS

Growing diversity of applications

» Computing frameworks: MapReduce, Dryad, Pregel, Percolator, Dremel, MR Online, Spark
» Storage systems: GFS, BigTable, Dynamo, SCADS
» Web apps and supporting services

[Pictured: Dryad, Pregel, Cassandra, Hypertable]

Growing diversity of users

» 200 Hive users at Facebook, running near-interactive ad hoc queries

Same reasons computers needed one!

7

What Operating Systems Provide

Resource Sharing: time-sharing, virtual memory, …

Data Sharing: files, pipes, IPC, …

Programming Abstractions: libraries, languages

Debugging & Monitoring: ptrace, DTrace, top, …

Most importantly: enables a highly interoperable software ecosystem that we now take for granted

9

Example

A scientist analyzing data on one machine can pipe it
through a variety of tools, write new tools that
interface with these through standard APIs, and trace
across the stack

In the future, the scientist should be able to launch a
cluster on EC2 and do the same things:

» Mix and combine a variety of apps & programming models
» Write new parallel programs that talk to these
» Get a unified interface for managing the cluster
» Debug and trace across all these components

10

Today’s Datacenter OS

Hadoop MapReduce as common execution and resource sharing platform

» Means jobs have to compile to MapReduce
» Inter-user resource sharing, but at the level of MR jobs

Hadoop InputFormat API for data sharing: what happens with the next hot platform after Hadoop?

11

Today’s Datacenter OS

Abstractions for productivity programmers, but
not for system builders

Difficult to debug, especially across layers

Other examples:

» Amazon/Azure services
» Google internal stack and Google Compute Engine
» Hadoop YARN

The problems motivating a datacenter OS are well recognized, but solutions are narrowly targeted

Can researchers take a longer-term view?

13

Tomorrow’s Datacenter OS

Resource Sharing: time-sharing, virtual memory, …

Data Sharing: files, pipes, IPC, …

Programming Abstractions: libraries, languages

Debugging & Monitoring: ptrace, DTrace, top, …

14

Resource Sharing

“To solve these interaction problems we would
like to have a computer made simultaneously
available to many users in a manner somewhat
like a telephone exchange. Each user would be
able to use a console at his own pace and
without concern for the activity of others using
the system.”

– Fernando J. Corbató, 1962



15

Today’s Resource Sharing

Today, cluster apps are built to run independently 
and assume they own a fixed set of nodes

Result: inefficient static partitioning

What’s the right interface for dynamic sharing?

[Figure: utilization charts for App 1, App 2, and App 3 (axes 0%–33%), next to a chart on a 0%–100% axis]
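
The cost of static partitioning can be illustrated with a small sketch. Everything here is hypothetical: the demand trace, the 30-node cluster size, and the function names are made up for illustration, and the "dynamic" case is an idealized upper bound that ignores scheduling policy and fairness.

```python
# Hypothetical demand trace: nodes each app wants at each time step.
demand = {
    "App 1": [30, 5, 5, 10],
    "App 2": [5, 30, 5, 10],
    "App 3": [5, 5, 30, 10],
}
CLUSTER_NODES = 30  # assumed cluster size


def static_partition_served(demand, total_nodes):
    """Each app owns a fixed, equal slice and cannot borrow idle nodes."""
    slice_size = total_nodes // len(demand)
    steps = len(next(iter(demand.values())))
    return [
        sum(min(trace[t], slice_size) for trace in demand.values())
        for t in range(steps)
    ]


def dynamic_sharing_served(demand, total_nodes):
    """All apps draw from one pool, so idle nodes go wherever demand is."""
    steps = len(next(iter(demand.values())))
    return [
        min(sum(trace[t] for trace in demand.values()), total_nodes)
        for t in range(steps)
    ]


static = static_partition_served(demand, CLUSTER_NODES)
dynamic = dynamic_sharing_served(demand, CLUSTER_NODES)
print("busy nodes, static partitions:", static)
print("busy nodes, shared pool:     ", dynamic)
```

With this trace, the static partitions leave a third of the cluster idle whenever one app bursts, while the shared pool stays fully busy.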

16

Tomorrow’s Datacenter OS

Resource sharing:

» Lower-level interfaces for fine-grained sharing – Mesos and Hadoop YARN are first steps in this direction
» Optimization for a variety of metrics (e.g., energy)
» Integration with network scheduling mechanisms (e.g., Seawall [NSDI ’11], NOX, Orchestra)
» Others: Azure Fabric Controller

17

Tomorrow’s Datacenter OS

Persistent data sharing – many design issues addressed

» Placement/Locality
» Reliability
» Availability
» Consistency
» Bandwidth/Latency
» Software versioning

19

Tomorrow’s Datacenter OS

Persistent data sharing:

» Standard interfaces for cluster file systems, key-value stores, etc.
» Lineage instead of replication for reliability (Spark RDDs)
» Application frameworks self-manage versioning

Many possibilities:

» Amazon Elastic Block Store and S3
» HDFS
» Azure storage services
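
The "lineage instead of replication" idea can be sketched in a few lines. This is a toy model, not the Spark API: the `Dataset` class and its methods are hypothetical, and the base data is assumed to be re-readable from stable storage. Each derived dataset remembers its parent and the transformation that produced it, so a lost in-memory copy is rebuilt by recomputation rather than restored from a replica.

```python
class Dataset:
    """A toy dataset that remembers how it was derived (its lineage)."""

    def __init__(self, source=None, parent=None, transform=None):
        self._source = source        # base data, assumed re-readable
        self._parent = parent        # dataset this one was derived from
        self._transform = transform  # function applied to the parent's rows
        self._cache = None           # in-memory copy; may be lost on failure

    def map(self, fn):
        return Dataset(parent=self,
                       transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return Dataset(parent=self,
                       transform=lambda rows: [r for r in rows if pred(r)])

    def collect(self):
        # Recompute from lineage whenever the cached copy is missing.
        if self._cache is None:
            if self._parent is None:
                self._cache = list(self._source)
            else:
                self._cache = self._transform(self._parent.collect())
        return list(self._cache)

    def lose_cache(self):
        """Simulate a node failure that wipes the in-memory copy."""
        self._cache = None


base = Dataset(source=[1, 2, 3, 4, 5, 6])
even_squares = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

before = even_squares.collect()
even_squares.lose_cache()          # "failure": the cached result is gone
after = even_squares.collect()     # rebuilt by replaying the lineage
print(before, after)
```

The trade-off in this sketch is recomputation time after a failure in exchange for not storing extra copies of every intermediate result.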

20

Tomorrow’s Datacenter OS

Transient data sharing – many design issues addressed

» Failures on either side
» Consistency
» Timeliness

21

Tomorrow’s Datacenter OS

Transient data sharing:

» In-memory data sharing (e.g. Spark, DFS cache), and a unified system to manage this memory: a DFS cache for a MapReduce cluster could serve 90% of jobs at Facebook (HotOS ’11)
» Streaming data abstractions (analogous to pipes)

Many possibilities:

» Amazon/Azure message queues
» Percolator
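
As a single-machine analogy for "streaming data abstractions (analogous to pipes)", the sketch below chains Python generators the way a shell pipeline chains processes. The stage names and the log format are hypothetical. A datacenter version would also have to handle the issues listed earlier (failures on either side, consistency, timeliness), which this toy ignores.

```python
def read_events(lines):
    """Source stage: emit one cleaned-up record at a time."""
    for line in lines:
        yield line.strip()


def only_errors(events):
    """Filter stage: pass through records that mention ERROR."""
    for event in events:
        if "ERROR" in event:
            yield event


def count_by_host(events):
    """Sink stage: consume the stream and aggregate per host."""
    counts = {}
    for event in events:
        host = event.split()[0]
        counts[host] = counts.get(host, 0) + 1
    return counts


# Hypothetical log lines in "<host> <level> <message>" form.
log = [
    "web1 ERROR timeout\n",
    "web2 INFO ok\n",
    "web1 ERROR disk\n",
    "db1 ERROR conn\n",
]

# Roughly: read_events | only_errors | count_by_host
result = count_by_host(only_errors(read_events(log)))
print(result)
```

Each stage pulls records lazily from the one before it, so no stage needs the whole data set in memory at once.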

22

Tomorrow’s Datacenter OS

Programming abstractions:

» Many new distributed application programming models, abstractions, and languages
» Tools for programming distributed coordination and fault tolerance (e.g., Apache ZooKeeper)
» New tools that can be used to build the next MapReduce/BigTable in a week (e.g., BOOM)
» Efficient implementations of communication primitives (e.g. shuffle, broadcast)

24

Tomorrow’s Datacenter OS

Debugging and Monitoring facilities:

» Tracing and debugging tools that work across the cluster software stack (e.g. X-Trace, Dapper, Magpie, Hystrix)
» Replay debugging that takes advantage of limited languages / computational models
» Unified monitoring infrastructure and APIs (e.g., Hystrix)
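
One common ingredient of cross-stack tracing is a request identifier that every layer passes along and logs. The sketch below shows only that general idea; it is not the design of X-Trace, Dapper, Magpie, or Hystrix. The layer names, the `TRACE_LOG` list standing in for a cluster-wide collector, and all function names are hypothetical.

```python
import uuid

TRACE_LOG = []  # stand-in for a cluster-wide trace collector


def record(trace_id, layer, event):
    """Append one trace event tagged with the request's trace ID."""
    TRACE_LOG.append({"trace_id": trace_id, "layer": layer, "event": event})


def storage_read(trace_id, key):
    record(trace_id, "storage", f"read {key}")
    return f"value-of-{key}"


def compute_task(trace_id, key):
    record(trace_id, "compute", f"task on {key}")
    return storage_read(trace_id, key).upper()


def handle_request(key):
    # The ID is created once at the entry point and propagated downward.
    trace_id = uuid.uuid4().hex
    record(trace_id, "frontend", f"request {key}")
    return trace_id, compute_task(trace_id, key)


def events_for(trace_id):
    """Reassemble one request's path across all layers."""
    return [(e["layer"], e["event"])
            for e in TRACE_LOG if e["trace_id"] == trace_id]


tid, value = handle_request("user42")
print(value)
for layer, event in events_for(tid):
    print(layer, "->", event)
```

Because every layer logs against the same ID, one query over the collected events reconstructs a request's path without logging into individual nodes.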

26

Putting it Together

A successful datacenter OS might let users:

» Build a Hadoop-like software stack in a week using the OS’s APIs, while gaining other benefits (e.g. cross-stack replay debugging)
» Share data efficiently between independently written apps and programming frameworks
» Understand cluster behavior without having to log into individual nodes
» Dynamically share the cluster with other users

27

How Researchers can Help

Focus on paradigms, not performance

» Industry is tackling performance but lacks the luxury of taking a long-term view of abstractions

Explore clean-slate approaches

» Likelier to have greater impact here than in a “real” OS because datacenter software changes quickly!

Bring cluster computing to non-experts

» Most impactful (datacenter as the new workstation)
» Much harder and more rewarding than serving big users

28

Berkeley Data Analytics Stack

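[Diagram: stack components]

» On top of Spark: Shark and BlinkDB (SQL), Spark Streaming, GraphX, MLbase
» Processing: Apache Spark
» Storage: HDFS / Hadoop Storage / Tachyon
» Resource management: Apache Mesos / YARN Resource Manager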

29

Apache Mesos – Cluster Operating System

Efficiently shares resources among diverse parallel applications

[Diagram: a Mesos master and three Mesos slaves; Hadoop, Dryad, and MPI schedulers; Hadoop, Dryad, and MPI executors running tasks on the slaves]

[Chart: share of cluster (0%–100%) versus time (1–331 s) for MPI, Hadoop, and Spark]
30

Machines

Make datacenter a real computer!

[Diagram built up over four slides, listed bottom to top]

» Node OS on each machine (e.g. Linux, Windows)
» Datacenter “OS” (e.g., Apache Mesos): share datacenter between multiple cluster computing apps; provide new abstractions and services
» Existing stack: Hadoop, MPI, Hypertable, Cassandra, Hive (support existing cluster computing apps)
» AMP stack: Spark (support interactive and iterative data analysis, e.g., ML algorithms), SCADS (consistency-adjustable data store), PIQL (predictive & insightful query language)
» Applications, tools: advanced ML algorithms, interactive data mining, collaborative visualization

34

Milestones

2010: Mesos entered Apache Incubator

2010: Spark open sourced

2012: Shark (SQL) open sourced

Feb 2013: Spark Streaming alpha open sourced

Mar 2013: Tachyon alpha open sourced

Jun 2013: Spark entered Apache Incubator

Aug 2013: Machine Learning library for Spark

35

BDAS Users (partial list)

36

37

BDAS Buzz

Big Data Landscape – Our Corner

38

MLbase Meet Up at Twitter (13 Aug 2013)

39

BDAS Contributors

70 public contributors on GitHub

» US, China, India, UK, Canada, Vietnam
» Startups and large multinationals: Intel, Yahoo, Ooyala, Quantifind, ClearStory, Palantir, Foursquare, Groupon


40

Researchers Using BDAS

UC Berkeley

IBM Almaden


Cornell

Duke

Tsinghua

Purdue





41

What is fueling the traction?

Superior technologies ☺

» Fast and expressive
» It works!

Integration with existing Hadoop ecosystem

» HDFS
» HBase
» Hive

42

BDAS Future Directions

Future data analytics need to support

» Fast SQL
» Approximate queries
» Machine learning
» GraphX
» Streaming
» Crowdsourcing!!!

Mix and match all of the above

http://ampcamp.berkeley.edu/3/


43

Conclusion

Datacenters need an OS-like software stack for the same reasons as single computers: manageability, efficiency, programmability, and a thriving software ecosystem

Multiple datacenter OSes are already emerging in ad hoc ways

Researchers can help by taking a long-term systems view of these problems

44
