Harold Lim, Fei Dong, Shivnath Babu


Presented by Carl Erhard & Zahid Mian

Authors:
Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu

Duke University

Analysis in the Big Data Era

9/26/2011


Massive Data → Data Analysis → Insight

Key to Success = Timely and Cost-Effective Analysis

Starfish

We want a MAD System


Magnetism
“Attracts” or welcomes all sources of data, regardless of structure, values, etc.

Agility
Adaptive; remains in sync with rapid data evolution and modification

Depth
More than just typical analytics: we need to support complex operations like statistical analysis and machine learning

No wait…I mean MADDER


Data-lifecycle Awareness
Do more than just queries; optimize the movement, storage, and processing of big data

Elasticity
Dynamically adjust resource usage and operational costs based on workload and user requirements

Robustness
Provide storage and querying services even in the event of some failures

Practitioners of Big Data Analytics


Who are the users?
  Data analysts, statisticians, computational scientists…
  Researchers, developers, testers…
  Business analysts…
  You!

Who performs setup and tuning?
  The users!
  Usually lack expertise to tune the system


Motivation


Tuning Challenges


Heavy use of programming languages for MapReduce programs (e.g., Java/Python)
Data loaded/accessed as opaque files
Large space of tuning choices (over 190 parameters!)
Elasticity is wonderful, but hard to achieve (Hadoop has many useful mechanisms, but policies are lacking)
Terabyte-scale data cycles



Our goal: Provide good performance automatically

Starfish: Self-tuning System


Figure: the Hadoop analytics system (Distributed File System; MapReduce Execution Engine; programs in Java / C++ / R / Python; Oozie, Hive, Pig, Elastic MapReduce, Jaql, HBase) with Starfish added


What are the Tuning Problems?


Job-level MapReduce configuration
Workload management
Data layout tuning
Cluster sizing
Workflow optimization

Figure: an example workflow of jobs J1, J2, J3, J4


Starfish’s Core Approach to Tuning


1) if Δ(conf. parameters) then what …?
2) if Δ(data properties) then what …?
3) if Δ(cluster properties) then what …?


Profiler: Collects concise summaries of execution
What-if Engine: Estimates impact of hypothetical changes on execution
Optimizers: Search through space of tuning choices (Job, Workflow, Workload, Data layout, Cluster)


Starfish Architecture


Job Level Tuning


Just-in-Time Optimizer: Automatically selects efficient execution techniques for MapReduce jobs.
Profiler: A Starfish component that collects detailed summaries of jobs on a task-by-task basis.
Sampler: Collects statistics about the input, intermediate, and output data of a MapReduce job.


MapReduce Job Execution


Figure: four input splits (split 0 to split 3), each processed by a map task; two reduce tasks write out 0 and out 1

job j = <program p, data d, resources r, configuration c>


What Controls MR Job Execution?


Space of configuration choices:
  Number of map tasks
  Number of reduce tasks
  Partitioning of map outputs to reduce tasks
  Memory allocation to task-level buffers
  Whether output data from tasks should be compressed
  Whether the combine function should be used
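A minimal Python sketch of part of the configuration c in j = <p, d, r, c>, to show how quickly even a coarse grid grows. The class, field names (`JobConfig`, `sort_buffer_mb`, etc.), and candidate values are hypothetical and chosen only to mirror the list above; they are not Hadoop parameter names or Starfish's actual search space. The number of map tasks is left out because it usually follows from the input split size.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class JobConfig:
    """Hypothetical subset of the configuration c in j = <p, d, r, c>."""
    num_reduce_tasks: int
    sort_buffer_mb: int          # memory for the map-side output buffer
    compress_map_output: bool
    use_combiner: bool


def enumerate_grid(reduce_counts, buffer_sizes):
    """Yield every combination of the candidate values (a full grid)."""
    for reducers, buf, compress, combine in product(
            reduce_counts, buffer_sizes, (False, True), (False, True)):
        yield JobConfig(reducers, buf, compress, combine)


grid = list(enumerate_grid([1, 4, 8, 16, 32], [50, 100, 200, 400]))
print(len(grid))  # 5 * 4 * 2 * 2 = 80 candidate configurations
```

Four knobs with a handful of values already give 80 candidates; with over 190 parameters, trying each setting by actually running the job is not practical, which motivates estimating performance instead of measuring it.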


Effect of Configuration Settings


Use defaults or set manually (rules-of-thumb)
Rules-of-thumb may not suffice


Figure: two-dimensional projection of a multi-dimensional surface (Word Co-occurrence MapReduce Program), with the rules-of-thumb settings marked


MapReduce Job Tuning in a Nutshell


Goal:
$\mathit{perf} = F(p, d, r, c)$
$c_{opt} = \arg\min_{c \in S} F(p, d, r, c)$

Challenges:
p is an arbitrary MapReduce program; c is high-dimensional; …

Profiler: Runs p to collect a job profile (concise execution summary) of <p, d1, r1, c1>
What-if Engine: Given the profile of <p, d1, r1, c1>, estimates a virtual profile for <p, d2, r2, c2>
Optimizer: Enumerates and searches through the optimization space S efficiently
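The division of labor between Profiler, What-if Engine, and Optimizer can be illustrated with a toy Python sketch of the goal c_opt = argmin over S of F(p, d, r, c). Here `what_if` is a hypothetical, deliberately crude cost model (task waves times per-task time) that plays the role of F using only a measured profile; it is not Starfish's What-if Engine, and only the number of reduce tasks is varied. All names and numbers are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical summary collected by running p once on <p, d1, r1, c1>."""
    num_map_tasks: int
    map_task_secs: float         # average map task time
    map_output_bytes: float      # total intermediate data
    reduce_secs_per_byte: float  # average reduce-side cost per byte
    task_startup_secs: float     # fixed per-task overhead


@dataclass(frozen=True)
class Cluster:
    map_slots: int
    reduce_slots: int


def what_if(profile, cluster, num_reduce_tasks):
    """Toy estimate of runtime (seconds) for a hypothetical reducer count."""
    map_waves = math.ceil(profile.num_map_tasks / cluster.map_slots)
    map_phase = map_waves * (profile.map_task_secs + profile.task_startup_secs)
    per_reducer = profile.map_output_bytes / num_reduce_tasks
    reduce_task = per_reducer * profile.reduce_secs_per_byte + profile.task_startup_secs
    reduce_waves = math.ceil(num_reduce_tasks / cluster.reduce_slots)
    return map_phase + reduce_waves * reduce_task


def optimize(profile, cluster, space):
    """Return the candidate in `space` with the lowest estimated runtime."""
    return min(space, key=lambda reducers: what_if(profile, cluster, reducers))


prof = Profile(num_map_tasks=40, map_task_secs=30.0, map_output_bytes=8e9,
               reduce_secs_per_byte=1e-7, task_startup_secs=5.0)
cluster = Cluster(map_slots=20, reduce_slots=10)
best = optimize(prof, cluster, [1, 5, 10, 20, 40])
print(best, round(what_if(prof, cluster, best)))  # 10 155
```

In this toy model, 20 reducers add a second reduce wave and 40 add four, so the search settles on 10, exactly one full wave of reduce slots. A realistic F has to capture far more interactions, which is why the What-if Engine works from detailed per-phase profiles.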


Job Profile


Concise representation of program execution as a job
Records information at the level of “task phases”
Generated by the Profiler through measurement or by the What-if Engine through estimation



Figure: phases of a map task: Read (input split from the DFS), Map, Collect (serialize, partition into the memory buffer), Spill (sort, [combine], [compress]), Merge


Job Profile Fields

Dataflow: amount of data flowing through task phases
  Map output bytes
  Number of spills
  Number of records in buffer per spill

Costs: execution times at the level of task phases
  Read phase time in the map task
  Map phase time in the map task
  Spill phase time in the map task

Dataflow Statistics: statistical information about dataflow
  Width of input key-value pairs
  Map selectivity in terms of records
  Map output compression ratio

Cost Statistics: statistical information about resource costs
  I/O cost for reading from local disk per byte
  CPU cost for executing the Mapper per record
  CPU cost for uncompressing the input per byte
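To see how the field groups relate, here is a toy Python sketch in which Dataflow fields are derived from Dataflow Statistics. The class, field names, and formulas are illustrative assumptions (for example, that map output records equal input records times record selectivity), not Starfish's exact profile schema or models.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DataflowStats:
    """Hypothetical subset of the 'Dataflow Statistics' fields."""
    input_pair_width: float           # bytes per input key-value pair
    map_record_selectivity: float     # output records / input records
    output_pair_width: float          # bytes per output key-value pair
    map_output_compress_ratio: float  # compressed size / uncompressed size


def derive_map_dataflow(input_bytes, stats, records_per_spill):
    """Derive 'Dataflow' fields (amounts of data) from the statistics."""
    input_records = input_bytes / stats.input_pair_width
    output_records = input_records * stats.map_record_selectivity
    output_bytes = output_records * stats.output_pair_width
    return {
        "map_output_records": output_records,
        "map_output_bytes": output_bytes,
        "map_output_bytes_compressed": output_bytes * stats.map_output_compress_ratio,
        "num_spills": math.ceil(output_records / records_per_spill),
    }


stats = DataflowStats(input_pair_width=100, map_record_selectivity=2.5,
                      output_pair_width=20, map_output_compress_ratio=0.4)
flow = derive_map_dataflow(64_000_000, stats, records_per_spill=500_000)
print(flow["map_output_records"], flow["num_spills"])  # 1600000.0 4
```

Cost fields could be derived the same way from Cost Statistics, e.g., a map phase time of input records times the CPU cost per record. In a model like this, keeping per-record and per-byte statistics separate from measured totals is what makes it possible to rescale a profile to new data or a new configuration.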


Generating Profiles by Measurement


Goals
  Have zero overhead when profiling is turned off
  Require no modifications to Hadoop
  Support unmodified MapReduce programs written in Java or Hadoop Streaming/Pipes (Python/Ruby/C++)

Approach: Dynamic (on-demand) instrumentation
  Event-condition-action rules are specified (in Java)
  Leads to run-time instrumentation of Hadoop internals
  Monitors task phases of MapReduce job execution
  We currently use BTrace (Hadoop internals are in Java)
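The event-condition-action idea can be sketched in a few lines of Python. This is a hypothetical illustration of the pattern only: Starfish's rules are written in Java and applied through BTrace to Hadoop's Java internals, and the event name and context fields below (`spill_end`, `task_type`, `duration`) are made up.

```python
from collections import defaultdict


class EcaProfiler:
    """Toy event-condition-action (ECA) rule engine."""

    def __init__(self):
        self.enabled = False              # profiling is off by default
        self.rules = defaultdict(list)    # event name -> [(condition, action)]
        self.records = []

    def on(self, event, condition, action):
        """Register a rule: when `event` fires and `condition` holds, run `action`."""
        self.rules[event].append((condition, action))

    def fire(self, event, **ctx):
        if not self.enabled:              # turned off: no rule is evaluated
            return
        for condition, action in self.rules[event]:
            if condition(ctx):
                action(self.records, ctx)


prof = EcaProfiler()
# Rule: when a map task finishes a spill, record how long the spill took
prof.on("spill_end",
        condition=lambda ctx: ctx["task_type"] == "map",
        action=lambda recs, ctx: recs.append(("spill", ctx["duration"])))

prof.fire("spill_end", task_type="map", duration=0.8)     # ignored: disabled
prof.enabled = True
prof.fire("spill_end", task_type="map", duration=1.2)     # recorded
prof.fire("spill_end", task_type="reduce", duration=3.0)  # condition is false
print(prof.records)  # [('spill', 1.2)]
```

The flag check in this sketch still costs something on every event; the approach on the slide avoids that by instrumenting Hadoop only on demand, which the sketch does not model.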


Generating Profiles by Measurement


Figure: with profiling enabled, ECA rules instrument the task JVMs; raw data from the map and reduce tasks is turned into map and reduce profiles, which are combined into the job profile

Use of Sampling
  Profile fewer tasks
  Execute fewer tasks

JVM = Java Virtual Machine, ECA = Event-Condition-Action
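Sampling can be illustrated with a small Python sketch: profile only a random fraction of the tasks and extrapolate to the whole job. The functions are hypothetical, and the extrapolation assumes the sampled tasks are representative of the rest (skewed data would break that assumption).

```python
import random
import statistics


def sample_tasks(num_tasks, fraction, seed=0):
    """Pick a random subset of task ids to profile (at least one task)."""
    k = max(1, round(num_tasks * fraction))
    return sorted(random.Random(seed).sample(range(num_tasks), k))


def estimate_total_task_secs(measured, num_tasks):
    """Extrapolate total task time from the profiled sample."""
    return statistics.mean(measured.values()) * num_tasks


ids = sample_tasks(num_tasks=100, fraction=0.1)
# Stand-in measurements; in practice these would come from the profiled tasks
measured = {i: 30.0 for i in ids}
print(len(ids), estimate_total_task_secs(measured, 100))  # 10 3000.0
```

The same idea covers "execute fewer tasks": running only the sampled tasks yields a profile at a fraction of the cost of running the whole job.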


Results of Job Profiling


Results using Job Profiling


Workflow-Aware Scheduling


Unbalanced Data Layout
  Skewed Data
  Data Layout Not Considered when Scheduling Tasks
  Addition/Dropping of Partitions: No Rebalance
  Can Lead to Failures Due to Space Issues
  Locality-Aware Schedulers Can Make the Problem Worse

Possible Solutions:
  Change # of Replicas
  Collocating Data (Block Placement Policy)


Impact of Unbalanced Data Layout


Impact of Unbalanced Data Layout


Impact of Unbalanced Data Layout


Workflow-Aware Scheduling

Makes Decisions by Considering Producer-Consumer Relationships



Starfish’s Workflow-Aware Scheduler


Space of Choices:
  Block Placement Policy: Round Robin (Local Write is the default)
  Replication Factor
  Size of blocks: generally large for large files
  Compression: Impacts I/O; not always beneficial
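The effect of the block placement policy on data layout can be seen with a toy Python model. It is a hypothetical simplification: it tracks only the first replica of each output block, and assumes Local Write puts that replica on the node that wrote it while Round Robin cycles through all nodes. Node names and block counts are made up.

```python
from itertools import cycle


def place_blocks(writers, blocks_per_writer, nodes, policy):
    """Return {node: number of first-replica blocks} for a producer's output.

    writers: nodes on which the producer's tasks ran and wrote output.
    policy: "local" (Local Write) or "rr" (Round Robin).
    """
    layout = {node: 0 for node in nodes}
    next_rr = cycle(nodes)
    for writer in writers:
        for _ in range(blocks_per_writer):
            target = writer if policy == "local" else next(next_rr)
            layout[target] += 1
    return layout


nodes = ["n1", "n2", "n3", "n4"]
# Job P ran only two reduce tasks, on n1 and n2, each writing 4 blocks
print(place_blocks(["n1", "n2"], 4, nodes, "local"))
# {'n1': 4, 'n2': 4, 'n3': 0, 'n4': 0}
print(place_blocks(["n1", "n2"], 4, nodes, "rr"))
# {'n1': 2, 'n2': 2, 'n3': 2, 'n4': 2}
```

With Local Write, the consumer's input sits on half the cluster; Round Robin spreads it evenly at the price of sending more of the producer's output over the network while writing.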


Starfish’s Workflow-Aware Scheduler


What-If Questions
  A) Expected runtime of Job P if the RR block placement policy is used for P’s output files?
  B) New data layout in the cluster if the RR block placement policy is used for P’s output files?
  C) Expected runtime of Job C1 (C2) if its input data layout is the one in the answer to the previous question?
  D) Expected runtimes of Jobs C1 and C2 if scheduled concurrently when Job P completes?
  E) Given the Local Write block placement policy and RF = 1 for Job P’s output, what is the expected increase in the runtime of Job C1 if one node in the cluster fails during C1’s execution?
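Questions like B and C can be answered, very roughly, with a toy estimate in Python. This is a hypothetical model, not Starfish's: it assumes the consumer's map tasks read only local blocks, each node runs `slots_per_node` tasks at a time, and every task takes `task_secs`, so the busiest node sets the finish time. The two layouts are ones that Local Write and Round Robin placement might produce for 16 blocks on four nodes.

```python
import math


def estimate_local_runtime(layout, slots_per_node, task_secs):
    """Toy what-if estimate: finish time of the busiest node, in seconds."""
    waves = max(math.ceil(blocks / slots_per_node) for blocks in layout.values())
    return waves * task_secs


local_write = {"n1": 8, "n2": 8, "n3": 0, "n4": 0}
round_robin = {"n1": 4, "n2": 4, "n3": 4, "n4": 4}
print(estimate_local_runtime(local_write, slots_per_node=2, task_secs=60))  # 240
print(estimate_local_runtime(round_robin, slots_per_node=2, task_secs=60))  # 120
```

The same layout hints at question E: with RF = 1 no block has a second copy, so under the Local Write layout above a failure of n1 would remove 8 of C1's 16 input blocks, which in this model would have to be recomputed before C1 could finish.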


Estimates from the What-if Engine


Hadoop cluster: 16 nodes, c1.medium
MapReduce Program: Word Co-occurrence
Data set: 10 GB Wikipedia
Figure panels: True surface; Estimated surface


Workflow Scheduler Picks Layout


Optimizations: Workload Optimizer


Provisioning: Elastisizer


Motivation: Amazon Elastic MapReduce
  Data in S3, processed in-cluster, stored to S3
  User pays for resources used

Elastisizer determines …
  Best cluster
  Hadoop configurations
… based on user-specified goals (execution time and costs)
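The Elastisizer's multi-objective choice can be sketched in Python as two selection rules over candidate clusters: the cheapest one that meets a deadline, or the fastest one within a budget. Instance names, prices, predicted times, and the linear cost formula below are made-up placeholders (not real EC2 prices, billing rules, or Elastisizer output); in Starfish the predicted times would come from what-if estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    instance_type: str            # placeholder name, not a real EC2 type
    num_nodes: int
    predicted_hours: float        # e.g., from a what-if estimate
    price_per_node_hour: float    # illustrative price

    def cost(self):
        # Simplifying assumption: cost grows linearly with nodes and hours
        return self.num_nodes * self.price_per_node_hour * self.predicted_hours


def cheapest_within_deadline(candidates, deadline_hours):
    feasible = [c for c in candidates if c.predicted_hours <= deadline_hours]
    return min(feasible, key=Candidate.cost) if feasible else None


def fastest_within_budget(candidates, budget):
    feasible = [c for c in candidates if c.cost() <= budget]
    return min(feasible, key=lambda c: c.predicted_hours) if feasible else None


options = [
    Candidate("type-a", 10, 2.0, 0.10),   # cost 2.0
    Candidate("type-a", 20, 1.1, 0.10),   # cost 2.2
    Candidate("type-b", 10, 0.9, 0.40),   # cost 3.6
]
print(cheapest_within_deadline(options, deadline_hours=1.5).num_nodes)  # 20
print(fastest_within_budget(options, budget=2.1).num_nodes)             # 10
```

Which rule applies depends on the user-specified goal; both return `None` when no candidate satisfies the constraint.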


Elastisizer

Configuration Evaluation


Elastisizer

Configuration Evaluation


Elastisizer: Cluster Configurations


Multi-objective Cluster Provisioning


Figure: actual vs. predicted running time (min) and cost ($) for target clusters of EC2 instance types m1.small, m1.large, m1.xlarge, c1.medium, and c1.xlarge (instance type for the source cluster: m1.large)


Critique of Paper


Good
  Source available for implementation
  Able to see the impact of various settings
  Good visualization tools
  Tutorials/source available at duke.edu/starfish

Bad
  The paper (and subsequent materials) talk a lot about what Starfish does, but not necessarily how it does it
  There is no documentation on LastWord, and this seems important
  Only works after the job/workflow has been executed at least once


Starfish’s Visualizer


Timeline Views
  Shows progress of a job execution at the task level
  See execution of the same job with different settings

Data-flow Views
  View of the flow of data among nodes, along with MR jobs
  “Video Mode” allows playback of past executions

Profile Views
  Timings, data-flow, resource-level




Timeline Views


Timeline View


Data Skew View


Data Skew View


Data Skew View


Data-flow Views


References


Herodotou, Herodotos, et al. "Starfish: A self-tuning system for big data analytics." Proc. of the Fifth CIDR Conf. 2011.

Dong, Fei. Extending Starfish to Support the Growing Hadoop Ecosystem. Diss. Duke University, 2012.

Herodotou, Herodotos, Fei Dong, and Shivnath Babu. "MapReduce programming and cost-based optimization? Crossing this chasm with Starfish." Proceedings of the VLDB Endowment 4.12 (2011).

http://www.cs.duke.edu/starfish/

http://www.youtube.com/watch?v=Upxe2dzE1uk


Backup


Hadoop MapReduce Ecosystem


Popular solution to Big Data Analytics



Figure: the Hadoop MapReduce ecosystem (Distributed File System; MapReduce Execution Engine; programs in Java / C++ / R / Python; Oozie, Hive, Pig, Elastic MapReduce, Jaql, HBase)


Workflow-level Tuning


Starfish has a Workflow-aware Scheduler that addresses several concerns:
  How do we distribute computation equally across nodes?
  How do we adapt to imbalances in load or energy cost?

The Workflow-aware Scheduler works with the What-if Engine and the Data Manager to answer these questions


Workload-level Tuning


Starfish’s Workload Optimizer is aware of each workflow that will be executed. It reorders the workflows to make them more efficient.
This includes reusing data across workflows that use the same MapReduce jobs.
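The data-reuse idea can be illustrated with a toy Python sketch that runs each distinct job only once across several workflows. Treating a job as a (program, input) pair, and assuming identical pairs produce identical output, are simplifying assumptions; the function and job names are hypothetical, and this is not the Workload Optimizer's algorithm.

```python
def plan_shared_jobs(workflows):
    """Return (distinct jobs in first-use order, jobs used by 2+ workflows).

    workflows: {workflow name: ordered list of (program, input) job signatures}
    """
    order, users = [], {}
    for name, jobs in workflows.items():
        for job in jobs:
            if job not in users:
                users[job] = set()
                order.append(job)
            users[job].add(name)
    shared = {job for job, names in users.items() if len(names) > 1}
    return order, shared


workflows = {
    "w1": [("filter", "logs"), ("count", "filtered")],
    "w2": [("filter", "logs"), ("join", "filtered")],
}
order, shared = plan_shared_jobs(workflows)
print(len(order), shared)  # 3 {('filter', 'logs')}
```

Four job executions collapse to three because both workflows start with the same filter over the same input, whose output can be produced once and reused.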


What-if Engine
Figure: the What-if Engine takes the job profile for <p, d1, r1, c1> plus the properties of a hypothetical job (input data properties <d2>, cluster resources <r2>, configuration settings <c2>, each possibly hypothetical). Its Job Oracle produces the virtual job profile for <p, d2, r2, c2>; the engine also contains a Task Scheduler Simulator.


Virtual Profile Estimation


Given the profile for job j = <p, d1, r1, c1>, estimate the profile for job j' = <p, d2, r2, c2>

Figure: the (virtual) profile for j' (Dataflow Statistics, Dataflow, Cost Statistics, Costs) is estimated from the profile for j, the input data d2, the configuration c2, and the resources r2, using Cardinality Models (dataflow statistics), Relative Black-box Models (cost statistics), and White-box Models (dataflow and costs)
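A toy Python version of virtual profile estimation, under loudly simplified assumptions that are not Starfish's actual models: dataflow statistics carry over from j unchanged (a trivial stand-in for cardinality models, as if d2 had the same distribution as d1), cost statistics are scaled by one relative factor between r1 and r2 (a stand-in for relative black-box models), and dataflow and costs are recomputed with simple white-box formulas for the new input size and a spill buffer size taken from c2. All names are hypothetical.

```python
import math


def estimate_virtual_profile(profile, d2_input_bytes, r2_cost_ratio,
                             c2_spill_buffer_bytes):
    """Toy estimate of a virtual profile for j' = <p, d2, r2, c2>."""
    # Dataflow statistics: assumed unchanged from j (same data distribution)
    stats = dict(profile["dataflow_stats"])
    # Cost statistics: scaled by one factor relating r2 to r1
    cost_stats = {k: v * r2_cost_ratio for k, v in profile["cost_stats"].items()}
    # Dataflow: white-box formulas for the new input size and configuration
    out_bytes = d2_input_bytes * stats["map_size_selectivity"]
    dataflow = {
        "map_input_bytes": d2_input_bytes,
        "map_output_bytes": out_bytes,
        "num_spills": math.ceil(out_bytes / c2_spill_buffer_bytes),
    }
    # Costs: white-box formulas combining dataflow and cost statistics
    costs = {
        "read_secs": d2_input_bytes * cost_stats["read_secs_per_byte"],
        "map_secs": d2_input_bytes * cost_stats["map_cpu_secs_per_byte"],
    }
    return {"dataflow_stats": stats, "cost_stats": cost_stats,
            "dataflow": dataflow, "costs": costs}


profile_j = {
    "dataflow_stats": {"map_size_selectivity": 0.5},
    "cost_stats": {"read_secs_per_byte": 1e-8, "map_cpu_secs_per_byte": 2e-8},
}
# j' reads 10 GB on a cluster assumed to be twice as fast, with a 100 MB buffer
virtual = estimate_virtual_profile(profile_j, 1e10, 0.5, 1e8)
print(virtual["dataflow"]["num_spills"], round(virtual["costs"]["read_secs"]))  # 50 50
```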


Job Optimizer


Figure: the Just-in-Time Optimizer takes the job profile for <p, d1, r1, c1>, input data properties <d2>, and cluster resources <r2>; using Subspace Enumeration and Recursive Random Search, and making What-if calls, it returns the best configuration settings <c_opt> for <p, d2, r2>
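Recursive Random Search can be sketched in Python as follows. This is a simplified version (random exploration, then repeated re-sampling of a shrinking box around the best point so far) that omits parts of the full algorithm such as re-alignment and restarts, and it assumes continuous parameters; integer or boolean Hadoop settings would need rounding. It is not Starfish's implementation. In the Job Optimizer, each evaluation of `f` would be a What-if call rather than a real run; `runtime_estimate` below is a hypothetical stand-in.

```python
import random


def recursive_random_search(f, bounds, n_explore=20, n_exploit=10,
                            shrink=0.5, rounds=5, seed=0):
    """Simplified Recursive Random Search: minimize f over a box.

    bounds: list of (low, high) pairs, one per parameter.
    """
    rng = random.Random(seed)

    def sample(box):
        return [rng.uniform(lo, hi) for lo, hi in box]

    # Exploration: uniform random samples over the whole space
    best = min((sample(bounds) for _ in range(n_explore)), key=f)
    best_val = f(best)
    widths = [hi - lo for lo, hi in bounds]
    # Exploitation: re-sample a shrinking box around the best point so far
    for _ in range(rounds):
        widths = [w * shrink for w in widths]
        box = [(max(lo, x - w / 2), min(hi, x + w / 2))
               for (lo, hi), x, w in zip(bounds, best, widths)]
        for _ in range(n_exploit):
            cand = sample(box)
            val = f(cand)
            if val < best_val:
                best, best_val = cand, val
    return best, best_val


def runtime_estimate(x):
    # Hypothetical smooth stand-in for a what-if runtime estimate
    return (x[0] - 3) ** 2 + (x[1] + 1) ** 2


best, best_val = recursive_random_search(runtime_estimate, [(-10, 10), (-10, 10)])
print(best, best_val)
```

Because each evaluation is a cheap estimate instead of a job execution, the search can afford many more samples than trial runs would allow.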
