Applying Twister for Scientific Applications

NSF Cloud PI Workshop

March 17, 2011



Judy Qiu

http://salsahpc.indiana.edu


School of Informatics and Computing

Indiana University

Twister v0.9

March 15, 2011

New Infrastructure for Iterative MapReduce Programming


SALSA Group


Bingjing Zhang, Yang Ruan, Tak-Lon Wu, Judy Qiu, Adam Hughes, Geoffrey Fox, Applying Twister to Scientific Applications, Proceedings of IEEE CloudCom 2010 Conference, Indianapolis, November 30-December 3, 2010



Auto generation of partition files and configureMaps

Auto configuration to set up the Twister environment automatically on a cluster

Concurrent file loading in the Mapper configuration phase, and file loading balancing

Performance improvement (e.g. JVM tuning)

Scalability


Iteratively refining operation


Typical MapReduce runtimes incur extremely high overheads


New maps/reducers/vertices in every iteration


File system based communication


Long-running tasks and faster communication in Twister enable it to perform close to MPI



K-Means Clustering

[Figure: time for 20 iterations]

map: Compute the distance to each data point from each cluster center and assign points to cluster centers

reduce: Compute new cluster centers

User program: Compute new cluster centers
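The K-Means data flow described on this slide can be sketched in a few lines of Python. This is a minimal in-process illustration, not Twister code: the function names (`kmeans_map`, `kmeans_reduce`, `kmeans`) and the representation of partitions as plain lists are assumptions made for the example. The point it shows is that the data partitions stay fixed across iterations while only the small set of cluster centers is re-sent each round.

```python
import math


def kmeans_map(points, centers):
    """Map task: assign each cached point to its nearest center and
    emit, per center index, a partial (coordinate sum, count)."""
    partial = {}
    for p in points:
        k = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        s, count = partial.get(k, ([0.0] * len(p), 0))
        partial[k] = ([a + b for a, b in zip(s, p)], count + 1)
    return partial


def kmeans_reduce(partials, dim):
    """Reduce task: merge the partial sums emitted by all map tasks."""
    merged = {}
    for partial in partials:
        for k, (s, count) in partial.items():
            ms, mc = merged.get(k, ([0.0] * dim, 0))
            merged[k] = ([a + b for a, b in zip(ms, s)], mc + count)
    return merged


def kmeans(partitions, centers, iterations=20, tol=1e-9):
    """User program: iterate map and reduce, then compute new centers.
    `partitions` (static data) never changes; `centers` (variable data)
    is the only thing handed to the map tasks on each iteration."""
    dim = len(centers[0])
    for _ in range(iterations):
        partials = [kmeans_map(part, centers) for part in partitions]
        merged = kmeans_reduce(partials, dim)
        new_centers = [
            tuple(v / merged[k][1] for v in merged[k][0]) if k in merged else centers[k]
            for k in range(len(centers))
        ]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop early once the centers no longer move
            break
    return centers
```

Calling `kmeans([[(0, 0), (0, 1)], [(10, 10), (10, 11)]], [(0, 0), (10, 10)])` runs two map tasks, one per partition, and returns the two cluster centers. A center that receives no points in an iteration is simply kept unchanged, which is one of several possible conventions.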


Motivation

Data Deluge: experienced in many domains

MapReduce: data centered, QoS

Classic Parallel Runtimes (MPI): efficient and proven techniques

Expand the Applicability of MapReduce to more classes of Applications

[Figure: four classes of application. Map-Only (Input -> map -> Output); MapReduce (Input -> map -> reduce); Iterative MapReduce (Input -> map -> reduce, with iterations); More Extensions (Pij).]


A Programming Model for Iterative MapReduce

Distributed data access

In-memory MapReduce

Distinction on static data and variable data (data flow vs. δ flow)

Cacheable map/reduce tasks (long running tasks)

Combine operation

Support fast intermediate data transfers

[Figure: the User Program calls Configure() to load static data, then iterates over Map(Key, Value), Reduce(Key, List<Value>) and Combine(Map<Key,Value>), passing the δ flow between iterations, and finally calls Close().]

Cloud Technologies and Their Applications

SaaS Applications/Workflow: Data Mining Services in the Cloud (Smith Waterman Dissimilarities, PhyloD Using DryadLINQ, Clustering, Multidimensional Scaling, Generative Topographic Mapping, etc.)

Higher Level Languages: Apache PigLatin / Microsoft DryadLINQ / Google Sawzall

Cloud Platform: Apache Hadoop / Twister, Microsoft Dryad / Twister

Cloud Infrastructure: Nimbus, Eucalyptus, OpenStack, OpenNebula (Linux Virtual Machines, Windows Virtual Machines)

Hypervisor/Virtualization: Xen, KVM

Hardware: Bare-metal Nodes


MPI & Iterative MapReduce papers

MapReduce on MPI

Torsten Hoefler, Andrew Lumsdaine and Jack Dongarra, Towards Efficient MapReduce Using MPI, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, 2009, Volume 5759/2009, 240-249

MPI with generalized MapReduce

Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, Geoffrey Fox, Twister: A Runtime for Iterative MapReduce, Proceedings of the First International Workshop on MapReduce and its Applications of ACM HPDC 2010 conference, Chicago, Illinois, June 20-25, 2010

Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski, Pregel: A System for Large-Scale Graph Processing, Proceedings of the 2010 international conference on Management of data, Indianapolis, Indiana, USA, Pages 135-146, 2010

Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael D. Ernst, HaLoop: Efficient Iterative Data Processing on Large Clusters, Proceedings of the VLDB Endowment, Vol. 3, No. 1, The 36th International Conference on Very Large Data Bases, September 13-17, 2010, Singapore.

Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica, Spark: Cluster Computing with Working Sets, poster at http://radlab.cs.berkeley.edu/w/upload/9/9c/Spark-retreat-poster-s10.pdf

Russell Power, Jinyang Li, Piccolo: Building Fast, Distributed Programs with Partitioned Tables, OSDI 2010, Vancouver, BC, Canada





Features of Existing Architectures

Google, Apache Hadoop, Sector/Sphere, Dryad/DryadLINQ (DAG based)

Programming Model (SPMD)

o MapReduce (optionally “map-only”)
o Focus on single-step MapReduce computations (DryadLINQ supports more than one stage)

Input and Output Handling

o Distributed data access (HDFS in Hadoop, Sector in Sphere, and shared directories in Dryad)
o Outputs normally go to the distributed file systems

Intermediate data

o Transferred via file systems (local disk -> HTTP -> local disk in Hadoop)
o Easy to support fault tolerance
o Considerably high latencies

Fault Tolerance



Twister Architecture

Scripts for file manipulations

Twister daemon is a process, but Map/Reduce tasks are Java Threads (Hybrid approach)

[Figure: the main program and the Twister Driver connect through a broker network (B, broker connection) to a Twister Daemon on each worker node; each daemon runs map (M) and reduce (R) tasks and has a local disk.]

Read static data from local disk (1)

Receive static data (1) OR variable data (key,value) via the brokers (2)

Map output goes directly to reducer

Reduce output goes to local disk OR to Combiner


Twister Programming Model

User program’s process space:

configureMaps(..)
configureReduce(..)
while(condition){
    runMapReduce(..)
    updateCondition()
} //end while
close()

Two configuration options:
1. Using local disks (only for maps)
2. Using pub-sub bus

Worker Nodes: cacheable map/reduce tasks running Map() and Reduce(), with access to Local Disk

Combine() operation

May send <Key,Value> pairs directly

Communications/data transfers via the pub-sub broker network

Iterations


Twister API

1. configureMaps(PartitionFile partitionFile)
2. configureMaps(Value[] values)
3. configureReduce(Value[] values)
4. runMapReduce()
5. runMapReduce(KeyValue[] keyValues)
6. runMapReduceBCast(Value value)
7. map(MapOutputCollector collector, Key key, Value val)
8. reduce(ReduceOutputCollector collector, Key key, List<Value> values)
9. combine(Map<Key, Value> keyValues)
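To make the call sequence concrete, here is a small in-process stand-in written in Python. It is a sketch under stated assumptions, not the Twister API itself: `MockDriver` and its snake_case methods are hypothetical names that only mirror the shape of configureMaps, runMapReduceBCast, combine and close listed above, and the example task (finding the mean of the data by gradient descent) is chosen only because it needs several iterations over the same cached data.

```python
class MockDriver:
    """Hypothetical in-process stand-in for an iterative MapReduce driver."""

    def __init__(self, map_fn, reduce_fn, combine_fn):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self.combine_fn = combine_fn
        self.partitions = None

    def configure_maps(self, partitions):
        # Static data is loaded once and stays cached across iterations.
        self.partitions = [list(p) for p in partitions]

    def run_map_reduce_bcast(self, value):
        # Broadcast the variable data to every cached map task,
        # group the emitted pairs by key, reduce, then combine.
        grouped = {}
        for part in self.partitions:
            for key, val in self.map_fn(part, value):
                grouped.setdefault(key, []).append(val)
        reduced = {k: self.reduce_fn(k, vals) for k, vals in grouped.items()}
        return self.combine_fn(reduced)

    def close(self):
        self.partitions = None


def grad_map(points, m):
    # Partial gradient of sum((m - x)^2) over this partition, plus its size.
    yield ("grad", sum(2 * (m - x) for x in points))
    yield ("count", len(points))


def sum_reduce(key, values):
    return sum(values)


def combine(reduced):
    # Collect the reduce outputs into one value for the user program.
    return reduced["grad"] / reduced["count"]


def fit_mean(partitions, step=0.25, tol=1e-12, max_iter=1000):
    driver = MockDriver(grad_map, sum_reduce, combine)
    driver.configure_maps(partitions)
    m = 0.0
    for _ in range(max_iter):            # while(condition)
        g = driver.run_map_reduce_bcast(m)
        if abs(g) < tol:                 # updateCondition()
            break
        m -= step * g
    driver.close()
    return m
```

`fit_mean([[1.0, 2.0], [3.0, 6.0]])` converges towards the mean of all four values. In a real deployment the map tasks would run on worker nodes and the broadcast would travel over the broker network; here everything runs sequentially in one process to keep the control flow visible.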


Pagerank: An Iterative MapReduce Algorithm

Well-known pagerank algorithm [1]

Used ClueWeb09 [2] (1TB in size) from CMU

Reuse of map tasks and faster communication pays off

[Figure: in each iteration, map tasks (M) take the current page ranks (compressed) and a partial adjacency matrix and emit partial updates; reduce tasks (R) and the combiner (C) produce partially merged updates.]

[1] Pagerank Algorithm, http://en.wikipedia.org/wiki/PageRank

[2] ClueWeb09 Data Set, http://boston.lti.cs.cmu.edu/Data/clueweb09/
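The iteration pictured on this slide can be sketched as follows. This is an illustrative Python outline, not the Twister implementation: the function names, the damping factor of 0.85, and the representation of each partial adjacency matrix as a dict of out-link lists are assumptions. For brevity it also assumes every node has at least one out-link, so dangling nodes are not handled.

```python
def pagerank_map(partition, ranks):
    """Map task: `partition` is a cached slice of the adjacency matrix
    (node -> list of out-links); `ranks` is the current rank vector.
    Emits partial rank updates for the link targets."""
    partial = {}
    for node, links in partition.items():
        share = ranks[node] / len(links)
        for target in links:
            partial[target] = partial.get(target, 0.0) + share
    return partial


def pagerank_reduce(partials):
    """Reduce/combine: merge the partial updates from all map tasks."""
    merged = {}
    for partial in partials:
        for node, value in partial.items():
            merged[node] = merged.get(node, 0.0) + value
    return merged


def pagerank(partitions, num_nodes, damping=0.85, iterations=20):
    """User program: only the rank vector changes between iterations;
    the adjacency partitions are reused as they are."""
    ranks = {n: 1.0 / num_nodes for n in range(num_nodes)}
    for _ in range(iterations):
        partials = [pagerank_map(p, ranks) for p in partitions]
        merged = pagerank_reduce(partials)
        ranks = {
            n: (1.0 - damping) / num_nodes + damping * merged.get(n, 0.0)
            for n in range(num_nodes)
        }
    return ranks
```

For a three-node cycle split over two partitions, `pagerank([{0: [1]}, {1: [2], 2: [0]}], 3)` gives every node the same rank, as expected by symmetry.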

Overhead OpenMPI v Twister

negative overhead due to cache

http://futuregrid.org


Dimension Reduction Algorithms


Multidimensional Scaling (MDS) [1]

o Given the proximity information among points.
o Optimization problem to find a mapping in the target dimension of the given data, based on pairwise proximity information, while minimizing the objective function.
o Objective functions: STRESS (1) or SSTRESS (2)
o Only needs pairwise distances δ_ij between original points (typically not Euclidean)
o d_ij(X) is the Euclidean distance between mapped (3D) points

Generative Topographic Mapping (GTM) [2]

o Find optimal K-representations for the given data (in 3D), known as the K-cluster problem (NP-hard)
o Original algorithm uses the EM method for optimization
o Deterministic Annealing algorithm can be used for finding a global solution
o Objective function is to maximize the log-likelihood

[1] I. Borg and P. J. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, New York, NY, U.S.A., 2005.

[2] C. Bishop, M. Svensén, and C. Williams. GTM: The generative topographic mapping. Neural Computation, 10(1):215-234, 1998.



Multi-dimensional Scaling (EM)

Maps high dimensional data to lower dimensions (typically 2D or 3D)

SMACOF (Scaling by MAjorizing a COmplicated Function) [1]

[1] J. de Leeuw, "Applications of convex analysis to multidimensional scaling," Recent Developments in Statistics, pp. 133-145, 1977.

Sequential form:

While(condition)
{
    <X> = [A] [B] <C>
    C = CalcStress(<X>)
}

Iterative MapReduce form:

While(condition)
{
    <T> = MapReduce1([B],<C>)
    <X> = MapReduce2([A],<T>)
    C = MapReduce3(<X>)
}
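For readers unfamiliar with SMACOF, the loop above can be illustrated with a small pure-Python version of the unweighted algorithm. This is a sketch under assumptions, not the code behind the slides: with all weights equal to 1 the update is X <- (1/N) B(X) X (the Guttman transform), and reading the slide's [A] as the constant 1/N factor and [B] as B(X) is an interpretation. The function names are made up for the example.

```python
import math
import random


def pairwise(X):
    """Euclidean distances d_ij(X) between all mapped points."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]


def stress(X, delta):
    """Unweighted STRESS: sum over i<j of (d_ij(X) - delta_ij)^2."""
    d = pairwise(X)
    n = len(X)
    return sum((d[i][j] - delta[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def guttman_step(X, delta):
    """One SMACOF update with unit weights: X <- (1/N) * B(X) * X."""
    n, dim = len(X), len(X[0])
    d = pairwise(X)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and d[i][j] > 0:
                B[i][j] = -delta[i][j] / d[i][j]
        B[i][i] = -sum(B[i][j] for j in range(n) if j != i)
    return [[sum(B[i][k] * X[k][c] for k in range(n)) / n
             for c in range(dim)] for i in range(n)]


def smacof(delta, dim=2, iterations=30, seed=0):
    """Iterate the update from a random start; return the final mapping
    and the STRESS value recorded before and after every iteration."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in delta]
    history = [stress(X, delta)]
    for _ in range(iterations):
        X = guttman_step(X, delta)
        history.append(stress(X, delta))
    return X, history
```

The majorization argument guarantees that the recorded STRESS values never increase from one iteration to the next, which is a convenient sanity check. In the MapReduce form, the two matrix products and the STRESS calculation are the parts that would be split across map tasks by row blocks.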

Next Generation Sequencing Pipeline on Cloud

[Figure: numbered pipeline from a FASTA file of N sequences through Blast, block pairings and pairwise distance calculation (MapReduce) to a dissimilarity matrix of N(N-1)/2 values, followed by pairwise clustering and MDS (MPI), with clustering results and visualization in Plotviz.]

Users submit their jobs to the pipeline and the results will be shown in a visualization tool.

This chart illustrates a hybrid model with MapReduce and MPI. Twister will be a unified solution for the pipeline mode.

The components are services and so is the whole pipeline.

We could research which stages of pipeline services are suitable for private or commercial Clouds.
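The "block pairings" step and the N(N-1)/2 count can be illustrated with a short Python sketch. It rests on assumptions: the function names are made up, and `toy_distance` (fraction of mismatching positions between equal-length strings) is only a placeholder for a real alignment-based measure such as Smith-Waterman. The idea shown is that the symmetric dissimilarity matrix is cut into blocks and only the blocks on or above the diagonal are computed, each as an independent task.

```python
from itertools import combinations_with_replacement


def block_ranges(n, block_size):
    """Split indices 0..n-1 into contiguous (start, end) ranges."""
    return [(s, min(s + block_size, n)) for s in range(0, n, block_size)]


def block_pairings(n, block_size):
    """All (row block, column block) pairs on or above the diagonal."""
    return list(combinations_with_replacement(block_ranges(n, block_size), 2))


def toy_distance(a, b):
    # Placeholder measure: fraction of positions that differ.
    return sum(x != y for x, y in zip(a, b)) / len(a)


def compute_block(seqs, row, col):
    """One map task: distances for the pairs (i, j), i < j, in one block."""
    out = {}
    for i in range(*row):
        for j in range(*col):
            if i < j:
                out[(i, j)] = toy_distance(seqs[i], seqs[j])
    return out


def dissimilarity(seqs, block_size):
    """Run every block pairing and merge the results."""
    result = {}
    for row, col in block_pairings(len(seqs), block_size):
        result.update(compute_block(seqs, row, col))
    return result
```

With five sequences and a block size of two there are three blocks and six block pairings, and the merged result holds exactly 5 * 4 / 2 = 10 distances. The lower triangle is never computed because the matrix is symmetric.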


Scale-up Sequence Clustering Model with Twister

[Figure: Gene Sequences (N = 1 Million) -> Select Reference -> Reference Sequence Set (M = 100K) and N-M Sequence Set (900K). Reference Sequence Set -> Pairwise Alignment & Distance Calculation -> Distance Matrix -> Multi-Dimensional Scaling (MDS) -> Reference Coordinates (x, y, z). N-M Sequence Set -> Interpolative MDS with Pairwise Distance Calculation -> N-M Coordinates (x, y, z). Both coordinate sets -> Visualization -> 3D Plot. Three stages in the figure are annotated O(N^2).]


Current Sequence Clustering Model with MPI

[Figure: Gene Sequences -> Pairwise Alignment & Distance Calculation -> Distance Matrix -> Pairwise Clustering (Cluster Indices) and Multi-Dimensional Scaling (Coordinates) -> Visualization -> 3D Plot]

Annotations in the figure:

o Smith-Waterman / Needleman-Wunsch with Kimura2 / Jukes-Cantor / Percent-Identity *
o MPI.NET Implementation (on three stages)
o Chi-Square / Deterministic Annealing
o C# Desktop Application based on VTK

* Note. The implementations of the Smith-Waterman and Needleman-Wunsch algorithms are from the Microsoft Biology Foundation library


Twister MDS Interpolation Performance Test


July 26-30, 2010 NCSA Summer School Workshop

http://salsahpc.indiana.edu/tutorial

300+ students learning about Twister & Hadoop MapReduce technologies, supported by FutureGrid.

Sites shown on the map: University of Arkansas, Indiana University, University of California at Los Angeles, Penn State, Iowa, Univ. Illinois at Chicago, University of Minnesota, Michigan State, Notre Dame, University of Texas at El Paso, IBM Almaden Research Center, Washington University, San Diego Supercomputer Center, University of Florida, Johns Hopkins


http://salsahpc.indiana.edu/b534/


Summary

MapReduce and MPI are SPMD programming models

Twister extends MapReduce to iterative algorithms

Data mining in the Cloud (Data Analysis in the Cloud)

Several iterative algorithms we have implemented

o K-Means Clustering
o Pagerank
o Matrix Multiplication
o Breadth First Search
o Multi Dimensional Scaling (MDS)

Integrating a distributed file system

Integrating with a high performance messaging system

Programming with side effects yet support fault tolerance


MapReduceRoles4Azure

Will have prototype Twister4Azure by May 2011

Several iterative algorithms we have implemented


Twister for Azure

[Figure: a Scheduling Queue and a Job Bulletin Board feed the Worker Role. The Worker Role contains Map Workers (Map 1, Map 2, ..., Map n), Reduce Workers (Red 1, Red 2, ..., Red n), an In Memory Data Cache, Task Monitoring and Role Monitoring. Task state is kept in a Map Task Table with columns MapID, ..., Status.]


Sequence Assembly Performance


Acknowledgements to:

SALSA HPC Group

Indiana University

http://salsahpc.indiana.edu