MOCCA – Component-based grid environment for ... - CrossGrid


Feb 5, 2013 (4 years and 2 months ago)


1

Component-based Grid Environment for Programming Scientific Applications

Maciej Malawski




2

Outline


Problem: programming applications on Grid


Programming models and virtualization


CCA + H2O


Extensions to the environment


Applications and tests


Summary and future work

3

Experience (CrossGrid): Grid is complex

Testbed: 17 sites, 9 countries, over 200 CPUs, 4 TB of storage

[Figure: CrossGrid architecture in three layers: Applications, Services, Tools. Labeled elements: Roaming Access Server, Migrating Desktop, OCM-G, Performance Prediction, Data Access, Globus Toolkit, MPI Verification, MPI Library, Portal, Post-processing, Infrastructure Monitoring, Visualization Kernel, Performance Analysis, Application Monitoring, Benchmarks, Network Monitoring, Medical Support, Particle Physics, Meteo/Pollution, Flood Simulation, DataGrid, Scheduler. Connection labels: Plugin, SOAP, Protocol, API, Links, (JMX), (OMIS).]
4

Problem: how to program grid applications


Scientific applications:


Compute intensive


May be data-intensive


Often custom-made


Written in many programming languages (e.g. Fortran)


Collaborative


Current practice on Grid:


“Write a JDL scripts which submits a shell script as a batch job, which uses SSH to launch a process on the head node of the cluster to serve as a proxy for communication...” (from CGW'06 presentation by ICM)


“Submit a shell script which queries the LFC catalog, retrieves TAR archive from SE using GRIDFTP, unpacks the archive, runs another computing script, stores the output on SE and registers in LFC catalog.” (a biomedical application, CGW'06)


Problems with scientific computing (IPDPS'05 panel discussion):


Software


Software


Software... engineering

5

Two key challenges


Programming model


Suitable for the distributed environment


Allowing management of complex applications


Supported by standards


Supporting scientific applications


Facilitating programming


Virtualization


Hiding the complexity of heterogeneous environment


Allowing pools of resources to be created or acquired dynamically, on demand


6

Research objectives


Concept of programming environment for
scientific applications on Grid


Analysis of programming models for grid applications


Identification of desired features of programming environment


Prototype implementation and feasibility study


Verification of the model and prototype with
typical applications



Thesis (provisional):


An extended component model may be used to create a grid environment for programming and running complex scientific applications.

7

Many programming models


MPI, PVM


Custom protocols


Tuple spaces, HLA


Distributed objects


Active objects


Components


Skeletons


Service Oriented Architectures, Web Services

8

Virtualization: state of the art (incomplete)


Globus GRAM, Condor, VDT, gLite, Unicore


large-scale, batch-job-oriented submission systems


Virtual Workspaces: using Globus to submit VMware (or other) virtual machines that form a Condor pool of resources, which can in turn be accessed through the Globus Toolkit


Hardly a lightweight solution!


SOA


everything accessible as Web Service


Efforts to support dynamic service deployment


Component model: a container provides a virtualization
layer for hosting components


Dynamic deployment directly embedded into a programming model (component = unit of deployment)

9

What are components?


A unit of software
development/deployment/reuse


i.e. has interesting functionality


Ideally, functionality someone else might be able to (re)use


Can be developed independently of other components


Interacts with the outside world only through well-defined interfaces


Can be composed with other components


“Plug and play” model to build applications


Composition based on interfaces


Hosted in a framework/container responsible for
other services (communication, security)


10

Benefits of Component-based Approach


Enables composing applications from blocks which
originally were not designed to be combined


Addresses software complexity issues


Many frameworks provide language interoperability


Enforcement of separation of interface from implementation


Facilitates managing third party libraries


Allows easy swapping of implementation


Increases software productivity


Mature and successful technology in business and
desktop applications

11

Components vs. Web Services


Component:


Formal models for component programming (e.g. Fractal)


May be created on demand, e.g. more components deployed when needed


Explicitly declare required interfaces (uses ports)


Can be directly connected


No need to pass invocation data via central workflow engine


May have parallel connections


Does not require SOAP as a protocol


[Figure: Client 1, Client 2, ..., Client N connected to one Server Component; one Client connected to Server 1, Server 2, ..., Server N]
12

Proposed approach to building grid
environment


Use a component model


Apply a virtualization layer


Design a base component environment with a
set of desired features


Extend the environment features


13

Desired features of Grid components


Scalable to different environments (from laptops to HPC clusters)


lightweight platform


dynamic, pluggable, reconfigurable at runtime


Facilitated deployment on shared resources


Virtualization (creating dynamic workspaces)


Dynamic (hot) deployment


Communication adjusted to various levels of coupling


P2P, WANs, LANs, intercluster connections, direct binding in one process


supporting parallelism


Supporting multiple languages


allowing easy adaptation of legacy code


combining Java flexibility with optimized Fortran libraries


Facilitating programming


composable in space and in time


taking advantage of semantic description and reasoning


Adapted to unreliable Grid environment


supporting dynamic and interactive reconfiguration of connections, locations, bindings


providing support for migration and checkpointing


Interoperability with grid standards


Web Services


SOAP, WSDL, possibly WSRF


Grid Component Model (ProActive/Fractal)

14

State of the art: examples of solutions (incomplete)


Scalable to different environments (from laptops to HPC clusters)


HPC: CCAFFEINE, GridCCM


Lightweight: XCAT, ProActive, ICENI


Facilitated deployment on shared resources


ProActive, XCAT (using Globus)


Communication adjusted to various levels of coupling


CCAFFEINE: direct binding, MPI; XCAT: SOAP


optimized communication: IBIS, GridCCM


Parallel, collective communication: GridCCM, IBIS, ProActive


Supporting multiple languages


legacy code: BABEL


Interoperability: CORBA, SOAP


Facilitating programming


composable in space and in time: XCAT, ICENI, GCM (hierarchical)


Skeleton approach: HOC, ASSIST


taking advantage of semantic description and reasoning: ICENI, Semantic Web Services


Adapted to unreliable Grid environment


dynamic and interactive reconfiguration: ProActive, XCAT, Web Services model


migration and checkpointing: ProActive, XCAT


Interoperability with grid standards


Web Services: XCAT, ProActive


Grid Component Model: ProActive reference implementation


15

Base for the Solution: CCA and H2O


Common Component Architecture
(CCA)


Component standard for HPC


Uses and provides ports described in SIDL


Support for scientific data types


Existing tightly coupled (CCAFFEINE) and
loosely coupled, distributed (XCAT)
frameworks


H2O


Java-based distributed resource sharing platform


Providers set up an H2O kernel (container)


Allowed parties can deploy pluglets
(components)


Separation of roles: decoupling


Providers from deployers


Providers from each other


RMIX: efficient multiprotocol RMI extension


[Figure: Traditional model vs. proposed model. Traditional model: the Provider creates the Container on the provider host and deploys components A and B; the Client looks them up and uses them. Proposed model: the Provider creates the Container on the provider host; a Provider, Client, or Reseller deploys components A and B; the Client looks them up and uses them.]
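The uses/provides port idea behind CCA can be illustrated with a toy, single-process model. The sketch below is not the CCA or MOCCA API: the names `Services`, `add_provides_port`, `register_uses_port`, `get_port` and `connect` are hypothetical stand-ins chosen for this illustration, and no H2O kernel or remote call is involved.

```python
class Services:
    """Per-component handle kept by a toy framework (hypothetical, not the CCA API)."""

    def __init__(self, name):
        self.name = name
        self.provides = {}   # port name -> object implementing the port
        self.uses = {}       # port name -> connected provider, or None

    def add_provides_port(self, port_name, implementation):
        self.provides[port_name] = implementation

    def register_uses_port(self, port_name):
        self.uses[port_name] = None

    def get_port(self, port_name):
        port = self.uses[port_name]
        if port is None:
            raise RuntimeError(f"uses port {port_name!r} of {self.name} is not connected")
        return port


def connect(user, uses_port, provider, provides_port):
    """Bind a declared uses port of one component to a provides port of another."""
    if uses_port not in user.uses:
        raise KeyError(f"{user.name} did not declare uses port {uses_port!r}")
    user.uses[uses_port] = provider.provides[provides_port]


class HelloPort:
    def hello(self, text):
        return "Server says: " + text


server = Services("server")
server.add_provides_port("hello", HelloPort())

client = Services("client")
client.register_uses_port("greeter")

# The connection is made from outside both components, based only on port names.
connect(client, "greeter", server, "hello")
reply = client.get_port("greeter").hello("ping")
print(reply)
```

The point of the design is that the client declares what it requires without naming a provider, so a third party (a builder or a script) can decide the wiring.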
16

Example scenarios of H2O


1. Provider = deployer: e.g. resource = legacy application

2. Reseller := developer = deployer: e.g. computational service offered within a grid system

3. Client = deployer: e.g. client runs custom distributed application on shared resources

[Figure: the three scenarios drawn with Provider, Client, Reseller, Developer and Repository nodes and Deploy arrows for components A, B and C; scenario 1 includes a legacy application, scenario 2 includes native code.]

Registration and Discovery: Publish and Find through e-mail, phone, JNDI, UDDI, LDAP, DNS, GIS, ...
17

Features of the environment


Scalable to different environments (from laptops to HPC clusters)


lightweight platform: use H2O


dynamic, pluggable, reconfigurable at runtime: dynamic CCA model + H2O kernel facilities


Facilitated deployment on shared resources


Static virtualization by using H2O kernel as a daemon


Dynamic virtualization using a pool of transient H2O kernels created on demand


Communication adjusted to various levels of coupling


Offered by RMIX library of H2O


Parallel extensions for CCA: multiple ports


Facilitating programming



Composition in time: low-level Python or Ruby scripting; high-level: Virolab/GridSpace programming environment


Semantic description: under development within Virolab


Supporting multiple languages


Integration of RMIX with Babel


Integration of MOCCA with Babel: pending


Interoperability with grid standards


Web Services: future work (technically feasible: either RMIX or an embedded server, XFire)


Grid Component Model (ProActive/Fractal) interoperability: recent work


Adapted to unreliable Grid environment


supporting dynamic and interactive reconfiguration of connections, locations, bindings


providing fault-tolerance support (migration and checkpointing): future work

18

MOCCA: a basic component framework



Each component is a separate pluglet


Dynamic remote deployment of components


Components packaged as JAR files


Security: Java sandboxing, detailed access policy


Using RMIX for communication: efficiency, multiprotocol interoperability


Flexibility and multiple scenarios, as in H2O


MOCCA_Light: pure Java implementation


Java API or Jython and Ruby scripting for application assembly


http://www.icsr.agh.edu.pl/mambo/mocca


19

Dynamic virtualization



A pool of computing resources may be created by submitting a
number of H2O kernels on many Grid sites


Application components may be deployed on the kernels belonging to
the pool


Virtual resource pool may be used by a single user or shared for
collaboration


Interaction with cluster nodes in private network


JXTA transport (needs more testing)

[Figure: H2O kernels started on a standalone machine (SSH), a cluster (PBS) and Grid nodes (LCG Resource Broker) form the user's virtual resource pool; an NS node is labeled with bind() and lookup().]
20

Communication extension: RMIX over JXTA


Fully operational RMI implementation running over JXTA P2P network


Methods can be invoked on remote objects located behind firewalls or NATs


Our implementation of JXTA socket factories manages all the JXTA connectivity transparently to the user

21

Parallelism: Extensions of CCA for Multiple Ports and Connections


Multiple users of one provides port (easy part)


Single provides port


Naming convention for client components (client1, client2, ...)


Single client of multiple providers:


Need multiple uses ports on the client side


Use ParameterPort of CCA to parametrize the number of uses ports


Client component creates a required number of uses ports


Naming convention for server components and uses port names


Extension of CCA BuilderService: MultiBuilder


Creation of multiple components


Handling multiple connections


[Figure: Client 1, Client 2, ..., Client N connected to one Server Component; one Client connected to Server 1, Server 2, ..., Server N]
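The "single client of multiple providers" case can be sketched as follows. This is a toy illustration, not MOCCA code: the dictionary passed to `Client` stands in for a parameter port, `multi_build_and_connect` is a hypothetical counterpart of a multi-builder, and the `server1, server2, ...` naming scheme is an assumption modeled on the `client1, client2, ...` convention mentioned above.

```python
def uses_port_names(base, count):
    """Naming convention assumed here: base name plus a 1-based index."""
    return [f"{base}{i}" for i in range(1, count + 1)]


class Server:
    def __init__(self, name):
        self.name = name

    def work(self, x):
        return f"{self.name}:{x * x}"


class Client:
    def __init__(self, parameters):
        # Stand-in for reading the number of uses ports from a parameter port.
        names = uses_port_names("server", parameters["ports"])
        self.ports = {name: None for name in names}

    def scatter(self, values):
        # Send one value through each uses port, in port-name order.
        return [self.ports[name].work(v) for name, v in zip(self.ports, values)]


def multi_build_and_connect(client, count):
    """Create `count` servers and connect each to the uses port of the same name."""
    servers = [Server(name) for name in uses_port_names("server", count)]
    for server in servers:
        client.ports[server.name] = server
    return servers


client = Client({"ports": 3})
servers = multi_build_and_connect(client, 3)
results = client.scatter([1, 2, 3])
print(results)
```

Because both sides derive names from the same convention, the number of connections becomes a single parameter rather than hand-written wiring.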

22

Support for composition in space and in time


Declarative vs. imperative programming


Composition in space


Graph of component connections


ADL: Application Description Language


Supported by MOCCAccino


Composition in time


Workflow model (script)


Centralized execution


Currently supported: low-level scripting in Jython and JRuby


High-level scripting developed within Virolab

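[Figure: composition in time vs. composition in space. A Runtime system issues the invocations init(), getMolecule(), simulate(), store(), ... to Configuration Generator, Simulated Annealing and Storeroom; alternatively Configuration Generator, several Simulated Annealing components and Storeroom are joined by direct connections.]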
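Composition in time can be pictured as a short script that drives the components step by step. The sketch below uses plain local Python objects rather than MOCCA components; the method names come from the figure, but which component owns which method, the list-of-coordinates "molecule" and the sum-of-squares "energy" are assumptions made only for illustration.

```python
import random


class ConfigurationGenerator:
    def init(self, seed):
        self.rng = random.Random(seed)

    def getMolecule(self):
        # A "molecule" is reduced to a list of three coordinates here.
        return [self.rng.uniform(-1.0, 1.0) for _ in range(3)]


class SimulatedAnnealing:
    def simulate(self, molecule):
        # Placeholder for the compute-intensive step: score = sum of squares.
        return sum(x * x for x in molecule)


class Storeroom:
    def __init__(self):
        self.results = []

    def store(self, energy):
        self.results.append(energy)


# The "script": one central loop decides which component is called and when.
generator = ConfigurationGenerator()
annealing = SimulatedAnnealing()
storeroom = Storeroom()

generator.init(seed=42)
for _ in range(5):
    molecule = generator.getMolecule()
    energy = annealing.simulate(molecule)
    storeroom.store(energy)

print(len(storeroom.results), min(storeroom.results))
```

All data passes through the script, which is what "centralized execution" means here; with composition in space the components would instead be wired directly to each other.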
23

Composition in space: MOCCAccino


ADLM (ADL for MOCCAccino)


XML-based language for:


Describing types and number of components and their connections


Concept of hierarchical component groups


Optional information to specify resources


Hints for deployment of components (whether they are computation intensive or communication intensive)


Application Manager, responsible for:



Discovering available kernel pool



Planning optimal location of components



Deploying components in specified kernels



Connecting components
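The Application Manager responsibilities listed above (discover, plan, deploy, connect) can be sketched as a small pipeline. Everything here is hypothetical: the `Kernel` and `ComponentSpec` types, the `cpu_score` and `net_score` metrics and the greedy planner are illustrative assumptions, not the MOCCAccino implementation, and the hard-coded kernel list stands in for discovery.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    cpu_score: float      # higher = faster CPU (made-up metric)
    net_score: float      # higher = better connectivity (made-up metric)


@dataclass
class ComponentSpec:
    name: str
    hint: str             # "computation" or "communication"


def plan(components, kernels):
    """Greedy placement: each component takes the best free kernel for its hint.

    Assumes there are at least as many kernels as components.
    """
    free = list(kernels)
    placement = {}
    for comp in components:
        if comp.hint == "computation":
            best = max(free, key=lambda k: k.cpu_score)
        else:
            best = max(free, key=lambda k: k.net_score)
        placement[comp.name] = best.name
        free.remove(best)
    return placement


def deploy_and_connect(placement, connections):
    """Return the ordered actions a deployer would perform (nothing is deployed here)."""
    actions = [f"deploy {c} on {k}" for c, k in placement.items()]
    actions += [f"connect {a} -> {b}" for a, b in connections]
    return actions


# Stand-in for the discovered kernel pool.
kernels = [Kernel("k1", 1.0, 9.0), Kernel("k2", 8.0, 2.0), Kernel("k3", 5.0, 5.0)]
components = [ComponentSpec("annealing", "computation"),
              ComponentSpec("storeroom", "communication")]

placement = plan(components, kernels)
actions = deploy_and_connect(placement, [("annealing", "storeroom")])
print(actions)
```

A greedy pass is only a stand-in for "planning optimal location"; a real planner would also weigh the connections between components.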

24

MOCCAccino usage

[Figure: example component graph. The Component Graph Builder creates one component instance of Pings, each with a 2-element list (index 0, index 1) of Pongs, each with a map with "left" and "right" keys of Zonks.]

[Figure: Application Manager architecture. Labeled elements: HDNS Registry, Kernel Information Provider, Parser, Graph Builder, Deployment Planner, Application Deployer, Application Manager, MOCCA Builder.]
25

Motivation for multiprotocol and multilanguage interoperability


Grids are heterogeneous


Multiple programming languages in a single application


Java for middleware


C for system programming


FORTRAN for computing


Python for scripting


Multiple protocols in a single application


High speed local networks (Myrinet)


TCP/SSL/TLS in WAN


SOAP for loosely coupled message exchange


Overlay P2P networks for traversing private network boundaries (NATs)


Context: MOCCA component framework

26

Multilanguage solution: Babel


SIDL: Scientific Interface Definition Language


Standard for CCA Components


Supports arrays and complex types


Focus on interfaces


Babel:


SIDL parser


Code generator


Runtime library


Intermediate Object Representation (IOR)


Core of Babel object


Array of function pointers


Generated code in C


Generated code in C

[Figure: Babel in the center, connecting C, C++, f77, f90, Python and Java]
SIDL interface:

package example version 1.2 {
  class Hello {
    string hello(in string hello);
  }
}

Generated Java implementation:

// user defined non-static methods:
/**
 * Method: hello[]
 */
public java.lang.String hello_Impl (
  /*in*/ java.lang.String hello )
{
  // DO-NOT-DELETE splicer.begin(example.Hello.hello)
  // Insert-Code-Here {example.Hello.hello} (hello)
  return "Server says: " + hello;
  // DO-NOT-DELETE splicer.end(example.Hello.hello)
}

Generated C binding:

/**
 * Method: hello[]
 */
char*
example_Hello_hello(
  /*in*/ example_Hello self,
  /*in*/ const char* hello);

27

Currently: Babel for Local Applications


All Babel objects in one process


Implemented in CCAFFEINE framework


Existing multilanguage CCA components: see CCA tutorial


[Figure: a Java application calling Fortran and C++ native libraries through SIDL interfaces and Babel IOR]

28

Our solution: Babel + RMIX


Implementation of Babel RMI extensions


Generic mechanism of method invocation (reflection)


Dynamic loading of communication library


No need for code generation and compilation


[Figure: a Java application and Fortran and C++ native libraries, each behind SIDL interfaces and Babel IOR, communicating across the network through RMIX libraries]
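A generic, reflection-based invocation mechanism can be sketched in a few lines: the receiving side looks the method up by name at call time, so no per-interface stub has to be generated or compiled. The sketch below is an illustration only. The JSON message layout, the `GenericDispatcher` class and the `call` helper are assumptions made here; they are not the Babel RMI or RMIX wire format or API, and the "network" is a direct function call.

```python
import json


class Hello:
    def hello(self, text):
        return "Server says: " + text


class GenericDispatcher:
    """Looks methods up by name at call time instead of using generated stubs."""

    def __init__(self):
        self.objects = {}

    def export(self, object_id, target):
        self.objects[object_id] = target

    def handle(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))
        target = self.objects[request["object"]]
        name = request["method"]
        method = getattr(target, name, None)
        # Refuse private names and anything that is not callable.
        if method is None or name.startswith("_") or not callable(method):
            return json.dumps({"error": "no such method"}).encode("utf-8")
        result = method(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


def call(dispatcher, object_id, method, *args):
    """Client side: build the request, 'send' it, and unpack the reply."""
    request = json.dumps({"object": object_id, "method": method, "args": list(args)})
    reply = json.loads(dispatcher.handle(request.encode("utf-8")).decode("utf-8"))
    if "error" in reply:
        raise AttributeError(reply["error"])
    return reply["result"]


dispatcher = GenericDispatcher()
dispatcher.export("hello-1", Hello())
answer = call(dispatcher, "hello-1", "hello", "ping")
print(answer)
```

The trade-off is the usual one for reflection: flexibility at deployment time in exchange for type errors that surface only at call time.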

29

Interoperability with Grid Component Model (CoreGRID)


Based on Fractal Model


Deployment Functionalities


Asynchronous and extensible port semantics


Collective Interfaces


Autonomicity and adaptivity thanks to “autonomic” and “dynamic” controllers


Support for language neutrality and interoperability


[Figure: a Fractal component with Component Identity, Binding Controller, LifeCycle Controller and Content Controller]

30

Motivation for interoperability


Framework interoperability is an important issue for GCM


Existing component models and frameworks for Grids


CCA, CCM


Already existing "legacy" components


ProActive/Fractal and H2O/MOCCA


Alternative Java-based frameworks for distributed computing: can they interoperate?

31

Fractal vs. CCA


Similarities: general for most component models


Separation of interface from implementation


Composition by connecting interfaces


Differences


Fractal components are reflective (introspection), whereas CCA components take the initiative to add or remove ports at runtime


BindingController in Fractal vs. BuilderService in CCA


No ContentController in CCA (and no hierarchy)


Factory interface in Fractal vs. BuilderService in CCA


AttributeController in Fractal vs. ParameterPort in CCA


No ADL in CCA

32

Approaches to integration


Single component integration


Wrapping a CCA component into a primitive GCM one


Allows a CCA component to be used in a GCM framework


Framework interoperability


Ability for two component frameworks to interoperate


Allows a CCA component assembly (running in a CCA framework) to be connected to a GCM component application


[Figure: left, a Wrapper with C and BC controllers around a CCA Component, using cca.Services; right, a Wrapper with C and BC controllers linked through Glue components and the BuilderService to CCA Components in a CCA Framework]
33

Solutions to typing issues

1. Generate the type of a wrapped CCA component at runtime (at initialization)


Pros: fully automated


Cons: restricted to ports which are declared by the CCA component during initialization (at the setServices() call)

2. Manual description of a CCA component in ADL format


Pros: generic solution


Cons: requires an additional task from the developer

3. (Semi)automatic generation of ADL


May combine approaches 1 and 2

4. Reuse existing CCA type specifications (SIDL, CCAFFEINE scripting, others: not standardized)

34

Technical approach


CCA controller


Creates glue components for all ports (client and server)


Connects glue to CCA system (using CCA builder) and to membrane (using BC)

[Figure: a composite with CCA Controller, C and BC controllers; Server Glue A and Client Glue B link interfaces A and B to CCA Components and the BuilderService inside a CCA Framework, spread over three H2O Kernels. Other labels: WA, CCA.]
35

Glue Components


Server Glue:


Deployed as Fractal component


Uses MOCCA client code to delegate invocation to CCA interface


Can also be deployed on H2O kernel


Client Glue:


Deployed as CCA component in H2O kernel


Launches ProActive runtime in H2O kernel


Creates Fractal component in this runtime


Both:


Can be generated from the interface type (TODO)


[Figure: Client Glue B with a BC controller next to a CCA Component inside an H2O Kernel, exposing interface B; Server Glue A (label WA) delegating interface A to a CCA Component inside an H2O Kernel]
36

ProActive + MOCCA


MOCCA invocations are synchronous


Composite (membrane) should be synchronous to avoid deadlocks


Or, we may consider generating glue with wrapped types (IntWrapper, etc.): this changes types of interfaces


Class loading issues


The classes generated by ProActive runtime must be visible to the
code running in H2O kernel


The RMI class loading works fine if the codebase is set properly
on ProActive side


37

Communication Intensive Application
Benchmark


Simplified scenario:


2 components


Provides port: receive and send back an array of double (ping-pong)


Tested on local Gigabit Ethernet and on transatlantic Internet between Atlanta and Krakow


2.4 GHz Linux machines


Comparison with XCAT

38

Small Data Packets

Factors:


SOAP header overhead in XCAT


Connection pools in RMIX

39

Large Data Packets


Encoding (binary vs. base64)


CPU saturation on Gigabit LAN (serialization)


Variance caused by Java garbage collection
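The encoding factor can be made concrete by comparing the size of an array of doubles sent as raw bytes with the same bytes base64-encoded for a text-based envelope. The sketch below only measures payload size; it does not reproduce either framework's actual wire format, and the array length of 3000 is an arbitrary choice for the example.

```python
import base64
import struct


def encode_binary(values):
    """Pack doubles as raw 8-byte IEEE 754 values (big-endian)."""
    return struct.pack(f">{len(values)}d", *values)


def encode_base64(values):
    """Text-safe encoding of the same bytes, as a text-based envelope would need."""
    return base64.b64encode(encode_binary(values))


values = [float(i) for i in range(3000)]
binary = encode_binary(values)
text = encode_base64(values)
overhead = len(text) / len(binary)

print(len(binary), len(text), round(overhead, 3))
```

Base64 maps every 3 bytes to 4 characters, so the payload grows by one third before any envelope or header cost is added, and the extra encode/decode pass also costs CPU time.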

40

Automatic Flow Composer Example


Compose application graph from initial data (e.g. initial ports) or incomplete graph


First implemented for XCAT framework


Easy migration to MOCCA


Modification of code required (xcat.Port)


Similar performance for XCAT and MOCCA (exchange of text documents)


[Figure: Flow Composer with Flow Optimizer, Link Evaluator, Site Evaluator and Component Registry; arrows labeled Lookup, Compose, Evaluate]
41

Other applications


Domain decomposition (some student toy apps)


Data mining using Weka (as a Virolab example)

42

Gold Cluster Application



Components


Starter: a "driver" component for the application, provides a Go port


Configuration generator: random initial configurations


Simulated annealing: compute intensive simulation component


Storeroom: used for keeping results and statistics


Gather: auxiliary component for passing molecules


Ports


Molecule: offers getMolecule() method


Control ports: for steering the application


[Figure: Starter, Configuration Generator, several Simulated Annealing components, Gather and Storeroom, connected through Molecule ports and through Generator Control, Annealing Control and Control ports]
43

Resources and Results


Using heterogeneous infrastructure available ad hoc


Local machine: SSH access


Cluster in CYFRONET: PBS


CrossGrid testbed (LCG-based middleware)


Clusters in PSNC Poznan and IFCA Santander


Java VMs already installed


Cluster nodes allow remote point-to-point communication (MPICH-enabled: no firewalls!)


Problem size grows with number of nodes (weak scaling)


[Figure: computing time [s] (axis 0 to 375) versus number of nodes (1 to 10)]
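In a weak-scaling test the work per node is held constant, so the ideal curve is flat and efficiency is commonly expressed as the single-node time divided by the time on N nodes. The sketch below shows that calculation; the timings are hypothetical placeholders and are NOT the measurements from the plot.

```python
def weak_scaling_efficiency(times):
    """times maps node count -> wall time in seconds, with work per node held constant."""
    base = times[1]
    return {nodes: base / t for nodes, t in sorted(times.items())}


# Hypothetical timings for illustration only, not data from the slide.
example_times = {1: 100.0, 2: 104.0, 4: 110.0, 8: 125.0}

efficiency = weak_scaling_efficiency(example_times)
for nodes, value in efficiency.items():
    print(nodes, round(value, 2))
```

An efficiency close to 1.0 means that adding nodes, and the proportional extra work, leaves the run time nearly unchanged.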
44

Future work


Optimization algorithms (scheduling) for ADL
and scripting models


Monitoring support (Gemini)


Formal model (adapted from GCM)


Further integration with Babel


More applications


45

Summary


Analysis of programming models for Grid, selection of
component model


Design and implementation of CCA framework based on
H2O platform


Extending applicability of H2O for dynamically created pools of resources (user-centric or ad-hoc created VOs)


Extensions for parallel-distributed CCA components


Support for time and space composition modes by high-level scripting and ADL-based application


Towards multilanguage interop


Supporting interoperability between component models

46

Key papers


Maciej Malawski, Dawid Kurzyniec, and Vaidy Sunderam. MOCCA: towards a distributed CCA framework for metacomputing. In Proceedings of the 10th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS2005), 2005. IEEE Computer Society


Maciej Malawski, Marian Bubak, Michał Placek, Dawid Kurzyniec, and Vaidy Sunderam. Experiments with distributed component computing across Grid boundaries. In Proceedings of the HPC-GECO/CompFrame workshop in conjunction with HPDC 2006, 2006.


P. Jurczyk, M. Golenia, M. Malawski, D. Kurzyniec, M. Bubak, V. S. Sunderam, Enabling Remote Method Invocations in Peer-to-Peer Environments: RMIX over JXTA, in: Roman Wyrzykowski, Jack Dongarra, Norbert Meyer, Jerzy Wasniewski (Eds.), Parallel Processing and Applied Mathematics: 6th International Conference, PPAM 2005, Poznan, Poland, September 11-14, 2005, Revised Selected Papers, Lecture Notes in Computer Science, 3911, Springer, 2006, pp. 667-674


M. Malawski, D. Harezlak, M. Bubak, Towards Multiprotocol and Multilanguage Interoperability: Experiments with Babel and RMIX, in: M. Bubak, M. Turała, K. Wiatr (Eds.), Proceedings of Cracow Grid Workshop CGW'05, November 20-23 2005, ACC-Cyfronet UST, 2006, Kraków, pp. 266-278.


M. Bubak, M. Malawski, M. Placek, Using MOCCA Component Environment for Simulation of Gold Clusters, in: M. Bubak, M. Turała, K. Wiatr (Eds.), Proceedings of Cracow Grid Workshop CGW'05, November 20-23 2005, ACC-Cyfronet UST, 2006, Kraków, pp. 295-299.

47

Acknowledgements


Vaidy Sunderam, Dawid Kurzyniec (Emory University, Atlanta)


Daniel Harężlak, Michał Placek


Tomek Bartyński, Eryk Ciepiela, Joanna Kocot,
Przemysław Pelczar, Iwona Ryszka


Paweł Jurczyk, Maciej Golenia


Tomasz Gubała, Marek Kasztelnik, Piotr Nowakowski


Ludovic Henrio, Matthieu Morel, Francoise Baude, Denis Caromel (Sophia-Antipolis, France)


Marian Bubak