DIET Overview and some recent work



A middleware for the large-scale deployment of applications over the Grid

Frédéric Desprez
LIP ENS Lyon / INRIA
GRAAL Research Team


Joint work with
N. Bard, R. Bolze, Y. Caniou, E. Caron, B. Depardon, D. Loureiro, G. Le Mahec, A. Muresan, V. Pichon, …

Distributed Interactive Engineering Toolbox

Introduction


Transparency and simplicity represent the holy grail for Grids (maybe even before performance)!


Scheduling tunability to take into account the characteristics of specific application classes



Several applications are ready (and not only number-crunching ones!)


Many incarnations of the Grid (metacomputing, cluster computing, global computing, peer-to-peer systems, Web Services, …)


Many research projects around the world


Significant technology base



Do not forget good ol' research on scheduling and distributed systems!


Most scheduling problems are very hard to solve, even in their simplest form …


… but simple solutions often lead to better performance results in real life

Introduction, cont


One long-term idea for the grid: offering (or renting) computational power and/or storage through the Internet




Very high potential



Need for Problem Solving and Application Service Provider environments



More performance, storage capacity



Installation difficulty for some libraries and applications



Some libraries or codes need to stay where they have been developed



Some data need to stay in place for security reasons



Using computational servers through a simple interface


RPC and Grid Computing: GridRPC


One simple idea


Implementing the RPC programming model over the grid


Using resources accessible through the network


Mixed parallelism model (data-parallel model at the server level and task parallelism between the servers)



Features needed


Load balancing (resource localization and performance evaluation, scheduling),


IDL,


Data and replica management,


Security,


Fault tolerance,


Interoperability with other systems,





Design of a standard interface



within the OGF (GridRPC and SAGA WGs)


Both computation requests and data management


Existing implementations: NetSolve, Ninf, DIET, OmniRPC

RPC and Grid Computing: GridRPC

[Figure: a client sends a request Op(C, A, B) to the agent(s), which select a server (here S2) among S1–S4 to execute the call.]
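To make the model concrete, here is a minimal GridRPC-style client sketch. The call names follow the OGF GridRPC recommendation (grpc_initialize, grpc_function_handle_default, grpc_call, …); the "dgemm" service name and its argument list are assumptions for illustration, since the real signature is defined by the service's IDL on the server side.

```cpp
// Minimal GridRPC client sketch (C API used from C++). Names follow the OGF
// GridRPC recommendation; the "dgemm" service and its arguments are assumed.
#include <cstdio>
extern "C" {
#include "grpc.h"   // GridRPC header, as shipped by GridRPC implementations
}

int main(int argc, char* argv[]) {
  // argv[1]: client configuration file naming the middleware entry point
  if (argc < 2 || grpc_initialize(argv[1]) != GRPC_NO_ERROR) {
    std::fprintf(stderr, "GridRPC initialization failed\n");
    return 1;
  }

  char service[] = "dgemm";                         // assumed service name
  grpc_function_handle_t handle;
  grpc_function_handle_default(&handle, service);   // let the agents pick a server

  int    n = 2;
  double A[] = {1, 2, 3, 4}, B[] = {1, 0, 0, 1}, C[] = {0, 0, 0, 0};
  // The actual argument list is dictated by the service profile on the server.
  if (grpc_call(&handle, n, A, B, C) != GRPC_NO_ERROR)
    std::fprintf(stderr, "remote call failed\n");

  grpc_function_handle_destruct(&handle);
  grpc_finalize();
  return 0;
}
```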

DIET’s Goals


Our goals



To develop a toolbox for the deployment of environments using the Application
Service Provider (ASP) paradigm with different applications


Use public-domain and standard software as much as possible


To obtain a high-performance and scalable environment


Implement and validate our more theoretical results


Scheduling for heterogeneous platforms, data (re)distribution and replication, performance evaluation, algorithmics for heterogeneous and distributed platforms, …


Based on CORBA, NWS, LDAP, and our own software developments


FAST for performance evaluation,


LogService for monitoring,


VizDIET for visualization,


GoDIET for deployment



Several applications in different fields (simulation, bioinformatics, …)


Release 2.2 available on the web


ACI Grid ASP, TLSE, ACI MD GDS, RNTL GASP, ANR LEGO, Gwendia, COOP, Grid'5000

http://graal.ens-lyon.fr/DIET/

DIET Dashboard

DIET Architecture

[Figure: hierarchical DIET architecture with a client, Master Agents (MA) interconnected through JXTA, Local Agents (LA), and server front ends.]

Client and server interface


Client side


So easy …


Multi-interface (C, C++, Fortran, Java, Scilab, Web, etc.)


GridRPC compliant



Server side


Install and submit a new server to an agent (LA)


Problem and parameter description


Client IDL transfer from server


Dynamic services


new service


new version


security update


outdated service


Etc.


GridRPC compliant
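As a rough illustration of the server side, here is a sketch in the style of the DIET SeD API. The function names are reproduced from memory of the DIET documentation and should be checked against the release in use; the "smprod" service (scalar product of two doubles) is only an example.

```cpp
// Server-side service registration sketch, in the style of the DIET SeD API
// (names quoted from memory of the DIET manual; verify against your release).
#include "DIET_server.h"

// Solve function invoked by the SeD for each incoming "smprod" request.
static int solve_smprod(diet_profile_t* pb) {
  // ... extract IN parameters from pb, compute, fill OUT parameters ...
  return 0;
}

int main(int argc, char* argv[]) {
  diet_service_table_init(1);                       // one service on this SeD

  // Service profile: indices of the last IN, INOUT, and OUT parameters.
  diet_profile_desc_t* profile = diet_profile_desc_alloc("smprod", 0, 0, 1);
  diet_generic_desc_set(diet_param_desc(profile, 0), DIET_SCALAR, DIET_DOUBLE);
  diet_generic_desc_set(diet_param_desc(profile, 1), DIET_SCALAR, DIET_DOUBLE);

  diet_service_table_add(profile, NULL, solve_smprod);  // register the service
  diet_profile_desc_free(profile);

  return diet_SeD(argv[1], argc, argv);             // argv[1]: SeD config file
}
```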


Data/replica management


Two needs


Keep the data in place to reduce the overhead of communications between clients and
servers


Replicate data whenever possible



Three approaches for DIET


DTM (LIFC, Besançon)


Hierarchy similar to DIET's


Distributed data manager


Redistribution between servers


JuxMem (Paris, Rennes)


P2P data cache


DAGDA (IN2P3, Clermont-Ferrand)


Joining task scheduling and data management



Work done within the GridRPC Working Group (OGF)


Relations with workflow management

[Figure: example of data items (A, B, F, G, X, Y) shared and replicated between two clients and two servers.]

DAGDA

Data Arrangement for Grid and Distributed Applications


A new data manager for the DIET middleware providing


Explicit data replication: Using the API.


Implicit data replication: The data are replicated on the selected SeDs.


Direct data get/put through the API.


Automatic data management: using a selected data replacement algorithm when necessary (a sketch follows this list).


LRU: The Least Recently Used data is deleted.


LFU: The Least Frequently Used data is deleted.


FIFO: The "oldest" data is deleted.


Transfer optimization by selecting the most convenient source.


Using statistics on previous transfers.


Storage resource usage management.


The space reserved for the data is configured by the user.


Data status backup/restoration.


Allows stopping and restarting DIET, saving the data status on each node.
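The replacement policies above are classic cache policies. As a minimal, self-contained sketch (not DAGDA code), an LRU store over a user-configured space budget could look like this; LFU and FIFO only change the eviction order (least frequently used, or oldest inserted, instead of least recently accessed).

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative LRU replacement for a bounded data store (not DAGDA code).
class LruStore {
  std::size_t capacity_, used_ = 0;                       // bytes
  std::list<std::pair<std::string, std::size_t>> lru_;    // front = most recent
  std::unordered_map<std::string, decltype(lru_)::iterator> index_;

 public:
  explicit LruStore(std::size_t capacity) : capacity_(capacity) {}

  // Record an access: move the item to the most-recently-used position.
  void touch(const std::string& id) {
    auto it = index_.find(id);
    if (it != index_.end()) lru_.splice(lru_.begin(), lru_, it->second);
  }

  // Add a data item, evicting least-recently-used items while space is missing.
  bool add(const std::string& id, std::size_t size) {
    if (size > capacity_) return false;                   // can never fit
    while (used_ + size > capacity_) {                    // evict from the tail
      auto& victim = lru_.back();
      used_ -= victim.second;
      index_.erase(victim.first);
      lru_.pop_back();
    }
    lru_.emplace_front(id, size);
    index_[id] = lru_.begin();
    used_ += size;
    return true;
  }
};
```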

DAGDA


Transfer model


Uses the pull model.


The data are sent independently of the service call.


The data can be sent in several parts.

1: The client sends a request for a service.

2: DIET selects some SeDs
according to the chosen
scheduler.

3: The client sends its request to
the SeD.

4: The SeD downloads the data from the client and/or from other DIET nodes.

5: The SeD performs the call.

6: The persistent data are
updated.

DAGDA


DAGDA architecture


Each data item is associated with a unique identifier


DAGDA controls the disk and memory space limits. If necessary, it uses a data replacement algorithm.


The CORBA interface is used
to communicate between the
DAGDA nodes.


Users can access the data and perform replications using the API.


DIET Scheduling


Collector of Resource Information (CoRI)


Interface to gather performance information



Functional requirements


Set of basic metrics


One single access interface


Non-functional requirements


Extensibility


Accuracy and latency


Non-intrusiveness



Currently 2 modules available


CoRI Easy


FAST


Extension possibilities: Ganglia, Nagios, R-GMA, Hawkeye, INCA, MDS, …


[Figure: the CoRI Manager aggregates a CoRI-Easy collector, a FAST collector, and other collectors such as Ganglia.]


Performance evaluation of the platform makes it possible to find an efficient server (redistribution and computation costs) without testing every configuration


performance database for the scheduler


Based on NWS (Network Weather Service)

FAST: Fast Agent’s System Timer

[Figure: FAST architecture. A client application calls the FAST API, which relies on static data acquisition (LDAP, BDB: machine memory amount, CPU speed, batch system; network bandwidths, latencies, topology, protocols; computation feasibility and execution time on a given architecture) and dynamic data acquisition (NWS: machine status (up or down), load, memory, batch queue status; network bandwidths and latencies).]

Plugin Schedulers


“First” version of DIET performance management


Each SeD answers a profile (COMP_TIME, COMM_TIME, TOTAL_TIME,
AVAILABLE_MEMORY) for each request


Profile is filled by FAST


Local Agents sort the results by execution time and send them back up to the
Master Agent



Limitations


Limited availability of FAST/NWS


Hard to install and configure


Priority of FAST-enabled servers


Extension hard to handle


Non-standard application- and platform-specific performance measures


Firewall problems with some performance evaluation tools


No use of integrated performance estimators (e.g., Ganglia)


DIET Plug-in Schedulers


SeD level


Performance estimation function


Estimation Metric Vector (estVector_t): a dynamic collection of performance estimation values


Performance measures available through DIET


FAST-NWS performance metrics


Time elapsed since the last execution


CoRI (Collector of Resource Information)


Developer defined values



Standard estimation tags for accessing the fields of an estVector_t



EST_FREEMEM



EST_TCOMP



EST_TIMESINCELASTSOLVE



EST_FREECPU


Aggregation Methods


Mechanism defining how to sort SeD responses: associated with the service and defined at the SeD level


Tunable comparison/aggregation routines for scheduling


Priority Scheduler


Performs pairwise server estimation comparisons returning a sorted list of server responses;


Can minimize or maximize based on SeD estimations, taking into account the order in which those performance estimations were specified at the SeD level.
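Putting the pieces together, a plug-in scheduler combines a performance estimation function on the SeD with an aggregation method used by the agents. The sketch below follows the style of the DIET plug-in scheduler API; tag and function names are quoted from memory of the DIET manual and may differ slightly between releases, and local_queue_length() is a hypothetical local metric.

```cpp
// Plug-in scheduler sketch at the SeD level (DIET-style API, names approximate).
#include "DIET_server.h"

static double local_queue_length() { return 0.0; }   // hypothetical local metric

// Performance estimation: fills the estimation vector returned to the agents
// for every incoming request.
static void perf_metric(diet_profile_t* pb, estVector_t values) {
  diet_estimate_cori(values, EST_FREECPU, EST_COLL_EASY, NULL); // CoRI-Easy metric
  diet_estimate_lastexec(values, pb);              // time since the last solve
  diet_est_set(values, 0, local_queue_length());   // developer-defined value (tag 0)
}

// To be called while declaring the service profile (see the SeD sketch above).
static void declare_scheduling(diet_profile_desc_t* profile) {
  diet_service_use_perfmetric(perf_metric);

  // Priority aggregator: agents sort SeD responses by these criteria,
  // in the order in which they are declared.
  diet_aggregator_desc_t* agg = diet_profile_desc_aggregator(profile);
  diet_aggregator_set_type(agg, DIET_AGG_PRIORITY);
  diet_aggregator_priority_max(agg, EST_FREECPU);   // prefer the most idle CPU
  diet_aggregator_priority_maxuser(agg, 0);         // then the user-defined value
}
```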

Workflow Management (ANR Gwendia)


Workflow representation


Directed Acyclic Graph (DAG)


Each vertex is a task


Each directed edge represents
communication between tasks



Goals


Build and execute workflows


Use different heuristics to solve scheduling problems


Extensibility to address multi-workflow submission and large grid platforms


Manage heterogeneity and variability of the environment

Architecture with MA DAG


Specific agent for workflow management (MA DAG)


Two modes:


MA DAG defines a complete scheduling of the workflow (ordering and mapping)


MA DAG defines only an ordering for the workflow execution; the mapping is done in a second step by the client, which goes through the Master Agent to find the server where each workflow service will be executed.

Workflow Designer


Applications viewed as services within DIET


Compose services to get a complete application workflow in a drag-and-drop fashion

DIET: Batch System interface


A parallel world


Grid resources are parallel (parallel machines or clusters of compute nodes)


Applications/services can be parallel


Problem


Many types of batch systems exist, each with its own behavior and user interface


Solution


Use a layer of intermediate meta-variables


Use an abstract BatchSystem factory
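The abstraction can be sketched as follows; the class and method names are purely illustrative (not DIET's internal classes), but they show how one interface with meta-variables such as node count and walltime can hide LoadLeveler, OAR, or any other batch system behind a factory.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Illustrative abstraction over batch systems (hypothetical names): each
// backend maps the same meta-variables (script, node count, walltime) onto
// its own submission commands.
class BatchSystem {
 public:
  virtual ~BatchSystem() = default;
  virtual std::string submit(const std::string& script,
                             int nbNodes, int walltimeSec) = 0;  // returns a job id
  virtual bool isFinished(const std::string& jobId) = 0;
};

class LoadLevelerBackend : public BatchSystem {   // would wrap llsubmit / llq
 public:
  std::string submit(const std::string&, int, int) override { return "ll-job"; }
  bool isFinished(const std::string&) override { return true; }
};

class OarBackend : public BatchSystem {           // would wrap oarsub / oarstat
 public:
  std::string submit(const std::string&, int, int) override { return "oar-job"; }
  bool isFinished(const std::string&) override { return true; }
};

// Factory: the SeD configuration names the batch system to instantiate.
std::unique_ptr<BatchSystem> makeBatchSystem(const std::string& name) {
  if (name == "loadleveler") return std::make_unique<LoadLevelerBackend>();
  if (name == "oar")         return std::make_unique<OarBackend>();
  throw std::runtime_error("unknown batch system: " + name);
}
```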

Grid’5000


1) Building a nationwide experimental platform for Grid & P2P research (like a particle accelerator for computer scientists)


9 geographically distributed sites hosting clusters with 256 to 1K CPUs


All sites are connected by RENATER (French Res. and Edu. Net.)


RENATER hosts probes to trace network load conditions


Design and develop a system/middleware environment to safely test and repeat experiments

2) Use the platform for Grid experiments in real-life conditions


Address critical issues of Grid system/middleware:


Programming, Scalability, Fault Tolerance, Scheduling


Address critical issues of Grid Networking


High-performance transport protocols, QoS


Port and test applications


Investigate original mechanisms


P2P resources discovery, desktop Grids


4 main features:


High security for Grid'5000 and the Internet, despite the deep reconfiguration feature


A software infrastructure allowing users to access Grid'5000 from any Grid'5000 site and have a home directory on every site


Reservation/scheduling tools allowing users to select node sets and schedule experiments


A user toolkit to reconfigure the nodes and monitor experiments

Goals and Protocol of the Experiment


Validation of the DIET architecture at large scale over different administrative
domains



Protocol


DIET deployment over as many processors as possible


Large number of clients


Comparison of the DIET execution times with average local execution times


1 MA, 8 LA, 540 SeDs


2 requests/SeD


1120 clients on 140 machines


DGEMM requests (2000x2000 matrices)


Simple round-robin scheduling using time_since_last_solve



Results on Grid'5000

Paravent: 9 s
Lille: 34 s
Paraci: 11 s
Bordeaux: 33 s
Parasol: 33 s
Sophia: 40 s
Toulouse: 33 s
Lyon: 38 s
Orsay: 40 s

Deployment example: the Décrypthon platform

SeD = Server Daemon, installed on any server running LoadLeveler. Note that we can define rescue SeDs.
MA = Master Agent, coordinates jobs. We can define rescue or multiple Master Agents.
WN = worker node

[Figure: Décrypthon deployment. A DIET Master Agent coordinates SeDs running on top of LoadLeveler at Orsay (Decrypthon1, Decrypthon2), CRIHAN, Lille, Jussieu, Bordeaux, and Lyon (IBM WII), with a web interface for project users, a DB2 database, the AFM database (BD AFM), clinics, and a data manager.]

Eucalyptus


the Open Source Cloud


Eucalyptus is:


A research project of a team from the
University of California, Santa
Barbara


An Open Source Project


An IaaS Cloud platform



Base principles


A collection of Web Services on each node


Virtualization to host user images (Xen technology)


Virtual networks to provide security


Implement the Amazon EC2 interface


Systems / Tools built for EC2 are usable


“Turing test” for Eucalyptus


Uses commonly known and available Linux technologies



http://open.eucalyptus.com/

Eucalyptus platform

DIET Cloud architecture

[Figure: DIET (MA, LAs, SeDs) + Eucalyptus (CLC, CCs, NCs) = the DIET Cloud architecture.]

DIET Cloud Architecture


Several

solutions

that

differ

by

how

much

of

the

architectures

of

both

systems

overlap

or

are

included

one

in

the

other


DIET

is

completely

included

in

Eucalyptus


DIET

is

completely

outside

of

Eucalyptus





and

all

the

possibilities

in

between

DIET completely included in Eucalyptus


The DIET platform is virtualized inside Eucalyptus


Very flexible and scalable
as DIET nodes can be
launched when needed


Scheduling is more
complex

[Figure: the DIET hierarchy (MA, LA, SeD) deployed inside Eucalyptus virtual machines on the NCs, under the CLC and CCs.]

DIET completely outside of Eucalyptus


The SeD requests resources from Eucalyptus


The SeD works directly with the virtual machines


Useful when Eucalyptus is a 3rd-party resource

[Figure: the DIET hierarchy (MA, LA, SeDs) stays outside Eucalyptus; the SeDs drive the Eucalyptus CLC, CC, and NC stack.]

Implemented Architecture


We considered the architecture that takes the most benefit from the DIET design


when DIET is completely outside of Eucalyptus


Eucalyptus is treated as a new Batch System


An easy and natural way to use it in DIET


DIET is designed to easily add a new batch scheduler


Provide a new implementation of the BatchSystem abstract class



Handling of a service call is done in three steps:



1. Obtain the requested virtual machines by a SOAP call
to Eucalyptus


2. Execute the service on the instantiated virtual machines,
bypassing the Eucalyptus controllers


3. Terminate the virtual machines by a SOAP call to Eucalyptus
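The three steps can be sketched as follows. The SOAP operations of the EC2-compatible interface are hidden behind hypothetical helper functions, since the actual calls depend on the generated SOAP stubs.

```cpp
#include <string>
#include <vector>

// Hypothetical wrappers around the EC2-compatible SOAP interface exposed by
// Eucalyptus; real implementations would use the generated SOAP stubs.
static std::vector<std::string> runInstances(const std::string& imageId, int n) {
  // e.g. a RunInstances request; returns the instance identifiers
  return std::vector<std::string>(n, imageId + "-instance");
}
static void runService(const std::vector<std::string>& /*hosts*/,
                       const std::string& /*service*/) {
  // execute the service directly on the VMs, bypassing the Eucalyptus controllers
}
static void terminateInstances(const std::vector<std::string>& /*instanceIds*/) {
  // e.g. a TerminateInstances request
}

// Sketch of how a Cloud-aware SeD could handle one service call.
void solveOnCloud(const std::string& imageId, int nbVms,
                  const std::string& service) {
  // 1. Obtain the requested virtual machines through the SOAP interface.
  std::vector<std::string> vms = runInstances(imageId, nbVms);

  // 2. Execute the service on the instantiated virtual machines.
  runService(vms, service);

  // 3. Terminate the virtual machines through the SOAP interface.
  terminateInstances(vms);
}
```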

DIET Cloud: a new DIET architecture

[Figure: DIET accesses Eucalyptus and Amazon EC2 through the Batch System abstraction.]

Some thoughts about DIET and Clouds


The door to using Cloud platforms through DIET has been opened


The first DIET Cloud architecture was designed


The current work serves as a proof of concept of using the DIET GridRPC middleware on top of the Eucalyptus Cloud system to demonstrate general-purpose computing on Cloud platforms


Possible ways of connecting the two architectures have been studied



Several issues still remain to be solved


Instance startup time needs to be taken into account


A new scheduling strategy is needed for more complex architectures


The performance of such a system needs to be measured




GridRPC


Interesting approach for several applications


Simple, flexible, and efficient


Many interesting research issues (scheduling, data management, resource discovery and reservation, deployment, fault tolerance, …)


DIET


Scalable, open-source, and multi-application platform


Concentration on several issues like resource discovery, scheduling (distributed scheduling and plugin
schedulers), deployment (GoDIET and GRUDU), performance evaluation (CoRI), monitoring (LogService and
VizDIET), data management and replication (DTM, JuxMem and DAGDA)


Large-scale validation on the Grid'5000 platform


A middleware designed for, and tunable to, a given application


And now …



Client/server DIET for Décrypthon applications


Deployment and validation on execution


Duplicate and check requests from UD


Validation using SeD_batch (LoadLeveler version)


Data management optimization


Scheduling optimization


More information and statistics for users


Fault tolerance mechanisms


Conclusions and future work

http://graal.ens-lyon.fr/DIET

http://www.grid5000.org/

Research Topics


Scheduling


Distributed scheduling


Software platform deployment with or without dynamic connections between
components


Plug-in schedulers


Multiple (parallel) workflows scheduling


Links with batch
schedulers


Many-task scheduling



Data management


Scheduling of computation requests and links with data management


Replication,
data
prefetching


Workflow
scheduling



Performance evaluation


Application modeling


Dynamic information about the platform (network, clusters)

Questions?

http://graal.ens-lyon.fr/DIET