PBS, LSF and ARC integration

06/08/10

Zoltán Farkas

zfarkas@sztaki.hu

MTA SZTAKI LPDS



Outline


Introduction


Requirements


PBS and LSF


ARC


Architecture of P-GRADE Portal runtime layer


PBS/LSF integration


ARC integration


Summary


Introduction


P-GRADE Portal supported gLite and Globus


ETHZ requirement:


Make use of PBS local clusters


Make use of LSF local clusters (Brutus)


Sometimes make use of ARC grid resources


All this should be integrated within P-GRADE Portal


PBS (and LSF)


Portable Batch System


(Load Sharing Facility)


Schedule users' jobs on a cluster


Interactive login to a submission node


Users execute different commands:


qsub (bsub): submit


qstat (bjobs): status


qdel (bkill): abort
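
The command pairs above can be captured in a small lookup table. The following is a minimal sketch, not the portal's actual code: the command names come from the slide, while `SCHEDULER_COMMANDS` and `build_command` are hypothetical names introduced for illustration.

```python
# Command names are taken from the slide; the helper itself is hypothetical.
SCHEDULER_COMMANDS = {
    "PBS": {"submit": "qsub", "status": "qstat", "abort": "qdel"},
    "LSF": {"submit": "bsub", "status": "bjobs", "abort": "bkill"},
}


def build_command(scheduler: str, action: str, *args: str) -> list[str]:
    """Return the argv list for an action on the given scheduler."""
    try:
        program = SCHEDULER_COMMANDS[scheduler][action]
    except KeyError:
        raise ValueError(
            f"unsupported scheduler/action: {scheduler}/{action}"
        ) from None
    return [program, *args]


if __name__ == "__main__":
    print(build_command("PBS", "status", "1234"))
    print(build_command("LSF", "abort", "1234"))
```

The sketch only builds the argument list. How a job script is handed to the submit command differs between the two schedulers and is not modelled here.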

[Diagram: submission node, scheduler node, five cluster nodes]


ARC


Advanced Resource Connector


Complete grid middleware with:


Information system


Command-line clients with integrated broker


Data management stack (GridFTP)


Usable through client programs:


Job description: xRSL


ngsub: submit


ngstat: status update


ngkill: cancel


ngget: get results


P-GRADE Portal Architecture


Workflow Editor-related components


Portlet-related components


Workflow data storage


Execution layer




See next slide!


[Diagram: P-GRADE Portal machine. Components shown: Apache Tomcat servlet container, GridSphere portal framework, P-GRADE Portal portlets, Workflow Editor servlet, Workflow Editor client, DAGMan, and the P-GRADE Portal's filesystem (user workflow data; common workflow and job execution scripts; Globus, EGEE and PBS scripts). External targets: Globus Grid, EGEE Grid, PBS cluster]


LSF and PBS integration I.


Principal idea:


Users should be able to configure a remote SSH connection to submission nodes through the Settings portlet


Connections are established using SSH key pairs


Established connections are reused to minimize SSH connection attempts


Connections are used on a:


Per-user,


Per-resource basis



a given user's connection isn't accessible by other users



different resources use different connections
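
The reuse rule above (one connection per user and per resource, never shared across users) can be sketched as a cache keyed by the (user, resource) pair. This is an illustrative sketch under stated assumptions, not the portal's implementation: `ConnectionPool` and the `connect` callback are hypothetical, and the callback stands in for whatever actually opens the SSH session.

```python
import threading
from typing import Callable, Dict, Generic, Tuple, TypeVar

C = TypeVar("C")


class ConnectionPool(Generic[C]):
    """Caches one connection per (user, resource) pair (hypothetical helper)."""

    def __init__(self, connect: Callable[[str, str], C]) -> None:
        # `connect` opens a new connection; it is only called on a cache miss.
        self._connect = connect
        self._connections: Dict[Tuple[str, str], C] = {}
        self._lock = threading.Lock()

    def get(self, user: str, resource: str) -> C:
        """Return the cached connection for this pair, opening it if needed."""
        key = (user, resource)
        with self._lock:
            if key not in self._connections:
                self._connections[key] = self._connect(user, resource)
            return self._connections[key]


if __name__ == "__main__":
    opened = []

    def fake_connect(user: str, resource: str) -> str:
        # Stand-in for a real SSH login; records each connection attempt.
        opened.append((user, resource))
        return f"session:{user}@{resource}"

    pool = ConnectionPool(fake_connect)
    pool.get("user1", "brutus")
    pool.get("user1", "brutus")  # reused, no second login
    pool.get("user2", "brutus")  # different user, separate connection
    print(opened)
```

Because the key includes the user name, a lookup for one user cannot return a connection opened for another, and the lock keeps two concurrent requests from opening the same connection twice.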


LSF and PBS integration II.

[Diagram: Portal machine with one connection pool per user (User 1, User 2), each labelled PRIV; LSF resources 1 to 3 and PBS resources 1 and 2, each labelled PUB]


LSF and PBS integration III.


Job preparation:


wkf_pre_LSF.sh: prepare job, wrapper, collect files


wkf_pre_PBS.sh: prepare job, wrapper, collect files


Job execution:


wkf_LSF.sh: submit and observe job using b* commands


wkf_PBS.sh: submit and observe job using q* commands


Wrappers:


LSF_fake.sh: handle generator and collector jobs, run exe


PBS_fake.sh: handle generator and collector jobs, run exe


Job post-processing:


No real task (wkf_post_LSF.sh and wkf_post_PBS.sh)
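
The script naming scheme above, and the first step of "submit and observe", can be sketched as follows. The script names are taken from the slide. The helper functions are hypothetical, and the sample output formats (`Job <1234> is submitted to queue <normal>.` for bsub, a bare identifier such as `1234.server` for qsub) are assumptions about typical installations that may vary between versions and sites.

```python
import re

# Script name patterns follow the slide; the helper functions are hypothetical.
SUPPORTED = ("LSF", "PBS")


def stage_scripts(middleware: str) -> dict[str, str]:
    """Map each job stage to the script that handles it."""
    if middleware not in SUPPORTED:
        raise ValueError(f"unsupported middleware: {middleware}")
    return {
        "pre": f"wkf_pre_{middleware}.sh",
        "execute": f"wkf_{middleware}.sh",
        "wrapper": f"{middleware}_fake.sh",
        "post": f"wkf_post_{middleware}.sh",
    }


def parse_lsf_job_id(bsub_output: str) -> str:
    """Extract the job ID, assuming a 'Job <1234> is submitted ...' message."""
    match = re.search(r"Job <(\d+)>", bsub_output)
    if match is None:
        raise ValueError(f"no job ID found in: {bsub_output!r}")
    return match.group(1)


def parse_pbs_job_id(qsub_output: str) -> str:
    """Extract the job ID, assuming qsub prints only the job identifier."""
    job_id = qsub_output.strip()
    if not job_id:
        raise ValueError("empty qsub output")
    return job_id


if __name__ == "__main__":
    print(stage_scripts("PBS"))
    print(parse_lsf_job_id("Job <1234> is submitted to queue <normal>."))
    print(parse_pbs_job_id("1234.server\n"))
```

The extracted ID is what the status and abort commands would later take as their argument.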


LSF and PBS integration features


Full parameter study (PS) support


Very quick response time compared to grid middlewares


Support for any kind of executable


ARC integration I.


Very similar to the EGEE support


An ARC client stack has to be installed on the P-GRADE Portal machine


Users can gain access with X.509 proxy certs


Two possible resource selections:


User can specify the target cluster


Cluster can be selected by client broker


ARC integration II.


Job preparation: wkf_pre_nordugrid.sh


Wrapper script preparation


Generator-related cleanups (as needed)


Autogenerator-related file uploads (as needed)


Job execution: wkf_nordugrid.sh


xRSL prepared based on job properties


Job submission and management using ng* commands


Wrapper script: manage generator and collector jobs if needed


Job post-processing: wkf_post_nordugrid.sh


No real job to perform
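
To illustrate "xRSL prepared based on job properties", here is a minimal sketch of assembling an xRSL string from a few job properties. It is not the portal's generator: `build_xrsl` and its parameters are hypothetical, only a handful of common xRSL attributes (`executable`, `arguments`, `jobName`, `stdout`, `stderr`, `count`, `runTimeEnvironment`) are covered, and values containing double quotes are rejected rather than escaped.

```python
from typing import Iterable, Optional


def _quote(value: str) -> str:
    """Wrap a value in double quotes; embedded quotes are not supported here."""
    if '"' in value:
        raise ValueError(f"double quotes are not supported in: {value!r}")
    return f'"{value}"'


def build_xrsl(
    executable: str,
    arguments: Iterable[str] = (),
    job_name: Optional[str] = None,
    stdout: Optional[str] = None,
    stderr: Optional[str] = None,
    count: Optional[int] = None,
    runtime_environments: Iterable[str] = (),
) -> str:
    """Assemble a conjunctive xRSL job description (hypothetical helper)."""
    parts = [f"(executable={_quote(executable)})"]
    args = [_quote(a) for a in arguments]
    if args:
        parts.append("(arguments=" + " ".join(args) + ")")
    if job_name is not None:
        parts.append(f"(jobName={_quote(job_name)})")
    if stdout is not None:
        parts.append(f"(stdout={_quote(stdout)})")
    if stderr is not None:
        parts.append(f"(stderr={_quote(stderr)})")
    if count is not None:
        # Number of requested slots, used for multi-node jobs.
        parts.append(f"(count={int(count)})")
    for rte in runtime_environments:
        # One attribute per requested runtime environment.
        parts.append(f"(runTimeEnvironment={_quote(rte)})")
    return "&" + "".join(parts)


if __name__ == "__main__":
    print(build_xrsl("/bin/echo", ["hello"], job_name="demo", stdout="out.txt"))
```

The resulting string would be written to a file or passed to the submission client. The exact client options are left out because they are not given in the slides.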


ARC integration features


Full parameter study (PS) support


Offers the possibility to select the execution resource


Support for any kind of executable


Multi-node job support


Offers the possibility to specify runTimeEnvironment attributes


Summary


PBS, LSF and ARC integration was relatively simple thanks to the pluggable architecture of P-GRADE Portal


However, the devil is in the details:


SSH connection sharing + parallel connection limits


Proper LSF job cancellation