Presentation - IEEE High Performance Extreme Computing


AN INGENIOUS APPROACH FOR IMPROVING TURNAROUND TIME OF GRID JOBS WITH RESOURCE ASSURANCE AND ALLOCATION MECHANISM

Shikha Mehrotra


Centre for Development of Advanced Computing

CDAC, Bangalore, India

{shikham@cdac.in}

10-12 September 2012

IEEE HPEC'12

Outline


Indian National grid GARUDA


Need for Reservation in Grid


Approach followed in realizing reservation in Garuda Grid


Architecture


Features


Performance analysis


Job flow in Garuda grid


Performance metrics


Turnaround time of grid jobs


Case study


Turn-around time without reservation


Turn-around time with reservation


Data analysis


Results


Conclusion


Grid Computing


Distributed Computing taken to the next level



Aggregation of Resources from many participants
(geographically distributed in general)


Compute resources


Data resources


Special instruments (telescopes, microscopes, and so on)



Unified, Seamless access to these resources


Analogous to the “Power Grid”


India’s National Grid Computing Initiative: GARUDA



Motivation


To Collaborate on Research and Engineering of
Technologies, Architectures, Standards and
Applications in Grid Computing


To Contribute to the aggregation of resources in the
Grid


Production infrastructure with


Gigabit networking backbone (NKN)


Large HPC computing resources


Massive Storage


Tools and Services for Unified Access



Currently


Connects more than 60 institutions


Academic & Research labs


Spans across 17 cities of India


Supports 10 Virtual Organizations


Bioinformatics, Seismic engineering, Climate modeling, Drug discovery, ...

Problem Statement


As demand for resources grows, managing jobs and allocating
resources to them becomes difficult, so most jobs sit in the
queued state waiting for a resource to become free.


Our Approach


Reduce waiting time


Solution: Advance Reservation of resources


An advance reservation is a reservation that a user or
administrator can request and the scheduler can create.


It guarantees the availability of resources at a specified
future time slot.




Compute Reservation


An advance reservation is essentially defined by the
following:


A start time, defined using the standard date-time format


An end time, either defined using the standard date-time
format or computed from the start time plus a duration value


The number and type of resources to be reserved
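The three defining fields above can be sketched as a small record type. This is a minimal illustration, not the Garuda reservation API: the class name `Reservation`, its field names, and the `overlaps` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Reservation:
    """Hypothetical advance-reservation record (not the Garuda API)."""
    start: datetime
    resource_type: str      # illustrative label, e.g. "cpu"
    resource_count: int
    end: Optional[datetime] = None
    duration: Optional[timedelta] = None

    def __post_init__(self):
        # The end time is either given directly or derived from a duration.
        if (self.end is None) == (self.duration is None):
            raise ValueError("give exactly one of end or duration")
        if self.end is None:
            self.end = self.start + self.duration
        else:
            self.duration = self.end - self.start
        if self.end <= self.start:
            raise ValueError("end must be after start")
        if self.resource_count <= 0:
            raise ValueError("resource_count must be positive")

    def overlaps(self, other: "Reservation") -> bool:
        # Two half-open intervals [start, end) overlap if each starts
        # before the other ends.
        return self.start < other.end and other.start < self.end


# Example: a two-hour reservation of 16 CPUs, end derived from duration.
r = Reservation(datetime(2012, 9, 10, 9, 0), "cpu", 16,
                duration=timedelta(hours=2))
print(r.end)
```

A scheduler could use a check like `overlaps` to reject a new request whose time slot collides with an existing reservation on the same resources; how Garuda actually resolves conflicts is not described in the slides.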



Garuda Reservation Architecture






[Figure: Garuda reservation architecture. Component labels:]

RESERVATION REPLICA DB

LOCAL RESOURCE MANAGER

RESERVATION MANAGER AND SCHEDULER

GARUDA LRM RESERVATION COMPONENT

GARUDA MIDDLEWARE RESERVATION COMPONENT

GLOBUS MIDDLEWARE

GRIDWAY META-SCHEDULER

GARUDA GRID LEVEL RESERVATION COMPONENT

RESERVATION DB

FAILOVER

API

COMMANDS

APPLICATIONS

Garuda Reservation Features


Advance and Immediate Reservation of resources across multiple
clusters


Ensure resource availability


GSI based reservation: Garuda Reservation


Grid Reservation Failover mechanism:


Application Programming Interface


Intelligent resource allocation based on QoS Parameters


Virtual Organization support


Avoiding resource underutilization


Integration with GridWay Meta-scheduler and Globus Middleware

Performance Analysis


Performance Metrics


Mean waiting time



Execution time



Turnaround time


Turnaround Time


Turnaround time: the total time between the submission of a
program, process, thread, or task (Linux) for execution and the
return of the complete output to the user.
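As a rough sketch of how this definition relates to the other two metrics, the three values can be derived from four timestamps per job. The function `job_times` and the timestamp names (`submitted`, `started`, `finished`, `output_returned`) are assumptions for illustration; the slides do not say how the measurements were instrumented.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class JobTimes(NamedTuple):
    waiting: timedelta
    execution: timedelta
    turnaround: timedelta


def job_times(submitted, started, finished, output_returned):
    """Derive the three metrics from four hypothetical timestamps."""
    if not submitted <= started <= finished <= output_returned:
        raise ValueError("timestamps must be in chronological order")
    return JobTimes(
        waiting=started - submitted,             # time spent queued
        execution=finished - started,            # time spent running
        turnaround=output_returned - submitted,  # submission to output
    )


# Illustrative timestamps only; they are not measurements from the talk.
t0 = datetime(2012, 9, 10, 9, 0, 0)
times = job_times(
    submitted=t0,
    started=t0 + timedelta(seconds=30),
    finished=t0 + timedelta(minutes=10, seconds=30),
    output_returned=t0 + timedelta(minutes=10, seconds=45),
)
print(times.waiting, times.execution, times.turnaround)
```

Because turnaround runs until the output reaches the user, it can exceed the sum of waiting and execution time by whatever is spent returning results.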



[Figure labels: User, Job Submission, Job Output]

Performance Analysis


Turn-around time without reservation


Times in h:mm:ss

Job Set      Waiting    Execution    Turnaround
Job Set 1    0:04:00    0:17:16      0:22:02
Job Set 2    0:06:00    0:17:27      0:24:14
Job Set 3    0:44:00    0:18:31      1:02:49
Job Set 4    1:11:00    0:17:27      1:38:42
Job Set 5    1:20:00    0:18:26      1:37:41


Turn-around time with reservation


Times in h:mm:ss

Job Set      Waiting    Execution    Turnaround
Job Set 1    0:00:09    0:08:03      0:08:32
Job Set 2    0:00:09    0:08:05      0:08:35
Job Set 3    0:00:09    0:08:07      0:08:37
Job Set 4    0:00:09    0:08:05      0:08:37
Job Set 5    0:00:08    0:07:15      0:07:45


Comparison of Turnaround times
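The comparison can be reproduced from the turnaround columns of the two tables. The sketch below copies those ten values as strings and computes the mean turnaround for each case and their ratio; the helper names `to_seconds` and `mean_seconds` are introduced here for illustration, and the mean is only one possible summary, not necessarily the one used in the talk.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def mean_seconds(values):
    """Mean of a list of 'h:mm:ss' strings, in seconds."""
    secs = [to_seconds(v) for v in values]
    return sum(secs) / len(secs)


# Turnaround columns copied from the two tables (Job Sets 1 to 5).
WITHOUT_RESERVATION = ["0:22:02", "0:24:14", "1:02:49", "1:38:42", "1:37:41"]
WITH_RESERVATION = ["0:08:32", "0:08:35", "0:08:37", "0:08:37", "0:07:45"]

mean_without = mean_seconds(WITHOUT_RESERVATION)
mean_with = mean_seconds(WITH_RESERVATION)
ratio = mean_without / mean_with

print(f"mean without reservation: {mean_without / 60:.1f} min")
print(f"mean with reservation:    {mean_with / 60:.1f} min")
print(f"ratio:                    {ratio:.1f}x")
```

With the tabulated figures, the mean turnaround is about 61 minutes without reservation against about 8.4 minutes with it, a ratio of roughly 7.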



Conclusion


Guarantees the availability of resources


Nearly eliminates the waiting time (8 to 9 seconds per job set with reservation, against 4 to 80 minutes without)


Reduces turnaround time considerably (under 9 minutes for every job set with reservation, against 22 to 99 minutes without)


Integrates well into the Grid Middleware


Built for the production infrastructure


The analysis results are encouraging.

Thank You
