SProj 3

Libra: An Economy-Driven Cluster Scheduler

Jahanzeb Sherwani
Nosheen Ali
Nausheen Lotia
Zahra Hayat

Project Advisor/Client: Rajkumar Buyya
Faculty Advisor: Dr. Arif Zaman

Problem Statement

Implementing a computational-economy-based, user-centric scheduler for clusters

What is a cluster?

A collection of workstations interconnected via a network technology, in order to take advantage of combined computational power and resources

An integrated collection of resources that can provide a single system image spanning all its nodes: a virtual supercomputer

Used for computation-intensive applications such as AI expert systems, nuclear simulations, and scientific calculations

Why clusters?

Cost-effectiveness: low cost-to-performance ratio compared to a specialized supercomputer

Increase in workstation performance

Increase in network bandwidth

Decrease in network latency

Scalability higher than that of a specialized supercomputer

Easier to integrate into an existing network than specialized supercomputers


Computational Economy

Traditional system-centric performance metrics:

CPU Throughput

Mean Response Time

Shortest Job First

Computational economy is the inclusion of user-specified quality-of-service parameters with jobs, so that resource management is user-centric rather than system-centric

Computational Economy (cont’d)

Project focus: to implement a scheduler that aims to maximize user utility

Job parameters most relevant to user-centric scheduling:

Budget allocated to the job by the user

Deadline specified by the user


Computational Economy for Grids

What is a grid?

An infrastructure that couples resources such as computers (workstations or clusters), software (for special-purpose applications), and devices (printers, scanners) across the Internet, and presents them as a single, unified, integrated resource that can be widely used

How a grid differs from a cluster:

Wide geographical area

Non-dedicated resources

No centralized resource management



Computational Economy for Grids

Management of resources and scheduling of computations in a grid environment is complex because the resources are:

geographically distributed

heterogeneous in nature

owned by different individuals or organizations

subject to different access and cost models

In addition, resource discovery is required, and security issues arise

Computational economy has been implemented for grids: the Nimrod/G resource broker is a global resource management and scheduling system that supports deadline- and economy-based computations in grid-computing environments

Computational Economy for Clusters

Market-based Proportional Resource Sharing for Clusters: Brent Chun and David E. Culler, University of California at Berkeley, Computer Science Division

A market-based approach built on the notion of a computational economy that optimizes for user value. It describes an architecture for market-based cluster resource management based on proportional sharing of basic computing resources. Cluster nodes act as independent sellers of computing resources, while user applications act as buyers who purchase resources. Users are allocated credits/tickets: the more tickets a user holds, the greater his CPU share (see the sketch below). Tickets are allocated on the basis of the amount the user is willing to pay, i.e., his valuation of the job.

Deadline not incorporated
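
A one-line illustration of ticket-based proportional sharing (the standard formulation; this is our sketch, not code from the cited paper):

    # Each buyer's CPU share is simply its fraction of all tickets.
    tickets = {"alice": 30, "bob": 10}          # tickets bought by each user
    total = sum(tickets.values())
    shares = {user: t / total for user, t in tickets.items()}
    print(shares)                               # {'alice': 0.75, 'bob': 0.25}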

Cluster Architecture

Cluster Management Software

Cluster Management Software is designed to administer and manage application jobs submitted to workstation clusters.

Creates a Single System Image:

When a collection of interconnected computers appears to be a unified resource, we say it possesses a Single System Image

The benefit of a Single System Image is that the exact location of the execution of a process is entirely concealed from the user; the user is offered the illusion of a single powerful computer

Maintains centralized information about cluster status and resources

Cluster Management Software

Commercial and open-source Cluster Management Software

Open-source Cluster Management Software:

DQS (Distributed Queuing System)

CONDOR

GNQS (Generalized Network Queuing System)

MOSIX

REXEC (Remote Execution)

SGE (Sun Grid Engine)

PBS (Portable Batch System)

Cluster Management Software

Why SGE was rejected:

lack of online support

lack of stability

Final choice of CMS: PBS (Portable Batch System)

Pricing the Cluster Resources

Cost = a * (Job Execution Time) + b * (Job Execution Time / Deadline)

The cost of using the cluster depends on job length and job deadline: the longer the user is prepared to wait for the results, the lower his cost

The cost formula forces the user to reveal his true deadline (a sketch of this function follows)
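
A minimal sketch of this pricing function in Python, assuming simple positive coefficients a and b (the slides do not fix their values):

    # Libra-style pricing: cost grows with execution time and falls as the
    # deadline is relaxed. The default coefficients are assumed placeholders.
    def job_cost(exec_time, deadline, a=1.0, b=1.0):
        if deadline <= 0:
            raise ValueError("deadline must be positive")
        return a * exec_time + b * (exec_time / deadline)

    # Example: a 100 s job is cheaper with a 500 s deadline than a 200 s one.
    print(job_cost(100, 200))  # 100.5
    print(job_cost(100, 500))  # 100.2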

Scheduling Algorithm

How to meet budget and deadline constraints?

Ensuring low run-time for the algorithm

Greedy algorithm

Complex solutions infeasible

Test run of the algorithm: 5 jobs, arriving at times t = 0, 5, 7, 9, 9, on a 3-node cluster (a sketch of the greedy admission test follows)
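
A hedged sketch of the greedy admission test, as we read it from this slide and the controller descriptions later in the deck (all names and the share model are ours):

    # Accept a job only if its budget covers the cost and the least-loaded
    # node can still finish it by its deadline under proportional sharing.
    def admit(job, nodes, a=1.0, b=1.0):
        cost = a * job["exec_time"] + b * (job["exec_time"] / job["deadline"])
        if cost > job["budget"]:
            return None                              # over budget: reject
        best = min(nodes, key=lambda n: n["load"])   # least-loaded node
        share = job["exec_time"] / job["deadline"]   # CPU fraction required
        if best["load"] + share > 1.0:
            return None                              # deadline unmeetable: reject
        best["load"] += share                        # assign the job
        return best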


LIBRA with PBS

Portable Batch System (PBS) as the Cluster Management Software (CMS):

Robust, portable, effective, extensible batch job queuing and resource management system

Supports different schedulers

Job accounting

Technical support



Setting up the PBS Cluster

Installation of Linux alongside Windows

Installation of SGE as well as PBS

Setting up a Network File System

Configuring GridSim in Java

Configuring PBSWeb

Setting up the Apache web server

PHP scripting for Apache

Setting up PostgreSQL

Setting up SSH

PBS Overview

Main components of PBS:

Job Server: pbs_server

Job Scheduler: pbs_sched

Job Executor & Resource Monitor: pbs_mom

The server accepts commands and communicates with the daemons:

qsub - submit a job

qstat - view queue and job status

qalter - change a job's attributes

qdel - delete a job
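
As a usage sketch, the same commands can be driven from a script (qsub and qstat are the standard PBS commands listed above; the job script name is hypothetical):

    # Submit a job with qsub (it prints the new job id), then query it with qstat.
    import subprocess

    job_id = subprocess.run(["qsub", "myjob.sh"], capture_output=True,
                            text=True, check=True).stdout.strip()
    print(subprocess.run(["qstat", job_id], capture_output=True,
                         text=True).stdout)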

Xpbs: GUI for PBS

[Screenshots of the Xpbs GUI]

Job Scheduling in PBS

Default FIFO scheduler in PBS:

FIFO: sort jobs by queuing time, running the earliest job first

Fair share: sort and schedule jobs based on past usage of the machine by the job owners

Round-robin: pick a job from each queue in turn

By key: sort jobs by a set of keys, e.g., shortest_job_first, smallest_memory_first


The Libra Scheduler

Job Input Controller:

Adding parameters at job submission time: deadline, budget, executionTime

Defining new attributes of the job

Job Acceptance and Assignment Controller:

Budget checked through the cost function

Admission control through deadline scheduling

The execution host with the minimum load and the ability to finish the job on time is selected

Equal share instead of minimum share (illustrated below)
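
A hedged illustration of the two share policies named above, as we read them (not a verbatim Libra excerpt): minimum share gives each job exactly the CPU fraction it needs to meet its deadline, while equal share splits the node evenly so spare capacity is not wasted:

    def minimum_shares(jobs):
        # Each job gets just enough CPU to finish by its deadline.
        return [j["exec_time"] / j["deadline"] for j in jobs]

    def equal_shares(jobs):
        # The node is split evenly among its jobs.
        return [1.0 / len(jobs)] * len(jobs)

    jobs = [{"exec_time": 40, "deadline": 100},
            {"exec_time": 30, "deadline": 300}]
    print(minimum_shares(jobs))  # [0.4, 0.1] -> half the node sits idle
    print(equal_shares(jobs))    # [0.5, 0.5] -> both jobs finish sooner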

The Libra Scheduler

Job Execution Controller:

Job run on the best node according to the algorithm

Cluster and node status updated: runTime, cpuLoad

Job Querying Controller:

Server, Scheduler, Exec Host, and Accounting Logs


PBS-Libra Web: Front-end for the Libra Engine

[Screenshots of the PBS-Libra Web front-end]


Simulations

Goal: measure the performance of the Libra Scheduler

Performance = ?

Maximize user satisfaction

Simulations

Simulation Software: modified GridSim (a grid resource management simulator)

GridSim Class Diagram


Simulations

Methodology:

Workload: 120 jobs with deadlines and budgets; job lengths from 1000 to 10000

Resources: a 10-node, single-processor (MIPS rating: 100), homogeneous cluster (a workload sketch follows)
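
A small sketch of this workload, assuming uniform distributions for the unspecified parameters (the slides give only the length range and the 100-MIPS rating; the deadline slack and budget ranges below are our assumptions):

    # 120 jobs of 1000-10000 MI; on a 100-MIPS node the minimum runtime is
    # length / 100 seconds (e.g., 1000 MI -> 10 s at full CPU share).
    import random

    def make_workload(n_jobs=120, mips=100):
        jobs = []
        for _ in range(n_jobs):
            length = random.randint(1000, 10000)               # job length in MI
            min_runtime = length / mips                        # seconds at full share
            deadline = min_runtime * random.uniform(1.0, 4.0)  # assumed slack
            budget = random.uniform(10.0, 100.0)               # assumed range
            jobs.append({"length": length, "deadline": deadline,
                         "budget": budget})
        return jobs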

Simulations

Assumptions:

Strict deadlines

Processing overhead due to the scheduler and clock interrupts is ignored

Scheduler simulated as a function:

Input: job size, deadline, budget

Output: accept/reject, node #, share allocated

Simulations

Compared:

Proportional Share

FIFO

Experiments:

120 jobs, 10 nodes

Increasing workload to 150 and 200 jobs

Increasing cluster size to 20 nodes

Simulation Results

120 jobs, 20 did not meet budget

100 Jobs, 10 Nodes: FIFO rejected 23; Proportional Share rejected 14

[Two charts: time vs. experiments (1-100), one for FIFO and one for Proportional Share]
Simulation Results

Increase workload to 200 jobs on the same 10-node cluster

200 Jobs, 10 Nodes: FIFO rejected 105; Proportional Share rejected 93

[Two charts: time vs. experiments (1-199), one for FIFO and one for Proportional Share]
Simulation Results

Scale the cluster up to 20 nodes

200 Jobs, 20 Nodes: FIFO rejected 35; Proportional Share rejected 23

[Two charts: time vs. experiments (1-199), one for FIFO and one for Proportional Share]
Simulation Results

Load on Node 4
[Chart: CPU utilization vs. time in s]

Simulation Results

Load on Node 7
[Chart: CPU utilization vs. time in s]

Simulation Results

Load on Node 9
[Chart: CPU utilization vs. time in s]

Simulation Results

Load on Node 6
[Chart: CPU utilization vs. time in s]
Conclusion & Future Work

Successfully implemented a Linux-based cluster that schedules jobs using PBS with our economy-driven Libra scheduler, and PBS-Libra Web as the front end

Successfully tested our scheduling policy

Proportional Share delivers more value to users

Future work:

Exploring other pricing mechanisms

Expanding the cluster with more nodes and with support for parallel jobs