Grid


Feb 17, 2014 (3 years and 7 months ago)


Eddie.Aronovich@cs.tau.ac.il

Grid Infrastructure

What is it?


Eddie Aronovich


Operating System course (TAU CS, Jan 2009)

3

SERVERS

Clients

IT all about IT


Hardware utilization



SOA & Web services


Decompose processing into services



Each service works independently



Main components:


Universal Description, Discovery and Integration (UDDI)

Simple Object Access Protocol (SOAP)

Web Services Description Language (WSDL)

W3C standard





THE WORLD NEEDS ONLY FIVE COMPUTERS


(Thomas J. Watson)



Google grid


Microsoft's live.com


Yahoo!


Amazon.com


eBay


Salesforce.com


Well, that's O(5) ;)




Greg Matter (http://blogs.sun.com/Gregp/entry/the_world_needs_only_five)


Scaling


Scale-up

Add more resources within the system

Does not require changes in the applications

Limited extension

Single point of failure

Scale-out

Add more systems

Architecture dependent (needs change of code)

Economical

How?


Split the operation into groups


Perform each group on a different machine
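
The scale-out recipe above (split the operation into groups, perform each group separately) can be sketched in a few lines of Python. This is only an illustration: worker threads stand in for the separate machines, and the names split_into_groups, process_group and scale_out are hypothetical, not taken from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(items, n_groups):
    """Split items into n_groups contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        end = start + size + (1 if i < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups


def process_group(group):
    """Stand-in for the work one machine would do on its share (assumption)."""
    return sum(x * x for x in group)


def scale_out(items, n_workers):
    """Process each group on its own worker, then combine the partial results."""
    groups = split_into_groups(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_group, groups))
    return sum(partial_results)
```

The final sum is the part that cannot be spread across workers, which is why the code has to be restructured for scale-out in the first place.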



How fast can parallelization be?

Let:

α be the proportion of the process that cannot be parallelized

P be the number of processors

S be the system speedup

Amdahl's law:

$$S = \frac{1}{\alpha + \frac{1 - \alpha}{P}}$$
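
A small numeric sketch of the formula above may help show how quickly the serial part dominates. The function name amdahl_speedup is an assumption made for this example; alpha and p correspond to α and P.

```python
def amdahl_speedup(alpha, p):
    """Speedup S = 1 / (alpha + (1 - alpha) / p) for serial fraction alpha on p processors."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (alpha + (1.0 - alpha) / p)


# With 10% serial work the speedup can never exceed 1 / alpha = 10,
# no matter how many processors are added.
speedups = {p: amdahl_speedup(0.1, p) for p in (1, 10, 100, 1000)}
```

Under these assumptions, going from 100 to 1000 processors barely changes the result, because the bound 1/α is already close.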


Cluster types

High availability

  Active-Active

  Active-Passive

  Heartbeat

Load balancing cluster

  Round robin (weighted/non-weighted)

  System status aware (session, CPU load, etc.)

Compute cluster

  Queuing system (Condor, Hadoop, OpenPBS, LSF, etc.)

  Single system image (ScaleMP, SSI, Mosix, nomad, etc.)
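
The weighted round-robin policy listed under load balancing clusters above can be sketched as follows. This is the simplest possible variant (each server is repeated according to its weight, without interleaving); the server names and weights are hypothetical.

```python
import itertools


def weighted_round_robin(servers):
    """Cycle over server names forever; a server with weight w appears w times per cycle.

    servers maps a server name to a positive integer weight.
    """
    expanded = [name for name, weight in servers.items() for _ in range(weight)]
    if not expanded:
        raise ValueError("at least one server with a positive weight is required")
    return itertools.cycle(expanded)


# Hypothetical pool: "big" should receive twice as many requests as "small".
dispatcher = weighted_round_robin({"big": 2, "small": 1})
first_six = list(itertools.islice(dispatcher, 6))
```

With all weights set to 1 this degenerates to plain (non-weighted) round robin. A status-aware balancer would instead pick the target from live measurements such as CPU load.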


Condor script

#################
# Sample script #
#################
Executable              = /bin/hostname
when_to_transfer_output = ON_EXIT_OR_EVICT
Log                     = {file name}.log
Error                   = err.$(Process)
Output                  = out.$(Process)
Requirements            = substr(Machine,0,4)=="dopp" && ARCH=="X86_64"
Arguments               = +-u
notification            = Complete
Universe                = VANILLA
Queue 10
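
In the script above, Queue 10 submits ten jobs, and $(Process) is replaced by the job's index, counting from 0. The Python sketch below only mimics that macro expansion to show which file names result; it is not how Condor itself is implemented, and expand_queue is a hypothetical helper.

```python
def expand_queue(template, count):
    """Return one settings dict per queued job, with $(Process) replaced by 0..count-1."""
    jobs = []
    for process in range(count):
        jobs.append(
            {key: value.replace("$(Process)", str(process)) for key, value in template.items()}
        )
    return jobs


# The two per-job settings from the sample script.
template = {"Error": "err.$(Process)", "Output": "out.$(Process)"}
jobs = expand_queue(template, 10)
output_files = [job["Output"] for job in jobs]
```

Each of the ten runs of /bin/hostname therefore writes to its own out.N and err.N file instead of overwriting a shared one.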

From a single PC to a Grid

Farm of PCs

Volunteer computing: CPU cycles made available by PC owners. Examples: Seti@home, Africa@home

Enterprise grid: Mutualization of resources in a company

Grid infrastructure: Internet + disk and storage resources + services for information management (data collection, transfer and analysis). Example: EGEE


Batch to On-Line scale

[Figure labels: gLite & Globus; dedicated resources; PBS, Torque; utility computing (Condor); hadoop]


Key Cloud Services Attributes


Off-site, third-party provider

Access via Internet

Minimal/no IT skills required to “implement”

Provisioning: self-service requesting; near real-time deployment; dynamic & fine-grained scaling

Fine-grained usage-based pricing model

UI: browser and successors

Web services APIs as system interface

Shared resources/common versions

Source: IDC, Sep 2008


What is “Grid”



What is Grid Computing?

There is no widely agreed definition.

Foster & Kesselman:

Computing resources are not administered centrally.

Open standards are used.

Non-trivial quality of service is achieved.


Other definitions

"the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations." (Plaszczak/Wellner)

"a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements" (Buyya)

"a service for sharing computer power and data storage capacity over the Internet." (CERN)


Virtual Organization


What’s a VO?


People in different organisations
seeking to cooperate and share
resources across their
organisational boundaries


Why establish a Grid?


Share data


Pool computers


Collaborate


The initial vision: “The Grid”


The present reality: Many “grids”


Each grid is an infrastructure
enabling one or more “virtual
organisations” to share
computing resources



[Figure: Institutes A to F, with virtual organisations VO 1 and VO 2 spanning several of them]

The Grid Metaphor


[Figure: GRID MIDDLEWARE linking, over the Internet and networks: visualising, workstation, mobile access, supercomputer and PC-cluster, data-storage, sensors, experiments]

Stand alone computer


Middleware components


The batch approach


[Figure: middleware components in the batch approach. User Interface (UI); Resource Broker (RB) node with Network Server, Workload Manager and Job Controller; Information Service; Replica Catalogue / Replica Location Server; Logging & Book-keeping; Authorization & Authentication; Computing Element (CE); Storage Element (SE). Labelled flows: publish, dataset info, job submit event, job query, job status, input “sandbox”, output “sandbox”, CE/SE characteristics & status.]

UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs)

Job status: submitted

edg-job-submit myjob.jdl

Myjob.jdl

JobType = "Normal";
Executable = "$(CMS)/exe/sum.exe";
InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
OutputSandbox = {"sim.err", "test.out", "sim.log"};
Requirements = other.GlueHostOperatingSystemName == "linux" &&
               other.GlueHostOperatingSystemRelease == "Red Hat 7.3" &&
               other.GlueCEPolicyMaxCPUTime > 10000;
Rank = other.GlueCEStateFreeCPUs;

Job Description Language (JDL) to specify job characteristics and requirements

Job status: submitted

NS: network daemon responsible for accepting incoming requests

[Figure labels: job and input sandbox files, RB storage]

Job status: submitted, waiting

Job submission

WM (Workload Manager): acts to satisfy the request

Job status: submitted, waiting

Match-Maker/Broker: where must this job be executed?

Matchmaker: responsible for finding the “best” CE for a job
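
The matchmaking step can be illustrated with the Requirements and Rank expressions from the JDL file shown earlier: Requirements filters out unsuitable CEs, and Rank orders the remaining ones. The sketch below is a simplified model, not the Resource Broker's actual algorithm; the ComputingElement class, its field names and the sample CEs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    """Hypothetical summary of what a CE publishes about itself."""
    name: str
    os_name: str
    max_cpu_time: int
    free_cpus: int


def requirements(ce):
    """Mirrors part of the JDL Requirements: a Linux CE allowing more than 10000 CPU time."""
    return ce.os_name == "linux" and ce.max_cpu_time > 10000


def rank(ce):
    """Mirrors the JDL Rank: prefer the CE with the most free CPUs."""
    return ce.free_cpus


def match(ces, requirements, rank):
    """Return the best-ranked CE that satisfies the requirements, or None if none does."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None
    return max(candidates, key=rank)


# Hypothetical CEs for illustration only.
ces = [
    ComputingElement("ce-a", "linux", 20000, 4),
    ComputingElement("ce-b", "linux", 5000, 64),
    ComputingElement("ce-c", "linux", 50000, 12),
]
best = match(ces, requirements, rank)
```

Here ce-b has the most free CPUs but fails the CPU-time requirement, so the job goes to ce-c.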

Match-Maker/Broker:

Where are the needed data (on which SEs)?

What is the status of the Grid?

Match-Maker/Broker: CE choice

Job Adapter: responsible for the final “touches” to the job before performing submission (e.g. creation of wrapper script, PFN, etc.)

Job Controller: responsible for the actual job management operations (done via CondorG)

Job status: submitted, waiting, ready

“Compute element” reminder:

Grid gate node: Globus gatekeeper

Local resource management system: Condor / PBS / LSF master

Homogeneous set of worker nodes

[Figure labels: job request, info system (I.S.), logging, gridmapfile]

Job status: submitted, waiting, ready, scheduled

“Grid enabled” data transfers/accesses

[Figure labels: job, input sandbox files]

Job status: submitted, waiting, ready, scheduled, running

[Figure labels: output sandbox files]

Job status: submitted, waiting, ready, scheduled, running, done

edg-job-get-output <dg-job-id>

[Figure labels: output sandbox files]

Job status: submitted, waiting, ready, scheduled, running, done, cleared
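
The job states shown across the submission slides form a simple linear sequence, which can be written down as a tiny state machine. This sketch covers only the success path that appears on the slides; failure or cancellation states are not shown there and are left out. The names JOB_STATES and next_state are assumptions of this example.

```python
# States in the order they appear on the slides.
JOB_STATES = ["submitted", "waiting", "ready", "scheduled", "running", "done", "cleared"]


def next_state(state):
    """Return the state that follows the given one on the success path."""
    index = JOB_STATES.index(state)  # raises ValueError for an unknown state
    if index == len(JOB_STATES) - 1:
        raise ValueError(f"'{state}' is the final state")
    return JOB_STATES[index + 1]


def lifecycle(start="submitted"):
    """List every state a job passes through from start to the final state."""
    path = [start]
    while path[-1] != JOB_STATES[-1]:
        path.append(next_state(path[-1]))
    return path
```

In this reading, retrieving the output with edg-job-get-output is what moves a job from done to cleared.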

Job monitoring

LM (Log Monitor): parses the CondorG log file (where CondorG logs info about jobs) and notifies LB

LB (Logging & Bookkeeping): receives and stores job events; processes corresponding job status

edg-job-status <dg-job-id>

edg-job-get-logging-info <dg-job-id>

Grid Operation and Security by Eddie Aronovich, Mar 2008

Approaches to Security: 1

The Poor Security House

Approaches to Security: 2

The Paranoid Security House

Approaches to Security: 3

The Realistic Security House


Mapping certificate to local user


Sites use their local accounting systems

A pool of local users is dedicated to the Grid

Each Grid user is mapped using the gridmap file or VOMS

The mapping can implement local policy on external users
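
A gridmap file maps a certificate subject (distinguished name) to a local account, one mapping per line, with the quoted subject first and the account name second. The sketch below parses that basic form only; the sample names and accounts are invented, and real deployments may use extra features (such as pool accounts or VOMS attributes) that are not handled here.

```python
import shlex


def parse_gridmap(text):
    """Parse lines of the form: "<certificate subject>" <local account>."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the subject
        if len(parts) != 2:
            raise ValueError(f"unexpected gridmap line: {line!r}")
        subject, account = parts
        mapping[subject] = account
    return mapping


# Hypothetical entries for illustration only.
sample = '''
# subject                                  local account
"/C=IL/O=Example Org/CN=Alice Example"     grid001
"/C=IL/O=Example Org/CN=Bob Example"       grid002
'''
accounts = parse_gridmap(sample)
```

Because the site owns this table, it decides which external users get a local account at all, which is how the mapping enforces local policy.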


Certificate Request

User generates a public/private key pair. The private key is kept encrypted on local disk.

User sends the public key to the CA along with proof of identity (the certificate request).

CA confirms identity, signs the certificate and sends it back to the user.

Slide based on a presentation given by Carl Kesselman at GGF Summer School 2004



Inside the Certificate


Standard (X.509) defined format.

User identification (e.g. full name).

User's public key.

A “signature” from a CA, created by encoding a unique string (a hash) generated from the user's identification, the user's public key and the name of the CA. The signature is encoded using the CA’s private key. This has the effect of:

Proving that the certificate came from the CA.

Vouching for the user's identification.

Vouching for the binding of the user's public key to their identification.

[Figure: certificate fields: Name, Issuer (CA), Public Key, Signature]


Mutual Authentication


A sends their certificate.

B verifies the CA signature in A’s certificate.

B sends A a challenge string (a random phrase).

A encrypts the challenge string with their private key.

A sends the encrypted challenge to B.

B uses A’s public key to decrypt the challenge.

B compares the decrypted string with the original challenge.

If they match, B has verified A’s identity and A cannot repudiate it.

For the authentication to be mutual, the same exchange is repeated with the roles of A and B swapped.
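
The challenge-response exchange above can be tried out with a toy RSA key. The numbers below are a classroom-sized key pair (n = 61 × 53 = 3233, e = 17, d = 2753), chosen only so that the arithmetic is visible; they offer no security, and real certificates use much larger keys plus padding schemes through a cryptographic library. Function names are hypothetical.

```python
import secrets

# Toy RSA key pair for A (assumption for illustration; completely insecure).
N = 3233  # modulus, 61 * 53
E = 17    # public exponent, published in A's certificate
D = 2753  # private exponent, known only to A


def make_challenge():
    """B picks a random challenge in the range 2..N-1."""
    return secrets.randbelow(N - 2) + 2


def respond(challenge, private_exponent=D, modulus=N):
    """A transforms the challenge with the private key."""
    return pow(challenge, private_exponent, modulus)


def verify(challenge, response, public_exponent=E, modulus=N):
    """B undoes the transformation with A's public key and compares with the original."""
    return pow(response, public_exponent, modulus) == challenge


challenge = make_challenge()
response = respond(challenge)
accepted = verify(challenge, response)
```

Only the holder of the private exponent can produce a response that verifies, which is what lets B conclude it is talking to A.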


Proxy certificate


Avoid re-entering the passphrase by creating a proxy

Proxy consists of a new certificate and a private key

Proxy certificate contains the owner's identity (modified)

Remote party receives the proxy's certificate (signed by the owner) and the owner's certificate

Proxy certificate has a limited lifetime

Chain of trust from the CA to the proxy through the owner
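
The chain of trust can be pictured as a walk from the proxy certificate up to the CA, checking at each step that the issuer is the next certificate's subject and that nothing has expired. The sketch below models only that bookkeeping; it deliberately skips the cryptographic signature checks a real verifier performs, and the Cert class, field names and sample subjects are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Cert:
    """Hypothetical, heavily simplified certificate record."""
    subject: str
    issuer: str
    not_after: datetime


def chain_is_trusted(chain, trusted_ca, now):
    """Check a chain ordered proxy first, owner next, up to a cert issued by trusted_ca.

    Signature verification is omitted in this sketch (assumption).
    """
    if not chain:
        return False
    for cert, parent in zip(chain, chain[1:]):
        if cert.issuer != parent.subject:
            return False  # link in the chain is broken
    if chain[-1].issuer != trusted_ca:
        return False  # chain does not end at the trusted CA
    return all(cert.not_after > now for cert in chain)


now = datetime(2009, 1, 15, 12, 0)
owner = Cert("/CN=Alice Example", "/CN=Example CA", now + timedelta(days=300))
proxy = Cert("/CN=Alice Example/CN=proxy", "/CN=Alice Example", now + timedelta(hours=12))
trusted = chain_is_trusted([proxy, owner], "/CN=Example CA", now)
```

The short lifetime on the proxy limits the damage if its unencrypted private key is stolen.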

Grids in Europe

Prof. Dieter KRANZLMUELLER, EGEE 08, Istanbul, Turkey

www.eu-egi.eu

To be continued
