Introduction to Grid & Cluster Computing



Sriram Krishnan, Ph.D.

sriram@sdsc.edu

Motivation: NBCR Example

[Architecture diagram: a set of biomedical applications (PMV, ADT, Vision, Continuity, APBSCommand, APBS, Gtomo2, TxBR, Autodock, GAMESS, QMView) is accessed through web portals and rich clients, including the Telescience Portal, via web services, workflow tools, and middleware layered over the cyberinfrastructure resources.]

Cluster Resources


“A computer cluster is a group of tightly coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer.” [Wikipedia]

Typically built using commodity off-the-shelf hardware (processors, networking, etc.)

Differs from traditional “supercomputers”

Now at more than 70% of deployed Top500 machines

Useful for: high availability, load balancing, scalability, visualization, and high performance

Grid Computing


“Coordinated resource sharing and problem solving in dynamic multi-institutional virtual organizations.” [Foster, Kesselman, Tuecke]

Coordinated - multiple resources working in concert, e.g. disk & CPU, or instruments & databases, etc.

Resources - compute cycles, databases, files, application services, instruments

Problem solving - focus on solving scientific problems

Dynamic - environments that are changing in unpredictable ways

Virtual Organization - resources spanning multiple organizations and administrative, security, and technical domains

Grids are not the same as Clusters!


Foster’s 3-point checklist

Resources not subject to centralized control

Use of standard, open, general-purpose protocols and interfaces

Delivery of non-trivial qualities of service

Grids are typically made up of multiple clusters

Popular Misconception


Misconception: Grids are all about CPU cycles

CPU cycles are just one aspect; others include:

Data: for publishing and accessing large collections of data, e.g. the Geosciences Network (GEON) Grid

Collaboration: for sharing access to instruments (e.g. the TeleScience Grid) and collaboration tools (e.g. Global MMCS at IU)

SETI@Home


Uses 1000s of internet-connected PCs to help in the search for extraterrestrial intelligence

When the computer is idle, the software downloads a ~1/2 MB chunk of data for analysis (a toy sketch of this loop follows below)

Results of the analysis are sent back to the SETI team and combined with those of 1000s of other participants

Largest distributed computation project in existence

Total CPU time: 2,433,979.781 years

Users: 5,436,301

(Statistics from 2006)
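Since the slides describe the client loop only at a high level, here is a toy sketch of the volunteer-computing cycle; the work-unit fetch, the analysis, and the result upload are all stubbed stand-ins, not the real SETI@Home protocol.

```python
# Toy sketch of the volunteer-computing loop (stubbed stand-ins only;
# the real SETI@Home client/server protocol is not shown in the slides).
import os

CHUNK_SIZE = 512 * 1024  # ~1/2 MB work unit, as described above

def fetch_workunit() -> bytes:
    # Stand-in for downloading a data chunk from the project server.
    return os.urandom(CHUNK_SIZE)

def analyze(chunk: bytes) -> int:
    # Stand-in for the real signal analysis: count high-valued samples.
    return sum(1 for b in chunk if b > 250)

def report(result: int) -> None:
    # Stand-in for uploading the result, to be combined with 1000s of others.
    print(f"candidate count: {result}")

for _ in range(3):  # the real client loops for as long as the machine is idle
    report(analyze(fetch_workunit()))
```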


NCMIR TeleScience Grid

* Slide courtesy TeleScience folks

NBCR Grid

[Architecture diagram: clients and tools (Gemstone, PMV/Vision, Kepler) talk to application services, security services (GAMA), and state management, which use Globus to reach a Condor pool, an SGE cluster, and a PBS cluster.]

Day 1 - Using Grids and Clusters: Job Submission

Scenario 1 - Clusters:

Upload data to the remote cluster using scp

Log on to the said cluster using ssh

Submit the job via the command line to schedulers such as Condor or the Sun Grid Engine (SGE); a minimal sketch of this scenario follows below

Scenario 2 - Grids:

Upload data to the Grid resource using GridFTP

Submit the job via Globus command-line tools (e.g. globusrun) to remote resources

Globus services communicate with the resource-specific schedulers
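As a concrete illustration of Scenario 1, here is a minimal sketch that drives scp, ssh, and an SGE qsub submission from Python; the host name, account, and job script are hypothetical stand-ins.

```python
# Minimal sketch of Scenario 1 (clusters). The head node, account, and
# job script below are hypothetical; qsub is SGE's submission command.
import subprocess

HOST = "alice@cluster.example.edu"  # hypothetical cluster head node

# 1. Upload the input data and the SGE job script to the remote cluster.
subprocess.run(["scp", "input.dat", "job.sh", f"{HOST}:run/"], check=True)

# 2. "Log on" and submit the job to the scheduler; 'ssh host cmd' runs
#    the command remotely and returns its output.
result = subprocess.run(["ssh", HOST, "cd run && qsub job.sh"],
                        check=True, capture_output=True, text=True)
print(result.stdout)  # e.g. 'Your job 4242 ("job.sh") has been submitted'
```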


Day 1 - Using Grids & Clusters: Security

Day 1 - Using Grids & Clusters: User Interfaces

Day 2 - Managing Cluster Environments

Clusters are great price/performance computational engines

Can be hard to manage without experience

Failure rate increases with cluster size (see the illustration below)

Not cost-effective if maintenance is more expensive than the cluster itself

System administrators can cost more than clusters (1 Tflops cluster < $100,000)
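To see why the failure rate grows with cluster size: if each node independently fails on a given day with probability p, an N-node cluster sees at least one failure with probability 1 - (1 - p)^N. The numbers below are illustrative assumptions, not measurements.

```python
# Illustrative arithmetic: probability of at least one node failing per
# day in an N-node cluster, assuming an independent per-node rate p.
p = 0.001  # hypothetical per-node daily failure probability
for n in (1, 128, 1024):
    print(f"{n:5d} nodes -> {1 - (1 - p) ** n:.3f}")
# 1 node ~0.001, 128 nodes ~0.120, 1024 nodes ~0.641
```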

Day 2 - Rocks (Open Source Clustering Distribution)

Technology transfer of commodity clustering to application scientists

Making clusters easy

Scientists can build their own supercomputers

Rocks distribution is a set of CDs

Red Hat Enterprise Linux

Clustering software (PBS, SGE, Ganglia, Globus)

Highly programmatic software configuration management

http://www.rocksclusters.org

Day 2 - Rocks Rolls

Day 3 - Advanced Usage Scenarios: Workflows

Scientific workflows emerged as an answer to the need to combine multiple cyberinfrastructure components into automated process networks (see the sketch after this list)

Combination of:

Data integration, analysis, and visualization steps

Automated “scientific process”

Promotes scientific discovery
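To make the idea concrete, here is a toy sketch of a workflow as an automated process network; the stage functions are hypothetical stand-ins, not a real Kepler workflow.

```python
# Toy workflow sketch: each stage is a function, and the workflow wires
# the stages into one automated pipeline. Systems such as Kepler add
# scheduling, provenance, and distributed execution on top of this idea.

def integrate(sources):
    """Data integration: merge records from several input sources."""
    return [record for source in sources for record in source]

def analyze(records):
    """Analysis: compute a simple summary statistic."""
    return sum(records) / len(records)

def visualize(result):
    """Visualization: render the result (here, just print it)."""
    print(f"mean = {result:.3f}")

def run_workflow(sources):
    # The automated "scientific process": integration -> analysis -> viz.
    visualize(analyze(integrate(sources)))

run_workflow([[1.0, 2.0], [3.0, 4.0]])  # prints: mean = 2.500
```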

Day 3 - The Big Picture: Scientific Workflows

Example: John Blondin, NC State; Astrophysics, Terascale Supernova Initiative (SciDAC, DOE)

[Figure: from a conceptual SWF (“napkin drawing”) to an executable SWF (executable workflow).]

Source: Mladen Vouk (NCSU)

Day 3 - Kepler Workflows: A Closer Look

Day 3 - Advanced Usage Scenarios: MetaScheduling

Local schedulers are responsible for load balancing and resource sharing within each local administrative domain

Meta-schedulers are responsible for querying, negotiating access to, and managing resources that exist within different administrative domains in Grid systems (a toy sketch follows below)
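Here is a toy sketch of that division of labor; the classes are hypothetical illustrations, not the CSF4 API. The meta-scheduler queries the load in each domain and forwards the job to the least-loaded local scheduler.

```python
# Toy sketch of meta-scheduling (hypothetical classes, not a real API).
# Each local scheduler manages only its own administrative domain; the
# meta-scheduler queries across domains and forwards jobs.
from dataclasses import dataclass

@dataclass
class LocalScheduler:
    domain: str
    queued_jobs: int

    def submit(self, job: str) -> str:
        self.queued_jobs += 1
        return f"{job} queued in domain {self.domain}"

class MetaScheduler:
    def __init__(self, schedulers):
        self.schedulers = schedulers  # resources in different domains

    def submit(self, job: str) -> str:
        # Query every domain and pick the least-loaded local scheduler.
        target = min(self.schedulers, key=lambda s: s.queued_jobs)
        return target.submit(job)

meta = MetaScheduler([LocalScheduler("sdsc", 12), LocalScheduler("jlu", 3)])
print(meta.submit("autodock-run"))  # -> autodock-run queued in domain jlu
```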

Day 3 - MetaSchedulers: CSF4

What is the CSF Meta-Scheduler?

Community Scheduler Framework

CSF4 is a group of Grid services hosted inside the Globus Toolkit (GT4)

CSF4 is fully WSRF-compliant

Open source project, available at http://sourceforge.net/projects/gcsf

The CSF4 development team is from Jilin University, PRC

Day 3 - CSF4 Architecture

[Architecture diagram: the CSF4 services (Queuing Service, Job Service, Reservation Service, and Resource Manager Factory Service, with Resource Manager services for LSF and GRAM) dispatch jobs through WS-GRAM and the GT2 environment's GateKeeper (Gram PBS, Gram SGE, Gram Condor, Gram LSF, Gram Fork) to local machines running PBS, SGE, Condor, and LSF; WS-MDS supplies meta information about the Grid environment.]

Day 4 - Accessing TeraScale Resources

I need more resources! What are my options?

TeraGrid: “With 20 petabytes of storage, and more than 280 teraflops of computing power, TeraGrid combines the processing power of supercomputers across the continent”

PRAGMA: “To establish sustained collaborations and advance the use of grid technologies in applications among a community of investigators working with leading institutions around the Pacific Rim”

Day 4 - TeraGrid

TeraGrid is a “top-down”, planned Grid: the Extensible Terascale Facility

Members: IU, ORNL, NCSA, PSC, Purdue, SDSC, TACC, ANL, NCAR

280 Tflops of computing capability

30 PB of distributed storage

High performance networking between partner sites

Linux-based software environment, uniform administration

Focus is a national, production Grid

PRAGMA Grid Member Institutions

31 institutions in 15 countries/regions (+ 7 in preparation)

[Map of member institutions:]

Japan: AIST, OsakaU, UTsukuba, TITech

Korea: KISTI

China: JLU, CNIC, GUCAS, LZU

Taiwan: ASGC, NCHC

Hong Kong: CUHK

Thailand: NECTEC, ThaiGrid

Vietnam: HCMUT, IOIT-HCM

Malaysia: MIMOS, USM

Singapore: BII, IHPC, NGO, NTU

India: UoHyd

Australia: MU, APAC, QUT

New Zealand: BESTGrid

Switzerland: UZurich

USA: SDSC, UUtah, NCSA, BU

Mexico: CICESE, UNAM

Chile: UCN, UChile

Costa Rica: ITCR

Puerto Rico: UPRM

Track 1: Agenda (9AM-12PM at PFBH 161)

Tues, July 31: Basic Cluster and Grid Computing Environment

Wed, Aug 1: Rocks Clusters and Application Deployment

Thurs, Aug 2: Workflow Management and MetaScheduling

Fri, Aug 3: Accessing National and International TeraScale Resources