CSE 160/Berman

Grid Computing 1

Reading: Grid Book, Chapters 1, 2, 3, 22

"Implementing Distributed Synthetic Forces Simulations in Metacomputing Environments", Brunett, Davis, Gottschalk, Messina, Kesselman

http://www.globus.org

Outline

What is Grid computing?

Grid computing applications

Grid computing history

Issues in Grid computing

Condor, Globus, Legion

The next step

What is Grid Computing?

A computational Grid is a collection of distributed, possibly heterogeneous resources which can be used as an ensemble to execute large-scale applications.

A computational Grid is also called a "metacomputer".

Computational Grids

The term "computational grid" comes from an analogy with the electric power grid:

  Electric power is ubiquitous.

  You don't need to know the source of the power (transformer, generator) or the power company that supplies it.

The analogy falls down in the area of performance.

The search for cycles in HPC is ever-present. Two foci of research:

  "In the box" parallel computers -- PetaFLOPS architectures

  Increasing development of infrastructure and middleware to leverage the performance potential of distributed computational Grids

Grid Applications

Distributed Supercomputing

  Distributed supercomputing applications couple multiple computational resources (supercomputers and/or workstations).

  Examples include:

    SF Express (large-scale modeling of battle entities with complex interactive behavior for distributed interactive simulation)

    Climate modeling (high resolution, long time scales, complex models)

Distributed Supercomputing Example: SF Express

SF Express (Synthetic Forces Express) is a large-scale distributed simulation of the behavior and movement of entities (tanks, trucks, airplanes, etc.) for interactive battle simulation.

Entities require information about:

  State of the terrain

  Location and state of other entities

  Information is updated several times a second.

Interest management allows entities to look only at relevant information, enabling scalability (see the sketch below).
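To make interest management concrete, here is a minimal Python sketch. It assumes a simple scheme in which the terrain is divided into square cells and each entity subscribes only to the cells around its position; the cell size, class, and method names are illustrative, not taken from SF Express.

```python
from collections import defaultdict

CELL_SIZE = 10_000.0  # meters per interest cell (illustrative value)

def cell_of(x, y):
    """Map a position to the terrain cell that contains it."""
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))

class InterestManager:
    """Route state updates only to entities whose interest set
    covers the cell where the update happened."""

    def __init__(self):
        self.subscribers = defaultdict(set)   # cell -> set of entity ids

    def subscribe(self, entity_id, x, y, radius_cells=1):
        """Entity declares interest in its own cell and its neighbors."""
        cx, cy = cell_of(x, y)
        for dx in range(-radius_cells, radius_cells + 1):
            for dy in range(-radius_cells, radius_cells + 1):
                self.subscribers[(cx + dx, cy + dy)].add(entity_id)

    def publish(self, x, y, state):
        """Return the ids of entities that should see this update."""
        return self.subscribers.get(cell_of(x, y), set())

# Usage: a tank at (12_000, 5_000) only sees updates from nearby cells.
im = InterestManager()
im.subscribe("tank-1", 12_000, 5_000)
print(im.publish(15_000, 8_000, state={"heading": 90}))   # {'tank-1'}
print(im.publish(90_000, 90_000, state={"heading": 0}))   # set()
```

The point is that the cost of delivering an update scales with the number of nearby subscribers rather than with the total number of entities in the simulation.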


SF Express

Large-scale SF Express run goals:

  Simulate 50,000 entities in 8/97 and 100,000 entities in 3/98.

  Increase fidelity and resolution of the simulation over previous runs.

  Improve:

    Refresh rate

    Training environment responsiveness

    Number of automatic behaviors

  Ultimately use the simulation for real-time planning as well as training.

Large-scale runs are extremely resource-intensive.

SF Express Programming Issues

How should entities be mapped to computational resources?

Entities receive information based on "interests":

  Communication is reduced and localized based on interest management.

A consistency model for entity information must be developed:

  Which entities can/should be replicated?

  How should updates be performed?


SF Express Distributed Application Architecture

(Diagram of the SF Express distributed application architecture: D = data server, I = interest management, R = router, S = simulation node.)

50,000 entity SF Express Run

2 large-scale simulations run on August 11, 1997:

Site            Hardware        Processors   Entities, first run   Entities, second run
Caltech         HP Exemplar        256             13,095                12,182
ORNL            Intel Paragon     1024             16,695                15,996
NASA Ca         IBM SP2            139              5,464                 5,637
CEWES, Va       IBM SP2            229              9,739                 9,607
Maui            IBM SP2            128              5,056                 7,027
HP/Convex, Tx   HP Exemplar        128              5,348                 6,733
Total                             1904             55,397                57,182

50,000 entity SF Express Run

The simulation decomposed the terrain (Saudi Arabia, Kuwait, Iraq) contiguously among supercomputers (a sketch of such a decomposition follows below).

Each supercomputer simulated a specific area and exchanged interest and state information with the other supercomputers.

All data exchanges were flow-controlled.

The supercomputers were fully interconnected and dedicated to the experiment.

Success depended on "moderate to significant system administration, interventions, competent system support personnel, and numerous phone calls."

Subsequent Globus runs focused on improving data and control management and on operational issues for the wide area.
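As a rough illustration of contiguous terrain decomposition, the Python sketch below splits the east-west extent of the terrain into contiguous bands whose widths are proportional to each site's processor count, then maps an entity to the band containing it. The proportional-split rule and the terrain extent are assumptions for illustration, not the actual SF Express mapping, though the processor counts come from the table above.

```python
def split_terrain(extent_km, sites):
    """Divide [0, extent_km) into contiguous bands proportional
    to each site's processor count. Returns (site, lo, hi) tuples."""
    total = sum(procs for _, procs in sites)
    bands, lo = [], 0.0
    for name, procs in sites:
        hi = lo + extent_km * procs / total
        bands.append((name, lo, hi))
        lo = hi
    return bands

def site_for(x_km, bands):
    """Find the site whose band contains coordinate x_km."""
    for name, lo, hi in bands:
        if lo <= x_km < hi:
            return name
    return bands[-1][0]   # clamp entities at the far edge

# Processor counts from the August 1997 runs (see table above).
sites = [("Caltech", 256), ("ORNL", 1024), ("NASA Ca", 139),
         ("CEWES", 229), ("Maui", 128), ("HP/Convex", 128)]

bands = split_terrain(extent_km=1500.0, sites=sites)
print(site_for(200.0, bands))    # falls in one of the westernmost bands
```

Contiguity matters because entities interact mostly with nearby entities, so most interest traffic stays within a single supercomputer.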

High-Throughput Applications

The Grid is used to schedule large numbers of independent or loosely coupled tasks, with the goal of putting unused cycles to work.

High-throughput applications include RSA key cracking, SETI@home (detection of extra-terrestrial intelligence), and MCell.

High-Throughput Applications

The biggest master/slave parallel program in the world: master = website, slaves = individual computers (a minimal work-queue sketch follows below).
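A minimal sketch of the master/worker pattern behind high-throughput computing, assuming fully independent tasks and workers that pull work whenever they are idle. The queue-based design and the compute stub are illustrative; real systems such as Condor or SETI@home add fault tolerance, result validation, and wide-area transport on top of this basic shape.

```python
import queue
import threading

def master(tasks):
    """Fill a shared queue with independent work units."""
    work = queue.Queue()
    for t in tasks:
        work.put(t)
    return work

def compute(task):
    # Stand-in for one real work unit (e.g. testing one key, one data chunk).
    return sum(range(task)) % 97

def worker(work, results):
    """Pull tasks until the queue is drained; tasks are independent,
    so workers never need to coordinate with each other."""
    while True:
        try:
            task = work.get_nowait()
        except queue.Empty:
            return
        results.append((task, compute(task)))

tasks = list(range(1, 101))          # 100 independent work units
work = master(tasks)
results = []
threads = [threading.Thread(target=worker, args=(work, results)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{len(results)} tasks completed by 4 workers")
```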

High-Throughput Example: MCell

MCell is a Monte Carlo simulation of cellular microphysiology. The simulation is implemented as a large-scale parameter sweep.

MCell

MCell architecture: simulations are performed by independent processors with distinct parameter sets and shared input files (a parameter-sweep sketch follows below).
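A parameter sweep simply enumerates every combination of parameter values and runs one independent simulation per combination. The sketch below is a minimal illustration; the parameter names and the run_simulation stub are hypothetical, not MCell's actual inputs.

```python
from itertools import product

# Hypothetical sweep dimensions; MCell's real parameters differ.
parameter_grid = {
    "diffusion_coeff": [1e-6, 2e-6, 4e-6],
    "release_rate":    [10, 100, 1000],
    "random_seed":     list(range(5)),
}

def run_simulation(params, shared_input="surface_mesh.dat"):
    """Stub for one independent MCell-style run: every task reads the
    same shared input file but gets its own parameter set."""
    return {"params": params, "input": shared_input, "result": None}

def sweep(grid):
    """Yield one task per point in the cross product of all parameters."""
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        yield dict(zip(names, values))

tasks = list(sweep(parameter_grid))
print(len(tasks), "independent simulations")   # 3 * 3 * 5 = 45
results = [run_simulation(t) for t in tasks]   # farm these out on the Grid
```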

MCell Programming Issues

How should we assign tasks to processors to optimize locality? (A sketch follows below.)

How can we use partial results during execution to steer the computation?

How do we mine all the resulting data from the experiments for results

  during execution?

  after execution?

How can we use all available resources?
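For the locality question, one simple heuristic is to send a task to a host that already holds the shared input file it needs, falling back to the least-loaded host (which must then stage the file) otherwise. The sketch below assumes we know each host's cached files and queue length; the data structures and host names are illustrative, not from an actual MCell scheduler.

```python
def assign(task_input, hosts):
    """Pick a host for a task whose shared input file is task_input.

    hosts: dict host_name -> {"cached": set of file names,
                              "queued": number of tasks already assigned}
    Prefer hosts that already cache the file; break ties by queue length.
    """
    with_file = [h for h, info in hosts.items() if task_input in info["cached"]]
    candidates = with_file or list(hosts)          # fall back to any host
    best = min(candidates, key=lambda h: hosts[h]["queued"])
    hosts[best]["queued"] += 1
    if task_input not in hosts[best]["cached"]:
        hosts[best]["cached"].add(task_input)      # file gets staged there
    return best

hosts = {
    "ucsd-ws1": {"cached": {"mesh_A.dat"}, "queued": 2},
    "ucsd-ws2": {"cached": {"mesh_B.dat"}, "queued": 0},
    "utk-sp2":  {"cached": set(),          "queued": 1},
}
print(assign("mesh_A.dat", hosts))   # ucsd-ws1: already has the file
print(assign("mesh_C.dat", hosts))   # ucsd-ws2: no one has it, least loaded
```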

Data-Intensive Applications

The focus is on synthesizing new information from large amounts of physically distributed data.

Examples include NILE (a distributed system for high-energy physics experiments using data from CLEO), SAR/SRB applications (a Grid version of MS TerraServer), and digital library applications.

Data-Intensive Example: SARA

SARA (Synthetic Aperture Radar Atlas) is an application developed at JPL and SDSC.

Goal: assemble and process files for the user's desired image.

  Radar data is organized into tracks.

  The user selects a track of interest and the properties to be highlighted.

  Raw data is filtered and converted to an image format.

  The image is displayed in a web browser.


SARA Application Architecture

The application structure is focused on optimizing the delivery and processing of distributed data.

Computation servers and data servers are logical entities, not necessarily different nodes.

(Diagram: client, compute servers, and data servers.)

SARA Programming Issues

Which data server should replicated data be accessed from? (A sketch follows below.)

Should computation be done at the data server, should data be moved to a compute server, or something in between?

How big are the data files and how often will they be accessed?

(Diagram: OGI, UTK, and UCSD sites with AppLeS/NWS.)
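One way to frame the first two questions is to estimate transfer time from each replica using measured bandwidth (the kind of forecast a service like the Network Weather Service provides), then compare moving the data against moving the computation. The bandwidths, host names, and cost model below are illustrative assumptions, not measurements from SARA.

```python
def transfer_time(size_mb, bandwidth_mbps):
    """Seconds to move size_mb megabytes over a link of bandwidth_mbps megabits/s."""
    return size_mb * 8.0 / bandwidth_mbps

def best_replica(size_mb, replicas):
    """Choose the data server with the smallest predicted transfer time.
    replicas: dict server -> measured bandwidth to the client (Mb/s)."""
    return min(replicas, key=lambda s: transfer_time(size_mb, replicas[s]))

def plan(size_mb, replicas, local_compute_s, remote_compute_s):
    """Compare 'move the data to a compute server' against
    'filter at the data server and ship only the finished image back'."""
    server = best_replica(size_mb, replicas)
    move_data = transfer_time(size_mb, replicas[server]) + local_compute_s
    move_code = remote_compute_s + transfer_time(5.0, replicas[server])  # ~5 MB image back
    choice = "compute locally" if move_data <= move_code else "compute at data server"
    return choice, server

# Illustrative bandwidths (Mb/s) from three replica sites to the client.
replicas = {"ucsd": 80.0, "utk": 20.0, "ogi": 35.0}
print(plan(size_mb=400.0, replicas=replicas, local_compute_s=12.0, remote_compute_s=15.0))
```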

TeleImmersion

The focus is on the use of immersive virtual reality systems over a network.

Teleimmersion combines generators, data sets, and simulations remote from the user's display environment.

It is often used to support collaboration.

Examples include interactive scientific visualization ("being there with the data"), industrial design, and art and entertainment.


Teleimmersion Example

Combustion system modeling:

  A shared collaborative space

  Links people at multiple locations

  Share and steer scientific simulations on a supercomputer

  Combustion code developed by Lori Freitag at ANL

  Boiler application used to troubleshoot and design better products

(Image: Chicago and San Diego sites.)

Early Experiences with Grid Computing

Gigabit Testbeds Program

  In the late 80's and early 90's, the gigabit testbed program was developed as a joint NSF, DARPA, and CNRI (Corporation for National Research Initiatives, Bob Kahn) initiative.

  Goals were to

    investigate potential architecture for a gigabit/sec network testbed

    explore usefulness for end users

Gigabit Testbeds

In the early 90's, 6 testbeds were formed:

  CASA (southwest)

  MAGIC (midwest)

  BLANCA (midwest)

  AURORA (northeast)

  NECTAR (northeast)

  VISTANET (southeast)

Each had a unique blend of research in applications and in networking and computer science.

Gigabit Testbeds

Blanca

  Sites: NCSA, UIUC, UCB, UWisc, AT&T

  Hardware: Experimental ATM switches running over experimental 622 Mb/s and 45 Mb/s circuits developed by AT&T and the universities

  Application focus: Virtual environments, remote visualization and steering, multimedia digital libraries

  Remarks: Network spanned the US (UCB to AT&T). Network research included distributed virtual memory, real-time protocols, congestion control, signaling protocols, etc.

Vistanet

  Sites: MCNC, UNC, BellSouth

  Hardware: ATM network at OC-12 (622 Mb/s) interconnecting HIPPI local area networks

  Application focus: Radiation treatment planning applications involving a supercomputer, a remote instrument (radiation beam), and visualization

  Remarks: Medical personnel planned radiation beam orientation using a supercomputer. Extended the planning process from 2 beams in 2 dimensions to multiple beams in 3 dimensions.

Nectar

  Sites: CMU, Bell Atlantic, Bellcore, PSC

  Hardware: OC-48 (2.4 Gb/s) links between the PSC supercomputer facility and CMU

  Application focus: Coupled supercomputers running chemical reaction dynamics, and CS research

  Remarks: Metropolitan area testbed with OC-48 links between PSC and the downtown CMU campus.


Gigabit Testbeds

Aurora

  Sites: MIT, IBM, Bellcore, Penn, MCI

  Hardware: OC-12 network interconnecting 4 research sites and supporting the development of ATM host interfaces, ATM switches, and network protocols

  Application focus: Telerobotics, distributed virtual memory, and operating system research

  Remarks: East coast sites. Research focused mostly on network and computer science issues.

Magic

  Sites: Army Battle Lab, Sprint, UKansas, UMinn, LBL, Army HPC Lab

  Hardware: OC-12 network to interconnect ATM-attached hosts

  Application focus: Remote vehicle control applications and high-speed access to databases for terrain visualization and battle simulation

  Remarks: Funded separately by DARPA after the CNRI initiative had begun.

Casa

  Sites: Caltech, SDSC, LANL, JPL, MCI, USWest, PacBell

  Hardware: HIPPI switches connected by HIPPI-over-SONET at OC-12

  Application focus: Distributed supercomputing

  Remarks: Targeted improving the performance of distributed supercomputing applications by strategically mapping application components onto resources.

I-WAY

The first large-scale "modern" Grid experiment.

Put together for SC'95 (the "Supercomputing" conference).

The I-WAY consisted of a Grid of 17 sites connected by the vBNS.

Over 60 applications ran on the I-WAY during SC'95.

I-WAY "Architecture"

Each I-WAY site was served by an I-POP (I-WAY Point of Presence) used for

  authentication of distributed applications

  distribution of associated libraries and other software

  monitoring the connectivity of the I-WAY virtual network

Users could use single authentication and job submission across multiple sites, or they could work with individual sites directly.

Scheduling was done with a "human in the loop".


I-Soft

Software for the I-WAY:

  Kerberos-based authentication

  I-POP-initiated rsh to local resources

  AFS for distribution of software and state

  Central scheduler

    Dedicated I-WAY nodes on each resource

    Interface to the local scheduler

  Nexus-based communication libraries

    MPI, CaveComm, CC++

In many ways, the I-WAY experience formed the foundation of Globus.



I-WAY Application: Cloud Detection

Cloud detection from multimodal satellite data.

Goal: determine whether a satellite image is clear, partially cloudy, or completely cloudy.

Used a remote supercomputer to enhance the instruments with

  real-time response

  enhanced function and accuracy (of the pixel image)

Developed by C. Lee (Aerospace Corporation), Kesselman (Caltech), et al.

PACIs

2 NSF supercomputer centers (PACIs): SDSC/NPACI and NCSA/Alliance, both committed to Grid computing.

vBNS backbone between NCSA and SDSC running at OC-12, with connectivity to over 100 locations at speeds ranging from 45 Mb/s to 155 Mb/s or more.

PACI Grid

NPACI Grid Activities

The Metasystems Thrust Area is one of the NPACI technology thrust areas.

The goal is to create an operational metasystem for NPACI.

Metasystems players:

  Globus (Kesselman)

  Legion (Grimshaw)

  AppLeS (Berman and Wolski)

  Network Weather Service (Wolski)

Alliance Grid Activities

The Grid Task Force and the Distributed Computing team are Alliance teams.

Globus is supported as the exclusive grid infrastructure by the Alliance.

The Grid concept is pervasive throughout the Alliance:

  The Access Grid was developed for use by distributed collaborative groups.

  Alliance grid players include Foster (Globus), Livny (Condor), Stevens (ANL), Reed (Pablo), etc.

Other Efforts

Centurion Cluster = Legion testbed

  Legion cluster housed at UVA

  128 533-MHz DEC Alphas

  128 dual 400-MHz Pentium IIs

  Fast Ethernet and Myrinet

Globus testbed = GUSTO, which supports Globus infrastructure and application development

  125 sites in 23 countries as of 2/2000

  Testbed aggregated from partner sites (including NPACI)


GUSTO (Globus) Computational Grid

IPG

IPG = Information Power Grid

  NASA effort in grid computing

  Globus supported as the underlying infrastructure

  Application foci include aerospace design, environmental, and space applications

Research and Development Foci for the Grid

Applications

  Questions revolve around the design and development of "Grid-aware" applications.

  Different programming models: polyalgorithms, components, mixed languages, etc.

  Program development environments and tools required for the development and execution of performance-efficient applications.

(Diagram: Resources / Infrastructure / Middleware / Applications.)

Research and Development Foci for the Grid

Middleware

  Questions revolve around the development of tools and environments which facilitate application performance.

  Software must be able to assess and utilize the dynamic performance characteristics of resources to support the application (a scheduling sketch follows below).

  Agent-based computing and resource negotiation

(Diagram: Resources / Infrastructure / Middleware / Applications.)
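As a small illustration of middleware that uses dynamic performance information, the sketch below ranks candidate hosts by a predicted completion time built from current load and bandwidth measurements, in the spirit of AppLeS-style scheduling over Network Weather Service forecasts. The measurement values, host names, and cost model are assumptions for illustration only.

```python
def predicted_time(work_gflop, input_mb, host):
    """Very rough completion-time model: compute time under current load
    plus time to move the input over the currently measured bandwidth."""
    effective_gflops = host["peak_gflops"] * (1.0 - host["cpu_load"])
    compute_s = work_gflop / max(effective_gflops, 1e-6)
    transfer_s = input_mb * 8.0 / host["bandwidth_mbps"]
    return compute_s + transfer_s

def choose_host(work_gflop, input_mb, hosts):
    """Pick the host with the smallest predicted completion time."""
    return min(hosts, key=lambda name: predicted_time(work_gflop, input_mb, hosts[name]))

# Illustrative dynamic measurements (e.g. from a monitoring service).
hosts = {
    "sp2-a":     {"peak_gflops": 40.0, "cpu_load": 0.85, "bandwidth_mbps": 155.0},
    "exemplar":  {"peak_gflops": 25.0, "cpu_load": 0.10, "bandwidth_mbps": 45.0},
    "cluster-b": {"peak_gflops": 15.0, "cpu_load": 0.30, "bandwidth_mbps": 100.0},
}
print(choose_host(work_gflop=500.0, input_mb=200.0, hosts=hosts))
```

Because load and bandwidth change over time, the ranking must be recomputed whenever the application is scheduled or rescheduled.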

Research and Development Foci for the Grid

Infrastructure

  Development of infrastructure that presents a "virtual machine" view of the Grid to users.

  Questions revolve around providing basic services to the user (security, remote file transfer, resource management, etc.) as well as exposing performance characteristics.

  Services must be supported on heterogeneous resources and must interoperate.

(Diagram: Resources / Infrastructure / Middleware / Applications.)

Research and Development Foci for the Grid

Resources

  Questions revolve around heterogeneity and scale.

  New challenges focus on combining wireless and wired, static and dynamic, low-power and high-power, cheap and expensive resources.

  The performance characteristics of grid resources vary dramatically; integrating them to support the performance of individual and multiple applications is extremely challenging.

(Diagram: Resources / Infrastructure / Middleware / Applications.)