CSE 160/Berman
Grid Computing 1
Grid Book, Chapters 1, 2, 3, 22
“Implementing Distributed Synthetic Forces Simulations in Metacomputing Environments”
Brunett, Davis, Gottschalk, Messina, Kesselman
http://www.globus.org
Outline
• What is Grid computing?
• Grid computing applications
• Grid computing history
• Issues in Grid Computing
• Condor, Globus, Legion
• The next step
What is Grid Computing?
• A Computational Grid is a collection of distributed, possibly heterogeneous resources which can be used as an ensemble to execute large-scale applications
• A Computational Grid is also called a “metacomputer”
Computational Grids
• The term computational grid comes from an analogy with the electric power grid:
  – Electric power is ubiquitous
  – Don’t need to know the source (transformer, generator) of the power or the power company that serves it
  – Analogy falls down in the area of performance
• Ever-present search for cycles in HPC. Two foci of research:
  – “In the box” parallel computers -- PetaFLOPS architectures
  – Increasing development of infrastructure and middleware to leverage the performance potential of distributed Computational Grids
Grid Applications
• Distributed Supercomputing
  – Distributed Supercomputing applications couple multiple computational resources – supercomputers and/or workstations
  – Examples include:
    • SF Express (large-scale modeling of battle entities with complex interactive behavior for distributed interactive simulation)
    • Climate Modeling (high resolution, long time scales, complex models)
Distributed Supercomputing Example – SF Express
• SF Express (Synthetic Forces Express) = large-scale distributed simulation of the behavior and movement of entities (tanks, trucks, airplanes, etc.) for interactive battle simulation
• Entities require information about
  – State of terrain
  – Location and state of other entities
• Info updated several times a second
• Interest management allows entities to only look at relevant information, enabling scalability
SF Express
• Large scale SF Express run goals
  – Simulation of 50,000 entities in 8/97, 100,000 entities in 3/98
  – Increase fidelity and resolution of simulation over previous runs
  – Improve
    • Refresh rate
    • Training environment responsiveness
    • Number of automatic behaviors
  – Ultimately use simulation for real-time planning as well as training
• Large scale runs are extremely resource-intensive
SF Express Programming Issues
• How should entities be mapped to computational resources?
• Entities receive information based on “interests”
  – Communication reduced and localized based on “interest management” (see the sketch below)
• Consistency model for entity information must be developed
  – Which entities can/should be replicated?
  – How should updates be performed?
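A minimal sketch of how interest management can localize communication, using a simple grid-cell scheme. The cell size, class names, and subscribe/route interface below are illustrative assumptions, not the actual SF Express implementation:

    # Illustrative interest management: an update is routed only to nodes that
    # subscribed to the terrain cell in which the update occurs.
    from collections import defaultdict

    CELL_SIZE = 1000.0  # assumed cell size in meters

    def cell_of(x, y):
        """Map a terrain position to a discrete interest cell."""
        return (int(x // CELL_SIZE), int(y // CELL_SIZE))

    class InterestManager:
        def __init__(self):
            self.subscribers = defaultdict(set)   # cell -> simulation node ids

        def subscribe(self, node_id, x, y, radius_cells=1):
            """A simulation node declares interest in the cells around (x, y)."""
            cx, cy = cell_of(x, y)
            for dx in range(-radius_cells, radius_cells + 1):
                for dy in range(-radius_cells, radius_cells + 1):
                    self.subscribers[(cx + dx, cy + dy)].add(node_id)

        def route_update(self, x, y, entity_state):
            """Return only the nodes that need to see this entity update."""
            return [(node_id, entity_state) for node_id in self.subscribers[cell_of(x, y)]]

Only nodes whose declared interest regions cover the update’s cell receive it, which is what keeps per-node communication roughly bounded as the total entity count grows.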
SF Express Distributed Application Architecture
• D = data server, I = interest management, R = router, S = simulation node
• [Figure: each participating site pairs a data server and an interest manager with a router that serves several simulation nodes; the site routers are interconnected]
50,000 entity SF Express Run
• 2 large-scale simulations run on August 11, 1997

Site           | Hardware      | Processors | Entities / First Run | Entities / Second Run
Caltech        | HP Exemplar   | 256        | 13,095               | 12,182
ORNL           | Intel Paragon | 1024       | 16,695               | 15,996
NASA Ca        | IBM SP2       | 139        | 5464                 | 5637
CEWES, Va      | IBM SP2       | 229        | 9739                 | 9607
Maui           | IBM SP2       | 128        | 5056                 | 7027
HP/Convex, Tx  | HP Exemplar   | 128        | 5348                 | 6733
Total          |               | 1904       | 55,397               | 57,182
50,000 entity SF Express Run
• Simulation decomposed the terrain (Saudi Arabia, Kuwait, Iraq) contiguously among the supercomputers (see the sketch below)
• Each supercomputer simulated a specific area and exchanged interest and state information with the other supercomputers
• All data exchanges were flow-controlled
• Supercomputers were fully interconnected and dedicated for the experiment
• Success depended on “moderate to significant system administration, interventions, competent system support personnel, and numerous phone calls.”
• Subsequent Globus runs focused on improving data and control management and on operational issues for the wide area
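A minimal sketch of a contiguous terrain decomposition of the kind described above: split the east-west extent into one strip per supercomputer, sized in proportion to its processor count (counts from the table above). The terrain width and the proportional rule are assumptions for illustration, not the run’s actual partitioning:

    # Illustrative contiguous decomposition: each site gets one strip of terrain,
    # sized by its share of the total processors.
    SITES = {"Caltech": 256, "ORNL": 1024, "NASA Ca": 139,
             "CEWES": 229, "Maui": 128, "HP/Convex": 128}

    def decompose(terrain_width_km, sites):
        """Return {site: (west_km, east_km)} contiguous strips proportional to processors."""
        total_procs = sum(sites.values())
        strips, west = {}, 0.0
        for site, procs in sites.items():
            east = west + terrain_width_km * procs / total_procs
            strips[site] = (round(west, 1), round(east, 1))
            west = east
        return strips

    print(decompose(1500.0, SITES))   # hypothetical 1500 km east-west extent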
High-Throughput Applications
• Grid used to schedule large numbers of independent or loosely coupled tasks with the goal of putting unused cycles to work
• High-throughput applications include RSA keycracking, SETI@home (detection of extra-terrestrial intelligence), MCell
High-Throughput Applications
• Biggest master/slave parallel program in the world, with master = website, slaves = individual computers (a minimal sketch of the pattern follows)
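A minimal sketch of the master/slave pattern, with an in-process work queue and threads standing in for the web server and the volunteer machines. The queue, thread count, and compute function are assumptions for illustration; real systems add failure handling, persistence, and wide-area transport:

    # Illustrative master/slave work distribution for independent tasks.
    from queue import Queue, Empty
    from threading import Thread

    def make_master(tasks):
        """The master holds the pool of independent work units and the results."""
        work, results = Queue(), Queue()
        for t in tasks:
            work.put(t)
        return work, results

    def slave(work, results, compute):
        """Each slave repeatedly pulls a unit, computes, and reports back."""
        while True:
            try:
                unit = work.get_nowait()
            except Empty:
                return                      # no work left
            results.put((unit, compute(unit)))

    # usage: four local "slaves" crunching 100 independent units
    work, results = make_master(range(100))
    slaves = [Thread(target=slave, args=(work, results, lambda x: x * x)) for _ in range(4)]
    for s in slaves: s.start()
    for s in slaves: s.join()
    print(results.qsize())                  # -> 100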
High-Throughput Example – MCell
• MCell – Monte Carlo simulation of cellular microphysiology. Simulation implemented as a large-scale parameter sweep.
MCell
• MCell architecture: simulations performed by independent processors with distinct parameter sets and shared input files
MCell Programming Issues
• How should we assign tasks to processors to optimize locality? (see the sketch below)
• How can we use partial results during execution to steer the computation?
• How do we mine all the resulting data from experiments for results
  – During execution
  – After execution
• How can we use all available resources?
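One possible answer to the locality question, sketched under the assumption that tasks sharing a large input file should land on the same host so the file is staged only once. The Task record, host names, and round-robin placement are illustrative, not MCell’s actual scheduler:

    # Illustrative locality heuristic for a parameter sweep: group tasks by the
    # shared input file they read, then place each whole group on one host.
    from collections import defaultdict
    from itertools import cycle

    class Task:
        def __init__(self, params, shared_input):
            self.params = params              # distinct parameter set
            self.shared_input = shared_input  # shared input file the task reads

    def assign_by_shared_input(tasks, hosts):
        """Return {host: [tasks]} with each shared-input group kept on one host."""
        groups = defaultdict(list)
        for t in tasks:
            groups[t.shared_input].append(t)
        assignment = defaultdict(list)
        next_host = cycle(hosts)
        for group in groups.values():
            assignment[next(next_host)].extend(group)   # stage the file once per group
        return assignment

    # usage: six tasks over two shared input files, three hypothetical hosts
    tasks = [Task({"seed": i}, "membrane.mdl" if i < 3 else "synapse.mdl") for i in range(6)]
    print(assign_by_shared_input(tasks, ["hostA", "hostB", "hostC"]))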
Data-Intensive Applications
• Focus is on synthesizing new information from large amounts of physically distributed data
• Examples include NILE (distributed system for high energy physics experiments using data from CLEO), SAR/SRB applications (Grid version of MS Terraserver), digital library applications
Data-Intensive Example – SARA
• SARA = Synthetic Aperture Radar Atlas – application developed at JPL and SDSC
• Goal: Assemble/process files for user’s desired image
  – Radar data organized into tracks
  – User selects track of interest and properties to be highlighted
  – Raw data is filtered and converted to an image format
  – Image displayed in web browser
SARA Application Architecture
• Application structure focused around optimizing the delivery and processing of distributed data
• [Figure: a client connected to a pool of compute servers and data servers; computation servers and data servers are logical entities, not necessarily different nodes]
SARA Programming Issues
• Which data server should replicated data be accessed from? (see the sketch below)
• Should computation be done at the data server, or data moved to a compute server, or something in between?
• How big are the data files and how often will they be accessed?
• [Figure: SARA testbed sites – OGI, UTK, UCSD – with AppLeS/NWS]
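A sketch of one way to answer the replica-selection question: predict the transfer time from recent bandwidth measurements, in the spirit of the Network Weather Service, and fetch from the replica with the smallest prediction. The server names, measurement values, and simple averaging predictor are assumptions for illustration:

    # Illustrative replica selection based on measured bandwidth to each server.
    def predicted_transfer_time(file_size_mb, bandwidth_samples_mbps):
        """Estimate transfer time (seconds) from the mean of recent bandwidth samples."""
        avg_bw = sum(bandwidth_samples_mbps) / len(bandwidth_samples_mbps)
        return file_size_mb * 8.0 / avg_bw

    def choose_replica(file_size_mb, replicas):
        """replicas: {server: [recent bandwidth samples in Mb/s]} -> best server."""
        return min(replicas, key=lambda s: predicted_transfer_time(file_size_mb, replicas[s]))

    # usage with hypothetical measurements to three replica holders
    replicas = {
        "data.ucsd.edu": [35.0, 40.0, 38.0],
        "data.utk.edu":  [8.0, 9.5, 7.0],
        "data.ogi.edu":  [20.0, 22.0, 18.0],
    }
    print(choose_replica(500, replicas))    # -> data.ucsd.edu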
TeleImmersion
• Focus is on use of immersive virtual reality systems over a network
  – Combines generators, data sets and simulations remote from user’s display environment
  – Often used to support collaboration
• Examples include
  – Interactive scientific visualization (“being there with the data”), industrial design, art and entertainment
Teleimmersion Example – Combustion System Modeling
• A shared collaborative space
  – Link people at multiple locations
  – Share and steer scientific simulations on a supercomputer
• Combustion code developed by Lori Freitag at ANL
• Boiler application used to troubleshoot and design better products
• [Figure: shared immersive session linking Chicago and San Diego]
Early Experiences with Grid Computing
• Gigabit Testbeds Program
  – Late 80’s, early 90’s: the gigabit testbed program was developed as a joint NSF, DARPA, CNRI (Corporation for National Research Initiatives, Bob Kahn) initiative
  – Goals were to
    • investigate potential architectures for a gigabit/sec network testbed
    • explore usefulness for end-users
Gigabit Testbeds – Early 90’s
• 6 testbeds formed:
  – CASA (southwest)
  – MAGIC (midwest)
  – BLANCA (midwest)
  – AURORA (northeast)
  – NECTAR (northeast)
  – VISTANET (southeast)
• Each had a unique blend of application research and networking/computer science research
Gigabit Testbeds

Blanca
  Sites: NCSA, UIUC, UCB, UWisc, AT&T
  Hardware: Experimental ATM switches running over experimental 622 Mb/s and 45 Mb/s circuits developed by AT&T and universities
  Application focus: Virtual environments, remote visualization and steering, multimedia digital libraries
  Remarks: Network spanned the US (UCB to AT&T). Network research included distributed virtual memory, real-time protocols, congestion control, signaling protocols, etc.

Vistanet
  Sites: MCNC, UNC, BellSouth
  Hardware: ATM network at OC-12 (622 Mb/s) interconnecting HIPPI local area networks
  Application focus: Radiation treatment planning applications involving a supercomputer, a remote instrument (radiation beam) and visualization
  Remarks: Medical personnel planned radiation beam orientation using a supercomputer. Extended the planning process from 2 beams in 2 dimensions to multiple beams in 3 dimensions.

Nectar
  Sites: CMU, Bell Atlantic, Bellcore, PSC
  Hardware: OC-48 (2.4 Gb/s) links between the PSC supercomputer facility and CMU
  Application focus: Coupled supercomputers running chemical reaction dynamics, and CS research
  Remarks: Metropolitan area testbed with OC-48 links between PSC and the downtown CMU campus.
Gigabit Testbeds

Aurora
  Sites: MIT, IBM, Bellcore, Penn, MCI
  Hardware: OC-12 network interconnecting 4 research sites and supporting the development of ATM host interfaces, ATM switches and network protocols
  Application focus: Telerobotics, distributed virtual memory and operating system research
  Remarks: East coast sites. Research focused mostly on network and computer science issues.

Magic
  Sites: Army Battle Lab, Sprint, UKansas, UMinn, LBL, Army HPC Lab
  Hardware: OC-12 network to interconnect ATM-attached hosts
  Application focus: Remote vehicle control applications and high-speed access to databases for terrain visualization and battle simulation
  Remarks: Funded separately by DARPA after the CNRI initiative had begun.

Casa
  Sites: Caltech, SDSC, LANL, JPL, MCI, USWest, PacBell
  Hardware: HIPPI switches connected by HIPPI-over-SONET at OC-12
  Application focus: Distributed supercomputing
  Remarks: Targeted improving the performance of distributed supercomputing applications by strategically mapping application components onto resources.
I-Way
• First large-scale “modern” Grid experiment
• Put together for SC’95 (the “Supercomputing” Conference)
• I-Way consisted of a Grid of 17 sites connected by the vBNS
• Over 60 applications ran on the I-WAY during SC’95
I-Way “Architecture”
• Each I-WAY site served by an I-POP (I-WAY Point of Presence) used for
  – authentication of distributed applications
  – distribution of associated libraries and other software
  – monitoring the connectivity of the I-WAY virtual network
• Users could use single authentication and job submission across multiple sites or they could work directly with end-users
• Scheduling done with a “human-in-the-loop”
I-Soft – Software for the I-Way
• Kerberos-based authentication
  – I-POP initiated rsh to local resources
• AFS for distribution of software and state
• Central scheduler
  – Dedicated I-WAY nodes on each resource
  – Interface to local scheduler
• Nexus-based communication libraries
  – MPI, CaveComm, CC++
• In many ways, the I-Way experience formed the foundation of Globus
I-Way Application: Cloud Detection
• Cloud detection from multimodal satellite data
  – Want to determine if a satellite image is clear, partially cloudy or completely cloudy
• Used remote supercomputer to enhance instruments with
  – Real-time response
  – Enhanced function, accuracy (of pixel image)
• Developed by C. Lee, Aerospace Corporation, Kesselman, Caltech, et al.
• [Figure: SPRINT network]
PACIs
• 2 NSF Supercomputer Centers (PACIs) – SDSC/NPACI and NCSA/Alliance, both committed to Grid computing
• vBNS backbone between NCSA and SDSC running at OC-12, with connectivity to over 100 locations at speeds ranging from 45 Mb/s to 155 Mb/s or more
PACI Grid
NPACI Grid Activities
• Metasystems Thrust Area is one of the NPACI technology thrust areas
  – Goal is to create an operational metasystem for NPACI
• Metasystems players:
  – Globus (Kesselman)
  – Legion (Grimshaw)
  – AppLeS (Berman and Wolski)
  – Network Weather Service (Wolski)
Alliance Grid Activities
• Grid Task Force and Distributed Computing team are Alliance teams
• Globus supported as the exclusive grid infrastructure by the Alliance
• Grid concept pervasive throughout the Alliance
  – Access Grid developed for use by distributed collaborative groups
• Alliance grid players include Foster (Globus), Livny (Condor), Stevens (ANL), Reed (Pablo), etc.
Other Efforts
• Centurion Cluster = Legion testbed
  – Legion cluster housed at UVA
  – 128 533 MHz DEC Alphas
  – 128 dual 400 MHz Pentium IIs
  – Fast Ethernet and Myrinet
• Globus testbed = GUSTO, which supports Globus infrastructure and application development
  – 125 sites in 23 countries as of 2/2000
  – Testbed aggregated from partner sites (including NPACI)
GUSTO (Globus) Computational Grid
IPG
•
IPG
=
Information
Power Grid
•
NASA effort in grid
computing
•
Globus supported as
underlying
infrastructure
•
Application focus
include aerospace
design, environmental
and space applications
Research and Development Foci for the Grid
• Applications
  – Questions revolve around design and development of “Grid-aware” applications
  – Different programming models: polyalgorithms, components, mixed languages, etc.
  – Program development environment and tools required for development and execution of performance-efficient applications
• [Diagram: Grid layers – Applications / Middleware / Infrastructure / Resources]
Research and Development Foci for the Grid
• Middleware
  – Questions revolve around the development of tools and environments which facilitate application performance
  – Software must be able to assess and utilize dynamic performance characteristics of resources to support the application
  – Agent-based computing and resource negotiation
Research and Development Foci for the Grid
• Infrastructure
  – Development of infrastructure that presents a “virtual machine” view of the Grid to users
  – Questions revolve around providing basic services to users: security, remote file transfer, resource management, etc., as well as exposing performance characteristics
  – Services must be supported across heterogeneous resources and must interoperate
Research and Development Foci for the Grid
• Resources
  – Questions revolve around heterogeneity and scale
  – New challenges focus on combining wireless and wired, static and dynamic, low-power and high-power, cheap and expensive resources
  – Performance characteristics of grid resources vary dramatically; integrating them to support the performance of individual and multiple applications is extremely challenging