
The Grid:
Globus and the Open Grid Services Architecture

Dr. Carl Kesselman
Director, Center for Grid Technologies
Information Sciences Institute
University of Southern California

Outline

- Why Grids
- Grid Technology
- Applications of Grids in Physics
- Summary

Grid Computing

How do we solve problems?

- Communities committed to common goals
  - Virtual organizations
- Teams with heterogeneous members & capabilities
- Distributed geographically and politically
  - No location/organization possesses all required skills and resources
- Adapt as a function of the situation
  - Adjust membership, reallocate responsibilities, renegotiate resources

The Grid Vision

- “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
  - On-demand, ubiquitous access to computing, data, and services
  - New capabilities constructed dynamically and transparently from distributed services
- “When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances.” (George Gilder)

The Grid Opportunity:
eScience and eBusiness

- Physicists worldwide pool resources for peta-op analyses of petabytes of data
- Civil engineers collaborate to design, execute, & analyze shake table experiments
- An insurance company mines data from partner hospitals for fraud detection
- An application service provider offloads excess load to a compute cycle provider
- An enterprise configures internal & external resources to support eBusiness workload

Grid Communities & Applications:
Data Grids for High Energy Physics

[Diagram: tiered LHC computing model. Tier 0: CERN Computer Centre, with the Online System feeding an Offline Processor Farm (~20 TIPS) at ~100 MBytes/sec from a ~PBytes/sec physics data cache. Tier 1: FermiLab (~4 TIPS) and the France, Italy, and Germany Regional Centres, linked at ~622 Mbits/sec (or air freight, deprecated). Tier 2: Tier2 Centres at ~1 TIPS each, e.g. Caltech. Tier 4: institutes (~0.25 TIPS) and physicist workstations at ~1 MBytes/sec.]

- There is a “bunch crossing” every 25 nsecs
- There are 100 “triggers” per second; each triggered event is ~1 MByte in size
- Physicists work on analysis “channels”; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server
- 1 TIPS is approximately 25,000 SpecInt95 equivalents

www.griphyn.org  www.ppdg.net  www.eu-datagrid.org

Grid Communities and Applications:
Network for Earthquake Eng. Simulation

- NEESgrid: US national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
- On-demand access to experiments, data streams, computing, archives, collaboration

NEESgrid: Argonne, Michigan, NCSA, UIUC, USC (www.neesgrid.org)

Living in an Exponential World:
(1) Computing & Sensors

- Moore’s Law: transistor count doubles every 18 months

[Figure: magnetohydrodynamics simulation of star formation]

Living in an Exponential World:
(2) Storage

- Storage density doubles every 12 months
- Dramatic growth in online data (1 petabyte = 1000 terabytes = 1,000,000 gigabytes)
  - 2000: ~0.5 petabyte
  - 2005: ~10 petabytes
  - 2010: ~100 petabytes
  - 2015: ~1000 petabytes?
- Transforming entire disciplines in physical and, increasingly, biological sciences; humanities next?

An Exponential World: (3) Networks
(Or, Coefficients Matter …)

- Network vs. computer performance
  - Computer speed doubles every 18 months
  - Network speed doubles every 9 months
  - Difference = order of magnitude per 5 years
- 1986 to 2000
  - Computers: x 500
  - Networks: x 340,000
- 2001 to 2010
  - Computers: x 60
  - Networks: x 4000
[Graph: Moore’s Law vs. storage improvements vs. optical improvements. From Scientific American (Jan. 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner Perkins Caufield & Byers.]

Requirements Include …

- Dynamic formation and management of virtual organizations
- Online negotiation of access to services: who, what, why, when, how
- Establishment of applications and systems able to deliver multiple qualities of service
- Autonomic management of infrastructure elements
- Open, extensible, evolvable infrastructure

The Grid World: Current Status

- Dozens of major Grid projects in scientific & technical computing/research & education
- Considerable consensus on key concepts and technologies
  - Open source Globus Toolkit™ a de facto standard for major protocols & services
  - Far from complete or perfect, but out there, evolving rapidly, with a large tool/user base
- Industrial interest emerging rapidly
- Opportunity: convergence of eScience and eBusiness requirements & technologies

Globus Toolkit

- Globus Toolkit is the source of many of the protocols described in the “Grid architecture”
- Adopted by almost all major Grid projects worldwide as a source of infrastructure
- Open source, open architecture framework encourages community development
- Active R&D program continues to move the technology forward
- Developers at ANL, USC/ISI, NCSA, LBNL, and other institutions

www.globus.org

[Diagram: GRAM (Grid Resource Allocation & Management) interaction. The user authenticates via GSI (Grid Security Infrastructure) and creates a proxy credential. Reliable remote invocation asks the Gatekeeper (factory) to create user processes #1 and #2, each holding its own proxy; the created processes register with the Reporter (registry + discovery).]
The Globus Toolkit in One Slide

- Grid protocols (GSI, GRAM, …) enable resource sharing within virtual orgs; the toolkit provides a reference implementation ( = Globus Toolkit services)
- Protocols (and APIs) enable other tools and services for membership, discovery, data mgmt, workflow, …

[Diagram: other services (e.g. GridFTP) accept GSI-authenticated remote service requests; MDS-2 (Meta Directory Service) with a GIIS (Grid Information Index Server) provides discovery via soft-state registration and enquiry.]

Globus Toolkit: Evaluation (+)

- Good technical solutions for key problems, e.g.
  - Authentication and authorization
  - Resource discovery and monitoring
  - Reliable remote service invocation
  - High-performance remote data access
- This & good engineering is enabling progress
  - Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support
- Growing community code base built on tools

Globus Toolkit: Evaluation (-)

- Protocol deficiencies, e.g.
  - Heterogeneous basis: HTTP, LDAP, FTP
  - No standard means of invocation, notification, error propagation, authorization, termination, …
- Significant missing functionality, e.g.
  - Databases, sensors, instruments, workflow, …
  - Virtualization of end systems (hosting envs.)
- Little work on total system properties, e.g.
  - Dependability, end-to-end QoS, …
  - Reasoning about system properties

“Web Services”

- Increasingly popular standards-based framework for accessing network applications
  - W3C standardization; Microsoft, IBM, Sun, others
- WSDL: Web Services Description Language
  - Interface definition language for Web services
- SOAP: Simple Object Access Protocol
  - XML-based RPC protocol; common WSDL target
- WS-Inspection
  - Conventions for locating service descriptions
- UDDI: Universal Description, Discovery, & Integration
  - Directory for Web services


Web Services Example:
Database Service

- WSDL definition for a “DBaccess” portType defines operations and bindings, e.g.:
  - Query(QueryLanguage, Query, Result)
  - SOAP protocol
- Client C, Java, Python, etc., APIs can then be generated
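A SOAP request to such a DBaccess port might look as sketched below. The namespace URI is hypothetical, and a real client would be generated from the service's WSDL rather than hand-built; this only shows the shape of the XML-based RPC message.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
DB_NS = "http://example.org/DBaccess"  # hypothetical namespace, for illustration

def build_query_request(language: str, query: str) -> bytes:
    """Build a SOAP envelope invoking the DBaccess Query operation."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{DB_NS}}}Query")
    ET.SubElement(op, f"{{{DB_NS}}}QueryLanguage").text = language
    ET.SubElement(op, f"{{{DB_NS}}}Query").text = query
    return ET.tostring(env)

print(build_query_request("SQL", "SELECT count(*) FROM events").decode())
```

The Result part of the operation would come back in the corresponding SOAP response body.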

Transient Service Instances

- “Web services” address discovery & invocation of persistent services
  - Interface to persistent state of entire enterprise
- In Grids, must also support transient service instances, created/destroyed dynamically
  - Interfaces to the states of distributed activities
  - E.g. workflow, video conf., dist. data analysis
- Significant implications for how services are managed, named, discovered, and used
  - In fact, much of our work is concerned with the management of service instances
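One mechanism for managing transient instances is soft state, sketched here as a registry whose entries expire unless refreshed. This is an illustrative model with made-up names, not the actual OGSA interfaces:

```python
class SoftStateRegistry:
    """Registrations carry a lease; entries that are not refreshed simply
    expire, so crashed transient services vanish from the registry without
    any explicit cleanup protocol."""

    def __init__(self, lease_s: float):
        self.lease_s = lease_s
        self._expiry = {}  # service name -> expiry time

    def register(self, name: str, now: float) -> None:
        # First call registers; subsequent calls refresh the lease
        self._expiry[name] = now + self.lease_s

    def live(self, now: float) -> list:
        return [n for n, exp in self._expiry.items() if exp > now]

reg = SoftStateRegistry(lease_s=30)
reg.register("videoconf-17", now=0)
reg.register("workflow-42", now=0)
reg.register("workflow-42", now=25)  # only this instance keeps refreshing
print(reg.live(now=40))  # ['workflow-42']
```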

OGSA Design Principles

- Service orientation to virtualize resources
  - Everything is a service
- From Web services
  - Standard interface definition mechanisms: multiple protocol bindings, local/remote transparency
- From Grids
  - Service semantics, reliability and security models
  - Lifecycle management, discovery, other services
- Multiple “hosting environments”
  - C, J2EE, .NET, …

OGSA Service Model

- System comprises (typically a few) persistent services & (potentially many) transient services
  - Everything is a service
- OGSA defines basic behaviors of services: fundamental semantics, life-cycle, etc.
  - More than defining WSDL wrappers

Open Grid Services Architecture:
Fundamental Structure

- WSDL conventions and extensions for describing and structuring services
  - Useful independent of “Grid” computing
- Standard WSDL interfaces & behaviors for core service activities
  - portTypes and operations => protocols
- The Grid Service = Interfaces + Service Data

[Diagram: a Grid service = an implementation plus service data elements, exposed through the GridService interface (service data access, explicit destruction, soft-state lifetime) and other interfaces (notification, authorization, service creation, service registry, manageability, concurrency), with reliable invocation and authentication provided by the hosting environment/runtime (“C”, J2EE, .NET, …).]

The GriPhyN Project

- Amplify science productivity through the Grid
  - Provide powerful abstractions for scientists: datasets and transformations, not files and programs
  - Using a grid is harder than using a workstation; GriPhyN seeks to reverse this situation!
- These goals challenge the boundaries of computer science in knowledge representation and distributed computing
- Apply these advances to major experiments
  - Not just developing solutions, but proving them through deployment

GriPhyN Approach

- Virtual Data
  - Tracking the derivation of experiment data with high fidelity
  - Transparency with respect to location and materialization
- Automated grid request planning
  - Advanced, policy-driven scheduling
- Achieve this at peta-scale magnitude
- We present here a vision that is still 3 years away, but the foundation is starting to come together

Virtual Data

- Track all data assets
- Accurately record how they were derived
- Encapsulate the transformations that produce new data objects
- Interact with the grid in terms of requests for data derivations
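A toy model of the virtual-data idea (hypothetical API, not the GriPhyN catalogs): each dataset records the transformation and inputs that produced it, so a request can be satisfied either from a materialized copy or by re-running the derivation on demand.

```python
class VirtualDataCatalog:
    def __init__(self):
        self.recipes = {}  # dataset name -> (transformation, input names)
        self.cache = {}    # materialized datasets

    def define(self, name, transformation, inputs=()):
        """Record how a dataset is derived, without computing it yet."""
        self.recipes[name] = (transformation, tuple(inputs))

    def materialize(self, name):
        """Return the dataset, recursively deriving it on demand."""
        if name not in self.cache:
            fn, inputs = self.recipes[name]
            self.cache[name] = fn(*(self.materialize(i) for i in inputs))
        return self.cache[name]

cat = VirtualDataCatalog()
cat.define("raw", lambda: [3, 1, 2])
cat.define("calibrated", lambda d: sorted(d), inputs=["raw"])
print(cat.materialize("calibrated"))  # [1, 2, 3]
```

The derivation record is exactly what gives location and materialization transparency: the requester never needs to know whether "calibrated" already exists somewhere or must be recomputed.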

GriPhyN/PPDG
Data Grid Architecture

[Diagram: layered architecture. An Application submits an abstract DAG to the Planner; the Executor (DAGMan, Kangaroo) runs the resulting concrete DAG. Supporting services: Catalog Services (MCAT; GriPhyN catalogs), Info Services (MDS), Policy/Security (GSI, CAS), Monitoring (MDS), Replica Management (GDMP), and a Reliable Transfer Service (Globus). Compute Resources are accessed via GRAM; Storage Resources via GridFTP, GRAM, and SRM.]
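The planner turns an abstract DAG of datasets and transformations into a concrete DAG of jobs, which the executor then runs in dependency order. A minimal sketch of that ordering step (illustrative job names; DAGMan is the real executor):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Concrete DAG: each job maps to the jobs it depends on
# (hypothetical job names; a real concrete DAG binds jobs to specific sites)
dag = {
    "transfer_input": [],
    "montecarlo": ["transfer_input"],
    "reconstruct": ["montecarlo"],
    "archive": ["reconstruct"],
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # dependencies always come before dependents
```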

GriPhyN Challenge Problem:
CMS Event Reconstruction

[Diagram: CMS event reconstruction workflow spanning a Caltech workstation, the Wisconsin Condor pool, the NCSA Linux cluster, and NCSA UniTree (a GridFTP-enabled FTP server). A master Condor job runs at Caltech:
2) launch secondary job on the Wisconsin pool, input files via Globus GASS;
3) 100 Monte Carlo jobs run on the Wisconsin Condor pool;
4) 100 data files (~1 GB each) transferred via GridFTP;
5) secondary job reports complete to master;
6) master starts reconstruction jobs via the Globus jobmanager on the cluster;
7) GridFTP fetches data from UniTree;
8) processed Objectivity database stored to UniTree;
9) reconstruction job reports complete to master.]

Work of: Scott Koranda, Miron Livny, Vladimir Litvin, & others

GriPhyN-LIGO SC2001 Demo

Work of: Ewa Deelman, Gaurang Mehta, Scott Koranda, & others

iVDGL: A Global Grid Laboratory

“We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science.” (From NSF proposal, 2001)

- International Virtual-Data Grid Laboratory
  - A global Grid laboratory (US, Europe, Asia, South America, …)
  - A place to conduct Data Grid tests “at scale”
  - A mechanism to create common Grid infrastructure
  - A laboratory for other disciplines to perform Data Grid tests
  - A focus of outreach efforts to small institutions
- U.S. part funded by NSF (2001-2006)
  - $13.7M (NSF) + $2M (matching)

iVDGL Components

- Computing resources
  - 2 Tier1 laboratory sites (funded elsewhere)
  - 7 Tier2 university sites (software integration)
  - 3 Tier3 university sites (outreach effort)
- Networks
  - USA (TeraGrid, Internet2, ESNET), Europe (Géant, …)
  - Transatlantic (DataTAG), Transpacific, AMPATH?, …
- Grid Operations Center (GOC)
  - Joint work with TeraGrid on GOC development
- Computer Science support teams
  - Support, test, upgrade GriPhyN Virtual Data Toolkit
- Education and Outreach
- Coordination, management

iVDGL Components (cont.)

- High level of coordination with DataTAG
  - Transatlantic research network (2.5 Gb/s) connecting EU & US
- Current partners
  - TeraGrid, EU DataGrid, EU projects, Japan, Australia
- Experiments/labs requesting participation
  - ALICE, CMS-HI, D0, BaBar, BTEV, PDC (Sweden)

Initial US iVDGL Participants
(site types: Tier2 / Software, CS support, Tier3 / Outreach, Tier1 / Labs funded elsewhere)

- U Florida: CMS
- Caltech: CMS, LIGO
- UC San Diego: CMS, CS
- Indiana U: ATLAS, GOC
- Boston U: ATLAS
- U Wisconsin, Milwaukee: LIGO
- Penn State: LIGO
- Johns Hopkins: SDSS, NVO
- U Chicago/Argonne: CS
- U Southern California: CS
- U Wisconsin, Madison: CS
- Salish Kootenai: Outreach, LIGO
- Hampton U: Outreach, ATLAS
- U Texas, Brownsville: Outreach, LIGO
- Fermilab: CMS, SDSS, NVO
- Brookhaven: ATLAS
- Argonne Lab: ATLAS, CS

Summary

- Technology exponentials are changing the shape of scientific investigation & knowledge
  - More computing, even more data, yet more networking
- The Grid: resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
- Current Grid Technology

Partial Acknowledgements

- Open Grid Services Architecture design
  - Karl Czajkowski @ USC/ISI
  - Ian Foster, Steve Tuecke @ ANL
  - Jeff Nick, Steve Graham, Jeff Frey @ IBM
- Globus Toolkit R&D also involves many fine scientists & engineers at ANL, USC/ISI, and elsewhere (see www.globus.org)
- Strong links with many EU, UK, US Grid projects
- Support from DOE, NASA, NSF, Microsoft

For More Information

- Grid Book: www.mkp.com/grids
- The Globus Project™: www.globus.org
- OGSA: www.globus.org/ogsa
- Global Grid Forum: www.gridforum.org