The Grid

Meeting the LHC computing challenge

Gavin McCance

University of Glasgow

RSE, 6th February 2002

RSE 6 February 2002

Gavin McCance


Outline

Scale of the LHC computing challenge

Grid ‘Middleware’


Data Replication

Experimental testbed



LHC computing challenge

Typical experiment:


2 MB per event


2.7 x 10^9 event sample


5.4 PB/year


Up to 9 PB/year Monte Carlo samples


Very large storage and computational requirements

CERN can handle max of 1/3 of this!
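The figures above can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal (SI) units, and it applies the one-third fraction to data volume alone, which is an illustrative reading of the slide rather than something it states.

```python
# Decimal (SI) units are assumed: 1 MB = 10**6 bytes, 1 PB = 10**15 bytes.
MB = 10**6
PB = 10**15

event_size_bytes = 2 * MB      # 2 MB per event (from the slide)
events_per_year = 2.7e9        # 2.7 x 10^9 event sample (from the slide)

raw_pb = event_size_bytes * events_per_year / PB
monte_carlo_pb = 9.0           # upper bound quoted on the slide
total_pb = raw_pb + monte_carlo_pb

print(f"Experiment data: {raw_pb:.1f} PB/year")
print(f"With Monte Carlo: up to {total_pb:.1f} PB/year")
# Assumption: the 1/3 fraction is applied here to data volume only.
print(f"One third of that: {total_pb / 3:.1f} PB/year")
```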


…computing challenge

Distribute data store and compute resources

Take advantage of existing local clusters and local infrastructure

Easier to get funding for local clusters, particularly cross-experiment or cross-disciplinary compute resources


Tiered model

[Diagram: tiered hierarchy of computing centres]

Tier 0: CERN Computer Centre

Tier 1: French Regional Centre, Italian Regional Centre, US Regional Centre, RAL Regional Centre

Tier 2: Tier 2 Centres, including ScotGRID

Tier 3: Institutes

Data labels on the diagram: Basic reconstructed data; Higher level analysis data and Monte Carlo; Tag data and Monte Carlo


UK Grid

[Diagram: the UK part of the tiered hierarchy]

Tier 1: RAL Regional Centre

Tier 2: Tier 2 Centres, including ScotGRID

Tier 3: Institutes

Data labels on the diagram: Basic reconstructed data; Higher level analysis data and Monte Carlo; Tag data and Monte Carlo

GridPP collaboration


GridPP


…GridPP

£17M three-year project

Working in collaboration with the EU DataGrid project

Middleware production

Integration of middleware technologies into HEP experiments

Validation of Grid software


Middleware

What is middleware…???


Application programs

gridopen() call

Data access specifics: HPSS, Castor

Job submission specifics: PBS, LSF

Specific security procedures

Grid middleware

Layered APIs.

Transparent security.

Transparent data access.

Intelligent use of distributed resources.
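The layering idea can be sketched as a single call that hides which storage system sits underneath. Everything below is hypothetical: the slide names a gridopen() call but gives no signature, so the URL-style argument, the scheme names and the two stand-in back ends are assumptions made for illustration, not the real HPSS or Castor interfaces.

```python
import io
from urllib.parse import urlparse

# Hypothetical back ends standing in for site-specific storage systems.
# Real HPSS or Castor access would go through those systems' own clients.
def _open_hpss(path):
    return io.StringIO(f"data read via the HPSS stand-in from {path}")

def _open_castor(path):
    return io.StringIO(f"data read via the Castor stand-in from {path}")

BACKENDS = {"hpss": _open_hpss, "castor": _open_castor}

def gridopen(url):
    """Open a file through the middleware layer, hiding the storage back end."""
    parsed = urlparse(url)
    try:
        backend = BACKENDS[parsed.scheme]
    except KeyError:
        raise ValueError(f"no back end registered for scheme {parsed.scheme!r}")
    return backend(parsed.path)

# The application only sees gridopen(); the back end is chosen for it.
with gridopen("hpss://site-a.example/store/run1.dat") as handle:
    print(handle.read())
```

The application code stays the same if a site swaps one storage system for another; only the table of back ends changes.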


Middleware Activities

GridPP roughly mirrors EU DataGrid:

Workload Management


What jobs go where?

Data Management (*)


Where’s the (best) data?

Information Services


What’s the state of everything?


…Middleware Activities

Fabric and Mass Storage Management


Interfaces to underlying systems

Network Monitoring


What’s the bandwidth from here to there?

Security


Crops up everywhere … transparent to applications


Data Management

Data Replication

Meta Data Catalogues

Replica Optimisation


Which replica should I use?


Data Replication

Problems if data exist only in one place

No one site can afford to store all data!

Multiple accesses to the same data overload the network! Petabytes!

What if the site or network is down?

Make replicas!

But need to keep track of all the files and their various replicas!

Need a replica catalogue!


Replica Catalogues

Distributed catalogue in database:

Have a globally unique Logical File Name (LFN) mapping to multiple physical instances of the file (PFNs).

Database must be globally accessible and secure

Key is to leverage industry-standard technologies

[Diagram: one LFN, File-1, mapping to copies of File-1 in Paris, Glasgow and Chicago]
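The LFN-to-PFN mapping can be sketched as a small in-memory structure. The class name, method names and the lfn:/pfn:// naming scheme are hypothetical, and a real catalogue would sit in a secured, globally accessible database rather than in a Python dictionary.

```python
from collections import defaultdict

class ReplicaCatalogue:
    """Maps one Logical File Name (LFN) to many Physical File Names (PFNs)."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn, pfn):
        self._replicas[lfn].discard(pfn)

    def lookup(self, lfn):
        # Use .get so that looking up an unknown LFN does not create an entry.
        return sorted(self._replicas.get(lfn, ()))

catalogue = ReplicaCatalogue()
for site in ("paris", "glasgow", "chicago"):
    catalogue.register("lfn:File-1", f"pfn://{site}.example/data/File-1")

print(catalogue.lookup("lfn:File-1"))
```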


Metadata Catalogue

Allows a client to access securely any remote SQL database on the Grid over HTTP(S)

Oracle / MySQL / PostgreSQL + PKI Security + Standard communication protocols (XML/SOAP over HTTPS) = SQL Metadata Service


Distribution

Don’t want a single point of failure or bottleneck

Must distribute SQL database

Designing scalable architectures

e.g. an RC (replica catalogue) may exist on each storage site, responsible for its own files

[Diagram: CERN Root RC above CERN RC, UK RC and INFN RC]

Queries will propagate down until replica information is found…
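One way to picture the downward propagation is a tree of catalogues, each answering for its own files and otherwise delegating to its children. This is a sketch under assumptions: the class and method names are hypothetical, and the depth-first, stop-at-first-hit search order is chosen here for simplicity, not taken from the slides.

```python
class ReplicaCatalogueNode:
    """A catalogue that knows its own files and can delegate to child catalogues."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.local = {}  # LFN -> list of PFNs held at this site

    def register(self, lfn, pfn):
        self.local.setdefault(lfn, []).append(pfn)

    def query(self, lfn):
        # Answer locally if possible; otherwise pass the query down the tree.
        if lfn in self.local:
            return self.name, self.local[lfn]
        for child in self.children:
            found = child.query(lfn)
            if found is not None:
                return found
        return None

cern = ReplicaCatalogueNode("CERN RC")
uk = ReplicaCatalogueNode("UK RC")
infn = ReplicaCatalogueNode("INFN RC")
root = ReplicaCatalogueNode("CERN Root RC", children=[cern, uk, infn])

uk.register("lfn:File-1", "pfn://glasgow.example/data/File-1")
print(root.query("lfn:File-1"))
```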


Choosing the ‘best’

What does the ‘best’ replica mean?


Nearest? Fastest? Real cost?

For multiple files, the ‘best’ run location is some minimisation

Network cost: network monitoring

Monetary cost: EU-US link

A reasonable decision must be made on the basis of limited information!
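The minimisation for several files can be sketched as follows. The function name, the site names and the cost table are all hypothetical; the costs stand in for whatever estimates network monitoring would supply, and summing the cheapest per-file cost is just one simple choice of objective.

```python
def best_run_location(files, sites, replicas, transfer_cost):
    """Return the site minimising the summed cost of fetching every file."""
    def total_cost(site):
        total = 0.0
        for lfn in files:
            # For each file, use the cheapest replica as seen from this site.
            total += min(transfer_cost[(holder, site)] for holder in replicas[lfn])
        return total
    return min(sites, key=total_cost)

sites = ["glasgow", "paris", "chicago"]
replicas = {"File-1": ["paris", "chicago"], "File-2": ["glasgow", "paris"]}

# Hypothetical cost estimates in arbitrary units: free locally, 1 within
# Europe, 5 across the EU-US link.
transfer_cost = {(a, b): 0.0 if a == b else 1.0 for a in sites for b in sites}
for european in ("glasgow", "paris"):
    transfer_cost[("chicago", european)] = 5.0
    transfer_cost[(european, "chicago")] = 5.0

print(best_run_location(["File-1", "File-2"], sites, replicas, transfer_cost))
```

With these numbers the job runs in Paris, the only site that already holds both files.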


Economic models

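Data files viewed as ‘commodities’ to be bought and sold by storage sites

The ‘buyer’ is a job requesting a file

The (virtual) ‘cost’ is:

[Equation: the cost C between sites A and B, built from the terms t_network, p_file, Policy(…) and Discount(…); the operators and argument placement did not survive extraction]

Reverse auction, buy from ‘cheapest’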
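The reverse auction itself reduces to picking the lowest quote. In this sketch the quotes are plain numbers supplied by each storage site; how a site would compute its quote from the cost terms is left out, and the function name and the figures are hypothetical.

```python
def reverse_auction(bids):
    """Pick the seller with the lowest bid; bids maps site name -> quoted cost."""
    if not bids:
        raise ValueError("no storage site offered the file")
    seller = min(bids, key=bids.get)
    return seller, bids[seller]

# Hypothetical quotes (arbitrary virtual units) for one file requested by a job.
bids = {"paris": 7.5, "glasgow": 2.0, "chicago": 12.0}
print(reverse_auction(bids))
```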






Economic replication

If a storage site believes it can ‘make money’ on a popular file (based on its observation of access patterns) it can buy it from another site (replication)

Selfish local optimisation should lead to a reasonable global optimisation for file distribution

Inherently distributed optimisation. No distributed planning overhead!

[Diagram: File 1 and storage sites A and B, labelled with the values 1, 20, 14, 15 and 1]
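A storage site's local decision might look like the sketch below. The rule is an assumption made for illustration: it supposes that the next period brings as many requests as the one just observed, and the function name, parameters and numbers are hypothetical rather than taken from the talk.

```python
def should_replicate(recent_requests, price_per_access, purchase_cost):
    """Buy a replica if expected income from future accesses exceeds its cost.

    Assumes, purely for illustration, that the next period sees as many
    requests as the observed one.
    """
    expected_income = recent_requests * price_per_access
    return expected_income > purchase_cost

# A popular file is worth buying; a rarely requested one is not.
print(should_replicate(recent_requests=30, price_per_access=0.5, purchase_cost=10.0))
print(should_replicate(recent_requests=4, price_per_access=0.5, purchase_cost=10.0))
```

Each site evaluates this on its own observations, which is what keeps the optimisation local.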


Will it work???

real Grid...

…simulated Grid provides a testing arena for these ideas!

Developing simulation tool


Testbed Software

UK HEP is providing testbed

EU experiments: CERN LHC

US experiments: Fermilab / SLAC

First EU DataGrid software release!

Currently being tested.


Experiment integration

Taking the kit and trying to integrate it into the experiments’ software frameworks

ATLAS/LHCb software framework (GAUDI)

Grid middleware

GANGA framework

Make Grid Services transparently available to ATLAS and LHCb programs


UK and EU Testbed

Some successful tests so far…

e.g. large file transfers between the UK, Italy, the US and CERN

Increasing Monte Carlo challenges planned

Currently UK testbed


…finally

Basic Grid software has been delivered


More developments to come

Integration with experiments and testing


Already successful tests


An excellent base to build on!


Plenty still to do!