
IST-2000-25182
PUBLIC 1 / 41


DATAGRID

ARCHITECTURE AND DESIGN
WP5 - MASS STORAGE MANAGEMENT



WP05: Mass Storage Management



Document identifier: DataGrid-05-D5.2-0141-3-4
Date: 14/02/2002
Work package: WP5: Mass Storage Management
Partner(s): CERN, SARA, PPARC
Lead Partner: PPARC
Document status: APPROVED
Deliverable identifier: D5.2


Abstract: This document describes the architecture and design of the Mass Storage Management work package.

Doc. Identifier:

DataGrid-05-D5.2-0141-3-4


ARCHITECTURE AND DESIGN
WP5 - Mass Storage Management
Date: 14/02/2002



IST-2000-25182
PUBLIC 2 / 41


Delivery Slip

From: John Gordon (PPARC), 26/11/2001
Verified by: O. Bärring, K. Bos, J. Magowan (WP3, WP7), 20/01/2002
Approved by: PTB (All), 28/01/2002



Document Log

Issue  Date        Comment                                   Author
0-0    26/11/2001  First draft                               JC Gordon
1_0    20/12/2001  More detailed design added                O Synge, J Jensen
1_1    07/01/2002  Response to informal draft review         J Jensen, O Synge
2_0    14/01/2002  Reworking of Core description             J Jensen, O Synge, T Eves, JC Gordon
3_0    21/01/2002  Reaction to K Bos comments on structure   J Jensen, O Synge, JC Gordon
3_1    24/01/2002  K Bos minor comments on 3_1               J Jensen, O Synge, JC Gordon
3_2    30/01/2002  Minor changes based on comments from
                   K Bos, O Bärring, M Parsons               J Jensen, O Synge
3_3    07/02/2002  Minor presentation changes                G. Zaquine
3_4    08/02/2002  Minor presentation changes                G. Zaquine

Document Change Record

Issue  Item  Reason for Change

Files
Software Products: Word
User files / URL: DataGrid-05-D5.2-0140-3-2-architecture.doc
http://edms.cern.ch/document/336679

CONTENT
1 INTRODUCTION.........................................................................................................................................................5
1.1 OBJECTIVES OF THIS DOCUMENT..........................................................................................................................5
1.2 APPLICATION AREA................................................................................................................................................5
1.3 DOCUMENT AMENDMENT PROCEDURE................................................................................................................5
1.4 TERMINOLOGY........................................................................................................................................................6
1.5 ACKNOWLEDGMENTS............................................................................................................................................7
2 EXECUTIVE SUMMARY....................................................................................................................................8
3 OVERVIEW...................................................................................................................................................................9
3.1 WHAT IS THE STORAGE ELEMENT?......................................................................................................................9
3.2 SE EXAMPLES.......................................................................................................................................................10
3.2.1 Data Protocol Support...................................................................................................................................10
3.2.2 Metadata Producer.........................................................................................................................................10
3.2.3 Pinning..............................................................................................................................................................10
3.3 WHAT IS A CLIENT?..............................................................................................................................................10
3.4 INTERACTION WITH OTHER WORK PACKAGES.................................................................................................10
3.5 INTERACTION WITH OTHER PROJECTS................................................................................................................11
4 USE CASES................................................................................................................................................................12
4.1 THE END-USER......................................................................................................................................................12
4.1.1 Requirements....................................................................................................................................................12
4.2 THE REPLICA MANAGER.....................................................................................................................................13
4.2.1 Requirements....................................................................................................................................................13
4.3 RESOURCE BROKER / JOB SUBMISSION SERVICE........................................................................................14
4.3.1 Requirements....................................................................................................................................................14
4.4 THE DATA CENTRE..............................................................................................................................................14
4.4.1 Requirements....................................................................................................................................................15
5 REQUIREMENTS..................................................................................................................................................16
6 ARCHITECTURE......................................................................................................................................................18
6.1 INTRODUCTION.....................................................................................................................................................18
6.2 GENERAL ARCHITECTURE...................................................................................................................................18
6.3 SERVER COMPLEXITY ANALYSIS.......................................................................................................................18
6.4 LAYERED MODEL.................................................................................................................................................19
6.5 TOP LAYER.............................................................................................................................................................19
6.5.1 Interfaces..........................................................................................................................................................19
6.5.2 Clients...............................................................................................................................................................20
6.5.3 Communication with the core.......................................................................................................................20
6.6 MIDDLE LAYER - SE CORE.................................................................................................................................20
6.7 BOTTOM LAYER - CLIENT MODULES..................................................................................................................21
7 DESIGN..........................................................................................................................................................................22
7.1 INTRODUCTION.....................................................................................................................................................22
7.2 GENERAL DESIGN.................................................................................................................................................22
7.3 HIERARCHICAL DECOMPOSITION........................................................................................................................22
7.3.1 Top Layer..........................................................................................................................................................23
7.3.2 Layer Communication....................................................................................................................................23
7.3.3 SE core..............................................................................................................................................................24
7.3.4 Bottom layer (client) modules.......................................................................................................................25
7.4 DATA FLOW...........................................................................................................................................................27
7.4.1 Data flow Initiation.........................................................................................................................................27
7.4.2 Data flow 1.......................................................................................................................................................28
7.4.3 Data flow 2.......................................................................................................................................................28
7.4.4 Data flow 3.......................................................................................................................................................28
7.4.5 Data flow 4.......................................................................................................................................................28
7.4.6 Data flow 5.......................................................................................................................................................28
7.4.7 Data flow 6.......................................................................................................................................................28
7.4.8 Data flow 7.......................................................................................................................................................28
7.4.9 Data flow 8.......................................................................................................................................................29
7.5 CLIENT INTERFACE PROTOCOLS.........................................................................................................................29
7.5.1 Data Transfer...................................................................................................................................................29
7.5.2 Information.......................................................................................................................................................30
7.5.3 Control Interface.............................................................................................................................................30
7.6 SE CONFIGURATION.............................................................................................................................................31
8 OPEN ISSUES..............................................................................................................................................................32
9 APPENDIX I MASS STORAGE SYSTEMS IN USE BY PARTNERS......................................................33
10 APPENDIX II SE QUERY VIA LDAP............................................................................................................34
11 APPENDIX III RFIO............................................................................................................................................35
12 APPENDIX IV MODULES.................................................................................................................................36
12.1 TOP LAYER (SERVER) MODULES......................................................................................................................36
12.1.1 SE-aware library modules........................................................................................................................36
12.1.2 Compatibility library modules.................................................................................................................36
12.1.3 Compatibility network modules...............................................................................................................37
12.1.4 Authentication and encryption.................................................................................................................37
13 APPENDIX V RELINKING...............................................................................................................................38
14 APPENDIX VI XML.............................................................................................................................................39
14.1.1 XML programming interfaces..................................................................................................................39
15 APPENDIX VII CLIENTS..................................................................................................................................40
16 APPENDIX VIII MODULAR DESIGN..........................................................................................................41

1 INTRODUCTION
1.1 OBJECTIVES OF THIS DOCUMENT
The purpose of this document is to provide an overview of the architecture and design of the Storage Element component of Work Package 5.
1.2 APPLICATION AREA
This document applies to the Architecture and Design of the Storage Element (SE) defined in
[A1] The DataGrid Architecture.

Applicable documents
[A1] The DataGrid Architecture. G. Cancio, S. M. Fisher, T. Folkes, F. Giacomini, W. Hoschek, B. L. Tierney. Version 2, June 2001. http://cern.ch/grid-atf

Reference documents
[R1] The Anatomy of the Grid. I. Foster, C. Kesselman, S. Tuecke. Technical Report, GGF, 2001. http://www.globus.org/research/papers/anatomy.pdf
[R2] The Globus Project. http://www.globus.org
[R3] Job Description Language How-To. F. Pacini. http://www.infn.it/workload-grid/documents.htm
[R4] Global Grid Forum. http://www.gridforum.org
[R5] Globus CAS - Community Authorisation Service. http://www.globus.org/research
[R8] A Grid Monitoring Service Architecture. B. Tierney, R. Wolski, R. Aydt, V. Taylor. Technical Report, GGF, 2001.
[R9] Grid Information Services for Distributed Resource Sharing. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman. 2001.
[R10] A Resource Management Architecture for Metacomputing Systems. K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, S. Tuecke. http://www.globus.org/research
[R11] GASS: A Data Movement and Access Service for Wide Area Computing Systems. J. Bester, I. Foster, C. Kesselman, J. Tedesco, S. Tuecke.
[R12] SRM Joint Functional Design. A. Shoshani (ed.)
[R13] Large-Scale C++ Software Design. J. Lakos. Addison Wesley, 1996, ISBN 0201633620.
[R14] DataGrid-02-D2.2-0103-1_2. W. Hoschek, J. Jaen-Martinez, P. Kunszt, B. Segal, H. Stockinger, K. Stockinger, B. Tierney.

1.3 DOCUMENT AMENDMENT PROCEDURE
This is an EDG deliverable and also an internal document which will be updated to reflect
changes in the architecture and design if necessary.
1.4 TERMINOLOGY
Acronyms
API Application Programming Interface
ATF DataGrid Architecture Task Force
CAS Community Authorisation Service
CASTOR CERN Advanced STORage Manager
CE Computing Element
EDG EU DataGrid - i.e. this project
FTP File Transfer Protocol
GASS Global Access to Secondary Storage
GDMP Grid Data Mirroring Package (née Grid Data Management Pilot)
GGF Global Grid Forum
GMA Grid Monitoring Architecture
GRAM Grid Resource Allocation Management
GSI Grid Security Infrastructure
GUI Graphical User Interface
HPSS High Performance Storage System
HTTP HyperText Transfer Protocol
HSM Hierarchical Storage Manager
JDL Job Description Language
JSS Job Submission Service (WP1)
LDAP Light-weight Directory Access Protocol
LFN Logical File Name
MDS Globus Meta-computing Directory Service
MSM Mass Storage Management
MSS Mass Storage System
NFS Networked File System
PFN Physical File Name
RB Resource Broker (WP1)
RC Replica Catalogue (WP2)
RFIO Remote File IO
RM Replica Manager (WP2)
SAN Storage Area Network
SE Storage Element (WP5)
SOAP Simple Object Access Protocol
SRM Storage Resource Manager
SSL Secure Sockets Layer
TFN Transfer File Name
VFS Virtual File System
VO Virtual Organisation
WP Work Package
XML eXtensible Markup Language
Concepts
Hierarchical Storage Manager (HSM): A storage management system which uses different types of storage media (e.g. fast striped disk, slower disk, robotic tape, on-the-shelf tape) to provide differing levels of service, and transparently moves files between levels to optimise the use of the more expensive media.
Mass Storage System (MSS): This term covers all sorts of systems which store permanent data, from simple disk servers to multi-petabyte HSMs.
Replica Catalogue (RC): A Grid service which holds location metadata about permanent files. Logical filenames are mapped onto one or more physical instances of the file. These are known as replicas.
Replica Manager (RM): A Grid service which creates and manages replicas of permanent datasets.
Virtual Organisation (VO): A distributed collection of people working on a common project across geographical, administrative, and funding boundaries.
Work Package (WP): An administrative sub-division of the EDG Project.
1.5 ACKNOWLEDGMENTS
The authors would like to express their gratitude to the following reviewers for their helpful
comments and suggestions: O. Bärring, K. Bos, S. Fisher, B. Jones, P. Kunszt, J. Magowan,
M. Parsons, J. Templon.
We would also like to thank the following people for helpful discussions and suggestions: J-P
Baud, D. Kelsey, W. Hoschek, R. Middleton, M. Sgaravatto, H. Stockinger, K. Stockinger, B.
Tierney.

2 EXECUTIVE SUMMARY
WP5 has designed a StorageElement (SE) software system. The role of the SE is primarily to
sit between the client and the Mass Storage System (MSS); to hide the MSS differences from
the client and to allow access to the MSS using protocols that it does not naturally support. In
addition to this role, the SE will also provide other Grid functions as applied to data access.
For example: security; access control; monitoring; logging; network transfer estimation.
To the outside world, the SE will provide three functions:
- For data transfer, it will support existing protocols such as RFIO and GridFTP, and will be extensible to new protocols that may appear. It will allow these protocols to access the MSS.
- For control, it will provide a range of functions such as reservation, pinning, deletion, and transfer time estimation. An API to these functions will be defined and implemented.
- For information, it will act as an information provider to the DataGrid Information Service, supplying metadata about the SE, the underlying MSS, and the files therein.
The design of the SE follows a layered model, with a central core handling all paths between client and MSS. This approach was chosen because it is simpler to implement for multiple client/MSS combinations than implementing each combination separately; the latter would lead to combinatorial inflation and the resulting maintenance problems.
The SE implemented from this design can be extended to support new protocols for data transfer, new Mass Storage Systems, new Information Services, and new Grid Services that wish to access data.
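The combinatorial argument above can be made concrete with a small illustration; the protocol and MSS counts below are hypothetical, chosen only to show the scaling difference:

```python
# Supporting P client protocols against M storage systems directly
# needs P*M translation paths; routing everything through a common
# core needs only P front-end modules plus M back-end modules.

def direct_paths(protocols: int, mss: int) -> int:
    """Separate implementation for every protocol/MSS pair."""
    return protocols * mss

def layered_modules(protocols: int, mss: int) -> int:
    """One module per protocol and one per MSS, joined by the core."""
    return protocols + mss

# e.g. 4 protocols and 3 storage systems: 12 pairwise
# implementations versus 7 modules around a shared core.
assert direct_paths(4, 3) == 12
assert layered_modules(4, 3) == 7
```

Each new protocol or MSS then adds one module rather than a row or column of pairwise implementations.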
We describe in this document the architecture and design of a StorageElement which supports
existing Grid protocols and is flexible enough to support new protocols.
Chapter 4 describes a number of Use Cases for the SE and the requirements derived from them. Chapter 5 groups these requirements by application area. The architecture is described in Chapter 6, and the design of the system in Chapter 7.
3 OVERVIEW
The DataGrid’s Architecture Task Force [A1] defined the concept of a StorageElement (SE)
which provides the Grid interface to all types of permanent storage. This document describes
in more detail the architecture lying beneath the SE concept. Figure 3.1 shows the model of
Grid services proposed in [R14]. WP5 is concerned with the working and the inter-working of
the StorageElement Services and the Fabric Storage Management. This architecture is not a
true layered model, therefore these services can be used by a variety of clients up to and
including the end-user applications. This document attempts to define the dependencies
between the various services and define an architecture which meets their requirements and
yet is still flexible enough to meet likely future requirements of existing and future services.

[Figure: layered model of Grid services from [R14]. From top to bottom: Local Application and Local Database; Grid Application Layer (Data Management, Job Management, Metadata Management, Object to File Mapping); Collective Services (Grid Scheduler, Replica Manager, Information & Monitoring); Underlying Grid Services (Storage Element Services, Computing Element Services, Replica Catalog, SQL Database Services, Authorization Authentication and Accounting, Service Index); Fabric Services (Fabric Storage Management, Resource Management, Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance). The WP5 subsystems are the Storage Element Services and the Fabric Storage Management.]
Figure 3.1: Subsystems in the Grid layered architecture relevant to the StorageElement.
3.1 WHAT IS THE STORAGE ELEMENT?
Access across the Grid to permanent storage is provided by a set of APIs, protocols and
interfaces. The StorageElement or SE is a software system which acts as a Grid Service. It
sits between the client (end-user or other Grid service) requesting access to data and the Mass
Storage System (MSS) which stores the data. To the outside world, the SE will provide three
functions: data transfer, control, and information. Behind the SE will sit a Mass Storage
System (MSS) which can be simple – a UNIX disk server – or complex – a Hierarchical
Storage Manager (HSM) with more than one type or level of storage.
The role of the SE is primarily to sit between the client and the MSS; to hide the MSS
differences from the client and to allow access to the MSS using protocols that it does not
naturally support. In addition to this role, the SE will also provide other Grid functions as
applied to data access. For example: security; access control; monitoring; logging; network
transfer estimation. We shall thus say that the client accesses files “stored in the SE” even
though the file is actually stored in an underlying storage system and only managed by the
SE.
The rest of this document is concerned with the internals of the SE system as well as the
libraries the clients will need to access the SE.
3.2 SE EXAMPLES
As the SE has proven a difficult concept for the end-user to grasp, a number of examples are
shown here to ground the concept in reality. Although this pre-empts the use-case and
requirements gathering, it has been brought forward to this point to set the scene.
3.2.1 Data Protocol Support
HPSS (High Performance Storage System) is a proprietary HSM. It does not support GridFTP, the most widely used Grid transfer protocol. The SE will support both the GridFTP protocol and HPSS, bridging between them.
With the SE in the middle, the situation would be the same if the protocol were replaced by another (such as RFIO) and/or the HSM replaced by Enstore.
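Purely as an illustration of this bridging role, a front-end module could accept a request in one protocol while the core dispatches it to whichever back-end module speaks the MSS's native interface. All class and method names below are invented for the sketch, not part of the WP5 interfaces:

```python
# Hypothetical sketch: the core decouples the protocol that carried
# the request from the storage system that satisfies it.

class HPSSBackend:
    def stage_and_read(self, path: str) -> bytes:
        # A real SE would call the HPSS client libraries here;
        # this stub just fabricates a payload.
        return b"<file contents of %s>" % path.encode()

class SECore:
    def __init__(self):
        self.backends = {}

    def register(self, mss_name, backend):
        self.backends[mss_name] = backend

    def get(self, mss_name, path):
        # Same entry point whether the request arrived via GridFTP,
        # RFIO, or any future protocol front end.
        return self.backends[mss_name].stage_and_read(path)

core = SECore()
core.register("hpss", HPSSBackend())
data = core.get("hpss", "/store/run42/events.dat")
```

Swapping GridFTP for RFIO, or HPSS for Enstore, would change only the module registered at each edge, not the core.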
3.2.2 Metadata Producer
Other Grid Services require metadata about the parameters of a Castor storage system and the files it contains. Castor holds this information but does not publish it in a way useful to those services.
The SE acts as an information provider to the DataGrid Information Service.
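Appendix II discusses querying the SE via LDAP; as a sketch of the information-provider role, SE metadata could be rendered as LDAP-style text entries. The DN form and attribute names below are invented for illustration, not the schema actually published:

```python
# Minimal sketch of an information provider emitting an LDIF-like
# entry for an SE; all attribute names are hypothetical.

def se_info_to_ldif(se_id: str, attrs: dict) -> str:
    """Render SE metadata as one LDAP-style entry."""
    lines = ["dn: se-id=%s,mds-vo-name=local,o=grid" % se_id]
    for key, value in attrs.items():
        lines.append("%s: %s" % (key, value))
    return "\n".join(lines)

entry = se_info_to_ldif("se001.cern.ch", {
    "se-protocol": "gridftp rfio",
    "se-free-space": "120GB",
})
```

A consumer such as the Resource Broker would then read these attributes through the Information Service rather than talking to Castor directly.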
3.2.3 Pinning
The client (e.g. end-user or Replica Manager) wants to alert the MSS that it will be accessing
a number of files. Some MSS have this functionality but the SE will provide a common API
to it. In addition, it will provide similar functionality for disk-only storage systems.
Such pre-use reservation of files allows an HSM to bring files from tape to disk in
anticipation of the actual open or get request.
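A minimal sketch of such a common pinning API, with an in-memory table standing in for the MSS; the class and method names are hypothetical, not the API the SE will actually define:

```python
import time

class PinManager:
    """Tracks pins so an HSM can pre-stage files to disk and a
    disk-only system can avoid evicting them while a pin is live."""

    def __init__(self):
        self.pins = {}  # path -> expiry timestamp

    def pin(self, paths, lifetime_seconds):
        # Pin several files at once, as the Use Cases require.
        expiry = time.time() + lifetime_seconds
        for p in paths:
            self.pins[p] = expiry
        return expiry

    def is_pinned(self, path):
        return self.pins.get(path, 0) > time.time()

    def release(self, paths):
        for p in paths:
            self.pins.pop(p, None)

mgr = PinManager()
mgr.pin(["/store/f1", "/store/f2"], lifetime_seconds=3600)
```

The lifetime bound matters: a pin that expires on its own protects the MSS from clients that crash without releasing their reservations.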
3.3 WHAT IS A CLIENT?
The clients are the users of the SE, i.e. the programs that access files or file metadata through the SE. Although there are several different clients (end-users and the various Grid services), they will all be treated as a single type of client: the differences between them are managed by configuring the SE (see Appendix VII).
WP5 is responsible for the SE itself, as well as for libraries that the clients can use to access
the SE.
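The "single client type, differentiated by configuration" idea might be expressed as a per-client-class policy table. The fields and values below are purely illustrative; Appendix VII gives the actual treatment:

```python
# Hypothetical SE configuration: one access path, with behaviour
# differentiated per client class rather than per client API.

SE_CLIENT_CONFIG = {
    "end-user":        {"max_requests_per_s": 10,  "may_reserve_space": True},
    "replica-manager": {"max_requests_per_s": 100, "may_reserve_space": True},
    "resource-broker": {"max_requests_per_s": 50,  "may_reserve_space": False},
}

def policy_for(client_class: str) -> dict:
    # Unknown callers fall back to the most restrictive profile.
    return SE_CLIENT_CONFIG.get(client_class, SE_CLIENT_CONFIG["end-user"])
```

Adding a new kind of client then means adding a configuration row, not a new interface.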
3.4 INTERACTION WITH OTHER WORK PACKAGES
Figure 3.2 depicts the interactions of WP5 with the other middleware work packages. The
architecture document from the DataGrid Architecture Task Force (ATF) [A1] describes in
more detail the interactions between all the middleware work packages. Note that several
interactions with other Work Package software are provided by the information architecture
developed by WP3.
[Figure: the StorageElement (WP5) exchanges SE details with the Resource Broker (WP1); metadata with the ReplicaCatalog (WP2); data and verification with the ReplicaManager (WP2); data with the CE (WP4) and the Applications (WP8, 9, 10); and GridFTP monitoring with the Network (WP7). Metadata publication goes via WP3.]
Figure 3.2: Interactions of the StorageElement with the other Middleware Services
The StorageElement relies on the following services and conventions, and assumes their existence:
a) Replica Catalog (ref [R1]): A Grid service which holds location metadata about
permanent files. Logical filenames are mapped onto one or more physical instances of the
file. These are known as replicas.
b) Grid Information Service ([R9]): the SE will publish information about itself and the
files it contains. Details of this information service may change with time as long as
consumers and producers of information agree on the service and the information.
c) File naming model ([A1]): Files are identified by a Physical File Name (PFN) but accessed by a Transfer File Name (TFN), constructed from the PFN and a transfer protocol supported by both the client and the SE. The client can then access the file through the TFN.
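A sketch of this naming convention, assuming a URL-like TFN syntax; the exact syntax, host name, and path are assumptions here, not taken from [A1]:

```python
# Hypothetical TFN construction: pick a protocol both sides
# support, then prefix the PFN with it.

def negotiate(client_protocols, se_protocols):
    """Pick the first protocol supported by both client and SE."""
    for p in client_protocols:
        if p in se_protocols:
            return p
    raise ValueError("no common transfer protocol")

def make_tfn(protocol: str, host: str, pfn: str) -> str:
    """Combine the negotiated protocol with a PFN to form a TFN."""
    return "%s://%s%s" % (protocol, host, pfn)

proto = negotiate(["gsiftp", "rfio"], {"rfio", "gsiftp"})
tfn = make_tfn(proto, "se001.cern.ch", "/castor/cern.ch/user/f.dat")
# tfn == "gsiftp://se001.cern.ch/castor/cern.ch/user/f.dat"
```

The point of the indirection is that the PFN stays stable while the TFN varies with whatever protocol the two parties agree on.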
3.5 INTERACTION WITH OTHER PROJECTS
Since the Mass Storage Fabric at large centres involves scarce resources like tape robots, it is foreseen that such centres may provide mass storage services to more than one Grid project. WP5 is aware of a number of projects (iVDGL, PPDG, BaBar, SAM, ...) and has designed its architecture to be as flexible as possible so that it can interwork with them at the protocol or service level.
4 USE CASES
The following clients have been identified: the end-user, the Replica Manager, the Resource
Broker, and the data centre. The Use Cases below have been gathered from other Work
Package documents. For each set of Use Cases we list the requirements arising from those
cases (with a number indicating which specific case the requirement arose from).
4.1 THE END-USER
The end-user process (batch job or interactive) knows the location of a physical dataset. It
may have been provided with this location by a higher service (Resource Broker, Replica
Manager, Replica Catalogue) or directly by the user. It wishes to do one or more of the
following:
1. Copy the file to a local scratch space for further processing, for example using a GridFTP
client.
2. Open the file to access its contents as records: the user application can request one or
more records randomly either for reading or writing.
3. Access files randomly.
4. Make several requests to the SE in a short period of time. The SE will be expected to
handle tens of requests per second initially, eventually thousands of requests per second.
5. Create a new permanent file in the SE, reserving the estimated space required.
6. Access files using Globus IO.
7. Transfer confidential files securely (so those files can neither be read nor modified by any
third party).
8. Reserve one or several files for reading.
9. Access files using Grid filenames even though the program is not Grid-aware (and the
user is not able to modify the program to be Grid-aware).
10. Save its data and state in temporary files, either for the user’s inspection or in case of
software or hardware failures.
11. Allow anonymous file access (i.e. without authentication).
12. Allow another user access to a file, perhaps temporarily (i.e., delegation).
13. Allow a group of users to access a file.
4.1.1 Requirements
In the list of requirements below, the numbers in brackets refer to the items in the Use Case
list above.

The SE should be able to:
(a) Copy files to/from anywhere on the Grid; in particular, transfer files from one SE to
another. (1)
(b) Allow the user to open a file in the SE for random access reading or writing. (3)
(c) Support partial sequential reads: the SE remembers the offset in the file for the next read
access. (2)
(d) Provide session management with features such as relative addressing of files on the SE.
(1, 2)
(e) Allow control of multiple files – most importantly, it must allow pinning of multiple files.
(8)
(f) Support Grid security. (1,7)
(g) Support (at least one) encrypted file transfer protocols. (7)
(h) Process a specific number of requests per second¹. (4)
(i) Come back up in a consistent state after a crash without immediate administrator
assistance. (10)
(j) Be able to stack requests if the SE is busy until the SE is able to process them again. (4,
10)
(k) Support access control (also to some extent provided by other Work Packages – the
methods we provide should be compatible with those of the other packages). (1, 11, 12, 13)
(l) Enable programs that are not Grid-aware and cannot be modified to access Grid files. (9)
(m) Create temporary files. (10)
(n) Allow anonymous access. (11)
(o) Support delegation. (12)
(p) Support group access. (13)
(q) Support Globus IO. (6)
4.2 THE REPLICA MANAGER
The Replica Manager (RM) moves and copies data around the world, either according to
predefined policies (e.g. all Atlas Higgs events should be mirrored at Brookhaven) or on
demand from a user or broker. The RM keeps track of such movements through the Replica
Catalogue (RC). The RM wishes to do one or more of the following:
1. Periodically verify the contents of an RC by checking with the SE on the existence and
status of files that should be contained therein.
2. Copy one or more files from a source SE to a target SE using GridFTP.
i. To do this it reserves the files for reading in the source SE;
ii. Then it reserves space for them in the target SE;
iii. Then it issues a copy request for a number of files;
iv. Then it releases the reservations in the source SE.
3. Estimate the elapsed time required to perform a replication. For this it requires
information on the size of a set of files, the time to service the requests, and network
bandwidth figures between the source and target SEs.
4. Perform a more sophisticated estimate of the time taken to access different replicas of the
same file in order to decide which replicas to use.
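The four-step reservation sequence in item 2 above can be sketched as follows. The SE client calls (reserve_for_read, reserve_space, release) are invented names for illustration; the actual SE control protocol is only outlined in this document, and the copy loop stands in for a GridFTP transfer.

```python
# Sketch of the four-step replication sequence. FakeSE stands in for the SE
# control interface; reserve_for_read, reserve_space and release are
# invented method names, and the copy loop stands in for a GridFTP transfer.

class FakeSE:
    def __init__(self, name):
        self.name = name
        self.pinned = set()          # files reserved for reading
        self.reserved_bytes = 0      # space reserved for writing
        self.files = {}              # filename -> size in bytes

    def reserve_for_read(self, names):
        self.pinned.update(names)

    def reserve_space(self, nbytes):
        self.reserved_bytes += nbytes

    def release(self, names):
        self.pinned.difference_update(names)

def replicate(source, target, files):
    """files maps filename -> size in bytes."""
    names = list(files)
    source.reserve_for_read(names)             # i.  pin files in the source SE
    target.reserve_space(sum(files.values()))  # ii. reserve space in the target SE
    for name in names:                         # iii. issue the copy request
        target.files[name] = files[name]       #      (stand-in for GridFTP)
    source.release(names)                      # iv. release the source reservations

src, dst = FakeSE("source"), FakeSE("target")
replicate(src, dst, {"run42.dat": 1000, "run43.dat": 2000})
```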
4.2.1 Requirements
The SE should be able to:
(a) Allow remote access to data. (2.iii)
(b) Support efficient data transfer protocols, in particular GridFTP. (2.iii)
(c) Allow operations on multiple files. (2.iii)
(d) Provide metadata about SE. (3, 4)
(e) Provide metadata about files. (1)
(f) Record the location and estimate the access time for files within the system. (3)
(g) Allow 3rd party copying between SEs. (2.iii)
(h) Allow reservation of files for reading and writing. (2.i, 2.ii)
(i) Support secure file transfer protocols. (2.iii)
(j) Allow delegation of access rights. (2.iii)


¹ The actual number of requests per second that an SE can be expected to service will be SE metadata.
4.3 RESOURCE BROKER / JOB SUBMISSION SERVICE
The Resource Broker (RB) decides where to run a job using the location and availability of
files as input to its optimisation. The RB/JSS wish to:
1. Run a job²:
i. The user specifies LFN(s).
ii. A RB uses the RM to map an LFN into one or more PFNs.
iii. The RM may verify the existence of a file in an SE if the RC proves untrustworthy.
iv. The RB gets information from SE on ‘close’ CEs to which it will submit jobs that
access PFNs.
v. The RB can estimate access time for PFNs to help the JSS decide which set of PFNs
to use.
vi. The RB alerts the SE when the job is submitted so that those files can be brought
online before the job runs.
vii. The user requires space for output datasets. The RB finds an SE with space available
and reserves it.
viii. The RB alerts the SE on completion of the job.
2. Submit jobs to a CE close to (on the same fabric as) the SE, or to a remote CE.
If the user submits a batch sequence of jobs that take turns processing the data, like a UNIX
pipe, the RB is responsible for managing the names of the temporary files.
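Step v above, estimating access times for PFNs, could in a minimal sketch combine the SE's published staging latency with a size/bandwidth transfer estimate. The metadata field names below are illustrative assumptions, not part of any agreed schema.

```python
# Rough access-time estimate a broker might compute from SE metadata:
# staging latency plus transfer time at the advertised bandwidth. The
# metadata field names and values are illustrative assumptions.

def estimate_seconds(file_bytes, se_meta):
    return se_meta["latency_s"] + file_bytes / se_meta["bandwidth_Bps"]

def pick_replica(file_bytes, replicas):
    """replicas is a list of (pfn, se_metadata); return the cheapest PFN."""
    return min(replicas, key=lambda r: estimate_seconds(file_bytes, r[1]))[0]

replicas = [
    ("gsiftp://se1/f", {"latency_s": 120.0, "bandwidth_Bps": 10e6}),  # on tape
    ("gsiftp://se2/f", {"latency_s": 0.5, "bandwidth_Bps": 1e6}),     # on disk
]
best = pick_replica(50e6, replicas)   # 50 MB file
```

Here the tape-resident replica loses despite its faster link, because the staging latency dominates for a file of this size.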
4.3.1 Requirements
The SE should be able to:
(a) Make file metadata available. (1.iii, 1.v)
(b) Make SE metadata available. (1.iv, 1.v, 2)
(c) Pin files (reserve for reading). (1.vi)
(d) Allow reservation of space in SE for writing files. (1.vii)
(e) Create and delete temporary files. (1.vii)

4.4 THE DATA CENTRE
A Data Centre wishes to implement an SE to permit Grid access to the data held there. The
Data Centre wishes to:
1. Enable Grid access to data held in a proprietary MSS. The Centre doesn’t have access to
the source of the MSS software and must use the supplied interfaces for data access,
reservation, quotas etc.
2. Enable Grid access to data held on disk in standard UNIX file systems. Only the
management facilities of this UNIX file system are available. A method of reservation and
housekeeping is required.
3. Let clients query which protocols are supported: not all SEs may support the same
protocols, and clients need to know which protocols are supported.
4. Let a MSS be shared between local users and the SE. The local users don’t use the SE to
access the MSS. The SE administrator must limit the amount of storage available to the
SE in the MSS.
5. Let a new Grid client (e.g. a RM) use the SE. The SE administrator configures the SE to
accept the new client and sets up access rights according to local SE policy.


² Note that some of these steps do not give rise to SE requirements.
6. Disallow access to the SE for clients whose Grid Certificate is revoked. The administrator
should be able to see what the client did, in case the certificate was used before it was
revoked.
7. Charge clients for using the SE, either based on the amount of data owned by the client in
the SE, or on the amount of data transferred by the client, or both.
8. Make old datasets available: they must be transferred to an SE to be accessed by new Grid
tools. For example, a file already stored in a MSS managed by the SE could be made
accessible from the Grid.
9. Change SE configuration without interrupting SE services.
10. Allow access to several different storage systems for users with a specific type of
software, e.g. a GridFTP client.
4.4.1 Requirements
The SE should be able to:
(a) Enable support for common APIs to a set of MSSs. (10)
(b) Add Grid functionality to MSSs. (1, 2, 10)
(c) Publish information on protocols supported by an SE/HSM. (3)
(d) Limit the amount of space the SE will allow its clients to use. (4)
(e) Grant or revoke rights and quotas to clients according to site policies. (5, 6)
(f) Log access to the SE for security and billing purposes. (6, 7)
(g) Allow files to be added by the administrator using non-Grid methods. (8)
(h) Cope with “black box” HSMs. (1)
(i) Be reconfigured without disrupting services. (5, 6, 9)

5 REQUIREMENTS
Since many of the requirements listed in chapter 4 are the same from one Use Case to the
next, we regroup them by function in this chapter for convenience and reference.
The similarities between the clients allow the implementation to support all the clients by
implementing all the functionality described in this chapter for each client. In other words,
each client should be able to do all the things listed in this chapter. What clients are able to do
will depend on the client’s permissions, which will in part depend on the type of client³. For
example, we state that clients should be able to modify the SE configuration – obviously not
all clients will be allowed to do this, only those configured to be SE administrator(s). In the
table below, we list all the requirements along with the Use Case they arose from: i.e. “1f”
refers to Use Case 1, Requirement (f). This requirement can thus be found in section 4.1.1 (f).
Priority indicates the importance of a requirement:
 1 is considered an essential requirement: an SE must support such a requirement;
 2 is considered a normal requirement: an SE should support such a requirement;
 3 is considered a future requirement and need not be supported by the first version of the
SE.
The Volatility of a requirement is an estimate of how likely it is to change.
The Precision of a requirement is an estimate of how precise the specification is.
Section | Requirement | Use Case | Priority | Volatility
SECURITY
S1 | Support Grid security. | 1f | 1 | High
S2 | Support authorisation and access control. | 1k, 4e | 1 | High
S3 | Delegate authorisation (possibly temporarily) to other users. | 1o, 2j | 2 | Low
S4 | Support anonymous access. | 1n | 2 | N/A
S5 | Support group access. | 1p | 1 | N/A
S6 | Allow a 3rd party to transfer files to/from another SE. | 2g | 1 | N/A
S7 | Log access to the SE for security and billing purposes. | 4f | 1 | Low
PROTOCOLS
P1 | Support GridFTP and/or other secure file transfer protocols. | 1g, 2i | 1 | High
P2 | Support efficient transfer protocols, e.g. GridFTP. | 2b | 2 | High
P3 | Support Globus IO. | 1q | 2 | High
P4 | Allow remote access to data. | 2a | 2 | N/A
P5 | Enable support for common APIs to a set of MSSs. | 4a | 2 | Low
P6 | Add Grid functionality to MSSs. | 4b | 2 | N/A


3
This is strictly speaking an implementation detail, but we mention it here to avoid confusion
and to explain why we need this chapter in addition to chapter 4. For the full details of this,
please refer to Appendix VII.
METADATA
M1 | Provide metadata about itself. | 1h, 2d, 3b, 4c | 1 | Medium
M2 | Provide metadata about files. | 2e, 3a | 1 | Low
FILES
F1 | Copy files to/from anywhere on the Grid. | 1a | 1 | N/A
F2 | Let a client open a file for random or sequential access, for both reading and writing. | 1b, 1c | 2 | Low
F3 | Support file control such as pinning, reserve for writing, changing file access, etc. | 2h | 2 | Low
F4 | Support controlling multiple files. | 1e, 2c, 2h | 3 | Low
F5 | Create temporary files. | 1m, 3e | 2 | N/A
F6 | Provide session management (e.g. a current working directory). | 1d | 2 | Low
MISC
M1 | Restart in a consistent state after a crash, without assistance from an administrator. | 1i | 1 | N/A
M2 | Queue requests if it is too busy or for some reason some parts aren’t fully functional. | 1j | 1 | N/A
M3 | Enable programs that are not Grid-aware to access Grid files, i.e., to access files through the SE. | 1l | 2 | Low
M4 | Allow files to be added using non-Grid methods. | 4g | 2 | N/A
M5 | Cope with “black box” HSMs. | 4h | 2 | Low
M6 | Be reconfigured without interrupting services. | 4i | 1 | Low
M7 | Enforce user quotas. | 4d | 3 | High




6 ARCHITECTURE
6.1 INTRODUCTION
This chapter describes the internal architecture of the SE. The SE provides data storage and
retrieval for a number of clients using a variety of protocols. The SE stores data by managing
the MSS system it is connected to. By providing a series of clean protocol interfaces an SE
can be thought of as a server to clients accessing the SE. To a MSS the SE appears as a client
requesting files or information from the MSS. The goal of the SEs is to save clients from
needing to know about the internal workings of the MSS. In the middle, between the client
and the server parts of the SE, there is a management layer where the functionality of the SE
is implemented. This approach has led to a modular and layered design, which is extensible,
allowing new protocols to be supported, or new functionality to be added as required. This
chapter explains the reasons for this architecture and expands on the details of the SE’s
structure.
6.2 GENERAL ARCHITECTURE
For ease of development WP5 has decided to use a highly modular architecture. The benefits
of building a server from a series of discrete modules are well understood; some of them are
outlined in Appendix VI. The main points are listed below:
 Faster Development
 Simplified Testing
 Simplified Debugging
 Ease of maintenance
 Redundancy
Where possible the architecture has been selected on the basis of simplicity and fulfilment of
the requirements stated in the previous chapter.
6.3 SERVER COMPLEXITY ANALYSIS
The SE provides an abstraction of local data stores over transfer protocols. At least one
protocol must work with each Mass Storage System for it to appear as an SE. If N is the
number of protocols and M is the number of distinct MSSs the SE must support, it is
conceivable that WP5 would be expected to provide N∙M links. As the SE develops support
for new protocols and broadens its support for MSSs, it is quite possible that the workload
could grow beyond the scope of WP5.
To avoid the necessity of building N∙M links, we intend to develop a common protocol for
internal use within the SE, into which all external interface protocols are mapped. Thus the
total number of links would be only N+M [Figure 6.1]. This approach will reduce the work
required to add other MSSs or protocols as well as making it simple to support several
protocols on every SE.
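The N+M idea can be illustrated with a small sketch: each protocol front-end and each MSS back-end is written once against a common internal request format, rather than once per (protocol, MSS) pair. All names below are invented for illustration.

```python
# Sketch of the N+M approach: N protocol front-ends and M MSS back-end
# modules, each written once against a common internal request format,
# instead of N*M dedicated bridges. All names are invented for illustration.

def core_dispatch(request, backends):
    """The common internal protocol: a dict naming the MSS and the operation."""
    return backends[request["mss"]](request["op"])

# M back-end modules, one per MSS (stubs):
backends = {
    "castor": lambda op: "castor:" + op,
    "hpss": lambda op: "hpss:" + op,
}

# N front-ends, one per external protocol, each mapping into the common form:
def gridftp_frontend(path, mss):
    return core_dispatch({"mss": mss, "op": "get " + path}, backends)

def http_frontend(path, mss):
    return core_dispatch({"mss": mss, "op": "get " + path}, backends)
```

Adding a new MSS means writing one new entry in the back-end table; adding a new protocol means writing one new front-end, with no change to the existing modules.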
[Figure 6.1: N∙M links vs. N+M links. Left: each protocol (A, B, C) is linked directly to each MSS (A, B, C), giving N∙M links. Right: the protocols and MSSs are each linked once to a common CORE, giving N+M links.]
6.4 LAYERED MODEL
The layered structure of the SE is a direct result of selecting an infrastructure based upon the
N+M approach to the complexity issues raised by supporting multiple protocols and MSSs.
The top layer will provide the support for the protocols while the bottom layer will support
the MSSs.
The DataGrid makes demands on the SE that are not present with all MSSs. Examples
include storing metadata, such as locally marking a file as a replica. These features, which are
common to the SEs but not available within every MSS, should be provided in a layer that
maps the protocol interfaces to the MSS interfaces. This makes the SE a three-tier system
(Fig 6.1) and allows the common layer to take control of all features that are required by all
SEs. In particular, the SE must keep a database of metadata that is not handled by the
underlying MSS.
6.5 TOP LAYER
All clients access the SE through the “Top Layer” of the SE. The top layer is made up of a
collection of interfaces, and tools to connect with the other layers of the SE. The interfaces
will then pass the requests received by the clients through to the lower levels of the SE’s
system.
6.5.1 Interfaces
Each interface supports a protocol. These interfaces can be divided into two groups. One
group of protocols, such as ssh and FTP, maintains a session and provides a sequential
control interface which persists until the client logs out. The second group of protocols, such
as HTTP, is often stateless and performs a series of atomic requests. Either type of protocol
can simulate the other with some work on the programmer’s part. Stateful interfaces can
present requests as stateless by encompassing their state within each request; stateless
interfaces can simulate the behaviour of stateful protocols by giving the client unique
identifiers for the available requests, so that the server can track the client’s state over a
stateless protocol.
State is difficult to replicate across a system. For this reason, when session and state
management are demanded by a protocol, they should be managed at only a single point. The
communication between the top layer and the SE Core will be stateless. Since no state
protocol information will be passed to the core, protocols that require session management
will be responsible for managing their own state.
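As a minimal sketch of this arrangement, a stateful front-end can hold its session state (here, a file offset and user identity) in the top layer and emit self-contained, stateless requests to the core. The request field names are illustrative assumptions.

```python
# Sketch of a stateful front-end (RFIO- or GridFTP-like) keeping session
# state in the top layer and issuing stateless, self-contained requests to
# the core. The request field names are illustrative assumptions.

class SessionFrontEnd:
    def __init__(self, user):
        self.user = user
        self.offset = 0              # session state lives here, not in the core

    def read(self, filename, nbytes):
        request = {                  # everything the core needs, in one message
            "user": self.user,
            "op": "read",
            "file": filename,
            "offset": self.offset,
            "length": nbytes,
        }
        self.offset += nbytes        # advance the per-session file offset
        return request

session = SessionFrontEnd("alice")
first = session.read("data.raw", 100)
second = session.read("data.raw", 100)
```

Each request carries the user identity and file offset, so the core can serve it without any memory of previous requests in the session.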
The SE will have three distinct types of interface: data transfer, control, and information. The
division into these areas is artificial, and an implementation could easily mix components of
each, but this model was chosen because:
a) A number of data transfer protocols exist (FTP, HTTP, and SCP) which could be
incorporated.
b) Information services exist and are being developed. Implementing this functionality as a
producer to one of these services is best done by treating it as a separate component.
c) Remote control of MSSs is a relatively new feature still being developed (ref R12). There
are no standards in this field, so it is likely to change over the lifetime of the project.
A particular SE need not support all three components. In special circumstances an SE could
support any combination of these interfaces, but the more common cases will be data only or
all three. The three interfaces are described in more detail below.
6.5.2 Clients
The SE will interface with a variety of clients, including the Replica Manager and Grid
applications being run by users. For simplicity, all clients will be treated identically, differing
only on the basis of permissions within the SE (see Appendix VII). Instructions received by
the SE over the different supported protocols will also be treated identically.
6.5.3 Communication with the core
Communication from the supported interfaces to the middle layer of the SE should be
through a common point. A common communication mechanism between interfaces
provides the benefits of a modular system, as stated in Appendix VI. Regulating this single
communication mechanism with the lower layers of the SE makes it easy to manage requests
from the top layer of the system and handle each request in turn. This also reduces the
possibility of a loaded SE failing under transient high loads, and makes the SE behave in a
comprehensible and predictable manner when under heavy load.
6.6 MIDDLE LAYER - SE CORE
The core provides a single internal interface to the top layer and its interfaces. This layer
provides an abstraction insulating the top layer interfaces from the various MSS interfaces,
and the bottom layer from the different access protocols such as GridFTP and RFIO. It also
implements several major functions that apply to all SEs:
 Verifying authentication, checking authorisation and logging access.
 Providing a central point for control and enforcement of policy for a site.
 Handling metadata (the SE must keep a database for metadata not handled by the
underlying MSS).
 Measuring dynamic metrics for the SE to publish as metadata, for example its current
load and free space.
 Housekeeping, such as tracking which HSM stores which files, whether the files are
cached, etc.
The core should have a simple framework, which allows for expansion and adaptation
between top and bottom layers.
 The SE must store data about the MSS it is managing and the Grid services it is providing.
The core layer will be responsible for managing this data and providing the necessary
housekeeping facilities.
The SE core will provide support for the locally defined details of the SE policy. The
differences will include configurations set with regard to the physical layout of the system.
We expect significant differences between sites’ local policies, e.g. when reserving (pinning)
files. We expect to support site policies, such as the timeout before staging a file to tape.
6.7 BOTTOM LAYER - CLIENT MODULES
The SE is responsible for communicating with a variety of site-specific MSSs in a consistent manner,
ideally through a single interface. Support for specific MSSs will be modular, through a common
interface which allows for expansion. Sites may differ greatly in operation as well as in interfacing,
which suggests that a direct mapping of operations between MSSs will not be possible. The SE is
required to support MSSs where the source code is not available, but all MSSs provide interfacing
facilities. It is expected that existing interface clients or libraries will be reused within the SE.

7 DESIGN
7.1 INTRODUCTION
This chapter describes the implementation of the SE. The SE will be built from 3 layers, as
stated in the architecture chapter. This chapter describes these layers and how they interact
with each other.
[Fig 7.1: Schematic diagram illustrating the layered design of the Storage Element. Clients connect to the top-layer interfaces (Interface 1, 2, 3); the core contains the message queue, session manager, system log, housekeeping and metadata; the bottom-layer MSS interfaces connect to MSS1 and MSS2.]
The top layer provides the network interfaces supporting all the current protocols of the Grid
and will also support future protocols. The middle layer or core of the system will provide
those features which are required of an SE but not provided by the MSSs. The bottom layer
will provide the interfacing with the MSS.
7.2 GENERAL DESIGN
This chapter explains the design through two different approaches. The first is a hierarchical
decomposition of the SE and its structure, showing the events that occur in the layers. The
second explains the design through data flow diagrams.
7.3 HIERARCHICAL DECOMPOSITION
As stated in the architecture chapter, the SE will be based on a three tier structure. This is
illustrated schematically in Fig 7.1. Each of the tiers will be highly modular in structure as
stated in appendix VIII. These layers and their communication are explained in the following
subsections of this document.
7.3.1 Top Layer
The top layer is made up of the interfaces which the SE will support. Each will be
implemented independently, making their development and support easier. Tasks that all
interfaces have to do, e.g. handle security, will be implemented by providing the common
functionality in libraries.
7.3.1.1 Security
Security for the SE will follow the Grid Security Infrastructure (GSI): we will support
certificates and we will track changes in the Grid security policies, as these are not yet
formalised.
7.3.1.2 Interfaces
Network interfaces to the SE can be divided into two groups, stateful and stateless interfaces.
In the context of interfaces to the SE, stateless calls provide a call response without requiring
data from previous calls within the same session. The following stateless and stateful
interfaces will be supported first.
Stateless Interfaces
 Ftree/LDAP has been produced by WP3 and is primarily intended for publishing static
data for Grid middleware. The R-GMA interface is expected to supersede the Ftree/LDAP
interface and will be supported when it does. The Ftree/LDAP interface can also provide
dynamic querying of files and their attributes.
 SOAP-like interface/XML RPC
Globus have indicated in the Global Grid Forum (GGF) that they intend to move to a SOAP-like
protocol as commonly used in e-commerce. Due to the simplicity of implementing such
interfaces, the protocol developed by WP5 will be a stateless XML call. This interface will expose
all of the SE’s functionality.
Stateful Interfaces
 RFIO supports access to Castor and HPSS, for remote network file access. For ease of
comprehension, RFIO replicates the functionality of the UNIX file API. RFIO is stateful
in the sense that the open and close equivalent calls initiate and terminate file access.
 GridFTP is a stateful system encompassing logins, logoffs and current working directories
for relative file access.
Since the architecture calls for building a single common interface to all network protocols, it
is easier to provide a common stateless interface, over which stateful interfaces can send
their state information, if necessary, as part of the request they are handling. An example for
RFIO would be keeping the offset of a file for sequential reading. This means that a
protocol like GridFTP will send the user id with every request to the core.
7.3.2 Layer Communication
All data will be sent through named pipes created by the interfaces through a pipe store
manager. Communication with the core layer is achieved through a common interface to a
queue. The queue prevents the system from overloading the core with concurrent requests, as
the core will operate to capacity rather than overloading. The queue will help the SE have a
predictable load-handling behaviour, allowing the system administrator to limit the number of
concurrent requests handled by the SE. The XML placed on the queue will describe the
named pipes and give their context of operation. The core will then identify and interpret the
XML to establish the purpose of the pipes.
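A sketch of the kind of XML an interface might place on the queue to describe its named pipes follows. The element and attribute names are invented for illustration; the real message format is not yet fixed.

```python
# Sketch of the XML an interface might place on the queue to describe its
# named pipes and their context of operation. Element and attribute names
# are invented for illustration.

import xml.etree.ElementTree as ET

def make_request_xml(interface, op, pipes):
    """pipes maps a named-pipe path to the mode the core should open it in."""
    root = ET.Element("request", {"interface": interface, "op": op})
    for path, mode in pipes.items():
        ET.SubElement(root, "pipe", {"path": path, "mode": mode})
    return ET.tostring(root, encoding="unicode")

message = make_request_xml(
    "gridftp", "get",
    {"/tmp/se/pipe-42-data": "write", "/tmp/se/pipe-42-ack": "write"},
)
```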
Once the messages have been taken off the queue by the core, they will be processed by a
request handler which interprets the XML and launches a number of small modules that do
the real work of the SE. Neither the clients nor the MSSs need be aware of this. It is
possible, though, for an SE-aware client to use a subset of this XML to access the SE
directly, using the HTTP protocol and SOAP, although this will be wrapped in an interface
like all other protocols.
7.3.3 SE core
The SE core receives messages from the message queue and will process as many
concurrently as the site administrator has configured. Each of the messages will be in the
form of XML.
These XML requests will describe not only which interface sent the request (so that the right
interface gets the response), but also how to process the request. The core consists of a
request manager and a series of handlers, and stores a list of all the handler modules that can
be called upon by the request manager. Once the request manager receives the XML, it will
pipe the XML request through the first handler and then repeat this process until the XML is
fully processed. Each handler reads the XML from its stdin, removes its own information
from the XML, similar to the SOAP protocol, and sends the processed XML to its stdout.
Metadata is returned, if necessary, in the processed request along with information about the
request. The returned information can be very simple, such as "the request was (was not)
successful", or it might be a more complicated structure containing meta-information about
the request, such as whether the processing has finished and which handler to pass the
request to next.
The XML request files will define the communication mechanisms available to the core with
the respective network interfaces in the top layer. The XML will be passed through a UNIX
named pipe in the first stage of development although sockets may be used at a later date so
the interfaces may be distributed across a network.
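The handler chain described above can be sketched as follows. Real handlers are separate programs reading XML on stdin and writing XML on stdout; here they are plain functions so that the flow is easier to see, and the envelope format is invented for illustration.

```python
# Sketch of the core's request manager piping one XML request through a
# chain of handlers. Each handler strips its own element from the request,
# much as a SOAP node strips its headers. The envelope format is invented.

import xml.etree.ElementTree as ET

def auth_handler(doc):
    doc.remove(doc.find("auth"))          # consume the auth element
    return doc

def stage_handler(doc):
    doc.remove(doc.find("stage"))         # consume the stage element
    ET.SubElement(doc, "result").text = "staged"
    return doc

def request_manager(xml_text, handlers):
    doc = ET.fromstring(xml_text)
    for handler in handlers:              # repeat until fully processed
        doc = handler(doc)
    return ET.tostring(doc, encoding="unicode")

processed = request_manager(
    "<request><auth user='alice'/><stage file='run42.dat'/></request>",
    [auth_handler, stage_handler],
)
```

The processed request that comes out of the chain carries only the result, the handler-specific elements having been consumed along the way.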
7.3.3.1 Responses to the Top Layer
Information is passed back to the top layer as XML, and the modules in the top layer are then
responsible for translating this data into something the clients understand. Responses are only
passed back from the core when the core has finished processing the request. For example, if
the request involves talking to the MSS, the core will wait until the MSS request has completed
before passing the response back. The core handles requests concurrently, and any blocking
and waiting for requests to complete will be the responsibility of the top level interface (the
interface can either block until a response comes back through the pipe or it can query
whether a response has come back).
The response sent back to the client could be a simple number, such as the number passed in
FTP/GridFTP or HTTP. This has the advantage that client implementers are already familiar
with these numbers since their usage is well documented, and a number is easy to parse by
clients. In the case of errors, the SE should also pass back a string describing the nature of the
error; the client can pass this error information on to the user.
An open question here is how much we can use or reuse solutions developed by other projects
such as WP2 and WP3, or the Apache XML project. WP2 and 3 also use XML based
query/response, either XML to XML, or an HTTP GET query which then gets an XML
response. WP3 is also looking specifically at SOAP, and we need to investigate whether this
is useful for us as well.
7.3.3.2 Core Handlers
Logically one can further separate the handlers in the core into three categories: information,
control, and data transfer. (Recall that session handling is provided by the top layer.) The
information handlers would provide file metadata or information about the SE itself; both of
these could then be split further into "static" information, which changes infrequently or never,
and "dynamic" information, which might change with each query (see below).
We will provide example handlers, both simple and complex. We may provide libraries
and/or C++ classes that facilitate writing handlers if users of the SE wish to extend its
features. The handlers will in general be simple programs that could be written in C, Java, or
Perl (for example), using libraries to parse XML.
Core handlers are responsible for providing metadata. The SE metadata is divided into four
groups according to whether or not the data is static or dynamic, and whether it pertains to an
individual file or to the SE as a whole.
Static SE Metadata
This metadata changes infrequently so it can be cached in other places. Examples are:
 Size of SE
 Protocols supported
 Policies
 Scheduled outages
Dynamic SE Metadata
Dynamic SE metadata needs to be up to date and will have a very small lifetime, if any.
Examples are:
 How Full the SE is
 Load index (queue size, load, average latency, etc.)
File Static Metadata
Information about files that is static over a period in the order of the lifetime of a file.
Examples in this category are:
 Filename
 Creation date, ownership, etc.
File Dynamic Metadata
Information about files that changes frequently and cannot be cached. Examples are:
 Latency (estimate of access time)
 Pinning status of file
 File size
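The four metadata groups can be sketched as a small registry keyed by scope (SE or file) and kind (static or dynamic). The attribute names and values are examples drawn from the lists above, not a fixed schema.

```python
# The four metadata groups, sketched as a registry keyed by scope and kind.
# Attribute names and values are examples from the lists above, not a schema.

SE_METADATA = {
    ("se", "static"): {"size_bytes": 10**13, "protocols": ["gridftp", "rfio"]},
    ("se", "dynamic"): {"free_bytes": 4 * 10**12, "queue_length": 17},
    ("file", "static"): {"filename": "run42.dat", "created": "2002-02-14"},
    ("file", "dynamic"): {"latency_s": 120.0, "pinned": False, "size_bytes": 10**9},
}

def lookup(scope, kind, attribute):
    """scope is 'se' or 'file'; kind is 'static' or 'dynamic'."""
    return SE_METADATA[(scope, kind)][attribute]
```

Entries in the static groups can safely be cached elsewhere; entries in the dynamic groups should be re-queried on each use.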
7.3.4 Bottom layer (client) modules
The lower layer will consist of modules which translate XML requests into MSS-specific
function calls. One module would know how to talk to HPSS, one might talk GridFTP (i.e.
be a GridFTP client), and one might talk to CASTOR⁴. The client modules will not differ
programmatically from the core modules, i.e. they will be called in the same way as the core
modules. Client modules will also receive XML on stdin and write XML to stdout, and operate
within the same request handler environment. This simplifies the way core modules and the
MSS modules are built into the system.

⁴ One subject that needs further investigation is to find out how much of the SE's functionality could be managed by CASTOR through such a module.
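A bottom-layer client module might look like the following sketch: it is invoked like any core handler, reads an XML request on stdin, translates the generic request into an MSS-specific command, and writes the processed XML to stdout. The element names and the stager_get command are stand-ins, not the real CASTOR client interface.

```python
# Sketch of a bottom-layer client module, invoked like any core handler:
# read an XML request on stdin, map it to an MSS-specific call, write the
# processed XML to stdout. "stager_get" is a hypothetical command, and the
# element names are invented; no real MSS is contacted here.

import xml.etree.ElementTree as ET

def translate(doc):
    """Turn the generic <mss> element into an MSS command line (stubbed)."""
    mss = doc.find("mss")
    command = ["stager_get", mss.attrib["file"]]   # hypothetical CASTOR call
    doc.remove(mss)                                # consume our element
    ET.SubElement(doc, "status").text = "0"        # pretend the call succeeded
    return doc, command

def run(stdin, stdout):
    """Handler entry point: XML in on stdin, processed XML out on stdout."""
    doc, _command = translate(ET.fromstring(stdin.read()))
    stdout.write(ET.tostring(doc, encoding="unicode"))

demo, cmd = translate(
    ET.fromstring("<request><mss op='get' file='/castor/f'/></request>")
)
```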

7.4 DATA FLOW
The aim of this section is to explain how the SE operates internally. The SE is expected to
serve multiple clients performing multiple requests concurrently; this means that the
operations of the SE will occur in parallel.
[Figure 7.2 here: the Interface Layer (Interface, Pipe Manager, named pipes) on one side and the Core and the Bottom Layer (Queue Manager, Request Manager, MSM Handler, Pipe Store) on the other, with Disk, Tape and Network below; the data-flow stages are numbered 1 to 8.]
Fig 7.2: Diagram illustrating data flow through the SE.
Figure 7.2 illustrates the data flow, providing a breakdown of the layers of the system so that
the data flow can be compared to the hierarchical decomposition. The following text describes
each stage of the data flow and how the SE manages the client requests. The numbers on
figure 7.2 correspond to the subsections on data flow.
7.4.1 Data flow initiation
The client connecting to the SE's interface initiates the data flow through the SE. The interface interprets the client request and establishes the client's intention. The simplest case occurs when the client does not need any data transfer; the interface may then only need a single acknowledgement that the SE has completed the requested task. In more complex cases the client may expect the interface to transfer multiple files for reading or writing concurrently.
7.4.2 Data flow 1
The interface requests from the pipe manager the number of pipes it requires to perform the client's request. In the simple case of no file transfer this may be a single pipe used to acknowledge completion, if the protocol requires it. The pipe manager then creates the named pipes and communicates their identifiers back to the interface. The interface connects to the named pipes in read or write mode; once they are connected, the interface can continue operation.
7.4.3 Data flow 2
The interface loads a template XML file with the appropriate information for the core to process. The XML will include details such as the names of operation acknowledgement pipes and data transfer pipes, together with their direction of data flow (read or write to the SE). Additional information including logging, authorisation and other data which the interface can establish will be included within the XML file. Protocols that require operation upon multiple files, or that, like GridFTP, provide multiple concurrent transfers of different subsections of files, will place multiple file requests. The XML file or files then represent the client's request.
The request files are placed on the queue, which provides a single point of regulation of concurrent requests within a single SE, allowing predictable behaviour at high loads. The queue manager regulates the queue's operations, including adding and removing files from the queue.
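Building such a request document can be sketched as follows. All element and attribute names are invented for illustration; the real schema is not fixed by this design.

```python
import xml.dom.minidom as minidom

def build_request(ack_pipe, transfers, user):
    """Fill a request document with the pipe names and transfer
    directions the interface has established."""
    doc = minidom.Document()
    req = doc.createElement("request")
    req.setAttribute("version", "1")
    req.setAttribute("user", user)     # authorisation data from the interface
    ack = doc.createElement("ack-pipe")
    ack.setAttribute("name", ack_pipe)
    req.appendChild(ack)
    for filename, pipe, direction in transfers:  # direction: "read"/"write"
        t = doc.createElement("transfer")
        t.setAttribute("file", filename)
        t.setAttribute("pipe", pipe)
        t.setAttribute("direction", direction)
        req.appendChild(t)
    doc.appendChild(req)
    return doc.toxml()

xml_text = build_request("/tmp/ack1", [("/grid/f1", "/tmp/p1", "read")], "alice")
print(xml_text)
```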
7.4.4 Data flow 3
The queue regulates the number of concurrent requests which can reach this stage of the data flow. When a file is taken off the queue it is simply piped into the input of the request handler (see the hierarchical decomposition for details of the request handler's operation).
7.4.5 Data flow 4
The request handler pipes the request XML file into each handler's stdin and reads from the handler's stdout. Since the XML contains the details of which handlers need to operate, it is trivial for the request handler to pipe the XML through each handler in turn, until all the components of the request are processed. The handlers write the XML to stdout together with status information logging their operation, which prevents the system from looping indefinitely.
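Chaining stdin-to-stdout handlers in this way can be sketched with subprocesses. The handlers below are stand-ins (`cat` simply echoes its input); real handlers would be the core and bottom-layer executables.

```python
import subprocess

def run_handler_chain(xml_request, handler_commands):
    """Pipe the request document through each handler executable in
    turn, each handler's stdout feeding the next handler's stdin, as
    the request handler does. `handler_commands` is a list of argv
    lists."""
    data = xml_request.encode()
    for argv in handler_commands:
        proc = subprocess.run(argv, input=data,
                              stdout=subprocess.PIPE, check=True)
        data = proc.stdout  # output of one handler is input to the next
    return data.decode()

# Demonstration with `cat` standing in for real handlers: the document
# passes through the chain unchanged.
result = run_handler_chain("<request/>", [["cat"], ["cat"]])
print(result)
```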
7.4.6 Data flow 5
Each handler is a discrete application (see 7.3.3.2), some of which communicate with the
MSS. Simple “bottom layer” handlers may communicate with the MSS to stage a file to the
cache but many will provide data transfer. This will typically be provided using MSS client
libraries and should be straightforward to implement.
7.4.7 Data flow 6
The MSS provides its own implementation of staging and data caching to and from tape. The
SE will not need to participate in this stage of the data flow.
7.4.8 Data flow 7
Handlers which provide data transfer will always reside on the same physical machine as the interface they serve. This is not necessarily true for all handlers within SEs made up of clustered systems. Data transfer handlers can establish the identity of the named pipes, as this information is present in the request XML file the handler received via stdin. The data
transfer handler can then connect through the named pipes with the interface and through client libraries to the MSS.
7.4.9 Data flow 8
Named pipes will be connected to the interfaces as stated earlier in the data flow. For data
transfers this will provide a highly optimised communication abstraction irrespective of the
interface. In the case of acknowledgement pipes this is the last stage of the process: they are
used to communicate the status of the request to the interface.
7.5 CLIENT INTERFACE PROTOCOLS
Clients can be divided into two groups: those that are written specifically to access the SE, and those that are not. For the SE-aware applications we can provide a rich interface which can change later, even if that means modifying the client. For applications that are not SE-aware, there should be modules that, as far as the client is concerned, look exactly like the API the client expects, but actually redirect calls to the SE. This can be done without modifying the client at all, but may require running the client through a specific wrapper program which takes care of the authentication, and of redirecting the UNIX file access calls through SE libraries [see Appendix V]. The wrapper program will also take care of setting up an environment for the client in which the client is implicitly authenticated to the SE.
Furthermore, since the SE might provide, or access may require, a richer interface than the
client's expected API, some services may have to be handled in a way that circumvents the
limitations of the client's API. For example, there exists an interface to RFIO [Appendix III]
which provides the same system calls as the usual POSIX file interface, but also provides
additional calls, for example rfio_setcos on HPSS. The SE may make such additional HSM
functionality available to its clients, but obviously through a consistent HSM-independent
interface. For the clients accessing the SE through a compatibility interface, the SE should
make sure that reasonable defaults are picked for those additional services.
Client interface protocols have been broken down into three types:
 Data Transfer
 Information
 Control
Currently no client interface covers all three types of interface in a suitably extensible approach. The requirements have also been broken down into these three logical groups. We expect that adding XML web services or a SOAP-like interface could combine the interfaces at a later stage of development.

7.5.1 Data Transfer
Three protocols will be supported in the initial versions of the SE. We intend to implement them in order of ease of implementation.
 RFIO
 GridFTP
 NFS
There exist command line utilities for doing data transfer through RFIO (see Appendix III), so providing an RFIO API will already allow end users to access the SE. Later, we intend to provide SE-aware command line utilities.
7.5.1.1 Implementation
These are the fundamental protocols of the SE and are optimised for scalability where possible. These protocols are stateful and designed to avoid network overhead while transferring large amounts of data. Since the top layer of the SE is intended to support sessions while the core and bottom layers should be as stateless as possible, we intend to use a standard listen-and-fork on socket connection model. The state (session-specific information) is then stored per process, and each process communicates with the core layer through the common XML interface. As protocols such as GridFTP only challenge the client on connection, the process which manages the connection to the core will have to send authentication information with each request. We do not expect this to slow down processing noticeably.
7.5.2 Information
Information interfaces will often provide status or static data about the SE. These will be
provided by a set of protocols and APIs to the EDG Information Service that provide access
to the SE-specific information. See Appendix II.
The following clients need to access SE Information:
 Replica Manager (WP2) (section 4.2)
 Resource Broker (WP1) (section 4.3)
 LDAP clients (WP3)
 Other SEs
 Users. Users can access file metadata through a command line utility such as sestat (it makes sense to choose names similar to those of the standard UNIX file utilities; a utility “sestat” would be the SE version of “stat”). There will also be a command for accessing SE metadata. WP5 will be responsible for these clients.
7.5.2.1 Implementation
The information service will follow a publishing model rather than an interactive session model. This keeps support simple, although it may change when R-GMA becomes a standard information protocol for the DataGrid. Commonly the published information will be public; in the case of the LDAP/Ftree implementation provided by WP3, the interface will provide its own authentication. Since queries on files and other dynamic calls will not carry state information, servicing requests should be trivial in the core. It is possible that much of the information will not even require authorisation.
7.5.3.1 Implementation
These controls are SE-specific interfaces and thus need to be implemented for both client and server. To minimise workload these will be XML messages which, if required, will be trailed with binary data starting immediately after the last XML tag.
The XML will state that it is a control message for a European DataGrid SE, together with its version number, possibly over HTTP. An XML response message will be sent back to the client indicating the success status and any valid response information in a format similar to the request. This control interface may be extended to support a richer set of options at a later stage.
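Splitting such a message into its XML and trailing binary parts is straightforward, since the binary data begins immediately after the closing tag of the root element. In this sketch the root element name se-control is invented for illustration.

```python
def split_control_message(raw):
    """Separate the XML control part of a message from the trailing
    binary data, which by convention starts immediately after the
    closing tag of the (illustrative) <se-control> root element."""
    close = b"</se-control>"
    end = raw.index(close) + len(close)
    return raw[:end].decode(), raw[end:]

message = (b'<se-control version="1"><put file="/grid/f1"/></se-control>'
           + b"\x00\x01payload")
xml_part, binary_part = split_control_message(message)
print(xml_part)
print(binary_part)
```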
7.5.3.2 SE Management
The Control interface will be expanded to provide facilities for managing the SE itself so that
the SE can be controlled from (possibly remote) clients – this will typically be clients written
specifically to control the SE, so they will not need to do data transfer. Such functions
include adding new users to the system, granting rights to users, changing logging
configuration and other SE policies. These features are not intended for the first releases, as they will be easier to add to an established server on an as-needed basis.
7.6 SE CONFIGURATION
The SE administrator must be able to configure the SE. The configuration must allow the
administrator to add users and to give and remove privileges to users, analogous to those
provided by databases where users are granted permission to create or delete entries, create
tables, etc. Furthermore, administration must be possible through a well-known configuration
file format, such as that of the apache web server.
It must be possible to reconfigure the SE without stopping it. If this part of the SE is
implemented as a daemon, standard UNIX signals should be used to signal to the SE that the
configuration files have changed (cf. the apache web server, where HUP signals “reload
configuration files now” and USR1 signals “reload gracefully” (requests that are getting
processed are not interrupted)).
As for the configuration itself, we will look at the work done by WP4. Although our needs
for configuration are simpler in general than theirs, we should still be able to use some of their
methods and ideas for configuring SEs locally or remotely.

See Appendix VII for additional comments about the configuration.
8 OPEN ISSUES
 Interactive services: WP5 believes that the implications of supporting interactive services
have not been fully scoped by the overall project. WP5 believes that there is nothing in the
StorageElement architecture or design that prevents interactive use. Although the
performance delivered by physical mass storage systems may be unacceptable for
interactive users, the information component allows them to anticipate the delivered
performance and take appropriate action. The precise meaning of “interactivity” needs
further discussion between the middleware and application work packages.
 Policies: WP5 has designed in the ability for an individual SE to publish some details of
policies which it applies (e.g. no remote access, read-only) but these are specific to WP5.
WP5 feels that a wider system of Grid policies should be developed for common use by
different Grid services. Examples:
o When users move, copy, rename, delete, or modify files, should the RM be
informed so that it can update the RCs?
o More specifically, when a master copy of a file is deleted, when are the replicas
removed, and by whom?
o If the SE is unable to process a request immediately (either because it is busy or
because the request takes a “long” time to process), should it inform the Job
Scheduling Service (WP1) so that the job can be put on hold till the request
finishes?
o Authorisation can be managed entirely by the RM by letting it grant other users
access to files using delegation.
o Grid certificates: can they be integrated with site specific security systems like
Kerberos? For example, GSS-API can operate with either PKI or Kerberos
security mechanisms.
 Access Control: Existing mass storage systems have their own methods of access control
based on local usernames and/or access control lists. While Grid access can be mapped
onto these, a generalised Grid access control solution can only be effective when all
access to data is via Grid methods which implement this access control. In most cases,
large mass storage systems will be shared with local non-Grid users for some time. Such
local use invalidates Grid access control unless the resource is partitioned between Grid
and non-Grid use.
 Objects: If the users choose to deal with objects rather than files and the LFN/PFN/TFN
model is changed to reflect this, a lookup is needed to find out in which file and where in
the file the object is stored. Since an object is stored physically in a file, we will need to
find out where the lookup happens.
9 APPENDIX I MASS STORAGE SYSTEMS IN USE BY PARTNERS
For details of these systems see Deliverable 5.1
HPSS http://www.sdsc.edu/hpss/hpss1.html

CASTOR http://wwwinfo.cern.ch/pdp/castor/

Enstore http://www-isd.fnal.gov/enstore

DMF http://www.cray.com/products/software/dmf.html

Jasmine http://cc.jlab.org/scicomp/JASMine/Goddard_2001.html

satSTORE http://styx.esrin.esa.it/grid/docs/docs/ESA_AMS_HSM_paper.pdf

http://styx.esrin.esa.it/grid/docs/docs/SatStoreICD401.doc

VTP http://www.e-science.clrc.ac.uk/Activity/ACTIVITY=DataStore



10 APPENDIX II SE QUERY VIA LDAP
Introduction
The SE publishes information on the LDAP/ftree system provided by WP3; further protocols such as MDS 2.1 or R-GMA will be supported as they become standardised as Grid protocols.
WP5 has written a query object, se_ldap_query, in C++. This object provides a simplified interface to the LDAP libraries. WP5 has also provided C++ code to access the LDAP/ftree interface directly.
Public Methods
All methods, unless indicated otherwise, return 0 on success; negative values indicate failure.
int connect(const string &ldap_host, int ldap_port);
connect takes the host name and a numerical port number and connects to the LDAP server.
int disconnect();
disconnect closes the connection to the LDAP server. It is called by the destructor and also before a further connection.
int GetStorageElements(std::vector<string> &result) const;
GetStorageElements fills a vector of strings with a list of all available SEs known to the LDAP server.
int GetStorageElementAttribs(const string &StoreageElement, std::vector<string> &result) const;
GetStorageElementAttribs returns a list of all storage element attributes.
int GetStorageElementCompElement(const string &StoreageElement, std::vector<string> &result) const;
GetStorageElementCompElement returns a list of all CEs close to the SE.
long GetStorageElementFreeSpace(const string &StoreageElement) const;
GetStorageElementFreeSpace returns a long containing the available space on the SE.
int GetStorageElementFileAttrib(const string &StoreageElement, const string &Filename, std::vector<string> &result) const;
GetStorageElementFileAttrib returns the file attributes.
A simple command line tool of the same name has been written to demonstrate this interface; it takes the following form:
se_ldap_query -H ldap://gppmds.gridpp.rl.ac.uk:2171 -GetStorageElements
se_ldap_query -H ldap://gppmds.gridpp.rl.ac.uk:2171 -GetStorageElementAttribs gppmds.gridpp.rl.ac.uk
All command line options are identical to the object methods and take equivalent parameters. This is still a test application, but it will be released and improved by the time this document is released.

11 APPENDIX III RFIO
RFIO (Remote File IO) is an API which allows client programs to access a MSS, as well as
local and remote disk files. As far as the client program is concerned, the API looks like a
standard UNIX library API, albeit in a different namespace. If the client program is written in
C, it accesses functions like those of the C library, but each of the standard symbols is
prefixed with the string "rfio_". A header file exists which renames the standard C API to the
RFIO C API, so converting an existing client to using RFIO can be done by simply including
this header file in the source and recompiling (obviously this requires access to the source
code). The arguments to the functions are the same for the RFIO API as they are for the
standard C file API; thus, one can find the synopsis for rfio_fopen (say) by looking up the one
for fopen(3). If the client program is written in C++, it can access a stream type library
similar to the standard C++ streams iostream and fstream.
In C, clients would thus open a file with the function call

handle = rfio_fopen(filename,access_mode);

access it with rfio_fread(handle,...), rfio_fwrite(handle,...), etc., and finally close it with
rfio_fclose(handle).

The standard C and C++ file IO API are, of course, well known and need not be documented
here.

However, RFIO may provide a richer interface than the standard library, i.e. additional functions which have no equivalent in the standard libraries. For example, in the C API there is an rfio_setopt() function which allows the client to change certain RFIO parameters. Furthermore, some MSSs might provide a yet richer interface. For example, when using RFIO as a front-end to HPSS, a function rfio_setcos() allows the client to set the HPSS Class of Service. In C++, this additional functionality can be implemented either as additional methods on the rfstream class, as stream flags, or using manipulators.

RFIO can also provide access to remote files transparently to the client. If the RFIO library
decides that it cannot handle a request locally, it can send the request to a RFIO daemon
running on a remote server which may then be able to service the request.

RFIO libraries have been written to access various MSSs, such as HPSS (IBM) and CASTOR
(CERN). Thus any RFIO client can access any MSS that talks RFIO (as long as the client
doesn’t require/use any extended functionality). Client programs that don’t already use RFIO
just have to be recompiled with the proper RFIO library, and they can then access the same
MSSs as well.

Finally, RFIO provides command line tools for performing simple file operations such as
copying (rfcp), deleting (rfrm), and renaming (rfrename), and commands for creating,
removing, and listing the contents of directories.
12 APPENDIX IV MODULES
12.1 TOP LAYER (SERVER) MODULES
As mentioned in section 6.5.1 on page 19, we can logically separate the interfaces provided by each of these modules into three parts: information, control, and data transfer. Clients that wish to access the module may wish to access only a part of the module's interface. For example, an RC may use only the information interface, whereas the RM also needs the control interface but does not need direct access to the data interface.
In the architecture chapter, we have described how the top layer modules can be divided into
groups according to whether they are SE-aware or compatibility modules, and whether they
are network modules or client libraries. We now describe in more detail the design of these
modules. Note, however, that we don’t consider the category of “SE-aware network
modules” since clients don’t need such modules.
12.1.1 SE-aware library modules
These will be implemented as shared libraries that will be linked against the client when the
client is compiled. Header files will specify the prototypes for the functions provided by
these libraries. For simplicity, separate header files for the information, control, and data
parts of the API will be provided.
12.1.2 Compatibility library modules
There are three different ways to get a client to use the SE library rather than the one it
expects. These are, in decreasing order of severity:
 recompiling: if the source code is available, the client can be recompiled to use the SE
library;
 relinking: the client can be re-linked with the SE library;
 preloading: the SE library is preloaded before the client is run [Appendix V], possibly by
a wrapper program.
The first option is a possibility, but obviously only if the source code is available. It is the
method used to make a program that uses standard UNIX file IO use RFIO instead (see
Appendix III). The second option is possible on architectures that support dynamic linking
(as most UNIX architectures do). The third option, however, is the most attractive since it
requires no modification to the client program. The SE compatibility library is preloaded by
the dynamic linker, and the symbols provided by the SE library are chosen instead of those
provided by the standard linked libraries (again assuming that these are linked dynamically, of
course). The client accesses what it believes to be the standard C library file API, but it
actually calls the SE compatibility library and accesses files through the SE. Linux and
Solaris, for example, support this method and we have demonstrated this [see Appendix V].
A wrapper program can be used to preload the compatibility library for the client.
Running the client through a wrapper program, which sets up an environment in which the
client is authenticated to the SE, can solve the authentication problem mentioned in section
7.1.2. The wrapper program could also take care of preloading the required module library.
This is similar to the Grid proxy: the user runs the proxy, the proxy sets up the environment,
and then the user is authenticated (i.e. any program that needs authentication information asks
the environment rather than asking the user).
Note that if the client uses LFNs as filenames, then the compatibility library will have to look
up or construct the corresponding PFNs and TFNs.
12.1.3 Compatibility network modules
Clients who access these modules are typically written to use some specific existing protocol.
It might be a GridFTP client, or perhaps a web browser. A web browser would use the HTTP
protocol with authentication over a SSL/TLS connection; and files could be made available
for up- or downloading via an HTML interface.
12.1.4 Authentication and encryption
Common to all the top layer modules is the requirement that they must provide a security
interface to the SE core. This part of the Session API allows the client to authenticate itself to
the SE, and to request that all further communication be encrypted. We will implement the
security API through one or more of the following standard APIs:
 GSSAPI (RFC2743): Generic Security Services
This service is also used by Globus 2, albeit indirectly.
 TLS (RFC2246): Transport Layer Security.
In particular, the security system must support delegation. For example, a RM should be able
to grant a client read access to a file for replication purposes.
13 APPENDIX V RELINKING
In this Appendix we sketch briefly how file access calls from an existing client can be
redirected through a library without modifying or even relinking the client, as described in
section 7.5. Suppose we have a program main which accesses files through the standard C
library API. Suppose also we have written a shared library libf.so (located in the current
working directory) which redefines those functions. Then we can make the main program
use our library using the following method:

% export LD_LIBRARY_PATH=`pwd`
% LD_PRELOAD=libf.so:libdl.so ./main

Setting LD_PRELOAD on the same line as the program means that the environment variable is set only for this particular command (if it were set in the shell's environment, the library would be preloaded every time the shell runs a command). Not all shells support this, though. The libdl.so library is needed for handling dynamic loading on some architectures; it is only necessary if libf.so uses dynamic loading (dynamic loading can be used to access the standard C API from the library).
Whether the LD_PRELOAD method works on a particular architecture depends on the
dynamic linker (the program that links dynamic libraries into executables). It is known to
work on Linux, NetBSD (tested on NetBSD-Sparc), and Solaris (tested on SunOS 5.8). The
example code we have written, though, only works on Linux at the moment.
Alternatives to relinking
There are alternative methods that allow programs to access an SE through the standard UNIX file API, although they are arguably trickier to implement than a preloaded library. The first is a Virtual File System (VFS) that translates the calls into calls to the SE. A VFS implementation already exists for Linux (though of course not one that talks to an SE). The second possibility is to modify an NFS server to translate file access into calls to the SE. (Modifying an existing NFS server could be simpler than changing the VFS implementation.)
14 APPENDIX VI XML
We chose XML as the basis for our internal message passing within the SE primarily because it is human readable. This makes testing and debugging the system far easier. We expect our XML files to be small and to contain only control information around the SE.
XML is a very flexible system as it is a text representation of a tree structure. XML is said to be extensible because new elements can be “grafted” onto the tree as needs arise, without affecting the interface of other applications requiring access to the XML file. We intend to include versioning as a simple attribute on the elements. Versioning our XML allows modules to be added to the SE, as well as upgraded, in a transparent manner while the system is running. It is also possible to transform XML via XSL to cope with changing message formats, although this is costly in processor resources and not a long-term solution. With large XML data files we have found some XML viewing applications useful, but we do not expect our XML to reach sizes where such tools are necessary.
14.1.1 XML programming interfaces
XML is a platform-neutral format, and the variety of text encodings used by computers to represent text files, including Unicode and UTF-8, are all handled transparently. XML programming interfaces come in two types. The Simple API for XML (SAX) is a good way of scanning through XML and triggering events on tags and data. On top of this interface the Document Object Model (DOM) has been created, which allows an in-memory representation of the tree structure of the XML.
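The two styles can be contrasted on a small document. This Python sketch uses the standard library bindings; the request/transfer element names are invented for illustration.

```python
import io
import xml.sax
import xml.dom.minidom as minidom

DOC = ('<request><transfer file="/grid/f1"/>'
       '<transfer file="/grid/f2"/></request>')

# SAX: stream through the document, reacting to each tag as it passes.
class TransferCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.files = []

    def startElement(self, name, attrs):
        if name == "transfer":
            self.files.append(attrs["file"])

counter = TransferCounter()
xml.sax.parse(io.StringIO(DOC), counter)
print(counter.files)

# DOM: build the whole tree in memory, then walk or modify it.
tree = minidom.parseString(DOC)
for node in tree.getElementsByTagName("transfer"):
    node.setAttribute("staged", "yes")  # easy in-place modification
print(tree.toxml())
```

SAX suits fast one-pass scans; DOM suits the "grafting" of new elements described above.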
14.1.1.1 SAX
http://www.saxproject.org/

The SAX API is now in its second version; originally a Java-only API, it is now well supported in C, Perl, Python, and many other languages that support events or callbacks. Multiple implementations of SAX exist for C. SAX is typically used to parse XML documents where speed is important and little modification to the XML is required.
14.1.1.2 DOM
http://www.w3.org/DOM/
The Document Object Model for XML is specified by the W3C and, like SAX, is available in multiple implementations. DOM allows easy walking of XML trees and provides useful facilities for creating and modifying XML. Java is the best-supported language, but implementations in C++ are now mature. Many other languages, including Perl and Python, also support the DOM.
14.1.1.3 Expat
http://expat.sourceforge.net/
Expat, by James Clark, appears to be the most widely reused SAX implementation.
14.1.1.4 Xerces
http://xml.apache.org/
Xerces was initially developed by IBM and then donated to the Apache XML project; it looks promising as a SAX and DOM implementation.
Further experimentation with these tools will be needed before we conclude which
implementation we shall use.
15 APPENDIX VII CLIENTS
As described in Section 3, WP5 has chosen to treat all its clients in a uniform way; the
differences between the clients will be managed through the SE configuration. The clients
are:
 The end-user (or programs run by the end-user, interactive or batch-jobs).
 The Resource Broker and the Job Scheduling Service (WP1).
 The Replica Manager (WP2).
 The SE administrator.
Their Use Cases and the requirements arising from those Use Cases are listed in Chapter 4.
The advantages of treating all of the above as just one Client are as follows:
 The SE can cope with changing Grid policies (cf. chapter 8). For example, we can allow
only RMs to access files, or we can also allow end-users (as in the Use Cases in section 4)
to access the files using command-line utilities. This policy is not hard-coded into the SE
but can be changed simply by changing the SE configuration. As far as the SE
implementation is concerned, there is no difference between an RM and an end-user.
 Different sites can have different policies. For example, a site may say: “Our SE is read-
only”. Such a site would treat the above clients differently from a site where clients are
also allowed to create files. WP5 must not hard-code such policy into the SE. The above
approach allows such local policies to be specified easily for all clients.
 Many requirements (chapter 4) are the same from one Use Case to the next.
 Ease of implementation and maintenance. As an example, all clients can access the SE
using the same Grid networking protocols and Grid security. Since the implementation
does not distinguish between the clients, it only has to consult the configuration to see
what a given client is authorised to do.
Of course, the SE can be configured with “reasonable defaults”, i.e., where the above clients are
allowed to do the things one would normally expect: the administrator has full rights on the
system, the RM can read files and create replicas, end-users and jobs can create files, etc.
Thus the SE administration programs can have shortcuts like “allow a new RM to access
files”, which enable the SE to recognise a new certificate and give it default permissions
suitable for an RM; the administrator can then grant further rights to this client, or revoke
rights, as necessary.
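Since the configuration format is not specified here, the policy-by-configuration idea can only be sketched; in the hypothetical sketch below, the role names, certificate subjects, and rights are all invented, and the point is simply that the SE consults a table rather than hard-coding client types.

```python
# Hypothetical sketch: the SE maps a certificate subject to a role, and the
# role to a set of rights. The code never branches on *what kind* of client
# is calling; only the configured rights matter.
DEFAULT_RIGHTS = {
    "administrator":   {"read", "write", "delete", "configure"},
    "replica-manager": {"read", "replicate"},
    "end-user":        {"read", "write"},
}

# Maintained by the SE administrator, e.g. via an "allow a new RM" shortcut
# that adds a certificate subject with the replica-manager role.
grid_map = {
    "/O=Grid/CN=Replica Manager 1": "replica-manager",
    "/O=Grid/CN=Jane User": "end-user",
}

def is_authorised(subject, operation):
    """Return True if the configured role of 'subject' permits 'operation'."""
    role = grid_map.get(subject)
    if role is None:
        return False
    return operation in DEFAULT_RIGHTS.get(role, set())

print(is_authorised("/O=Grid/CN=Replica Manager 1", "replicate"))  # True
print(is_authorised("/O=Grid/CN=Jane User", "delete"))             # False
```

A read-only site, in this scheme, would simply configure every role with the single right "read"; no SE code changes are needed.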
16 APPENDIX VIII MODULAR DESIGN
The benefits of a design based upon small, easily isolated, and well-tested subsections are well
known [ref R13]. This form of system design is often called modular software. A modular
system is designed to reduce the interdependency between its constituent parts. The
combination of discrete modules with little interdependence provides the following benefits:
 Faster development: Many of the modules may be written in parallel, with each developer
needing to know only the interface the module must support and the purpose of the
module. In a larger system this also means less time spent compiling in the development
cycle, since only the changed module need be recompiled and relinked, not the entire
system.
 Simpler testing: Each module is initially tested independently, so modifications to the
system do not require unchanged sections of the system to be re-tested. This reduces
overall testing time and allows for a simpler testing environment, because each module
has a smaller set of features to test and the tests can be run on different modules
concurrently. A complete system test will still be necessary, since unforeseen interactions
between modules can produce unexpected results; however, less testing will be required
than in a monolithic system.
 Ease of maintenance: By using independent modules both module replacement and
upgrade can be much simpler.
 Simplified debugging: Locating errors in small programs is easier than in large programs
because there are fewer places where the errors could exist. It is also generally easier to
isolate a problem to a specific module than to a specific portion of a large program, and it
is easier to introduce inter-module communication logging than to instrument a large
program.
 Redundancy: Allows multiple implementations to be developed when an optimal solution
is not obvious. Redundancy will be designed in where the final solution cannot be decided
yet or where there are conflicting requirements to be met.
 Mix and match: Development can occur in a variety of languages or tool sets, so the best
approach for each task can be found.
Modular systems do have disadvantages. The modular architecture often forces the
complexity of a system into the communication between modules. Simple, consistent
interfaces between modules can greatly reduce this complexity; for this reason, all of the
modules that comprise the SE should communicate using simple, consistent interfaces.
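The "simple, consistent interface" idea can be sketched as follows; the interface and module names are invented for illustration and do not represent the actual WP5 module API.

```python
# Sketch: every module implements the same minimal contract, so modules can
# be replaced, upgraded, or mixed without changing their callers.
from abc import ABC, abstractmethod

class Module(ABC):
    """Uniform contract implemented by every module (hypothetical)."""

    @abstractmethod
    def handle(self, request: dict) -> dict:
        """Process one request and return a response."""

class LoggingModule(Module):
    """Example module: records each operation before replying."""
    def __init__(self):
        self.log = []

    def handle(self, request):
        self.log.append(request["op"])
        return {"status": "ok", "op": request["op"]}

# Callers depend only on the Module contract, never on a concrete class,
# which keeps the inter-module complexity in one small, testable place.
def dispatch(module: Module, request: dict) -> dict:
    return module.handle(request)

m = LoggingModule()
print(dispatch(m, {"op": "read"}))  # {'status': 'ok', 'op': 'read'}
```

Because every module answers the same call, a replacement implementation (for the redundancy case above) can be swapped in by changing only which concrete class is constructed.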