GridPP2 Planning Document


GridPP Project Management Board

GridPP2 Planning Document

Applications: Experimental Grid Interfaces

Document identifier :






Document status:



R Barlow, N Brook



The applications posts released by the GridPP project have led to a widening adoption and use of
Grid technology by the experiments that benefited from them. In the next phase we hope to build
on this success, and extend it to experiments currently un

This period will see Grid usage move from prototype to serious production systems, and from the
relatively simple applications like centralised Monte Carlo simulation to the more varied tasks
involved in analysis jobs by individual physicists.

We therefore wish to continue to deploy posts to the experiments during GridPP2, at a modestly
increased level from the present number. The additional posts would be targeted at activities which
did not gain any posts in the previous round, or which have arisen since that time.

Reports by the individual experiments follow. The experiments granted posts in the first phase
present brief accounts of what they have achieved (full accounts have been given in the GridPP
quarterly reports) and cases for
their continuation. Cases for additional posts are also presented.

These reports present a rich and varied picture of Grid activity of many different types, across the
whole range of Particle Physics. The Grid is becoming an established part of computing

and the experiments are being quick to grasp at the opportunities offered for increased access to
facilities and more effective ways of working. This involves the creation and deployment of a lot of
middleware, and a remarkable amount has been
and is being produced. We want to continue this
remarkable activity, knowing that the longer it continues at this stage the greater the payoffs will be.

The GridPP project envisages that it would put out a call for these continuations and additions
d January 2004, with a response by April. This would follow the system used for the previous
phase of the project (see
) with full peer review

of the continuations and the new post applications by the PPRP.



ATLAS’s current activities are driven by the need to meet UK EDG commitments, and have
concentrated mainly on the integration of EDG software into ATLAS production scripts. This
exercise has proven beneficial to both ATLAS and the EDG activities, revealing design flaws and
helping refine the user requirements. It is an activity that must clearly continue with the various LCG
and EGEE releases. The UK is driving the use of the EDG tools in the reconstruction phase of our
Data Challenges, and is about to run its own Grid reconstruction tests within the UK, using locally
written solutions to problems with the EDG file access, replication and cataloguing. This effort has
also contributed in a major way to the general running of the Data Challenges (an explicit
deliverable within the EDG). ATLAS UK are collaborating with colleagues in the Trillium and
NorduGrid projects, integrating experimentally agreed tools and helping in their validation.

The integration effort and the Data Challenge production, in particular, have had large contributions
from outside GridPP. It should be stressed that the Data Challenge effort has only been realised
through the work of the GridPP-funded effort.

Work has also been ongoing in the area of installation tools, to allow easy deployment of software
to Grid fabrics. This area of interest has been put in abeyance while the LCG decides general
policy on code management and deployment tools. This, however, remains an outstanding
problem to be resolved before the objectives for a true Grid will have been reached. The installation
effort has now been redirected into early attempts to allow user analysis on the Grid.

The UK has been leading the creation of a Grid-enabled Monte Carlo production system through
the GridPP program. It has successfully created a prototype Monte Carlo production system,
AtCom, in liaison with CERN. This incorporates the techniques and lessons learned in the EDG
integration activity, and has been used for parts of the Data Challenge production. The UK is
responsible for all of the plug-ins that deal with the various batch and Grid systems. AtCom also
acts as a test-bed for components being developed for the final Grid interface, GANGA, which is a
UK responsibility. The Monte Carlo post has also contributed ATLAS Monte Carlo
application handlers to the GANGA project.

The ATLAS Monte Carlo post will continue to evolve the system, using AtCom as a test-bed for the
next 12 months, but as GANGA becomes more mature, continuing the development in that context.
Job splitting and forking will be incorporated in the GANGA context. At the end of the 12 months, a
preliminary version of the Monte Carlo production system will exist, but there will be a continued
evolution along with GANGA, with a major new version planned in the following year. The Monte
Carlo post will also contribute towards the user analysis capability on the Grid, taking particular
responsibility for the fast-simulation (ATLFast) user-defined analysis classes.

The integration and Data Challenge effort will continue in the next 12 months, addressing the new
versions and subversions of the LCG-1 release. It is already likely that these releases will still
require significant new development, and it is clear that the LCG project will only move forward
through the integration and testing efforts from the experiments. At the same time, the Data
Challenges will proceed, driven by both computing and other demands. Data Challenge 2 will begin
in the last few months of the existing posts, and will attempt to make significant use of Grid tools.
Grid-based analysis should also begin as the initial project ends. This will require significant generic
and experiment-specific developments, and it is proposed that a common theme to be adopted in
the extension of the Grid posts be the development of that Grid-based analysis framework, as
outlined below.

While anticipated in the GANGA planning, and already underway at a low level, the development of
the Grid-based user analysis represents a logical and essential continuation of the activity in the
first phase of the project. It will require user code to be deployed with the job and run in a suitable
run-time environment, and revisits the installation tools activity. It also has requirements for analysis
outside of the Athena/Gaudi framework, from Fortran/C++ analysis on ntuples, through ROOT (or
similar) analysis in batch on POOL files, to interactive analysis on distributed datasets.
Aspects of this work will fall under the LCG PI project and are generic (but will require UK input and
ATLAS involvement as a major client), but many other parts fall in the experiment-specific domain.
The strong requirement for debugging tools that work in the Grid environment similarly has aspects
that are generic and others that require a knowledge of the experiment software, code management
and run-time environment.



GANGA is a UK-driven project that is providing a Grid interface for the ATLAS and LHCb
experiments at the LHC. The motivation for this joint project was that both experiments are using a
common software framework, which allows for an area of cohesion. Following a GridPP
international workshop that brought together many of the Grid experts from the ATLAS and LHCb
projects (both from Europe and the US), the GANGA project developed to have participation not
only from the UK but also from CERN and from the US (University of Chicago, Indiana University
and the Lawrence Berkeley National Laboratory). The project is actively supported by the software
architects of both ATLAS and LHCb and, as such, US colleagues are pursuing funding to further
support GANGA under the wing of the recently submitted ITR proposal. This is an e
proposal, from US LHC physicists and computer scientists, recently submitted to the NSF.

GANGA is key to the UK Grid developments in both experiments. In particular, both experiments
are developing Grid-aware Monte Carlo production systems that will be integrated with GANGA,
and the activities in the various projects are mutually supportive.

Figure 1: Schematic of GANGA software bus

GANGA is built around the idea of a Python software bus shown in Figure 1. The current
developments have concentrated on delivering a user GUI that will allow the configuration of an
ATLAS or LHCb application. The GUI philosophy is similar to that of Microsoft’s Outlook Mailbox,
with folders tracking the job as it is processed and progresses. The interface allows a job to be
prepared for submission to local resources (LSF & PBS batch systems as well as a simple process
fork) and to the EU DataGrid. A number of application handlers have been developed to allow
particular applications to be used through the GANGA interface. In addition it is possible to access
a configuration database to allow pre-defined options to be used with the application.
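
The submission model described above, in which a single job description can be sent to a local
process fork, a batch system or the Grid through interchangeable handlers, can be sketched in
Python, the language of the software bus. This is an illustration only: the names Job, BACKENDS,
register_backend and submit are hypothetical and are not taken from the GANGA code, and only a
local fork handler is shown.

```python
import subprocess
import sys
from dataclasses import dataclass, field

# Hypothetical registry mapping a backend name to its submission handler.
BACKENDS = {}

def register_backend(name):
    """Decorator that plugs a submission handler into the registry."""
    def wrap(func):
        BACKENDS[name] = func
        return func
    return wrap

@dataclass
class Job:
    """A job description that does not depend on where it will run."""
    executable: str
    args: list = field(default_factory=list)
    status: str = "new"
    output: str = ""

@register_backend("fork")
def run_fork(job):
    # Simple local process fork: run the command and capture its output.
    result = subprocess.run([job.executable, *job.args],
                            capture_output=True, text=True)
    job.output = result.stdout
    job.status = "completed" if result.returncode == 0 else "failed"

def submit(job, backend):
    """Dispatch the job to the handler registered for the named backend."""
    if backend not in BACKENDS:
        raise ValueError(f"no handler registered for backend {backend!r}")
    job.status = "submitted"
    BACKENDS[backend](job)
    return job

# Run a trivial command through the local fork handler.
job = submit(Job(sys.executable, ["-c", "print('hello')"]), "fork")
```

A handler for a batch system or for Grid submission would be registered in the same way under a
different name, leaving the job description unchanged.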

Work is ongoing to further develop the Grid interface aspect of GANGA. In particular it is planned to
expand the functionality of GANGA to allow for the translation of high-level event data specifications
into Grid-aware files and replication of datasets. It is also planned to integrate the
experiment-specific metadata systems to allow user browsing of the metadata collections and to
build new collections through user-defined searches on metadata keys. It is envisaged to increase the
provision of monitoring facilities to display the progress and history of submitted jobs, as well as to
provide information regarding the estimation of resource costs. In the next 18 months, a first
production release of GANGA will be made available which will incorporate the Monte Carlo production
systems of both ATLAS & LHCb.

By summer 2004, it is planned that the CERN LCG project will move into the pilot mode o
(defined as 50% of the components required for CMS or ATLAS) to allow distributed end
analysis. Only at this stage will it become apparent what the planned services of the LCG will be
for the production phase of January 2005. This phase of the project will be used to prove that the
LHC computing model will work. GANGA will need to develop and track the rapid changes that will
be ongoing. By that stage, the relationship between the LCG and any EU activities through
Framework 6 funding (EGEE) will have become apparent.

GANGA will need to develop into a Grid-enabled physics analysis desktop in time for first data
taking in 2007. There are technical issues that must be resolved to allow the easy Grid
use of user-defined analyses. These are partly addressed in experiment-specific bids, with more
generic aspects addressed here. However, as at present, the experiment-specific activities will
support the generic GANGA activity and vice versa. The GANGA project itself will incorporate data
access, resource monitoring and interactive visualisation and analysis tools to support the physics
desktop.

Access to a data/metadata catalogue browser will be required to allow the physicist to find
collections of data, and to define new collections based on user-defined search patterns. This will
be dependent on the LCG POOL (persistency framework) project and on the experiment metadata
and file replication systems. A fully functional release of POOL is not due until March 2005.

Using a network performance monitor, the physicist will be able to customise and optimise data
movement from the dynamically provided information. This, coupled with a computation resource
browser, selector and monitor, will be highly desirable for the early stages of the Grid environment
for analysis.

To enable a physicist to ensure enough disk space is available, a storage resource browser would
be invaluable. Enhancements to monitoring capability developed in the initial phase of GANGA will
be essential for a physicist trying to debug his or her application in the distributed environment.
Further, Grid debugging tools will have to be interfaced to GANGA, allowing variable inspection,
pointing and eventually thread
reversal on designated target nodes where problems have
been encountered. It is hoped that many of these tools will be developed in the context of LCG
and/or middleware projects but it is essential these be integrated into the uniform desktop
environment for the end

As stated, early deliverables of the next phase of the project would include a sophisticated network
performance monitor coupled with a computation resource browser, selector and monitor and
replication facility. This will build and expand on the current planned GANGA functionality. Also a
data/metadata browser allowing for access to pre-defined data collections would be incorporated.

Later deliverables would expand the metadata browser to allow complex searches and new
collections to be user defined. The later emphasis would be to fully expand the project into a
Grid-enabled physics analysis desktop addressing some if not all of the issues raised above. It is
estimated that the continuation of the current GridPP posts over the years leading up to first data
taking, matched with the current level of unfunded support activity and the experiment-specific
activity requested elsewhere, would be able to meet those needs.



LHCb UK has played an important role in the design and deployment of the new LHCb Monte
Carlo production system called DIRAC (Distributed Infrastructure with Remote Agent Control). This
has grown from the original system based on a single application controlled from each local centre
into a complex distributed application running at 19 different sites and averaging 830,000 events
per day. This is equivalent to the simultaneous use of 2,300 1.5GHz computers. A large amount of
expertise has been built up from the installation, testing and running of this system on the Proto
1 Centre at RAL and 6 other sites within the UK.
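
As a rough cross-check of the production figures quoted above, the implied processing time per
event can be worked out directly. The calculation below assumes that "simultaneous use" means
every one of the 2,300 machines is busy around the clock, which is an interpretation rather than
something the figures themselves state.

```python
# Figures quoted in the text above.
events_per_day = 830_000
equivalent_cpus = 2_300          # 1.5 GHz machines
seconds_per_day = 24 * 60 * 60

# Average events simulated per CPU per day, and CPU time spent per event.
events_per_cpu_per_day = events_per_day / equivalent_cpus
cpu_seconds_per_event = seconds_per_day / events_per_cpu_per_day

print(f"{events_per_cpu_per_day:.0f} events per CPU per day")
print(f"{cpu_seconds_per_event:.0f} CPU seconds per event")
```

Under that assumption each machine handles roughly 360 events per day, or about four minutes of
processing per event.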

The current PVSS
control system has been adapted to meet the 2002/2003 series of Data
Challenges, in which over 40 million events have been processed for the optimisation of the
experiment. In addition, a review of this system has been written and the requirements for a
Grid-based system have been established and evaluated against existing monitoring middleware.

Development of a new Grid-based monitoring system is now underway and a Work Flow Desktop
is currently being written. This will include a Module Editor, Step Diagram Editor, Workflow Diagram
Editor, Production Monitor and Resource Browser, which will also provide a toolkit for the GANGA
project for analysing large datasets spread over many files. It has always been planned that the
control/monitoring system would become fully integrated into GANGA during 2004, but steps are
already underway to combine the common parts of the two projects. The idea is to combine the
personalisation of the single-user GUI in GANGA with the robustness and scalability of the DIRAC
system into a client-server architecture with full monitoring and control capability. For reliability
reasons, the job submission is currently designed around a "pull" submission ideology rather than
the "push" concept usually associated with Grid technology.
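
The difference between the two ideologies can be shown with a small sketch. In a "pull" design,
jobs wait in a central queue and an agent at each site asks for work only when it has capacity,
so a site that is down or full simply never receives a job. The class and method names below
(TaskQueue, SiteAgent, request_job, poll) are hypothetical and are not taken from DIRAC.

```python
import queue

class TaskQueue:
    """Central queue; jobs wait here until an agent asks for work."""
    def __init__(self):
        self._jobs = queue.Queue()

    def add(self, job_id):
        self._jobs.put(job_id)

    def request_job(self):
        """Called by an agent; returns a job id, or None if no work is waiting."""
        try:
            return self._jobs.get_nowait()
        except queue.Empty:
            return None

class SiteAgent:
    """Runs at a site and pulls work only while it has a free slot."""
    def __init__(self, site, free_slots):
        self.site = site
        self.free_slots = free_slots
        self.running = []

    def poll(self, task_queue):
        while self.free_slots > 0:
            job_id = task_queue.request_job()
            if job_id is None:
                break
            self.running.append(job_id)
            self.free_slots -= 1

tasks = TaskQueue()
for n in range(5):
    tasks.add(f"job-{n}")

agents = [SiteAgent("site-A", free_slots=2), SiteAgent("site-B", free_slots=1)]
for agent in agents:
    agent.poll(tasks)
```

After one polling round three jobs are running and two remain queued; no central component had
to know in advance which sites were available.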

From experience with the current distributed system, it has become clear that the development of
sophisticated applications software can only be completed and optimised once the full range of Grid
middleware is available and understood in a large-scale applications environment. This complete
suite of submission, monitoring and control software will need to develop into a fully Grid
system in advance of data taking in 2007 and this is dependent on a complete Grid environment
and services being available from the LCG project. These will only become apparent as the LCG
production phase is approached in January 2005. In addition, it is expected that there could be
considerable overlap with the PI (Physicist Interface) project of the LCG and it will be important to
unify contributions from all of these projects. In particular, the development of the sophisticated
application monitoring tools will be essential to allow the debugging of an application in the context
of a Grid. The continuation of this GridPP post up to 2007 is vital to ensure that the requirements of
both the LHCb production system and the physicist interface (through GANGA) are met.

A major feature of Gaudi design is the philosophy that physics algorithms should act on transient
data and not on data objects in a persistent data store. A set of services is therefore required to
populate the transient data store from persistent storage and vice versa. During 2002, this work
was incorporated into the POOL persistency framework of the LCG.

The UK work has therefore moved into the area of contributing directly to the POOL project with a
view to then providing the required services for GAUDI/Athena using standardised LCG packages
where provided. A first prototype of the LHCb file catalogue has been designed and implemented
which allows for both reading and modification of the catalogue. A GUI has been provided by the
UK for POOL, and can be used to browse the file catalogue. Methods to browse the catalogue by
both logical file name (LFN) and physical file name (PFN) will be introduced.
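
The relationship between logical and physical file names can be illustrated with a toy catalogue
in which one LFN maps to any number of physical replicas and each PFN belongs to exactly one
LFN. This is a sketch only: the FileCatalogue class, its methods and the file names used are
hypothetical and do not reflect the POOL or LHCb catalogue interfaces.

```python
from collections import defaultdict

class FileCatalogue:
    """Toy replica catalogue: one logical name maps to many physical copies."""
    def __init__(self):
        self._replicas = defaultdict(set)   # LFN -> set of PFNs
        self._owner = {}                    # PFN -> LFN

    def register(self, lfn, pfn):
        """Record that pfn is a physical replica of the logical file lfn."""
        if self._owner.get(pfn, lfn) != lfn:
            raise ValueError(f"{pfn} is already registered under another LFN")
        self._replicas[lfn].add(pfn)
        self._owner[pfn] = lfn

    def lookup_lfn(self, lfn):
        """Browse by logical file name: all known physical replicas, sorted."""
        return sorted(self._replicas.get(lfn, ()))

    def lookup_pfn(self, pfn):
        """Browse by physical file name: the logical file it belongs to."""
        return self._owner.get(pfn)

    def remove(self, pfn):
        """Drop one replica; the LFN disappears with its last replica."""
        lfn = self._owner.pop(pfn)
        self._replicas[lfn].discard(pfn)
        if not self._replicas[lfn]:
            del self._replicas[lfn]

# Illustrative (made-up) names for one logical file held at two sites.
catalogue = FileCatalogue()
catalogue.register("lfn:/mc/run001.dat", "site-a:/data/run001.dat")
catalogue.register("lfn:/mc/run001.dat", "site-b:/store/run001.dat")
replicas = catalogue.lookup_lfn("lfn:/mc/run001.dat")
```

Keeping both mappings makes lookups in either direction cheap, at the cost of having to update
the two together on every modification.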

The V0.4 release of POOL in March 2003 contains three catalogue implementations based on
XML, MySQL and the EDG file catalogue. In this development release, POOL has also become
integrated with other LCG projects, such as SPI and SEAL. This will introduce dependencies, but
will provide a more powerful common environment. The first production release of POOL is
planned for June 2003 and this will be deployed over the summer to provide a service level capable
of meeting the 2004 data challenges by November 2003.

The provision of data and metadata services will play a vital role in the applications' exploitation
of the Grid, and this lost ground has to be made up. In addition, it will not be until POOL has been

in the full production environment of an experiment that all the requirements will be
understood. It is therefore important that experience gained from the 2004 Data Challenges is fully
exploited in the final design, writing and optimisation of the persistency framework. This is best met
by extending the current post up until the point that the experiment is ready for data taking in 2007.


We have made a significant contribution to the large-scale CMS Monte Carlo production
programme. Production is now at 10TB/year volume, and has taken place at UK T1 and T2
centres. We have played a major role in the development of grid interfaces for production software;
an important element of our contribution was participation in a full, extended stress test for CMS
production software running on the EDG testbed; over 8000 jobs were submitted via the Grid. This
test resulted in a great deal of useful feedback both to CMS and to the EDG project. CMS will soon
begin a more ambitious set of data challenges, including ‘DC04’, the month-long operation of a
25% scale worldwide computing system. In the UK, we will execute tens of thousands of jobs to
produce ~50TB of simulated and reconstructed data at T1 and T2 sites. Implementation of a web
portal for MC production and analysis will enable CMS users to flexibly submit
production and analysis jobs via a web interface. This will lead to initial work on a Grid
analysis interface for CMS users in the next 12 months.

CMS UK have carried out an evaluation of the large-scale data management issues surrounding
the early deployment of UK T1 and T2 centres for CMS. This has led to an ongoing set of tests,
under production conditions, of all data-handling components required for the implementation of a
pe regional centre, and for data management on the worldwide CMS computing system.
Important achievements include the large scale automated replication of files to and from mass
storage systems at Lyon, CERN and FNAL, and the key discovery that the current

of the Replica Catalogue software is not suitable for a production Grid. Another aspect of data
management has been the testing and integration of a new basic object persistency service
(POOL); this work has been carried out within the LCG. After production-scale deployment
and stress testing of the POOL product as part of LCG, we will begin the task of integration with
Grid data management, monitoring, workload management and fabric management systems.

The UK have integrated R-GMA into the CMS production management system. This subtask
provided a Grid-enabled monitoring system integrated into the CMS job submission and tracking
system “BOSS”. The EDG software R-GMA was used to provide producers and consumers of
monitoring data. The successful use of R-GMA here has led to plans to deploy it in other roles in
the CMS computing environment, including bookkeeping for all stages of Monte Carlo generation.
The integration of the CMS COBRA framework with the Grid, via R-GMA, will continue. We will
increase the robustness of the system under real production conditions, and investigate the use of
R-GMA to handle other forms of metadata.
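
The producer and consumer roles mentioned above can be illustrated with a small in-memory
sketch: running jobs act as producers publishing status rows, and a tracking component acts as a
consumer that keeps the latest state of each job. This is not the R-GMA or BOSS interface; the
MonitoringBus and JobTracker classes, the table name and the row fields are all hypothetical.

```python
from collections import defaultdict

class MonitoringBus:
    """Toy in-memory stand-in for a producer/consumer monitoring service."""
    def __init__(self):
        self._consumers = defaultdict(list)   # table name -> callbacks

    def subscribe(self, table, callback):
        """Register a consumer for every row published to a table."""
        self._consumers[table].append(callback)

    def publish(self, table, **row):
        """Called by a producer (for example a running job) to emit one row."""
        for callback in self._consumers[table]:
            callback(row)

class JobTracker:
    """Consumer that keeps the latest reported state of each job."""
    def __init__(self):
        self.latest = {}

    def __call__(self, row):
        self.latest[row["job_id"]] = (row["status"], row["events_done"])

bus = MonitoringBus()
tracker = JobTracker()
bus.subscribe("job_status", tracker)

# Two jobs report progress; the tracker keeps only the newest row per job.
bus.publish("job_status", job_id="mc-001", status="running", events_done=250)
bus.publish("job_status", job_id="mc-002", status="running", events_done=100)
bus.publish("job_status", job_id="mc-001", status="done", events_done=500)
```

Because producers and consumers only share a table name, further consumers (bookkeeping, for
instance) could be added without changing the jobs that publish.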

The GridPP2 activities will be a natural growth and extension of our GridPP work, at a greater level
of scale and complexity as we approach LHC startup. It is essential that we can achieve a
production environment which will seamlessly use the distributed resources for forthcoming CMS
data challenges. We will integrate our effort at the four sites (representing elements of distributed
Tier-2 and Tier-1 centres). This builds upon and links our work on analysis frameworks, monitoring
and data management, which is a clear "Grid Interface" area. It links into the data challenges by
monitoring data quality in almost real time.
The User Interfaces work will continue towards a full Grid-enabled simulation, reconstruction and
analysis system. Deployment of Grid-enabled CMS software will take place worldwide, with
seamless interoperation between different implementations of the basic Grid services. High-level
user interfaces will be developed to allow the transparent use of the CMS computing system for
both scheduled and ‘chaotic’ reconstruction and analysis use. This will build upon both the tools
and experience developed within the GridPP project, and upon new middleware services that will
be developed before GridPP2.

The scope and scale of CMS data challenges will increase rapidly during the GridPP2 period. Our
GridPP work has shown that traditional approaches to large-scale data management fail at these
scales, as do naive Grid-based approaches. We will continue to develop and test new approaches
to Grid-based data storage, migration, caching and replication, as required; these may include the
deployment of “intelligent” automated data management systems. The work will take place in the
context of ever-increasing computing and network resources at regional centres worldwide, which
we will need to exploit at a high level of efficiency to meet CMS goals.

CMS UK will undertake the support, maintenance and development of the R-GMA/CMS code base,
providing enhancements and extending the range of applications in the CMS computing
environment. We will maintain our close links with the UK developers of monitoring middleware and
work with them to verify, validate and improve the performance of their product. We will extend our
approach to provide metadata management for other areas, such as calibration and conditions
data and handling of multiple data sets, an area of great importance for the final LHC computing

Early deliverables will include: the analysis of DC04 results to understand the issues surrounding
‘chaotic’ use of the Grid for analysis and reprocessing (workload management); basic integration of
the POOL object persistency layer with Grid services (data management); design of Grid
metadata services for calibration data (metadata).

Longer-term goals will be: the deployment of a full user interface to the CMS Grid-based analysis
system (workload management); deployment of automated full data management services for data
and metadata, in support of the Grid-based analysis system (data and metadata management).
This will culminate with the final
scale deployment and testing of the CMS Grid computing system
worldwide, in
preparation for LHC startup.


phenoGRID is a new virtual organisation dedicated to developing the phenomenological tools
necessary to interpret the events produced by the LHC. Historically, members of phenoGRID
have produced a wide range of vital tools such as the shower Monte Carlo HERWIG and
next-to-leading order QCD Monte Carlos such as DISENT, JETRAD and DYRAD. We expect to build on
these applications to develop newer and more sophisticated codes for use by LHC
experimentalists. By integrating into the GRID environment we expect that these tools will ensure
that the GRID supports the functionality we require and that the tools themselves are “gridified”.

The UK members of phenoGRID are fully involved in the phenomenology of the LHC, and are
playing a significant role in the organisation and execution of international activities such as the Les
Houches and the CERN Monte Carlo workshops this summer. The IPPP organises several
workshops annually dedicated to the phenomenology of the LHC. The UK also plays a leading role
in European Networks such as QCDnet, Physics at Colliders etc.

Most tool development is currently limited to single CPUs with “final runs” distributed across a
processor farm. There is some activity within ATLAS by the Cambridge group to test the
functionality of the GRID through the production of tree matrix elements using MADGRAPH.

The first stage for phenoGRID is the establishment of hardware supporting a Grid user interface
at the IPPP in Durham to enable members to access the GRID. Subsequently, we expect to develop
applications that test the functionality of the GRID. For example, the evaluation of one-loop matrix
elements for multiparticle processes relies on dividing the various loop integrals into multiple sectors,
each of which can be numerically determined, for each phase space point. Keeping track of the
data from each sector, and subsequently combining the validated data, is the sort of task that the
GRID should be able to do. By developing the applications in the GRID environment, we both test
and benefit from the GRID. Feynman diagram generation and evaluation is another application
that requires similar functionality. Splitting calculations across the GRID and then recombining
them should allow a quantum step up in phenomenology tool development and we expect to
develop GUI interfaces to enable the transparent creation, storage, management and validation of
the intermediate data.
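
The split, validate and recombine pattern described above can be sketched with a deliberately
simple stand-in: a one-dimensional integral of x squared over the unit interval takes the place
of a loop integral, and each sector is evaluated independently, as it could be on a separate Grid
node. The function names and the choice of the midpoint rule are assumptions made for
illustration only.

```python
import math

def split_sectors(lo, hi, n):
    """Divide the interval [lo, hi] into n equal sectors."""
    width = (hi - lo) / n
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n)]

def integrate_sector(f, lo, hi, steps=1000):
    """Midpoint-rule estimate of the integral of f over one sector."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def combine(results):
    """Validate each sector result, then sum them into the final answer."""
    for sector, value in results.items():
        if not math.isfinite(value):
            raise ValueError(f"sector {sector} returned an invalid result")
    return math.fsum(results.values())

def integrand(x):
    return x * x

# Each sector could be computed on a different node; results are keyed by sector.
sectors = split_sectors(0.0, 1.0, 8)
results = {sector: integrate_sector(integrand, *sector) for sector in sectors}
total = combine(results)
```

Keying the intermediate results by sector is what allows missing or invalid pieces to be
identified and recomputed before the final combination.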

We have contacts with members of GRIDPP at Cambridge, Edinburgh, Glasgow, Manchester and
elsewhere, with the e-science centre in Durham and the regional centre in Newcastle. The sort of
data production, manipulation and validation is similar to that of QCDgrid and we have made
contact with them. Durham IPPP is also potentially part of the Tier 2 efforts in either SCOTGRID or


UKQCD have a working data grid for secure and convenient distributed storage/retrieval of QCD
data (mainly lattice gauge configurations). UKQCD members at pilot nodes can routinely access
data for diverse projects such as flavour singlet physics and proton decay. The GridPP effort has
been supplemented with personnel from several UKQCD institutes. UKQCD have initiated a
world-wide lattice QCD grid project involving major collaborations in the USA, Japan and elsewhere.
The middleware was developed in-house using the GridPP-funded effort, is relatively lightweight,
easily maintainable and built directly on top of Globus 2 running on Linux 7 or 8. Interoperability
with EDG middleware is planned.

UKQCD are working on extending the nodes of the grid (currently Edinburgh, Liverpool, Swansea
and RAL) to include Glasgow and other sites of UKQCD groups as well as possible collaborators
abroad. In particular the plan is to install a QCDgrid node in Columbia, New York to inject data from
the prototype QCDOC machine into QCDgrid. They will also address Phase 2 of the GridPP1
project whose deliverables are:

Web portal for data access and local job submission, and

UKQCD grid portal for data and distributed jobs.

UKQCD have initiated a more detailed requirements specification for job submission and will
re-assess the suitability of EDG middleware for its implementation. It is also planned to extend the
XML schema (QCDML) to cope with a wider range of QCD data, including several varieties of

The prototype QCDOC machine (1 Tflop sustained) is now expected to produce the first data at the
end of 2003. The full 5 Tflop machine is expected in mid-2004. Therefore a further task jointly facing
their application programmers and grid technical staff is the implementation, within the new
software, of automatic generation of metadata marked up in QCDML alongside configuration
generation. This functionality is generally known as data binding. Some generic open source tools
for this exist. These are currently only in Java, so we plan to investigate strategies for adapting and
merging these with C++ software.
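
The idea of producing marked-up metadata alongside each configuration can be sketched as
follows. This is an illustration in Python rather than the Java or C++ tooling discussed above, the
generation step is a stand-in that produces no real lattice data, and the element names used
(configuration, index, beta, filename) are hypothetical and are not taken from the QCDML schema.

```python
import xml.etree.ElementTree as ET

def generate_configuration(index, beta):
    """Stand-in for the real generation step; returns a record of what was made."""
    return {"index": index, "beta": beta, "filename": f"config_{index:04d}.dat"}

def metadata_xml(record):
    """Bind the in-memory record to an XML document describing it."""
    root = ET.Element("configuration")
    for key in ("index", "beta", "filename"):
        ET.SubElement(root, key).text = str(record[key])
    return ET.tostring(root, encoding="unicode")

def produce(n, beta):
    """Generate n configurations, emitting metadata with each one."""
    for index in range(n):
        record = generate_configuration(index, beta)
        yield record["filename"], metadata_xml(record)

outputs = list(produce(2, 5.2))
```

Generating the metadata in the same step as the data means the two cannot drift apart, which is
the main attraction of doing the mark-up automatically.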

The main activities foreseen during the period

2007 are the compute grid, data federation and
data binding. The job submission component of the compute grid will be further developed under
GridPP2. Effort will focus on further integration with the EDG software and on expansion of the
range of systems and codes available to QCD physicists on the grid. It is planned to cooperate with
other UK HEP groups (including phenoGRID) and to share HEP resources at Tier 1 level and
Tier 2.

A standardized set of web services will be utilised to provide easier access to the QCDgrid
replica catalogue. The standards will be agreed through the international ILDG collaboration,
allowing unified access to all public lattice QCD data. A public-access web portal will be created to
provide a simple interface to the underlying web services.

It is planned to develop enhanced data-binding techniques for more widespread automatic
production and assimilation of metadata. The early deliverables will include demonstrating automatic
XML mark-up within main QCDOC production codes and remote browsing and access of world lattice
data collections across 3 continents. Ultimately the aim is to have compute grid solutions for quark
propagators on 2 architectures using chained scripts and access to non-UK resources, via a Web
portal.



In the original GridPP1 program, BaBar was awarded 2.5 posts to work on

Metadata Management

Grid-based Job Submission


One GridPP funded position is responsible for the bookkeeping system used for BaBar data, a
system which has to keep track not just of simple run numbers but also of the processing and
calibration systems used, of the versions of programs performing event stream/skim selection,
and, for simulated data, of the type of data generated and the version of the Monte Carlo programs
used. There are many million events spread across tens of thousands of files, some unique and
some existing as multiple copies. Matters are further complicated by keeping Objectivity and
non-Objectivity versions of the same data.
The software tools for doing this (skimData) have been maintained and the capabilities are being
extended to provide an interface suitable for Grid jobs, which included deploying an

compatible replica catalog.

A paper presenting this work was selected and presented at CHEP 2003.

The job submission post has worked as part of a team with D0 and middleware personnel
and has



job submission system to the BaBar software environment.

prototype system
running at IC and RAL, and is in the process of


to other UK sites. This involves not just writing software but ensuring that the sites involved run
appropriate systems (e.g. versions of Red Hat, Globus, and EDG releases). The sites and their
representatives are working well together on this and good progress is being made.

BaBar UK are on track to roll out a production version of this metadata access and job submission
(the two are closely linked) this summer, and to extend its functionality and user robustness
thereafter. The posts would enable these very significant gains to be exploited and expanded, and
allow a move to the point where physicists use the Grid as the standard job submission system (both
for private analysis and for production work) due to its clear superiority over standard ways of
working. This will be particularly important as the UK Tier 2 sites build up and make significant
hardware resources available.

The other GridPP funded postholder is working on the handling of persistent data within the Grid, a
half post shared with CMS. He has produced a report, as his first milestone, on the implications of
the BaBar decision to move away from Objectivity to a ROOT-based system. In the long term this
(courageous!) move will benefit BaBar by bringing it into line with the way the rest of HEP is going.
We benefit greatly from having someone in this area who is also on one of the LHC experiments, and
from the insights that brings. (We hope they also benefit from his working on an existing
experiment with real data and its real problems.) The handling of persistent data is a problem that
has yet to be fully addressed in a Grid framework, and as such systems evolve (for BaBar and LHC
experiments like CMS) over the next few years BaBar UK will really need someone dedicated to this
particular topic.

Another part of BaBar Grid effort has come from a post outside of GridPP funding. This post was

for the JIF funded hardware, the PC farms at the university sites which have brought
about the UK’s leading position for simulation production in the collaboration.
It resulted in


Grid systems to these farms, particularly for Monte Carlo Generation, adapting the existing pre
system to a Grid based one. Unfortunately this JIF funded post comes to an end in December
2003. A replacement for this post, with the job specification of adapting BaBar simulation
production to a grid-based system and thus maximizing it, making use of the existing JIF farms and
the new Tier 2 facilities, would be valuable not only in its own right but also in providing a
high-demand use for a BaBar Grid system that would drive the collaboration (through the above posts)
into developing and implementing it.

We would accordingly like to request the continuation of the present 2.5 FTEs for a further 2 years,
with addition of a

post for Grid based MC production.


The DØ experiment was awarded two posts to work in close conjunction with the SAM project
based at the Tevatron. We had two main goals:

Enhancement of the SAM Grid to become a fully-fledged computational Grid.

To develop a user interface (mc_runjob) to the Grid to allow DØ collaborators to make full
use of the Grid.

The contributions from the posts funded by GridPP have been extremely significant. The DØ team
has met all its deliverables to date on time or ahead of schedule.

carried out the successful integration into SAM of GridFTP as standard file transfer protocol and is
leading the effort to migrate from Unix user identification to the use of grid certificates. In
addition, it has led the development of mc_runjob at DØ to act as the standard interface for MC
production on the Grid. In November 2002 DØ and CDF successfully demonstrated a prototype SAM Grid
with basic functionality. By May DØ expects to have implemented the first production version of the
SAM Grid at five sites. At this point in time Monte Carlo Production within the UK should start
running on the Grid using the mc_runjob product as the Grid interface.

Over the next eighteen months the SAM Grid project will continue to develop and implement
improvements so that we have a robust SAM-based grid service for DØ that is being used by all
collaborators in a transparent manner. Steady improvements will be made in accepting Grid CAs and
in monitoring, as well as in dealing with issues of scaling as the number of users/processes
increases steadily. As the data taking at the Tevatron progresses the need for
data reprocessing for specific physics analyses (such as CP violation studies) will require off
reprocessing and reconstruction. The runjob package's capabilities will be expanded to act as the
interface for all data reprocessing as well as generic user jobs, making it a standard interface to
the Grid for all DØ users.

The SAM/DØ program involves several collaborators. SAM is now being used by the CDF
collaboration as a solution to its data handling problems. SAM works with PPDG, the Condor team,
to continue to develop etc. etc.
Any one else… European groups…
The mc_runjob software, which was developed at DØ, is now the standard MC production tool for the
CMS collaboration, and so now supports an even larger user community. A project to integrate and
merge the experiment specific versions as much as possible has commenced at FNAL.

Deliverables: Need 2005 and 2006 deliverable any suggestions?

2004: The completion of a robust SAM-based grid service for DØ.

2005: Runjob as web/grid service providing sophisticated job control and monitoring services for
the DØ and CMS experiments.

2006: Interoperability between SAM and LCG.

request the continuation of the posts currently funded by GridPP.

will continue to work
on the development of SAM core functionality and services to meet the changing needs of the
Also it

will take over the responsibility for maintaining mc_runjob for the DØ collaboration. In addition
it would be beneficial for the UK to work on the integration of the mc_runjob packages between the
CMS and DØ collaborations
need more words here



The CDF experiment was given two GridPP posts in order to carry out the following primary tasks:

Converting the CDF data handling system to accommodate SAM, a data handling system
already in use and under development by the D0 experiment.

Aiding in the integration of SAM to make it compatible for use on the Grid.

Using these tools to supply a Monte Carlo resource on the grid for FNAL and CDF in
particular as an important use case for the SAM

This activity was considered so important that PPARC agreed to a change in direction for some of
our funds from the Joint Infrastructure Fund to support and supplement these efforts. We are
pleased to be able to report significant progress in this project as a direct result of the GridPP
funded and unfunded effort that has been expended so far. We are therefore asking for continued
support for our two positions for a further 3-year period beyond December 2004.

The CDF experiment was in a data handling crisis prior to the introduction of the UK effort in 2002.
The previous system was based on a highly centralised data service model. A central machine,
staging area, and tape storage facility were planned to be the first, and only, direct access to the
data from the user. Any data replication by groups external to F.N.A.L. was done on a purely

basis and was generally biased in favour of groups with the best and most dedicated IT
support, although groups with people in key positions within the CDF data handling system also
enjoyed a higher level of access than the average user.

Thus far we can report that we have completed one major task that our groups have taken on, that
of migrating CDF to SAM. Significant manpower for testing, integration, and process monitoring for
the standard user is still required, but SAM is now an integral part of the CDF code framework.
This success has garnered the appreciation of the collaboration far in excess of the quite small
relative size of the UK groups on CDF. The UK team continue to make substantial contributions to
the data handling effort.

The integration of SAM into the Grid has made significant strides forward. Condor has been
accepted as the primary job submission platform and GridFTP is being implemented for SAM file
transfers. Several dependencies on FNAL specific data caching systems have been removed in
favour of a more generic set of middleware. An effort has also started to change F.N.A.L. policy
regarding the acceptance of Grid CAs so that newly discovered security issues can be addressed and

The Monte Carlo generation tools are essentially complete and integration into SAM is nearly
ready; however, this project depends on more effort than can currently be supplied by the UK
groups alone, so we are delayed by priorities at FNAL which are keeping their local experts from
implementing our latest changes. When this occurs (expected soon) a period of testing will ensue
prior to full release to the collaboration.

Several new developments have taken place in the overall structure of the Data Handling group at
CDF which have positioned the UK at a high level to implement Grid-aware SAM tools across the
Laboratory and not just in CDF and D0. Dr. R. St. Denis has been appointed co-leader of the SAM
project with T. Wyatt. This project is seen by Dr. V. White (Associate Head of Computing Division at
FNAL) as influential in standardising the data model for all experiments at FNAL and will be
instrumental in at least making their data distribution model compatible with the Grid with an
ultimate goal of full compliance. As co-leader of data handling and of SAM, Dr. St. Denis is in a
position to oversee more than 25 FTEs from 6 funding agencies to realize SAM in CDF and D0
and to integrate SAM with Grid. The GridPP funded posts were key in putting Dr. St. Denis in this
position and their efforts will be required to fully implement the Grid compliant version of SAM in the
future (after 2004).

It must be said that we expect the Grid and Globus software itself to be a moving target during the
run-up to the LHC era. We will certainly need our SAM-Grid experts in order to support any
DataGrid changes and make the necessary modifications to the SAM framework to accommodate
or exploit them. We emphasize that this effort must come from experts in the CDF, SAM, and
SAM-Grid software packages, as support of the user community during a running project must be
immediate. Consequently we welcome the opportunity to keep the body of expertise which has
already been accumulated.

In addition there are issues associated with data production (where the raw data is spun through
the CDF code and given over for final distribution), and also potentially even the data coming
directly from the trigger, which may require more than simple ad hoc solutions to integrate properly
with the Grid-compliant version of SAM. These issues have been put on hold while the main tasks
of SAM integration are being implemented, but resources for these tasks will be needed well past
December 2004.

As SAM is migrated further into the Grid new issues arise. SAM was designed to be a scalable
system and is fulfilling that promise, but is starting to accommodate far more users than ever
before. The sharing of resources in SAM will be the next logical step to take and should borrow
heavily from the Grid (which has resource management designed in from the beginning). In
addition to borrowing from the Grid, the real-world expertise gained from the use of SAM in the near
future should be brought to the LHC computing model wherever possible. These new tasks will be
increasingly important from 2006-2008 as the LHC begins to take data and experience its own
real-world computing environment.

Currently we have 2 funded posts from the GridPP effort. This manpower is the critical mass
needed to allow CDF data handling to go from its current system to the SAM system. We therefore
bid to continue these posts, which are essential to fully support Dr. St. Denis in his new position as
co-leader of the Data Handling group and help with the full migration of SAM to the Grid.



ANTARES is currently building a high-energy neutrino telescope in the Mediterranean Sea. The
computing model for ANTARES involves all data being passed from the detector, situated at
2400m depth, 40km off the coast of southern France, along an electro-optical cable to shore. On
shore fast data filtering will take place and filtered events will then be reconstructed. It is expected
that of the order of 1 Tb/week of data will be filtered for reconstruction.

By Grid-enabling the entire ANTARES reconstruction software framework (including reconstruction,
selection and filtering) the experiment stands to benefit significantly from improved throughput of
data and reduced processing times. Specifically, we envisage interfacing the current ANTARES
database system (used internally to store calibration and event parameters) to equivalent Grid data
storage resources and implementing the appropriate tools for resource management. The net
result would be to improve data quality (since more sophisticated offline data filtering should be
possible with improved resources), enhance data throughput and to facilitate better access to the
data from remote sites.

Clearly the whole ANTARES collaboration will benefit from this approach. Furthermore, improved
and more efficient access to data will strengthen proposed data sharing strategies between the
high-energy neutrino, high-energy gamma ray and cosmic ray observatories which play a central
role in the proposed HEPDO network forming part of the ApPEC Integrated Infrastructure Initiative
seeking substantial EU funding from Framework 6. The sharing of such data will facilitate multiple
wavelength, “multi-messenger” analyses and alert systems for a whole host of experiments.

One of the predominant backgrounds to be understood in the ANTARES detector is that from
multiple muon events which arise from primary cosmic ray interactions in the atmosphere. Since
the rate of cosmic rays is very high, very large samples of events need to be generated to simulate
accurately even a few days of real data-taking. These multiple muon events are generated using
the CORSIKA package, co-authored by a UK physicist. This package, which is used by many
particle physics and particle astrophysics groups worldwide, is well suited to conversion to Grid
n and previous experience gained with EDG tools will be greatly beneficial in this respect.

Interfacing CORSIKA to the Grid in this manner will enable large datasets to be generated and
stored efficiently. This is particularly important in those energy regimes where large event samples
are required to perform accurate background studies. Large CORSIKA event samples thus
generated are also likely to be of interest to experiments other than ANTARES such as those
studying cosmic rays, high energy gamma rays and even some accelerator-based experiments.

The principal areas to be addressed during the two year application interface post would be:

interfacing the ANTARES data and calibration databases with existing Grid data handling

establishing appropriate authentication schemes (certification, etc.);

This is not expt

just need to set up a VO???

access to reconstructed data from other HEPDO partners;

inclusion of ANTARES data into fast, global, astrophysics alert systems

The milestones envisaged are as follows:

Month 6: Initial interfacing of ANTARES databases to Grid. Review of CORSIKA interface issues,
optimisation of data access (Replica Catalogue, etc.).

Month 12: Prototype interfacing of ANTARES reconstruction software using simple authentication.
First tests of CORSIKA-Grid interface.

Month 18: EDG tests of Grid-enabled ANTARES software framework using pre-existing datasets.

Month 24: Final system live on ANTARES production datastream. Authentication from multiple sites
and data access from experiments other than ANTARES. Final implementation of “CORSIKA


The intention would be to deliver a first porting of the software framework (reconstruction,
databases, simple authentication) after 12 months and to have a fully interfaced system (including
data filtering, global access, optimised resource usage) after 24 months.



MICE is a worldwide collaboration, the aim of which is to create a feasible method for cooling muons
for the neutrino factory and muon collider. The experiment will be based at the Rutherford Appleton
Laboratory and the UK is responsible for provision of the muon beam and surrounding
. In addition the UK physics groups are responsible for the tracker for the experiment
itself. The goal is to begin to take data with one liquid-hydrogen absorber in 2006.

Rutherford Laboratory will require a Tier 0 centre as far as MICE is concerned, and it is here that
MICE could provide a significant opportunity for GridPP2. For other experiments, Rutherford is a
Tier 1 centre. Significant financial resources are flowing from the UK into CERN in order to ensure
the smooth operation of CERN as a Tier 0 centre. There is an undeniable case for the UK to
provide such funding, so that the UK may fully exploit the LHC. But there is a danger that the
knowledge and expertise gained in the operation of the CERN Tier 0 centre will not be returned to
the UK. Worldwide access to MICE data must be provided, and in the process of accomplishing
this, knowledge of Tier 0 operation will be applied and disseminated locally. Since MICE will not
require any special facilities, over and above those required by the LHC, we foresee a small team
being able to effectively implement and maintain a Tier 0 centre within 2 years, exploiting the CERN
model and CERN supplied software. Once such a facility was available it could provide the basis
for a unified method of distributing data from Rutherford to users of ISIS, and in the future
DIAMOND. We would request one person as a contribution to such effort.
In the shorter term, this person would also be available to provide a focus for grid enabling the
Monte Carlo software for MICE. These two tasks complement each other well, in that the Monte Carlo
work to optimise the channel will need to be available well ahead of data taking, while the data
distribution from the Rutherford laboratory will only be needed once the experiment starts taking data.

MICE is too young for the experiment to have made significant contributions to GridPP; however, a
number of the UK members of the MICE collaboration have been involved in GridPP projects.



In the next two years the volume of data generated by UKDMC detectors operating at the Boulby
Mine in North Yorkshire will increase by more than an order of magnitude to >5 Tb/year. The
Collaboration proposes a second major increase in the number and size of operational detectors in
the period 2005-2007 leading to a further factor ten increase in both raw data rate and required
Monte Carlo statistics. It is clear that the current UKDMC offline computing model assuming use of
stand-alone Linux workstations for data reduction and Monte Carlo simulation will require
substantial evolution in order to cope with these developments. The ideal solution is to grid-enable
the offline software so that compute-intensive jobs can be run on all the computing resources
available to UKDMC institutes both within the UK and overseas. Not only will this speed the
turn-around between data acquisition and analysis, giving improved efficiency for fault diagnosis,
but it will also disseminate grid-computing techniques to the wider particle-astrophysics community,
through the strong links which exist between UKDMC and other dark matter collaborations. Uptake of
grid techniques has thus far been most pronounced among the accelerator-based experiments and this
will help to redress that imbalance.


In preparation for increased data rates a number of changes to the reconstruction and simulation
software framework have already taken place. Detector simulation is now carried out with generic
tools such as GEANT4, FLUKA and MCNP, rather than proprietary codes. The intention is to run
these simulations at collaborating institutions over the grid and with this goal in mind a team

is studying the interaction between G4 and EDG tools. An early highlight of this project was the
successful remote submission of UKDMC ZEPLIN-III simulation jobs to the RAL Tier
farm, reported at the May 2002 GridPP Collaboration Meeting. The Collaboration has also recently
become involved in the CERN IT/API DIANE
project (
) for
oriented component-based distributed parallel data simulation and analysis, working
directly with the project co-ordinator, Dr. Jakub Moscicki (PPARC CERN GRID Fellow). Potential
benefits from this collaboration for both DIANE (through access to a ‘real-world’ use-case) and
UKDMC (through efficient processing and simulation of dark matter data) are considerable.

Effort is needed in order to see these projects through to their conclusions, but the benefits both to
UKDMC and to the wider G4 and grid user communities will be considerable.

Reconstruction and analysis of UKDMC data is currently carried out by several suites of software
including ZAP (C) and UNZAP (C++). These codes take digitised waveforms as input, together with
data from an offline calibration database, and generate ntuples or ROOT files containing detector
shape estimators for interactive analysis. The codes can currently be run in batch mode on
single machines or farms but will require extensive modification in order to permit grid
using e.g. EDG tools. These modifications will however be very similar to those implemented by
other collaborations (file-name specification, interaction with the replica catalogue, distributed
access to an online conditions database etc.) and it is expected that given sufficient effort this
project can be completed in advance of acquisition of first data from
scale detectors such as
MAX (Spring 2007).


to deliver

with 2 FTE/year for 2 years, the following

Month 6: Design document for interface of existing software with EDG tools.

Month 12: Prototype interfaces between G4/FLUKA and ZAP/UNZAP, and EDG tools.

Month 18: Fully functioning online UKDMC detector conditions database.

Month 24: Fully functioning UKDMC software framework permitting submission of G4, FLUKA and
ZAP/UNZAP reconstruction jobs to multiple grid-enabled Collaboration computing resources.



The ZEUS experiment is currently taking data at the HERA II accelerator, the only operational
high-energy accelerator in Europe. The data set which will be accumulated during the next few
years of running is expected to reach the petabyte level in terms of storage, providing new
challenges in the areas of data reconstruction, reduction and analysis as well as in Monte Carlo

There are two main areas in which ZEUS has the potential to exploit and test Grid technologies, the
first of which is in the area of Monte Carlo production. Currently, ZEUS processes its Monte Carlo
events using an early prototype computing grid, known as Funnel. Funnel uses approximately 250
machines distributed amongst 13 sites around the world and processed approximately 240 million
events during the year 2002. Funnel was developed in the early '90s and was custom-built to
accommodate the specific needs of the ZEUS simulation. Since its creation Funnel has been
continually adapted to keep pace with the globally changing computing landscape. With the
increasing convergence on Grid technologies within HEP, moving Funnel to Grid authentication
and resource management is the natural next step in the evolution of Funnel. However, due to the
legacy protocols and specialised control system, this would require significant effort for which
additional manpower, in the form of two Application Interface posts, is sought.
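
As a rough cross-check of the scale quoted above, the 2002 figures can be turned into an average per-machine throughput. The short Python sketch below uses only the approximate numbers given in the text (250 machines, 13 sites, 240 million events); the assumption of uniform, year-round usage of every machine is ours and is certainly a simplification.

```python
# Approximate Funnel figures for 2002, as quoted in the text.
machines = 250
sites = 13
events_2002 = 240_000_000

# Assumption (not from the text): load is spread evenly over all machines
# and over all 365 days of the year.
events_per_machine = events_2002 / machines
events_per_machine_per_day = events_per_machine / 365
machines_per_site = machines / sites

print(f"events per machine in 2002: {events_per_machine:,.0f}")
print(f"events per machine per day: {events_per_machine_per_day:,.0f}")
print(f"average machines per site:  {machines_per_site:.1f}")
```

On these assumptions each machine handled of order a million events in the year, which gives a feel for the capacity a Grid-enabled Funnel would have to match before adding more.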

The second area is data analysis. The ZEUS experiment currently uses a farm of 40 dual-processor
PCs for physics analysis which use the dCache system for data handling and LSF for
resource brokerage. The aim here would be to allow users to access the analysis farm remotely, as
well as PC farms outside of DESY to work as part of the ZEUS analysis system. There is already
considerable expertise in the technical aspects of the Grid at DESY: the dCache system (one of the
official mass storage systems for the Grid) was developed by DESY in collaboration with FNAL.

The project envisaged for these posts would be primarily aimed at grid-enabling the Funnel system,
although many of the goals of the project would also be beneficial to the conversion of the analysis
system. The milestones envisaged are as follows:

6 months: Produce architecture design document;

12 months: Commissioned testbed tailored to ZEUS needs;

12 months: Authentication strategy established;

18 months: I/O layers adapted to suit the Funnel system;

During second year: Development of resource management concepts and monitoring tools;

2 years: Roll-out of production version of Grid-enabled Funnel system to the Collaboration.

In the case that only one post is available, milestones 1-3 will be achieved during the two year
period, with an attempt to integrate Grid authentication tools into the existing Funnel system. Since
this is a ZEUS-wide project vital to the collaboration, there is a significant chance that some further
effort will become available from other collaborating countries, in which case the full programme
might still be achievable even with one post from the UK.

This project would result in significantly increased Funnel capacity, as well as allowing the analysis
of ZEUS data to continue long after the experiment ceases to take data. Much of the experience
gained during the project would also be useful in Grid-enabling the ZEUS analysis system. The
combination of all these factors will improve the physics performance of the whole Collaboration.
The ZEUS Collaboration would be able to contribute the experience already gained over many
years of operating a production level Grid-like system to the HEP community. The results of this
project would also offer new opportunities to gain experience with applications critical to a running
experiment and a chance to evaluate the performance of Grid middleware quantitatively against the
existing infrastructure.



The CALICE collaboration is studying calorimetry for a future linear collider. It is a worldwide
collaboration of around 140 physicists in 24 institutes from 8 countries, spread over Europe, US
and Asia. The major aim of the collaboration is to design a calorimeter system, both
electromagnetic and hadronic, which can give excellent jet energy resolution at linear collider
energies within an acceptable budget. Following on from earlier work, it is now vital to the future
success of the project to perform a detailed, systematic comparison of the state-of-the-art
simulation of detector response with test beam data. The testing and subsequent tuning of our
simulation to agree with data is scheduled for the next three years and is the basis for this bid.

The beam test data are expected to be taken with several particle types and energies for each of
four candidate calorimeter technologies. A sample of around 1 million events will be needed for
each combination of type, energy and hardware technology, with a total of around 100
combinations resulting in around 10^8 events. Each event will be around 30 kBytes so the total
event sample is expected to be of order 3 TBytes. The beam test is scheduled to take place
throughout the second half of 2004 and all of 2005. Handling and enabling the analysis of these
events is part of the aim of this bid.
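
The data volume estimate above follows from a short multiplication, sketched here in Python for concreteness. All inputs are the round numbers quoted in the text; the only assumption added is that 1 kByte is taken as 1000 bytes and 1 TByte as 10^12 bytes.

```python
# Round numbers quoted in the text for the CALICE beam test.
events_per_combination = 1_000_000   # "around 1 million events"
combinations = 100                   # particle type x energy x technology
event_size_bytes = 30_000            # "around 30 kBytes" (assuming 1 kByte = 1000 bytes)

total_events = events_per_combination * combinations
total_bytes = total_events * event_size_bytes
total_tbytes = total_bytes / 1e12    # assuming 1 TByte = 10^12 bytes

print(f"total events: {total_events:.0e}")
print(f"total size:   {total_tbytes:.1f} TBytes")
```

Changing any one input (for example the number of combinations actually recorded) scales the total linearly, so the 3 TByte figure should be read as an order-of-magnitude estimate.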

The major aim of the collaboration is to tune the simulation and therefore Monte Carlo samples
using different models, each with several parameter choices, will be required for each sample of
real data. As each simulated sample should be larger than the corresponding data sample, in
excess of 10

shower events will have to be simulated, although these will not necessarily all be
needed at the same time. To obtain the detailed simulation needed, the simulation is run with low
values for the particle step cut-offs and each shower event currently takes between one and ten
seconds, depending on the energy, on a 1 GHz Linux PC. This is equivalent to between 10


showers per PC per year. The simulation dataset will begin to be needed by the end of 2005
although much of the simulation will be required to be produced throughout 2006 as the tuned
parameters will not be known until this time. Setting up the Monte Carlo simulation to use the Grid
and then produce these large samples is also part of this bid.
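
The throughput implied by the quoted timing can be worked out directly. The Python sketch below converts the stated one to ten seconds per shower on a 1 GHz Linux PC into showers per PC per year, and then into PC-years for a given sample size. The function names and the `duty_cycle` parameter are illustrative, not taken from any CALICE software, and the 10^8 sample used in the example is only the size of the real data sample, which the text says the simulated samples must exceed.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years


def showers_per_pc_year(seconds_per_shower, duty_cycle=1.0):
    """Showers one PC can simulate in a year.

    duty_cycle is a hypothetical knob for the fraction of the year the
    PC is actually busy simulating; 1.0 means continuous running.
    """
    return SECONDS_PER_YEAR * duty_cycle / seconds_per_shower


def pc_years_needed(n_showers, seconds_per_shower, duty_cycle=1.0):
    """PC-years required to simulate n_showers showers."""
    return n_showers / showers_per_pc_year(seconds_per_shower, duty_cycle)


fast = showers_per_pc_year(1.0)    # low-energy end of the quoted range
slow = showers_per_pc_year(10.0)   # high-energy end of the quoted range
print(f"showers per PC per year: {slow:.2e} to {fast:.2e}")

# Lower bound only: a simulated sample no larger than the 10^8 real events.
n_showers = 10**8
print(f"PC-years for 1e8 showers: {pc_years_needed(n_showers, 1.0):.1f}"
      f" to {pc_years_needed(n_showers, 10.0):.1f}")
```

Even this lower bound corresponds to several PC-years at best and tens of PC-years at worst, which is why distributing the production over Grid resources is attractive.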

This project is to bid for a post of 1 FTE/year for 2 years, for Grid simulation, storage and access
for CALICE data. The
postholder would convert the existing CALICE shower simulation to be Grid
compatible and then use the Grid to produce the large simulation dataset required, in short order as
new parameter sets are determined. The data, both real and simulated, will total around 10 TBytes
at any one time and will need to be stored at some location, with both RAL and the LeSC being
possibilities as hosts. Access to these data via the Grid for analysis would also be required, both for
UK and for foreign collaborators.


Month 12: CALICE shower GEANT4 Monte Carlo running in a Grid environment

Month 18: Start production of large GEANT4 Monte Carlo dataset

Month 24: CALICE beam test and simulation data accessible through Grid



The Linear Collider Accelerator and Beam Delivery system collaboration comprises several
university groups interested in Research and Development for a future Linear Collider. It builds on
existing activity and hopes to achieve a lot more, in response to the resurgence in accelerator
R and D in the UK, and the impending call for proposals.

Accelerator science involves intensive computing simulations, to understand the non-linear effects
that have to be considered and understood in the new generation of accelerators: wake fields,
interbunch and intrabunch dynamics. The physical interaction of beam particles with matter
(spoilers and monitoring systems) needs to be properly modelled by a simulator containing a full
description of physics processes, such as Geant4.

The high computing power and time required by these detailed simulations require that means be
found to exploit available computing resources, and to save the results produced so that the same
simulation does not have to be run several times. The Grid provides the toolkit for doing this,
through the Resource Broker in the first instance and the metadata catalog system in the second.
We have here a rapidly increasing area of activity which will seize on the opportunity to exploit Grid
technology. It is a similar but not identical set of requirements and use cases to the familiar
experimental detector simulation: the user might want to process bunches of particles with various
parameters (offsets, emittances, etc) through a particular optics system, or they might study
in the optics system, or in the modelling assumptions used by the program. A useful run
may take several CPU-hours even with modest bunch sizes.
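
The idea of saving results so that an identical simulation is never run twice can be illustrated with a small Python sketch. This is not an EDG or Resource Broker interface: the class `SimulationCatalogue`, the function `run_or_reuse`, the parameter names and the `lfn:/hypothetical/...` result locations are all invented for illustration. The sketch keys each catalogue entry on a hash of the program name and its input parameters, and consults the catalogue before submitting a job.

```python
import hashlib
import json


class SimulationCatalogue:
    """In-memory stand-in for a metadata catalogue of simulation results."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def key_for(program, parameters):
        # Canonical JSON (sorted keys) so that the same parameters always
        # give the same key, whatever order they were supplied in.
        canonical = json.dumps(
            {"program": program, "parameters": parameters}, sort_keys=True
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def lookup(self, program, parameters):
        return self._entries.get(self.key_for(program, parameters))

    def register(self, program, parameters, result_location):
        self._entries[self.key_for(program, parameters)] = {
            "program": program,
            "parameters": dict(parameters),
            "result_location": result_location,
        }


def run_or_reuse(catalogue, program, parameters, submit):
    """Return (result_location, reused); submit a job only on a catalogue miss."""
    entry = catalogue.lookup(program, parameters)
    if entry is not None:
        return entry["result_location"], True
    location = submit(program, parameters)
    catalogue.register(program, parameters, location)
    return location, False


# Usage with a fake submission function standing in for a real job broker.
calls = []


def fake_submit(program, parameters):
    calls.append((program, parameters))
    return f"lfn:/hypothetical/{program}/run{len(calls)}.dat"


catalogue = SimulationCatalogue()
params = {"offset_um": 5.0, "emittance_nm": 20.0, "bunch_size": 10000}
first = run_or_reuse(catalogue, "GEANT", params, fake_submit)
# Same parameters supplied in a different order: served from the catalogue.
reordered = dict(reversed(list(params.items())))
second = run_or_reuse(catalogue, "GEANT", reordered, fake_submit)
print(first, second, len(calls))
```

A real catalogue would also need to record the program version and modelling assumptions, since the text notes that users may vary these as well as the beam parameters.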

ABD groups have a wide experience with the various simulation programs on the market

PIG, TURTLE, GEANT and others. These are used for somewhat different purposes: some are
appropriate for studying backgrounds, for example, and others are specifically for studying the
development of the main bunch. There is no universal simulator on the market. However they can in
some cases be used together, the output of one forming the input of another. The desired system
must be able to cope with this variety

We seek a post which would be able to take the descriptions of accelerator simulation using the
descriptive terms used by the accelerator physicist community, to adapt these to a Grid interface
system (GUI portal or otherwise) to interface them to a Resource Broker, and to devise a metadata
catalogue system for the storage of the results in a way which is meaningful and useful to the


Resource Broker interface for accelerator physics

Grid based (i.e. world wide) Metadata catalogue for accelerator physics