GriPhyN Project Planning Document





Technical Report GriPhyN


GriPhyN Year 2

CMS Project Plan

Draft Version

26 November


Developed by members of the GriPhyN Project Team

Submit changes and material to: Mike Wilde, editor



This project plan details CMS activities for GriPhyN project year 2, in which we will integrate GriPhyN research results in virtual data into CMS simulation production, and begin to experiment with applying virtual data concepts to the CMS analysis problem.

For information on CMS work for Years 3-5, and for how this plan fits into the overall GriPhyN plan and the activities of other GriPhyN experiments, see the GriPhyN Overall Project Plan.


CMS GriPhyN Goals for Year 2

The high-level goals for year 2 of the project are:


Show the utility of GriPhyN technology as a basis for enhancing the robustness and reproducibility of distributed computing, by integrating grid components more deeply into the CMS production software, with the results of the integration being used either in real production or in challenge demos. [UF taking lead, heavy interaction with PPDG, US production, FNAL]


Gain and demonstrate the commitment of CMS to the evaluation, testing, integration, and use of GriPhyN


Create a testbed in which both GriPhyN and PPDG CMS activities can be conducted.


Forge joint activities and tighter coordination between PPDG, European Data Grid, Teragrid, and GriPhyN (as CMS is part of all four grid projects).



Apply GriPhyN virtual data research to CMS simulation production. Deploy the virtual data mechanism and use
it to provide automated production as well as a GriPhyN laboratory.


Integrate the virtual data catalog into an important production application, and demonstrate the benefits of detailed data derivation tracking and large-scale data re-derivation.

Instrument and measure the use of virtual data and request planning mechanisms to gather
data and feedback for
further CS research.


Apply preliminary GriPhyN research results and execution planning and scheduling mechanisms (for example,
using DAGman) to CMS production.


Start exploring the CMS analysis process, creating prototypes of GriPhyN-based analysis systems that can lead the way to live science use in project year 3.


Create compelling new demos, always available, and runnable via the web, for outreach and for SC.


Use CMS activities for education and outreach (move to activity section: by creating case studies; giving talks to emerging (and smaller) science projects; talking to under-funded institutions about how they can leverage technology resources in other grids; offering live unused grid cycles to small institutions, even in a demo context; allowing access to CMS simulation data and grid tools to small physics institutions).

Technical goals: The following are technical proof-of-concept goals for the GriPhyN CS research areas in project year 2:


adjust GDMP/GridFTP/RFT interfaces, architectural relationship, and integration for maximum application benefit


get a production catalog infrastructure in place (including catalogs for replicas, virtual data, and application metadata)

test a deployment of the replica location service (RLS)


Permit virtual data requests entered at any location to be executed at many locations


deploy a scalable relational database solution for replica and virtual data catalogs


understand CMS data model dependency tracking issues


explore the use of scripts to generate virtual derived data specifications


understand object-level dependency tracking issues


create and deploy rudimentary execution planners


get rudimentary policy control mechanisms in place (at least for disk space, and to some extent for CPU)


deploy effective logging and monitoring mechanisms to perform grid job logging and tracing


CMS GriPhyN Project Year 2 Activity Overview

The CMS activities for year 2 will consist of testbed development, integration of VDT technology into CMS simulation production, and the prototyping of the use of VDT technology in the CMS analysis process. These are described in the sections below.


Create a GriPhyN Deployment Testbed for CMS

Several grids will be involved in GriPhyN Y2 CMS activities:

GriPhyN Test Grid: a shared GriPhyN experimental grid to be used by all experiments, for the initial stages of prototyping. This is where virtual data software and the VDT is first developed. It includes AFS and the ability to swap

USCMS Production Test Grid: a CMS-only experimental testbed to be shared by GriPhyN and PPDG, to be used by CMS collaborators to prototype mechanisms to run CMS production and distributed analysis.

In the USCMS Grid, sites will take on roles as follows:

Tier 1: FNAL


Tier 2: CIT, UFL, UW

Tier 3: U of Chicago, UCSD


LHC Computing Grid: Eventually both NSF Teragrid machines and LHC Computing Grid machines should be integrated into the USCMS production grid. In PY2 we will lay the project management plans to make this happen. For the Teragrid, at least, this will involve porting and testing efforts to enable CMS production software to run on IA64 architectures and on different versions of Linux beyond Red Hat 6.2.


Integrate GriPhyN Virtual Data technology into the CMS Monte Carlo simulation production

The main GriPhyN CMS effort for Y2 is to integrate virtual data and request planning mechanisms into the CMS Monte Carlo Simulation Production system (for which current US CMS plans include the production use of tools such as IMPALA, MOP, and BOSS). As part of this effort, we need to bring together several production job management mechanisms into a single one, applying GriPhyN VDT components for tangible benefits to CMS and research feedback to the GriPhyN project. GriPhyN Grid technologies that we intend to integrate into this mechanism include replication, the replica location service, reliable file transfer, the Community Authorization Service, and prototypes of the GriPhyN Job Execution Planner.

The benefits to CMS of this effort are: easier, more automated job submission; easier recalculation of data products; accurate tracking of data derivation; and the ability to re-derive latter-stage outputs of the production data pipeline without recomputing the earlier phases, in cases where the later-stage processing programs require changes. We intend to highlight ease of grid usage as a major benefit to CMS, especially in the automated handling of failures and complex grid configuration and usage details. We also intend to explore how to effectively utilize the GDMP publish/subscribe paradigm within CMS production.

The benefits to GriPhyN of this effort are: live testing of a fundamental GriPhyN concept in intensive production for real users; user feedback on the value of the virtual data paradigm and usability of the tool set; measurement of virtual data process effectiveness; capture of live logs of the detailed activity and resource utilization of the production process for further CS research.


Prototype the application of GriPhyN Virtual Data technology to the CMS analysis process

In this activity we will begin to explore the vital later-phase CMS process of "analysis": the combing of massive numbers of events for the signature patterns that offer supporting evidence of the various theories of nature being studied. Once CMS is taking live data from the detector, analysis activities will be the dominant use of computing resources.

In a typical analysis, a physicist would select events from a large TAG database, gather the full event from various sources, and create (reconstruct) those events not already existing on some "convenient" storage system. The process would work something like this:


Search through 10^9 TAG events (~1 TB), select 10^6 events and get the full event data for them (~2 TB).


Do a "new improved"
reconstruction pass, make a new set of analysis objects (~100 GB) and TAGs (~1 GB)


Analyze the new AOD and TAG datasets interactively and extract a few hundred signal events.


Histogram the results and visualize some of the "interesting" events.
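The four-step flow above can be sketched as a toy pipeline. This is an illustrative sketch only: the event counts are scaled down, and the cut values, record fields, and function names are invented placeholders, not CMS software.

```python
import random

def select_tags(tags, cut):
    """Step 1: select interesting events from the TAG database."""
    return [t for t in tags if t["pt"] > cut]

def reconstruct(event_ids):
    """Step 2: a 'new improved' reconstruction pass producing AOD-like records."""
    return [{"id": i, "aod": i * 2} for i in event_ids]

def analyze(aods, signal_cut):
    """Step 3: interactive-style analysis extracting signal candidates."""
    return [a for a in aods if a["aod"] % signal_cut == 0]

def histogram(signal, nbins=4):
    """Step 4: histogram a quantity of the selected events."""
    counts = [0] * nbins
    for a in signal:
        counts[a["aod"] % nbins] += 1
    return counts

random.seed(0)
tags = [{"id": i, "pt": random.random()} for i in range(1000)]   # stand-in TAG DB
selected = select_tags(tags, cut=0.9)            # only a small fraction survives
aods = reconstruct([t["id"] for t in selected])  # "new improved" reconstruction
signal = analyze(aods, signal_cut=7)             # extract a few signal events
hist = histogram(signal)
print(len(selected), len(signal), hist)
```

The point of the sketch is the funnel shape of the data volumes: each stage consumes the previous stage's output and produces a much smaller derived dataset, which is what makes derivation tracking (below) worthwhile.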

We seek to
create virtual data techniques to track data dependencies for the files and/or objects in this process from
tag schemas and tag databases or tables back to the reconstructed event sets and possibly back to the raw data.

We will build tools for this type of user-driven fine-granularity physics dataset extraction and transport over the grid, driven by an easy-to-understand interface that reduces the difficulties of marshalling distributed grid resources. This effort will exploit newer collection management features developed by the COBRA/CARF team at CERN.

We expect to work further on exploring the impact of end-user physics analysis workloads on the grid system, by prototyping distributed end-user analysis tools, demos, and pilot facilities.


The GriPhyN CMS Team

James Amundson

Lothar Bauerdick

Dimitri Bourilkov

Rick Cavanaugh

Greg Graham

Koen Holtman

(Iosif Legrand ? )

(Vladimir Litvin ? )

Harvey Newman

(Rajesh Rajamar ? )

(Conrad Steenberg)

Jens Voeckler

Projects and institutions listed in the roster: GriPhyN FNAL CMS; GriPhyN UFL CMS; PPDG FNAL CMS; GriPhyN CIT CMS; Condor UW

Roles listed in the roster: Physicist, MOP Developer; CMS Physicist, IMPALA Developer; CMS Computer Scientist; CMS App Support; VDC / VDT development

Injection of CS research topics into CMS plan here: sequence, when, where, how? Or move some of this explanation to the master plan.


Project Year 2 Timetable Overview

This section presents a high-level overview of GriPhyN CMS activities and milestones for project year 2. Full details are contained in the associated Microsoft Project plan document.

Testbed timetable




VDT 1.0 Release

VDT 1.0 Installation on GriPhyN Testgrid machines

Develop testgrid certification tests

Verify testgrid functionality

Test Grid R

Develop USCMS Grid Plan V1 (for Q1-Q2 2002)

Circulate plan for feedback


USCMS Grid Plan V1 Approved

USCMS Grid Machines allocated and/or purchased

VDT 1.0 installed on USCMS Grid machines

Verify USCMS Grid V1 functionality

USCMS Grid V1 ready

VDT 2.0 Release (includes first VDC/VDL)

Merge ATLAS, LIGO and Sloan machines into GriPhyN Test Grid (tentative)

Production Plan for USCMS Grid production schedule drafted and circulated




Production Plan for USCMS Grid production schedule approved by US CMS management.

Upgrade Testgrid to VDT 2.0

Upgrade USCMS Grid to VDT 2.0

Capacity increase for USCMS Grid

Production Software V1 installed on USCMS Grid


Production Software V1 testing




CMS testbed upgraded to VDT 1.0

Progrid established

Increase automation functionality

USCMS testbed plan v1

USCMS: n machines at UFL

USCMS: n machines at UW

USCMS: n machines at FNAL, CIT, etc;

USCMS: MOP app suite checkout (certification) test

USCMS: MOP in use by UW physicists

CMS Simulation Production Timetable




Production Software Test Plan Drafted

Production Software







VDL 1 design document

IMPALA & MOP design documents

VDT Design meeting I

Design for convergence I

VIMPALA Design meeting II

Design for Convergence II

Virtual Data Catalog integration into CMS simulation production framework (MOP)


dovetail with CMS production schedule (an agreement and planning document)


in use at UW, then UF, then FNAL, then other T1 sites


available for more ad-hoc simulations (user-requested; other research groups, e.g., UW, UFL)


want to have a major production with DAGMan


validate MOP results

Integrate w T2 wbs plans

CMS project review checkpoints

CMS certification tests

CMS design approval


Design meetings

Determining role of Impala

Publishing design

Making MOP changes and integration

Friendly user testing

User I/F design

Database selection

Database deployment

Document: detail all logging data that will be recorded

Get logging and measurement plan in place

CS research activity: Perform data regeneration tests; measure speed and accuracy

Analysis Prototyping Milestones




No activity planned for this quarter







design of data dependency model (based on Koen’s papers)

integration of a vdc into a clarens prototype

GTR for grid requirements for user analysis

Advanced Planner and Policy Functionality

Prototyping Milestones




No activity planned for this quarter







CAS in place in TestGrid

CPU sharing policy prototype

Storage sharing policy prototype

Refined DAGMan language for end-user job submission


Simulation Production Challenge Problem Specification

Revise this section in terms of feature sets / separate feature sets from challenge problems

Virtualization of Monte Carlo production in the MOP high-throughput framework. Actual production, at least at Fermilab, could use the virtualized MOP framework. Other sites will be able to both replicate and materialize data products produced under Fermilab control.


Sites will be able to replicate early initial files in the simulation pipeline and materialize the final files in the

Later in the project year, an automated planner could make decisions about where to execute simulation runs.

OI: what logic/features/functions are needed from each of MOP, Impala, and VDC


Data derivation tracking

Experiments / demonstration (GriPhyN only) in data reproducibility (but not for production use)
Fitting features of MOP into a DGA architecture and making the solution usable outside GriPhyN.

Request estimates stored in VDC?

See if virtual data generator functions can be used in MOP


Handling both File and OO data

Event sets?

Location of tracked data stores

Since the future of Objectivity in the CMS collaboration is uncertain, the early versions of the VDC will do only rudimentary data dependency tracking of Objectivity data, most likely in the form of tracking and treating Objectivity databases as ordinary files with some pre- and post-requirements (such as detaching and attaching of database files from their parent federations).

Later versions may explore more sophisticated forms of object re-clustering into new database files, and/or fine-grained object dependency tracking. We plan to explore (at least from a design perspective) the tracking of these dependencies down to the level of single raw or simulated events.


Data Management

Staging and replica management; GDMP and RFT integration

Determine push and/or pull model for data tracking

Take advantage of replicas

File replication service.

Replica catalog / VDC coherence and integrity

Find the place for replica cataloging: first cut: distributed Oracle (T0, T1, T2 each have a catalog; uses Oracle replication) CERN, FNAL

Create some degree of metadata cataloging: RAW,ESD,AOD,TAG

(needs RLS in VDT layer in year 2 or in PY3)

site cached file service

Do intelligent staging

Later in year: integrate RFT, RFR; revised GDMP


Job Management

Create a user request interface to the grid (maybe this is Condor-G submit)

Integration and development of a work planner

Documentation of the job request process and data request process.

Heterogeneous resource schedulers? (Condor, PBS, etc?)

Unification (or co-existence) of ClassAds and RSL, and how they will be used
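For reference, the two notations under discussion express the same kind of job request. The fragment below shows a Condor-G submit description (which Condor converts internally to a ClassAd) and a roughly equivalent Globus RSL string; the gatekeeper host, executable, and argument names are illustrative placeholders, not actual CMS configuration.

```
# Condor-G submit description (illustrative names)
universe        = globus
globusscheduler = gatekeeper.example.fnal.gov/jobmanager-condor
executable      = cmsim
arguments       = run123.in
queue

# Roughly equivalent Globus RSL string
&(executable=cmsim)
 (arguments=run123.in)
 (count=1)
```

The unification question is whether one of these forms (or a translation layer between them) should be the canonical request language seen by the planner.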


Generalization of request planning

Request estimation? (at both file and job level? In this system, what's the outermost unit of request: files or jobs?)

See if execution sites could be picked with some degree of automation in MOP

make MOP requests execution

Explicit request planning

picking site of execution; data moved automatically (estimated automatically)

Automated request planning

automatically pick site of execution (4Q)

Interaction with site resource policy


get Catalin involved; broker use of resources without human intervention (task: design of policy language and interpreter)

Handle executable management

transporting, tracking and execution dependencies

Interface to/for executable signature tracking; executable automated building (move this part to a later PY?)

Clean failure handling and job restart; not FT but a step toward high integrity and ease of use.

Further R&D on fault tolerance and scalability, centered around the RES job execution service. [CIT team]

Work with the Condor team to develop DAGMan further, in particular its expressiveness in terms of error recovery,
with the goal of applying it to the CMS production system. [UF taking lead]



? For CPU, disk


Monitoring and Telemetry

Let a user know job, queue, and system status easily

Return info to diagnose job problems (steps towards grid wide reporting and exception handling)

coordinated time


for data reporting, fault detection, resource utilization, esp. disk space

Work on monitoring software [CIT team led by Iosif Legrand at CERN, also work at UFL]

Deploy some results from the PG monitoring group



Disk space allocation policy; cpu alloc policy

Distinguish between global, local, and individual requests


Quality Assurance

CMS specifications for production data integrity, code integrity, testing, etc.

Test Plans

Certification Process

Identification of CMS Standards


Resolve vdata cataloging and file naming issues


CMS Analysis Challenge Problem Specification

Work further on exploring the impact of end-user physics analysis workloads on the grid system, by prototyping distributed end-user analysis tools, demos, and pilot facilities which allow end-user physicists without specific grid training to accomplish basic physics data manipulation tasks using the grid.

Show user collection creation and transport over the grid, driven by an easy-to-understand grid interface. [CIT taking lead]


Tracking the data objects of analysis

Scheduling analysis resources

Remote analysis

Tracking tag database schema evolution

Clustering of cut sets

Regeneration of: reconstruction, ESD, AOD, TAG.

Explore database issues:


database heterogeneity


table naming and version mgmt


schema evolution and version management


derivation issues for self-describing data (ie, tables with a header row)


tracking the evolution of schemas as an object

Exploring architecture and component interaction issues:


propagation and update of tag databases




distributing a more detailed analysis over the smallest/best set of sites where the necessary data lives or could be
moved to


going all the way back to reconstruction


Problem Solution Development Activities

This section describes activities. Section X (5?), below, presents a timetable and responsibilities for the activities.


Experiment Knowledge-base Building and Documentation Activities

Analysis of value of reproducibility in terms of data replication

Develop several revisions of:

Analysis purpose and general model (refer to main GriPhyN plan)

Data and Application Map

Data dictionary

Tool dictionary

Data requirements spreadsheet

State model for the application: Koen’s docs, GriPhyN reqs doc; Monarc; Conrad’s remote analysis work

Develop the data-flow model for CMS production w/w

Develop data dependency model


incorporate both raw and simulated flows


incorporate reconstruction (is writeDigis the only current reconstruction tool?)

Enhance CMS web page

Explore where reconstruction needs to happen (Vladimir: after initial phase of writeDigis)

Work with the Globus team and the EU DataGrid to develop a set of long-term scalability requirements for the file replica catalog service.

Do further work on the issue of reconciling the object and object-collection nature of the CMS data model with the file nature of the low-level data grid services.


Actively participate in the development of a Grid architecture by reviewing architectural documents created in the Grid projects, and by communicating architectural lessons learned in CMS production to the Grid projects.

Publication of CMS replication requirements and the dataflow behind it; use as a model for the other experiments

Hook into CMS paradigms for module signature/identification

Build consensus/support for CMS adoption of GMOP

Ensure Clarens is the best platform for building an analysis framework

Refinement of this plan:

GMOP external specification document

GMOP detailed design document (GriPhyN CMS document; meets CMS specification for official project software)

Analyze the parts of MOP that perform GriPhyN-like operations (for example, figuring out what sequence of operations needs to be performed).


Development Activities

The primary focus of CMS activities in GriPhyN during Year 2 will be the development

of virtual data tools for the
production of Monte Carlo simulated CMS data and virtual data tools for the analysis of CMS data. These software
packages will rely both upon existing tools from the current Virtual Data Toolkit as well as contribute new too
which are general enough in nature, into future versions of the Virtual Data Toolkit.

In support of these activities, efforts will be directed towards the establishment of a catalog infrastructure including: Replica Catalogs (RC), Virtual Data Catalogs (VDC), and Meta-Data Catalogs (MD). Significant progress towards the development of a prototype VDC has already been accomplished by Voeckler using a PostgreSQL database coupled with a Perl interface. The catalog tracks the dependencies of data files and the transformations between data files. As such, it is able to regenerate any (missing or deleted) data file on demand.
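The regeneration idea can be illustrated with a minimal sketch (this is not Voeckler's Perl/PostgreSQL code; file names and transformations are invented for the example): the catalog records, for each derived file, the transformation and input files that produce it, and a missing file is rebuilt by recursively rerunning transformations back toward materialized ancestors.

```python
# Toy virtual data catalog: each derived "file" maps to the
# transformation and inputs that produce it. Illustrative only.
storage = {"raw.dat": "raw-bits"}          # files currently materialized
catalog = {}                               # output -> (transform, [inputs])

def register(output, transform, inputs):
    """Record how `output` is derived; analogous to a VDC entry."""
    catalog[output] = (transform, inputs)

def materialize(name):
    """Return the file, regenerating it (and its ancestors) on demand."""
    if name in storage:
        return storage[name]
    transform, inputs = catalog[name]      # KeyError means underivable file
    data = transform(*(materialize(i) for i in inputs))
    storage[name] = data                   # cache the regenerated product
    return data

register("sim.dat", lambda raw: "sim(" + raw + ")", ["raw.dat"])
register("aod.dat", lambda sim: "aod(" + sim + ")", ["sim.dat"])

storage.pop("sim.dat", None)               # simulate a deleted intermediate
print(materialize("aod.dat"))              # prints "aod(sim(raw-bits))"
```

Deleting the intermediate `sim.dat` is transparent to the consumer of `aod.dat`: the catalog walks the derivation chain back to `raw.dat` and recomputes what is missing, which is exactly the on-demand regeneration property described above.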

While CMS does not currently use virtual data concepts, CMS has detailed several future needs related to virtual data [GRIPHYN 2001]. Using the experience gained by integrating the VDC with current CMS production and analysis needs (see below), CMS-GriPhyN plans to work with CS-GriPhyN to further develop virtual data concepts by taking the following as work items during Year 2:

Work Item 1: Develop and prototype the concept of "grid-uploaded" files and algorithms (i.e. transformations between data files). Such files and transformations would exist in a grid-wide database ("replica catalog?") and be distinguished by Unique Identifiers (UIDs).

Work Item 2: Interface the current VDC with the prototype "replica catalog" so that virtual data products can be materialized from "grid-uploaded" files and transformations. Each materialized virtual data product would receive a UID and an entry in the "replica catalog" and/or the VDC. Platform dependencies and their relation to virtual data product UIDs will be investigated.

Work Item 3: Develop prototypes for Consistency Management of the "replica catalog" over a grid. (Open issue: Should we rely on the job to fail and provide "failure" exit codes if the expected data does not exist, thereby updating the replica catalog, or should we rely on a "consistent" replica catalog, or a combination of the two?)
Work item: Some amount of intelligence for dealing with Objectivity object sharing of simulation results will be required to place this work into production.

Groups at Caltech and Florida will investigate the integration of monitoring and logging.


Production of Monte Carlo Simulated Data

A tool for the distributed production of CMS Monte Carlo simulated data, known as MOP, is currently under development from the Particle Physics Data Grid. MOP (which is based upon Globus, Condor-G, and DAGMan) is loosely integrated with a set of shell scripts, known as IMPALA, that are currently used by CMS for Monte Carlo production, as well as with the Grid Data Movement Package, or GDMP. Currently these tools do not employ virtual data concepts.

Over the course of Year 2, CMS-GriPhyN will augment MOP and IMPALA with virtual data tracking using the Virtual Data Catalog. This will involve decomposing the job submission and bookkeeping logic of IMPALA into parameters and transformations which are specific to CMS and logic which is more general to batch job execution, in the form of abstract Directed Acyclic Graphs (DAGs). In addition, the distributed job execution logic of MOP will be embedded into the VDC to facilitate virtual data materialization in a grid environment. In order to fine-tune these concepts and synchronise with CMS production efforts, two challenge problems are proposed:
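As an illustration of the abstract-DAG form, a DAGMan input file for a simple two-stage simulation chain might look like the following. The job names and submit-file names here are hypothetical, not the actual IMPALA/MOP decomposition:

```
# DAGMan input file: generation feeds simulation, whose output is
# then published; names and submit files are illustrative only.
JOB  Generate  cmkin.sub
JOB  Simulate  cmsim.sub
JOB  Publish   gdmp_publish.sub
PARENT Generate CHILD Simulate
PARENT Simulate CHILD Publish
RETRY Simulate 3
```

Expressing the chain this way is what lets DAGMan supply the generic batch-execution logic (ordering, retry on failure), while the CMS-specific content stays in the per-node submit files and their parameters.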

Challenge Problem 1: Produce 50,000 Monte Carlo fully simulated CMS events (including pileup) using the VDC on a USCMS Grid Testbed (see below). This should expose any technical and conceptual modifications required to use the VDC in realistic situations.

Challenge Problem 2: Fulfill one (or several) official Monte Carlo Production request(s) from CERN on a USCMS Grid Testbed. This will demonstrate the feasibility of using the VDC in "real world" CMS production.

The aim of this effort is two-fold: 1) provide an ever more autonomous environment for CMS Monte Carlo production by enabling automatic error recovery, rigorous bookkeeping, and transparent production at different CMS grid sites, and 2) provide valuable insight into virtual data concepts for future prototyping.


Remote Data Analysis

Virtual data as it applies to scientific analysis of CMS data has only recently been considered. It is currently unclear whether CMS physicists will employ a single monolithic analysis tool, or use a standardized set of analysis tools, or even use different sets of analysis tools. As a result, CMS research into different data analysis paradigms will continue to be monitored by CMS-GriPhyN throughout Year 2.

Given that CMS requires that physicists have the option of defining their own sets of data products (files, objects, etc.) for scientific analysis, it is important to begin the process of tracking virtual data products as they relate to end-user data analysis. In order to probe this, and to facilitate whatever analysis tool(s) may be used in the future by CMS, a remote data server (known as Clarens) is being developed by Steenberg to enable analysis of CMS data distributed over a wide-area network. Clarens is based on a client/server approach and provides a framework for remote data analysis. Communication between the client and server is conducted via XML-RPC. The server is implemented in C++ and linked to the standard CMS analysis C++ libraries. The client can be end-user specific, and implementations currently exist for several data analysis tools including: C++, PHP, the Java Analysis Studio, and a Python plug-in for SciGraphica. This allows the user full access to remote CMS data via a choice of analysis tools.
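The XML-RPC pattern Clarens relies on can be illustrated with Python's standard library. This sketch is not the Clarens API: the `select_events` method name and its payload are invented for the example, and the in-process server merely stands in for the remote C++ analysis server.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# A stand-in "analysis server" exposing one remote procedure.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda tag: [tag + "-evt1", tag + "-evt2"],
                         "select_events")   # hypothetical method name
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client marshals the call into an XML-RPC request over HTTP;
# the returned XML is demarshalled back into native Python values.
port = server.server_address[1]
client = ServerProxy("http://127.0.0.1:%d" % port)
events = client.select_events("higgs-candidates")
print(events)
server.shutdown()
```

Because the wire format is language-neutral XML over HTTP, the same server can serve the C++, PHP, Java Analysis Studio, and Python clients mentioned above without any client-specific protocol work.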

During Year 2, Steenberg plans to grid-enable Clarens by taking full advantage of the Virtual Data Toolkit, including: the Virtual Data Catalog for tracking CMS data, the Globus Security Infrastructure for authentication, and GridFTP for CMS data movement:

Work Item 4: Integrate Clarens with VDT 1.0.

Work Item 5: Integrate Clarens with the VDC.

As a joint endeavor with efforts in the virtual data tracking of CMS Monte Carlo production, the following data challenge problem is proposed:

Challenge Problem 3: Remotely analyze 50,000 events using Clarens integrated with the VDC as used in Challenge Problem 1.

Finally, investigations into more fine-grained data collections at the object level and their relation to a VDC will also be done during Year 2.


Infrastructure development and deployment activities: a USCMS Testbed

GriPhyN is currently building a USCMS Testbed, in cooperation with the Particle Physics Data Grid, at five initial sites: the California Institute of Technology, the University of Florida, Fermi National Accelerator Laboratory, the University of California at San Diego, and the University of Wisconsin at Madison. To ensure interoperability, the testbed will be based upon the GriPhyN Virtual Data Toolkit Version 1.0, which includes Condor 6.3.1 and Globus 2.0.


The initial goals of the testbed will be to produce a platform which facilitates grid-enabled CMS software development and which probes policy issues related to user ID management and Certificate Authorities. As the USCMS Testbed matures, integration with the US-ATLAS testbed is envisaged, followed by integration with the LHC Computing Grid.


Software Components

As a baseline, the testbed software will consist of the Virtual Data Toolkit and Objectivity/DB. The entire suite of CMS software is complex, with many external software package dependencies, and is dynamically linked to shared object libraries. Hence, the CMS software will initially be fully installed at each grid site, consisting of common versions of: the CERN-patched libraries for the gnu C++ compiler, Anaphe, CMKIN, CMSIM, OSCAR, CARF, COBRA, and ORCA. However, to facilitate sophisticated CMS software development, dynamic installation of versioned (or personalized) CMS software will be investigated and implemented via DAR (a "smart" tarball) as provided from Fermilab.

Storage and Handling of Data

Initially, storage and data handling will be performed in an ad hoc way. However, as the testbed matures, the Storage Resource Broker from SDSC will be investigated as a possibility for managing the storage of production data at each site.

Open issue: how would this interact with GDMP?

Data Replication and Virtual Data

The Grid Data Movement Package, via Globus, will maintain a Replica Catalog for movement of Monte Carlo production data.

Integrate MCAT into the environment (experimentally? has connections to BaBar work)

SQL-based RC and VDC and MCAT in same glue (Oracle, MySQL?)


Platform Differences

Currently, the CMS software environment supports the Red Hat 6.1, 6.2 and Solaris operating systems. To ensure compatibility with the CMS software environment as well as the VDT 1.0, the testbed will be entirely composed of machines running Red Hat 6.2.


Security and Resource Sharing Policies

Each CMS-GriPhyN and CMS-PPDG registered user will receive an account at each of the five grid sites. Initially, the testbed will use the Globus Certificate Authority for the distribution of user and gatekeeper certificates. However, as ESnet is expected to provide a Certificate Authority in early 2002, it is envisaged that the USCMS testbed will migrate to the ESnet Certificate Authority.

It is expected that CMS and the iVDGL will set resource sharing and accounting procedures. In the absence of such policies, resource sharing and accounting procedures will be studied and implemented on a case-by-case basis as needed.


Integration with other Grid Projects

[CIT,UF] Build basic services at 1-2 prototype Tier 2 centers. [Need more info: what are "basic services"? What are the prototype centers? How does this relate to testbed?]

Use by other GriPhyN projects?

Are you looking to establish a reference GriPhyN testbed platform?

MW: Yes, I am interested first in a common GriPhyN testbed for GriPhyN research and challenge demonstration, and second for experiment science usage. It's clear to me that the former needs to be built from the VDT. It's not clear to me to what extent GriPhyN can control or influence the latter.

Ruth: For CMS I believe FNAL should appear in the infrastructure development and deployment bullets.

iVDGL and JTB connections? Connecting the 2 SC grids.


CA issues? (a la JTB?)


Detailed Timetable

Review Changes / Integrate with Rick


Dependencies, Risks, Contingencies

CMS project commitment to use GriPhyN-enhanced MOP in real production.

Need expertise in CMS apps

Need information/docs on CMS data file formats and object structures; app man pages

Risk: can't convince CMS to let us insert code into its production framework

Risk of EUDG divergence

Stability of the VDC

Mitigation: intensive VDC test plan

Need decision by CMS on OO framework

Requires VDT 2.0

Need people who understand the CMS apps

porting, “harnessing”, and deployment over testbed; static builds and related issues

Need willingness of MOP team to integrate

(Koen)...MOP, PPDG, testbeds

(Koen)...could construct some dependencies here based on CMS goals above...

Explore fluid OS re-deployment within a cluster.

App porting

Uncertainty of Objy future; likelihood of transition to


Open Issues

Where reconstruction happens; what file type it's captured in.

Effects of pileup on data dependency tracking model

Old statement: does this have any more relevance?: [CIT,UF] Complete High Level Trigger milestones and perform studies with ORCA, the CMS object-oriented reconstruction and analysis software. [More details! How does this use VDT services? Need to make clear how this relates to GriPhyN.] MW: is this software simulation of the hardware HLT? If so, how does it relate to the MC production that’s part of MOP? Same, similar, or very different?

How will GriPhyN testbeds be structured?


can multiple experiments use the same testbed, or whill each have its own?


Can we create a single shared GriPhyN research testbed, separate from the 1 to 4 production testbeds?

Do we need Objy virtual data tracking mechanisms, or will tracking at the level of a database file suffice for now?


Appendix: Intra-Project Technology Transfer

From CMS to other experiments

From other experiments to CMS

Show how the CMS MOP GriPhyN app and/or architecture could be retrofitted to ATLAS and other experiments.

Xfer to/from PPDG, EUDG?

Integration of GriPhyN results into Globus?



Items to move back into the original plan




Network use allocation driven by policy? (Beyond Y2… Maybe network-aware but not network-QoS controlling)

Integration with tertiary storage? (Y3)

Place in the overall project:


advanced planning and resource management; fault tolerance; resource sharing based on policy


adjustments, tuning, more into production

Demonstrate the robustness gains of the use of DAGMan in a realistic CMS production setup by doing a challenge demo which includes at least 3 sites. In this demo, system crashes will be injected at certain times to show the capability of the system to auto-recover without human intervention.


show how research seeds are planted and the fruits grafted into the project in later years; e.g., develop FT ideas now, integrate in project Y3

demonstrate an effective mechanism for fault recovery

Plant hooks for policy now; enhance policy language and decision making through planner in later years.

Research topic: explore notions of SLA in a grid environment



Connect GriPhyN to Other Projects

In support of the above activities, and to give the results the maximum value to the CMS collaboration, we will need to coordinate GriPhyN efforts with those of other projects, in particular iVDGL, PPDG and the EU Datagrid (EUDG). These coordination points include the following:

The USCMS testbed will be shared between PPDG and GriPhyN.

GriPhyN will integrate PPDG technologies (such as GDMP) into its VDT

We will seek to achieve GriPhyN-PPDG architectural consensus

We will define and execute a relationship to iVDGL; integrate with an initial iGOC.

Use logging mechanism from joint PPDG-GriPhyN analysis development process

In addition, the massive computational resources of the NSF-sponsored Distributed Teragrid Facility will be of great potential value to the CMS collaboration, if plans can be executed which make CMS tools execute reliably, accurately, and easily on the DTF. These plans will need to address:


data transport issues


grid security (GSI and certification authority) issues


application portability and certification issues


Service Level Agreements regarding resource availability and application performance


explore new CMS computing paradigms to exploit DTF architecture


Education and Outreach

Create an always-available web-based demo


Create high-level and highly visual project summary material

Demonstrate and publish results

Make the CMS simulation and analysis prototypes available to non-collaborating institutions?

Specific CMS E&O activities include:


Prototype the next advancements in VDT

Job Language specifying constraints on resources needed and locations of execution and storage

CAS: basic planner capable of translating policy and job specifications into an execution plan

with policy language suitable for sharing CMS resources between different experiment groups