Extending the ATLAS PanDA Workload Management System for New Big Data Applications


Alexei Klimentov

Brookhaven National Laboratory

September 12, 2013, Varna


XXIV International Symposium on Nuclear Electronics and Computing






Extending the ATLAS PanDA Workload Management System for New Big Data Applications

Alexei Klimentov, BNL/PAS

Main topics

Introduction
  Large Hadron Collider at CERN
  ATLAS experiment
ATLAS Computing Model and Big Data Experiment Computing Challenges
PanDA: Workload Management System for Big Data

Enter a New Era in Fundamental Science

The Large Hadron Collider (LHC), one of the largest and truly global scientific projects ever built, is the most exciting turning point in particle physics.

Exploration of a new energy frontier
Proton-proton and heavy-ion collisions at E_CM up to 14 TeV
LHC ring: 27 km circumference
LHC experiments: ATLAS, CMS, ALICE, LHCb, TOTEM, LHCf, MOEDAL

Proton-Proton Collisions at the LHC

LHC delivered billions of collision events to the experiments from proton-proton and proton-lead collisions in the Run 1 period (2009-2013)
Collisions every 50 ns = 20 MHz crossing rate
1.6 x 10^11 protons per bunch at L_pk ~ 0.8 x 10^34 cm^-2 s^-1
~35 pp interactions per crossing (pile-up)
~10^9 pp interactions per second
~1600 charged particles produced in each collision
An enormous challenge for the detectors and for data collection, storage and analysis
Raw data rate from an LHC detector: ~1 PB/s
This translates to petabytes of data recorded world-wide (on the Grid)
The challenge of processing and analyzing the data and producing timely physics results was substantial, but in the end it was a great success
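A quick numerical check of the rates above (a back-of-the-envelope sketch using only the figures quoted on this slide):

    # Rough consistency check of the quoted LHC Run 1 rates.
    bunch_spacing_s = 50e-9                         # 50 ns between bunch crossings
    crossing_rate_hz = 1 / bunch_spacing_s          # = 2e7, i.e. the 20 MHz crossing rate
    pileup = 35                                     # pp interactions per crossing
    interactions_per_s = crossing_rate_hz * pileup  # ~7e8, i.e. of order 10^9 per second

    charged_per_collision = 1600
    charged_particles_per_s = interactions_per_s * charged_per_collision

    print(f"crossing rate: {crossing_rate_hz:.1e} Hz")
    print(f"pp interactions per second: {interactions_per_s:.1e}")
    print(f"charged particles per second: {charged_particles_per_s:.1e}")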


(Figure: candidate Higgs decay to four electrons recorded by ATLAS in 2012.)

Our Task

"We use experiments to inquire about what 'reality' (nature) does. The goal is to understand in the most general; that's usually also the simplest." - A. Eddington

Reality vs. theory: we intend to fill this gap

ATLAS Physics Goals
  Explore the high energy frontier of particle physics
  Search for new physics
    Higgs boson and its properties
    Physics beyond the Standard Model: SUSY, Dark Matter, extra dimensions, Dark Energy, etc.
  Precision measurements of Standard Model parameters


ATLAS Experiment at CERN

ATLAS (A Toroidal LHC ApparatuS) is one of the six particle detector experiments at the Large Hadron Collider (LHC) at CERN
One of the two multi-purpose detectors
The project involves more than 3000 scientists and engineers from 38 countries
ATLAS is 44 meters long and 25 meters in diameter and weighs about 7,000 tons. It is about half as big as the Notre Dame Cathedral in Paris and weighs the same as the Eiffel Tower or a hundred 747 jets

ATLAS Collaboration: 3000 scientists from 174 universities and labs in 38 countries, including more than 1200 students
(Collaboration photo: the 6 floors of Bldg. 40 at CERN)


ATLAS: a Big Data Experiment

Some numbers from ATLAS data

Rate of events streaming out from the High-Level Trigger farm: ~400 Hz
Each event has a size of the order of 1.5 MB
About 10^7 events in total per day
Roughly 170 "physics" days per year
Thus about 10^9 events per year, a few PB

"Prompt" processing
  Reconstruction time per event on a standard CPU: < 30 s (on a CERN batch node)
  Increases with pile-up (more combinatorics in the tracking)

Simulation of a few billion events, mostly done at computing centers outside CERN
  Simulation is very CPU intensive

~4 million lines of code (reconstruction and simulation)
~1000 software developers in ATLAS
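Putting those numbers together (a rough sketch based only on the figures quoted above):

    # Rough annual "prompt" data volume from the quoted ATLAS numbers.
    event_size_mb = 1.5        # MB per event
    events_per_day = 1e7       # quoted figure; 400 Hz x 86400 s ~ 3.5e7, reduced by the LHC duty cycle
    physics_days = 170         # "physics" days per year

    events_per_year = events_per_day * physics_days        # ~1.7e9, i.e. about 10^9
    volume_pb = events_per_year * event_size_mb / 1e9       # MB -> PB

    print(f"events per year ~ {events_per_year:.1e}")
    print(f"prompt data volume ~ {volume_pb:.1f} PB/year")  # ~2.6 PB, i.e. "a few PB"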


Reduce the data volume in stages

Higgs selection using the trigger:
  Level 1: not all information available, coarse granularity
  Level 2: reconstruct events; improved ability to reject events
  Level 3: high quality reconstruction algorithms, using information from all detectors
Output rate: 400 Hz

ATLAS High-Level Trigger (part)

Total of 15,000 cores in 1,500 machines, feeding a disk buffer and then the Grid
We really do throw away 99.9999% of LHC data before writing it to persistent storage
Reduce the data volume in stages: select ONLY 'interesting' events
  Initial data rate (50 ns bunch spacing): 40,000,000 events/s
  Selected and stored: 400 events/s



Two main types of physics analysis at the LHC

Searching for new particles
  Searches are statistically limited
  More data is the way to improve the search
  If you don't see anything new, set limits on what you have excluded

Making precision measurements
  Precision is often limited by systematic uncertainties
  Precision measurements of Standard Model parameters allow important tests of the consistency of the theory

Data Analysis Chain

Collect data from many channels on many sub-detectors (millions of channels)
Decide to read out everything or throw the event away (trigger)
Build the event (put the information together)
Store the data
Reconstruct the data
Analyze them:
  Start with the output of reconstruction
  Apply event selection based on reconstructed object quantities
  Estimate the efficiency of the selection
  Estimate the background after the selection
  Make plots
  Do the same with a simulation
  Correct the data for detector effects
  Make final plots
Compare data and theory
(The data volume falls from ~TB at the start of the chain to ~kB in the final plots)
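A minimal sketch of the analysis steps listed above (selection, efficiency from simulation, background estimate, corrected yield); the event variable, the cut value and the normalisation are purely illustrative, not an ATLAS selection:

    # Toy analysis chain: select events, estimate the selection efficiency and the
    # background from simulation, and derive a corrected signal yield.
    import random

    def make_events(n, is_signal):
        # Hypothetical events with a single discriminating variable "pt".
        return [{"pt": random.gauss(60 if is_signal else 30, 10)} for _ in range(n)]

    def selected(event, pt_cut=50.0):
        return event["pt"] > pt_cut

    data = make_events(10000, is_signal=False) + make_events(100, is_signal=True)
    mc_signal = make_events(10000, is_signal=True)        # signal simulation
    mc_background = make_events(10000, is_signal=False)   # background simulation

    n_selected = sum(selected(e) for e in data)
    efficiency = sum(selected(e) for e in mc_signal) / len(mc_signal)
    bkg_rate = sum(selected(e) for e in mc_background) / len(mc_background)
    expected_background = bkg_rate * len(data)            # crude normalisation to the data sample

    signal_yield = (n_selected - expected_background) / efficiency
    print(f"selected: {n_selected}, est. background: {expected_background:.0f}, "
          f"efficiency-corrected signal yield: {signal_yield:.0f} (100 injected)")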


The ATLAS Data Challenge

800,000,000 proton-proton interactions per second
0.0002 Higgs per second
~150,000,000 electronic channels

Starting from this event... we are looking for this "signature"
Selectivity: 1 in 10^13
  Like looking for 1 person in a thousand world populations
  Or for a needle in 20 million haystacks
  Or for a single drop of water from the Geneva Jet d'Eau over 2+ days


Data Volume and Data Storage

1 event of 1.5 MB x a rate of 400 Hz
Taking into account the LHC duty cycle: of the order of 3 PB per year per experiment
ATLAS total data storage: 130+ PB, distributed between O(100) computing centers
"New" physics is rare, and interesting events are like a single drop from the Jet d'Eau

Big Data in the arts and humanities

Letter of Benjamin Franklin to Lord Kames, April 11, 1767. Franklin warned the British official what would happen if the English kept trying to control the colonists by imposing taxes, like the Stamp Act: he warned that they would revolt.
The political, scientific and literary papers of Franklin comprise approx. 40 volumes containing approx. 100,000 documents.

George W. Bush Presidential Library:
  200 million e-mails
  4 million photographs

(Slide credit: A. Prescott)

Big Data Has Arrived at an Almost Unimaginable Scale
(chart from http://www.wired.com/magazine/2013/04/bigdata)

Business e-mails sent per year: 3,000 PB
Content uploaded to Facebook each year: 182 PB
Google search index: 98 PB
Health records: 30 PB
YouTube: 15 PB
LHC annual data: 15 PB
(Also on the chart: climate data, Library of Congress, Nasdaq, US census)

ATLAS annual data volume: 30 PB
ATLAS managed data volume: 130 PB



ATLAS Computing Challenges

A lot of data in a highly distributed environment
Petabytes of data to be treated and analyzed
  The ATLAS detector generates about 1 PB of raw data per second
  Most of it is filtered out in real time by the trigger system
  Interesting events are recorded for further reconstruction and analysis
  As of 2013 ATLAS manages ~130 PB of data, distributed world-wide to O(100) computing centers and analyzed by O(1000) physicists
  Expected rate of data influx into the ATLAS Grid: ~40 PB of data per year in 2015

Very large international collaboration
  174 institutes and universities from 38 countries
  Thousands of physicists analyze the data

ATLAS uses the grid computing paradigm to organize distributed resources
A few years ago ATLAS started a Cloud Computing R&D project to explore virtualization and clouds
  Experience with different cloud platforms: commercial (Amazon, Google), academic, national
Now we are evaluating how high-performance computers and supercomputers can be used for data processing and analysis

ATLAS Computing Model

The LHC experiments rely on distributed computing resources
Worldwide LHC Computing Grid (WLCG): a global solution, based on Grid technologies/middleware
Tiered structure: Tier-0 (CERN), 11 Tier-1s, 140 Tier-2s
Capacity: 350,000 CPU cores, 200 PB of disk space, 200 PB of tape space
In ATLAS, sites are grouped into clouds for organizational reasons

Possible communications:
  Optical private network: T0-T1, T1-T1
  National networks: T1-T2
Restricted communications:
  Inter-cloud T1-T2
  Inter-cloud T2-T2

(Slide credit: K. De)

ATLAS Data Volume

(Chart: ATLAS data volume on the Grid sites, in TB, including multiple replicas, split by format into raw, simulated and derived data; 131.5 PB in total at the end of Run 1.)

Data distribution patterns are discipline dependent.
(ATLAS RAW data volume: ~3 PB/year; ATLAS data volume on the Grid sites: 131.5 PB)

How does 3 PB become 130+ PB?

1. Duplicate the raw data (not such a bad idea)
2. Add a similar volume of simulated data (essential)
3. Make a rich set of derived data products (some of them larger than the raw data)
4. Re-create the derived data products whenever the software has been significantly improved (several times a year) and keep the old versions for quite a while
5. Place up to 15§ copies of the derived data around the world so that when you "send jobs to the data" you can send them almost anywhere
6. Do the math! (a rough sketch follows below)

§ now far fewer copies due to demand-driven temporary replication, but much more reliance on wide-area networks
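A rough sketch of that math; the per-step multipliers below are illustrative assumptions chosen only to show how ~3 PB/year of raw data can plausibly grow to the 130+ PB quoted above, not official ATLAS bookkeeping:

    # Illustrative accumulation of data volume on the Grid.
    raw_pb = 3 * 3                      # ~3 PB/year of RAW over roughly three years of Run 1 (assumption)
    raw_total = raw_pb * 2              # step 1: raw data duplicated

    simulated = raw_pb                  # step 2: a similar volume of simulated data (assumption: ~1x raw)
    derived_one_pass = 1.0 * (raw_pb + simulated)   # step 3: derived products comparable to their inputs (assumption)
    versions_kept = 2                   # step 4: old versions kept for a while (assumption)
    replicas = 3                        # step 5: several world-wide copies of derived data (assumption)

    derived_total = derived_one_pass * versions_kept * replicas
    total_pb = raw_total + simulated + derived_total
    print(f"~{total_pb:.0f} PB managed on the Grid (cf. the quoted 130+ PB)")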


Processing the Experiment Big Data

The simplest solution for processing LHC data is using data affinity for the jobs
  Data is staged to the site where the compute resources are located, and accessed by the analysis code from local, site-resident storage
However:
  In a distributed computing environment we do not have enough disk space to host all our data at every Grid site
  Thus we distribute (pre-place) our data across our sites
  The popularity of data sets is difficult to predict in advance
  Thus the computing capacity at a site might not match the demand for certain data sets

Different approaches are being implemented (see the sketch below):
  Dynamic and/or on-demand data replication
    Dynamic: if certain data is popular over the Grid (i.e. processed and/or analyzed often), make additional copies at other Grid sites (up to 15 replicas)
    On-demand: a user can request a local or additional data copy
  Remote access
    Popular data can be accessed remotely
Both approaches have the same underlying scenario: they put the WAN between the data and the executing analysis code
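A minimal sketch of a popularity-driven replication decision of the "dynamic" kind described above; the thresholds, the access counting and the site ranking are illustrative assumptions, not the actual ATLAS replication policy:

    # Toy dynamic-replication heuristic: replicate a dataset when it is accessed
    # often and the number of existing copies is still below the allowed maximum.
    MAX_REPLICAS = 15           # upper bound quoted on this slide
    POPULARITY_THRESHOLD = 50   # accesses per week that count as "popular" (assumption)

    def should_replicate(accesses_last_week, current_replicas):
        return accesses_last_week > POPULARITY_THRESHOLD and current_replicas < MAX_REPLICAS

    def pick_target_site(sites):
        # Prefer sites with free disk and many idle CPU slots (illustrative ranking).
        candidates = [s for s in sites if s["free_disk_tb"] > 10]
        return max(candidates, key=lambda s: s["idle_cpu_slots"])["name"]

    sites = [
        {"name": "SITE_A", "free_disk_tb": 120, "idle_cpu_slots": 800},
        {"name": "SITE_B", "free_disk_tb": 40, "idle_cpu_slots": 2500},
    ]
    if should_replicate(accesses_last_week=180, current_replicas=2):
        print("replicate the popular dataset to", pick_target_site(sites))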


PanDA: Production and Data Analysis System

ATLAS computational resources are managed by the PanDA Workload Management System (WMS)
The PanDA (Production and Data Analysis) project was started in fall 2005 by the BNL and UTA groups
  An automated yet flexible workload management system which can optimally make distributed resources accessible to all users
  Adopted as the ATLAS-wide WMS in 2008 (first LHC data in 2009) for all computing applications; adopted by AMS in 2012, in pre-production by CMS
Through PanDA, physicists see a single computing facility that is used to run all data processing for the experiment, even though the data centers are physically scattered all over the world
PanDA is flexible
  It insulates physicists from hardware, middleware and the complexities of the underlying systems
  It adapts to evolving hardware and network configurations
Major groups of PanDA jobs
  Central computing tasks are automatically scheduled and executed
  Physics group production tasks, carried out by groups of physicists of varying size, are also processed by PanDA
  User analysis tasks
PanDA now successfully manages O(10^2) sites, O(10^5) cores, O(10^8) jobs per year, O(10^3) users


PanDA Philosophy

PanDA Workload Management System design goals:
  Deliver transparency of data processing in a distributed computing environment
  Achieve a high level of automation to reduce operational effort
  Flexibility in adapting to evolving hardware, computing technologies and network configurations
  Scalable to the experiment requirements
  Support diverse and changing middleware
  Insulate users from hardware, middleware, and all other complexities of the underlying system
  A unified system for central Monte Carlo production and user data analysis
  Support custom workflows of individual physicists
  Incremental and adaptive software development

PanDA: the ATLAS Workload Management System

(Architecture diagram: production managers define tasks in the task/job repository (Production DB); a job submitter (bamboo) feeds production jobs to the PanDA server over HTTPS, and end-user analysis jobs are submitted to the same server over HTTPS. A pilot scheduler (autopyfactory) submits pilots via condor-g to worker nodes on the EGEE/EGI and OSG grids; an ARC interface (aCT) serves NDGF. Pilots pull jobs from the PanDA server over HTTPS. The system interacts with the Data Management System (DQ2), Local Replica Catalogs (LFC) and a logging system.)
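The key pattern in the diagram is the pilot "pull" model: pilots land on worker nodes and ask the PanDA server for work over HTTPS rather than having jobs pushed to them. A minimal sketch of that loop follows; the URL, endpoint names and payload fields are hypothetical, not the real PanDA server API:

    # Sketch of a pilot pulling work from a PanDA-like server over HTTPS.
    import subprocess
    import time

    import requests  # assumes the 'requests' package is available

    SERVER = "https://panda.example.org/server"   # hypothetical server URL

    def run_pilot(site_name):
        while True:
            # Ask the server for a job matched to this site (hypothetical endpoint).
            job = requests.get(f"{SERVER}/getJob", params={"site": site_name}, timeout=60).json()
            if not job.get("jobId"):
                time.sleep(60)                    # nothing to do, poll again later
                continue
            # Run the payload command delivered with the job description.
            result = subprocess.run(job["command"], shell=True)
            status = "finished" if result.returncode == 0 else "failed"
            # Report the outcome back to the server (hypothetical endpoint).
            requests.post(f"{SERVER}/updateJob", data={"jobId": job["jobId"], "status": status}, timeout=60)

    # run_pilot("EXAMPLE_SITE")  # would loop forever, pulling and running jobs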

ATLAS Distributed Computing

(Chart: number of concurrently running PanDA jobs, daily average, Aug 2012 - Aug 2013, reaching ~150k jobs; MC simulation and analysis are the labelled categories.)

Includes central production and data (re)processing, user and group analysis on the WLCG Grid
Running on ~100,000 cores worldwide, consuming at peak 0.2 petaflops
Available resources are fully used/stressed
ATLAS Distributed Computing

(Chart: number of completed PanDA jobs by job type, daily average, Aug 2012 - Aug 2013; around 1M jobs/day, with a maximum of 1.7M jobs/day.)

ATLAS Distributed Computing

(Chart: data transfer volume in TB, weekly average, Aug 2012 - Aug 2013, peaking around 6 PB per week.)

CERN to BNL data transfer time: on average 3.7 h to export data from CERN


PanDA's Success

PanDA was able to cope with the increasing LHC luminosity and ATLAS data taking rate
It adapted to the evolution of the ATLAS computing model
Two leading experiments in HEP and astro-particle physics (CMS and AMS) have chosen PanDA as the workload management system for data processing and analysis
ALICE is interested in evaluating PanDA for Grid MC production and Leadership Computing Facilities
PanDA was chosen as a core component of the Common Analysis Framework by the CERN-IT/ATLAS/CMS project
PanDA was cited in the document "Fact sheet: Big Data across the Federal Government", prepared by the Executive Office of the President of the United States, as an example of successful technology already in place at the time of the "Big Data Research and Development Initiative" announcement



Evolving PanDA for Advanced Scientific Computing

A proposal titled "Next Generation Workload Management and Analysis System for Big Data" (BigPanDA) was submitted to ASCR DoE in April 2012
The DoE ASCR and HEP funded project started in September 2012
Generalization of PanDA as a meta application, providing location transparency of processing and data management, for HEP and other data-intensive sciences and a wider exascale community
Other efforts
  PanDA: US ATLAS funded project
  Networking: Advanced Network Services

There are three dimensions to the evolution of PanDA
  Making PanDA available beyond ATLAS and High Energy Physics
  Extending beyond the Grid (Leadership Computing Facilities, clouds, university clusters)
  Integration of the network as a resource in workload management


"Big PanDA" work plan

Factorizing the code
  Factorizing the core components of PanDA to enable adoption by a wide range of exascale scientific communities
Extending the scope
  Evolving PanDA to support extreme scale computing clouds and Leadership Computing Facilities
Leveraging intelligent networks
  Integrating network services and real-time data access into the PanDA workflow

3-year plan
  Year 1: setting up the collaboration, defining algorithms and metrics
  Year 2: prototyping and implementation
  Year 3: production and operations

"Big PanDA". Factorizing the Core

Evolving the PanDA pilot
  Until recently the pilot has been ATLAS-specific, with lots of code only relevant for ATLAS
  To meet the needs of the Common Analysis Framework project, the pilot is being refactored
  Experiments as plug-ins
    Introducing new experiment-specific classes, enabling better organization of the code
    E.g. containing methods for how a job should be set up, metadata and site information handling etc. that are unique to each experiment (see the sketch below)
    CMS experiment classes have been implemented
  Changes are being introduced gradually, to avoid affecting current production

PanDA instance at the Amazon Elastic Compute Cloud (EC2)
  Port of the PanDA database back to MySQL
  VO independent
  It will be used as a test-bed for non-LHC experiments
    A PanDA instance with all functionalities is installed and running at EC2; the database migration from Oracle to MySQL is finished; the instance is VO independent
    LSST MC production is the first use case for the new instance

Next step will be refactoring the PanDA monitoring package
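A minimal sketch of the "experiments as plug-ins" idea: a generic base class holds the behaviour common to the pilot, and experiment-specific subclasses override only what differs. The class and method names here are illustrative, not the actual refactored pilot code:

    # Illustrative plug-in structure for an experiment-agnostic pilot.
    class Experiment:
        """Interface the pilot relies on; experiments override the specifics."""
        def setup_job(self, job):
            raise NotImplementedError

        def handle_metadata(self, job):
            return {"jobId": job["jobId"]}      # generic default

    class ATLASExperiment(Experiment):
        def setup_job(self, job):
            # ATLAS-specific environment setup (release, conditions data, ...)
            return f"source setup_atlas.sh {job['release']} && {job['command']}"

    class CMSExperiment(Experiment):
        def setup_job(self, job):
            # CMS-specific environment setup
            return f"source setup_cms.sh {job['release']} && {job['command']}"

    def get_experiment(name):
        plugins = {"ATLAS": ATLASExperiment, "CMS": CMSExperiment}
        return plugins[name]()                  # pick the plug-in by name

    cmd = get_experiment("CMS").setup_job({"jobId": 1, "release": "6_2_0", "command": "cmsRun cfg.py"})
    print(cmd)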



"Big PanDA". Extending the scope

Why we are looking for opportunistic resources:
  A more demanding LHC environment after 2014
    Higher energy, more complex collisions
  We plan to record, process and analyze more data
    Physics motivated: Higgs coupling measurements, physics beyond the Standard Model (SUSY, Dark Matter, extra dimensions, Dark Energy, etc.)
  The demands on computing resources to accommodate the Run 2 physics needs increase
  HEP now risks compromising physics because of a lack of computing resources$
    This has not been true for ~20 years

Several avenues to explore in the next years:
  Optimizing/changing our workflows, both in analysis and on the Grid
  High-Performance Computing
  Leadership Computing Facilities
  Research and commercial clouds
  ATLAS@home

A common characteristic of opportunistic resources: we have to be agile in how we use them
  Quick onto them (software, data and workloads) when they become available
  Quick off them when they are about to disappear
  Robust against their disappearing under us with no notice
  Use them until they disappear
    Don't allow holes with unused cycles; fill them with fine-grained workloads

$ I. Bird (WLCG leader), presentation at the HPC and super-computing workshop for Future Science Applications (BNL, June 2013)

(Chart: spikes in demand for computational resources, at the scale of ~1M jobs. Demand can significantly exceed the available ATLAS Grid resources; the lack of resources slows down the pace of discovery.)




"Big PanDA". Extending the scope

Google Compute Engine (GCE) preview project
  Google allocated additional resources to ATLAS for free
    ~5M CPU hours, 4000 cores for about 2 months (the original preview allocation was 1k cores)
  Resources are organized as an HTCondor-based PanDA queue
    CentOS 6 based custom-built images, with SL5 compatibility libraries to run ATLAS software
    Condor head node and proxies are at BNL
    Output is exported to the BNL SE
    Work on capturing the GCE setup in Puppet
  Transparent inclusion of cloud resources into the ATLAS Grid
    The idea was to test long-term stability while running a cloud cluster similar in size to a Tier-2 site in ATLAS
    Intended for CPU-intensive Monte Carlo simulation workloads
    Planned as a production-type run, delivered to ATLAS as a resource and not as an R&D platform
  We also tested a high-performance PROOF-based analysis cluster


Running PanDA on Google Compute Engine

We ran for about 8 weeks (2 weeks were planned for scaling up)
Very stable running on the cloud side; GCE was rock solid
Most problems that we had were on the ATLAS side
We ran computationally intensive jobs
  Physics event generators, fast detector simulation, full detector simulation
Completed 458,000 jobs; generated and processed about 214 M events

(Chart: failed and finished jobs over time)
  Reached a throughput of 15K jobs per day
  Most job failures occurred during the start-up and scale-up phase

"Big PanDA" for Leadership Computing Facilities

Expanding PanDA from the Grid to Leadership Class Facilities (LCF) will require significant changes in our system
Each LCF is unique
  Unique architecture and hardware
  Specialized OS, "weak" worker nodes, limited memory per worker node
  Code cross-compilation is typically required
  Unique job submission systems
  Unique security environment
Pilot submission to a worker node is typically not feasible
  Pilot/agent per supercomputer or per queue model
Tests on BlueGene at BNL and ANL; Geant4 port to BG/P
PanDA project at the Oak Ridge National Laboratory LCF (Titan)

Leadership Computing Facilities. Titan

(Slide from Ken Read)

"Big PanDA" project on the ORNL LCF

Get experience with all relevant aspects of the platform and workload
  Job submission mechanism
  Job output handling
  Local storage system details
  Outside transfer details
  Security environment
  Adjusting the monitoring model
Develop an appropriate pilot/agent model for Titan (see the sketch below)
Collaboration between ANL, BNL, ORNL, SLAC, UTA and UTK
A cross-disciplinary project: HEP, Nuclear Physics, High-Performance Computing
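A minimal sketch of a per-queue agent of the kind discussed above: instead of pilots landing directly on worker nodes, a single agent on the facility's front end pulls a job description, wraps it in a batch script and hands it to the local scheduler. The PanDA-side call is a placeholder, and the PBS-style batch system and qsub submission are assumptions about the site setup:

    # Sketch of a pilot/agent running on an LCF login node.
    import subprocess
    import textwrap

    def fetch_job_description():
        # Placeholder for a call to the PanDA server; in reality this would go over HTTPS.
        return {"jobId": 42, "nodes": 16, "walltime": "02:00:00", "command": "./run_payload.sh"}

    def submit_to_batch(job):
        script = textwrap.dedent(f"""\
            #!/bin/bash
            #PBS -l nodes={job['nodes']}
            #PBS -l walltime={job['walltime']}
            cd $PBS_O_WORKDIR
            {job['command']}
        """)
        script_name = f"job_{job['jobId']}.pbs"
        with open(script_name, "w") as f:
            f.write(script)
        try:
            # Hand the script to the local batch system; the agent never runs payloads itself.
            subprocess.run(["qsub", script_name], check=True)
        except FileNotFoundError:
            print("qsub not available on this host")

    submit_to_batch(fetch_job_description())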


ATLAS PanDA Coming to the Oak Ridge Leadership Computing Facility
"Big PanDA"/ORNL Common Project

"Big PanDA" at ORNL
(Diagrams: PanDA and multicore jobs scenario; PanDA on the Oak Ridge Leadership Computing Facility)

PanDA deployment at OLCF was discussed and agreed, including the AIMS project component
Cyber-security issues were discussed, both for the near and the longer term
Discussions with OLCF Operations
ROOT-based analysis is tested
Payloads for the Titan CE (followed by discussion in ATLAS)

(Slide credit: D. Oleynik)

Adding Network Awareness to PanDA

The LHC computing model for a decade was based on the MONARC model
  It assumes poor networking
    Connections are seen as insufficient or unreliable
    Data needs to be pre-placed; data comes from specific places
  Hierarchy of functionality and capability
    Grid sites are organized in "clouds" in ATLAS
    Sites have specific functions
  Nothing can happen using remote resources at the time a job is running

The canonical HEP strategy: "jobs go to data"
  Data are partitioned between sites
  Some sites are more important (get more important data) than others
  Planned replicas
    A dataset (a collection of files produced under the same conditions and the same software) is the unit of replication
  Data and replica catalogs are needed to broker jobs
    An analysis job that requires data from several sites triggers data replication and consolidation at one site, or splitting into several jobs running on all the sites
    A data analysis job must wait for all its data to be present at the site
    The situation can easily degrade into a complex n-to-m matching problem
  There was no need to consider the network as a resource in the WMS in this static data distribution scenario

New networking capabilities and initiatives have appeared in the last 2 years (like LHCONE)
  Extensive standardized network performance monitoring (perfSONAR)
  Traffic engineering capabilities
    Rerouting of high impact flows onto separate infrastructure
  Intelligent networking
    Virtual Network On Demand
    Dynamic circuits
...and dramatic changes in computing models
  The strict hierarchy of connections becomes more of a mesh
  Data access over the wide area
  No division in functionality between sites

We would like to benefit from the new networking capabilities and to integrate networking services with PanDA. We are starting to consider the network as a resource, in a similar way as CPU and data storage.

Network as a resource

Optimal site selection for running PanDA jobs
  Take network capability into account in job assignment and task brokerage (see the sketch below)

Assigned -> activated jobs workflow
  The number of assigned jobs depends on the number of running jobs
    Can we use the network status to adjust the rate up/down?
  Jobs are reassigned if a transfer times out (fixed duration)
    Can knowledge of the network status help reduce the timeout?

Task brokerage currently considers
  Free disk space at the Tier-1
  Availability of the input dataset (a set of files)
  The amount of CPU resources, i.e. the number of running jobs in the cloud (a static information system is not used)
  Downtime at the Tier-1
  Already queued tasks with equal or higher priorities
    A high priority task can jump over low priority tasks

Can knowledge of the network help?
  Can we consider the availability of the network as a resource, like we consider storage and CPU resources?
  What kind of information is useful?
  Can we consider similar (highlighted) factors for networking?
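A minimal sketch of a brokerage score that folds a network metric in next to the existing factors listed above; the weights, the cost model and the site attributes are illustrative assumptions, not the actual PanDA brokerage algorithm:

    # Toy brokerage: rank candidate sites for a task using CPU, disk, data
    # availability and an averaged network throughput to the data source.
    def site_score(site, input_size_tb):
        if site["in_downtime"] or site["free_disk_tb"] < input_size_tb:
            return float("-inf")                  # the site cannot take the task at all
        transfer_hours = 0.0
        if not site["has_input_dataset"]:
            # Estimate the cost of moving the missing input over the network.
            transfer_hours = input_size_tb * 8e6 / (site["throughput_mbps"] * 3600)
        queue_penalty = site["queued_higher_priority_jobs"] / max(site["running_jobs"], 1)
        return site["running_jobs"] - 50.0 * queue_penalty - 10.0 * transfer_hours

    sites = [
        {"name": "T1_A", "running_jobs": 5000, "queued_higher_priority_jobs": 2000,
         "free_disk_tb": 300, "has_input_dataset": False, "throughput_mbps": 4000, "in_downtime": False},
        {"name": "T1_B", "running_jobs": 2000, "queued_higher_priority_jobs": 100,
         "free_disk_tb": 500, "has_input_dataset": True, "throughput_mbps": 1000, "in_downtime": False},
    ]
    best = max(sites, key=lambda s: site_score(s, input_size_tb=20))
    print("broker the task to", best["name"])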




Intelligent Network Services and PanDA

Example: quickly re-run a prior workflow (a bug was found in a reconstruction algorithm)
  Site A has enough job slots but no input data
  The input data are distributed between sites B, C and D, but those sites have a backlog of jobs
  Jobs may be sent to site A and, at the same time, virtual circuits connecting sites B, C and D to site A will be built; VNOD will make sure that such virtual circuits have sufficient bandwidth reservation
  Or the data can be accessed remotely (if the connectivity between the sites is reliable and this information is available from perfSONAR)
  In the canonical approach the data would have to be replicated to site A first

HEP computing is often described as an example of a parallel workflow. That is correct at the scale of a worker node (WN): a WN does not communicate with other WNs during job execution. But the large scale global workflow is highly interconnected, because each job typically does not produce an end result in itself; often the data produced by a job serve as input to the next job in the workflow. PanDA manages this workflow extremely well (1M jobs/day in ATLAS). The new intelligent services will allow the needed data transport channels to be created dynamically, on demand.

Intelligent Network Services and PanDA

In BigPanDA we will use information on how much bandwidth is available and can be reserved before data movement is initiated

In the task definition the user will specify the data volume to be transferred and the deadline by which the task should be completed. The calculations of (i) how much bandwidth to reserve, (ii) when to reserve it, and (iii) along what path to reserve it will be carried out by Virtual Network On Demand (VNOD).
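The core of point (i) is a simple relation between volume and deadline. A sketch of that calculation, with illustrative numbers (the safety margin and the example figures are assumptions; the actual reservation logic lives in VNOD, not in PanDA):

    # Minimum bandwidth needed to move a task's input before its deadline.
    def required_gbps(volume_tb, hours_to_deadline, safety_margin=1.2):
        bits = volume_tb * 8e12                 # TB -> bits
        seconds = hours_to_deadline * 3600
        return safety_margin * bits / seconds / 1e9

    # Example: 50 TB of input data and a 12 hour deadline -> ~11 Gbps.
    print(f"reserve at least {required_gbps(50, 12):.1f} Gbps")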


Measurement sources

(Diagram of the network data flow into the brokerage: measurement sources (sonar tests, perfSONAR, XRootD) feed the Site Status Board with raw and historical data; the Grid Information System holds averaged network data for ATLAS sites; SchedConfigDB holds averaged network data for PanDA sites; the brokerage / site selection module in PanDA consumes these data.)

(Slide credit: A. Petrosyan)

Conclusions

The ATLAS experiment's Distributed Computing and software performance were a great success in LHC Run 1
  The challenge of processing and analyzing the data and producing timely physics results was substantial, but in the end it resulted in a great success
ASCR gave us a great opportunity to evolve PanDA beyond ATLAS and HEP and to start the BigPanDA project
  The project team has been set up
  The work on extending PanDA to LCFs has started
  Large scale PanDA deployments on commercial clouds are already producing valuable results
There is strong interest in the project from several experiments (disciplines) and scientific centers in having a joint project

Acknowledgements

Many thanks to J. Boyd, K. De, B. Kersevan, T. Maeno, R. Mount, P. Nilsson, D. Oleynik, S. Panitkin, A. Petrosyan, A. Prescott, K. Read and T. Wenaus for slides and materials used in this talk.