Scientific Computing at Fermilab


Our “Mission”


Provide computing, software tools and expertise to all parts of the Fermilab scientific program, including theory simulations (e.g. Lattice QCD and Cosmology) and accelerator modeling


Work closely with each scientific program, as collaborators (where a scientist/staff member from SCD is involved) and as valued customers.


Create a coherent Scientific Computing program from the many parts and many funding sources, encouraging sharing of facilities, common approaches and re-use of software wherever possible


Work closely with CCD as part of an overall coherent
program

Scientific Computing - Fermilab S&T Review, Sept 5 2012

A Few points


We are ~160 strong, made up almost entirely of technically trained staff

26 scientists in the Division

As the lab changes its mission, scientific computing is having to adapt to this new and more challenging landscape.

Scientific Computing is a very “matrixed” organization. I will not try to cover all we do but will pick and choose things that are on my mind right now…

Scientific Discovery - the reason we are here

The computing capability needed for scientific discovery is bounded only by human imagination

Next Generation of Scientific Breakthroughs

Require major new advances in computing technology: energy-efficient hardware, algorithms, applications and systems software

Data “Explosion”

Big Data is here: observational, sensor networks, and simulation

Computing/data throughput challenges

About to Experience a Paradigm Shift in Computing

For the last decade, GRID and computing resources have been very stable

However….

End of Moore’s Law is looming

New computing technologies on the near horizon: phase change memories, stacked dies

Exponential growth in parallelism in HPC

IBM Blue Gene leading the charge

Heterogeneous systems delivering higher performance/watt (Titan)

Power is a constraint

Programmability…

Computing Landscape will change…

HEP is going to have to adapt to this changing world

While the future for the next few years is clear, we don’t really know where we will be in the next decade or 20 years

Ultimately market forces will determine the future

We need to turn this into a positive force for both High Energy Physics and High Performance Computing.

Think Back on your Computing Careers…

And Today….

Ask Yourself… Has Computing Gotten any Easier in the last 30 years?

Lattice b_c machine…

Starting to Make the Bridge to the Future

New Funding Initiatives

COMPASS SciDAC Project (3 year) $2.2M/year

US Lattice QCD Project (5 year) ~$5M/year

Geant4 Parallelization -- joint with ASCR (2 year) $1M/year

CMS on HPC machines (1 year) $150k

PDACS - Galaxy Simulation Portal - joint with Argonne (1 year) $250k

Science Framework for DES (1 year) $150k

Tevatron Data Preservation (2 year) $350k/year

Partnering with NSF through OSG

Will be aggressive in upcoming data and knowledge discovery opportunities at DOE

Geant4


Workshop held between HEP and ASCR

Discussed how to transform GEANT4 to run efficiently on modern and future multi-core computers and hybrids

Workshop chairs were Robert Lucas (USC) and RR.

Funded for $1M/year for 2 years

Here: algorithmic development to be able to utilize multi-core architectures, and porting of G4 sections to GPUs (a toy sketch of event-level parallelism follows below)
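The bullet above is about restructuring Geant4 (a C++ toolkit) for multi-core and GPU architectures. As a language-neutral illustration of the underlying idea, here is a minimal toy sketch of event-level parallelism in Python; every function name and number is invented for illustration and has nothing to do with the real Geant4 code base.

```python
# Toy illustration of event-level parallelism (NOT the Geant4 API):
# each simulated event is independent, so events can be farmed out
# to worker processes and the per-event results merged at the end.
import random
from multiprocessing import Pool

def simulate_event(seed):
    """Stand-in for tracking one event through a detector geometry."""
    rng = random.Random(seed)
    n_steps = rng.randint(100, 1000)           # pretend number of tracking steps
    energy_deposit = sum(rng.expovariate(1.0) for _ in range(n_steps))
    return {"seed": seed, "steps": n_steps, "edep": energy_deposit}

if __name__ == "__main__":
    seeds = range(1000)                         # one independent seed per event
    with Pool(processes=8) as pool:             # 8 workers standing in for cores
        results = pool.map(simulate_event, seeds)
    total_edep = sum(r["edep"] for r in results)
    print(f"simulated {len(results)} events, total deposit {total_edep:.1f}")
```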


CMS


CMS would like to maintain current trigger thresholds for the 2015 run to allow full Higgs characterization

Thus the nominal 350 Hz output would increase to ~1 kHz (see the rough estimate sketched below)

Computing budgets are expected to remain constant, not grow

Need to take advantage of leadership class computing facilities

Need to incorporate more parallelism into software

Algorithms need to be more efficient (faster)
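A rough back-of-the-envelope reading of the numbers above. Only the 350 Hz and ~1 kHz rates come from the slide; the event size and live time used below are illustrative placeholders, not CMS figures.

```python
# Back-of-the-envelope scaling of offline computing load with trigger rate.
# Only the 350 Hz -> ~1 kHz rates come from the slide; the other inputs
# (event size, live seconds per year) are illustrative placeholders.
rate_2012_hz = 350.0
rate_2015_hz = 1000.0

rate_scale = rate_2015_hz / rate_2012_hz
print(f"event rate grows by a factor of {rate_scale:.1f}")

# With a flat computing budget, roughly the same factor must come from
# software: more parallelism and faster algorithms, not more hardware.
placeholder_event_size_mb = 1.0          # illustrative RAW event size
placeholder_live_seconds = 5.0e6         # illustrative seconds of data taking
raw_volume_pb = (rate_2015_hz * placeholder_event_size_mb
                 * placeholder_live_seconds) / 1.0e9
print(f"illustrative RAW volume per year: {raw_volume_pb:.1f} PB")
```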

PDACS


Portal for Data Analysis Services for Cosmological Simulation.

Joint project with Argonne, Fermilab, and NERSC

Salman Habib (Argonne) is the PI

Cosmological data/analysis service at scale - a workflow management system (a minimal workflow sketch follows below)

Portal based on that used for computational biology

Idea is to facilitate the analysis/simulation effort for those not familiar with advanced computing techniques
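A minimal sketch of the workflow-chaining pattern a portal like PDACS exposes to non-expert users. PDACS itself is built on the Galaxy portal; none of the class, step, or parameter names below are taken from it, they are hypothetical.

```python
# Minimal sketch of chaining simulation and analysis steps into a workflow,
# the pattern a portal like PDACS exposes to non-expert users.
# All step names and parameters here are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Step:
    name: str
    run: Callable[[Dict], Dict]      # takes inputs, returns outputs

@dataclass
class Workflow:
    steps: List[Step] = field(default_factory=list)

    def add(self, step: Step) -> "Workflow":
        self.steps.append(step)
        return self

    def execute(self, params: Dict) -> Dict:
        data = dict(params)
        for step in self.steps:      # each step sees the accumulated outputs
            data.update(step.run(data))
        return data

# Hypothetical cosmology steps: run an N-body box, then build a halo catalog.
def run_nbody(d):  return {"snapshot": f"box_{d['box_mpc']}Mpc.dat"}
def find_halos(d): return {"halos": f"halos_from_{d['snapshot']}"}

wf = Workflow().add(Step("nbody", run_nbody)).add(Step("halo_finder", find_halos))
print(wf.execute({"box_mpc": 512}))
```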

Simulations connect fundamentals with observables: dark energy and matter, cosmic gas, galaxies

Data Archival Facility



Would like to offer archive facilities for the broader community

Will require work on front ends to simplify for non-HEP users

Had discussions with IceCube

One of seven 10k-slot tape robots at FNAL

We Can’t forget our day job….



CMS Tier 1 at Fermilab


The CMS Tier-1 facility at Fermilab and the experienced team who operate it enable CMS to reprocess data quickly and to distribute the data reliably to the user community around the world.

We lead US and overall CMS in software and computing

Fermilab also operates:

LHC Physics Center (LPC)

Remote Operations Center

U.S. CMS Analysis Facility

Intensity Frontier Program (Diverse)

Intensity Frontier Strategy


Common approaches/solutions are essential to support this broad range of experiments with limited SCD staff. Examples include artdaq, art, SAM IF, LArSoft, jobsub, …

SCD has established liaisons between ourselves and the experiments to ensure communication and to understand needs/requirements

Completing the process of establishing MOUs between SCD and each experiment to clarify our roles/responsibilities

Intensity Frontier Strategy - 2

A shared analysis facility where we can quickly and flexibly allocate computing to experiments

Continue to work to “grid enable” the simulation and processing software

Good success with MINOS, MINERvA and Mu2e

All experiments use shared storage services for data and local disk, so we can allocate resources when needed

Perception that the intensity frontier will not be computing intensive is wrong

artdaq Introduction

artdaq is a toolkit for creating data acquisition systems to be run on commodity servers

It is integrated with the art event reconstruction and analysis framework for event filtering and data compression.

It provides data transfer, event building, process management, system and process state behavior, control messaging, message logging, infrastructure for DAQ process and art module configuration, and writing of data to disk in ROOT format. (A toy event-building loop is sketched below.)

The goal is to provide the common, reusable components of a DAQ system and allow experimenters to focus on the experiment-specific parts of the system: the software that reads out and configures the experiment-specific front-end hardware, the analysis modules that run inside of art, and the online data quality monitoring modules.

As part of our work in building the DAQ software systems for upcoming experiments, such as Mu2e and DarkSide-50, we will be adding more features.
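To make the event-building role concrete, here is a toy sketch in Python of fragments from several front-end sources being assembled into complete events by sequence number. It is conceptual only; the real artdaq toolkit is C++ and its actual API is not shown or implied here.

```python
# Toy event builder illustrating what "event building" means in a DAQ:
# fragments arriving from independent front-end readouts are grouped by
# event (sequence) number and released once every source has reported.
# This is a conceptual sketch, not the real artdaq API (which is C++).
from collections import defaultdict

class ToyEventBuilder:
    def __init__(self, n_sources):
        self.n_sources = n_sources
        self.pending = defaultdict(dict)   # seq -> {source_id: payload}

    def add_fragment(self, seq, source_id, payload):
        self.pending[seq][source_id] = payload
        if len(self.pending[seq]) == self.n_sources:
            return self.pending.pop(seq)   # complete event, hand downstream
        return None

builder = ToyEventBuilder(n_sources=4)
for seq in range(3):                       # three events ...
    for src in range(4):                   # ... each read out by four sources
        event = builder.add_fragment(seq, src, payload=f"data_{seq}_{src}")
        if event is not None:
            print(f"event {seq} complete with {len(event)} fragments")
```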

artdaq Introduction

We are currently working with the DarkSide-50 collaboration to develop and deploy their DAQ system using artdaq.

The DS-50 DAQ reads out ~15 commercial VME modules into four front-end computers using commercial PCIe cards and transfers the data to five event builder and analysis computers over a QDR InfiniBand network.

The maximum data rate through the system will be 500 MB/s, and we have achieved a data compression factor of five. (The implied rates are worked out in the sketch below.)

The DAQ system is being commissioned at LNGS, and it is being used to collect data and monitor the performance of the detector as it is being commissioned. (plots of phototube response?)

artdaq will be used for the Mu2e DAQ, and we are working toward a demonstration system which reads data from the candidate commercial PCIe cards, builds complete events, runs sample analysis modules, and writes the data to disk for later analysis.

The Mu2e system will have 48 readout links from the detector into commercial PCIe cards, and the data rate into the PCIe cards will be ~30 GB/s. Event fragments will be sent to 48 commodity servers over a high-speed network, and the online filtering algorithms will be run in the commodity servers.

We will be developing the experiment-specific artdaq components as part of creating the demonstration system, and this system will be used to validate the performance of the baseline design in preparation for the CD review early next year.
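The arithmetic implied by the quoted DS-50 and Mu2e numbers; the post-compression and per-link rates below are derived estimates, not project specifications.

```python
# Arithmetic implied by the DS-50 and Mu2e numbers quoted above.
# Derived values (rate after compression, per-link rate) follow from the
# quoted ones; they are estimates, not official project figures.

# DarkSide-50: 500 MB/s maximum through the system, compression factor 5.
ds50_max_mb_s = 500.0
ds50_compression = 5.0
print(f"DS-50 rate after compression: ~{ds50_max_mb_s / ds50_compression:.0f} MB/s")

# Mu2e: ~30 GB/s into the PCIe cards, spread over 48 readout links and
# fanned out to 48 commodity event-builder/filter servers.
mu2e_total_gb_s = 30.0
mu2e_links = 48
per_link_gb_s = mu2e_total_gb_s / mu2e_links
print(f"Mu2e average per link/server: ~{per_link_gb_s:.2f} GB/s "
      f"(~{per_link_gb_s * 8:.0f} Gb/s)")
```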

Cosmic Frontier


Continue to curate data for SDSS

Support data and processing for Auger, CDMS and COUPP

Will maintain an archive copy of the DES data and provide modest analysis facilities for Fermilab DES scientists.

Data management is an NCSA (NSF) responsibility

Helping NCSA by “wrappering” science codes needed for 2nd light when NCSA completes its framework.

DES uses Open Science Grid resources opportunistically and will make heavy use of NERSC

Writing Science Framework for DES - hope to extend to LSST

DarkSide-50 writing their DAQ system using artdaq

Tevatron (Data) Knowledge Preservation

Maintaining full analysis capability for the next few years, though building software to get away from custom systems.

Successful FWP funded; hired two domain-knowledgeable scientists to lead the preservation effort on each experiment (and 5 FTE of SCD effort)

Knowledge Preservation

Need to plan and execute the following…

Preserving analysis notes, electronic logs, etc.

Document how to do analysis well

Document sample analyses as cross checks

Understand job submission, DB, and data handling issues

Investigate/pursue virtualization

Try to keep the CDF/D0 strategies in sync and leverage common resources/solutions

Synergia at Fermilab

Synergia is an accelerator simulation package combining collective effects and nonlinear optics

Developed at Fermilab, partially funded by SciDAC

Synergia utilizes state-of-the-art physics and computer science

Physics: state of the art in collective effects and optics simultaneously

Computer science: scales from desktops to supercomputers

Efficient running on 100k+ cores

Best practices: test suite, unit tests

Synergia is being used to model multiple Fermilab machines:

Main Injector for Project-X and Recycler for ANU

Booster instabilities and injection losses

Mu2e: resonant extraction from the Debuncher

Weak scaling to 131,072 cores (see the efficiency sketch below)
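Weak scaling, as in the 131,072-core result above, grows the problem size with the core count, so the ideal is constant time to solution. A small helper for the standard efficiency definition is sketched below; the timings in the example are made up for illustration and are not Synergia measurements.

```python
# Weak-scaling efficiency: problem size grows with core count, so the
# ideal is constant runtime; efficiency = T(reference) / T(N cores).
# The timings in the example are made up for illustration only.
def weak_scaling_efficiency(ref_time_s, times_s_by_cores):
    """Return {cores: efficiency} relative to the reference runtime."""
    return {cores: ref_time_s / t for cores, t in times_s_by_cores.items()}

example = weak_scaling_efficiency(
    ref_time_s=100.0,
    times_s_by_cores={1024: 100.0, 16384: 108.0, 131072: 125.0},  # hypothetical
)
for cores, eff in example.items():
    print(f"{cores:>7} cores: {eff:5.1%} weak-scaling efficiency")
```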

Synergia collaboration with CERN for LHC injector upgrades

CERN has asked us to join in an informal collaboration to model space charge in the CERN injector accelerators

Currently engaged in a benchmarking exercise

Current status reviewed at the Space Charge 2013 workshop at CERN

Most detailed benchmark of PIC space charge codes to date, using both data and analytic models

Breaking new ground in accelerator simulation

Synergia has emerged as the leader in fidelity and performance

PTC-Orbit has been shown to have problems reproducing individual particle tunes

Plot: individual particle tune vs. initial position - PTC-Orbit displays noise and growth over time; Synergia results are smooth and stable

Plot: phase space showing trapping (benchmark vs. Synergia)
SCD has more work than human resources

Insufficiently staffed at the moment

Improving event generators - especially for the Intensity Frontier

Modeling of neutrino beamline/target

Simulation effort - all IF experiments want more resources, both technical and analysis

Muon Collider simulation - both accelerator and detector

R&D in software-defined networks

Closing Remarks


SCD has transitioned to fully support the Intensity Frontier

We also have a number of projects underway to prepare for the paradigm shift in computing

We are short-handed and are having to make choices

Back Up

At the Moment….

Riding the Wave of Progress…

MKIDs (Microwave Kinetic Inductance Devices)


Pixelated micro-size resonator array.

Superconducting sensors with meV energy gap. Not only a single photon detector:

Theoretically allow for energy resolution (E/ΔE) of about 100 in the visible and near infrared spectrum.

Best candidate to provide medium resolution spectroscopy of >1 billion galaxies, QSOs and other objects from LSST data if the energy resolution is improved to 80 or better (currently at ~16). Note that scanning that number of galaxies is outside the reach of current fiber-based spectrometers.

An MKID array of 100,000 pixels will be enough to obtain medium resolution spectroscopic information for all LSST galaxies up to magnitude 24.5 with an error.

High bandwidth: allows for filtering of atmospheric fluctuations at ~100 Hz or faster.




Multi 10K-pixel instrument and science with MKIDs


PPD and SCD teamed up to build an instrument with a number of pixels between 10K and 100K.

External collaborators: UCSB (Ben Mazin, Giga-Z), ANL, U. Michigan.

Potential collaboration: strong coupling with the next CMB instrument proposed by John Carlstrom (U. Chicago) and Clarence Chang (ANL) that also requires the same DAQ readout electronics.

Steve Heathcote, director of the SOAR telescope, Cerro Tololo, has expressed interest in hosting the MKID R&D instrument in 2016. (Ref. Steve Heathcote letter to Juan Estrada (FNAL).)

SOAR telescope operations in late 2016: 10 nights x 10 hours/night; would give a limiting magnitude of ~25.

Potential science (under consideration): photometric redshift calibration for DES, clusters of galaxies, supernova host galaxy redshifts, strong lensing.

SCD/ESE will design the DAQ for up to a 100K pixel instrument (the implied data-reduction factors are sketched after this list):

1000 to 2000 MKIDs per RF feedline, 50 feedlines.

Input bandwidth: 400 GB/s

Triggerless DAQ.

Data reduction: ~200 MB/s to storage.

Digital signal processing for FPGAs, GPUs, processors, etc.

Status:

Adiabatic dilution refrigerator (ADR) functioning at SiDet.

Test of low noise electronics underway.

MKID testing to start this summer.

Electronic system design underway.
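The channel counts and reduction factors implied by the DAQ numbers above; the derived per-channel figures are rough estimates, not design specifications.

```python
# Factors implied by the MKID DAQ numbers above; the channel counts and
# per-channel bandwidth are derived from the quoted ranges, not specs.
feedlines = 50
mkids_per_feedline = (1000, 2000)           # quoted range
channels = tuple(feedlines * n for n in mkids_per_feedline)
print(f"total channels: {channels[0]:,} to {channels[1]:,}")

input_gb_s = 400.0                          # quoted input bandwidth
output_mb_s = 200.0                         # quoted rate to storage
reduction = (input_gb_s * 1000.0) / output_mb_s
print(f"real-time data reduction: ~{reduction:,.0f}x (FPGA/GPU DSP)")

per_channel_mb_s = (input_gb_s * 1000.0) / channels[1]
print(f"input per channel (100k pixels): ~{per_channel_mb_s:.0f} MB/s")
```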



Open Science Grid (OSG)

The Open Science Grid (OSG) advances science through open distributed computing. The OSG is a multi-disciplinary partnership to federate local, regional, community and national cyberinfrastructures to meet the needs of research and academic communities at all scales.

Total of 95 sites; ½ million jobs a day, 1 million CPU hours/day, 1 million files transferred/day. (The implied averages are sketched below.)

It is cost effective, it promotes collaboration, it is working!

The US contribution and partnership with the LHC Computing Grid is provided through OSG for CMS and ATLAS
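The per-job and per-site averages implied by the quoted OSG throughput figures; this is just a quick arithmetic check, not official OSG statistics.

```python
# Rough averages implied by the quoted OSG throughput numbers.
sites = 95
jobs_per_day = 0.5e6
cpu_hours_per_day = 1.0e6
files_per_day = 1.0e6

print(f"average job length: ~{cpu_hours_per_day / jobs_per_day:.0f} CPU hours")
print(f"average per site: ~{jobs_per_day / sites:,.0f} jobs/day, "
      f"~{cpu_hours_per_day / sites:,.0f} CPU hours/day, "
      f"~{files_per_day / sites:,.0f} files/day")
```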