An Overview of Online Systems at the LHC


Invited Talk at NSS-MIC 2012
Anaheim, CA, 31 October 2012

Beat Jost, CERN


Acknowledgments and Disclaimer

- I would like to thank David Francis, Frans Meijers and Pierre vande Vyvre for lots of material on their experiments.
- I would also like to thank Clara Gaspar and Niko Neufeld for many discussions.
- There are surely errors and misunderstandings in this presentation, which are entirely due to my shortcomings.


Outline


- Data Acquisition Systems
  - Front-end Readout
  - Event Building
- Run Control
  - Tools and Architecture
- Something New
  - Deferred Triggering
- Upgrade Plans


Role of the Online System


- In today's HEP experiments, millions of sensors are distributed over hundreds of square metres and actuated tens of millions of times per second.
- The data of all these sensors have to be collected and assembled at one point (computer, disk, tape), after a rate reduction through event selection. This is the Data Acquisition (DAQ) system.
- This process has to be controlled and monitored (by the operator). This is the Run Control system.
- Together they form the Online system. And, by the way, it is a prerequisite for any physics analysis.


Setting the Scene: DAQ Parameters


A generic LHC DAQ system


[Figure: a generic LHC DAQ chain - Sensors feed Front-End Electronics on/near the detector; off the detector follow Aggregation/(Zero Suppression), Zero Suppression/Data Formatting/Data Buffering, the Event Building Network, the HLT Farm and Permanent Storage.]

Today's data rates are too big to let all the data flow through a single component.


- The DAQ system can be viewed as a gigantic funnel collecting the data from the sensors into a single point (CPU, storage) after selecting interesting events.
- In general, the responses of the sensors on the detector are transferred (digitized or analogue) over point-to-point links to a first level of concentrators.
  - Often there is already a concentrator on the detector electronics, e.g. the readout chips of silicon detectors.
  - The further upstream in the system, the more the technologies at this level differ, also within the experiments.
  - In LHCb, for example, the data of the vertex detector are transmitted in analogue form to the aggregation layer and digitized there.
- The subsequent level of aggregation is usually also used to buffer the data and format them for the event builder and the High-Level Trigger.
- Somewhere along the way, zero suppression is performed (a sketch of such a step follows below).
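To make the aggregation step more concrete, here is a minimal, hypothetical C++ sketch of zero suppression plus fragment formatting as it might run in a readout board: only channels above a pedestal-subtracted threshold are kept and packed into a fragment labelled with event and source identifiers. The structure names, fields and threshold handling are illustrative assumptions, not the data format of any particular experiment.

    // Illustrative zero suppression + fragment formatting (not any
    // experiment's actual data format).
    #include <cstdint>
    #include <vector>

    struct Hit {
        uint16_t channel;  // channel index within this front-end link
        uint16_t adc;      // pedestal-subtracted ADC value
    };

    struct Fragment {
        uint32_t eventId;       // trigger/event identifier (from the timing system)
        uint32_t sourceId;      // which readout board produced the fragment
        std::vector<Hit> hits;  // zero-suppressed payload
    };

    // Keep only channels whose pedestal-subtracted signal exceeds the threshold.
    Fragment zeroSuppress(uint32_t eventId, uint32_t sourceId,
                          const std::vector<uint16_t>& rawAdc,
                          const std::vector<uint16_t>& pedestal,
                          int32_t threshold)
    {
        Fragment frag{eventId, sourceId, {}};
        for (size_t ch = 0; ch < rawAdc.size(); ++ch) {
            const int32_t signal = int32_t(rawAdc[ch]) - int32_t(pedestal[ch]);
            if (signal > threshold)
                frag.hits.push_back({static_cast<uint16_t>(ch),
                                     static_cast<uint16_t>(signal)});
        }
        return frag;  // buffered locally until it is shipped to the event builder
    }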


Implementations: Front-End Readout


Readout Links of LHC Experiments

DDL (ALICE)
  Optical, 200 MB/s, ~500 links
  Full duplex: controls the FE (commands, pedestals, calibration data)
  Receiver card interfaces to a PC
  Flow control: yes

SLINK (ATLAS)
  Optical, 160 MB/s, ~1600 links
  Receiver card interfaces to a PC
  Flow control: yes

SLINK 64 (CMS)
  LVDS, 400 MB/s (max. 15 m), ~500 links
  Peak throughput 400 MB/s to absorb fluctuations; typical usage 2 kB @ 100 kHz = 200 MB/s
  Receiver card interfaces to a commercial NIC (Myrinet)
  Flow control: yes

GLink (GOL) (LHCb)
  Optical, 200 MB/s, ~4800 links, plus ~5300 analogue links before zero suppression
  Receiver card interfaces to a custom-built Ethernet NIC (4 x 1 Gb/s over copper)
  Flow control: no (trigger throttle)

Implementations: Event Building


- Event building is the process of collecting all the data fragments belonging to one trigger in one point, usually the memory of a processor in a farm (see the sketch after this list).
- It is typically implemented with a switched network:
  - ATLAS, ALICE and LHCb use Ethernet
  - CMS uses two steps, first Myrinet, then Ethernet
- Of course the implementations in the different experiments differ in details from the 'generic' one, sometimes quite drastically:
  - ATLAS implements an additional level of trigger, thus reducing the overall requirements on the network capacity
  - CMS does event building in two steps, with Myrinet (fibre) and 1 GbE (copper) links
  - ALICE implements the HLT in parallel to the event builder, which allows it to be bypassed completely
  - LHCb and ALICE use only one level of aggregation downstream of the front-end electronics
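As a companion to the description above, the following is a minimal C++ sketch of the core book-keeping of an event builder: fragments arriving from many readout sources are grouped by their event (trigger) number, and an event is handed to an HLT worker once every expected source has contributed. The class name, the fixed number of sources and the absence of time-outs are simplifying assumptions; a real system also has to deal with lost fragments, multiple network streams and back-pressure.

    // Minimal fragment-assembly book-keeping for event building (sketch).
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    struct Fragment {
        uint32_t eventId;              // trigger/event number
        uint32_t sourceId;             // readout board that sent it
        std::vector<uint8_t> payload;  // raw data
    };

    class EventBuilder {
    public:
        explicit EventBuilder(size_t nSources) : nSources_(nSources) {}

        // Returns the complete event when the last expected fragment arrives,
        // otherwise an empty vector (event still being assembled).
        std::vector<Fragment> addFragment(Fragment frag) {
            auto& pending = pending_[frag.eventId];
            pending.push_back(std::move(frag));
            if (pending.size() == nSources_) {
                std::vector<Fragment> complete = std::move(pending);
                pending_.erase(complete.front().eventId);
                return complete;  // ready for the High-Level Trigger
            }
            return {};
        }

    private:
        size_t nSources_;  // number of readout sources expected per event
        std::unordered_map<uint32_t, std::vector<Fragment>> pending_;
    };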


Event Building in the LHC Experiments

ALICE: Ethernet
  Single-stage event building
  - TCP/IP-based push protocol
  - orchestrated by an Event Destination Manager

ATLAS: Ethernet
  Staged event building via a two-level trigger system
  - partial readout driven by regions of interest (Level-2 trigger)
  - full readout at the reduced rate of accepted events
  - TCP/IP-based pull protocol

CMS: Myrinet/Ethernet
  Two-stage full readout of all triggered events
  - first stage Myrinet (flow control in hardware)
  - second stage Ethernet, TCP/IP, driven by an Event Manager

LHCb: Ethernet
  Single-stage event building directly from the front-end readout units to the HLT farm nodes
  - driven by the Timing & Fast Control (TFC) system
  - pure push protocol (raw IP) with credit-based congestion control; relies on deep buffers in the switches (a credit-assignment sketch follows after this table)
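For the credit-based push protocol in the last row, the following hypothetical C++ sketch shows the central idea only: farm nodes advertise credits (MEP requests) for the number of packets they can still absorb, and a central assigner designates as destination only a node that still holds a credit, so data are never pushed towards a full node. The class, the round-robin choice and the data types are illustrative assumptions, not the actual LHCb TFC implementation.

    // Credit-based destination assignment (illustrative sketch).
    #include <cstdint>
    #include <optional>
    #include <queue>
    #include <unordered_map>

    class DestinationAssigner {
    public:
        // A farm node reports that it can absorb 'n' more multi-event packets.
        void addCredits(uint32_t nodeId, uint32_t n) {
            if (credits_[nodeId] == 0 && n > 0) ready_.push(nodeId);
            credits_[nodeId] += n;
        }

        // Pick the destination of the next packet, consuming one credit,
        // or nothing if no node currently has buffer space (throttle).
        std::optional<uint32_t> nextDestination() {
            while (!ready_.empty()) {
                uint32_t node = ready_.front();
                if (credits_[node] > 0) {
                    ready_.pop();
                    if (--credits_[node] > 0)
                        ready_.push(node);   // simple round-robin over nodes with credit
                    return node;
                }
                ready_.pop();
            }
            return std::nullopt;             // back-pressure: hold the trigger
        }

    private:
        std::unordered_map<uint32_t, uint32_t> credits_;
        std::queue<uint32_t> ready_;
    };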

[Figure: LHCb readout architecture - detector front-end electronics feed Readout Boards, which push event data through the Readout Network (switches) to the CPUs of the HLT farm and a MON farm; the L0 trigger and LHC clock enter via the TFC system, which also distributes MEP requests; the whole system is supervised by the Experiment Control System (ECS).]

[Figure: ALICE DAQ architecture - detector front-end (FERO) data flow over ~360 + 120 DDLs into D-RORC cards hosted in LDCs, sub-events are assembled over the Event Building Network (20 GB/s) into GDCs, and complete events are shipped over the Storage Network (8 GB/s) to transient storage (TDSM/TDS) and archived on tape in the Meyrin computing centre; the HLT farm is attached in parallel via dedicated DDLs and H-RORCs; scale: ~430 D-RORC, 175 detector LDCs, 75 GDCs, 30 TDSM, 18 DSS and 60 DA/DQM nodes.]

Controls Software: Run Control


- The main task of the run control is to guarantee that all components of the readout system are configured in a coherent manner according to the desired DAQ activity:
  - tens of thousands of electronics components and software processes
  - hundreds of thousands of readout sensors
- Topologically it is implemented as a deep, hierarchical, tree-like architecture with the operator at the top (see the sketch after this list).
- In general the configuration process has to be sequenced so that the different components can collaborate properly, hence Finite State Machines (FSMs).
- Inter-process(or) communication (IPC) is an important ingredient to trigger transitions in the FSMs.
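The following is a minimal, hypothetical C++ sketch of this hierarchical idea: every control unit is a small finite state machine, a command issued at the top is propagated to its children, and the parent only declares itself READY once all children have reached that state. The real tools (SMI++, CLIPS, RCMS) add asynchronous messaging over IPC, time-outs and error recovery; the states and command names here are assumptions.

    // Hierarchical run-control tree of small FSMs (illustrative sketch).
    #include <memory>
    #include <string>
    #include <vector>

    enum class State { NotReady, Configuring, Ready, Running, Error };

    class ControlUnit {
    public:
        explicit ControlUnit(std::string name) : name_(std::move(name)) {}

        void addChild(std::shared_ptr<ControlUnit> child) {
            children_.push_back(std::move(child));
        }

        // Propagate CONFIGURE down the tree; in a real system this call would
        // travel over IPC (DIM, CORBA, SOAP, ...). The parent becomes READY
        // only when every child reports READY.
        State configure() {
            state_ = State::Configuring;
            for (auto& child : children_) {
                if (child->configure() != State::Ready) {
                    state_ = State::Error;   // a real system would retry/recover
                    return state_;
                }
            }
            state_ = State::Ready;
            return state_;
        }

        State state() const { return state_; }

    private:
        std::string name_;
        State state_ = State::NotReady;
        std::vector<std::shared_ptr<ControlUnit>> children_;
    };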


Control Tools and Architecture

                        ALICE    ATLAS    CMS                            LHCb
  IPC tool              DIM      CORBA    XDAQ (HTTP/SOAP)               DIM/PVSS
  FSM tool              SMI++    CLIPS    RCMS/XDAQ                      SMI++
  Job/process control   DATE     CORBA    XDAQ                           PVSS/FMC
  GUI tools             Tcl/Tk   Java     JavaScript/Swing/web browser   PVSS

[Figure: example - the LHCb controls architecture: a hierarchical tree of Control Units and Device Units with the ECS at the top; branches for the detector control (DCS per sub-detector: LV, temperature, gas devices), the DAQ (per sub-detector FEE and readout devices), TFC, HLT, LHC and infrastructure; commands flow down the tree, status and alarms flow up to the Run Control and Detector Control.]

GUI Example: LHCb Run Control



- Main operation panel for the shift crew
- Each sub-system can (in principle) also be driven independently


Error Recovery and Automation


- No system is perfect; there are always things that go wrong, e.g. the de-synchronisation of some components.
- Two approaches to recovery:
  - Forward chaining: "We're in the mess. How do we get out of it?"
    - ALICE and LHCb: SMI++ automatically acts to recover
    - ATLAS: DAQ Assistant (CLIPS) gives operator assistance
    - CMS: DAQ Doctor (Perl) gives operator assistance
  - Backward chaining: "We're in the mess. How did we get there?"
    - ATLAS: Diagnostic and Verification System (DVS)
- Whatever one does, one needs lots of diagnostics to know what is going on.


Snippet of forward chaining (Big Brother in LHCb):

    object: BigBrother
      state: READY
        when ( LHCb_LHC_Mode in_state PHYSICS )  do PREPARE_PHYSICS
        when ( LHCb_LHC_Mode in_state BEAMLOST ) do PREPARE_BEAMLOST
        ...
      action: PREPARE_PHYSICS
        do Goto_PHYSICS LHCb_HV
        wait ( LHCb_HV )
        move_to READY
      action: PREPARE_BEAMLOST
        do STOP_TRIGGER LHCb_Autopilot
        wait ( LHCb_Autopilot )
        if ( VELOMotion in_state {CLOSED,CLOSING} ) then
          do Open VELOMotion
        endif
        do Goto_DUMP LHCb_HV
        wait ( LHCb_HV, VELOMotion )
        move_to READY
      ...


Summary


- All LHC experiments are taking data with great success.
- All implementations work nicely.
- The systems are coping with the extreme running conditions, sometimes way beyond the original requirements:
  - ATLAS and CMS have to cope with up to 40 interactions per bunch crossing (the requirement was ~20-25); LHCb sees ~1.8 interactions instead of the 0.4 foreseen
  - significantly bigger event sizes
  - significantly longer HLT processing
- The availability of the DAQ systems is above 99%; usually it is not the DAQ hardware that fails.


- The automatic recovery procedures implemented keep the overall data-taking efficiency (recorded luminosity / delivered luminosity) typically above 95%, mainly through faster reaction and the avoidance of operator mistakes.


Something New: Deferred Trigger


- The inter-fill gaps (from beam dump to the next stable beams) of the LHC can be significant: many hours, sometimes days.
- During this time the HLT farm is basically idle.
- The idea is to use this idle CPU time to execute the HLT algorithms on data that were written to local disk while the LHC was delivering collisions (a sketch of the farm-node data flow follows below).
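A hedged C++ sketch of that idea on a single farm node, loosely modelled on the data flow in the figure further below: multi-event packets (MEPs) that the HLT cannot absorb immediately are appended to a local overflow file, and during an inter-fill gap the same file is replayed into the HLT input buffer. Buffer depth, file layout and function names are illustrative assumptions, not LHCb's actual implementation.

    // Deferred-trigger overflow handling on one farm node (sketch).
    #include <cstdint>
    #include <deque>
    #include <fstream>
    #include <string>
    #include <vector>

    using MEP = std::vector<uint8_t>;          // one multi-event packet

    constexpr size_t kMaxBufferedMeps = 1000;  // assumed HLT input-buffer depth

    std::deque<MEP> hltInputBuffer;            // consumed by the HLT processes

    // Called for every MEP received from the event-building network.
    void onMepReceived(const MEP& mep, const std::string& overflowFile)
    {
        if (hltInputBuffer.size() < kMaxBufferedMeps) {
            hltInputBuffer.push_back(mep);     // process online, as usual
        } else {
            // Buffer full: defer the work by appending the MEP to local disk.
            std::ofstream out(overflowFile, std::ios::binary | std::ios::app);
            const uint64_t size = mep.size();
            out.write(reinterpret_cast<const char*>(&size), sizeof size);
            out.write(reinterpret_cast<const char*>(mep.data()), size);
        }
    }

    // Called during inter-fill gaps: replay deferred MEPs into the HLT buffer.
    void replayOverflow(const std::string& overflowFile)
    {
        std::ifstream in(overflowFile, std::ios::binary);
        uint64_t size = 0;
        while (in.read(reinterpret_cast<char*>(&size), sizeof size)) {
            MEP mep(size);
            in.read(reinterpret_cast<char*>(mep.data()), size);
            hltInputBuffer.push_back(std::move(mep));   // HLT runs as usual
        }
    }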



[Figure: deferred-trigger data flow in a farm node - MEPrx receives MEPs from the readout network; if the MEP buffer is full, OvrWr writes them to the Overflow area on the local disk, otherwise they are handed to the Moore (HLT) processes; a Reader replays the overflow files later, and the Results are written out via DiskWr.]



Currently deferring ~25% of the L0 Trigger Rate


~250 kHz triggers


Data stored on 1024 nodes equipped with 1TB local
disks


Great care has to be taken


to keep an overview of which nodes hold files of which runs.


Events are not duplicated


During deferred HLT processing files are deleted from disk as
soon as they are opened by the reader


Starting and stopping is automated according to the
state of the LHC


No stress for the shift crew

Deferred Trigger


Experience
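One common way to implement "delete as soon as it is opened by the reader" on Linux is to unlink the file right after opening it: the open file descriptor keeps the data readable, the name disappears so no other reader can pick the file up, and the disk space is reclaimed automatically when the descriptor is closed. The sketch below assumes this POSIX idiom; whether the LHCb reader does exactly this is an assumption here.

    // Open a deferred-HLT file and immediately remove its directory entry.
    #include <cstdio>
    #include <fcntl.h>
    #include <unistd.h>

    int openAndClaim(const char* path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            perror("open");
            return -1;
        }
        if (unlink(path) != 0) {   // remove the name so nobody else can read it
            perror("unlink");
            close(fd);
            return -1;
        }
        return fd;  // caller reads the events from fd, then closes it;
                    // the kernel frees the blocks once the descriptor is closed
    }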


[Plot: number of deferred files on disk versus time - the file count rises from the start of data taking until the beam dump, falls again while the deferred HLT runs after the dump, and the cycle repeats with the next fill; one period shows an interruption due to online troubles.]


Upgrade Plans


- All four LHC experiments have upgrade plans for the nearer or farther future.
- Timescale 2015:
  - CMS: integration of a new point-to-point link (~10 Gbps) to the new back-end electronics (in µTCA) of new trigger/detector systems; replacement of Myrinet with 10 GbE (TCP/IP) for data aggregation into PCs, and InfiniBand (56 Gbps) or 40 GbE for event building.
  - ATLAS: merging of the L2 and HLT networks and CPUs; each CPU in the farm will run both triggers.
- Timescale 2019:
  - ALICE: increase of the acceptable trigger rate from 1 to 50 kHz for heavy-ion operation; new front-end readout link; continuous readout of the TPC.
  - LHCb: elimination of the hardware trigger (readout rate 40 MHz); front-end electronics read out for every bunch crossing; new front-end electronics; zero suppression on/near the detector; network/farm capacity increased by a factor of 40 (3.2 TB/s, ~4000 CPUs); network technology InfiniBand or 10/40/100 Gb Ethernet; no architectural changes.
- Timescale 2022 and beyond:
  - CMS & ATLAS: implementation of a hardware track trigger running at 40 MHz, and surely many other changes…
