HEPiX Spring 2008 @ CERN - Summary Report

HEPSysMan @ RAL, 19-20 June 2008


Martin Bly


Overview


Venue/Format/Themes


CPU Benchmarking Working Group


Storage and File Systems Working Group


Scientific Linux


Selected topics


Spring HEPiX 2008


Venue: CERN - 5th to 9th May


Council Chamber


Very comfortable, good wireless network access


Format


Sessions based on themes with a morning
‘plenary’ by an invited speaker


½ to 1 day per theme


Agenda:
http://indico.cern.ch/conferenceTimeTable.py?confId=27391


Themes


LHC and Data Readiness


LHC overview


Trigger farms of LHC experiments


LCG overview and status


CCRC


Site Reports


Storage technology


CPU technology


Data centre management, availability, and reliability


Problem resolution, problem tracking, alarm systems


System management


Networking infrastructure and computer security


Applications and Operating systems


HEPiX ‘bazaar and think-tank’


General Virtualisation


Grid stuff (Monitoring etc.)


Miscellaneous


Benchmarking Working Group


WLCG MoUs based on SI2K


SPEC2000 deprecated in favour of SPEC2006


SPEC2000 is no longer available or maintained


Remit


Find a benchmark accepted by HEP and others, as many sites serve different communities


Review of existing benchmarking practices (CERN, FZK, INFN, …)


Last 6 months: setup of a benchmarking test-bed with dedicated hardware at CERN and other sites


Covers a wide range of processors with a typical HEP configuration (2 GB/core)


Run SPEC benchmarks with agreed flags


SL4/64bit OS with benchmarks at 32-bit/gcc 3.4


Look at SL5, 64-bit, gcc4


Run a variety of ‘standard candles’ from the LHC experiments’ code to compare with SPEC


Provides scaling and recalibration of computing requirements (a toy conversion sketch follows at the end of this list)


Looking at understanding the statistical treatment of experiment results


Recently uncovered different methodologies for random numbers!


No major scaling problem with either SI2K or SI2K6


Should allow a smooth transition
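
As a rough illustration only (not the working group's actual procedure), the sketch below aggregates per-benchmark results with a SPEC-style geometric mean and derives a site-wide conversion factor between an old and a new suite from machines measured with both; all scores are invented:

    from math import prod

    def suite_score(ratios):
        """SPEC-style aggregation: geometric mean of the per-benchmark ratios for one machine."""
        return prod(ratios) ** (1.0 / len(ratios))

    # Invented machines measured with both an old (SI2K-like) and a new (SI2K6-like) suite.
    machines = {
        "node_a": {"old": [1400, 1520, 1380], "new": [9.1, 10.2, 8.8]},
        "node_b": {"old": [2100, 2250, 2050], "new": [13.9, 15.1, 13.2]},
    }

    # If the two suites track each other, old/new is roughly constant across machines
    # and its mean can be used as a site-wide conversion (recalibration) factor.
    factors = []
    for name, scores in machines.items():
        old, new = suite_score(scores["old"]), suite_score(scores["new"])
        factors.append(old / new)
        print(f"{name}: old={old:.0f} new={new:.1f} old/new={old / new:.1f}")

    print(f"conversion factor (old units per new unit): {sum(factors) / len(factors):.1f}")

If the two suites really do scale together, as reported, the per-machine ratios cluster around a single factor and the transition amounts to a recalibration of computing requirements.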


File Systems Working Group


Started with a questionnaire about storage at T1s


Followed up with a technology review and selection

- Posix FS (TFA): LUSTRE, GPFS, AFS

- SRM: CASTOR, dCache, DPM

- Xrootd


Performance comparison between selected technologies


Testbed setup at CERN with 10 servers and 60 8-core clients with 1 Gb/s connection, 4-5 6TB


480 simultaneous client tasks


3 tests: writing, sequential read, pseudo-random read (a minimal sketch of a pseudo-random read task follows after this list)


Most implementations able to sustain wire-speed in writes and sequential reads


Significant performance advantage for LUSTRE in pseudo-random reads, but must clarify test conditions


The test use case may favour LUSTRE client-side caching
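
For flavour, a minimal sketch of a pseudo-random read task in the spirit of the third test; this is not the testbed's actual harness, and the file path, block size and read count are placeholders:

    import os
    import random
    import time

    def pseudo_random_read(path, block_size=1024 * 1024, n_reads=1000, seed=42):
        """Read n_reads blocks of block_size bytes from random (but reproducible)
        offsets in an existing file and report the aggregate throughput."""
        rng = random.Random(seed)                 # fixed seed -> repeatable access pattern
        n_blocks = os.path.getsize(path) // block_size
        read_bytes = 0
        start = time.time()
        with open(path, "rb") as f:
            for _ in range(n_reads):
                f.seek(rng.randrange(n_blocks) * block_size)
                read_bytes += len(f.read(block_size))
        elapsed = time.time() - start
        print(f"{read_bytes / 1e6:.1f} MB in {elapsed:.1f} s "
              f"({read_bytes / 1e6 / elapsed:.1f} MB/s)")

    # e.g. one such task per client core against a large pre-written file:
    # pseudo_random_read("/mnt/testfs/bigfile.dat")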


Scientific Linux


Review of recent releases: SL5.1, SL4.6


Trying to put the 64bit versions out at the same time as the 32bit
versions


Obsoleted SL3.0.1 to 3.0.8


Description of an issue where ‘new’ tags in version numbers make new versions appear older to yum (a simplified version-comparison sketch follows after this list)


Working on automating ‘fastbugs’ repositories


Clarifying policy on security errata


Future:


SL3.0.9 to continue till October 2010.


Planning on doing SL4.7, SL5.2, SL6.
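
The ‘new’-tag issue above comes from how RPM (and therefore yum) orders version/release strings. A much-simplified sketch of that comparison, using invented release tags, shows how an alphabetic tag can make a build sort older than the plain numeric release it was meant to supersede (the real rpmvercmp handles more cases, e.g. epochs):

    import re

    def rpm_label_cmp(a, b):
        """Much-simplified version of rpm's label comparison: split each string into
        alternating numeric and alphabetic segments and compare them pairwise
        (numbers numerically, letters lexically); a numeric segment beats letters."""
        xs, ys = re.findall(r"\d+|[A-Za-z]+", a), re.findall(r"\d+|[A-Za-z]+", b)
        for x, y in zip(xs, ys):
            if x.isdigit() and y.isdigit():
                x, y = int(x), int(y)                 # numeric compare (ignores leading zeros)
            elif x.isdigit() != y.isdigit():
                return 1 if x.isdigit() else -1       # a number sorts newer than letters
            if x != y:
                return 1 if x > y else -1
        # all compared segments equal: the label with segments left over is newer
        return (len(xs) > len(ys)) - (len(xs) < len(ys))

    # Invented release tags: a rebuild tagged "new.1" sorts *older* than the plain
    # numeric release "9" it was meant to supersede, so yum keeps the old package.
    print(rpm_label_cmp("9", "new.1"))    # ->  1  (release "9" wins)
    print(rpm_label_cmp("67.0.1", "67"))  # ->  1  (normal case: longer label is newer)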


SL discussions


Support for SL4?


RHEL4: full support 3 years, deployment support 3-5 years, maintenance support for 5-7 years.


RHEL4 was released in Feb 2005, so it is currently in deployment support


Critical that Grid middleware is available


DESY need SL4 until Autumn 2011


CERN intends to introduce batch and UIs for SL5 in Autumn 2008, so the WN gLite payload should be available


Some concern over experiment readiness


Compiler is the important factor rather than the actual version of SL


Encourage shorter deadlines with more flexibility on extending deadlines


Likely to get better buy-in from users


So suggest July 2010? Suggestion of October 2010, same as SL3, to avoid a short-term migration.


XFS in SL?


In or out? Consensus is to have it in, using the usual kernel module system. Jan Iven hears from an unreliable source that back-ports of the latest version are coming. SL4 or SL5? SL4 contrib, SL5 standard. Does it work with 32-bit? Yes, the kernel is now less hostile.


Scientific Linux 6: Should it be based on CentOS?


Still do installer changes


Still add RPMs we usually do


Use precompiled RPMs


Change/recompile RPMs we feel the need to (SL graphics).


Kernel modules: adding the security repo during the install gets the correct kernel but incorrect modules. Can fix the installer, fix up afterwards with a script, or use dkms. Add dkms to the release and use it instead of kernel modules? (A dkms sketch follows at the end of this list.)


Stop Press: RHEL 4 lifetime extended: ‘full support’ for 4 years…
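
On the dkms option above: the idea is to register a module's source with dkms once, then have it rebuilt for each new (e.g. security-errata) kernel instead of shipping matching kmod packages. A hedged sketch of the steps, driving the standard dkms commands from Python; the module name and version are hypothetical:

    import subprocess

    def run(cmd):
        """Echo a command, then run it, stopping on the first failure."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Hypothetical out-of-tree module shipped as source in /usr/src/<name>-<version>
    # with a dkms.conf describing how to build it.
    module, version = "examplefs", "1.0"
    kernel = subprocess.check_output(["uname", "-r"], text=True).strip()

    run(["dkms", "add", "-m", module, "-v", version])
    run(["dkms", "build", "-m", module, "-v", version, "-k", kernel])
    run(["dkms", "install", "-m", module, "-v", version, "-k", kernel])
    # After a kernel errata update, repeating build/install for the new kernel
    # rebuilds the module in place of shipping matching kmod packages.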


Selected Topics I


Well attended talk by Sascha Brawer from Google,
describing their technology and methods for
handling very large datasets over distributed
geographical locations


Based on truckloads of low cost systems


Care about performance per $, not raw performance


In-house rack design, chassis-less PC-class motherboards, low-end storage


Many data centres around the world


Need to design software to cope with failures


Selected Topics II


Several talks on experiences with Lustre


DESY


good description of setting it up


GSI


talk about production use


Lustre appears stable and reliable as a
production distributed file system


Proof against various failure modes


Sverre Jarp gave a review of the CERN OpenLab and what they are working on


Collaboration with HP, Intel, Oracle…



Experience with Windows Vista at CERN


Update on Vista activities at CERN


status, plans etc.


Using a readiness check to determine suitability; Vista is not the default (XP is).


Now 300 machines (~5%) running Vista.


Notes on introduction of SP1


Feb 2008: still preparing for the upgrade rollout. RFM
removed in favour of popup nagging.


Vista SP1 improved performance over XP or standard Vista, but not by much in most cases.


Virtualisation with Windows at CERN


Review of virtualisation in IT services at
CERN


17 physical servers with 45 ‘clients’ ranging
through Windows server variants and SLC4/5


Using Virtual Server 2005


New Hyper-V


part of Windows Server 2008,
needs 64bit CPU


Supports 32/64bit guests, large RAM (>32GB) in
VMs


Remote Administration via Service Modules


Work at GSI on using IPMI modules to
administer remotely located server
hardware


Disadvantages of remote access using standard tools, not the least of which is that you need a running OS.


Discussion of advantages of using IPMI modules for remote control (a small ipmitool sketch follows below)


changing BIOS settings, resets, installing…


Detailed description of capabilities.
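
As a flavour of the kind of control described (not GSI's actual tooling), a small Python wrapper around the standard ipmitool client; the BMC hostname and credentials are placeholders:

    import subprocess

    def ipmi(host, user, password, *args):
        """Run one ipmitool command against a remote BMC over the lanplus interface."""
        cmd = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password, *args]
        return subprocess.check_output(cmd, text=True)

    # Placeholder BMC address and credentials.
    BMC, USER, PASS = "node042-ipmi.example.org", "admin", "secret"

    # Query and control power with no running OS needed on the node itself.
    print(ipmi(BMC, USER, PASS, "chassis", "power", "status"))
    # ipmi(BMC, USER, PASS, "chassis", "power", "reset")    # hard reset
    # ipmi(BMC, USER, PASS, "chassis", "bootdev", "pxe")    # boot from network next time
    # For BIOS settings or an OS install, attach the serial-over-LAN console instead:
    #   ipmitool -I lanplus -H node042-ipmi.example.org -U admin sol activate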