ALICE Computing: status and funding requests
Domenico Elia
Meeting with the LHC Computing Referees, Padova, 25 May 2016

Outline
- ALICE Computing status:
  - resource usage in 2015, Run2 computing activities
  - performance of the Italian sites, R&D activities
- Funding requests:
  - CPU and storage situation at the Tier-2s, decommissioning
  - supplementary requests for 2016 (Tier-1)
  - ordinary requests for 2017 (Tier-1 and Tier-2)

ALICE Computing status: First year of Run2 data taking
- 13 TeV
- 5.02 TeV

ALICE Computing status: First year of Run2 data taking
- 7.3 PB (one replica)
- All data processed in final reconstruction pass
- 2015: 7.2 PB (one replica)

ALICE Computing status: Resource usage in 2015
Overall CPU/DISK/TAPE usage:
- T1, T2 over pledge (opportunistic, extra-WLCG)
- DISK usage below request (delay in 2015 data reconstruction)
- high TAPE usage (unexpectedly high pile-up in pp at 13 TeV with 25 ns bunch spacing)
CERN-RRB

ALICE Computing status: Resource usage in 2015
ALICE Grid: new entries
- HLT farm used for offline activities (when not in run): included in the Grid as a fully virtual site

ALICE Computing status: Resource usage in 2015
ALICE Grid: new entries
- HLT farm used for offline activities (when not in run)
Usual share of the activities:
- ~150 MC cycles (papers + first physics analysis of 2015 data)
- Run1 raw data re-processing, Run2 data processing (bulk of raw and MC production for Run2, both pp and PbPb, still to be done)
- organized and user (chaotic) analysis

ALICE Computing status: Resource usage in 2015
ALICE Grid: new entries
- HLT farm used for offline activities
Usual share of the activities (61K parallel jobs on average):
- MC productions: 71%
- RAW data processing: 9%
- User analysis: 6%
- Organized analysis: 14%

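The activity shares above can be converted into approximate job counts. This is a minimal sketch, assuming the percentages apply directly to the quoted average of 61K parallel jobs; the names `AVERAGE_PARALLEL_JOBS`, `SHARES_PERCENT` and `jobs_per_activity` are illustrative and do not come from the ALICE tooling.

```python
# Average number of parallel jobs and activity shares, as quoted on the slide.
AVERAGE_PARALLEL_JOBS = 61_000
SHARES_PERCENT = {
    "MC productions": 71,
    "RAW data processing": 9,
    "User analysis": 6,
    "Organized analysis": 14,
}


def jobs_per_activity(total_jobs, shares_percent):
    """Split an average job count according to percentage shares."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must add up to 100%")
    # Integer arithmetic keeps the split exact for these inputs.
    return {name: total_jobs * pct // 100 for name, pct in shares_percent.items()}


split = jobs_per_activity(AVERAGE_PARALLEL_JOBS, SHARES_PERCENT)
for name, jobs in split.items():
    print(f"{name}: ~{jobs} jobs")
```

Under this assumption, MC productions account for roughly 43K of the 61K average parallel jobs.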
ALICE Computing status: Resource usage in 2015
Current activities:
- MC productions (papers, first physics 2015, upgrade)
- Raw data processing:
  - code improved to reduce memory consumption (now 2 GB/job)
  - 2015 data partially reconstructed:
    - distortions in the TPC occur in runs with high interaction rate
    - specific corrections need to be developed, currently being validated
    - plan to complete the fully calibrated reconstruction within the next ~1-1.5 months

ALICE Computing status: Resource usage in 2015
Current activities:
- MC productions (papers, first physics 2015, upgrade)
- Raw data processing
- Changing replication policy:
  - needed to cope with the available storage
  - single ESD replica
  - global disk space needed for 2015 processing: 5-6 PB (RAW + MC), barely feasible with the expected resources

ALICE Computing status: Resource usage in 2015
Current activities:
- MC productions (papers, first physics 2015, upgrade)
- Raw data processing
- Changing replication policy
- Popularity and cleanup:
  - removed very old MC productions
  - removed second ESD replica for rarely accessed productions
Plot: volume of data vs number of accesses in X = 3, 6, 12 months. First bin: data created before period X began and not accessed during that period.

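The popularity plot described above (data volume per number of accesses within a window of X months) can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`created`, `size_tb`, `accesses`), the sample datasets and the 30-day month are hypothetical, and the actual ALICE popularity service is not described in the slides.

```python
from collections import defaultdict
from datetime import date, timedelta


def popularity_histogram(datasets, today, months):
    """Return {number_of_accesses: total volume in TB} for the last `months` months."""
    window_start = today - timedelta(days=30 * months)  # assumption: 30-day months
    volume = defaultdict(float)
    for ds in datasets:
        n_accesses = sum(1 for day in ds["accesses"] if window_start <= day <= today)
        # First bin rule: unaccessed data counts only if it already existed
        # when the window began; newer data is left out of the zero bin.
        if n_accesses == 0 and ds["created"] >= window_start:
            continue
        volume[n_accesses] += ds["size_tb"]
    return dict(volume)


# Hypothetical sample records, for illustration only.
sample = [
    {"created": date(2013, 1, 10), "size_tb": 120, "accesses": []},
    {"created": date(2015, 11, 1), "size_tb": 300,
     "accesses": [date(2016, 3, 1), date(2016, 4, 15), date(2016, 5, 2)]},
    {"created": date(2016, 5, 1), "size_tb": 50, "accesses": []},
]

for months in (3, 6, 12):
    print(months, popularity_histogram(sample, date(2016, 5, 25), months))
```

Datasets that end up in the zero-access bin are the natural candidates for the cleanup steps listed on the slide.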
ALICE Computing status: Performance of the Italian sites
- Sites: TO, LNL, CNAF, BA, CT
- INFN: ~14%
- Problems with the LUSTRE file system at the old Bari site (BC2S); the site is now fully migrated to the new ReCaS datacenter

ALICE Computing status: Performance of the Italian sites
Tier-2 resources: following the usual internal coordination plan
- monthly meetings (performance recording) + annual workshop
- overall ~50% increase in total WCT from 2014 to 2015
- large upgrade at 2 sites (ReCaS) within 2015:
  - CATANIA (in production since April, ~1500 cores, 1 PB: Catania-VF)
  - BARI (in production for ALICE since mid-August):
    - ~300 servers, 105 kHS06 (~10000 cores): 25 kHS06 CMS pledge + 10 kHS06 ALICE pledge
    - ~4 PB disk storage, PB tape library: TB CMS pledge, TB ALICE pledge
    - 20 Gbit/s network connection (ready for 40 Gbit/s)

ALICE Computing status: New ReCaS BA infrastructure
- Official opening: July 9, 2015
- BARI Tier-2 from BC2S to ReCaS:
  - migration from LUSTRE to pure XRootD
  - large opportunistic use of CPU (up to ~6000 slots)
Plot: BC2S, ReCaS, Pledge 2015

ALICE Computing status: Performance of the Italian sites
Plots: Bari, Torino, PD-LNL, Catania, each with its pledge. Annotations: new ReCaS center in Bari; new ReCaS center in Catania (Catania-VF).

ALICE Computing status: Performance of the Italian sites
Monitoring T2 data from APEL: BA, LNL, T1, CT, TO

ALICE Computing status: R&D activity and software for Run3
Virtual Analysis Facility (STOA-LHC PRIN):
- Cloud-based VAF deployed in BA, CA, LNL, TO and TS
- XRootD-based Data Federation (DF) set up and populated: local redirectors at each site + national redirector in BA
- system fully tested, final PRIN report completed by end of April 2016
Software development for Run3:
- ITS standalone tracking based on cellular automaton (TO)
- ITS geometry (AL)
- response simulation for the pixel (pAlpide) chip (TS, BS-PV)
First experience with EOS at TS

ALICE Computing status: R&D activity on the Dashboard
The project: a dashboard that gathers in a single graphical interface all the information concerning ALICE activity at each site (MonALISA, local batch system, local monitoring system metrics).
- Running at the Bari T2 site for ~2 years
- Recently exported to the Torino site as well
Next steps:
- export to all ALICE T2s and other WLCG sites
- global dashboard for Italian computing in ALICE
Abstract submitted to CHEP'16

ALICE Italy computing website

ALICE Italy computing website
Sections: Activities, Contacts, Documents, Links, Events

Resource status and funding requests

Funding requests: CPU/storage status in Italy
In production at the Tier-1:
- CPU: 29000 HS06 (pledge 2016)
- DISK: 3900 TB (pledge 2016)
- TAPE: 5500 TB (pledge 2016)
In production at the Tier-2s (+ Cagliari):
Table columns: HS06 and TB for Bari, Catania, Padova-LNL, Torino, Cagliari, Total. Rows: available in May 2016 (including obsolete resources not yet decommissioned); pledge quota from February 2016 (x2); pledge 2016.

Funding requests: Tier-2 CPU/storage status
Purchases in the second half of 2015:
- CPU: 1720 HS06 at LNL (bonus ~450 HS06), HS06 at TO
- storage: 4x180 TB expansions at BA and LNL (bonus ~50 TB)
- outcome optimized by combining tenders (BA) with site-level purchases

Funding requests: Tier-2 CPU/storage status
2016 funding from CSN3:
- requests: 435 k€ (387 growth and replacements + 48 overhead)
- allocations: 332 k€ (308 growth and replacements + 24 overhead)
  - storage decommissioning postponed at CT/CA, and for half of the amount at PD-LNL and TO
  - overhead request granted at 50%
  - 2016 pledges guaranteed in line with the CRSG/RRB outcome

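The funded fractions follow directly from the requested and allocated amounts. A minimal sketch using only the k€ figures quoted above; the dictionary keys and the helper name `funded_fraction` are illustrative.

```python
# Amounts in k€, as quoted on the slide.
REQUESTED = {"growth_and_replacements": 387, "overhead": 48}
ASSIGNED = {"growth_and_replacements": 308, "overhead": 24}


def funded_fraction(assigned, requested):
    """Return the assigned/requested ratio per item and for the total."""
    fractions = {key: assigned[key] / requested[key] for key in requested}
    fractions["total"] = sum(assigned.values()) / sum(requested.values())
    return fractions


fractions = funded_fraction(ASSIGNED, REQUESTED)
for item, value in fractions.items():
    print(f"{item}: {value:.1%}")
```

The overhead ratio reproduces the 50% stated on the slide, while the total allocation is about three quarters of the request.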
Funding requests: Tier-2 CPU/storage status
2016 funding from CSN3, split among the sites:
- CPU: ~14400 HS06
  - BA: HS06 (1950 growth + replacements = 3518 HS06)
  - LNL: HS06 (2500 growth + replacements = 7996 HS06)
  - TO: HS06 (1300 growth + replacements = 2884 HS06)
- DISK: ~620 TB
  - BA: 1184 TB (260 growth = 260 TB)
  - LNL: 1202 TB (50 growth + replacements = 180 TB)
  - TO: 1223 TB (100 growth + 80 replacements = 180 TB)

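The per-site amounts can be cross-checked against the rounded totals quoted for CPU and disk. A short sketch, assuming the value after each "=" sign is the amount funded for that site; the variable names are illustrative.

```python
# Per-site amounts taken from the "= ..." totals on the slide.
CPU_HS06 = {"BA": 3518, "LNL": 7996, "TO": 2884}
DISK_TB = {"BA": 260, "LNL": 180, "TO": 180}

total_cpu = sum(CPU_HS06.values())
total_disk = sum(DISK_TB.values())

print(f"CPU total: {total_cpu} HS06 (slide quotes ~14400)")
print(f"DISK total: {total_disk} TB (slide quotes ~620)")
```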
Funding requests: Tier-2 CPU/storage status
2016 funding from CSN3, status of the 2016 purchases:
- completed:
  - BA: 3840 HS06 (BA), TB (expansion for LNL)
  - LNL: 8600 HS06 (LNL) + license for storage expansion
- to be finalized:
  - BA: 260 TB (joint tender with CMS, total ~200 k€)
  - TO: 2880 HS06, TB (synergies with purchases by other experiments and C3S)
  - overhead (survey of needs completed and budget transfers made)

Funding requests: Tier-2 CPU/storage status
Updated situation with the 2016 resources:
- CPU: 45333 HS06, of which 1488 HS06 in excess of the pledge
- DISK: 4876 TB, of which 47 TB in excess of the pledge
Table columns: HS06 and TB for Bari, Catania, Padova-LNL, Torino, Cagliari, Total. Rows: available at the end of 2016 (after decommissioning and completion of the 2016 purchases*); pledge 2016.
* Assuming the remaining purchases at BA and TO are completed successfully

Funding requests: Decommissioning
Table columns: year of decommissioning; HS06 and TB for Bari, Catania, LNL-Padova, Torino, Cagliari, Total.
- Storage decommissioning postponed from the second half of 2016 to 2017: 130 TB (CT) + TB (LNL) + 80 TB (TO) + 20 TB (CA) = 360 TB
- ReCaS decommissioning (BA and CT) expected in 2018 = HS06

Funding requests: Decommissioning
Table columns: year of decommissioning; HS06 and TB for Bari, Catania, LNL-Padova, Torino, Cagliari, Total.
Overall Tier-2 situation at the start of 2017:
- CPU: 45333 - 3840 = HS06
- DISK: 4876 - 591 = 4285 TB

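The start-of-2017 availability is the end-of-2016 availability minus the resources decommissioned in 2017. A minimal sketch of that subtraction; the CPU result is derived here from the two operands on the slide, and the helper name `after_decommissioning` is illustrative.

```python
def after_decommissioning(available, decommissioned):
    """Resources left once the decommissioned share is removed."""
    if decommissioned > available:
        raise ValueError("cannot decommission more than is available")
    return available - decommissioned


# Operands as quoted on the slide (HS06 for CPU, TB for disk).
cpu_start_2017 = after_decommissioning(45333, 3840)
disk_start_2017 = after_decommissioning(4876, 591)

print(f"CPU at start of 2017: {cpu_start_2017} HS06")
print(f"DISK at start of 2017: {disk_start_2017} TB")
```

The disk figure reproduces the 4285 TB stated on the slide.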
Funding requests: RRB April 2016
INFN share for 2017:
- CPU, DISK for Tier-1 and Tier-2: 18.9% (18.5% for 2016)
- TAPE for Tier-1: 34.8% (35.2% for 2016, 41.1% for 2015)
RRB October 2015

Funding requests: RRB April 2016
- % (4%) CPU at the Tier-1 (0): increased processing time for high pile-up pp events (x2) + TPC calibration issues
- +30% (22%) TAPE at the Tier-1 (0): increased raw data volume for pp events (x3.5), as observed in the 2015 sample
Supplementary 2016 request for the Tier-1:
- CPU: 2700 HS06, 35 k€ (revised 2016 pledge: HS06)
- TAPE: 1.6 PB, 40 k€ (revised 2016 pledge: 7.1 PB)

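The revised Tier-1 pledges and the implied unit prices can be worked out from the figures in this talk. A sketch under two assumptions: the 2016 Tier-1 pledges are the 29000 HS06 and 5500 TB quoted earlier for CNAF, and 1.6 PB is taken as 1600 TB. The CPU total is derived here and is not read from the slide.

```python
# 2016 Tier-1 pledges quoted earlier in the talk.
PLEDGE_2016 = {"cpu_hs06": 29000, "tape_tb": 5500}

# Supplementary request: (amount, cost in EUR); 1.6 PB assumed to be 1600 TB.
SUPPLEMENT = {"cpu_hs06": (2700, 35_000), "tape_tb": (1600, 40_000)}

revised_pledge = {key: PLEDGE_2016[key] + SUPPLEMENT[key][0] for key in PLEDGE_2016}
unit_cost_eur = {key: cost / amount for key, (amount, cost) in SUPPLEMENT.items()}

print(f"Revised CPU pledge: {revised_pledge['cpu_hs06']} HS06")
print(f"Revised TAPE pledge: {revised_pledge['tape_tb'] / 1000} PB")
print(f"Implied CPU price: {unit_cost_eur['cpu_hs06']:.1f} EUR/HS06")
print(f"Implied TAPE price: {unit_cost_eur['tape_tb']:.1f} EUR/TB")
```

The tape total matches the 7.1 PB revised pledge on the slide, and the implied CPU price is close to 13 €/HS06.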
Funding requests: RRB April 2016
Increases from 2016 (revised) to 2017:
- CPU: 13.8% (T0), 31.5% (T1), 17% (T2)
- DISK: 27.4%, 16.8%, 19.9%
- TAPE: 30.8%, 39.9%
RRB October 2015

Funding requests: 2017 requests for Tier-1 and Tier-2
Table columns: CPU Tier-1 (HS06), DISK Tier-1 (TB), TAPE Tier-1 (TB), CPU Tier-2 (HS06), DISK Tier-2 (TB). Rows: pledged; available minus decommissioned; scrutinized for ALICE; delta; estimated cost (k€); total (k€).
- Tier-2 overhead: 54.1 k€
- Cost estimates for T2 (T1): 11 (13) €/HS06 and 200 (210) €/TB
- Tier-1 decommissioning: not included
- Tier-2 overhead: 6% of CPU + 5% of disk (network) + 7% of total (additional servers)

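The cost rules above can be written as a small calculator. This is a sketch under stated assumptions: the three overhead percentages are taken to be additive (6% of the CPU cost, 5% of the disk cost, 7% of their sum), and the 1000 HS06 / 100 TB inputs are hypothetical values chosen for illustration, not figures from the table.

```python
# Unit costs in EUR, as quoted on the slide: T2 values, with T1 in brackets.
UNIT_COST_EUR = {
    "T2": {"hs06": 11, "tb": 200},
    "T1": {"hs06": 13, "tb": 210},
}


def hardware_cost_keur(delta_hs06, delta_tb, tier):
    """Return (cpu_cost, disk_cost) in k€ for the requested increments."""
    rates = UNIT_COST_EUR[tier]
    cpu_keur = delta_hs06 * rates["hs06"] / 1000
    disk_keur = delta_tb * rates["tb"] / 1000
    return cpu_keur, disk_keur


def tier2_overhead_keur(cpu_keur, disk_keur):
    """Assumed additive rule: 6% of CPU + 5% of disk + 7% of the total."""
    return 0.06 * cpu_keur + 0.05 * disk_keur + 0.07 * (cpu_keur + disk_keur)


# Hypothetical Tier-2 increments, for illustration only.
cpu_keur, disk_keur = hardware_cost_keur(1000, 100, "T2")
overhead_keur = tier2_overhead_keur(cpu_keur, disk_keur)
print(f"CPU {cpu_keur} k€, DISK {disk_keur} k€, overhead {overhead_keur:.2f} k€")
```

With the Tier-1 rate, `hardware_cost_keur(2700, 0, "T1")` gives about 35.1 k€ for CPU, in line with the 35 k€ quoted for the 2016 supplementary CPU request.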
Funding requests: 2017 requests per Tier-2 site
Table columns: HS06 / TB and k€ for each site (Bari, Catania, LNL-Padova, Torino, Cagliari). Rows: decommissioning per site; total decommissioning; net growth; decommissioning + growth.
- Bari: 0 HS06, 0.0 k€
- Torino: 3840 HS06 of CPU decommissioning; 81.6 k€ in total
- Total decommissioning: 160.4 k€
- Net growth: 281.0 k€
- Decommissioning + growth: 441.4 k€

Backup

ALICE Computing status: Resource usage in 2015
CPU resource evolution:
- steady growth in the number of active jobs
- system scaled from 500 to 100,000 concurrently running jobs
- scheduled analysis now prevailing over chaotic analysis
- organized analysis +60% in 2015 with respect to 2014: better efficiency

ALICE Computing status: Run2 overview

ALICE Computing status: Status of 2015 data processing
- Substantial IR-induced distortions in the TPC
- Affect both p-p and Pb-Pb data
- Sophisticated correction algorithms developed over the past 6 months
- Data partially reconstructed (first physics, lower-IR runs)
- Bulk of reconstruction still pending