ALICE Computing: status and funding requests
Domenico Elia
Riunione ALICE Italia - Referee, Roma, 26 May 2016
Outline
ALICE Computing status:
- resource usage in 2015, Run2 computing activities
- performance of the Italian sites, R&D activities
Funding requests:
- CPU and storage situation at the Tier-2s, decommissioning
- supplementary requests for 2016 (Tier-1)
- ordinary requests for 2017 (Tier-1 and Tier-2)
ALICE Computing status: first year of Run2 data taking
[Figure: data taking at 13 TeV and 5.02 TeV]
ALICE Computing status
- First year of Run2 data taking: 7.3 PB (one replica)
- All data processed in final reconstruction pass 2015: 7.2 PB (one replica)
ALICE Computing status: resource usage in 2015
Overall CPU/DISK/TAPE usage:
- T1, T2 over pledge (opportunistic, extra-WLCG)
- DISK usage below request (delay in the reconstruction of 2015 data)
- high TAPE usage (unexpectedly high pile-up in pp at 13 TeV with 25 ns bunch spacing)
Reference: CERN-RRB
ALICE Computing status: resource usage in 2015
ALICE Grid, new entries:
- HLT farm used for offline activities (when not in run), included in the Grid as a fully virtual site
ALICE Computing status: resource usage in 2015
Usual share of the activities:
- ~150 MC cycles (papers + first physics analysis of 2015 data)
- Run1 raw data re-processing and Run2 data processing (the bulk of raw and MC production for Run2, both pp and PbPb, is still to be done)
- organized and user (chaotic) analysis
ALICE Computing status: resource usage in 2015
Usual share of the activities (61K parallel jobs on average):
- MC productions: 71%
- RAW data processing: 9%
- User analysis: 6%
- Organized analysis: 14%
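As a rough illustration of what these shares mean in absolute terms, the following Python sketch converts the quoted percentages into average numbers of concurrent jobs. It assumes that "61K" means 61,000 and that the shares apply directly to that average; the variable and function names are illustrative only.

```python
# Average number of concurrent jobs quoted on the slide (assuming 61K = 61,000).
AVERAGE_PARALLEL_JOBS = 61_000

# Activity shares in percent, as quoted on the slide.
SHARES_PERCENT = {
    "MC productions": 71,
    "RAW data processing": 9,
    "User analysis": 6,
    "Organized analysis": 14,
}


def jobs_per_activity(total_jobs, shares_percent):
    """Split an average job count according to percentage shares."""
    return {name: total_jobs * pct / 100 for name, pct in shares_percent.items()}


jobs = jobs_per_activity(AVERAGE_PARALLEL_JOBS, SHARES_PERCENT)
for name, n in jobs.items():
    print(f"{name}: ~{n:,.0f} concurrent jobs on average")
```

The four shares add up to 100%, so the per-activity figures sum back to the quoted average.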
ALICE Computing status: resource usage in 2015
Current activities:
- MC productions (papers, first physics 2015, upgrade)
- Raw data processing: code improved to reduce memory consumption (now 2 GB/job)
- 2015 data only partially reconstructed:
  - distortions in the TPC occur in runs with high interaction rate
  - specific corrections had to be developed and validated
  - plan to complete the fully calibrated reconstruction within the next ~2-3 months
ALICE Computing status: resource usage in 2015
Changing replication policy, needed to cope with the available storage:
- single ESD replica
- global disk space needed for 2015 processing: 5-6 PB (RAW + MC), barely feasible with the expected resources
ALICE Computing status: resource usage in 2015
Popularity and cleanup:
- removed very old MC productions
- removed the second ESD replica for low-access productions
[Figure: volume of data vs number of accesses in X = 3, 6, 12 months; the first bin contains data created before period X began and not accessed during that period]
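The popularity binning described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual ALICE tool: the `Dataset` record and the `volume_by_accesses` function are hypothetical, a month is approximated as 30 days, and datasets created inside the window with no accesses are skipped (an assumption about how the first bin is meant to be read).

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Dataset:  # hypothetical record, not an ALICE data structure
    size_tb: float
    created: datetime
    access_times: list = field(default_factory=list)


def volume_by_accesses(datasets, now, months):
    """Return {number of accesses in the window: total volume in TB}.

    Key 0 is the "first bin": data created before the window began and
    never accessed during it. Data created inside the window with no
    accesses are skipped (assumption: too recent to be judged).
    """
    start = now - timedelta(days=30 * months)  # approximate month length
    volume = defaultdict(float)
    for ds in datasets:
        n = sum(1 for t in ds.access_times if start <= t <= now)
        if n == 0 and ds.created >= start:
            continue
        volume[n] += ds.size_tb
    return dict(volume)


# Made-up example data, for illustration only.
now = datetime(2016, 5, 1)
data = [
    Dataset(10.0, datetime(2014, 1, 1)),
    Dataset(5.0, datetime(2015, 1, 1), [datetime(2016, 4, 1), datetime(2016, 4, 15)]),
    Dataset(2.0, datetime(2016, 4, 20)),
]
result = volume_by_accesses(data, now, months=3)
print(result)
```

Running the same function with `months` set to 3, 6 and 12 gives the three views mentioned in the figure; the content of key 0 is the natural candidate for cleanup.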
ALICE Computing status: performance of the Italian sites
[Figure: site labels TO, LNL, CNAF, BA, CT; INFN ~14%]
- Problems with the LUSTRE file system at the old Bari site (BC2S); the site is now fully migrated to the new ReCaS datacenter
ALICE Computing status: performance of the Italian sites
Tier-2 resources: following the usual internal coordination plan
- monthly meetings (performance recording) + annual workshop
- overall ~50% increase in total WCT from 2014 to 2015
- large upgrade at 2 sites (ReCaS) during 2015:
  - CATANIA (in production since April): ~1500 cores, 1 PB (Catania-VF)
  - BARI (in production for ALICE since mid-August):
    - ~300 servers, 105 kHS06 (~10000 cores)
      - 25 kHS06 CMS pledge + 10 kHS06 ALICE pledge
    - ~4 PB disk storage, [value missing] PB tape library
      - [value missing] TB CMS pledge, [value missing] TB ALICE pledge
    - 20 Gbit/s network connection (ready for 40 Gbit/s)
ALICE Computing status: new ReCaS BA infrastructure
- Official opening: July 9, 2015
- BARI Tier-2 moved from BC2S to ReCaS:
  - migration from LUSTRE to pure XRootD
  - large opportunistic use of CPU (up to ~6000 slots)
[Figure labels: BC2S, ReCaS, Pledge 2015]
ALICE Computing status: performance of the Italian sites
[Figure: panels Bari, Torino, PD-LNL, Catania; legend: Pledge; annotations: new ReCaS center in Bari, new ReCaS center in Catania (Catania-VF)]
ALICE Computing status: performance of the Italian sites
Monitoring T2 data from APEL
[Figure: panels BA, LNL, T1, CT, TO]
ALICE Computing status: R&D activity and software for Run3
Virtual Analysis Facility (STOA-LHC PRIN):
- cloud-based VAF deployed in BA, CA, LNL, TO and TS
- XRootD-based Data Federation (DF) set up and populated: local redirectors at each site + national redirector in BA
- system fully tested, final PRIN report completed by end of April '16
ALICE Computing status: R&D activity and software for Run3
Virtual Analysis Facility (STOA-LHC PRIN), continued:
- BA (next slide)
- experience with TS
Software development for Run3 (ITS upgrade):
- vertexing and SA tracking based on a cellular automaton (TO)
- geometry (AL)
- response simulation for the pixel (pAlpide) chip (TS, BS-PV)
- cluster shape definition (TO)
ALICE Computing status: R&D activity on the Dashboard
The project: a dashboard that gathers in a single graphical interface all the information concerning the ALICE activity at each site (MonALISA, local batch system, local monitoring system metrics).
- Running at the BA Tier-2 site for ~2 years
- Recently exported to TO
Next steps:
- export to all ALICE Tier-2 sites and other WLCG sites
- global dashboard for the Italian computing in ALICE
Abstract to CHEP'16
Project with GARR: "Sistema di monitoraggio per datacenter distribuiti geograficamente basati su OpenStack" (monitoring system for geographically distributed datacenters based on OpenStack)
ALICE Italia computing website
Site sections: Activities, Contacts, Documents, Links, Events
Resource situation and funding requests
Funding requests: CPU/storage situation in Italy
In production at the Tier-1:
- CPU: 29000 HS06 (pledge 2016)
- DISK: 3900 TB (pledge 2016)
- TAPE: 5500 TB (pledge 2016)
In production at the Tier-2s (+ Cagliari):
[Table: HS06 and TB for Bari, Catania, Padova-LNL, Torino, Cagliari and Total; rows: available in May 2016 (including obsolete resources not yet decommissioned) and 2016 pledge; values missing in the source]
- Pledge share from February 2016 (x2)
Funding requests: Tier-2 CPU/storage situation
Purchases in the second half of 2015:
- CPU: 1720 HS06 at LNL (bonus ~450 HS06), [value missing] HS06 at TO
- storage: 4x180 TB expansions at BA and LNL (bonus ~50 TB)
- outcome optimized by combining tenders (BA) and site-level purchases
Funding requests: Tier-2 CPU/storage situation
2016 funding from CSN3:
- requested: 435 k€ (387 growth and replacements + 48 overhead)
- assigned: 332 k€ (308 growth and replacements + 24 overhead)
- storage decommissioning postponed at CT/CA, and half of the decommissioning postponed at PD-LNL and TO
- overhead request assigned at 50%
- 2016 pledges guaranteed, in line with the CRSG/RRB outcome
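The relation between requested and assigned funds can be checked with a short Python sketch. The figures are those quoted on the slide (in k€); the dictionary keys and variable names are illustrative only.

```python
# 2016 Tier-2 funding from CSN3, in k€, as quoted on the slide.
requested = {"growth_and_replacements": 387, "overhead": 48}
assigned = {"growth_and_replacements": 308, "overhead": 24}

total_requested = sum(requested.values())
total_assigned = sum(assigned.values())

# Fraction of each request that was granted.
fractions = {item: assigned[item] / requested[item] for item in requested}
overall_fraction = total_assigned / total_requested

print(f"requested {total_requested} k€, assigned {total_assigned} k€")
for item, frac in fractions.items():
    print(f"{item}: {frac:.0%} granted")
print(f"overall: {overall_fraction:.0%} granted")
```

The overhead fraction comes out at exactly one half, in line with the statement that the overhead request was assigned at 50%.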
Funding requests: Tier-2 CPU/storage situation
2016 funding from CSN3, split among the sites:
CPU: ~14400 HS06
- BA: [value missing] HS06 (1950 growth + [value missing] replacements = 3518 HS06)
- LNL: [value missing] HS06 (2500 growth + [value missing] replacements = 7996 HS06)
- TO: [value missing] HS06 (1300 growth + [value missing] replacements = 2884 HS06)
DISK: ~620 TB
- BA: 1184 TB (260 growth = 260 TB)
- LNL: 1202 TB (50 growth + [value missing] replacements = 180 TB)
- TO: 1223 TB (100 growth + 80 replacements = 180 TB)
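The per-site figures can be cross-checked against the quoted totals with a small Python sketch. The growth and total values are taken from the slide; the replacement values computed here are only what the arithmetic implies (total minus growth) and are not stated in the source wherever the slide text is incomplete.

```python
# Per-site 2016 allocations as (growth, growth + replacements), from the slide.
cpu_hs06 = {"BA": (1950, 3518), "LNL": (2500, 7996), "TO": (1300, 2884)}
disk_tb = {"BA": (260, 260), "LNL": (50, 180), "TO": (100, 180)}


def implied_replacements(allocations):
    """Replacement share implied by the slide: total minus growth."""
    return {site: total - growth for site, (growth, total) in allocations.items()}


cpu_total = sum(total for _, total in cpu_hs06.values())
disk_total = sum(total for _, total in disk_tb.values())
cpu_replacements = implied_replacements(cpu_hs06)
disk_replacements = implied_replacements(disk_tb)

print(f"CPU total: {cpu_total} HS06 (slide: ~14400)")
print(f"DISK total: {disk_total} TB (slide: ~620)")
print("implied CPU replacements:", cpu_replacements)
print("implied DISK replacements:", disk_replacements)
```

The TO disk value obtained this way (80 TB) agrees with the replacement figure that is legible on the slide, which supports reading the other entries the same way.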
Funding requests: Tier-2 CPU/storage situation
Status of 2016 purchases:
- completed:
  - BA: 3840 HS06 (BA) + [value missing] TB (expansion for LNL)
  - LNL: 8600 HS06 (LNL) + license for storage expansion
- to be finalized:
  - BA: 260 TB (joint tender with CMS, total ~200 k€)
  - TO: 2880 HS06 + [value missing] TB (synergies with purchases by other experiments and C3S)
- overhead: survey of needs completed and budget transfers made
Funding requests: Tier-2 CPU/storage situation
Updated situation with 2016 resources:
- CPU: 45333 HS06, in excess of the pledge by 1488 HS06
- DISK: 4876 TB, in excess of the pledge by 47 TB
[Table: HS06 and TB for Bari, Catania, Padova-LNL, Torino, Cagliari and Total; rows: available at the end of 2016 (after decommissioning and completion of the 2016 purchases*) and 2016 pledge; values missing in the source]
* Assuming the remaining purchases at BA and TO are completed successfully
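Since the pledge values themselves did not survive in the table, a short Python sketch can recover what the quoted excess implies. This assumes that "in excess of the pledge" simply means available resources minus pledge; the resulting pledge figures are derived here and are not stated on the slide.

```python
# Tier-2 resources expected at the end of 2016 and excess over pledge (slide values).
available = {"cpu_hs06": 45333, "disk_tb": 4876}
excess_over_pledge = {"cpu_hs06": 1488, "disk_tb": 47}

# Pledge implied by the two quoted numbers (assumption: excess = available - pledge).
implied_pledge = {k: available[k] - excess_over_pledge[k] for k in available}

# Relative margin over the implied pledge, in percent.
margin_percent = {
    k: 100 * excess_over_pledge[k] / implied_pledge[k] for k in available
}

for k in available:
    print(f"{k}: implied pledge {implied_pledge[k]}, margin {margin_percent[k]:.1f}%")
```

Under this reading the margin is a few percent for CPU and about one percent for disk.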
Funding requests: decommissioning
[Table: HS06 and TB to be decommissioned per year for Bari, Catania, LNL-Padova, Torino, Cagliari and Total; values missing in the source]
- Storage decommissioning postponed from the second half of 2016 to 2017: 130 TB (CT) + [value missing] TB (LNL) + 80 TB (TO) + 20 TB (CA) = 360 TB
- ReCaS decommissioning (BA and CT) planned for 2018: [value missing] HS06
Funding requests: decommissioning
Overall Tier-2 situation at the beginning of 2017:
- CPU: 45333 - 3840 = [value missing] HS06
- DISK: 4876 - 591 = 4285 TB
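The subtraction on this slide can be reproduced with a few lines of Python. The inputs are the figures quoted above; the CPU result is not legible in the source, so the value computed here is simply what the arithmetic gives.

```python
# Tier-2 resources at the end of 2016 and amounts to be decommissioned (slide values).
end_of_2016 = {"cpu_hs06": 45333, "disk_tb": 4876}
decommissioned = {"cpu_hs06": 3840, "disk_tb": 591}

# Resources left at the beginning of 2017, before any new purchase.
start_of_2017 = {k: end_of_2016[k] - decommissioned[k] for k in end_of_2016}

print(f"CPU:  {start_of_2017['cpu_hs06']} HS06")
print(f"DISK: {start_of_2017['disk_tb']} TB")
```

The disk result matches the 4285 TB printed on the slide, which is a useful check that the inputs were read correctly.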
Funding requests: RRB April 2016
INFN share for 2017:
- CPU and DISK for Tier-1 and Tier-2: 18.9% (18.5% for 2016)
- TAPE for Tier-1: 34.8% (35.2% for 2016, 41.1% for 2015)
[Tables: RRB October 2015, RRB April 2015]
Funding requests: RRB April 2016
- +[value missing]% (4%) CPU at Tier-1 (0)
- +30% (22%) TAPE at Tier-1 (0)
Reasons:
- increased processing time for high pile-up pp events (x2) + TPC calibration issues
- increased raw data volume for pp events (x3.5), as observed in the 2015 sample
Supplementary request for 2016 at the Tier-1:
- CPU: 2700 HS06, 35 k€ (revised 2016 pledge: [value missing] HS06)
- TAPE: 1.6 PB, 40 k€ (revised 2016 pledge: 7.1 PB)
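The supplementary request can be cross-checked against the 2016 Tier-1 pledges quoted earlier in the talk (29000 HS06 and 5500 TB of tape). The Python sketch below assumes that the revised pledge is the original pledge plus the requested increment and that 1.6 PB corresponds to 1600 TB; the revised CPU pledge is derived here, since it is not legible in the source.

```python
# 2016 Tier-1 pledges from the Tier-1 situation slide.
pledge_2016 = {"cpu_hs06": 29_000, "tape_tb": 5_500}

# Supplementary request: (amount, cost in k€); 1.6 PB taken as 1600 TB.
supplement = {"cpu_hs06": (2_700, 35), "tape_tb": (1_600, 40)}

# Assumption: revised pledge = original pledge + requested increment.
revised_pledge = {k: pledge_2016[k] + supplement[k][0] for k in pledge_2016}
total_cost_keur = sum(cost for _, cost in supplement.values())

# Effective unit costs implied by the request, in euro per unit.
unit_cost_eur = {k: 1000 * cost / amount for k, (amount, cost) in supplement.items()}

print(revised_pledge)
print(f"total: {total_cost_keur} k€")
print({k: round(v, 2) for k, v in unit_cost_eur.items()})
```

The tape result (7100 TB) reproduces the revised 7.1 PB pledge shown on the slide, and the total agrees with the 75 k€ quoted in the final summary.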
Funding requests: RRB April 2016
Increases from 2016 (revised) to 2017:
        T0      T1      T2
CPU     13.8%   31.5%   16.0%
DISK    27.4%   16.8%   19.9%
TAPE    30.8%   39.9%   -
[Table: RRB October 2015]
Funding requests: RRB April 2016
Increases from 2016 (RRB'15) to 2017 (RRB'15):
        T0      T1      T2
CPU     13.9%   31.8%   12.6%
DISK    14.3%   15.2%   17.6%
TAPE    19.0%   26.3%   -
[Table: RRB October 2015]
Funding requests: requests for 2017, Tier-1 and Tier-2
[Table: columns CPU Tier-1 (HS06), DISK Tier-1 (TB), TAPE Tier-1 (TB), CPU Tier-2 (HS06), DISK Tier-2 (TB); rows: pledged (T1) or available minus decommissioned (T2), scrutinized ALICE request, delta, estimated cost (k€), total (k€); values missing in the source]
- Tier-2 overhead: 54.1 k€
- Cost estimates for T2 (T1)*: 11 (13) €/HS06 and 200 (210) €/TB
- * For 2016: 12 (14) €/HS06 and 220 (240) €/TB
- Tier-1 decommissioning: not included
- Tier-2 overhead: 6% CPU + 5% disk (network) + 7% total (additional servers)
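The cost rules on this slide can be written down as a small Python sketch. The unit costs and overhead percentages are those quoted above; reading the overhead as the sum of 6% of the CPU cost, 5% of the disk cost and 7% of the total is an assumption, and the function names are illustrative. As a worked case, the sketch prices the replacement of the Tier-2 resources quoted on the decommissioning slide (3840 HS06 and 591 TB).

```python
# 2017 unit costs from the slide, in euro.
UNIT_COST_2017 = {
    "T2": {"eur_per_hs06": 11, "eur_per_tb": 200},
    "T1": {"eur_per_hs06": 13, "eur_per_tb": 210},
}
# Tier-2 overhead rates from the slide.
OVERHEAD_T2 = {"cpu": 0.06, "disk": 0.05, "total": 0.07}


def cost_keur(tier, cpu_hs06, disk_tb):
    """Return (CPU cost, disk cost) in k€ for the given tier."""
    unit = UNIT_COST_2017[tier]
    cpu = cpu_hs06 * unit["eur_per_hs06"] / 1000
    disk = disk_tb * unit["eur_per_tb"] / 1000
    return cpu, disk


def overhead_t2_keur(cpu_keur, disk_keur):
    """Overhead in k€, assuming the three percentages simply add up."""
    return (
        OVERHEAD_T2["cpu"] * cpu_keur
        + OVERHEAD_T2["disk"] * disk_keur
        + OVERHEAD_T2["total"] * (cpu_keur + disk_keur)
    )


# Worked case: replacing 3840 HS06 and 591 TB at Tier-2 prices.
cpu_keur, disk_keur = cost_keur("T2", cpu_hs06=3840, disk_tb=591)
replacement_keur = cpu_keur + disk_keur
print(f"replacement cost: {replacement_keur:.1f} k€")
print(f"overhead on that amount: {overhead_t2_keur(cpu_keur, disk_keur):.1f} k€")
```

The replacement cost obtained this way rounds to the 160.4 k€ decommissioning total that survives in the per-site backup table, which suggests the unit costs are applied as sketched.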
Funding requests: computing travel 2016
Funding from CSN3:
- BA requests: 2 k€ assigned, largely insufficient; ~9 k€ already spent:
  3 x CdG CNAF (Elia), 2 x GARR (Elia+Vino), 2 x offline week (Elia+Vino), 2 x Torino dashboard (Elia+Vino), CB WLCG Lisbon (Elia), T1/T2 workshop (Elia), CCR workshop (Elia), referee meetings PD+RM (Elia)
- CdG extended to the whole infrastructure (T1+T2): presence of the RNC requested
- Vino's activities: dashboard export and participation in the offline week
Funding requests: computing travel 2016
- minimal estimate of commitments: 3 x CdG (1.5 k€), 3 x offline week (3 k€), 3 x T2 meetings (1.5 k€), 1 x CSN3 (0.5 k€), 3 x dashboard (1.5 k€)
- supplementary travel request for 2016: 8 k€
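The travel estimate can be added up with a short Python sketch. It assumes that each figure in parentheses is the total for that group of trips, not the cost of a single trip; the tuple layout is illustrative.

```python
# Planned trips: (purpose, number of trips, total cost in k€ for the group).
planned_trips = [
    ("CdG", 3, 1.5),
    ("offline week", 3, 3.0),
    ("T2 meeting", 3, 1.5),
    ("CSN3", 1, 0.5),
    ("dashboard", 3, 1.5),
]

total_keur = sum(cost for _, _, cost in planned_trips)
total_trips = sum(n for _, n, _ in planned_trips)

print(f"{total_trips} trips, {total_keur} k€ in total")
```

With this reading the items add up to the 8 k€ supplementary request, which supports interpreting the bracketed figures as group totals.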
Final comments and summary
Status of ALICE computing:
- very good use of resources in 2015
- the computing resource estimates for Run2 need to be revised
- reconstruction of 2015 data in progress (correction of TPC distortions)
- Italian sites active (also on R&D) and efficient (CPU over pledge)
Summary of funding requests:
- supplementary requests for 2016:
  - CPU/TAPE integration at Tier-1: 75 k€
  - BA travel: 8 k€
- ordinary requests for 2017:
  - CPU/DISK/TAPE growth at Tier-1: 377 k€
  - replacements and growth of CPU/DISK at Tier-2: 441 k€
  - Tier-2 overhead: 54 k€
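For reference, the overall amounts implied by this summary can be added up as follows. The individual items are those listed on the slide (in k€); the two totals are derived here and are not quoted in the source.

```python
# Funding requests in k€, as listed in the summary.
supplementary_2016 = {"Tier-1 CPU/TAPE integration": 75, "BA travel": 8}
ordinary_2017 = {
    "Tier-1 CPU/DISK/TAPE growth": 377,
    "Tier-2 replacements and growth": 441,
    "Tier-2 overhead": 54,
}

total_supplementary = sum(supplementary_2016.values())
total_ordinary = sum(ordinary_2017.values())

print(f"supplementary 2016: {total_supplementary} k€")
print(f"ordinary 2017: {total_ordinary} k€")
```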
Backup
ALICE Computing status: resource usage in 2015
CPU resource evolution:
- steady growth of the number of active jobs
- system scaled from 500 to 100,000 concurrently running jobs
- scheduled analysis now prevailing over chaotic analysis
- organized analysis +60% in 2015 with respect to 2014
- better efficiency
ALICE Computing status: Run2 overview
ALICE Computing status: status of 2015 data processing
Funding requests: requests for 2017, per Tier-2 site
[Table: decommissioning, net growth, and decommissioning + growth, in HS06 / TB and k€, for Bari, Catania, LNL-Padova, Torino and Cagliari; most per-site values are not recoverable from the source]
Recoverable totals:
- decommissioning: 160.4 k€ (including 3840 HS06 at Torino)
- net growth: 281.0 k€
- decommissioning + growth: 441.4 k€