Published by Federico Marrone. Modified 6 years ago.
ALICE computing: status and prospects for Run2/Run3
Domenico Elia
Commissione Scientifica III (CSN3) meeting, Rome, 21 March 2017
Outline
ALICE computing in LHC Run2:
- data taking objectives in Run2
- data collection/processing and Grid usage in 2015-2016
- expectations for 2017-2018 (including requests)
Prospects for Run3:
- new computing model (O2 system)
- raw estimate of the needed resources
Conclusions
ALICE Run2: data taking objectives
Pb-Pb collisions:
- reach the target of 1 nb^-1 integrated luminosity for rare triggers
- increase the statistics of minimum-bias and centrality-triggered events
pp collisions:
- collect a reference rare-trigger sample of 40 pb^-1 (equivalent to the 1 nb^-1 sample in Pb-Pb)
- enlarge the statistics of the unbiased data sample (including minimum-bias collisions at top energy)
p-Pb collisions:
- enlarge the existing data sample (in particular unbiased events at 5.02 TeV)
ALICE Run2: data taking 2015-2016
[Figure: data taking 2015-2016. Labels: p-p; Pb-Pb; p-A; end-of-year stop; no HLT compression; HLT compression, high IR; HLT + ROOT compression, low IR.]
Total 2016: ~7 PB of raw data, 80% replicated to T1s (ran out of tape)
ALICE Run2: data processing status
Raw data processing:
- problem with the TPC track distortion solved:
  - Run3-like correction algorithm developed
  - finalized and fully validated with both pp and 2015 Pb-Pb data
- this allowed raw data processing to start for:
  - 2015 Pb-Pb
  - p-Pb period
  - longest pp data taking periods, for both 2015 and 2016
  - complete 2015 and 2016 data sets for the di-muon spectrometer
Associated MC productions in time for QM (end 2016)
Full processing of the remaining 2015/2016 data is ongoing
ALICE Run2: Grid usage
High Grid usage (opportunistic resources):
- average of 76K parallel jobs, stable share among the various workflows
- HLT cluster used as a Grid site (4K jobs, ~5% of the total)
ALICE Run2: resource usage and shares
[Figures: CPU delivered (MHS06 hours); tape used; wall/CPU time.]
ALICE Run2: plans for 2017-2018
Computing model:
- no (relevant) changes in Offline procedures/tools
- largely the same model already used for Run1
Critical input parameters:
- usage of LHC computing resources is closely scrutinized: several times per year, in great detail
- any change (increase) needs to be thoroughly justified
Better LHC and experiment performance means more data:
- the first critical resource is tape to store raw data
- CERN added tape in 2016 to allow the overrun (all LHC experiments)
- ALICE is *NOT* allowed to take more pp data
ALICE Run2: expectations for 2017-2018
[Figure labels: pp ~17.5 PB; Pb-Pb 2018 ~12 PB.]
- pp: the data taking mode will be set to limit the TPC readout rate to 400 Hz; the total amount of data recorded will be 17.5 PB
- Pb-Pb run in 2018: assuming an HLT compression factor of 6 and a total readout rate of 10 GB/s, the total amount of data recorded will be 12 PB
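As a rough cross-check of the Pb-Pb figure, the quoted volume and rate imply an effective recording time. The sketch below assumes that 10 GB/s is the rate actually written to storage and that decimal units apply (1 PB = 10^6 GB); neither is stated on the slide, and the helper name is purely illustrative.

```python
# Hypothetical helper: recording time implied by a data volume and a rate.
# Assumptions (not stated on the slide): 10 GB/s is the rate written to
# storage, and decimal units are used (1 PB = 1e6 GB).
GB_PER_PB = 1_000_000
SECONDS_PER_DAY = 86_400


def implied_recording_seconds(volume_pb: float, rate_gb_per_s: float) -> float:
    """Seconds of continuous recording needed to accumulate volume_pb."""
    return volume_pb * GB_PER_PB / rate_gb_per_s


pbpb_seconds = implied_recording_seconds(volume_pb=12, rate_gb_per_s=10)
pbpb_days = pbpb_seconds / SECONDS_PER_DAY
print(f"{pbpb_seconds:.2e} s, about {pbpb_days:.1f} days of continuous recording")
```

Under these assumptions the 12 PB correspond to roughly 1.2 million seconds of recording at the full rate.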
ALICE Run2: requests/allocations 2017
CSN3 allocations (Tier-2):
- requests: 438 k€ (390 growth and replacements + 48 overhead)
- inventory allocations: 376 k€
  - full net growth guaranteed (287 k€)
  - part of the replacements at LNL-PD, CT and TO (74 k€)
  - 30% of the requested overhead (15 k€)

                                  CPU Tier-1   DISK Tier-1  TAPE Tier-1  CPU Tier-2   DISK Tier-2
                                  (HS06)       (TB)         (TB)         (HS06)       (TB)
Pledged T1 / avail. - decomm. T2  29045        3885         5491         41493        4285
Scrutinized ALICE 2017            40885        4625         9940         50875        5791
Delta                             11840        740          4449         9382         1506
Estimated cost (k€)               94.7         148.0        111.2        89.1         301.1
Total (k€)                        353.9 (Tier-1)                         390.2 (Tier-2)
Overhead T2 (k€)                                                         47.7
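The 2017 figures can be checked for internal consistency: each delta should equal the scrutinized value minus the pledged one, and the per-tier totals should equal the sum of the cost estimates. A minimal sketch, with the numbers copied from the slide and a dictionary layout (keys such as `cpu_t1`) chosen only for illustration:

```python
# Figures copied from the 2017 table; the key names are illustrative.
pledged = {"cpu_t1": 29045, "disk_t1": 3885, "tape_t1": 5491,
           "cpu_t2": 41493, "disk_t2": 4285}
scrutinized = {"cpu_t1": 40885, "disk_t1": 4625, "tape_t1": 9940,
               "cpu_t2": 50875, "disk_t2": 5791}
quoted_delta = {"cpu_t1": 11840, "disk_t1": 740, "tape_t1": 4449,
                "cpu_t2": 9382, "disk_t2": 1506}
cost_keur = {"cpu_t1": 94.7, "disk_t1": 148.0, "tape_t1": 111.2,
             "cpu_t2": 89.1, "disk_t2": 301.1}

# Delta = scrutinized request minus what is already pledged or available.
delta = {key: scrutinized[key] - pledged[key] for key in pledged}

# Per-tier totals, rounded to the one decimal used on the slide.
total_t1 = round(sum(v for k, v in cost_keur.items() if k.endswith("_t1")), 1)
total_t2 = round(sum(v for k, v in cost_keur.items() if k.endswith("_t2")), 1)

print("deltas match the slide:", delta == quoted_delta)
print("Tier-1 total (kEUR):", total_t1)
print("Tier-2 total (kEUR):", total_t2)
```

The Tier-2 total plus the 47.7 k€ overhead gives the 438 k€ request quoted in the first bullet, after rounding.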
ALICE Run2: requests/allocations 2017 (continued)
Further requests in October 2016:
- mainly motivated by the LHC performance (not for ALICE)
- ~20-30% additional with respect to the April 2016 CRSG, for all experiments
- ALICE: 90 k€ for T1 and 160 k€ for T2
- only TAPE (all) and DISK (50%) recognized: ~30 k€ charged to the GE
ALICE Run2: requests (preliminary) 2018
As from the last RRB (October 2016), under revision:

                          CPU Tier-1   DISK Tier-1  TAPE Tier-1  CPU Tier-2   DISK Tier-2
                          (HS06)       (TB)         (TB)         (HS06)       (TB)
Pledged 2017              38295        4477         10815        50875        5791
Preliminary ALICE 2018    56610        6105         18305        81030        7752
Delta                     18315        1628         7490         30155        1961
Estimated cost (k€)       146          325          187          286          392
Total (k€)                658 (Tier-1)                           678 (Tier-2)
Overhead T2 (k€)                                                 84

- Cost estimates: 10 € / HS06 and 200 € / TB (2017), possible ~10% reduction (to be discussed with the LHC computing referee board)
- Decommissioning: not included (ReCaS CPU …)
- Reflects the share not allocated in 2017; a similar reduction is likely ...
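Dividing each quoted cost by the corresponding delta gives the effective unit cost behind the 2018 estimate. The sketch below uses only the numbers on the slide; the key names are illustrative, and costs are quoted in rounded k€, so the results are approximate.

```python
# Deltas and cost estimates copied from the 2018 table; key names are illustrative.
delta = {"cpu_t1": 18315, "disk_t1": 1628, "tape_t1": 7490,
         "cpu_t2": 30155, "disk_t2": 1961}
cost_keur = {"cpu_t1": 146, "disk_t1": 325, "tape_t1": 187,
             "cpu_t2": 286, "disk_t2": 392}

# Effective unit cost in EUR per HS06 (CPU) or per TB (disk, tape).
unit_cost_eur = {key: 1000 * cost_keur[key] / delta[key] for key in delta}

for name, value in unit_cost_eur.items():
    print(f"{name}: {value:.1f} EUR per unit")
```

With these inputs the disk columns come out very close to the quoted 200 € / TB, while the CPU columns come out somewhat below the quoted 10 € / HS06; the slide does not say which factors account for the difference.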
ALICE Run2: requests (preliminary) 2018 etc.
Discussion with the CRSG in progress:
- new requests document (2017 and 2018) in preparation
- so far only the estimate for special runs has been defined
- detailed estimate in progress for most of the data taking: number of events and event size for each category
- final estimates for the next CRSG/RRB (within one month)
Figures estimated in January 2016 for CSN3:
- 2017: ~500 k€
- 2018: ~650 k€
- 2019: ~600 k€ *
- 2020: ~550 k€ *
ReCaS replacements (BA, CT):
- 20 kHS06 (~200 k€)
- 1.5 PB (~300 k€)
* Limited increases (~10%), mainly replacements.
ALICE Run3: ALICE upgrade
New conditions after LS2 (2019-2020):
- expected peak interaction rate: 50 kHz (now 8 kHz)
- no reliable trigger strategies for several physics channels
Goals for Run3:
- increase the readout rate to 50 kHz (now ~1 kHz)
- improve the pointing resolution both in the barrel (new ITS) and in the forward muon arm (new Muon Forward Tracker, MFT)
Capability to reduce online the data volume delivered by the detectors, since the expected integrated luminosity is > 10 nb^-1 for Pb-Pb (x100 wrt Run1)
New ITS:
- 7 pixel layers
- 10 m2 of silicon
- 12.5 G pixels
ALICE Run3: O2 system
- expected data rate: ~1.1 TB/s
- the O2 project aims to integrate DAQ, HLT and Offline (for the reconstruction part) in a single infrastructure
- O2 TDR approved by the LHCC in September 2015
- the data volume coming from the detectors must be substantially reduced before sending data to mass storage: online processing is the only option
- the computing strategy must rely on a heterogeneous architecture to match the interaction rate:
  - ~250 FLP worker nodes equipped with FPGAs
  - ~1500 EPN worker nodes equipped with GPUs
- yearly amount of data ( ): ~50 PB
ALICE Run3: O2 system (continued)
Impressive online data volume reduction for the TPC:
- zero suppression
- clustering and compression
- removal of clusters not associated with interesting particle tracks (e.g. very low momentum electrons)
- data format optimization
(largely based on the present HLT results)
ALICE Run3: computing centers in Run3
Grid Tiers mostly specialized for a given role:
- O2 facility: 2/3 of reconstruction and calibration
- T1s: 1/3 of reconstruction and calibration, archiving to tape
- T2s: simulation
AODs collected on a few specialized Analysis Facility (AF) sites:
- capable of processing ~5 PB data in ½ time scale
- typically (a fraction of) an HPC facility: ~10-20,000 cores / 5-10 PB of disk storage on a very performant file system
[Diagram labels: Reconstruction, Calibration, Archiving; Simulation; Analysis. CTF: Compressed Time Frame.]
GOAL: minimize data movement and optimize processing efficiency!
ALICE Run3: replication and deletion policy
No-replication policy:
- only one instance of each raw data file (CTF) stored on disk
- backup on tape (restore from tape in case of data loss)
Deletion policy:
- with the exception of raw data (CTF) and derived analysis data (AOD), all other intermediate data from the various processing stages are transient (removed after a given processing step) or temporary (limited lifetime)
- all CTFs stored on disk buffers (in O2 and T1s) in the previous year will have to be removed before new data taking starts
- all data not finally processed during this period will remain parked on tape until the next opportunity for re-processing arises (LS3)
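The deletion policy can be read as a small classification rule. The following is only an illustrative sketch of that reading, not the O2 implementation: the `Retention` enum, the function names, the `limited_lifetime` flag and the example years are all hypothetical.

```python
from enum import Enum


class Retention(Enum):
    KEPT = "kept"            # raw data (CTF) and derived analysis data (AOD)
    TRANSIENT = "transient"  # removed after a given processing step
    TEMPORARY = "temporary"  # limited lifetime


# Data types exempt from deletion according to the slide.
KEPT_TYPES = {"CTF", "AOD"}


def retention_for(data_type: str, limited_lifetime: bool = False) -> Retention:
    """Classify a data type; anything other than CTF/AOD is intermediate."""
    if data_type.upper() in KEPT_TYPES:
        return Retention.KEPT
    return Retention.TEMPORARY if limited_lifetime else Retention.TRANSIENT


def ctf_disk_copy_must_be_removed(recorded_year: int, next_run_year: int) -> bool:
    """CTFs buffered on disk in a previous year go before new data taking starts."""
    return recorded_year < next_run_year


print(retention_for("CTF"))
print(retention_for("intermediate"))
print(retention_for("intermediate", limited_lifetime=True))
print(ctf_disk_copy_must_be_removed(recorded_year=2021, next_run_year=2022))
```

Removing a CTF from the disk buffer does not lose the data, since the policy keeps a tape backup from which it can be restored.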
ALICE Run3: expected resource needs
Estimates based on:
- the no-replication and deletion policies just mentioned
- an online compression factor of ~16
Resulting in:
- ~20% yearly growth during Run3
- ~x2 the resources at the end of Run3 wrt the end of Run2
Pessimistic estimates, based on a compression factor of ~12:
- ~27% yearly growth during Run3, ~x2.5 at the end of Run3 wrt Run2
Caveat: AFs are outside these evaluations:
- 2-3 centers, progressive deployment (no need for full size at Run3 start)
- impact: 10-15% of the total WLCG resources
- provided as an in-kind contribution to the experiment?
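The link between the yearly growth and the end-of-Run3 factor is plain compound growth. A minimal sketch, assuming four growth steps over Run3 (the number of years is an assumption, not stated on the slide) and a hypothetical helper name:

```python
def growth_factor(yearly_growth: float, years: int) -> float:
    """Cumulative resource factor after compounding a constant yearly growth."""
    return (1.0 + yearly_growth) ** years


RUN3_YEARS = 4  # assumption: the slide does not give the number of years

baseline = growth_factor(0.20, RUN3_YEARS)     # compression factor ~16
pessimistic = growth_factor(0.27, RUN3_YEARS)  # compression factor ~12

print(f"baseline: x{baseline:.2f}, pessimistic: x{pessimistic:.2f}")
```

With four steps this gives factors of about 2.1 and 2.6, in line with the quoted ~x2 and ~x2.5; a different number of years would shift both values.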
Conclusions
Continuation of Run2:
- consolidated computing model, stable activity
- estimates for the next 3 years: ~500 k€ / year for T2
- T1 is a separate matter: estimates of the same order as, or slightly higher than, those for T2
- INFN-CINECA agreement for 2018 in the process of approval:
  - all computing nodes for LHC (growth and replacements) on A1 at CINECA
  - 100 kHS06 overall, for 2018 only (value 1 M€)
  - ~half of the CNAF CPU capacity, expected cost ~1/4
  - first step towards an evolution of the infrastructure
Prospects for Run3:
- new computing model, resources within x2.5 wrt Run2
- AFs a non-marginal share, in addition