Otranto, 8/6/06. M. Paganoni. The federation of the CMS Tier2s.

1 The federation of the CMS Tier2s. M. Paganoni

2 The federation of the CMS Tier2s
A twiki page will be available shortly
Legnaro-Padova and Roma are approved as CMS Tier2s
Pisa is a Tier2 sub judice (infrastructure cost)
Bari is a proto-Tier2 (infrastructure still to be defined, local approval pending)
Funding for 2006 will be discussed at the CSN1 meeting in July
All 4 centres contribute to CMS, with strong support from their reference communities (including Tier3s)

3 Tier2 Legnaro-Padova
76 computing nodes (152 CPUs), most of them in 5 Intel Blade Centers (dual Xeon, 2.4 GHz to 3.0 GHz), plus some dual-core Opteron 275 (~200 kSI2K)
Old "production" storage: disk servers with 3ware RAID arrays, accessed through the 'classic' rfio protocol (16 TB)
New storage (under a storage management system, currently DPM, not yet in production for CMS):
– ~5 TB in old 3ware servers (used in SC3)
– ~7 TB in our new SAN infrastructure (FC controllers + SATA/FC disk boxes): the first components have just been installed; we need to build experience with them and plan to use them in SC4

4 Tier2 Roma
11 WNs for a total of 23 kSI2k, plus 3 service machines (CE, UI, Squid)
CPUs range from PIII (being phased out) to dual-core Opteron 275
4 NAS servers, 16 TB effective:
– 2 for local use (6 TB)
– 2 for Grid use (3 TB classic SE, 7 TB DPM SE)

5 Current status of the Tier2s

6 2006 requests for the Tier2s

8 The roles of Tier0, 1, 2 for CMS
Tier0 (CERN):
– safekeeping of RAW data (first copy);
– first-pass reconstruction;
– distribution of RAW and RECO to the Tier1s;
– reprocessing of data during LHC down-times.
Tier1 (ASCC, CCIN2P3, FNAL, GridKA, INFN-CNAF, PIC, RAL):
– safekeeping of a proportional share of RAW and RECO (2nd copy);
– large-scale reprocessing and safekeeping of the output;
– distribution of data products to the Tier2s and safekeeping of a share of the simulated data produced at these Tier2s.
Tier2 (~40 centres):
– handling analysis requirements;
– proportional share of simulated event production and reconstruction.

9 Service Challenge 4
The goal of SC4 is to bring the distributed computing infrastructure to a production-level service (WLCG)
April: throughput phase for disk-to-disk and disk-to-tape transfers
May: roll-out of gLite 3.0
First two weeks of June: CMS will complete a computing model functionality test (a rerun of the functionalities missing in SC3)
Last two weeks of July: integration tests
First two weeks of September: CMS will prepare CSA06 (see next slides)

10 Transfer activities Tier1-Tier2 for SC4
Tier-1 to Tier-2: very bursty and driven by analysis
– The goal is to reach between 10 MB/s (worst Tier-2s) and 100 MB/s (best Tier-2s) by June 2006
Tier-2 to Tier-1: continuous simulation transfers
– The goal is to reach 10 MB/s from the Tier-2s to the Tier-1 centres (1 TB per day)
The PhEDEx-FTS integration has been achieved
Two tools (Heartbeat and transfer activity) help CMS with the continuous transfers
CMS distributed analysis uses the CMS Remote Analysis Builder (CRAB), now interfaced to CMSSW
Trivial file catalogs also work
The goal is 25-50 kjobs/day
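The "10 MB/s (1 TB per day)" equivalence quoted on this slide can be checked with a short calculation. This is only a sketch, assuming decimal units (1 TB = 1,000,000 MB) and a rate sustained for a full 24 hours; the function name is illustrative and not taken from any CMS tool.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Volume in TB moved in one day at a sustained rate (decimal units assumed)."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1_000_000


# The two ends of the Tier-1 to Tier-2 target range quoted on the slide.
for rate in (10, 100):
    print(f"{rate} MB/s sustained -> {daily_volume_tb(rate):.2f} TB/day")
```

Under these assumptions 10 MB/s gives about 0.86 TB/day, so "1 TB per day" is a rounded figure, and the 100 MB/s target corresponds to roughly 8.6 TB/day.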

11 First outcomes from SC4
The difficult part has been the end-to-end system and maintaining the rates over long periods of time
It takes too long to get going and too much effort to keep going
Even if the challenge has concentrated periods of activity, we need a continuous effort to make things work and scale
We need a CMS coordinator to monitor PhEDEx and a service coordinator to monitor FTS (shifts?)
More application failures come from data publishing and data access problems than from problems with grid submission
We need more testing of the new event data model and of the data management infrastructure

12 Goals of SC4
Transfers
– Demonstration of PhEDEx driving FTS at EGEE sites
– Demonstration of data administration on sites: transfer into the Trivial File Catalog and access data; remove data from a site; request new data for a site
– Achieve Tier-1 to Tier-2 transfers in all permutations
Analysis workflow
– CRAB access to CMSSW data at all sites
– Use of gLite bulk submission
– Achieve more than 1k successful jobs/day on all Tiers
Production workflow
– Submission to all participating LCG and OSG sites and return of results
– Data registration in DBS and import into PhEDEx for replication to CERN

13 Computing, Software, & Analysis Challenge 2006
– A 50 million event exercise to test the workflow and dataflow associated with the data handling and data access model of CMS
– Receive (previously simulated) events from the HLT with online tag at 25% of the HLT bandwidth (35-40 Hz)
– Prompt reconstruction at Tier-0, including determination of calibration constants (some FEVT and all AOD to the Tier-1s)
– Streaming of ~7 physics datasets (local creation of AOD and distribution to all Tier-1s)
– Physics jobs on AOD at some Tier-1s
– Skim jobs at some Tier-1s, with data propagated to Tier-2s to run physics jobs there (50 kjobs/day in total)
Wide-scale system test of software-computing synchronization at the production level, focusing on the early data scenario. Performance metrics under scrutiny
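To get a feeling for the numbers on this slide, the sketch below estimates how long it would take to stream the 50 million events at the quoted 35-40 Hz, and which full HLT rate that implies if 35-40 Hz is indeed 25% of the bandwidth. It assumes a constant rate with no downtime; the function names are illustrative only.

```python
N_EVENTS = 50_000_000        # size of the CSA06 sample quoted on the slide
CSA06_FRACTION = 0.25        # CSA06 runs at 25% of the HLT bandwidth
SECONDS_PER_DAY = 86_400


def days_to_stream(n_events: int, rate_hz: float) -> float:
    """Days needed to deliver n_events at a constant rate (no downtime assumed)."""
    return n_events / rate_hz / SECONDS_PER_DAY


def implied_full_rate(rate_hz: float, fraction: float = CSA06_FRACTION) -> float:
    """Full HLT output rate implied by a rate that is `fraction` of the bandwidth."""
    return rate_hz / fraction


for rate in (35.0, 40.0):
    print(
        f"{rate:.0f} Hz: {days_to_stream(N_EVENTS, rate):.1f} days of continuous "
        f"streaming, implied full HLT rate {implied_full_rate(rate):.0f} Hz"
    )
```

With these assumptions the sample corresponds to roughly two weeks of continuous streaming, and the implied full HLT output rate is 140-160 Hz.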

14 Timescale foreseen for CSA06
1-6-06: Simulation software ready for CSA06; computing systems ready for SC4
15-6-06: Physics validation complete
1-7-06: Start simulation production (25M minbias; 5M electrons; 5M muons; 5M jets; 5M HLT "cocktail"; 5M miscalibrated/misaligned)
15-8-06: Calibration, alignment, HLT, reconstruction, and analysis tools ready
30-8-06: 50 Mevt produced, 5M with HLT pre-processing
1-9-06: Computing systems ready for CSA06
15-9-06: Start CSA06
15-11-06: Finish CSA06
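The sample composition and the production window on this slide can be cross-checked with a few lines. The sketch assumes the dates are written day-month-year (so 1-7-06 is 1 July 2006) and that production runs evenly over the whole window; the variable names are illustrative.

```python
from datetime import date

# Sample composition quoted on the slide, in millions of events.
samples_millions = {
    "minbias": 25,
    "electrons": 5,
    "muons": 5,
    "jets": 5,
    "HLT cocktail": 5,
    "miscalibrated/misaligned": 5,
}

total_millions = sum(samples_millions.values())

# Production window: start of simulation production to "50 Mevt produced".
production_start = date(2006, 7, 1)
production_end = date(2006, 8, 30)
production_days = (production_end - production_start).days

# Average rate needed if production were spread evenly over the window.
avg_millions_per_day = total_millions / production_days

print(f"Total sample: {total_millions}M events")
print(f"Production window: {production_days} days")
print(f"Average rate needed: {avg_millions_per_day:.2f}M events/day")
```

The six samples add up to the 50 Mevt target, and the 60-day window implies an average of a little over 0.8 million simulated events per day across all sites.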

15 Resources needed for CSA06
Taking into account that 40% of the resources are located at the Tier-2s and that CSA06 is a test at 25% of what is needed in 2008:
➨ 100 CPUs per Tier-2
➨ 25 TB per Tier-2
➨ 10-100 MB/s to each Tier-2
Most of the possible Tier-1/Tier-2 permutations should be tested
The pre-production of MC events is on the critical path
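As a rough reading of these figures, the sketch below scales the per-Tier-2 CSA06 requests up to the 2008 level and sums them over the Tier-2s. It assumes that needs scale linearly with the 25% factor, that all Tier-2s are of equal size, and that there are exactly 40 of them (the slides only say "~40 centres"); none of these assumptions is stated in the talk, so the results are indicative at best.

```python
CSA06_SCALE = 0.25   # CSA06 is a test at 25% of what is needed in 2008
N_TIER2 = 40         # assumption: the slides quote "~40 centres"

# Per-Tier-2 requests for CSA06, as listed on the slide.
csa06_per_tier2 = {"cpus": 100, "disk_tb": 25}


def scale_to_2008(value: float, scale: float = CSA06_SCALE) -> float:
    """Linear extrapolation from the CSA06 level to the 2008 level (assumed)."""
    return value / scale


for name, value in csa06_per_tier2.items():
    per_site_2008 = scale_to_2008(value)
    print(
        f"{name}: {value} per Tier-2 for CSA06, "
        f"{value * N_TIER2} over {N_TIER2} Tier-2s, "
        f"{per_site_2008:.0f} per Tier-2 at the 2008 level"
    )
```

Under these assumptions the CSA06 requests amount to about 4000 CPUs and 1 PB of disk across the Tier-2s, and to about 400 CPUs and 100 TB per Tier-2 at the 2008 level.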

16 Coordination of activities
Weekly phone conference (Mondays, 14:30)
Periodic meetings of the Tier2 reference communities (e.g. Roma, 18-5-06)
Meetings at CNAF for Tier1 and Tier2 coordination
Meetings at CERN to coordinate activities with CMS and WLCG (SC4, CSA06, …)
Contacts with other computing centres of the collaboration (Lyon, DESY, Barcelona, …)
Dashboard (web page or wiki)
In addition to the local coordinators, each Tier2 designates the people who act as site managers for CRAB, PhEDEx and MC production

17 CRAB site manager
– Keeps in contact with the developer community, to decide when an upgrade is needed, follow up on problems, ...
– Keeps in contact with the user community: specific needs? Requests? Support?
– Installation/configuration and maintenance: understands whether there are specific needs
  – Software to install on the machines?
  – Dedicated queue configurations?
– In contact with the national CRAB coordinator (S. Lacaprara)

18 PhEDEx site manager
– Handles the day-to-day operations of PhEDEx: checks logs for problems, ...
– Requests the injection of new files, in response to requests from CMS and from the community of "local" Tier2 users
– Handles the injection into PhEDEx of the files produced by the T2
– Acts as the point of contact with the PhEDEx developers and with the PhEDEx managers of the Tier1s and of the other Tier2s
– Identifies specific needs: insufficient disk space? ...
– System-level installation/configuration and maintenance of PhEDEx
– In contact with the national PhEDEx coordinator (D. Bonacorsi)

19 MC production site manager
Manages the official MC production of the T2, interfacing with CMS
– Requests new datasets when a production is complete
– Verifies that the transfer of the produced data completed successfully
– Optimizes the use of resources (CPU, disk, ...)
– Day-to-day production tasks: checking logs, failed productions and any resubmissions, ...
– Handles requests to update the production software, interfacing with the CMS Software Manager
– Requests system maintenance when necessary
– In contact with the national MC production coordinator (S. Gennai)

20 The Tier2 and the GRID infrastructure
CMS user point of view:
1. Hidden interface to distributed data and resources (CRAB)
2. Standard and unified support interface (GGUS ticketing system)
3. Advanced policy management (in the near future)
– Dynamic allocation of resources for production and analysis tasks
– Dynamic allocation of resources for CMS analysis groups
Tier2 administrator point of view:
1. The Tier2 infrastructure can be built upon a standard grid farm infrastructure (maintained by grid people) by sharing hardware and middleware support
2. User access, authentication and management are done by the GRID middleware
3. The Grid infrastructure is controlled and monitored 24 hours a day, 7 days a week
4. Automatic discovery of problems related to job submission, data management, etc., handled by OMC, CIC and ROC support (via ticketing system)
– ROC shifts
– GridICE notification
5. Shared interface for error handling of user-related problems and infrastructure failures
6. Standard information system to publish farm configuration and software tags

21 Open questions
Storage management (dCache/DPM/STORM)
– DPM is currently preferred by the Tier2s for its simple interface, but its scalability is not guaranteed. It has interface problems with srm (implementation for castor) and other missing functionality
– dCache requires more complex expertise locally, but scales to more complex systems
– STORM is under development
Local databases
– Trivial Catalogue or a local implementation of LFC?
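The idea behind a trivial catalogue, as opposed to a database-backed catalogue such as LFC, is that a site maps logical file names to physical ones with a small set of local rules rather than a per-file lookup. The sketch below only illustrates that idea: the rule, the paths and the function name are hypothetical and do not reproduce the actual CMS configuration format.

```python
import re

# Hypothetical site rules: (regex on the logical file name, template for the
# physical file name). A real site would define its own; these are made up.
RULES = [
    (r"^/store/(.*)$", r"/storage/cms/store/\1"),
]


def lfn_to_pfn(lfn: str, rules=RULES) -> str:
    """Return the physical name for a logical name using the first matching rule."""
    for pattern, template in rules:
        match = re.match(pattern, lfn)
        if match:
            # Substitute the captured groups into the template.
            return match.expand(template)
    raise LookupError(f"no rule matches {lfn!r}")


print(lfn_to_pfn("/store/mc/sample/file.root"))
```

Because the mapping is purely rule-based, no catalogue service has to be run or kept in sync at the site, which is the trade-off against a local LFC implementation raised on the slide.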

22 Conclusions
We are putting together the structure of the Federation
The main difficulty lies in the multi-level decision process (local Tier2, Tier2 Federation, experiment, GRID)
We need CCR to continue its support, especially on system management aspects and on advice for procurement tenders
