Otranto, 8/6/06. M. Paganoni. The CMS Tier2 Federation.

Presentation transcript:

Slide 1. The CMS Tier2 Federation. M. Paganoni

Slide 2. The CMS Tier2 Federation (twiki page coming soon)
- Legnaro-Padova and Roma are approved as CMS Tier2s
- Pisa is a Tier2 sub judice (infrastructure cost)
- Bari is a proto-Tier2 (infrastructure definition and local OK)
- Funding for 2006 will be discussed at the July CSN1 meeting
- All 4 centres contribute to CMS, with strong support from their reference communities (including Tier3s)

Slide 3. Tier2 Legnaro-Padova
- 76 computing nodes (152 CPUs), most of them in 5 Intel Blade Centers (with dual Xeon from 2.4 GHz to 3.0 GHz), plus some dual-core Opteron 275 (~200 kSI2k)
- Old "production" storage: disk servers with 3ware RAID arrays, access through the 'classic' rfio protocol (16 TB)
- New storage (under a storage management system, currently DPM, not yet in production for CMS):
  - ~5 TB in old 3ware servers (used in SC3)
  - ~7 TB in our new SAN infrastructure (FC controllers + SATA/FC disk boxes): the first components have just been installed, we need to build experience with them, and we plan to use them in SC4

Slide 4. Tier2 Roma
- 11 WNs for a total of 23 kSI2k, plus 3 service machines (CE, UI, Squid)
- Hardware ranges from PIII (being phased out) to dual-core Opterons
- NAS servers, 16 TB effective:
  - 2 for local use (6 TB)
  - 2 for Grid use (3 TB classic SE, 7 TB DPM SE)

Slide 5. Current status of the Tier2s

Slide 6. 2006 requests for the Tier2s

Slide 8. The roles of Tier0, Tier1 and Tier2 for CMS
Tier0 (CERN):
- safekeeping of RAW data (first copy);
- first-pass reconstruction;
- distribution of RAW and RECO to the Tier1s;
- reprocessing of data during LHC down-times.
Tier1 (ASCC, CCIN2P3, FNAL, GridKA, INFN-CNAF, PIC, RAL):
- safekeeping of a proportional share of RAW and RECO (2nd copy);
- large-scale reprocessing and safekeeping of the output;
- distribution of data products to Tier2s and safekeeping of a share of the simulated data produced at these Tier2s.
Tier2 (~40 centres):
- handling analysis requirements;
- proportional share of simulated event production and reconstruction.

Slide 9. Service Challenge 4
- The SC4 goal is to bring the distributed computing infrastructure to a production-level service (WLCG)
- April: throughput phase for disk-to-disk and disk-to-tape transfers
- May: roll-out of gLite 3.0
- First two weeks of June: CMS will complete a computing model functionality test (rerun of the functionalities missing in SC3)
- Last two weeks of July: integration tests
- First two weeks of September: CMS will prepare CSA06 (see next slides)

Slide 10. Tier1-Tier2 transfer activities for SC4
- Tier-1 to Tier-2: very bursty and driven by analysis
  - Goal is to reach from 10 MB/s (worst Tier-2s) to 100 MB/s (best Tier-2s) by June
- Tier-2 to Tier-1: continuous simulation transfers
  - Goal is to reach 10 MB/s from Tier-2s to Tier-1 centers (1 TB per day)
- The PhEDEx-FTS integration has been achieved
- Two tools (Heartbeat and transfer activity) help CMS with the continuous transfers
- CMS distributed analysis uses the CMS Remote Analysis Builder (CRAB), now interfaced to CMSSW
- Trivial file catalogs also work
- The goal is kjobs/day
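
As a quick check of the Tier-2 to Tier-1 figure above, the sketch below converts a sustained transfer rate into a daily volume. It assumes decimal units (1 TB = 10^6 MB) and a transfer that runs for the full 24 hours; the function name `daily_volume_tb` is illustrative and not part of any CMS tool.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Convert a sustained rate in MB/s into TB/day (assumes 1 TB = 1e6 MB)."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1_000_000


# The two rates quoted as SC4 goals.
for rate in (10, 100):
    print(f"{rate} MB/s sustained -> {daily_volume_tb(rate):.2f} TB/day")
```

Under these assumptions 10 MB/s corresponds to about 0.86 TB/day, consistent with the rounded "1 TB per day" quoted above, and 100 MB/s to about 8.6 TB/day.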

Slide 11. First outcomes from SC4
- The difficult part has been the end-to-end system and maintaining the rates over long periods of time
- It takes too long to get going and too much effort to keep going
- Even if the challenge has concentration periods, we need a continuous effort to make things work and scale
- Need a CMS coordinator to monitor PhEDEx and a service coordinator to monitor FTS (shifts?)
- More application failures come from data publishing and data access problems than from problems with grid submission
- Need more testing of the new event data model and data management infrastructure

Slide 12. Goals of SC4
Transfers
- Demonstration of PhEDEx driving FTS at EGEE sites
- Demonstration of data administration on sites:
  - Transfer into Trivial File Catalog and access data
  - Remove data from site
  - Request new data for site
- Achieve Tier-1 to Tier-2 transfers at all permutations
Analysis workflow
- CRAB access to CMSSW data at all sites
- Bulk submission use of gLite
- Achieve more than 1k successful jobs/day on all Tiers
Production workflow
- Submission to all participating LCG and OSG sites and return of results
- Data registration in DBS and import to PhEDEx for replication to CERN

Slide 13. Computing, Software, & Analysis Challenge 2006
- A 50 million event exercise to test the workflow and dataflow associated with the data handling and data access model of CMS
- Receive from HLT (previously simulated) events with online tag at 25% of the HLT bandwidth (35-40 Hz)
- Prompt reconstruction at Tier-0, including determination of calibration constants (some FEVT and all AOD to the Tier-1s)
- Streaming of ~7 physics datasets (local creation of AOD and distribution to all Tier-1s)
- Physics jobs on AOD at some Tier-1s
- Skim jobs at some Tier-1s, with data propagated to Tier-2s to run physics jobs there (50 kjobs/day in total)
Wide-scale system test of software-computing synchronization at the production level, focusing on the early-data scenario. Performance metrics under scrutiny.
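
A rough sketch of what these CSA06 numbers imply. It assumes that "25% of the HLT bandwidth" is a plain fraction of the event rate and that the 50 million events are streamed continuously at a constant rate; neither assumption is stated in the presentation, and the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def full_hlt_rate_hz(test_rate_hz: float, fraction: float = 0.25) -> float:
    """Event rate at 100% of the HLT bandwidth, given the rate used in the test."""
    return test_rate_hz / fraction


def days_to_stream(n_events: int, rate_hz: float) -> float:
    """Days needed to stream n_events at a constant rate (assumes no downtime)."""
    return n_events / rate_hz / SECONDS_PER_DAY


# The two ends of the 35-40 Hz range quoted for the test.
for rate in (35, 40):
    print(
        f"{rate} Hz in the test -> {full_hlt_rate_hz(rate):.0f} Hz at full bandwidth, "
        f"{days_to_stream(50_000_000, rate):.1f} days for 50M events"
    )
```

With these assumptions the test rate corresponds to a full HLT output of 140-160 Hz, and streaming all 50 million events would take roughly two weeks of uninterrupted running.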

Slide 14. Timescale foreseen for CSA06
- Simulation software ready for CSA06; computing systems ready for SC4
- Physics validation complete
- Start simulation production (25M minbias; 5M electrons; 5M muons; 5M jets; 5M HLT "cocktail"; 5M miscalibrated/misaligned)
- Calibration, alignment, HLT, reconstruction, and analysis tools ready
- 50 Mevt produced, 5M with HLT pre-processing
- Computing systems ready for CSA06
- Start CSA06
- Finish CSA06
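
The sample sizes listed for the simulation production can be checked against the 50 Mevt total with a few lines of arithmetic. The dictionary keys simply mirror the labels in the text above; nothing here comes from CMS software.

```python
# Planned simulation samples, in millions of events, as listed above.
SAMPLES_MEVT = {
    "minbias": 25,
    "electrons": 5,
    "muons": 5,
    "jets": 5,
    "HLT cocktail": 5,
    "miscalibrated/misaligned": 5,
}


def total_mevt(samples: dict) -> int:
    """Total number of events (in millions) over all samples."""
    return sum(samples.values())


total = total_mevt(SAMPLES_MEVT)
print(f"total: {total} Mevt")
for name, size in SAMPLES_MEVT.items():
    print(f"  {name}: {size} Mevt ({100 * size / total:.0f}% of the total)")
```

The samples add up to the quoted 50 Mevt, with minbias accounting for half of the production.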

Slide 15. Resources needed for CSA06
Taking into account that 40% of the resources are located at the Tier-2s and that CSA06 is a test at 25% of what is needed in 2008:
➨ 100 CPUs per Tier-2
➨ 25 TB per Tier-2
➨ MB/s to each Tier-2
Should test most of the possible Tier-1/Tier-2 permutations.
The pre-production of MC events is on the critical path.
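
Read the other way round, the 25% scale factor gives an idea of the per-Tier-2 resources implied for 2008. The sketch below assumes that requirements scale linearly with the fraction, which the presentation does not state explicitly; the bandwidth figure is left out because its value does not appear in the text above. The names are illustrative.

```python
CSA06_FRACTION = 0.25  # CSA06 is described as a test at 25% of the 2008 needs

# Per-Tier-2 resources quoted for CSA06.
CSA06_PER_TIER2 = {"CPUs": 100, "disk_TB": 25}


def full_scale(value: float, fraction: float = CSA06_FRACTION) -> float:
    """Extrapolate a CSA06 figure to 100% scale, assuming linear scaling."""
    return value / fraction


for resource, value in CSA06_PER_TIER2.items():
    print(f"{resource}: {value} for CSA06 -> {full_scale(value):.0f} at full scale")
```

Under the linear-scaling assumption this gives about 400 CPUs and 100 TB per Tier-2 at full scale.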

Slide 16. Coordination of activities
- Weekly phone conference (Mondays, 14:30)
- Periodic meetings of the Tier2 reference communities (e.g. Roma)
- Meetings at CNAF for Tier1 and Tier2 coordination
- Meetings at CERN to coordinate activities with CMS and WLCG (SC4, CSA06, ...)
- Contacts with other computing centres of the collaboration (Lyon, DESY, Barcelona, ...)
- Dashboard (web page or wiki)
- In addition to the local managers, each Tier2 identifies the people who act as site managers for CRAB, PhEDEx and MC production

Slide 17. CRAB site manager
- Keeps in contact with the developer community
  - To define when an upgrade is needed, follow up on problems, ...
- Keeps in contact with the user community
  - Specific needs? Requests? Support?
- Installation/configuration and maintenance
  - Understand whether there are specific needs
    - Software to install on the machines?
    - Dedicated queue configurations?
- In contact with the national CRAB coordinator (S. Lacaprara)

Slide 18. PhEDEx site manager
- Manages the day-to-day operations of PhEDEx
  - Checks logs for problems, ...
- Requests the injection of new files, in response to CMS requests from the Tier2's "local" user community
- Manages the injection into PhEDEx of the files produced by the T2
- Acts as contact point with the PhEDEx developers and with the PhEDEx managers of the Tier1s and the other Tier2s
- Determines specific needs
  - Insufficient disk space? ...
- System installation/configuration and maintenance of PhEDEx
- In contact with the national PhEDEx coordinator (D. Bonacorsi)

Slide 19. MC production site manager
Manages the official MC production of the T2, interfacing with CMS
- Requests new datasets when a production is complete
- Verifies that the transfer of the produced data completed successfully
- Optimizes the use of resources (CPU, disk, ...)
- Day-to-day production tasks
  - Log checking, failed productions and resubmissions where needed, ...
- Handles update requests for the production software
  - Interfacing with the CMS Software Manager
- Requests system maintenance when necessary
- In contact with the national MC production coordinator (S. Gennai)

Slide 20. The Tier2 and the GRID infrastructure
CMS user point of view:
1. Hidden interface to distributed data and resources (CRAB)
2. Standard and unified support interface (GGUS ticketing system)
3. Advanced policy management (in the near future)
   - Dynamic allocation of resources for production and analysis tasks
   - Dynamic allocation of resources for CMS analysis groups
Tier2 administrator point of view:
1. The Tier2 infrastructure can be built upon the standard grid farm infrastructure (maintained by grid people) by sharing hardware and middleware support
2. User access, authentication and management done by the GRID middleware
3. Grid infrastructure controlled and monitored 24 hours/day, 7 days/week
4. Automatic discovery of problems related to job submission, data management, etc., handled by OMC, CIC and ROC support (via ticketing system)
   - ROC shifts
   - GridICE notification
5. Shared interface for error handling of user-related problems and infrastructure failures
6. Standard information system to publish farm configuration and software tags

Slide 21. Open questions
Storage management (dCache/DPM/STORM)
- DPM is currently preferred by the Tier2s for the simplicity of its interface, but its scalability is not guaranteed
  - It has interface problems with srm (implementation for castor) and other missing functionality
- dCache requires more complex local expertise, but scales to more complex systems
- STORM is under development
Local databases
- Trivial Catalogue or local implementation of LFC?
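
To make the "Trivial Catalogue" option concrete: the idea is that a site maps logical file names (LFNs) to physical file names (PFNs) with a few pattern rules instead of keeping one catalogue entry per file in a service such as LFC. The sketch below is a minimal illustration of that idea only. The rule format, the host `se.example.org`, the paths and the function name `lfn_to_pfn` are all hypothetical and do not reproduce the actual CMS configuration.

```python
import re

# Hypothetical rules: (regex matched against the LFN, PFN template).
# The storage host and path prefix are made up for illustration.
RULES = [
    (r"^/store/(.*)$", r"srm://se.example.org/data/cms/store/\1"),
]


def lfn_to_pfn(lfn: str, rules=RULES) -> str:
    """Return the PFN produced by the first rule whose pattern matches the LFN."""
    for pattern, template in rules:
        match = re.match(pattern, lfn)
        if match:
            # Substitute the captured groups into the PFN template.
            return match.expand(template)
    raise LookupError(f"no rule matches {lfn!r}")


print(lfn_to_pfn("/store/mc/sample/file.root"))
```

The trade-off hinted at in the open question is that rules like these need no local database, but they only work while the physical layout of the storage follows the logical namespace.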

Slide 22. Conclusions
- We are putting together the structure of the Federation
- The main difficulty lies in the decision process at many levels (local Tier2, Tier2 Federation, experiment, GRID)
- We need CCR to continue its support, especially on system management aspects and on advice for tenders