4/12/00 M. Mazzucato – GR1 - Roma

The INFN Regional Center(s) as the main node(s) of the INFN Grid
Mirco Mazzucato
On behalf of the experiments' computing coordinators and technical experts (~EB of INFN-Grid): Paolo Capiluppi, Domenico Galli, Alberto Masoni, Laura Perini, Fulvio Ricci, Antonia Ghiselli, Federico Ruggieri
Main conclusions of the Hoffmann Review
- Panel 1 recommends the multi-tier hierarchical model proposed by MONARC as one key element of the LHC computing model, with the majority of the resources not based at CERN: 1/3 in, 2/3 out, with about equal shares between the Tier0 at CERN, the Tier1s, and the lower-level Tiers down to desktops.
- General consensus that the GRID technologies developed by DataGrid can provide the way to realize this infrastructure efficiently.
- All experiments should perform Data Challenges of increasing size and complexity until LHC start-up, also involving the Tier2s. EU testbed: 30-50% of one LHC experiment by 2003 (matches well with the INFN Grid assumption of 10% of the final size for each experiment).
- Limit heterogeneity: OS = Linux plus a backup solution; persistency = 2 tools max.
HEP Regional Centre Hierarchy
[Diagram: the CERN Tier 0 linked at 2.5 Gbps to national Tier 1 centres (France, Italy/INFN-GRID, UK, Fermilab); Tier 2 centres connected at ~Gbps and 622 Mbps; Tier 3 sites at 2.5 Gbps; desktops (Tier 4) at 100 Mbps-1 Gbps.]
Tier computing centre functionalities (LHC Computing Review)
The MONARC model (first approximation) definitions:
- Tier0 (CERN): raw data storage and first calibration + reconstruction; very large storage capacity (MSS). Tier0+Tier1 at CERN: 10 PB/yr tape; 2 PB/yr disk; 2 M SI95.
- Tier1: further calibration + reconstruction passes; a large fraction of the simulation and analysis; large storage capacity; associated support; all services. Typical Tier1: 3 PB/yr tape; 0.5 PB/yr disk; 0.9 M SI95.
- Tier2 and lower levels: simulation and analysis. Typical Tier2 = 20-30% of a Tier1.
Tier0 and Tier1 are basically open to all members of a collaboration; a MoU will specify the conditions. Tier1s will then be needed for the lifetime of LHC.
Detailed quantitative estimates of resources and costs are not touched in this talk; see F. Ruggieri's talk for preliminary INFN estimates.
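Taking the "typical Tier1" figures above, the 20-30% rule gives a rough envelope for a Tier2. A back-of-the-envelope sketch (the fractions and capacities are the slide's; the helper function and names are illustrative):

```python
# "Typical Tier1" capacity from the LHC Computing Review figures above.
tier1 = {"tape_PB_per_yr": 3.0, "disk_PB_per_yr": 0.5, "cpu_MSI95": 0.9}

def tier2_envelope(t1, lo=0.20, hi=0.30):
    """A Tier2 is quoted as 20-30% of a typical Tier1."""
    return {k: (lo * v, hi * v) for k, v in t1.items()}

envelope = tier2_envelope(tier1)
for name, (low, high) in envelope.items():
    print(f"{name}: {low:.2f} to {high:.2f}")
```

So a typical Tier2 would sit around 0.6-0.9 PB/yr of tape, 0.10-0.15 PB/yr of disk and 0.18-0.27 M SI95.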
R&D in Italy: THE INFN-GRID PROJECT
- The proposal was submitted to INFN management and the CSNs at the end of July and presented to the CSNs in September.
- Size of the project: 26 sites, ~200 people, ~70 FTEs.
- Requested funding for 3 years: ~10 M Euro.
- Activities are organized at the national level around the WPs developed by DataGrid (9.8 ME), testbeds and experiment applications, aiming to provide multi-tier integration.
- Large-scale testbeds provided by the LHC experiments (Regional Centre prototypes) + Virgo.
- Hope to receive a recommendation of approval by the CSNs before the end of 2000.
- Preliminary reserved financing of the basic HW infrastructure and R&D: ~2 M Euro for 2001; OK compared to EU partners.
Strategy of INFN-Grid
Foster close collaboration between the LHC experiments and computing professionals (~25 FTEs in INFN Grid).
Software and middleware:
- Evaluation and deployment of existing GRID basic services (Globus, Condor, ...), mainly looking for production quality. Robustness, scalability (hundreds of users, hundreds of jobs to run, huge data sets, ...) and reliability are very important requirements.
- Use of GRID software for real applications and in real experiment production environments (CMS has already started; ALICE, ATLAS and LHCb are coming soon).
- Implement missing functionalities in close collaboration with the US teams; incremental deliveries.
- Tight collaboration with the EU partners in DataGrid to develop the HEP middleware implementing the multi-tier MONARC model for the experiments, and to coordinate relations with the US projects: Globus, GriPhyN, PPDG, ...
Strategy of INFN-Grid (cont.)
Hardware infrastructure:
- Deploy testbeds using resources of all INFN sites, connected to the DataGrid testbed as recommended by Panel 1 of the Hoffmann Review: "...involve most of the final distributed computing system".
- Develop computing-fabric prototypes to understand issues related to:
  - architecture and hardware choices;
  - performance vs. cost of the different disk technologies: SAN/RAID/SCSI/EIDE = 10/5/3/1 in cost. Can we do everything with EIDE disks? Which technology is more adequate for each data set?
  - set-up, maintenance and support of large systems.
- Define the network topology and evaluate network technology and services.
The target prototypes: Regional Centres for the LHC experiments in INFN-Grid
All 4 LHC experiments have chosen to concentrate the bulk of their computing resources in one place for this prototyping phase:
- ATLAS: Rome
- CMS: Legnaro (INFN National Laboratories)
- ALICE: Torino
- LHCb (2001): Bologna
But since INFN has positive, advanced experience with distributed computing (e.g. the INFN Condor pool) and since INFN manpower (physicists and computing experts) is spread out, several Tier1/2 functionalities are supported by a few other sites:
- ATLAS: Milano
- CMS: Bologna, Bari, Padova, Pisa and Rome
- ALICE: Bologna, Bari, Catania
No common discussion has yet taken place in INFN Grid about a possible solution for the final Tier1 centres; the decision was postponed to the end of the prototyping phase.
Note: the Hoffmann Panels' conclusions foresee a Computing MoU by end 2001; INFN management and the CSNs would anticipate the decision on the RCs.
Hoffmann Review outcomes relevant for the INFN RCs
- For the first time an integrated world-wide computing system has been planned: Tier1s open to all members of the collaboration (up-time and efficiency!!).
- Very large weight given to the Tier(>=2) centres (contrary to the past!).
- The share of resources between the Tier1 and the Tier(>=2) centres is recognized to be an internal national affair.
- CERN IT will provide and support, as for LEP, only the basic SW and HW infrastructure at CERN: Tier0 + Tier1 for each experiment. Budget-limited personnel is foreseen to support experiment issues concerning the usage of this infrastructure.
- As for LEP, apparently there will be no IT support for specific experiment activities: software distribution, productions, simulations, data replication and distribution, analysis, etc. These are considered internal experiment issues.
The issue: one vs. more Tier1s in INFN
Some preliminary considerations on technology (almost all commodity and scalable):
- Computing and storage fabrics are built up from commodity components: simple PCs, inexpensive network-attached disks, standard network interfaces and probably a standard LAN backbone (whatever Fast/Giga Ethernet will be in 2006).
- High-bandwidth WAN connections may not become a commodity, but the EU Community and member countries will probably support a high-bandwidth research network for strategic reasons (see the EU Geant project, Garr-B, ...).
- Mass storage will probably not be a commodity, although future optical storage may change this.
- Very easy to split and re-group resources according to needs; the WAN in 2006 will run at Gbit/s, like the present PC bus.
- Different from 1988, when the scene was dominated by mainframes.
Assume the MSS is the only large, non-scalable single piece remaining: do we need it in INFN? A technical evaluation is needed!
One vs. more Tier1s in INFN (cont.)
Can we learn from other experiences? A lot from experienced HEP computing centres (CERN, BaBar, CDF, ...): technology, manpower, ... But we need to adapt the computing model to the INFN conditions:
- INFN's pool of computing professionals and know-how is not negligible at all (~25 FTEs in Grid) but is spread out over many different sites (26 in INFN Grid).
- The INFN Computing Services are used to supporting computing facilities and network connections (a lot of people and expertise... but again distributed).
- INFN-Grid provides a framework for a large common cooperative effort in computing. A common project allows us to profit best from all the distributed expertise. Very positive up to now: the sharing of work between sites seems as effective as for detector construction.
- In the past, very successful INFN-wide collaborations were carried out with INFNet, the Garr2 and Garr-B planning, and the INFN Condor pool.
- A very important role is again played by CNAF in providing central coordination and support for the DataGrid middleware development and the EU+INFN testbed deployment.
One vs. more Tier1s in INFN (cont.)
INFN is unique in Europe (more similar to the US):
- In France, expertise and manpower are mostly concentrated in Lyon, built up in the 80s around a non-scalable mainframe and very limited WAN bandwidth; they are now trying to establish more sharing (Marseille, Grenoble, ...).
- In the UK, professional computing manpower is today very limited.
- In Germany there is no central organization like INFN (computing is mainly done in university computing centres, like CINECA or CILEA in Italy), and DESY is not involved.
- In the Netherlands, computing skills are mostly concentrated in SARA (a multi-disciplinary computing centre).
OUR CONCLUSION ON RCs
- The choice should be made with the one and only objective of providing the most efficient and competitive way to perform data analysis for the INFN groups, taking into account the INFN reality and exploiting at best all available resources.
- Watch other experiences, but no blind duplication!!
- The choice should take into account as much as possible: the INFN structure; up-time and efficiency; the experiments' computing models; technology evolution and costs.
- The solution should provide a complete, detailed implementation plan covering the roles and commitments of the different sites, the manpower available and foreseen, the expertise, etc.
OPTIONS TO BE EVALUATED
At the first meeting it was decided to limit the evaluation to 3 options:
1. One multi-experiment Tier1 (ALICE, ATLAS, CMS, LHCb, Virgo), located in the Bologna area in a new (built or rented) computing facility set up by CNAF. A preliminary document has been prepared by Federico Ruggieri (yet to be discussed).
2. Two multi-experiment Tier1s, located in the National Laboratories of Frascati and Legnaro. Documents in preparation.
3. Three (or more) Tier1s according to the experiments' wishes. Compared to hypothesis 2, add (at least) one Tier1 in Torino.
General consensus that the Tier1(s) have to be considered special Grid nodes which will provide to the other experiment nodes (Tier2...Tier4) the functionalities that are missing or not convenient to duplicate (e.g. mass storage, system-support manpower, etc.). What matters is the integrated throughput of the overall system, not that of one single component.
Preliminary set of questions
For each of the 3 alternatives, the experiments were asked to provide a document answering the following preliminary set of questions:
- Global estimate of the resources needed in Italy / total for the experiment.
- An evaluation and a detailed computing model containing the role, resources, manpower, etc. of each candidate Tier(n) site.
- Relations between the Tier1 and the Tier-n sites (functionalities and localization).
- Need for mass storage and its localization.
- Space and services available and to be acquired.
- Manpower available from the experiments and the Computing Services to support the Tiers.
- Manpower to support the experiment applications (simulation, reconstruction, analysis).
- Implications of the Tier locations on network connectivity.
- Manpower for experiment user support.
- Cost estimation.
The candidate Tier1 sites were asked to produce a document estimating space, available and total manpower, costs, etc.
Present status
- Too short a time for the experiments and the candidate centres to produce documents discussed and agreed within the collaborations; a lot of activities are going on: the Hoffmann Review, DataGrid, INFN Grid, the Regional Centre prototypes.
- Need to define a timescale by which the studies and proposal(s) are completed (CMS: June 1st?).
- Very preliminary answers received: raw material coming independently from each experiment, not at all discussed yet! Shown here to give a flavour of the starting views.
REGIONAL CENTRE MODELS: THE POSITION OF ALICE ITALY
- Italian centre: CPU 450 kSI95, disk 400 TB (Tier-1 + Tier-2 combined).
- ALICE total: CPU 2100 kSI95, disk 1800 TB.
- Italian contribution to computing: 20% of the total.
- Computing shared among: CERN, France, Germany, Italy.
- Contribution of the Italian groups to the experiment: (France+Germany) (20% of the total).
- Currently: Italian groups 92 FTEs, 26 of them on computing.

MODELS CONSIDERED
The ALICE collaboration discussed (21-22/02/00) the possible alternatives for the Regional Centres. The conclusions of the discussion were:
- Having a Tier-1 centre dedicated to the experiment, plus a distribution of resources among several Tier-2s, appeared to be the solution best matching ALICE's needs.
- The optimal distribution of resources among the various Tiers will be worked out on the basis of the prototyping experience.
- A new discussion of the three models is currently under way.

TIER-1/2 CHARACTERISTICS
Power distribution, Tier-1 : (sum of Tier-2s) = 50% : 50% (cf. also the LHC Computing Review).
Tier-1 functions:
- collection and processing centre for the acquired data (100% of the reconstructed data and a fraction of the raw data);
- simulations;
- experiment support: code maintenance and data management; support of the Italian collaboration; links with the Tier-2s, with CERN and with the other Tier-1s;
- personnel dedicated to the experiment, with experiment-specific expertise.
If the Tier-1 does not coincide with an ALICE Tier-1/2 centre, these functions could partly move to the Tier-2s.

TIER-1/2 CHARACTERISTICS (cont.)
Tier-1 functions:
- tape robotics: yes, but the option of keeping the data on-line on disk is also being considered; its use will depend on the evolution of storage costs.
Tier-2 functions:
- simulations;
- collection and processing centre for the acquired data (a fraction of the reconstructed data);
- experiment support (if the Tier-1 does not coincide with an ALICE Tier-1/2 centre, cf. the previous slide);
- tape robotics: no.

CENTRE LOCATIONS
Hypotheses 1/2 (Tier-1 not dedicated to the experiment):
- Tier-1, hypothesis 1: CNAF; hypothesis 2: LNF, LNL.
- These two hypotheses are at the moment at the stage of preliminary discussion within the collaboration. Certainly (cf. the previous slides), in these two cases the organization of the experiment-specific support becomes relevant; part of it could be moved to the Tier-2s.
Hypothesis 3 (experiment Tier-1):
- Tier-1: Torino; Tier-2: Bari, Bologna, Catania; Tier-3: Cagliari, Catania, Padova, Salerno, Trieste.
N.B. The final choices on the sites and on the relative balancing of the resources will be taken on the basis of the prototyping results.
ATLAS and the INFN RCs: functions and requirements for the INFN Tier1
The main function of the ATLAS Tier1 will be to house the impressive amount of disk space needed for the ESD, as well as the bulk of the CPU needed; possibly, a huge MSS system will be required to store the full ESD sample.
ATLAS would accept sharing the Tier1 centre with one or more of the other LHC experiments. The requirements for the Tier1 RC are:
- up-time and efficiency;
- network connectivity;
- fair response to ATLAS priorities;
- fair sharing of resources with the other experiments.
These demands require that the Tier1 be supervised by a scientific body steered by the experiment(s).
ATLAS and the INFN RCs: number of Tier1s and of ATLAS Tier2s
ATLAS has no specific requirements on the total number of Tier1s in Italy; this has to be balanced against cost considerations.
In the preliminary report of the Distributed Computing Panel of the Hoffmann Review, the Tier1s outside CERN are assumed to account for ~1/3 of the computing resources (another ~1/3 for the Tier0+Tier1 at CERN, and ~1/3 for the lower Tiers).
ATLAS and the INFN RCs: ATLAS views about the INFN Tier1
In no ATLAS INFN site is enough manpower available to support a Tier1, as far as system management and operation are concerned. According to a 1998 estimate, the total manpower of this kind that could be extracted at LHC start-up from all the Italian ATLAS sites amounts to a few FTEs; in no site will as much as 2 full FTEs be available. A new survey will be performed in the coming months, taking into account possible reassignments of resources, e.g. from the LEP experiments; however, no huge variation is expected.
ATLAS thus needs to outsource this kind of manpower. We would also accept a solution where INFN provides for the housing, system management and operation of the ATLAS computing h/w in one or more INFN centre(s).
ATLAS and the INFN RCs: the manpower for s/w maintenance and user support
ATLAS thinks that the personnel who take care of the experiment s/w (installation, maintenance, user support, etc.) and of some core tools (DB, GRID, etc.) have to work in close contact with the physicists doing analysis, simulation, reconstruction, etc. At least the leading part of this personnel needs to have gained real experience with the experiment s/w, normally having also taken part in the development of some of its parts. Personnel with this profile can only be located and trained in the sites where there are groups of physicists actively working on ATLAS s/w and computing.
These requirements apply mostly to the initial phase; after the first years of running there may be less need for close contact with the physicists.
ATLAS and the INFN RCs: the manpower for s/w maintenance and user support (2)
As said before, ATLAS will benefit from close interaction between one or more ATLAS groups and the staff of the Tier1. This opportunity requires locating the Tier1 close to some ATLAS site. In the unfavourable case that this does not happen, a strong presence at the Tier1 site of software experts belonging to the ATLAS laboratories would be necessary. Therefore, in the case of a Tier1 far from the ATLAS sites, we would prefer to keep the s/w experts in the ATLAS groups, with at most 1 or 2 people based at the Tier1 site.
ATLAS and the INFN RCs: the manpower for s/w maintenance and user support (3)
The s/w professionals of the experiment thus have to be located mainly in the ATLAS groups, and in the Tier1 only if it happens to coincide with one of them. A fair fraction of these professionals will however have to work for ATLAS-Italy as a whole and not just for a single site:
- automatic installation and checks via the network will be possible with the GRID tools;
- the GRID project will entail h/w and procedure standardization across the different sites;
- the possibility for people at different sites to work efficiently on a common project will be enhanced by powerful collaborative tools.
ATLAS and the INFN RCs: the ATLAS Tier2s and Tier3s
Desktops alone will probably account for 10-15% of the CPU resources of an experiment. They will most probably be clustered around servers providing common s/w to all the users of a site (Tier3). In this context, a Tier2 differs from a Tier3 mainly because it serves users from different sites. Clearly the mode of operation will evolve with time, starting from one or a few prototypes used by all the groups.
The ATLAS sites which have already expressed interest in participating in the GRID experimentation with the aim of assuming a TierN (N>1) role are:
- Rome1, Milan, Naples, Pavia, Rome2, starting in 2001;
- Genova, Pisa, starting in 2002.
The s/w experts could cluster around the ATLAS Tier2 sites, which would provide h/w and system resources for testing and implementing the new developments (s/w, GRID, DB, etc.).
ATLAS and the INFN RCs: number of Tier1s and of ATLAS Tier2s
The discussion on the functions of the Tier2s has just started in ATLAS; the decisions taken so far in the different countries are mostly politically driven:
- some small countries will have no Tier1, and a Tier2 will be their main national computing infrastructure;
- some plan to have no Tier2 (Germany);
- some plan to have as many resources in Tier2s as in Tier1s (US).
The roles the Tier2s will take in Italy for analysis, simulation, etc. will depend on the site(s) chosen for the ATLAS Tier1, on the global ATLAS computing model, and on the opportunities offered by the GRID. In the next 3 years it is important that we set up some ATLAS sites providing Italy-wide services (Tier2 prototypes), both in human and in h/w resources; the natural candidates for such sites are today Rome1 and Milan.
ATLAS and the INFN RCs: ATLAS Tier2 functionality
As for the functions and the h/w resources: in the next months we will finalize the layout for the next 3 years (the GRID project span); for the LHC era, it will be decided in ~1 year from now (in time for the computing MoU).
CMS Tier0+Tier1 at CERN: storage resources
(columns: # of events; event size in MB; active tape / archive tape / disk in TB; "…" marks values lost in the source)

  Raw data                     1.E…    …   …/1000/0
  Rec. raw                     1.E…    …   …/0/200
  Calibration                  -       -   0/10/10
  Simulation (repository)      5.E…    …   …/1000/0
  Re-processed ESD             1.E…    …   …/200/200
  Rec. simulation              5.E…    …   …/200/30
  Reprocessed ESD (Tier1)      2.E…    …   …/100/40
  Revised ESD                  2.E…    …   …/100/40
  General AOD                  1.E…    …   …/10/10
  Revised AOD                  2.E…    …   …/2/2
  Local AOD, TAG, DPD          2.E+08  …   0/10/10
  Cache disk for active tapes  -       -   0/0/154
  User data                    -       -   0/0/100
  Total                                    1540/2632/796
CMS, each Tier1 outside CERN: storage resources
(columns: # of events; event size in MB; active tape / archive tape / disk in TB; "…" marks values lost in the source)

  Sim. output                  1.E+08  2.0  …/200/0
  Sim. rec.                    1.E…    …    …/0/30
  Raw sample                   5.E…    …    …/0/0
  Calibration                  -       -    0/10/10
  ESD                          1.E…    …    …/0/0
  Re-processed ESD             0.2E…   …    …/100/40
  Revised ESD                  0.2E…   …    …/100/40
  General AOD                  1.E…    …    …/10/10
  Revised AOD                  2.E…    …    …/2/2
  TAG                          1.E…    …    …/1/1
  Local AOD, TAG, DPD          2.E+08  …    0/10/10
  User data                    -       -    0/0/50
  Cache disk for active tapes  -       -    0/0/120
  Total                                     590/433/313
CMS, each Tier2 outside CERN: storage resources
(active tape / archive tape / disk, in TB)

  Local cached data (real + simulated)  0/0/50
  User data                             0/50/20
  Total                                 0/50/70

A total of 5 CMS Tier1s (outside CERN) is currently foreseen; the candidates are US (FNAL), UK (RAL), FR (Lyon), IT (INFN) and Russia (Moscow). A total of about 25 CMS Tier2s is also foreseen.

Totals (Tier0/1 at CERN + 5 Tier1 + 25 Tier2; "…" marks values lost in the source):
  Active tape for CMS:   1540 + 5×590            = 4490 TB
  Archive tape for CMS:  2632 + 5×433 + 25×50    = 6047 TB
  Tape for CMS (active + archive):                 10537 TB
  Disk for CMS:          696 + 5×… + 25×70       = 3761 TB
CMS CPU resources: CERN Tier0+1 and Tier1s
(columns: # of events; CPU per event in kSI95·s/ev; total CPU in kSI95; "…" marks values lost in the source)

Data processing, Tier0+1 at CERN:
  Reconstruction    1.E…              …    …
  Reprocessing      1.E+09            3.   included above
  Selection         1.E+07 - 1.E+08   up to …
  Analysis and DPD  1.E…              …    …
  TOTAL CPU: 615 kSI95

Data processing, each Tier1:
  Simulation        0.25E…            …    …
  Rec. simulation   0.25E…            …    …
  Re-processing     0.1E…             …    …
  Selection         1.E+07 - 1.E+08   up to …
  Analysis          1.E…              …    …
  TOTAL: 153 kSI95
CMS CPU resources: Tier2s
(columns: # of events; CPU per event in kSI95·s/ev; total CPU in kSI95; "…" marks values lost in the source)

Data processing, each Tier2:
  Simulation       0.5E…  …  …
  Rec. simulation  0.5E…  …  …
  Analysis         1.E…   …  …
  TOTAL: 32 kSI95

TOTAL CPU for CMS: Tier0/1 at CERN + 5 Tier1 + 25 Tier2 = 615 + 5×153 + 25×32 = 2180 kSI95
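The CMS totals quoted above can be cross-checked from the per-centre figures on the preceding slides. A quick sanity-check sketch (the numbers are the slides'; the dictionary keys and helper function are illustrative):

```python
# Per-centre totals from the preceding CMS resource tables.
cern = {"active_tape_TB": 1540, "archive_tape_TB": 2632, "cpu_kSI95": 615}
tier1 = {"active_tape_TB": 590, "archive_tape_TB": 433, "cpu_kSI95": 153}
tier2 = {"active_tape_TB": 0, "archive_tape_TB": 50, "cpu_kSI95": 32}

def cms_total(key, n_tier1=5, n_tier2=25):
    """CERN Tier0/1 plus 5 Tier1s plus 25 Tier2s."""
    return cern[key] + n_tier1 * tier1[key] + n_tier2 * tier2[key]

print(cms_total("active_tape_TB"))   # 4490 TB, as quoted
print(cms_total("archive_tape_TB"))  # 6047 TB, as quoted
print(cms_total("cpu_kSI95"))        # 2180 kSI95, as quoted
```

The active tape, archive tape and CPU totals all reproduce the quoted figures exactly.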
Decisions taken for the INFN-Grid RC prototypes
- A FLEXIBLE computing architecture (processing, re-processing, analysis):
  - hierarchy (Tier1, ...2, ...3, ...);
  - experimentation with possible models suited to INFN, i.e. EFFICIENT ones;
  - mixed distributed RC.
- Identification of the resources, costs and responsibilities (MoU):
  - manpower;
  - investments;
  - who does what.
- Realization of prototype(s) of adequate scale:
  - development and realization of the management (software, priorities, coordination);
  - use of the resources for important current activities, e.g. the physics simulations for the HLT [no abstract case studies needed].
- Collaboration in the development and study with other LHC experiments (INFN in particular), and beyond.
Mixed distributed RC: definition
A Tier1 composed of several sites, one of which is bigger, but as primus inter pares; in any case distributed.
- Personnel and resources distributed ad hoc.
- Maximum network bandwidth between the sites and CERN.
- Balanced distribution of the services.
(Slides as of November 1999, CMS)
CMS current understandings (1)
- CMS Italy prefers a Tier1 where there are direct interests in the experiment. If this were not the case, the management of the centre would become critical, and moreover some of the personnel directly involved in the CMS software and computing activities would have to be permanently present at the centre.
- CMS Italy believes the Tier1 must be weighed against: the other Tier1s of the collaboration; the sizes of the Tier2s (located at the INFN sections); the functionalities that modify the original MONARC model, adapting/integrating it with the GRID tools.
- CMS believes that GRID-style tools can considerably enrich the potential of the computing, as well as of the personnel involved; indeed, CMS is strongly committed in this field.
CMS current understandings (2)
- The new flexibility introduced into the hierarchical MONARC model requires taking into account also the Tiers below the Tier1s: one cannot design and plan only the Tier1s (and the Tier0 at CERN).
- The 1/3 - 2/3 rule must be applied correctly, taking into account also the institutions that will not have a Tier1 but that must be able to analyse the data. Performing data analysis in an efficient and competitive way is the first and last purpose of CMS computing (and obviously also of the Italian part of CMS).
- The model cited above (MONARC + GRID) is more flexible and efficient in the use of computing and personnel resources, and the Tier2s, which CMS is promoting at all levels of functionality, take on an essential role. The Tier2s allow a more direct interaction of the researchers with the analysis and handling of the data.
- The distributed hierarchy thus created also exploits the resources of the Tier3s and even of the Tier4s (desktops), allowing the active collaboration and full participation of all the physicists involved, including thesis students!
CMS current understandings (3)
CMS Computing does not intend to conflict with the responsibilities and personnel commitments on the detectors, but believes that the computing activity and its realization are an integral part of CMS Italy.
CMS therefore intends to evaluate realistic implementations of Tier1 and Tier2 models on a short timescale (a few months), and is available to collaborate in the evaluation. The evaluations will be based on:
- the personnel available at the various sites and/or to be acquired (including temporary staff);
- the matching of the resources and of the available services to the activities already under way (simulations and physics studies);
- timescales;
- the infrastructures needed and available;
- the balancing of functions and resources between the Tier1 and the Tier2s;
- technical and investment feasibility;
- solutions for the management of the centres (including outsourcing if necessary);
- the management of conflicts and of the dynamics of the resources in a Tier1 shared among several experiments.
CMS current understandings (4)
- Within one year, the tasks, functionalities, sizes and locations of the prototype centres (Tier1 and Tier2) for CMS Italy will have to be defined (also, but not only, in order to arrive at a Computing MoU).
- Today CMS proposes some temporary solutions covering the needs of the simulations for the coming year (Legnaro, Bari, Bologna, Padova, Pisa, Roma1, Torino). CMS intends to use these solutions as seeds to evaluate the progress and the location of the prototypes.
- The choice of the sites, dynamic and to be discussed progressively with the referees appointed by INFN (as well as within the collaboration), has already been made by CMS, configuring a development programme that involves all the Italian sites in a differentiated way.
- CMS believes that the choice of a single Tier1, and/or one shared with other experiments, must be made by INFN on the basis of a scientific comparison of different projects, on a short timescale.
LHCb Computing Model
LHCb data storage:
- RAW and ESD data are stored only in the production centres (real data at CERN, MC data at the Tier-1 centre which produced them). No systematic RAW/ESD distribution/replication (very small samples are sent on demand: 10% in the first 2 years, 2% later).
- No group analyses identified: the first stage of the analysis is performed in common for all the analyses that follow.
- AOD and TAG are produced in the production centres (CERN for real data, Tier-1 centres for MC).
- AOD and TAG data are systematically distributed to all Tier-1 centres via the network as soon as they are produced (or regenerated), and are stored there on disk.
- No Tier-2 centres.
- 80 TB/yr of real-data AOD exported from CERN; 120 TB/yr of MC AOD exported from every Tier-1.
LHCb Computing Model (II)
LHCb analysis model:
- Analysis jobs run at Tier-3 centres.
- Tier-3 analysis jobs send requests for events to a Tier-1; processes at the Tier-1 select the events and send them to the Tier-3 which requested them (typically 10^7 events, 200 GB of AOD and 10 GB of TAG).
- Tier-3 jobs execute the analysis, which produces ntuples for interactive studies at Tier-4.
LHCb alternative model for high-statistics channels (e.g. B → D*: 10 times more statistics than the other channels):
- 10^8 events, 2 TB of AOD involved;
- move the analysis jobs to the data (from Tier-3 to Tier-1).
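The per-event sizes implied by these figures are internally consistent, which is what motivates moving the jobs to the data for the high-statistics case. A small arithmetic sketch (the variable names are mine; the inputs are the slide's):

```python
# Figures from the LHCb analysis model above.
events_typical = 1e7        # events in a typical Tier-3 analysis request
aod_gb, tag_gb = 200, 10    # data shipped from Tier-1 to Tier-3

aod_kb_per_event = aod_gb * 1e6 / events_typical  # GB -> kB per event
tag_kb_per_event = tag_gb * 1e6 / events_typical

print(aod_kb_per_event)  # 20.0 kB of AOD per event
print(tag_kb_per_event)  # 1.0 kB of TAG per event

# High-statistics channel: 10x the events means 10x the AOD to move,
# which is why the alternative model sends the jobs to the data instead.
aod_tb_high_stats = 1e8 * aod_kb_per_event / 1e9  # kB -> TB
print(aod_tb_high_stats)  # 2.0 TB, matching the slide
```

So 200 GB for 10^7 events is 20 kB of AOD per event, and the 10^8-event high-statistics sample implies the 2 TB of AOD quoted above.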
LHCb Regional Centres
LHCb-Italy plans for:
- 1 concentrated Tier-1 computing centre;
- 9 Tier-3 computing centres, located in Bologna, Cagliari, Ferrara, Firenze, Frascati, Genova, Milano, Roma1 and Roma2.
The LHCb Tier-1 can be housed indifferently:
- in an LHCb-dedicated computing centre (like the 2001 LHCb setup in Bologna), provided that the technical manpower is hired and trained by INFN;
- in a multi-experiment INFN national computing centre, provided that LHCb has its own resources satisfying the experiment's requirements.
The LHCb Tier-1 is expected to operate efficiently with its resources concentrated in only one site. The LHCb Tier-1 needs a tape library.
Incremental acquisition of equipment each year for an LHCb Tier-1 Regional Centre
[Table: yearly figures for MC events, CPU (SI95), disk (TB), active (robotic) tape (TB) and archive tape (TB); the numerical values were lost in extraction.]
VIRGO needs for data analysis
Time constraints:
- the Central Interferometer of VIRGO will produce data in 2001;
- the full VIRGO interferometer will produce data in 2003.
The VIRGO computing model:
- 2 sites for raw-data storage: a Tier 0 in Italy and a Tier 1 in France;
- computing for VIRGO in Italy: 1 Tier 0, 2 Tier 2s, 2 Tier 3s.
Tier functions:
- Tier 0 (raw-data storage): Cascina (the Virgo site);
- Tier 2/Tier 3 (database and computing for the pulsar search): Roma/Firenze;
- Tier 2/Tier 3 (coalescing-binary-system search): Napoli/Perugia.
VIRGO needs for data analysis
Summary of the VIRGO needs for computing and network connections ("…" marks values lost in the source):

                        units        end 2001            end 2003
  CPU capacity          SI95         8,000 (350 Gflops)  … (3.5 Tflops)
  est. number of CPUs                …                   …
  disk capacity         TBytes       …                   …
  disk I/O rate         GBytes/sec   5                   5
  sustained data rate   Mbytes/sec   …                   …
  WAN link to Cascina   Mbits/sec    155                 2,500
  WAN links to labs     Mbits/sec    …                   …
  WAN link to France    Mbits/sec    …                   …
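The end-2003 CPU figure in SI95 was lost in extraction, but the table pairs 8,000 SI95 with 350 Gflops, so the same conversion factor lets us back-convert the quoted 3.5 Tflops. An illustrative estimate, not a source figure, assuming the SI95-to-flops ratio stays constant:

```python
# Conversion implied by the table above: 8,000 SI95 ~ 350 Gflops.
si95_2001, gflops_2001 = 8_000, 350
gflops_per_si95 = gflops_2001 / si95_2001  # ~0.044 Gflops per SI95 unit

# End-2003 target is quoted as 3.5 Tflops; the SI95 value was lost,
# so this back-converts it under the same (assumed constant) factor.
tflops_2003 = 3.5
si95_2003 = tflops_2003 * 1000 / gflops_per_si95
print(round(si95_2003))  # 80000
```

Under that assumption the end-2003 capacity would be roughly 80,000 SI95, i.e. a tenfold increase over end 2001, consistent with 350 Gflops growing to 3.5 Tflops.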
VIRGO needs for data analysis
Manpower for the VIRGO testbed and the full deployment of the system:
- Tier 0: in Cascina there are already two system engineers. An operator staff of a few units (2-3) is needed to run the system; the European Gravitational Observatory (EGO) consortium can provide it.
- Tier 2: INFN section support is under discussion; if the computing needs of various experiments on a site are merged, the manpower will be reduced. A realistic alternative for Tier 2 management is outsourcing: consortia (for example CASPUR for Roma La Sapienza) can provide the Tier 2 service.
VIRGO needs for data analysis
- A network improvement is compulsory in any scenario; the IN2P3 computing centre in Lyon has already required it explicitly.
- The Cascina site must be connected to the GARR-B backbone at higher speed immediately.
- Once the Central Interferometer runs (2001), VIRGO has to start distributing the data via the network to the VIRGO groups.
- The analysis of the data produced by the network of interferometric gravitational-wave antennas also requires network improvements.
CONCLUSIONS
- The work to evaluate the Tier1 RC location(s) and the relations with the other Tier(n) centres in INFN has started.
- We need time to understand, discuss and agree on the possible choices.
- It cannot be an abstract solution: it must be accompanied by a detailed implementation plan defining the role and responsibilities of each site.
- The proposed date to finish the work is around June 2001.