INFN Grid Operations
Tiziana Ferrari (INFN CNAF), Luciano Gaido (INFN TO)
Outline
- Statistics:
  - Availability and reliability of the Italian region
  - LHC job submission via WMS
- EGEE III SA1 ongoing activities
- Grid core services: upgrade plan
- Testbeds (status)
- 2009 requests: capital equipment (inventariabile)
- Travel and consumables
EGEE Availability/Reliability: June 08 (1/3)
- 36 certified sites
- Availability/reliability of the Italian region has improved, but there is still a lot of progress to be made
- 20-23 June: top-level BDII down (an electrical power outage in the T1 computing room affected the DNS server)
  - the top-level BDII failover mechanism was prevented from working
  - the entire IT region was affected
- Actions:
  - configuration of secondary DNS servers for the cnaf.infn.it domain and related sub-domains (GARR): DONE
  - IT ROC: direct weekly monitoring of the SAM statistics of every site (from July 08)
EGEE Availability/Reliability: June 08 (2/3) ITALY
EGEE Availability/Reliability: June 08 (3/3) ITALY
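For reference, a minimal Python sketch of how availability and reliability figures, and the weekly per-site check listed among the actions, could be computed from SAM results. It assumes hourly status samples with hypothetical labels (OK, DOWN, SCHEDULED_DOWN, UNKNOWN) and the definitions commonly used in WLCG availability reports: availability = up / (total - unknown) and reliability = up / (total - scheduled downtime - unknown). The function names, site names and threshold are illustrative and do not describe the IT ROC's actual tooling.

```python
from collections import Counter

# Hypothetical status labels for one SAM sample (for example, one per hour).
OK = "OK"
DOWN = "DOWN"
SCHEDULED_DOWN = "SCHEDULED_DOWN"
UNKNOWN = "UNKNOWN"


def availability_reliability(samples):
    """Return (availability, reliability) for a list of status samples.

    availability = up / (total - unknown)
    reliability  = up / (total - scheduled_down - unknown)
    A value is None when its denominator is zero.
    """
    counts = Counter(samples)
    up = counts[OK]
    unknown = counts[UNKNOWN]
    scheduled = counts[SCHEDULED_DOWN]
    total = len(samples)

    avail_den = total - unknown
    rel_den = total - scheduled - unknown
    availability = up / avail_den if avail_den else None
    reliability = up / rel_den if rel_den else None
    return availability, reliability


def sites_below(site_samples, threshold):
    """Sorted names of sites whose availability is known and below threshold."""
    flagged = []
    for site, samples in sorted(site_samples.items()):
        availability, _ = availability_reliability(samples)
        if availability is not None and availability < threshold:
            flagged.append(site)
    return flagged


if __name__ == "__main__":
    # One hypothetical week (168 hourly samples) for two hypothetical sites.
    week = {
        "SITE-A": [OK] * 150 + [DOWN] * 10 + [SCHEDULED_DOWN] * 8,
        "SITE-B": [OK] * 168,
    }
    for name, samples in week.items():
        print(name, availability_reliability(samples))
    print("below 0.9:", sites_below(week, 0.9))
```

With these definitions, SITE-A (150 OK hours, 10 DOWN hours, 8 hours of scheduled downtime) has availability 150/168 ≈ 0.89 and reliability 150/160 ≈ 0.94, so a 0.9 threshold flags it: scheduled downtime counts against availability but not against reliability.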
Availability/Reliability: Jan-May 08
- SAM tests were affected by a gLite middleware bug (Classic SE)
- An automatic mechanism for amending the statistics in case of middleware or SAM test problems is STILL MISSING
- Alarms raised automatically on SAM test failures still need to be put into production
- Re-engineering of the SAM tests in progress (Nagios, regionalization of probes)
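The missing statistics amendment could, in its simplest form, look like the hypothetical sketch below. It assumes the affected hours are known (for example from the middleware bug report) and treats SAM failures in those hours as UNKNOWN, so they drop out of the availability denominator instead of counting against the site. The labels and function names are illustrative; other policies (for example counting such hours as OK) are equally possible.

```python
# Hypothetical hourly status labels, as in a SAM-based availability calculation.
OK = "OK"
DOWN = "DOWN"
SCHEDULED_DOWN = "SCHEDULED_DOWN"
UNKNOWN = "UNKNOWN"


def availability(samples):
    """up / (total - unknown); None if every sample is UNKNOWN."""
    known = [s for s in samples if s != UNKNOWN]
    return known.count(OK) / len(known) if known else None


def amend(samples, affected):
    """Return a copy in which DOWN samples at the given indices become UNKNOWN.

    `affected` is a set of sample indices (e.g. hours) in which the failures
    were caused by a middleware or SAM test problem rather than by the site.
    Samples outside `affected`, and non-DOWN samples inside it, are unchanged.
    """
    return [UNKNOWN if i in affected and s == DOWN else s
            for i, s in enumerate(samples)]


if __name__ == "__main__":
    # Hypothetical day: 20 good hours, then 4 failed hours caused by a test bug.
    day = [OK] * 20 + [DOWN] * 4
    bug_hours = set(range(20, 24))
    print("raw:", availability(day))
    print("amended:", availability(amend(day, bug_hours)))
```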
LHC job submission via WMS
- WMS Monitor: https://cert-wms-01.cnaf.infn.it:8443/wmsmon/main/main.php
- The statistics shown here are collected for the WMS production servers at CNAF (RB statistics are not included, nor are a few additional WMS instances outside CNAF)
- Submitted jobs (excluding test activities):
  - ALICE: 0.38 Mjob (migrating to WMS since mid-June)
  - ATLAS: 0.2 Mjob
  - CMS: 3.3 Mjob
  - LHCb: 0.33 Mjob
[Figure: submitted jobs per VO: ALICE, ATLAS, CMS, LHCb]
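As a quick cross-check of the per-VO figures (CNAF production WMS only, test jobs excluded), the shares can be computed directly: the four VOs together account for about 4.21 Mjob, of which CMS contributes roughly 78%.

```python
# Submitted jobs per VO (Mjob) on the CNAF production WMS servers, from the slide.
submitted_mjob = {"ALICE": 0.38, "ATLAS": 0.2, "CMS": 3.3, "LHCb": 0.33}

total = sum(submitted_mjob.values())
shares = {vo: 100.0 * n / total for vo, n in submitted_mjob.items()}

print(f"total: {total:.2f} Mjob")
for vo, pct in shares.items():
    print(f"{vo}: {pct:.1f}%")
```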
Main EGEE III SA1 ongoing activities (1/2)
- Improvement of failover solutions: procurement of new fully redundant hardware and migration of the most critical core components (servers, network switches), at CNAF and at the other sites hosting core services
- Improvement of monitoring and alarms (regionalization of SAM via Nagios, SMS alarms, …)
- Testing of DNS-based WMS load balancing
- Improvement of site availability/reliability
- CREAM pilot services: functional and scalability tests (PD, CNAF, Bari, Catania)
Main EGEE III SA1 ongoing activities (2/2)
- Restructuring of Grid oversight activities (monitoring shifts)
- Grid security
- Replacement of classic SE instances with StoRM; StoRM support (currently installed at 9 Italian sites: ESA-ESRIN, INFN-BOLOGNA, INFN-CNAF-LHCB, INFN-FERRARA, INFN-GENOVA, INFN-PARMA, INFN-PISA, INFN-ROMA3, INFN-T1)
- Integration of new resources (PON)
- Accounting:
  - DGAS: planning of development activities to adopt the RUS standards across EGEE domains
  - Storage accounting (SAGE, INFN CT): preliminary testing, porting, integration with
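DNS-based load balancing, one of the SA1 activities above, publishes several WMS hosts behind a single alias so that client submissions are spread across them. The toy Python model below illustrates the idea, assuming simple round-robin rotation and removal of hosts that fail a health check; the class, its methods and the host names are hypothetical, and it is not the configuration being tested at CNAF.

```python
import itertools


class RoundRobinAlias:
    """Toy model of a DNS alias that rotates over the healthy hosts behind it."""

    def __init__(self, hosts):
        self.hosts = list(hosts)          # registration order is kept
        self.healthy = set(self.hosts)    # hosts currently passing the health check
        self._counter = itertools.count()

    def mark_down(self, host):
        """Take a host out of rotation (e.g. after a failed health check)."""
        self.healthy.discard(host)

    def mark_up(self, host):
        """Put a registered host back into rotation."""
        if host in self.hosts:
            self.healthy.add(host)

    def resolve(self):
        """Return the next healthy host, cycling in registration order."""
        candidates = [h for h in self.hosts if h in self.healthy]
        if not candidates:
            raise LookupError("no healthy host behind the alias")
        return candidates[next(self._counter) % len(candidates)]


if __name__ == "__main__":
    alias = RoundRobinAlias(["wms-a.example.org", "wms-b.example.org", "wms-c.example.org"])
    print([alias.resolve() for _ in range(3)])
    alias.mark_down("wms-b.example.org")
    print([alias.resolve() for _ in range(4)])
```

Real round-robin DNS returns the whole record set in rotated order and leaves the choice to the client resolver; the model collapses that into a single `resolve()` call to keep the health-check logic visible.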
Grid core services: upgrade plan
- Major hw upgrades at CNAF in the coming month:
  - VOMS (two servers) + new VOMS replica of the CERN instance
  - LFC upgrade and Oracle backend
  - 10 WMS/LB servers (dedicated to the LHC VOs)
  - 10 blades (T1 tender), installation expected by the end of the month
- Virtualization: UI, site BDII (one instance to be added for failover), MyProxy server, …
- Funded from the 2008 allocation, topped up in Apr 08 (20 Keuro in total), against an expenditure of 30 Keuro
Testbeds
- Development testbed: no major changes since last April for:
  - EGEE certification testbed (SA3)
  - INFN Grid certification testbed
  - Pre-production testbed (mostly virtual machines, expected to shrink in the coming months)
- Experimental services:
  - WMS (a few instances, mainly for testing SL4 WMS features, CMS)
  - CREAM: functional tests on existing hw; scalability tests:
    - PD: existing hw funded in Sep 2007
    - CNAF: about 10 servers currently hosting the production WMS/LB services, waiting to be migrated to new fully redundant hw (16 blades, installation next week; funding: 6 Keuro (CNAF funding for 2008) + 14 Keuro (referee meeting, Apr 08) + CNAF structural funds (Grid Operations service and T1 functional unit, UF T1))
    - Bari/Catania: hw available on site
Equipment requests (INV): 94 Keuro + 100 Keuro (tasca) (1/2)
OBJECTIVE 1: INFN Grid CORE SERVICES
- Bari: 1 WMS and 1 LB (replacement of obsolete hw, ATLAS/CMS backup), INV: 8 Keuro
- Catania: 1 WMS and 1 LB (ALICE/ATLAS/LHCb backup), accounting: 1 multi-site HLR (for the southern Grid sites), replacement of obsolete hw, INV: 12 Keuro
- Ferrara: top-level BDII (backup of the central service for the whole Grid), 3 Keuro
- Padova: 3 servers for service virtualization: VOMS (backup of the central server), HLR, WMS, LB (CMS and ALICE backup), top-level BDII (backup), 12 Keuro
Equipment requests (INV): 94 Keuro + 100 Keuro (tasca) (2/2)
OBJECTIVE 2: SITE UPGRADES / REPLACEMENT OF OBSOLETE HW
- Genova: 3 computers for SRM (with FC card), CE and BDII, 9 Keuro
- Lecce: 3 servers for CE, BDII and UI, SE with FC card, 9 Keuro
- Pisa: general-purpose UI, 2.5 Keuro
- Roma2: CE+BDII, SE (with 2 TB of disk) + switch (obsolete hw), 7 Keuro
- Trieste: 16.5 Keuro
  - 1st PRIORITY: StoRM server + 3 TB of disk (5.5 Keuro); switch to connect new nodes to the farm (3 Keuro)
  - 2nd PRIORITY: 2 twin boxes for WNs (8 Keuro)
CNAF INV: 115 Keuro
- Upgrade of the INFN CNAF site SE and WMS tests with GPFS (high disk space consumption): 9 Keuro
- The funds allocated for 2008 and topped up in Apr 08 (20 Keuro in total) are insufficient for the upgrades needed to date (30 Keuro): 6 Keuro top-up
- Tasca: replacement of obsolete Grid site services (CE, SE, WN, …) at further INFN sites: 100 Keuro (assigned to CNAF)
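The requested amounts add up consistently: Objective 1 (35 Keuro), Objective 2 (44 Keuro) and the two CNAF items outside the tasca (9 + 6 Keuro) give the 94 Keuro in the slide title, and the CNAF total of 115 Keuro is those 15 Keuro plus the 100 Keuro tasca. A short Python check, with amounts copied from the two request slides (the dictionary keys are just descriptive labels):

```python
# INV requests in Keuro, copied from the two equipment request slides.
objective_1 = {"Bari": 8, "Catania": 12, "Ferrara": 3, "Padova": 12}
objective_2 = {
    "Genova": 9,
    "Lecce": 9,
    "Pisa": 2.5,
    "Roma2": 7,
    # Trieste: 1st priority (5.5 + 3) plus 2nd priority (8)
    "Trieste": 5.5 + 3 + 8,
}
cnaf_regular = {"SE upgrade and WMS/GPFS tests": 9, "top-up of 2008 funds": 6}
cnaf_tasca = 100

regular_total = (sum(objective_1.values())
                 + sum(objective_2.values())
                 + sum(cnaf_regular.values()))
cnaf_total = sum(cnaf_regular.values()) + cnaf_tasca

print(f"INV total excluding tasca: {regular_total} Keuro")
print(f"CNAF total: {cnaf_total} Keuro")
print(f"grand total including tasca: {regular_total + cnaf_tasca} Keuro")
```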
Travel and consumables (1/2)
- Domestic travel (INFN Grid workshop, national coordination meetings): 1.5 Keuro for small sites, 3 Keuro for T2s, more at sites with SA1 staff
- Foreign travel: only sites with SA1 activities, or T2s (one person attending the EGEE conference)
- Consumables: 4 Keuro for T2s, 2 Keuro for other sites
- Total requests: domestic travel 66.5 Keuro; foreign travel 97.5 Keuro; consumables 61 Keuro
Travel and consumables (2/2)
Site: BA BO CA CT CNAF FE FI GE LE LNL
People/FTE: 3/1.1 - 1/0.5 15/9.45 1/0.2 1/0.25
Domestic travel (Keuro): 5.0 1.5 2.0 3.0 14
Foreign travel (Keuro): 46.5 6.0
Consumables (Keuro): 4 2
Site: MI NA PD PG PR PI RM1 RM2 RM3 TO TS
People/FTE: 1/0.25 4/2.7 - 6/4.4
Domestic travel (Keuro): 1.5 6.0 3.0 9.0
Foreign travel (Keuro): 8.0 22
Consumables (Keuro): 4 2 3
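The per-site criteria for travel and consumables can be written as a small lookup. The sketch below assumes that the "small sites" of the travel rule are the same as the "other sites" of the consumables rule. The extra domestic travel for sites with SA1 staff is decided case by case and is not modelled. The function name and category labels are hypothetical. The three total request lines sum to 225 Keuro.

```python
def baseline_request(site_type):
    """Return (domestic travel, consumables) in Keuro for a site category.

    "T2"    -> 3 Keuro domestic travel, 4 Keuro consumables
    "other" -> 1.5 Keuro domestic travel, 2 Keuro consumables
    Sites with SA1 staff may add more domestic travel on top of this baseline.
    """
    baselines = {"T2": (3.0, 4.0), "other": (1.5, 2.0)}
    try:
        return baselines[site_type]
    except KeyError:
        raise ValueError(f"unknown site type: {site_type!r}") from None


# Total requests from the slide, in Keuro.
requested = {"domestic travel": 66.5, "foreign travel": 97.5, "consumables": 61.0}
print(f"total: {sum(requested.values())} Keuro")
```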