
1 The ATLAS Tier2 Federation INFN
L. Perini, Workshop CCR @ Otranto, 8 June 2006
Aims, functions, structure. Schedule. Services and INFN Grid.

2 Layout
The Tier2s for ATLAS in Italy
– 3 slides from our dear Referee at CSN1 in April
Structure and functions of the Federation
The schedule for the near future
– Mostly SC, but not only…
The Grid tools and services
– Relation with the INFN production Grid
– Status of specific tools and services
No mention of money whatsoever…

3 ATLAS Tier2s (referee Forti @ CSN1, April)
Full approval: Roma1 and Napoli, which has no infrastructure costs and a solid project
Approval sub judice (SJ): Milano, which is asked to improve and clarify its infrastructure project, pending the reassessment of the LHC schedule (expected June 2006) and the actual startup of the machine
Incubator (Proto-Tier2): LNF, whose weaknesses are: significant funding needed; somewhat limited technical and technologist manpower; Grid experience still to be improved
Both the approved sites and the others will undergo periodic reviews
If a site does not perform, the Tier2 label is removed

4 Referee proposal @ CSN1, April
The computing model proposed by the experiments is reasonable
The total infrastructure cost is lower than one might have feared
Prudence and the uncertainties push us to approve no more than 2 Tier2s now
INFN resources are limited and are, as of today, not well known; they remain a question mark in everything that follows
We propose three levels of approval:
– Full approval
– Approval SJ
– Tier2 incubator (Proto-Tier2)
The conditions for removing the SJ are:
– the site must resolve its weak points
– reassessment of the LHC schedule (expected June 2006) and actual startup of the machine
– timescale O(6 months)
The conditions for leaving the incubator are:
– the site must resolve its weak points
– the schedule of the experiment's computing needs is maintained
– the experiment's distributed computing model is validated
– timescale O(12 months)

5 Referee proposal @ CSN1, April
Computing resources will have to be assigned to all sites
– to meet the needs of the experiment
– to keep the community active and to participate in Grid and in the service/data challenges
– to be ready when the data arrive
They will have to be planned carefully
– to avoid premature purchases
– to allow the Italian groups to take on the software responsibilities that follow from the hardware commitment
The size of the funding is to be discussed; at this point the experiments must present an updated plan

6 Tier2 Federation Structure
Given the referee recommendation in the previous slide, the ATLAS federation also includes the Tier2 SJ and the Tier2 incubator
– This choice is needed for organizing the practical work at hand
– Organizing the Italian participation in SC4 (June-November) and in the first large ATLAS test of distributed analysis (October-November) is the nearest major function of the federation (see next slides)
The analysis phase will require user training and the opening of user accounts (also for remote users) with some disk space, for experimenting with implementations of the analysis model
ATLAS Italy expects a decision about the SJ in September
– While that decision is pending, using the Milan resources (experienced people and hw) to support the roughly 20 users (more than half of them from Genova, Pavia, Pisa and Udine) who will be active in the analysis phase and had proposed to rely on the Milan Tier2 looks to us the only rational way to follow
– Should the Referees/CSN1 etc. think we should proceed otherwise, we expect to be told and to have the opportunity to discuss with them how to proceed

7 Structure and setting up
ATLAS-Italy is setting up a Tier2 federation now
– Some aspects are already defined, some are being defined
– Some of the material in these slides is fully agreed, some consists of proposals by me
A Federation Representative: L. Perini (Mi)
– Typically a 1-year mandate, rotating among the Tier2s
A pool of federation referents for specific items:
– Network: G. Lo Re (Na)
– Sw distribution and related matters: A. De Salvo (Roma1)
– SE and data architecture: still to be found…
– Other areas may be identified in the near future
– For each area, local referents in all candidate Tier2s, defaulting to the local Tier2 responsible
Regular (bi-weekly) short phone conferences between the Federation Representative, the local Tier2 responsibles (or deputies) and the federation experts are being considered

8 Aims and Functions - 1
Facilitate the interface with LCG, ATLAS-Grid and INFN Grid
– The relation with INFN as funding agency stays primarily with the national representative (and with the national computing representative): L. Mandelli and L. Luminari
Foster common solutions in the areas where choices are still to be made
– E.g. how to implement the analysis model in Italy, as well as which storage system and which local monitoring tools to adopt
Represent the Federation when a single voice is required
The functions on the next slide will be coordinated by the Grid area coordinator (in the ATLAS-Italy computing structure this is L. Perini) but will require active support from the Tier2 federation, especially in the initial phase

9 Aims and Functions - 2
Organize the ATLAS-specific "computing operations" work as far as Grid/Tier2 are concerned
– E.g. operate the continuous ATLAS production via ProdSys efficiently, thus freeing more expert manpower for the needed tasks of testing and developing new sw and mw
Organize the training required for the above step
Coordinate the ATLAS-Italy contribution to the deployment and development effort in the area of interfacing the ATLAS-LCG-EGEE mw to the ATLAS sw
– ATLAS use of VOMS, the LCG executor in ProdSys, ATLAS DDM
On the first 2 items the INFN effort is already the biggest one in ATLAS, but more is needed
– To be done in close contact with ATLAS global and the ATLAS-Italy computing representative
– See the next slide for the needs

10 Status of ATLAS developments in the LCG-EGEE area
ATLAS is about to start an action via the International Computing Board and the National Representatives to address a situation felt to be increasingly risky
– A manpower shortage on the ATLAS collaboration side to make full use of the LCG-EGEE mw, and to be able to proactively integrate and validate new functionality in the ATLAS applications running on the LCG-EGEE Grid
– A "hero model": INFN is today one of the major contributors, but we are relying on too few overloaded people, some of them shared with the EGEE work (which funds them)
Enlarging the pool of Grid developers, expert deployers and operators is mandatory for ATLAS-Italy too

11 Schedule for the near future
Largely determined by the global schedule set up by ATLAS
– SC4 is the next big engagement (see next slides)
– All 4 existing sites are willing to participate
– We count on the SE certificates for Naples and LNF coming soon (for Naples it came last Monday)
– ATLAS continuous production is part of it
– The first distributed analysis tests, scheduled for October-November, are of extreme interest for our community
Some Italy-specific work is scheduled too
– Most importantly calibration, not going into any detail here…

12 SC4 – the Pilot LHC Service from June 2006 (HEPiX Rome, 05 Apr 06, LCG, les.robertson@cern.ch)
A stable service on which experiments can make a full demonstration of the experiment offline chain
– DAQ → Tier-0 → Tier-1: data recording, calibration, reconstruction
– Offline analysis: Tier-1 ↔ Tier-2 data exchange, simulation, batch and end-user analysis
And sites can test their operational readiness
– Service metrics → MoU service levels
– Grid services
– Mass storage services, including magnetic tape
Extension to most Tier-2 sites
Evolution of SC3 rather than lots of new functionality
In parallel:
– Development and deployment of distributed database services (3D project)
– Testing and deployment of new mass storage services (SRM 2.1)

13 LCG Service Deadlines (HEPiX Rome, 05 Apr 06, LCG, les.robertson@cern.ch)
[Timeline figure spanning 2006-2008: cosmics, first physics, full physics run]
– Pilot Services: stable service from 1 June 06
– LHC Service in operation: 1 Oct 06; over the following six months, ramp up to full operational capacity & performance
– LHC service commissioned: 1 Apr 07

14 ATLAS SC4 Schedule
June: from 19 June till 7 July, send 772 MB/s in total - "raw" (at 320 MB/s), ESD (at 252 MB/s) and AOD (at 200 MB/s) - from Tier 0 to the ATLAS Tier 1 sites, for a total of 90K files per day. The "raw" goes to tape. The Tier2s subscribe to fake AOD (20 MB/s); see the rate check sketched below.
CDP = continuous distributed production of 2M MC events/week, requiring 2700 KSi2K
– CDP has been active in the last months (the next slide is by Ian Bird, from the SA1 talk at the final EGEE EU review in May)
– All INFN Tier2s involved
– Operated for > 50% by INFN people (fewer than 3!) on LCG resources
July: setting up distributed reconstruction using local stage-in from tape (1-2 drives required). CDP
August: two 3-day slots of distributed reconstruction using local stage-in from tape (1-2 drives required). Distributed analysis tests: 20 MB/s incoming at each Tier 1. No CDP?
September: Tier 0 internal tests. CDP
October: distributed reprocessing tests: 20 MB/s incoming at each Tier 1. AOD to the Tier2s. CDP
November: distributed analysis tests: 20 MB/s incoming at each Tier 1, at the same time as distributed reprocessing continues. Massive Tier2 involvement. CDP
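As a quick sanity check of the numbers quoted above (not part of the original slide), a few lines of Python reproduce the aggregate Tier0 → Tier1 rate, the implied daily volumes per stream, and the average file size that follows from 90K files/day:

```python
# Sanity check of the SC4 throughput figures quoted above.
# Stream rates in MB/s, as given on the slide.
rates = {"raw": 320, "ESD": 252, "AOD": 200}

total = sum(rates.values())
print(f"aggregate Tier0 -> Tier1 rate: {total} MB/s")   # 772 MB/s, as quoted

seconds_per_day = 86_400
for stream, rate in rates.items():
    tb_per_day = rate * seconds_per_day / 1e6           # MB -> decimal TB
    print(f"{stream}: {tb_per_day:.1f} TB/day")         # raw ~27.6, ESD ~21.8, AOD ~17.3

# With 90K files/day in total, the implied average file size is:
files_per_day = 90_000
avg_mb = total * seconds_per_day / files_per_day
print(f"average file size: {avg_mb:.0f} MB")            # ~741 MB
```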

15 Use of the infrastructure (Ian Bird, SA1, EGEE Final Review, 23-24 May 2006)
Sustained & regular workloads of >30K jobs/day, spread across the full infrastructure
Doubling/tripling in the last 6 months, with no effect on operations

16 Phase 19-6 to 8-7
Basically the first distributed test of ATLAS DDM (DQ2)
All Tier1s involved + some Tier2s (many?); VOBOXes, with the DQ2 servers, only at the Tier1s
– Data is shipped from Castor @ CERN, using FTS, to a storage area at each site. This is dummy data (no physics value: fake ESD), so sites may scratch it later. Sites must report the SRM host/path where this data is to be written. In addition, we will use the LFC catalogs already available per Tier1 to catalog this dummy data, as with the real system.
– DQ2 will be used to submit, manage and monitor the Tier1 export, hopefully without significant user intervention. DQ2 is based on the concept of dataset subscriptions: a site is subscribed by the Tier0 management system @ CERN to a dataset that has been reprocessed. The DQ2 site service running on the site's VOBOX then picks up the subscriptions and submits and manages the corresponding FTS requests (see the sketch below).
– The Tier2s will subscribe to fake AOD (20 MB/s target)
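The subscription model can be pictured with a small sketch. This is not the real DQ2 code or API: the class names, the polling cycle and the dataset/file names are all hypothetical, purely to show the pull model described above, in which the site service reacts to subscriptions rather than being pushed individual files:

```python
# Hypothetical sketch of a DQ2-style subscription pull model (not the real DQ2 API).
from dataclasses import dataclass, field

@dataclass
class Subscription:
    dataset: str          # dataset name, e.g. "fake.ESD.run001"
    destination: str      # site name, e.g. "INFN-T1"

@dataclass
class CentralCatalog:
    """Stands in for the central DQ2 dataset catalog at CERN."""
    datasets: dict = field(default_factory=dict)        # dataset -> list of files
    subscriptions: list = field(default_factory=list)

    def subscribe(self, dataset: str, destination: str):
        # The Tier0 management system subscribes a site to a dataset.
        self.subscriptions.append(Subscription(dataset, destination))

def site_service_cycle(catalog: CentralCatalog, my_site: str):
    """One polling cycle of the site service on the VOBOX: pick up the
    subscriptions addressed to this site and turn them into transfer requests."""
    for sub in catalog.subscriptions:
        if sub.destination != my_site:
            continue
        for f in catalog.datasets.get(sub.dataset, []):
            # In the real system this would become an FTS job from
            # Castor @ CERN to the SRM host/path reported by the site.
            print(f"FTS request: {f} -> {my_site}")

catalog = CentralCatalog(datasets={"fake.ESD.run001": ["file1.pool.root", "file2.pool.root"]})
catalog.subscribe("fake.ESD.run001", "INFN-T1")
site_service_cycle(catalog, "INFN-T1")
```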

17 INFN Tier2s in SC4
All 4 sites are involved
Important also for gaining experience with ATLAS DDM (new!)
In the phase up to 8 July the data are fake, but plentiful
– 1.6 TB per day if the target is reached (see the back-of-the-envelope check below)
– The disk space free today on average is not enough for even 2 days…
– They are fake data: we will clean them up continuously and we will survive
From October the data will be real
– Having additional disk will then become indispensable
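A quick back-of-the-envelope check of the 1.6 TB/day figure, integrating the 20 MB/s Tier2 AOD target over a day. The 2 TB free-space value used at the end is an illustrative assumption, not a number from the slides:

```python
# The 20 MB/s Tier2 AOD subscription target, integrated over one day.
rate_mb_s = 20
seconds_per_day = 86_400

tb_per_day = rate_mb_s * seconds_per_day / 1e6   # ~1.73 decimal TB/day
print(f"{tb_per_day:.2f} TB/day")                # ~1.57 TiB/day in binary units,
                                                 # i.e. the ~1.6 TB quoted

# Hypothetical average free space at a site (assumed, not from the slides),
# showing why "not even 2 days" of headroom holds:
free_tb = 2.0
print(f"headroom: {free_tb / tb_per_day:.1f} days")
```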

18 ATLAS SC4
ATLAS intends to use SC4
1. As a test of data transfer over the network (first phase)
2. But above all as a test of the various aspects of its computing model (in particular in the second phase)
Point 2 requires ATLAS sw and mw that as of 1 June is not yet in "production"
– Mw: the gLite RB, the new FTS, VOMS-enabled fair share…
– ATLAS sw: various parts of DDM (DQ2), an analysis system with a friendly interface (we have the Production System instead)
Both gLite 3.0 and DQ2 are late with respect to the original schedule
Work is ongoing to have a "production-like" system from October
– Not easy but possible, perhaps with a shift of a couple of months?
– A lot of development and effort will then still be needed to bring the new features to production level

19 Services, tools and INFN Grid
Rely on the tools and services developed by EGEE-LCG (INFN Grid) as much as possible
Take advantage of all possible synergies with the INFN Grid operations structure
Full integration in the ATLAS-LCG-EGEE system
– Relatively easy in ATLAS, as some insulation of the EU Grid from the US Grid and NorduGrid is built into the ATLAS system
Specific tools and services are dealt with in the next slides

20 RB, CE, SE, FTS, LFC, VObox
ATLAS is using the RB and Condor-G on the LCG resources; the US and NorduGrid use different submission systems
The 3 interfaces ("executors") are part of the same ATLAS ProdSys (a minimal interface sketch follows this slide)
– Friendly competition with Condor-G; INFN people are fully engaged in RB use as developers and operators
– The Condor-G workers are even fewer than our people… it is helping us win the competition… not good…
Our WMS interface (the "Lexor executor") is now adapted to the new gLite RB
Test RB servers with all the latest fixes at Milan and CNAF seem OK NOW
– Ready to start production with it in the next days!
– Thanks also to the work of the ATLAS-LCG-EGEE task force
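The "one ProdSys, three executors" architecture can be illustrated with a short sketch. This is purely illustrative: the Executor base class and the three subclass names are hypothetical stand-ins, not the actual ProdSys code:

```python
# Hypothetical sketch of the ProdSys executor plug-in idea (not the real ProdSys code):
# one production system, three Grid-flavour-specific submission back-ends.
from abc import ABC, abstractmethod

class Executor(ABC):
    """Common interface each Grid flavour implements."""
    @abstractmethod
    def submit(self, job: dict) -> str:
        """Submit a job definition, return a Grid job id."""

class LexorExecutor(Executor):
    def submit(self, job: dict) -> str:
        # Would go through the LCG/gLite Resource Broker (WMS).
        return f"lcg-rb://{job['name']}"

class CondorGExecutor(Executor):
    def submit(self, job: dict) -> str:
        # Would submit directly to sites via Condor-G.
        return f"condor-g://{job['name']}"

class NorduGridExecutor(Executor):
    def submit(self, job: dict) -> str:
        # Would use the NorduGrid/ARC submission system.
        return f"arc://{job['name']}"

def run_production(jobs, executor: Executor):
    # ProdSys itself is executor-agnostic: it only sees the common interface.
    return [executor.submit(j) for j in jobs]

print(run_production([{"name": "mc-evgen-001"}], LexorExecutor()))
```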

21 RB, CE, SE, FTS, LFC, VObox
We are using the LCG CE
– The gLite and CREAM CEs have some interesting features
– We plan to test them on the Pre-Production TB in the Task Force
Different SEs are in use
– For SC4 the INFN Tier2s will use DPM, as SRM is needed
ATLAS DDM uses FTS, and LFC both as central and as distributed catalogue (a transfer-list sketch follows this slide)
– ATLAS VOboxes are only at the Tier1s and include only "less risk category services" (= class 1)
– FTS plugins are being explored as a possibility for making the VObox "thinner"… still a way to go…
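To make the FTS role concrete, here is a small sketch of building a bulk transfer list of source/destination SURL pairs, the shape of input that FTS bulk-submission tools consumed. The hostnames, paths and the CLI invocation in the final comment are assumptions for illustration, not taken from the slides:

```python
# Illustrative sketch: building a bulk transfer list of SURL pairs for an
# FTS-style submission. Hostnames, paths and the exact CLI usage are assumptions.
files = ["fake.ESD.run001._0001.pool.root",
         "fake.ESD.run001._0002.pool.root"]

src_prefix = "srm://castorsrm.cern.ch/castor/cern.ch/grid/atlas/sc4"   # hypothetical
dst_prefix = "srm://t2-dpm.mi.infn.it/dpm/mi.infn.it/home/atlas/sc4"   # hypothetical

with open("transfers.txt", "w") as out:
    for f in files:
        # One "source destination" pair per line.
        out.write(f"{src_prefix}/{f} {dst_prefix}/{f}\n")

# The DQ2 site service would then hand such a list to FTS, e.g. (assumed syntax):
#   glite-transfer-submit -s <fts-endpoint> -f transfers.txt
```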

22 VOMS, Accounting, HLR, Job Priority
ATLAS needs a system that acknowledges the existence of the VOMS groups and roles as defined by the VO; uses the priorities defined by the sites and the VO to distribute jobs; and uses the VOMS groups as a basis for data storage (a FQAN-mapping sketch follows this slide)
– CPU and storage usage has to be accounted at the group and user level
These functions should not rely on a unique central DB
The accounting tool we plan for is the merged APEL+DGAS
– A site HLR is needed
– Test in the ATLAS TF asap, exploiting the setup already done in INFN Grid (HLR etc.). In production in October?????
For job priority and fair share, the only promising tool I know of is GPbox
– Preview TB testing is foreseen in the TF; the production timing is still to be understood
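To make the group/role idea concrete, here is a small sketch of parsing VOMS FQANs and mapping them to fair-share weights. The FQAN syntax (/vo[/group…][/Role=role]) is the standard VOMS one; the share table, the group names and the role boost are invented for illustration:

```python
# Minimal sketch: map VOMS FQANs to fair-share weights.
# The FQAN syntax is standard VOMS; the share values and group names
# below are illustrative assumptions only.
def parse_fqan(fqan: str):
    parts = [p for p in fqan.strip("/").split("/") if p]
    role = None
    groups = []
    for p in parts:
        if p.startswith("Role="):
            role = p.split("=", 1)[1]
        else:
            groups.append(p)
    return "/" + "/".join(groups), role

# Hypothetical site policy: per-group shares, with a boost for production.
shares = {"/atlas": 1.0, "/atlas/it": 2.0}
ROLE_BOOST = {"production": 3.0}

def share_for(fqan: str) -> float:
    group, role = parse_fqan(fqan)
    base = shares.get(group, shares["/atlas"])
    return base * ROLE_BOOST.get(role, 1.0)

print(share_for("/atlas/it/Role=production"))  # 6.0
print(share_for("/atlas"))                     # 1.0
```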

23 Monitoring and (local) management tools
The only ATLAS-specific monitoring tools available now are for job monitoring, using the ProdSys DB
– Understand what needs to be developed in addition to GridIce and DGAS
– Favour adopting solutions already in use in INFN, and common development where needed
Storage monitoring looks like a general need…
– Participate in DGAS testing…
In any case it would be difficult to find ATLAS manpower for developing new solutions here…

24 Conclusion
The months from here to the end of 2006 are critical for setting up the ATLAS data and analysis systems
– And for having Italian users start exploiting them
A lot of work remains to be done
The federation will have an important role in helping organise the ATLAS-Italy effort in these areas
– As well as in setting up the tools, services and structures needed for managing and running the Tier2s themselves
Our plan intends to use all our human and hw resources in the most efficient way for ATLAS-Italy as a whole

