Published by Gildo Grassi. Modified 8 years ago.
1
21 March 2006, Luca Vaccarossa - INFN Milano
Storage usage by the experiments: ATLAS
Storage Workshop, 20 and 21 March 2006, CNAF Bologna
Luca Vaccarossa, INFN - Sezione di Milano
2
Outline
- Analysis via Grid
  - Possible workflow
  - File types and typical size
  - Typical throughput
- Management of authorization policies and quota management
- Experience with the various SEs at the sites
3
Analysis jobs via Grid
Various analysis systems are being tested. Here we consider the following scenario:
- Analysis through ProdSys + DQ2
- Data catalogue: LFC (both local and global)
- Local data access (the input is copied from the SE to the WN)
4
References
Some slides are taken from the presentation by F.Ambrogini and S.Resconi at the III Workshop Italiano sulla fisica di ATLAS e CMS, Bari, 21 October 2005: "Stato del Software e del Modello di Analisi: piani per il commissioning e la presa dati" (Status of the software and of the analysis model: plans for commissioning and data taking).
5
Distributed Analysis: one possible scenario
"Data-driven" scenario: analysis jobs are sent to the sites where the data are stored.
[Diagram: User Interface, ATLAS Production System, and Sites X, Y and Z, each with a Computing Element and a Storage Element; steps labeled (1), (2), (3)]
The ATLAS Production System, which has been used to run the Rome production jobs on 3 Grids, is one possible tool for performing analysis with Grid resources as well.
Some functionalities have recently been implemented in the Production System to support distributed analysis:
- Shipping of customized analysis algorithms (private code to be compiled remotely)
- Submission of jobs to sites where the input data are already available
(1) The analysis job defined by the user is split into "n" identical jobs, (2) which are sent to the "n" Grid sites where the input data are stored. (3) The output files are merged and the final output is sent to the user.
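The three steps of the data-driven scenario can be sketched in a few lines of Python. This is only an illustration of the split / run / merge idea, not the Production System's actual interface: the file names, the `replicas` mapping and the `run_subjob` stand-in are all hypothetical.

```python
from collections import defaultdict


def split_by_site(replicas):
    """Step (1): group the input files by hosting site, one sub-job per site."""
    jobs = defaultdict(list)
    for filename, site in replicas.items():
        jobs[site].append(filename)
    return dict(jobs)


def run_subjob(site, files):
    """Step (2): stand-in for the analysis run at the site (hypothetical).

    It only returns one placeholder result per input file.
    """
    return [f"{site}:{name}" for name in sorted(files)]


def merge(outputs):
    """Step (3): merge the per-site outputs into a single list for the user."""
    merged = []
    for site in sorted(outputs):
        merged.extend(outputs[site])
    return merged


# Hypothetical catalogue lookup: which site holds each input file.
replicas = {
    "aod_001.root": "SiteX",
    "aod_002.root": "SiteY",
    "aod_003.root": "SiteX",
    "aod_004.root": "SiteZ",
}

subjobs = split_by_site(replicas)
outputs = {site: run_subjob(site, files) for site, files in subjobs.items()}
final_output = merge(outputs)
```

Here the number of sub-jobs follows from where the data already are, which is what makes the scenario data-driven: no input file has to cross site boundaries before the analysis runs.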
6
Event Data Model
Different types of data correspond to different stages of reconstruction:
- RAW (~1.6 MB/event): triggered events recorded by the DAQ.
- ESD/RECO (~1.2 MB/event, target size ~500 kB/event): reconstructed info. Contains the detailed output of the detector reconstruction, including track candidates, hits and cells intended for calibration. POOL format.
- AOD (~100 kB/event): Analysis Object Data, analysis info. Contains a summary of the reconstructed event for common analyses: jets, (best) id of particles. POOL format.
- TAG (~1-10 kB/event): fast selection info. Relevant information for fast event selection in AOD and/or ESD files.
POOL format = combines ROOT I/O with a MySQL relational DB.
7
Size and content of ESD/RECO and AOD files
The content is currently evolving due to increasing knowledge of what is actually needed for analysis.
- ESD: 1 MB/event, 1 file (1000 events) = 1 GB
- AOD: 100 kB/event, 1 file = 100 MB
Back navigation AOD → ESD: a process that searches the ESD or RAW data for objects that are not found in the AOD during analysis.
Example: from a TauJet object in the AOD it is possible to navigate back to its constituent clusters, cells and tracks in the corresponding ESD file.
[Diagram, ESD and AOD objects: TauJet, TrackParticle/ID, tauObject, CaloCluster, TrackParticle/ID, CaloCell, Track/ID, Vertex/Primary. Solid line = direct navigation; dashed line = duplication of objects.]
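The file sizes quoted on this slide follow from the per-event size times the number of events per file. A minimal sketch of that arithmetic, assuming decimal units (1 MB = 1000 kB) and assuming that AOD files also hold 1000 events, which is what the quoted 100 MB implies:

```python
KB_PER_MB = 1000  # decimal units, as the slide's round numbers suggest


def file_size_mb(event_size_kb, events_per_file=1000):
    """Size of one file in MB for a given per-event size in kB."""
    return event_size_kb * events_per_file / KB_PER_MB


esd_file_mb = file_size_mb(1000)  # ESD: 1 MB/event
aod_file_mb = file_size_mb(100)   # AOD: 100 kB/event

print(f"ESD file: {esd_file_mb:.0f} MB, AOD file: {aod_file_mb:.0f} MB")
```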
8
Typical throughput of an analysis job
- Each analysis job copies its input AOD file from the close SE to the WN (~1 MB/s).
- The output file (1-10 MB) is copied to the UI for interactive analysis (this drives the sizing of the Broker's Output Sandbox).
- In a typical situation, 2/3 of the job slots could be running analysis jobs (concurrent access to the storage).
- 100 job slots x 1 MB/s = 100 MB/s
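The aggregate load on the storage is the number of concurrent analysis jobs times the per-job copy rate. The sketch below reproduces the slide's figure; the 150-slot site is a hypothetical example chosen so that 2/3 of its slots gives the 100 analysis jobs used on the slide, and the result is a worst case in which every job copies its input at the same time.

```python
from fractions import Fraction


def analysis_slots(total_slots, analysis_fraction=Fraction(2, 3)):
    """Number of job slots occupied by analysis jobs (rounded down)."""
    return int(total_slots * analysis_fraction)


def aggregate_throughput_mb_s(n_jobs, per_job_mb_s=1.0):
    """Storage throughput needed if all jobs copy their input concurrently."""
    return n_jobs * per_job_mb_s


# Figure from the slide: 100 job slots at ~1 MB/s each.
slide_case = aggregate_throughput_mb_s(100)

# Hypothetical site with 150 slots, 2/3 of them running analysis.
hypothetical_site = aggregate_throughput_mb_s(analysis_slots(150))

print(slide_case, hypothetical_site)
```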
9
Current status of analysis
Main problems with analysis on AOD, and possible solutions:
- Much effort was spent in building the C++ analysis class into Athena.
  Possible solution: Interactive Analysis (under development)
- AOD processing is SLOWER than CBNT (Combined Ntuples) processing:
  - processing 35k events takes ~30 min from AOD and ~5 min from CBNT
  Possible solution: Athena-Aware Ntuples (under development)
- The transfer of AOD/CBNT was one of the major problems:
  - problems with Grid Data Management tools and with some SEs at Grid sites
  Possible solution: Distributed Analysis (under development)
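The AOD versus CBNT timings can be turned into event rates to make the gap explicit. This is a back-of-the-envelope sketch that uses only the approximate numbers quoted on the slide (35k events, ~30 min, ~5 min):

```python
def events_per_second(n_events, minutes):
    """Average processing rate for a job of the given duration."""
    return n_events / (minutes * 60)


aod_rate = events_per_second(35_000, 30)   # reading AOD
cbnt_rate = events_per_second(35_000, 5)   # reading Combined Ntuples
slowdown = cbnt_rate / aod_rate            # how many times slower AOD is

print(f"AOD: {aod_rate:.1f} ev/s, CBNT: {cbnt_rate:.1f} ev/s, factor {slowdown:.0f}")
```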
10
Overview of upcoming activities
ATLAS Software Validation … and Commissioning:
The Data Challenge (DC-3), planned for the end of 2005, has been replaced by the CSC (Computing System Commissioning), a series of activities designed to validate all aspects of computing and software before the ATLAS turn-on in 2007.
ATLAS validation in the near future is a three-step process:
- Nightly validation by RTT (Run Time Test): package-specific tests run automatically, with results accessible via a web page; currently 197 different tests on 18 different packages.
- A 10^5 sample will be run on the Grid for every major (usable) release (Oct 05): 100k events from 10-15 physics samples, for example Min Bias, Z → ee, Z → …, Z → …, H → … (120 GeV), W → …, b-tagging samples, top, QCD di-jet samples in different pT bins, and single particles for calibration purposes.
11
ATLAS Software Validation … and Commissioning:
- A 10^6 sample will be run on the Grid for all "production releases", to be completed in one week: 1M events, ~25 physics samples (almost all the samples above, and more). This validates the full software chain, from generation to reconstruction, before moving to real production.
Real production:
- 10^7 events (DC2/Rome production scale), the typical scale of distributed production samples, to be completed in 6-8 weeks (spring-summer 2006).
- For example, 10M events from the physics groups: at least 100k per sample, 500k events for each sample used for validation, plus additional physics samples.
- Full software chain (event generation, simulation, digitization, pileup, reconstruction, tag/merging, analysis).
12
Management of authorization policies and quota management
ATLAS wants to use VOMS to manage authorization policies, with a policy granularity that makes it possible to divide the VO into groups (production, analysis, etc.) and to assign different roles (grid-sw-manager, production-manager, ordinary user).
13
Dario Barberis: VOMS for ATLAS, GDB - 8 March 2006
VOMS for ATLAS
Dario Barberis, CERN & Genoa University
14
What can VOMS do for us?
- ATLAS is a very large VO (the largest?) and consists of several "activity groups" that will compete for computing resources
- Assume we have defined VOMS groups and roles and registered all ATLAS VO members accordingly
- Naively we would like to use this information for:
  - Monitoring & accounting
  - Assigning job priorities
  - Allocating disk storage space
- We would also expect to be able to specify requirements and use the information at user, group and VO level
- We therefore have to be able to assign resources to activity groups and get accurate monitoring and accounting reports
15
3 Dimensions
- Roles:
  - Grid software administrators (who install software and manage the resources)
  - Production managers for official productions
  - Normal users
- Groups:
  - Physics groups
  - Combined performance groups
  - Detectors & trigger
  - Computing & central productions
- Funding:
  - Countries and funding agencies
16
Group list
  phys-beauty   phys-top    phys-sm
  phys-higgs    phys-susy   phys-exotics
  phys-hi       phys-gener  phys-lumin
  perf-egamma   perf-jets   perf-flavtag
  perf-muons    perf-tau    trig-pesa
  det-indet     det-larg    det-tile
  det-muon      soft-test   soft-valid
  soft-prod     soft-admin  gen-user
- It is foreseen that initially only group production managers would belong to most of those groups
  - All Collaboration members would be, at least initially, in "gen-user"
  - Software installers would be in soft-admin
- The matrix would therefore be diagonal
  - Only ~25 group/role combinations would be populated
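To see why only about 25 group/role combinations would be populated, the sparse group/role matrix can be enumerated. The sketch below is one reading of the slide, not an actual VOMS configuration: the role labels and the rule that gives every remaining group exactly one production-manager entry are assumptions made for illustration.

```python
GROUPS = [
    "phys-beauty", "phys-top", "phys-sm",
    "phys-higgs", "phys-susy", "phys-exotics",
    "phys-hi", "phys-gener", "phys-lumin",
    "perf-egamma", "perf-jets", "perf-flavtag",
    "perf-muons", "perf-tau", "trig-pesa",
    "det-indet", "det-larg", "det-tile",
    "det-muon", "soft-test", "soft-valid",
    "soft-prod", "soft-admin", "gen-user",
]

# Hypothetical role labels for the three roles named in the talk.
ROLES = ["grid-sw-manager", "production-manager", "user"]


def populated_combinations(groups):
    """Return the (group, role) pairs that are populated initially.

    Assumed rule: software installers sit in soft-admin, all Collaboration
    members sit in gen-user, and every other group initially contains only
    its production managers.
    """
    combos = []
    for group in groups:
        if group == "soft-admin":
            role = "grid-sw-manager"
        elif group == "gen-user":
            role = "user"
        else:
            role = "production-manager"
        combos.append((group, role))
    return combos


combos = populated_combinations(GROUPS)
total_cells = len(GROUPS) * len(ROLES)

print(f"{len(combos)} populated out of {total_cells} possible combinations")
```

Under these assumptions each group contributes a single populated cell, which is the sense in which the matrix is close to diagonal.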
17
Job Priorities
- Once groups and roles are set up, we have to use this information
- Relative priorities are easy to enforce if all jobs go through the same queue (or database)
- In the case of a distributed submission system, it is up to the resource providers to:
  - agree the policies of each site with ATLAS
  - publish and enforce the agreed policies
- The job submission systems must take these policies into account to distribute jobs correctly
  - the priority of each job is different on each site
- Developments in this direction are in progress in both OSG and EGEE
  - But we do not have any complete solution to this problem yet
18
Storage allocation
- The bulk of ATLAS disk storage will be used by central productions and organized group activities
- Disk storage has to be managed according to VOMS groups on all SEs available to ATLAS
- In addition, individual users will have data they want to share with their colleagues on the Grid
  - Similarly to the way (ATLAS) people use public directories and project space on lxplus at CERN
- Therefore, again, we need resource allocation and accounting at user, group and VO level
19
My naive conclusions
- Most members of the Collaboration have not yet been confronted with the limitations of current Grid middleware
  - They expect a simple extension of the common batch systems (such as LSF @ CERN):
    - User disk space
    - Project (group) space
    - Fair share job submission
- VOMS is a step forward with respect to the current "free for all" situation
  - But a consistent implementation in all client tools is needed NOW!
20
Experience with the various SEs at the sites
Thanks to G.Negri for the feedback.
- The biggest problems were with the various CASTOR instances (files resident on tape are hard to access).
  - Key ATLAS request: to be able to decide that some files stay on disk in any case.
- Too many connections to CASTOR crash the gridftp server.
- "Non-intelligent" lcg-tools (the timeout is a fixed time, which risks cutting off slow transfers).
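The fixed-timeout problem is easy to quantify: a transfer survives only if file size divided by transfer rate stays below the timeout. The sketch below uses the file sizes and the ~1 MB/s rate quoted earlier in the talk; the 600 s timeout is a hypothetical value, since the slides do not give the real one.

```python
def transfer_time_s(size_mb, rate_mb_s):
    """Time needed to copy a file of the given size at the given rate."""
    return size_mb / rate_mb_s


def survives_fixed_timeout(size_mb, rate_mb_s, timeout_s):
    """True if the copy finishes before a fixed timeout expires."""
    return transfer_time_s(size_mb, rate_mb_s) <= timeout_s


FIXED_TIMEOUT_S = 600  # hypothetical value, not taken from the slides

# 100 MB AOD file at the typical ~1 MB/s: 100 s, completes.
aod_ok = survives_fixed_timeout(100, 1.0, FIXED_TIMEOUT_S)

# 1 GB ESD file at the same rate: 1000 s, cut off.
esd_ok = survives_fixed_timeout(1000, 1.0, FIXED_TIMEOUT_S)

# The same AOD file on a slow link (0.1 MB/s): also cut off.
aod_slow_ok = survives_fixed_timeout(100, 0.1, FIXED_TIMEOUT_S)

print(aod_ok, esd_ok, aod_slow_ok)
```

Whatever the actual timeout is, a fixed value sets an upper bound on size divided by rate, so large files and slow links are the transfers that get cut.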