
1 ATLAS Computing TDR. Lamberto Luminari, CSN1, Napoli, 22 September 2005

2 Computing TDR: changes to the Computing Model
Days of operation in 2007: 100 -> 25-50
Access to resources at the various centres:
- Access to the Tier-0 facility is granted only to people in the central production group and to those providing the first-pass calibration.
- Access to the Tier-1 facilities is essentially restricted to the production managers of the working groups and to the central production group for reprocessing.
- In principle, all members of the ATLAS virtual organisation have access to a given Tier-2. In practice (and for operational optimization), heightened access to CPU and other resources may be given to specific working groups at a particular site, according to a local policy agreed with the ATLAS central administration in such a way that the ATLAS global policy is enforced over the aggregate of all sites. For example, DPDs for the Higgs working group may be replicated to a subset of Tier-2 facilities, with the working-group members given heightened access to those facilities.

3 Computing TDR: changes to the Computing Model (2)
Tier-3 resources: there will be a continuing need for local resources within an institution to store user ntuple-equivalents and to allow work to proceed off the Grid. Clearly, user expectations for these facilities will grow, and a site would already typically provide terabytes of storage for local use. Such 'Tier-3' facilities (which may be collections of desktop machines or local institute clusters) should be Grid-enabled, both to allow job submission to the Grid and output retrieval from it, and to permit the resources to be used temporarily, by agreement, as part of the Tier-2 activities. Such resources may be useful for simulation, or for the collective analysis of datasets shared within a working group, for some of the time. The size of Tier-3 resources will depend on the size of the local user community and on other factors, such as any specific software development or analysis activity foreseen at a given institute; they are therefore neither centrally planned nor controlled. It is nevertheless assumed that every active user will need O(1 TB) of local disk storage and a few kSI2k of CPU capacity to analyse ATLAS data efficiently.
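To make the sizing rule concrete, here is a minimal sketch, assuming a hypothetical per-user CPU figure of 2 kSI2k for the "few kSI2k" mentioned above; only the O(1 TB) per user figure is taken from the text.

```python
# Minimal sketch of the Tier-3 sizing rule quoted above:
# every active user needs O(1 TB) of disk and a few kSI2k of CPU.
# The per-user CPU figure (2 kSI2k) is an assumed value for illustration.

DISK_PER_USER_TB = 1.0    # O(1 TB) per active user (from the text)
CPU_PER_USER_KSI2K = 2.0  # "a few kSI2k"; 2 is an assumption

def tier3_size(active_users: int) -> dict:
    """Return an indicative Tier-3 resource estimate for a local group."""
    return {
        "disk_TB": active_users * DISK_PER_USER_TB,
        "cpu_kSI2K": active_users * CPU_PER_USER_KSI2K,
    }

if __name__ == "__main__":
    # e.g. a 20-person institute group (hypothetical size)
    print(tier3_size(20))  # {'disk_TB': 20.0, 'cpu_kSI2K': 40.0}
```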

4 Computing TDR: AOD production
... As AOD events will be read many times more often than ESD and RAW data, AOD events are physically clustered on output by trigger, physics channel or other criteria that reflect analysis access patterns. This means that an AOD production job, unlike an ESD production job, produces many output files. The baseline streaming model is that each AOD event is written to exactly one stream: the AOD output streams form a disjoint partition of the run. All streams produced in first-pass reconstruction share the same definition of AOD. On the order of 10 streams are anticipated in first-pass reconstruction ...
... Alternative models have been considered and could also be viable. It is clear from the experience of the Tevatron experiments that a unique solution is not immediately evident. The above scenario reflects the best current understanding of a viable scheme, taking into account the extra constraints of the considerably larger ATLAS dataset. It relies heavily on the use of event collections and the TAG system; these methods are only undergoing their first serious tests at the time of writing. However, the system being devised is flexible and can (within limits) sustain somewhat earlier event streaming and modestly overlapping streams without drastic technical or resource implications.
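The "exactly one stream per event" baseline can be pictured with the toy sketch below. The stream names and trigger flags are invented for illustration and are not the actual ATLAS stream definitions; first-match assignment is just one simple way of keeping the streams disjoint.

```python
# Illustrative sketch of the baseline exclusive-streaming model:
# each AOD event is written to exactly one output stream, chosen from an
# ordered list of stream definitions (a disjoint partition of the run).
# Stream names and trigger fields are hypothetical.

from collections import defaultdict

# Ordered (stream_name, predicate) pairs; the first match wins, so the
# streams partition the run with no overlaps.
STREAM_DEFINITIONS = [
    ("egamma", lambda ev: ev["passed_e25i"]),
    ("muons",  lambda ev: ev["passed_mu20"]),
    ("jets",   lambda ev: ev["passed_j160"]),
]
DEFAULT_STREAM = "other"

def assign_stream(event: dict) -> str:
    """Return the single stream this AOD event is written to."""
    for name, selects in STREAM_DEFINITIONS:
        if selects(event):
            return name
    return DEFAULT_STREAM

def stream_events(events):
    """Group events into disjoint output streams (one file set per stream)."""
    streams = defaultdict(list)
    for ev in events:
        streams[assign_stream(ev)].append(ev)
    return streams

if __name__ == "__main__":
    sample = [
        {"passed_e25i": True,  "passed_mu20": False, "passed_j160": False},
        {"passed_e25i": False, "passed_mu20": True,  "passed_j160": True},
        {"passed_e25i": False, "passed_mu20": False, "passed_j160": False},
    ]
    # The second event fires two triggers but still lands in exactly one stream.
    for name, evs in stream_events(sample).items():
        print(name, len(evs))
```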

5 Computing TDR: Offline software
Several orthogonal domain decompositions have been identified. The first spans the ATLAS detector subsystems:
- Inner detector (pixel detector + silicon strip detector + transition radiation tracker)
- Liquid argon calorimeter
- Tile calorimeter
- Muon spectrometer
The primary data processing activities that must be supported for all of these detector subsystems are: event generation, simulation, digitization, pile-up, detector reconstruction, combined reconstruction, physics analysis, high-level triggering, online monitoring, and calibration and alignment processing.
Further domain decompositions cover the infrastructure needed to support the software development activity, and components that derive from the overall architectural vision. The overall structure is the following:
- Framework and Core Services (event processing framework based on plug-compatible components and abstract interfaces)
- Event generators, simulation, digitization and pile-up
- Event selection, reconstruction and physics analysis tools
- Calibration and alignment
- Infrastructure (services that support the software development process)

6 Offline software: Athena Component Model

7 Offline software: Athena major components
- Application Manager: the overall driving intelligence that manages and coordinates the activity of all other components within the application.
- Algorithms and Sequencers: Algorithms provide the basic per-event processing capability of the framework. A Sequencer is a sequence of Algorithms, each of which might itself be another Sequencer (a simplified sketch of this pattern follows the list).
- Tools: a Tool is similar to an Algorithm, but differs in that it can be executed multiple times per event.
- Transient Data Stores: all data objects are organized in various transient data stores depending on their characteristics and lifetimes (e.g. event data, detector conditions data, etc.).
- Services: provide the services needed by the Algorithms. In general these are high-level, designed to support the needs of the physicist. Examples are the message-reporting system, the different persistency services, random-number generators, etc.
- Selectors: components that perform selection (e.g. the Event Selector provides the functionality for selecting the input events that the application will process).
- Converters: responsible for converting data from one representation to another. One example is the transformation of an object from its transient form to its persistent form and vice versa.
- Utilities: C++ classes that provide general support for other components.
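Athena/Gaudi components are C++, but the Algorithm/Sequencer relationship described above can be modelled with a short Python sketch; the class names and the toy "reconstruction" steps are invented, and the real framework interfaces are considerably richer than this.

```python
# Simplified model of the Algorithm / Sequencer relationship described above:
# a Sequencer is itself an Algorithm composed of Algorithms, and everything
# shares the usual initialize / execute / finalize lifecycle.

class Algorithm:
    """Basic per-event processing unit with a three-phase lifecycle."""
    def __init__(self, name):
        self.name = name
    def initialize(self):
        pass
    def execute(self, event, store):
        raise NotImplementedError
    def finalize(self):
        pass

class Sequencer(Algorithm):
    """An Algorithm that runs an ordered list of child Algorithms
    (each child may itself be another Sequencer)."""
    def __init__(self, name, children):
        super().__init__(name)
        self.children = children
    def initialize(self):
        for alg in self.children:
            alg.initialize()
    def execute(self, event, store):
        for alg in self.children:
            alg.execute(event, store)
    def finalize(self):
        for alg in self.children:
            alg.finalize()

# Two toy algorithms standing in for reconstruction steps (hypothetical names).
class TrackFinder(Algorithm):
    def execute(self, event, store):
        store["tracks"] = [h * 2 for h in event["hits"]]   # pretend tracking

class TrackCounter(Algorithm):
    def execute(self, event, store):
        print(self.name, "found", len(store["tracks"]), "tracks")

if __name__ == "__main__":
    top = Sequencer("TopSeq", [TrackFinder("TrackFinder"),
                               TrackCounter("TrackCounter")])
    top.initialize()
    for event in [{"hits": [1, 2, 3]}, {"hits": [4]}]:
        transient_store = {}           # stands in for the transient event store
        top.execute(event, transient_store)
    top.finalize()
```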

8 Offline software: Simulation data flow

9 Offline software: Reconstruction chains

10 Offline Software for HLT and Monitoring

11 Computing TDR: Databases and Data Management (Project)
There are two broad categories of data storage in ATLAS: file-based data and database-resident data (more specifically, relational database-resident data). The two storage approaches are complementary and are used in the appropriate contexts in ATLAS:
- File storage is used for bulky data such as event data and large conditions data volumes; for contexts in which the remote connectivity (usually) implied by database storage is not reliably available; and generally for cases where simple, lightweight storage is adequate.
- Database storage is used where concurrent writes and transactional consistency are required; where data handling is inherently distributed, typically with centralized writers and distributed readers; where indexing and rapid querying across moderate data volumes are required; and where structured archival storage and query-based retrieval are required.
Vendor neutrality in the database interface (with implemented support for Oracle, MySQL and SQLite) has been addressed through the development of the Relational Access Layer (RAL) within the POOL project. COOL, developed in a collaboration between the LCG Application Area and ATLAS, is another database storage service layered over RAL and is the basis for ATLAS conditions data storage. It provides interval-of-validity based storage and retrieval of conditions data.
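To illustrate what interval-of-validity storage means in practice, here is a simplified sketch of an IOV lookup; it is not the COOL API (which adds folders, channels and tags), just the core idea of retrieving the payload that is valid at a given time.

```python
# Simplified model of interval-of-validity (IOV) conditions storage, to
# illustrate "IOV-based storage and retrieval".  The real COOL interface is
# richer; this sketch only shows the core idea.

import bisect

class ConditionsFolder:
    def __init__(self):
        # Parallel lists of IOV start times and (since, until, payload) tuples,
        # kept sorted by 'since' so lookups can use binary search.
        self._starts = []
        self._objects = []

    def store(self, since, until, payload):
        """Store a payload valid for times t with since <= t < until."""
        i = bisect.bisect_left(self._starts, since)
        self._starts.insert(i, since)
        self._objects.insert(i, (since, until, payload))

    def retrieve(self, time):
        """Return the payload whose validity interval contains 'time'."""
        i = bisect.bisect_right(self._starts, time) - 1
        if i >= 0:
            since, until, payload = self._objects[i]
            if since <= time < until:
                return payload
        raise KeyError(f"no conditions object valid at t={time}")

if __name__ == "__main__":
    folder = ConditionsFolder()
    folder.store(0,    1000, {"hv": 1530.0})   # hypothetical calibration values
    folder.store(1000, 2000, {"hv": 1525.0})
    print(folder.retrieve(1500))               # -> {'hv': 1525.0}
```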

12 Computing TDR: Databases and Data Management
Use of the conditions database online for subdetector and HLT configuration presents considerable performance challenges. The required parallel read performance is beyond the capacity of a single database server, and replication will have to be used to share the load among many slave servers:
- One interesting possibility comes from the Frontier project, developed to distribute data using web-caching technology: database queries are translated into HTTP requests for web-page content, which can be cached using conventional web proxy server technology. This is particularly suitable for distributed read-only access, where updates can be forced by flushing the proxy caches (a toy sketch of this idea follows below).
Conditions data will also have to be distributed worldwide for subsequent reconstruction passes, user analysis and subdetector calibration tasks:
- The LCG 3D (Distributed Deployment of Databases) project is prototyping the necessary techniques, based on conventional database replication, with an architecture of Oracle servers at Tier-0 (CERN) and the Tier-1 centres, and MySQL-based replicas of subsets of the data at Tier-2 sites and beyond. The use of the RAL backend-independent database access library by COOL and other database applications will be particularly important here, to enable such cross-platform replication.
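The Frontier idea can be pictured with the toy sketch below: a read-only query is encoded as an HTTP GET so that an ordinary web proxy can cache the response. The URL format, the endpoint and the in-memory "proxy" are invented for illustration and are not Frontier's actual protocol.

```python
# Sketch of the Frontier idea: a read-only database query is encoded as an
# HTTP request, so standard web proxy caches can serve repeated queries.
# The URL format and the in-memory "proxy cache" below are invented.

from urllib.parse import urlencode

FRONTIER_SERVER = "http://frontier.example.org/query"   # hypothetical endpoint

def query_to_url(sql: str) -> str:
    """Encode a read-only SQL query as a cacheable GET URL."""
    return FRONTIER_SERVER + "?" + urlencode({"q": sql})

class ProxyCache:
    """Toy stand-in for a web proxy: caches responses keyed by URL."""
    def __init__(self, backend):
        self._backend = backend      # function: url -> response
        self._cache = {}
    def get(self, url):
        if url not in self._cache:                 # cache miss -> hit the DB
            self._cache[url] = self._backend(url)
        return self._cache[url]
    def flush(self):
        """Force clients to see updates (the 'flush the proxy caches' step)."""
        self._cache.clear()

if __name__ == "__main__":
    database = {"SELECT hv FROM conditions": "1525.0"}   # pretend DB server
    def db_server(url):
        sql = url.split("q=")[1].replace("+", " ")
        return database[sql]

    proxy = ProxyCache(db_server)
    url = query_to_url("SELECT hv FROM conditions")
    print(proxy.get(url))   # first call goes to the database
    print(proxy.get(url))   # second call is served from the cache
```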

13 Computing TDR: Grid-based production system
[Diagram: the production supervisor (Windmill) takes job definitions from the production database (prodDB) and dispatches them to executors for the different Grid flavours and batch systems: Lexor (LCG executor), Dulcinea (NG executor, NorduGrid), Capone (G3 executor, Grid3) and an LSF executor. Data management is handled by Don Quijote (dms) with local and global RLS catalogues; dataset metadata is kept in AMI.]
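A much-simplified model of the supervisor/executor split in the diagram is sketched below; the class and method names are invented, and the real Windmill/executor protocol (messaging, job states, data-management callbacks) is far richer.

```python
# Simplified model of the supervisor/executor split in the production system
# diagram: a supervisor takes job definitions from the production database
# (prodDB) and hands each one to the executor for its Grid flavour.
# Class and method names are invented for illustration.

class Executor:
    """Base class for the per-Grid executors (Lexor, Dulcinea, Capone, ...)."""
    flavour = "undefined"
    def submit(self, job):
        print(f"[{self.flavour}] submitting job {job['id']}")
        return "running"

class LCGExecutor(Executor):   flavour = "LCG"        # e.g. Lexor
class NGExecutor(Executor):    flavour = "NorduGrid"  # e.g. Dulcinea
class Grid3Executor(Executor): flavour = "Grid3"      # e.g. Capone

class Supervisor:
    """Plays the role of Windmill: polls prodDB and dispatches to executors."""
    def __init__(self, prod_db, executors):
        self.prod_db = prod_db                       # list of job dicts
        self.executors = {e.flavour: e for e in executors}
    def run_once(self):
        for job in self.prod_db:
            if job["state"] == "defined":
                job["state"] = self.executors[job["flavour"]].submit(job)

if __name__ == "__main__":
    prod_db = [
        {"id": 1, "flavour": "LCG",       "state": "defined"},
        {"id": 2, "flavour": "Grid3",     "state": "defined"},
        {"id": 3, "flavour": "NorduGrid", "state": "defined"},
    ]
    Supervisor(prod_db, [LCGExecutor(), NGExecutor(), Grid3Executor()]).run_once()
```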

14 Production system performance
[Plot: jobs per day on the LCG-2 infrastructure, for the DC2 and Rome productions]

15 Computing TDR: Tier-0 Operations

16 Data replication
[Background diagram of the computing model: detector output ~PB/s; Event Builder ~10 GB/s; Event Filter (~7.5 MSI2k) writing ~320 MB/s to Tier-0 (5 MSI2k, 5 PB/y); ~10 Tier-1s (8 MSI2k, 2 PB/y each) fed at ~75 MB/s over 622 Mb/s links; Tier-2s (~1.5 MSI2k, ~4 per Tier-1); Tier-3s.]
- RAW: one complete replica of the raw data resides at the Tier-1s (~1/10 per Tier-1). Samples of events are also stored at the Tier-2s and, to a lesser extent, at the Tier-3s.
- ESD: all ESD versions are replicated and reside in at least two of the Tier-1s. The primary ESD and the associated RAW data are assigned to the ~10 Tier-1s with a round-robin mechanism (sketched below). Samples of events are also stored at the Tier-2s and, to a lesser extent, at the Tier-3s.
- AOD: fully replicated at every Tier-1 and partially at the Tier-2s (~1/3 to 1/4 each). Some streams may also be stored at the Tier-3s.
- TAG: the TAG databases are replicated at all Tier-1s and Tier-2s.
- DPD: at the Tier-1s, Tier-2s and Tier-3s.
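The round-robin assignment of RAW data and primary ESD to the ~10 Tier-1s, with a second Tier-1 hosting the extra ESD copy, can be sketched as below; the Tier-1 names, the "block" granularity and the choice of the next centre in the ring for the second copy are illustrative assumptions.

```python
# Sketch of the round-robin assignment described above: RAW data and the
# associated primary ESD are spread over the ~10 Tier-1s (about 1/10 each),
# and every ESD share is also replicated at a second Tier-1.
# Tier-1 names and the "dataset block" granularity are placeholders.

from itertools import cycle

TIER1S = [f"Tier1-{i:02d}" for i in range(10)]   # ~10 Tier-1 centres

def assign_blocks(n_blocks, tier1s=TIER1S):
    """Round-robin the RAW + primary-ESD blocks over the Tier-1s and pick the
    next centre in the ring as the host of the second ESD copy."""
    assignment = []
    ring = cycle(range(len(tier1s)))
    for block in range(n_blocks):
        i = next(ring)
        primary = tier1s[i]
        esd_replica = tier1s[(i + 1) % len(tier1s)]   # "at least two" Tier-1s
        assignment.append((block, primary, esd_replica))
    return assignment

if __name__ == "__main__":
    for block, primary, replica in assign_blocks(12)[:4]:
        print(f"block {block}: RAW+ESD at {primary}, 2nd ESD copy at {replica}")
```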

17 Computing TDR: Resource Requirement Evolution
[Plots: Tier-0 and CAF]

18 Computing TDR: Resource Requirement Evolution (2)
[Plots: Tier-1 and Tier-2]

19 Computing System Commissioning
The physics groups request 10^8 simulated events, produced with the latest detector layout and with the knowledge of the detector response obtained from the cosmic-ray runs, to be studied in depth before the start of the run in July 2007. This means 6 months of sustained production starting from late summer 2006.
Computing resources needed (estimated from the participation in the Physics Workshop activities, with simulation, reconstruction and analysis of 7x10^6 events, i.e. 15 times more events in a time ~4 times longer): 4 x the computing power available for the Physics Workshop.

20 Milestones 2006
1. January 2006: production release for the commissioning of the computing system and for the initial cosmic-ray studies; completion of the implementation of the Event Data Model for reconstruction.
2. February 2006: start of Data Challenge 3, also called Computing System Commissioning.
3. April 2006: integration of the ATLAS components with the LCG Service Challenge 4.
4. July 2006: production release for the cosmic-ray runs (autumn 2006).
5. December 2006: production release for the first real data with protons.

21 Planned activities at the Italian centres
[Background diagram: same computing-model picture as on slide 16.]
- Reconstruction: Muon Detector (LE, NA, PV), Calorimeters (MI, PI), Pixel Detector (MI)
- Calibration/alignment/detector data: MDT (LNF, RM1-3), RPC (LE, NA, RM2), Calorimeters (MI, PI), Pixel Detector (MI), Conditions DB (CS), Detector Description DB (LE, PI), Detector Monitoring (CS, NA, UD)
- Performance studies: muons (CS, LE, LNF, NA, PI, PV, RM1-2-3), tau/jet/EtMiss/egamma (GE, MI, PI)
- Analysis: Higgs, both SM and MSSM (CS, LNF, MI, PI, PV, RM1); SUSY (LE, MI, NA); top (PI, UD); B physics (CS, GE, PI)
- Simulations connected with the above activities
- Studies of the analysis model

22 Resources needed at the Italian Tier-2/3 centres
[Background diagram: same computing-model picture as on slide 16.]
At the Tier-2s:
- Simulations for the computing system commissioning
- A copy of the AOD (10^8 events x 100 KB = 10 TB, see the arithmetic sketch below) with different streaming schemes (exclusive and inclusive) for analysis and analysis-model studies
- Samples of events in RAW and ESD format for calibrations and for the development of reconstruction algorithms
- Calibration centres
- Organized analysis activities
- 450 kSI2K (250 already available at the end of 2005) and 80 TB (30 already available at the end of 2005)
At the Tier-3s:
- Individual and chaotic analysis activities
- 40 kSI2K and 10 TB
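The arithmetic behind the 10 TB AOD figure and the new 2006 capacity implied by the "already available" numbers is spelled out in this small sketch (decimal units assumed for KB and TB).

```python
# The arithmetic behind the Tier-2 numbers quoted above.

events      = 1e8        # 10^8 simulated events for commissioning
aod_size_kb = 100        # 100 KB per AOD event
aod_copy_tb = events * aod_size_kb * 1e3 / 1e12   # KB -> bytes -> TB (decimal)
print(f"AOD copy: {aod_copy_tb:.0f} TB")          # -> 10 TB

# New capacity to be acquired for 2006, given what is already in place:
cpu_needed, cpu_available = 450, 250              # kSI2K
disk_needed, disk_available = 80, 30              # TB
print("new CPU :", cpu_needed - cpu_available, "kSI2K")   # 200 kSI2K
print("new disk:", disk_needed - disk_available, "TB")    # 50 TB
```

The 200 kSI2K and 50 TB of new capacity are the same figures that appear as the 2006 "new" entries in the cost table on the next slide.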

23 Overall resources of the ATLAS Tier-2s (Computing TDR)

24 Tier-2 cost estimate (purchases in the current year)
INFN Tier-2            2006   2007   2008   2009   2010   Tot. k€
CPU (kSI2K)   new       200    300   1812   1420   2709
              tot       450    750   2532   3832   6261
              k€        117    114    453    241    325      1250
Disk (TB)     new        50    160    972    847   1334
              tot        80    240   1212   2039   3194
              k€        115    224    855    466    454      2114
Tot. k€                 232    338   1308    707    779      3364
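As a quick consistency check on the table as transcribed above, the sketch below verifies that the yearly totals are the sum of the CPU and disk cost rows and that the last column matches the row sums.

```python
# Cross-check of the cost table above: the "Tot. k€" row should be the sum of
# the CPU and disk cost rows, year by year, and the last column the row sums.

years     = [2006, 2007, 2008, 2009, 2010]
cpu_keur  = [117, 114,  453, 241, 325]
disk_keur = [115, 224,  855, 466, 454]
tot_keur  = [232, 338, 1308, 707, 779]

for y, c, d, t in zip(years, cpu_keur, disk_keur, tot_keur):
    assert c + d == t, f"inconsistent totals for {y}"
assert sum(cpu_keur) == 1250 and sum(disk_keur) == 2114 and sum(tot_keur) == 3364
print("cost table totals are consistent")
```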

25 Tier-2 implementation projects
- 7/9: identification of the local ATLAS contact persons for the projects
- 12/9: formation of the technical support committee
- 12-20/9: input from the activity coordinators and from the technical support committee
- 23/9: first draft of the local projects
- 26-29/9: preliminary review of the projects and feedback
- 30/9: "complete"(?) version of the projects
- 3-4/10: Computing Committee workshop -> "technical" review of the projects
- 5/10: (virtual) meeting to discuss the projects
- 10/10: CSN1

26 Requests for 2006 (non Tier-2)
- GE: 2 dual-processor nodes + 2 TB disk, 12.4 k€
- BO: analysis farm (infrastructure + 5 dual-processor nodes) + 4 TB disk, 26.5 k€
- PI: 5 kSI2k + 2 TB disk, 11 k€
- PV: 11 k€
- RM2: 2 Gigabit switches, 2 k€
- RM3: 1 dual-processor node + 2 TB disk, 7 k€
- UD: 1 dual-processor node + 1 TB disk, 5 k€

27 Cost evolution estimate used

28 ATLAS jobs run at each LCG site

