storage@T1
Disk situation

Total space at the Tier1: ~550 TB raw

Tier1 CNAF   2005   2006   2007   2008   2009   2010
Disk (TB)     507    960   1400   3600   5000   8500
Tapes (TB)    800    900                 7000  10000

Assigned: ~510 TB raw (~400 TB net) + ~40 TB legacy (CDF, BABAR)
Still assignable now: ~40 TB raw
2006 tender: 400 TB raw (+ the one-fifth contract extension?)
  Delivery in September 2006
  In production in November 2006?
Total unassigned disk in November 2006: ~440 TB raw (cross-checked in the sketch below)
SATA/FC technology (not optimal for random r/w access):
  CASTOR front-end
  User access
Databases (needed for catalogues, CASTOR, FTS, etc.)
  Market survey under way for database storage (~15-17 TB in FC technology)
  Also needed for user access (e.g. BABAR)?
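A minimal cross-check of the accounting above; the raw-to-net factor is an assumption taken from the quoted "~510 TB raw (~400 TB net)", everything else uses the figures listed in this slide.

    # Cross-check of the disk accounting above (all values in TB raw unless noted).
    total_installed  = 550                              # current Tier1 disk, raw
    assigned         = 510                              # already assigned, raw
    assignable_now   = total_installed - assigned       # ~40 TB raw
    tender_2006      = 400                              # in production by Nov 2006?
    unassigned_nov06 = assignable_now + tender_2006     # ~440 TB raw

    raw_to_net = 400 / 510                              # assumed net/raw factor (~0.78)
    print(f"assignable now:         {assignable_now} TB raw")
    print(f"unassigned in Nov 2006: {unassigned_nov06} TB raw "
          f"(~{unassigned_nov06 * raw_to_net:.0f} TB net, assumed factor)")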
Total in production (TB)

Experiment   Total (alloc / used)   CASTOR (alloc / used)   Pure disk (alloc / used)
ALICE        16.1 / 12.8
ATLAS        40.4 / 12.9
CMS          85.5 / 70.5            31 / 17.9               54.5 / 52.6
LHCb         26.3 / 4.8             20.6 / 0.2              5.7 / 4.6
AMS          2.7 / 2.6
ARGO         11.6 / 9.0             7.8 / 6                 3.8 / 2.9
BABAR        148.6 / 117.2          3.1 / 1.3               145.5 / 115.9
CDF          65.6 / 36.5
MAGIC        1.1 / 0.6
VIRGO        27.4 / 25.3            1.6 / 1.5               25.8 / 23.8
PAMELA       3.6 / 1.2              1.8 / 0.1
GEANT4
TEORICI

Net values; for BABAR they also include ~32 TB of "old" disk and for CDF ~10 TB.
In 2006 BABAR was assigned a further ~50 TB (already in production).
400 TB raw from the end of September.
Assignments, production and new requests (TB, net values)

Experiment   Assigned 2005   Assigned 2006   In production (alloc / used)   Requests (09/06, 12/06, 04/07)
ALICE        30              132             16.1 / 12.8                    20 (1) / 10
ATLAS        45              192             40.4 / 12.9                    40 (2)
CMS          110             210             85.5 / 70.5 (3)                60 / 50
LHCb                         66              26.3 / 4.8                     26 (4)
AMS          2               3               2.7 / 2.6
ARGO                                         11.6 / 9.0
BABAR                        187             148.6 / 117.2
CDF          80              90 (5)          65.6 / 36.5                    22
MAGIC        1                               1.1 / 0.6
VIRGO                        43 (6)          27.4 / 25.3
PAMELA                                       3.6 / 1.2
GEANT4                       1.6
TEORICI

Notes:
(1) Buffer for CASTOR (flow of 5 TB/s)
(2) Buffer for throughput towards the T2s
(3) A large part of the classic SE still to be migrated to the CASTOR SE
(4) "Pure" disk space (most of the stage area can be recycled)
(5) CDF has requested a further 22 TB (112 TB in total)
(6) 10 additional TB from the referees; a total of 33 TB for 2006 approved by the JECC
Yearly storage trends
[Plot of yearly storage trends for ALICE, ATLAS, CMS, LHCb, CDF, BABAR, ARGO and VIRGO]
Tape library (1)

1 STK L5500 silo with 2000 LTO-2 slots and 3500 9940B slots
6 LTO-2 drives (20-30 MB/s each), total bandwidth: 120-180 MB/s
7 9940B drives (25-30 MB/s each), total bandwidth: 175-210 MB/s
1300 LTO-2 tapes (200 GB native each)
1350 9940B tapes (200 GB native each)
TOTAL on-line CAPACITY (see the back-of-envelope check below):
  250 TB LTO-2 (up to 400 TB)
  260 TB 9940B (up to 700 TB)
Since the L5500 is at end of service, it is not possible to install the new T10000 drives (500 GB/tape) or other classes of new drives.
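A back-of-envelope check of the figures above (native capacity, no compression). The quoted on-line capacities (250 TB LTO-2, 260 TB 9940B) sit slightly below the plain tapes x 200 GB product, presumably because not every installed tape is fully usable; the slot-limited maxima and the aggregate drive bandwidths match the numbers quoted above.

    # Slot-limited capacity and aggregate drive bandwidth of the L5500 setup above.
    GB_PER_TAPE = 200                                     # native, both media types

    print("LTO-2 max (2000 slots):", 2000 * GB_PER_TAPE / 1000, "TB")   # 400 TB
    print("9940B max (3500 slots):", 3500 * GB_PER_TAPE / 1000, "TB")   # 700 TB

    # Aggregate bandwidth, using the per-drive ranges quoted above.
    print("LTO-2 drives: %d-%d MB/s aggregate" % (6 * 20, 6 * 30))      # 120-180 MB/s
    print("9940B drives: %d-%d MB/s aggregate" % (7 * 25, 7 * 30))      # 175-210 MB/s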
Tape library (2)

3 possible strategies for upgrading the current tape storage capacity to fulfil the LHC requirements up to 2010 (10 PB), compared in the sketch below:
1. Upgrade of the current L5500 to a new SL8500 (10k-slot capacity in a single library), changing only the robotics and physically migrating the current drives/tapes. The SL8500 can mount the 500 GB/tape technology and the future 1 TB/tape.
2. Acquisition of low-cost PowderHorn silos (50K), each with a 6000-slot capacity. They can mount the 500 GB tapes and probably the 1 TB tapes, but the model is at end of life like our L5500 robot. The L5500 itself can also be converted into a PowderHorn silo, giving up the LTO technology.
3. Acquisition of a new SUN SL8500 or IBM 3584 robot (7k-slot capacity); both are compatible with CASTOR and currently support the 500 GB/tape technology.

TENDER STARTING IN EARLY AUTUMN 2006; THE NEW LIBRARY COULD BE OPERATIONAL IN 2007
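A minimal sketch comparing the single-library slot capacity of the three options against the 2010 requirement; the slot counts are those quoted above, while using 0.5 TB and 1 TB per tape as the two media generations is an assumption based on the text.

    # Single-library capacity of each option versus the ~10 PB needed by 2010.
    TARGET_PB = 10
    options = {                       # name: slots quoted above
        "SUN SL8500":     10_000,
        "STK PowderHorn":  6_000,
        "IBM 3584":        7_000,
    }
    for name, slots in options.items():
        for tb_per_tape in (0.5, 1.0):
            pb = slots * tb_per_tape / 1000
            verdict = "meets" if pb >= TARGET_PB else "falls short of"
            print(f"{name:14s} @ {tb_per_tape:.1f} TB/tape: {pb:4.1f} PB "
                  f"({verdict} the {TARGET_PB} PB target in a single library)")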
Tapes accounting
CASTOR

Main problem: stability of CASTOR 2
Continuous debugging in collaboration with the CERN development team
Bugs identified that (apparently) had not shown up at CERN
Solutions adopted:
  Increase in the number of disk servers (currently 1 per 12 TB; see the sketch below)
  Differentiated use of the disk servers
  Application of patches
  Evaluation of the introduction of another stager
Timescale: October-November 2006
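A minimal sketch of what the 1-disk-server-per-12-TB ratio implies; the ~400 TB net figure is the one quoted in the disk slide, and applying the same ratio to the ~440 TB arriving in November is an assumption.

    import math

    # Disk servers implied by the "1 disk server per 12 TB" ratio above.
    TB_PER_DISKSERVER = 12
    for label, tb in [("currently assigned (net)", 400),
                      ("unassigned by Nov 2006 (raw)", 440)]:
        print(f"{label}: {tb} TB -> ~{math.ceil(tb / TB_PER_DISKSERVER)} disk servers")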
Evolution

Storage essentially based on CASTOR (but pure disk is also present)
Migration to CASTOR 2 almost complete (VIRGO, LVD and ARGO still to go)
Access to the resources via rfio (local) and gridftp (WAN), but also xrootd (BABAR) and NFS (residual and discouraged); see the illustrative example below
srm (v1) for CASTOR 2 in production
Tests of native GPFS in production under way (CDF); StoRM?
StoRM is the candidate SRM solution for "chaotic" data access (if CASTOR proves unsuitable):
  SRM layer for GPFS (and, more generally, for POSIX file systems)
  Implements the 'file' protocol for data access
  SRM version 2.2 currently under development (> October 2006)
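For illustration only, the same (hypothetical) file reached through the access protocols listed above; host names and paths are invented, not the real Tier1 configuration.

    # Invented examples of the access schemes listed above for a single file.
    example_access = {
        "rfio (local, CASTOR)": "rfio:///castor/example.it/myvo/data/file.dat",
        "gridftp (WAN)":        "gsiftp://gridftp.example.it/storage/myvo/data/file.dat",
        "xrootd (BABAR)":       "root://xrootd.example.it//store/myvo/data/file.dat",
        "file (StoRM/GPFS)":    "file:///gpfs/storage_area/myvo/data/file.dat",
    }
    for protocol, turl in example_access.items():
        print(f"{protocol:21s} {turl}")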
StoRM (1)

StoRM is a disk-based Storage Resource Manager which:
  in its production release implements SRM specification version 2.1.1;
  is migrating to SRM version 2.2 (WLCG SRM group);
  is designed to support guaranteed space reservation;
  supports direct access (native POSIX I/O calls), as sketched below; other access protocols remain available (e.g. rfio);
  takes advantage of high-performance cluster file systems with ACL support, such as GPFS; other POSIX file systems are supported (e.g. ext3);
  bases authentication and authorization on VOMS certificates (LCMAPS).
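Since StoRM exposes the data through the underlying POSIX file system, a client can read a file directly once the SRM has returned a 'file' TURL. A minimal sketch, assuming a hypothetical GPFS mount point and file name:

    # StoRM-style direct access: the 'file' TURL maps to an ordinary POSIX
    # path on the GPFS mount, so native I/O calls are enough.  The path below
    # is hypothetical.
    from urllib.parse import urlparse

    turl = "file:///gpfs/storage_area/myvo/run123/events.dat"   # hypothetical TURL
    path = urlparse(turl).path

    with open(path, "rb") as f:       # plain POSIX open/read, no rfio needed
        header = f.read(64)
    print(f"read {len(header)} bytes from {path}")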
StoRM (2)

As soon as possible (by the second week of September): StoRM with a v2.2 endpoint available for interoperability tests.
By mid-September or before: all functionalities tagged as "almost done" will be provided (PtG, PtP, Copy, PutDone).
By the end of September or before: BringOnLine and ReleaseFiles.
Other functionalities: the schedule will follow the agreement …
3 FTE at CNAF + 3 FTE at ICTP
StoRM: status summary

Status                                       Functionality
Currently available                          srmPing, srmGetProtocols; srmMkdir, srmRmdir, srmRm, srmLs; srmReserveSpace, srmGetSpaceMetadata; srmStatusOf[*]
Available in the next week / mid-September   srmPrepareToGet, srmPrepareToPut, srmPutDone, srmCopy (push mode)
Available by mid-September                   srmBringOnLine, srmReleaseFile, srmMv
Available in the near future                 srmReleaseSpace, srmChangeSpaceForFile, srmPurgeFromSpace
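To put the table in context, a typical get sequence chains the calls above roughly as sketched below; SrmClient and its methods are a hypothetical wrapper used only to show the ordering of the operations, not a real StoRM or WLCG client API.

    # Hypothetical wrapper illustrating the order of the SRM v2.2 calls above
    # (prepare -> poll -> access -> release).  SrmClient is NOT a real API.
    class SrmClient:
        def __init__(self, endpoint): self.endpoint = endpoint
        def srmPing(self): ...
        def srmPrepareToGet(self, surl, protocols): ...      # returns a request token
        def srmStatusOfGetRequest(self, token): ...          # returns a status string
        def srmReleaseFiles(self, token): ...

    def fetch(surl):
        srm = SrmClient("httpg://storm.example.it:8444/srm/managerv2")  # invented endpoint
        srm.srmPing()                                        # currently available
        token = srm.srmPrepareToGet(surl, ["file"])          # asynchronous request
        while srm.srmStatusOfGetRequest(token) == "SRM_REQUEST_INPROGRESS":
            pass                                             # poll until the TURL is ready
        # ... read the file through the returned TURL (e.g. plain POSIX open) ...
        srm.srmReleaseFiles(token)                           # tell the SRM we are done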
Database

4 Oracle clusters already installed for test purposes (Streams throughput, failover, LFC replication, recovery under different failure scenarios).
The production phase will start in October, coordinated by the LCG 3D project; 3 of the 4 present clusters will be migrated to production (one will be kept for test purposes).
3 production RACs (all nodes are dual Xeon 3.2 GHz; see the example check below):
  ATLAS: Conditions DB replica, 3 TB raw
  LHCb: 2 nodes - Conditions DB replica, LFC, 6 TB raw
  GRID: 3 nodes - LFC, FTS, VOMS replica, 1 TB raw
CASTOR 2 DLF: single instance, to be migrated to a new machine.
CASTOR 2 stager: single instance on an HP ProLiant DL380.
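As an example of what running a production RAC implies operationally, the instances of a cluster can be checked with a query on the GV$INSTANCE view; the credentials and DSN below are placeholders, not the real CNAF setup.

    # Check that every RAC instance of a cluster is open (placeholder credentials/DSN).
    import cx_Oracle

    conn = cx_Oracle.connect("monitor_user", "secret",
                             "rac-scan.example.it:1521/conddb")   # placeholder DSN
    cur = conn.cursor()
    cur.execute("SELECT inst_id, instance_name, status FROM gv$instance")
    for inst_id, name, status in cur:
        print(f"instance {inst_id} ({name}): {status}")           # expect OPEN on all nodes
    conn.close()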