Luciano Gaido, CCR meeting, Bari, 28 September 2017


The EOSC-hub project

Credits. Thanks to everyone who contributed, directly or indirectly, to this presentation: Daniele Spiga, Marco Verlato, Cristina Duma, Davide Salomoni, Giacinto Donvito.

Outline

The starting point. EGI_Engage (coordinator: EGI Foundation; ended on 31.8.2017; call: EINFRA-1-2014). Goal: operate the European e-infrastructure (Grid and FedCloud) and promote its use by scientific communities. INDIGO-DataCloud (coordinator: INFN; ends on 30.9.2017; call: EINFRA-1-2014). Goal: develop tools and components that enable or ease the use of hybrid clouds by scientific communities. EUDAT2020 (coordinator: CSC; ends on 28.2.2018; call: EINFRA-1-2014). Goal: “enable researchers to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure (CDI) conceived as a network of collaborating, cooperating centres”.

EOSC-hub. [Image: sculpture by Georges Faures, currently on the terrace of the Hotel Bologna, Corso Vittorio Emanuele at the corner of Via XX Settembre, Turin.]

EOSC-hub in a nutshell/1. Title: Integrating and managing services for the European Open Science Cloud. Call: EINFRA-12-2017 (Data and Distributed Computing e-infrastructures for Open Science), topic a: Secure and agile data and distributed computing e-infrastructure. Total requested budget: 30 M€. Coordinator: EGI Foundation. Participants: 74 beneficiaries (plus 20 linked third parties) from 36 countries. National coordinator: L. Gaido. Duration: 36 months. INFN budget: 1.8 M€ (ranking second, after …). INFN units involved: BA, CNAF, CT, PD, PG, TO.

EOSC-hub in a nutshell/2. A joint proposal by EGI, EUDAT and INDIGO-DataCloud for call EINFRA-12, topic a: Secure and agile data and distributed computing e-infrastructures: fostering the integration of a secure, permanent, on-demand service-driven, privacy-compliant and sustainable e-infrastructure incorporating distributed databases, computing resources and software. The challenge is to integrate at European level the geographically and disciplinary dispersed resources to achieve economies of scale and efficiency gains in providing the best data and computing capacity and services to the research and education communities. This action is interrelated to INFRADEV-04-2016, “European Open Science Cloud for Research”.

Links with other projects/1. Since the project's philosophy is that of a HUB (a concentrator connecting several networks and hence, by extension, several infrastructures), EOSC-hub aims to be the ‘pivot’ between producers and consumers of services, i.e. between resource providers and users. The EC's request that funded projects sign collaboration agreements is therefore part of the project's DNA and applies in general. Two projects, however, are particularly relevant, and the first steps towards collaboration with them are already under way: EOSCpilot and OpenAire2020.

Links with other projects/2. EOSCpilot objectives: define the governance of the EOSC; build science demonstrators for about ten use cases, which are expected to produce services that are candidates for inclusion in the EOSC-hub catalogue. OpenAIRE: make scientific publications and research data resulting from H2020-funded research visible and accessible. A collaboration path has already started, on 3 topics (Governance; Training, Engagement and Support; Architecture and common use cases).

EOSC-hub mission. The EOSC-hub mission is to contribute to the EOSC implementation by enabling seamless and open access to a system of research data and services provided across nations and multiple disciplines. The project will offer these resources via the Hub, an integration and management system of the EOSC acting as a European-level entry point for all stakeholders. This will be achieved through 3 actions.

EOSC-hub actions/1. Governance and funding: be an EOSC service integrator and federator; develop the know-how and the prototype procurement and purchasing framework that interested organisations can use to acquire digital services from either publicly funded infrastructures or commercial providers. Data culture and FAIR data: provide production-quality FAIR data and services by leveraging current and future FAIR implementation guidelines developed by research communities, e-Infrastructures and other relevant players; ensure data can be used as widely as possible across scientific disciplines and between the private and public sector by (1) providing third-party public/private data as a service, (2) making core data resources discoverable and accessible, and (3) interconnecting existing data infrastructures across Europe.

EOSC-hub actions/2. Research data services and architecture: create an open integration and management system (the Hub) for the future European Open Science Cloud that delivers an evolving catalogue of services, open source software and data, and aggregates the services required by key European scientific communities from local, regional and national e-Infrastructures in Europe and worldwide. The Hub acts as a contact point for researchers and innovators to easily discover, access, use and reuse a broad spectrum of resources for advanced data-driven research. Improve skills and knowledge among researchers and service operators by delivering specialised training and by establishing competence centres to co-create solutions. The project also stimulates an ecosystem of industry/SMEs, service providers and researchers to support business pilots, market take-up and commercial boost strategies.

Service architecture
Layers shown in the diagram: Federation Services; Basic Infrastructure (AAI, Accounting, Monitoring, Operations, Security Coord.); Basic Infrastructure (Compute and Storage); Open Collab. Platforms (Application Repository, Configuration Management, Marketplace); Common services; Thematic Services; Community Support services; Added Value Services (Compute, Data, Software Management and Preservation).
Compute: HTC, HPC, Cloud, Cloud container.
Storage: Online storage, Archival storage, Object storage.
Compute/Data Management: PID management; Data discovery (B2FIND, LFC); Data transfer (catch-all FTS, possibly Globus Transfer); Metadata management; B2SAFE / data policy management (iRODS); B2SHARE / Invenio; Virtual Machine Image management (AppDB); Workflow management.
Service management: Accounting, Monitoring, Service registry, Helpdesk, Operations support tools.
Collaboration platform: AppDB, Marketplace, Software repository.

Engagement with research communities. Thematic Service Providers: interested in providing a thematic production service as part of EINFRA-12 and the future EOSC. Early adopters: interested in piloting common services, using and advancing cross-infrastructure usage for the benefit of their research community and beyond; organized through “competence centers” bringing together e-Infrastructure providers, relevant research organizations and technology providers/experts. Both have already been selected via an open call, but an additional call is expected during the project (about 0.5 M€ earmarked).

Thematic Services. 60 proposals received, 9 selected: CLARIN (European Research Infrastructure for Language Resources and Technology); DODAS (CMS), lead: INFN (see D. Spiga's presentation); ECAS (climate studies); GEOSS (Global Earth Observation System of Systems); OPENCoasts (On-demand Operational Coastal Circulation Forecast Service); WeNMR (Worldwide e-Infrastructure for Nuclear Magnetic Resonance and structural biology); DARIAH (Digital Research Infrastructure for the Arts and Humanities); LifeWatch; EO Pillar (Earth Observation).

Early Adopters/Competence Centers. The Competence Centre is driven by a well-established and mature research infrastructure or international collaboration requiring advanced and integrated data and computing services. In the Competence Centre, early adopters test, adapt and integrate the digital capabilities they need to pursue their research, with the support of e-Infrastructure and technology experts. The Competence Centre will: run proofs of concept; conduct pilots; prepare the production environment; define appropriate business models to sustain the solutions after the end of the project.

Competence Centers. Out of 51 proposals received, 7 Competence Centres have been selected: Elixir (lead partner: EMBL-EBI); Fusion (lead partner: CCFE); Marine (lead partner: IFREMER); EISCAT_3D (lead partner: EISCAT); EPOS-ORFEUS (lead partner: SURFsara); Radio Astronomy Competence Center, RACC (lead partner: ASTRON); ICOS (lead partner: SNIC); Disaster Mitigation Competence Centre Plus, DMCC+ (lead partner: ASGC, unfunded).

INFN's role/1. WP1 (6 PM): technical coordination (WP10), a recognition of the expertise and experience we demonstrated in INDIGO. WP2 (24 PM): governance, strategy and service portfolio management activities, a recognition of our international standing (as INDIGO coordinator) and of INFN's role as a stakeholder. WP4 (16 PM): contribution to operations coordination (V. Spinoso), a recognition of the work done in EGI_Engage. WP6 (45 PM): maintenance of some grid components (CREAM, BDII, ARGUS, VOMS) and INDIGO components (IAM, PaaS, FG, …).

INFN's role/2. WP7 (36 PM): activities in 2 Thematic Services (DODAS and WeNMR), see the following slides. WP10 (56 PM): coordination of the Technology Committee, coordination of task T10.1 (technical roadmap), evolution of the service catalogue and support in several thematic areas (AAI, PaaS, User Interfaces, Data Solutions). WP11 (17 PM): contribution to training activities and coordination of task T6.4, a recognition of the expertise gained and the work done in several past projects. WP13 (51 PM): resource provisioning for various services, funded through the Virtual Access mechanism.
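
The per-work-package effort listed on the two INFN-role slides can be totalled with a short Python sketch. The person-month figures are copied from the slides; the function names are hypothetical, and the conversion to full-time equivalents is an illustrative assumption of ours (effort spread evenly over the 36-month project duration), not an official staffing figure.

```python
"""Total INFN effort in EOSC-hub, from the per-WP person-months on the slides."""

# Person-months (PM) assigned to INFN in each work package, copied from the slides.
INFN_PM_BY_WP = {
    "WP1": 6,    # technical coordination
    "WP2": 24,   # governance, strategy, service portfolio management
    "WP4": 16,   # operations coordination
    "WP6": 45,   # maintenance of grid and INDIGO components
    "WP7": 36,   # thematic services (DODAS, WeNMR)
    "WP10": 56,  # Technology Committee, technical roadmap, service catalogue
    "WP11": 17,  # training
    "WP13": 51,  # resource provisioning via Virtual Access
}

PROJECT_MONTHS = 36  # project duration stated on the "in a nutshell" slide


def total_pm(pm_by_wp: dict[str, int]) -> int:
    """Sum of person-months over all work packages."""
    return sum(pm_by_wp.values())


def average_fte(pm_by_wp: dict[str, int], months: int = PROJECT_MONTHS) -> float:
    """Full-time equivalents, assuming (hypothetically) uniform effort over the project."""
    return total_pm(pm_by_wp) / months


def largest_wp(pm_by_wp: dict[str, int]) -> str:
    """Work package with the largest INFN effort."""
    return max(pm_by_wp, key=pm_by_wp.get)


if __name__ == "__main__":
    print(f"Total: {total_pm(INFN_PM_BY_WP)} PM")
    print(f"Average: {average_fte(INFN_PM_BY_WP):.1f} FTE")
    print(f"Largest WP: {largest_wp(INFN_PM_BY_WP)}")
```

With these figures the total is 251 PM, roughly 7 FTE over the project, with WP10 as the largest single contribution.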

WeNMR Thematic Service/1. WeNMR (Worldwide e-Infrastructure for Nuclear Magnetic Resonance and structural biology): consolidation of the submission machinery of various portals, building on DIRAC4EGI and INDIGO. 8 grid-enabled application web portals are already in production (TRL9): 6 hosted at the University of Utrecht (DISVIS, POWERFIT, HADDOCK, GROMACS, CS-ROSETTA, UNIO) and 2 at CERM (FANTEN, AMPS-NMR); ~20M normalized CPU-hours/year on the EGI HTC platform (SLA in place with EGI). DISVIS, POWERFIT and AMPS-NMR are pioneering the use of HTC GPGPU resources via the udocker tool (developed in INDIGO) and the GPU-enabled CREAM-CE (developed by INFN Padova/Milano), while also exploring ways to move the workload to the cloud (e.g. in 2 INDIGO use cases).
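
To give a sense of scale for the ~20M normalized CPU-hours/year quoted above, here is a back-of-the-envelope conversion into average busy cores. The uniform-usage assumption and the function name are ours and purely illustrative: real HTC usage is bursty, and normalized hours are not the same as raw core-hours on a given machine.

```python
"""Back-of-the-envelope conversion of yearly CPU-hours into average busy cores."""

HOURS_PER_YEAR = 365 * 24  # 8760 h, ignoring leap years


def average_busy_cores(cpu_hours_per_year: float) -> float:
    """Cores that would have to run 24/7 to deliver the given yearly CPU-hours.

    Assumes uniform usage over the year (an illustrative simplification).
    """
    return cpu_hours_per_year / HOURS_PER_YEAR


if __name__ == "__main__":
    wenmr_cpu_hours = 20e6  # ~20M normalized CPU-hours/year, from the slide
    print(f"{average_busy_cores(wenmr_cpu_hours):.0f} cores on average")
```

Under that assumption, the WeNMR workload corresponds to roughly 2,300 normalized cores kept busy around the clock.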

WeNMR Thematic Service/2. Planned activities include: user support and training, outreach and dissemination; continuous operation of the various grid- and cloud-enabled web portals, including their consolidation/upgrading and provisioning; consolidation of the job submission frameworks of the various portals, building on DIRAC4EGI and INDIGO solutions (e.g. phasing out gLite-WMS in favour of DIRAC or the PaaS Orchestrator); integration of distributed data storage solutions (from EUDAT and OneData); implementation of AAI solutions developed by EGI and INDIGO. INFN's main role: maintenance, operations and support of VO-related services and users (VOMS servers, LFC, HTC and FedCloud resources, GPU resources); supporting the service evolution by integrating (where needed) advanced INDIGO solutions such as the PaaS Orchestrator, IAM and OneData. For example, INFN is already providing test OneData storage to WeNMR users.

WeNMR: synergies and benefits for INFN. Activities are carried out in synergy with other projects, e.g. WestLife. The Padova testbed, used for INDIGO, WestLife, etc.: implements some INDIGO services (Synergy, Novadocker, AAI through INDIGO IAM, OneData); is already part of the EGI FedCloud; will be integrated into the Cloud Padovana, so that the new services developed in the various projects become available to INFN experiments. Note: some results of activities carried out in other projects for WeNMR/Mobrain (EGI_Engage) are now available to everyone (e.g. GPGPU support in CREAM).

DODAS/1. Dynamic On Demand Analysis Service (DODAS): an automated system that simplifies the process of provisioning, creating, managing and accessing a pool of heterogeneous (possibly opportunistic) computing resources. It is an evolution of the INDIGO HEP (CMS) use case: a simple, automated solution for creating, accessing and managing a (container-based) HTCondor cluster on cloud resources to run CMS data-analysis workflows. Big Data integration (Spark).

DODAS/2. INDIGO components used: PaaS Orchestrator and Infrastructure Manager, for infrastructure independence; IAM and TTS, for harmonizing identities and integrating with proprietary systems; TOSCA templates with autoscaling and self-healing, for automated interaction with the PaaS; Ansible roles and Docker, for an agnostic implementation. DODAS is implemented for the CMS use case but is generic, and therefore easy for other experiments to adopt: AMS, DAMPE and VIRGO have expressed interest and sent letters of support, and IIT (Istituto Italiano di Tecnologia) and LIPh (Padova Laboratory for Interdisciplinary Physics) are interested in the Big Data part.
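
The autoscaling and self-healing behaviour of an on-demand HTCondor pool can be illustrated with a small decision function. This is a hypothetical sketch, not DODAS code: in DODAS this behaviour is described by TOSCA templates and enacted through the INDIGO PaaS Orchestrator and Infrastructure Manager. The names PoolState, ScalingPolicy, target_workers and plan, the slots-per-worker figure and the min/max bounds are all assumptions made for illustration.

```python
"""Hypothetical autoscaling/self-healing decision for an HTCondor worker pool.

Illustrative only: not taken from DODAS, which relies on TOSCA templates
and the INDIGO PaaS Orchestrator / Infrastructure Manager.
"""
import math
from dataclasses import dataclass


@dataclass
class PoolState:
    idle_jobs: int        # jobs waiting in the HTCondor queue
    healthy_workers: int  # worker nodes currently serving jobs
    failed_workers: int   # worker nodes detected as broken


@dataclass
class ScalingPolicy:
    slots_per_worker: int = 8  # assumed job slots per worker VM/container
    min_workers: int = 1
    max_workers: int = 50


def target_workers(state: PoolState, policy: ScalingPolicy) -> int:
    """Healthy workers needed to drain the idle queue, clamped to [min, max]."""
    extra = math.ceil(state.idle_jobs / policy.slots_per_worker)
    wanted = state.healthy_workers + extra
    return max(policy.min_workers, min(policy.max_workers, wanted))


def plan(state: PoolState, policy: ScalingPolicy) -> dict[str, int]:
    """Return the actions for one control-loop iteration.

    - "remove_failed": broken nodes to delete (self-healing). Replacement
      capacity comes from the scaling step, which counts healthy workers only,
      so lost nodes are re-created when the queue or the minimum size needs it.
    - "scale_delta": > 0 means create nodes, < 0 means release nodes. This
      sketch only shrinks the pool when it exceeds max_workers.
    """
    return {
        "remove_failed": state.failed_workers,
        "scale_delta": target_workers(state, policy) - state.healthy_workers,
    }


if __name__ == "__main__":
    policy = ScalingPolicy()
    print(plan(PoolState(idle_jobs=20, healthy_workers=2, failed_workers=1), policy))
```

Keeping the decision a pure function of the observed pool state makes it easy to test in isolation; a real controller would call it periodically and hand the resulting actions to the provisioning layer.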


DODAS: synergies and benefits for INFN (and beyond). DODAS & CMS: the activity is part of the experiment's Dynamic Resource WG; the move from the HTCondor Integration TestBed to the Production Pool was recently approved; DODAS is used to exploit the INFN-CNAF Microsoft Azure grant. DODAS & Helix Nebula Science Cloud: DODAS is now part of the portfolio of tests INFN is responsible for, within WP4 Transparent Data Access (xrootd & Onedata). DODAS & AMS (& DAMPE & VIRGO): AMS has started actively evaluating DODAS (with Onedata) with a view to defining a work plan for its validation and adoption.

Conclusions/1. We can be very satisfied with: the results achieved in past projects, in particular in EGI_Engage but above all in INDIGO-DataCloud; the recognition of the experience we have shown in several areas: project coordination, collaboration management, strategy, etc.; technical coordination and distributed activities (e.g. operations); code development and release production management; training; the outcome of the latest calls: besides EOSC-hub, XDC and DEEP HybridCloud are also significant. But the most important thing is the reputation we have built.

Conclusions/2. Many activities and products have had a direct impact on INFN or on the experiments we collaborate with: see the slides on WeNMR and DODAS. Some INDIGO services are already in production at some sites, e.g. IAM (Bari and CNAF) and Onedata (CNAF, PD, FI, PG); others are about to be, e.g. the PaaS (Bari, already in pre-production) and the CDMI-QOS server + CDMI-QOS-STORM plugin, PaaS, CMDB and WATTS (CNAF). Other services have benefited from synergies between projects (e.g. IAM, which integrated a feature developed in OCP that is of interest to ICCU). However, a further effort is needed so that the potential of these services is exploited more fully within INFN: what actions can we take to achieve this?

Conclusions/3. Yesterday's discussion in CCR on OpenID Connect is a clear sign that we have not done enough to disseminate the results of our work within INFN, despite: the CCR presentations of the external projects we collaborate in; the discussions in the Scientific Computing Coordination Committee (C3S); the training courses organized for INFN staff. There are perhaps still too many silos within INFN. What else can/must we do? What can the CCR do?

Thank you for your attention

Backup slides

EOSCpilot objectives. The EOSCpilot represents a first step towards the development of the European Open Science Cloud. It will: design and trial a stakeholder-driven governance framework; contribute to the development of European open science policy and best practice; develop demonstrators of integrated services and infrastructures in a number of scientific domains, showcasing interoperability and its benefits; engage with a broad range of stakeholders, crossing borders and communities, to build trust and skills. The services of the demonstrators/pilots will be proposed to EOSC-hub for inclusion in the service catalogue.

EOSCpilot science demonstrators/1. 5 were already chosen in the proposal: PanCancer Analysis of Whole Genomes (EMBL): sensitive genomic data for cancer patient health care; ENVRI Radiative Forcing Integration (ICOS ERIC + ACTRIS + DKRZ + IPSL): integration of heterogeneous climate data sources; Research with Photons & Neutrons (DESY, ESRF, XFEL, ESS, EMBL, ILL): exploitation of data from analytical facilities; WLCG (CERN): large-scale, long-term data preservation and reuse of physics data; TEXTCROWD (Univ. of Florence): collaborative semantic enrichment of text-based datasets.

EOSCpilot science demonstrators/2. Another 5 were chosen in June 2017: PROMINENCE: HPCaaS for Fusion, access to HPC-class nodes for the Fusion research community through a cloud interface; EPOS/VERCE: Virtual Earthquake and Computational Earth Science e-science environment in Europe; Life Sciences Datasets (Genome Research): leveraging EOSC to offload updating and standardizing life sciences datasets and to improve study reproducibility, reusability and interoperability; CryoEM Workflows: linking distributed data and data analysis resources as workflows in structural biology with cryo-electron microscopy, for interoperability and reuse; LOFAR Data: easy access to LOFAR telescope data and knowledge extraction through the Open Science Cloud.

EOSCpilot T6.3 - Interoperability pilots. Validation of: AAI: requirements of both e-infrastructures and scientific communities, and the solutions offered by INDIGO-DataCloud, ELIXIR AAI, EUDAT B2ACCESS and AARC; resource brokering: solutions spanning multiple infrastructures and user communities, aimed at high-level resource discoverability and addressability; accessibility: EOSC local, Grid, HPC and Cloud resources accessible by multiple communities; data accessibility: through personal resources, scientific portals, CLI; interoperability: of the underlying distributed storage systems with the EOSC platform services, and of services and tools such as those provided by the EUDAT service suite and the INDIGO-DataCloud toolbox. Coordinated by INFN.

OpenAire2020 project: Open Access Infrastructure for Research in Europe. Goal: make peer-reviewed scientific publications and research data (the latter limited to the Pilot on Open Access to Research Data) resulting from Horizon2020-funded research visible and freely accessible. Coordinated by the University of Athens, ending in May 2018. It continues two earlier projects (OpenAire and OpenAirePlus). Italian partners: CNR/ISTI and CINECA. A collaboration path has already started, on 3 topics (Governance; Training, Engagement and Support; Architecture and common use cases).

EOSC-hub Objectives. Simplify access to a broad portfolio of products, resources and services provided by major pan-European and international organizations through an open service catalogue, a cornerstone of the EOSC. Remove fragmentation of service provisioning and access to digital services in Europe and beyond: technical integration between common and thematic services; service innovation, procurement, provisioning and access. Increase the innovation capacity of digital infrastructures.

EOSC-hub Objectives (cont.). Consolidate digital infrastructures by expanding capacities and capabilities, and by improving discoverability, access, interoperability and sharing across research communities and countries. Extend access to integrated compute, storage, data and software to new user groups, including higher education and industry, to increase the user base. Expand human capacity: consolidate/expand a distributed network of experts and service operators at local/national level.

EOSC-hub Involvement of Industries. Organized through Business Pilots: relevant to EGI participants that have a national business programme and activities of pan-European impact/interest; designed to foster innovation between e-Infrastructures and the private sector by building an ecosystem of SMEs, large industries, start-ups, researchers, accelerators and investors that become active business partners of e-Infrastructures as customers and/or service providers. These initial pilots will serve as early demonstrators of the project’s Joint Digital Innovation Hub (DIH). Selected via an open call during proposal preparation.

EOSC-hub Business Pilots. 6 proposals selected out of 31 received: CyberHAB (water body management sector); Sports Smart Video Analysis (sports sector); Bot Mitigation Engine (business sector); ACTION Seaport (local coastal authorities); Space Weather Data Services for the future DRACO Observatory (climate sector); Furniture Enterprise Analytics - DataFurn (furniture industry sector). Two transversal activities will support these pilots: an OpenLab and commercialization support.

EOSC-hub Linking programme. Goal: link EOSC-hub with local/national e-Infrastructures from all European regions in order to: contribute to user community engagement by provisioning infrastructure services to the user communities in the EOSC-hub engagement roadmap, and participate in direction-giving activities involving EOSC-hub stakeholders; publish local/national services in the EOSC-hub catalogue in compliance with the EOSC-hub rules of engagement for service providers. Budget: about 500K€ in task T4.1 (4 to 6 PMs).