PaNOSC project (2018-2022)
The Photon and Neutron Open Science Cloud (PaNOSC) project brought together six strategic European research infrastructures (ESRF, CERIC-ERIC, ELI Delivery Consortium, the European Spallation Source, European XFEL and the Institut Laue-Langevin – ILL) and the e-infrastructures EGI and GÉANT. Its goal was to contribute to the construction and development of the EOSC, an ecosystem that gives researchers in all scientific fields universal, cross-disciplinary open access to data through a single access point.
Throughout the project, from December 2018 to November 2022, the PaNOSC partners (ESRF, CERIC-ERIC, ILL, ELI ERIC, ESS, European XFEL, EGI) worked to make FAIR data a reality at photon and neutron (PaN) research facilities by developing and providing services for scientific data and connecting these services to the European Open Science Cloud (EOSC).
To achieve this goal, the PaNOSC project helped equip PaN research infrastructures (RIs) with the software, policies, and legal and administrative frameworks needed to enable Open Science, so that the data produced at PaN facilities become easily accessible to users and the public.
In this respect, the new PaN FAIR data policy framework released within the project now allows PaN facilities to manage and curate data according to the FAIR principles, so that data are findable, interoperable and reusable.
Moreover, making data accessible required enabling domain-specific searches across PaN data repositories. This was achieved by developing and adopting a federated search API (application programming interface) for PaN data catalogues, together with a common protocol for harvesting data and metadata so that public datasets can be made available to third-party, cross-disciplinary EOSC repositories. The search service gives PaN scientists a uniform way, across facilities, to find, filter and score (rank) datasets and publications from any number of configured sites, based on domain-specific metadata parameters such as source characteristics, sample information and detector details. Third parties can also use it to find data released by any facility once the embargo period has ended.
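The kind of parameter-based query described above can be sketched as follows. This is a minimal illustration, not the PaNOSC API itself: it assumes a LoopBack-style JSON filter sent to a `Datasets` endpoint with a `parameters` relation and an optional free-text `query` used for scoring (our understanding of the reference search API; check the official specification before relying on it). The base URL and the `build_dataset_params` and `build_dataset_url` helpers are hypothetical.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL of one facility's search API deployment.
BASE_URL = "https://search.facility.example/api"


def build_dataset_params(param_name, low, high, unit, text=None, limit=10):
    """Query parameters asking for datasets whose parameter `param_name`
    lies in [low, high] (expressed in `unit`), optionally ranked by `text`.

    The LoopBack-style filter layout is an assumption, not a confirmed schema.
    """
    flt = {
        "include": [
            {
                "relation": "parameters",
                "scope": {
                    "where": {
                        "and": [
                            {"name": param_name},
                            {"value": {"between": [low, high]}},
                            {"unit": unit},
                        ]
                    }
                },
            }
        ],
        "limit": limit,
    }
    params = {"filter": json.dumps(flt)}
    if text is not None:
        # Assumed free-text parameter that the service uses to score results.
        params["query"] = text
    return params


def build_dataset_url(params):
    """Combine the base URL, the Datasets endpoint and the encoded parameters."""
    return f"{BASE_URL}/Datasets?{urlencode(params)}"


params = build_dataset_params("wavelength", 0.1, 0.5, "nm", text="protein crystal")
print(build_dataset_url(params))
```

Keeping the filter as structured data and encoding it only at the end makes it easy to send the same query to every configured site.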
The PaN Open Data portal (https://data.panosc.eu/) was also implemented: a web portal that uses the federated search API across metadata catalogues and data repositories to search, find and download open data from all European PaN RIs that deploy the federated search and provide open data.
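A federated portal has to combine the answers from several facilities into a single ranked list. The sketch below shows one simple way to do this; it is an assumption for illustration, not a description of how the PaN Open Data portal is implemented. The record fields (`pid`, `title`, `score`), the site names and the `merge_results` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    site: str
    pid: str
    title: str
    score: float


def merge_results(results_by_site, limit=None):
    """Combine per-site hit lists into one list ranked by descending score.

    If several sites report the same pid, only the best-scoring hit is kept.
    Ties are broken by pid so the order is deterministic.
    """
    best = {}
    for site, records in results_by_site.items():
        for rec in records:
            hit = Hit(site, rec["pid"], rec["title"], float(rec["score"]))
            current = best.get(hit.pid)
            if current is None or hit.score > current.score:
                best[hit.pid] = hit
    ranked = sorted(best.values(), key=lambda h: (-h.score, h.pid))
    return ranked if limit is None else ranked[:limit]


# Invented example responses from two hypothetical facilities.
results = {
    "facility-a": [
        {"pid": "a/1", "title": "Powder run", "score": 0.42},
        {"pid": "shared/7", "title": "Shared dataset", "score": 0.30},
    ],
    "facility-b": [
        {"pid": "b/3", "title": "SAXS run", "score": 0.91},
        {"pid": "shared/7", "title": "Shared dataset", "score": 0.55},
    ],
}
for hit in merge_results(results):
    print(hit.site, hit.pid, hit.score)
```

Sorting raw scores across sites assumes the scores are comparable. If each facility scores differently, normalising scores per site before merging would be a reasonable refinement.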
A community metadata standard for PaN data (NeXus/HDF5) has been widely adopted to make data interoperable and reusable. Electronic logbooks have been developed to capture what happens during experiments, recording the steps and settings of each experiment for future use. Facilities have also dedicated resources to minting DOIs (digital object identifiers) for each experiment and for one or more specific datasets, so that these can be cited in publications. DOIs make data findable and accessible and enable tracking of data re-use by other research teams, in the same or other domains.
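Before a DOI can be registered, a minimal set of descriptive metadata is required. In the DataCite Metadata Schema, the mandatory properties are Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The sketch below checks a simplified record for these and formats a citation roughly following the "Creator (PublicationYear): Title. Publisher. Identifier" pattern. The field names are a simplified, hypothetical stand-in for the real schema, and the DOI, names and title are invented.

```python
# Simplified, hypothetical field names standing in for the DataCite
# mandatory properties (Identifier, Creator, Title, Publisher,
# PublicationYear, ResourceType).
REQUIRED = ("doi", "creators", "title", "publisher", "publication_year", "resource_type")


def missing_fields(record):
    """Return the mandatory fields that are absent or empty, in order."""
    return [f for f in REQUIRED if not record.get(f)]


def format_citation(record):
    """Format 'Creators (Year): Title. Publisher. https://doi.org/DOI'."""
    missing = missing_fields(record)
    if missing:
        raise ValueError(f"missing mandatory fields: {', '.join(missing)}")
    creators = "; ".join(record["creators"])
    return (
        f"{creators} ({record['publication_year']}): {record['title']}. "
        f"{record['publisher']}. https://doi.org/{record['doi']}"
    )


# Hypothetical record: every value here is invented for illustration.
record = {
    "doi": "10.12345/example-experiment-001",
    "creators": ["Doe, Jane", "Roe, Richard"],
    "title": "Diffraction data for sample X",
    "publisher": "Example Facility",
    "publication_year": 2022,
    "resource_type": "Dataset",
}
print(format_citation(record))
```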
In addition to being generated, collected, processed, stored and managed, data need to be visualised, analysed and interpreted. As data volumes grow, this requires ever more computing power and storage space, and downloading data for local analysis is becoming impractical for individual users. To further lower the barriers to re-using open data, users should be able to explore a dataset through their web browser once they have identified it. To this end, the PaNOSC project has provided a remote access infrastructure that hosts FAIR data services, through the EOSC, for users of the PaN community and for scientists from other domains. Two types of data analysis service were developed and made available: remote desktops for graphical software and Jupyter Notebooks for programmatic data analysis. Both run in virtual machines that are accessed remotely via the open-source online data analysis portal VISA (Virtual Infrastructure for Scientific Analysis).
Initially designed at the Institut Laue-Langevin and further developed within the PaNOSC project, VISA offers remote control, simulation of experiments and experimental set-ups, and analysis of user data or of open data from the data portal. In short, VISA gives academic and industrial researchers access to data and advanced analysis tools from anywhere. It supports real-time collaboration through data sharing and can have a substantial impact on researchers in the PaN community by increasing their scientific capacity and productivity. Users select their experiment, the resources for the virtual machine (memory, CPU, display) and the type of analysis service to run in it: a Jupyter Notebook or a remote desktop with access to the software stack contained in the virtual machine image.
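Choosing virtual machine resources can be pictured as a small matching problem. The sketch below picks the smallest machine size that satisfies a user's CPU and memory request. It is purely illustrative: the flavour catalogue, the `SessionRequest` fields and the `choose_flavour` helper are hypothetical and not part of VISA.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flavour:
    name: str
    cpus: int
    memory_gb: int


# Hypothetical catalogue of VM sizes an administrator might offer.
FLAVOURS = [
    Flavour("small", 2, 8),
    Flavour("medium", 4, 16),
    Flavour("large", 8, 32),
    Flavour("xlarge", 16, 64),
]

# Hypothetical identifiers for the two analysis services.
SERVICES = {"jupyter", "desktop"}


@dataclass(frozen=True)
class SessionRequest:
    experiment_id: str
    cpus: int
    memory_gb: int
    service: str  # "jupyter" or "desktop"
    display: str = "1920x1080"  # hypothetical remote-desktop resolution


def choose_flavour(req: SessionRequest) -> Optional[Flavour]:
    """Return the smallest flavour meeting the CPU and memory request,
    or None if no flavour is large enough."""
    if req.service not in SERVICES:
        raise ValueError(f"unknown service: {req.service}")
    fitting = [f for f in FLAVOURS if f.cpus >= req.cpus and f.memory_gb >= req.memory_gb]
    return min(fitting, key=lambda f: (f.cpus, f.memory_gb), default=None)


print(choose_flavour(SessionRequest("exp-1", cpus=3, memory_gb=8, service="jupyter")))
```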
Free, open-source software and services for simulating and modelling PaN sources, beamlines and experimental instruments, including start-to-end simulations that describe entire experiments, are also accessible via VISA as part of the Virtual Neutron and X-ray Laboratory (ViNYL) developed within the project. ViNYL enables PaN users to rapidly implement simulation and analysis workflows specific to their facilities, instruments and experiments. This matters because simulations of the various parts and processes of complex experiments play an increasingly important role throughout the lifecycle of scientific data generated at RIs. The available software packages include:
- McStasScript – Python interface to the McStas simulation code, which is world-leading in the simulation of instrumentation for (virtual) neutron scattering experiments
- OASYS – Open-source graphical environment for X-ray virtual experiments
- SIMEX – Photon experiment simulation environment
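McStas, for example, simulates instruments by Monte Carlo ray tracing: many individual rays are propagated through the instrument components, and the fraction that survives estimates quantities such as transmission. The toy sketch below illustrates only that idea, for a point source and a single slit, in plain Python. It does not use McStasScript, OASYS or SIMEX, and the function names and parameter values are hypothetical.

```python
import math
import random


def slit_transmission(n_rays, divergence_rad, distance_m, half_width_m, seed=0):
    """Monte Carlo estimate of the fraction of rays passing a slit.

    Each ray leaves a point source with an angle drawn uniformly from
    [-divergence_rad, divergence_rad], travels in a straight line to a slit
    at `distance_m`, and is transmitted if it lands within +/- half_width_m.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    passed = 0
    for _ in range(n_rays):
        angle = rng.uniform(-divergence_rad, divergence_rad)
        y = distance_m * math.tan(angle)  # transverse position at the slit
        if abs(y) <= half_width_m:
            passed += 1
    return passed / n_rays


def analytic_transmission(divergence_rad, distance_m, half_width_m):
    """Exact transmission for the same toy model, used as a cross-check."""
    return min(1.0, math.atan(half_width_m / distance_m) / divergence_rad)


mc = slit_transmission(100_000, 0.01, 2.0, 0.01)
exact = analytic_transmission(0.01, 2.0, 0.01)
print(f"Monte Carlo: {mc:.4f}  analytic: {exact:.4f}")
```

The statistical error of the estimate shrinks as one over the square root of the number of rays, which is why instrument simulations typically trace millions of rays.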
Providing training modules in PaN science has also been a project goal. To this end, PaNOSC, in collaboration with ExPaNDS, has developed an e-learning platform hosting free education and training for scientists and students, with interactive online courses on both the theory of PaN science and the use of Python code and software for data reduction and modelling. In addition, a training catalogue for PaN science lets users browse instructional material and resources from institutes across Europe.
The e-learning platform has also been added to the service catalogue of the EOSC Portal. Alongside it, the following services are now provided and accessible via the EOSC through a single AAI (authentication and authorisation infrastructure) service, Umbrella ID, which lets users log in to multiple applications and websites with one set of credentials:
- PaNOSC Software Catalogue, with over a hundred standard software tools used for analysing data from PaN RIs
- Human Organ Atlas, an open data portal of 3D scans of human organs with micron resolution for different pathologies, including COVID-19
- PaNOSC open data portal
- Search API service