Photon and Neutron (PaN) facilities are extremely powerful sources of X-rays and neutrons used across a wide range of research domains, from cultural heritage to life science and materials science. Their beams are used to analyse samples and to obtain data and high-resolution images down to the nanoscale, helping us better understand how matter and biological processes work. Together, these facilities produce petabytes of high-value data from unique samples, which should be preserved for posterity. With appropriate treatment and interpretation, these data turn into precious knowledge.
To meet the challenges posed by the ever-increasing number and volume of datasets and by data fragmentation, in 2015 the European Commission proposed a new concept: the European Open Science Cloud (EOSC). The EOSC aims to provide scientists with access to data, software and services from as many scientific data sources in Europe as possible, by making them FAIR (Findable, Accessible, Interoperable, Reusable) across facilities and scientific domains.
The PaN community's commitment to Open Science, FAIR data and the EOSC is structured around the PaNOSC Science Cluster, which represents European Research Infrastructures (RIs), develops and provides services for its scientific community, and connects these services to the EOSC. The PaNOSC Science Cluster is one of five science clusters in Europe working together to help scientists adopt FAIR data and Open Science best practices.
The PaNOSC Science Cluster grew out of the EC-financed projects launched to kick-start the building of the EOSC. Among these were the now-completed PaNOSC [2018-2022] and ExPaNDS [2018-2023] projects, which brought together the majority of PaN facilities in Europe.
Since then, the PaNOSC Science Cluster has continued its commitment to Open Science, as reflected in the LEAPS and LENS data strategy. FAIR and open data have become a higher priority for facilities for three reasons: to allow their users to better analyse their own experimental data, which are becoming ever larger and more complex; to enable the reproducibility of results; and to open facility data archives for reanalysis in support of new discoveries, using, for example, machine learning.
An updated FAIR implementation framework is now available for all PaN facilities. This framework serves as a toolkit of practical guidance and tools that facilities can use to evolve their systems and processes so that data are FAIR by the time they leave the facility. Facilities are encouraged to update their policies accordingly. The framework covers defining FAIR data policies, establishing metadata standards for annotating data, providing tools and processes that let both humans and machines manage experimental data, adopting suitable persistent identifiers (PIDs) for research entities, implementing active data management plans (DMPs), and conducting FAIR self-assessments.
The FAIR data services currently available to the PaN community include:
- A PaN e-learning platform and a training catalogue
- A single AAI (Umbrella ID) enabling users to log in to multiple applications and websites with a single set of credentials
- A federated search API for PaN data catalogues
- An Open Data Portal for searching and downloading data
- A standard protocol to enable third-party EOSC aggregators to harvest PaN open data and metadata
- The Virtual Infrastructure for Scientific Analysis (VISA), allowing access to data, software and services for data analysis and simulation using remote desktops and/or Jupyter notebooks in the browser
- A PaN software catalogue of packages relevant to the user community
- A framework for a common metadata schema, including an ontology of PaN techniques
- A standard data format (NeXus)
What is our vision?
In the long run, PaNOSC envisions a PaN Data Commons that is available and accessible to the PaN community of RIs and scientists.
Data Commons aggregate data from a wide range of sources into a unified database to make it more accessible and useful (https://datacommons.org).
The PaN Data Commons is envisioned as a common space for all PaN facilities, where petabytes of PaN data, analysis software, notebooks, workflows, and training material can be FAIR in practice, i.e., found, accessed (downloaded and/or executed), reused and improved.
It would be accessible remotely, with analysis executed either locally (close to the data) or via the EOSC (in which case the data need to be moved). It would also enable and encourage remote users and experiments, as was urgently required in the post-COVID-19 phase and as is needed to tackle climate change challenges.
Some useful examples have been taken as benchmarks. These include the COVID-19 portal, the ESCAPE Virtual Observatory, and further examples from the other science clusters. An example of domain-specific open data publishing is the Human Organ Atlas.
Overall, a guiding principle for the future implementation of the PaN Data Commons is that the petabytes of stored data are kept forever, so that the great amount of knowledge produced by scientists is not wasted. These data are a gold mine waiting to be reused.