Services

Data Analysis

PaNOSC has been developing the Common Portal for Data Analysis Services to facilitate starting a data analysis session after a dataset of interest has been collected. The Portal aims to provide access to both remote desktop environments and Jupyter Notebooks, enabling users to remotely analyse data from PaN facilities.

Milestones reached

  • Existing data analysis requirements and solutions from all partner sites (including ExPaNDS) have been surveyed [1] [2];
  • All sites now provide remote desktop or remote Jupyter Notebook analysis services at varying levels of maturity (some are in production with large user numbers);
  • A prototype citizen science environment for remote and reproducible analysis of COVID-19 infection data, OSCOVIDA, has been provided.

Ongoing activities

  • Developing standard data analysis notebooks for specific techniques;
  • Providing tools for Notebook-based data analysis at the facilities, and contributing to open source data analysis tools that are used in PaNOSC and elsewhere, specifically h5py, h5glance and hdf5plugin;
  • Developing web-based viewers for HDF5 files: h5nuvola and h5web;
  • Providing an infrastructure (e.g., JupyterHub, or Jupyter-Slurm) so that notebooks can be executed remotely on the computing and data infrastructure of the facility;
  • Exploring the use of software package managers to deploy versioned software at HPC installations and to provide the same software in a portable container, so that analysis environments can also be provisioned remotely and in the cloud.

Common Portal Achievements

  • Possible use cases of the Portal have been listed [3];
  • The Portal architecture has been defined using a microservices approach (foundation services, user services and compute services) [4] [5], for more flexible integration into site-specific infrastructures.

After initial deployment at the facilities to provide remote analysis services for local data, the Portal will be deployed as part of the EOSC to provide federated analysis of data across the facilities.

References
