Data Analysis
In this section you will find a brief description of the Data Analysis Services being developed and provided.
PaN portal for data analysis services
PaNOSC has been developing VISA, the Virtual Infrastructure for Scientific Analysis, to enable users to start a data analysis session as soon as a dataset has been collected. VISA provides access to both remote desktop environments and Jupyter Notebooks, enabling users to remotely analyse data from PaN facilities during or after the experiment.
Who is VISA for?
By providing an integrated environment for analysing PaN-related datasets, based on popular tools such as remote desktops and Jupyter notebooks, the VISA platform builds on the FAIR principles applied to PaN data to enable scientists to run analysis experiments. The environment supports real-time collaboration through data sharing. It may benefit researchers in the PaN community by increasing their scientific capacity and productivity.
VISA development
VISA was developed by enhancing earlier work by PaNOSC partner ILL, which is based on the OpenStack open-source cloud platform. The release of a federated instance of VISA has been planned for 2022, along with a cross-facility search for open data.
The project also implemented a software package that helps High Performance Computing (HPC) centres install a scalable JupyterHub process for users. This makes it easier to include Jupyter notebooks in the project's portfolio of analytics services and tools. PaNOSC users can therefore choose between Jupyter notebooks and remote desktop solutions when analysing FAIR data via the remote analysis services of the project. Deployed instances of remote analysis services can already be found at all PaNOSC facilities.

Source code for the portal can be found on GitHub: https://github.com/panosc-portal
Documentation for the portal development is available on the PaNOSC Confluence site: https://confluence.panosc.eu/display/wp4/Development
Watch the video on VISA
Jupyter Notebooks

PaNOSC has chosen Jupyter Notebooks and JupyterLab from the Jupyter project as its general-purpose data analysis tools. Notebooks allow code and documentation to be intermingled in one document in the web browser. Notebooks are proving very popular in data science, partly because they support Python as a programming language. A number of the scientific use cases for PaNOSC request solutions based on Jupyter notebooks. All PaNOSC sites have implemented a Jupyter notebook service. EGI provides a Jupyter notebook service for all PaNOSC users with an UmbrellaId.
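To illustrate how a notebook mixes documentation and code in a single document, the following minimal sketch builds a notebook as plain JSON using only the Python standard library. It assumes the public nbformat 4 layout (a list of cells, each with a `cell_type` and `source`); the helper name `make_notebook` and the cell contents are hypothetical and not taken from any PaNOSC notebook.

```python
import json


def make_notebook(cells):
    """Build a minimal nbformat 4 notebook dict from (kind, source) pairs.

    `kind` must be "markdown" (documentation) or "code" (executable cell).
    """
    nb_cells = []
    for kind, source in cells:
        if kind not in ("markdown", "code"):
            raise ValueError(f"unsupported cell type: {kind}")
        cell = {"cell_type": kind, "metadata": {}, "source": source}
        if kind == "code":
            # Code cells additionally carry an execution counter and outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical content: one documentation cell followed by one code cell.
notebook = make_notebook([
    ("markdown", "# Example analysis\nDescribe the measurement here."),
    ("code", "values = [1.0, 2.0, 3.0]\nprint(sum(values) / len(values))"),
])

# Serialised form; saved with an .ipynb extension, it can be opened in Jupyter.
notebook_json = json.dumps(notebook, indent=1)
```

In practice a notebook is usually written interactively in the browser rather than generated this way; the sketch only shows what the resulting document contains.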

Binder is a service built on top of Jupyter notebooks to make scientific data analysis reproducible. EGI provides a Binder service for all PaNOSC users with an UmbrellaId.
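As an illustration of how a Binder session is typically started, the sketch below builds a launch link for a notebook repository hosted on GitHub. It assumes the URL scheme used by public BinderHub deployments (`/v2/gh/<org>/<repo>/<ref>` with an optional `labpath` query parameter). The host `binder.example.org`, the repository names and the helper `binder_launch_url` are placeholders, not the address of the EGI service.

```python
from urllib.parse import quote, urlencode


def binder_launch_url(base_url, org, repo, ref="HEAD", notebook_path=None):
    """Return a BinderHub launch URL for a GitHub repository.

    base_url      -- address of the Binder service (placeholder in this sketch)
    org, repo     -- GitHub organisation and repository holding the notebooks
    ref           -- branch, tag or commit to build the environment from
    notebook_path -- optional notebook to open in JupyterLab after launch
    """
    url = f"{base_url.rstrip('/')}/v2/gh/{quote(org)}/{quote(repo)}/{quote(ref)}"
    if notebook_path:
        url += "?" + urlencode({"labpath": notebook_path})
    return url


# Hypothetical service address and repository, for illustration only.
launch_url = binder_launch_url(
    "https://binder.example.org",
    "example-org",
    "example-analysis",
    notebook_path="notebooks/demo.ipynb",
)
print(launch_url)
```

Pinning `ref` to a tag or commit rather than a moving branch is what makes the linked environment reproducible: every visitor gets the same repository state.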
Remote Desktops
PaNOSC offers virtual machines for scientists to run applications which cannot be converted to Jupyter notebooks. The VMs are accessed through a remote desktop which exports the graphics to a browser. The PaN portal and its main back-end service VISA use Guacamole to export the desktop to a web browser. Extra features have been implemented to allow sharing of desktops between scientists.
Milestones reached
- Existing data analysis requirements and solutions from all partner sites (including ExPaNDS) have been surveyed [1] [2];
- All sites now provide remote desktop analysis services or remote Jupyter Notebook analysis services at varying levels of maturity (some in production with large user numbers);
- Provision of a citizen science prototype environment, OSCOVIDA, for remote and reproducible data analysis of COVID-19 infection data.
Ongoing activities
- Developing standard data analysis notebooks for specific techniques;
- Providing tools used in notebook-based data analysis at the facilities, and contributing to open-source data analysis tools that are used in PaNOSC and elsewhere, specifically h5py, h5glance and hdf5plugin;
- Developing web-based viewers for HDF5 files: h5nuvola and h5web;
- Providing an infrastructure (e.g., JupyterHub, or Jupyter-Slurm) so that notebooks can be executed remotely on the computing and data infrastructure of the facility;
- Exploring the use of software package managers to deploy versioned software at HPC installations and to provide the same software in a portable container, in support of remote and cloud-based analysis software environments.
Common Portal Achievements
- Possible use cases of the Portal have been listed [3];
- The Portal architecture has been defined by adopting a microservices approach (foundation services, user services and compute services) [4] [5], for more flexible integration into site-specific infrastructures.
After initial deployment at the facilities to provide remote analysis services for local data, the Portal will be deployed as part of the EOSC to provide federated analysis of data across the facilities.