Guidelines for implementing a Research Data Policy released
As part of PaNOSC WP2 (Data policy and stewardship), the first draft of the Guidelines for implementing a Research Data Policy has been released.
The purpose of the document is to provide guidelines on how to implement a research data policy at Photon and Neutron (PaN) sources and at Research Infrastructures (RIs) in general. It includes a set of annotated guidelines based on the experience of six PaN institutes – ESRF, ILL, EuXFEL, ESS, CERIC-ERIC and ELI – which have either already implemented a data policy or are in the process of doing so.
The guidelines cover all the steps a typical RI has to go through when implementing a Research Data Policy (RDP): from drafting and adoption of the RDP through to its implementation.
The document also includes a set of frequently asked questions, and the related answers, as well as a section on GDPR rules and the lessons learned by the PaN facilities involved in PaNOSC.
Below are the guidelines outlined in the document, based on the feedback to a set of questions (for each of the questions, the document presents case studies from each of the PaN facilities involved):
Who are the main drivers within an Organization to adopt a Research Data Policy?
Adopting a data policy is a management decision because the data policy will be part of the governance documents of the RI. The main drivers should include top management. They will need to be supported by IT experts, scientists and data managers.
Which are the main reasons/benefits for adopting a Research Data Policy?
The reasons are many. They range from the need to make science reproducible and replicable by adopting an Open Science approach, to following the recommendations of international bodies such as the OECD, ISC and IUCr, implementing the FAIR principles to enable the re-use of data, providing scientists with new data services, archiving important datasets, and improving the quality of scientific data.
To write a Research Data Policy, should one use a template, a management platform or an existing policy?
The obvious place to start for Photon and Neutron Research Infrastructures is one of the existing Research Data Policy frameworks developed specifically for them, namely the most recent PaNOSC framework (written in 2020), which explicitly addresses the FAIR principles and is an update of the original PaNdata framework (written in 2010).
Who should be consulted/involved when implementing the policy?
The important groups of people to consult are the beamline scientists, the User Office, the legal office and management, who will be confronted with the consequences of implementing the RDP. In addition, the control engineers, data managers and IT engineers need to be involved in the implementation. Users have to agree to the policy when applying for beamtime. New data consumers (who do not have access to state-of-the-art RIs) should also be consulted. The latter group is represented by community organisations (e.g. IUCr) and forums (e.g. RDA and GO-FAIR).
Before the adoption of a data policy, what compliance with legal and regulatory aspects should be assessed?
The RDP should be reviewed by the legal counsel of the Research Infrastructure to ensure it complies with the legal statutes of the institute. The RDP should be reviewed by the Data Protection Officer to ensure it complies with GDPR for scientific data.
Which data produced and related metadata are covered by the Research Data Policy? Which kind of data should be excluded (personal data, sensitive data, etc.)?
The RDP covers scientific research data and metadata. Data can be raw data, processed data, auxiliary data or results (refer to the PaNOSC data policy framework for a definition of the different types of data). It is highly recommended to exclude data from clinical trials or other data where the samples refer to identifiable humans as these are considered sensitive data. Paleontological human samples are not considered sensitive data. Proprietary research (resulting from commercial beamtime) is usually not covered by the RDP.
Which personnel of your organization should be trained on how to apply the Data Policy?
The implementation of an RDP requires dedicated personnel mainly in the form of data managers but also controls engineers, data scientists and IT personnel.
Should the policy include a review cycle?
The RDP needs to be reviewed at regular intervals to take into account evolving norms for research data (e.g. the introduction of the FAIR principles in 2016) and the experience gained in implementing it. The data management landscape is changing as the FAIR principles and Open Science methodology are adopted more widely, thanks to the efforts of scientific communities and support from scientific bodies, governments and, last but not least, the EOSC. When new guidelines improve scientific data management, the RDP should be adapted accordingly. The review process should be provided for in the RDP, and minor changes should be possible without going through the full approval process.
If you used a template or model, do some standard definitions need to be changed?
It is standard practice to adapt the definitions of certain terms in the template to the local vocabulary. If a definition needs to be altered significantly then it is better to introduce a new term.
Does one need to define one or more standard formats for the raw data? If yes, which one/s?
The RDP should guarantee that all curated data can be read and understood by the custodians of the data i.e. the RI. Defining the data format in which (raw, processed, auxiliary and results) data will be curated ensures the data can be read. Standard metadata and/or using standard vocabularies are part of ensuring data can be understood by the community. The preferred data format and vocabulary should be mentioned in the RDP.
Which considerations should be taken into account in the choice of the embargo period?
The two main considerations to take into account are the length of a PhD, which is commonly 3 years, and the time needed to analyse the data before publishing.
Should the embargo period be allowed to be extended and how to manage this?
The embargo period is based on an average PhD and is a compromise for research projects that need more than 3 years. The RDP should foresee the extension of the embargo period for such projects and ensure the process is easy for researchers. It should not, however, encourage blanket extensions of the embargo period for research groups without good reason.
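As an illustration only, the embargo logic described in the last two answers could be sketched in Python as follows. This is a minimal sketch, not something taken from the guidelines: the class name Embargo, its fields and methods, and the helper add_years are all hypothetical. The only values taken from the text are the 3-year default and the requirement that an extension needs a good reason.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Default taken from the text: the embargo follows a typical 3-year PhD.
DEFAULT_EMBARGO_YEARS = 3


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (hypothetical helper)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return d.replace(year=d.year + years, day=28)


@dataclass
class Embargo:
    """Hypothetical record of the embargo on one dataset."""
    collected_on: date
    extension_years: int = 0
    justification: Optional[str] = None

    def end_date(self) -> date:
        """Date on which the data become openly accessible."""
        total = DEFAULT_EMBARGO_YEARS + self.extension_years
        return add_years(self.collected_on, total)

    def extend(self, years: int, justification: str) -> None:
        """Extend the embargo; a stated reason is mandatory (assumption:
        this is how 'no blanket extensions' is enforced)."""
        if years <= 0:
            raise ValueError("extension must be a positive number of years")
        if not justification or not justification.strip():
            raise ValueError("an extension requires a justification")
        self.extension_years += years
        self.justification = justification.strip()

    def is_public(self, today: date) -> bool:
        """True once the embargo period has ended."""
        return today >= self.end_date()


if __name__ == "__main__":
    embargo = Embargo(collected_on=date(2021, 6, 1))
    print("Default end of embargo:", embargo.end_date())
    embargo.extend(1, "Project needs a fourth year of analysis")
    print("End of embargo after extension:", embargo.end_date())
```

Requiring a justification string for every extension keeps the process simple for researchers while leaving a record that each extension was requested for a stated reason. How that reason is reviewed is left to the RI.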
What data services should be provided as part of the RDP?
One of the main reasons for adopting an RDP is to improve the quality of scientific data and be able to provide data services to researchers. Adopting an RDP should go hand in hand with the proposal of new data services enabled by proper data management e.g. services like long-term archiving, download and data transfer services, data processing and analysis services, DOI services etc.