Data – a major open science issue

Research

On November 30th, the CNRS Open Research Data Department (DDOR) is organising its 4th Open Science Day, devoted to issues relating to the management and sharing of research data. The event is an opportunity to take stock of existing tools and services, as well as the challenges associated with data. Interview with Sylvie Rousset, director of the DDOR.

What are the objectives of this Open Science Day?
Sylvie Rousset: This event is one of a series of internal CNRS days dedicated to open science, aimed at scientists via the ten CNRS Institutes. It is also for members of the scientific councils of these Institutes, which represent all French research institutions. Last year's event focused on research evaluation. The event has enabled the CNRS to position itself in the national and European contexts. This year, we are focusing on data management and sharing, particularly in relation to the 'Research Data' plan.

You have notably decided to organise a round table on the legal framework for research data, which remains relatively little known.
S. R: Indeed, unlike scientific publications, data belongs in principle to the institution that produced it rather than to its authors. What's more, although the law requires all research data to be open, there are exceptions to this, for example personal data protected by the GDPR [1], health data or data that raises national sovereignty issues. It is therefore important to clarify the legal rules associated with data in general and more specifically in different use cases.

This round table will deal with data sharing in the framework of consortium agreements, the identification of intellectual property rights holders, the choice of appropriate licences for shared data, and so forth. We will also discuss issues related to the use of third-party data, hosting data in other countries and the responsibility for making data produced by platforms available. In parallel with these discussions, the CNRS needs to identify the best way of organising itself to manage all these legal issues. For example, staff training could be organised through the organisation's 18 regional offices to give scientists the best possible support on legal issues linked to data.

This day also focuses on thematic data repositories. What are your key messages on this subject?
S. R: We are working very hard to raise awareness and educate scientists about the benefits of sharing data and using existing solutions. Some communities are much more advanced than others. This is the case in astronomy, with the Strasbourg Astronomical Data Centre (see box), as well as in the earth sciences, the humanities and bioinformatics. We aim to use such examples to show that it is possible to share open data and that this works well. We also need to continue supporting other fields like materials physics, chemistry and engineering, where over 50% of data is not stored permanently.

There will be a presentation of the national Recherche Data Gouv platform, which was inaugurated last July by the Ministry of Higher Education and Research and whose development was supported by the CNRS. Communities that do not have thematic repositories can use this platform to deposit generic datasets. This is an important initiative because it provides a response to a critical issue linked to the data associated with publications. Journal publishers increasingly ask authors to hand over their data, which means it's essential to provide repositories that enable institutions to retain their rights to research data.

The last issue covered combines storage, computing and artificial intelligence. Where does the CNRS stand on these questions?
S. R: Data repositories bring up questions of storage and therefore of economic models. We don't yet know how much small data the CNRS needs to host or where it should all be stored – in national or regional computing centres? To tackle these new issues, the CNRS's pioneering strategy of bringing together the entire scope of subjects, from publications to research data and computing, in a single department (the DDOR) is bearing fruit. We have set up a working group dedicated to data storage with the computing centres, which also enables us to capitalise on emerging opportunities. Sharing data paves the way for new ways of carrying out interdisciplinary research. We already know that artificial intelligence tools are revolutionising research in many sectors, and our objective now is to use these to identify the most interesting new issues linked to datasets.

More generally, where does the CNRS currently stand as regards its ambitions for open science?
S. R: We have made good progress in getting researchers used to sharing data and have helped develop solutions that fit well in the national (the national platform) and international (European Open Science Cloud, EOSC) contexts. We are also progressing well towards achieving the 100% open publication target set in November 2019 by Antoine Petit. The priority issue for us now is publication fees. We are in discussions with other international research organisations to develop and propose new academic journals and/or publishing platforms.

In 2018, the CNRS signed the San Francisco Declaration on Research Assessment (DORA) [2], aimed at transforming research evaluation practices and more generally promoting open science. We are also a stakeholder in and signatory to European-level reform through the recent launch of the new CoARA coalition. A final large-scale initiative involves tools that derive from artificial intelligence, which will be one of the main issues in 2023 for the DDOR.

  • [1] The General Data Protection Regulation (GDPR) is the European Union's benchmark regulation on the protection of personal data.
  • [2] DORA questions the growing use of the Journal Impact Factor as an indicator for assessing research and researchers themselves.

50 years of sharing astronomical data

Astronomy is a pioneering discipline for sharing scientific data, and the Strasbourg Astronomical Data Centre (CDS) was in fact one of the first centres to work with digital data. Since its creation in 1972, the CDS has established itself as a pillar of the astronomy data-sharing ecosystem and currently receives more than two million requests every day. The centre hosts and indexes data produced by observatories and major sky surveys, along with publication data submitted by authors and specialist journals. The repository can be accessed by astronomers and curious members of the public from all over the world. The CDS provides FAIR data (findable, accessible, interoperable and reusable) thanks to standards developed by the Virtual Astronomical Observatory, and is itself actively involved in defining these disciplinary data-sharing standards. After 50 years of activity, how can this success be explained? Its developments have always been driven by scientific requirements and implemented by a team from a wide range of backgrounds, including astronomers, documentation specialists and IT specialists. Above all, the CDS has managed to adapt to sometimes disruptive scientific and technical developments, like the arrival of the internet, and also to changes in research policy. Hopefully the centre's work can inspire other disciplines to do the same.