The RfII focuses on the topics of “Research Data – Sustainability – Internationality” in its first term of office. A series of recommendations and information resources highlight various facets of these topics.
The process of digitisation in the sciences affects research methods and research data, among other areas, and digitality fundamentally changes the opportunities to acquire scientific knowledge. While there are several good examples of research data management in Germany, there is an overall absence of coordination, and current efforts often take the form of parallel, project-based initiatives. Universal access to services for research data management is lacking, and there is a clear need for action in a variety of areas.
For this reason, the RfII addressed the topic of research data management in its first position paper PERFORMANCE THROUGH DIVERSITY (2016). The position paper organises the complex material and offers a detailed and clearly formulated diagnosis. The recommendations range from adjustments with a short-term effect (e.g. evaluation of external funding applications) to strategic development tasks with a long-term effect (e.g. long-term financing options for services financed as projects and the establishment of a national research data infrastructure). The appendix contains an explanation of key terms and an appraisal of the research policy strategy discussions of past decades, each of which is available as a working paper.
The report AN INTERNATIONAL COMPARISON (2017) depicts parallel developments and the paths taken in Canada, Great Britain, the Netherlands, and Australia. Based on these examples, the RfII has also derived a series of suggestions useful for the recommended establishment of a national research data infrastructure (NFDI).
A primary recommendation of the RfII is to establish a coordinated research data infrastructure for Germany (NFDI). The RfII suggests following a long-term development path with this new construct. The landscape of data infrastructures in science, which is currently poorly coordinated and lacks sustainable funding, will thus be steered in a more efficient and more cooperative direction. Systematisation of the databases, easily accessible research data, and continuous development of the services will strengthen the position of research in Germany as well as its global competitiveness.
The NFDI is conceived as a collaborative, nationwide network that will expand step by step. It shall provide reliable and sustainable services to cover generic and discipline-specific requirements of research data management in Germany. The NFDI will be established in stages, and its creation will be driven by science. Its services are available to researchers across disciplines, institutions, and federal states.
The establishment of the NFDI was suggested in the RfII position paper PERFORMANCE THROUGH DIVERSITY (2016). In addition, the RfII has published two discussion papers directed towards the scientific community. Several responses have been published which show how stakeholders reflect on the suggestions for the establishment of a national research data infrastructure (a selection of responses is maintained here). In November 2018, the German federal and state governments concluded an agreement on the establishment of the National Research Data Infrastructure (NFDI). Until 2028, they will make available up to EUR 90 million per year. The programme starts on 1 January 2019.
Zusammenarbeit als Chance – zweiter Diskussionsimpuls zur Ausgestaltung einer Nationalen Forschungsdateninfrastruktur (NFDI) für die Wissenschaft in Deutschland, Göttingen 2018, 4 pp. (German only)
Digital data is being used more and more in all areas of life, especially in the sciences, and is also being utilised more extensively due to new analysis methods. When the data in question is personal data, but also when the data is of a less sensitive nature, data protection issues are increasingly coming into conflict with the right to freedom of research. The ability to reuse and link data (including personal data) necessary to gain scientific knowledge is subject to strict limits in terms of the declared purpose of collecting the data.
In March 2017, the RfII offered initial recommendations in its paper DATENSCHUTZ UND FORSCHUNGSDATEN (Data Protection and Research Data, German only) on the compatibility of good research and good data protection. The recommendations of the RfII are based on four areas of action:
The publication of the paper by the RfII was prompted by the adoption of the European General Data Protection Regulation (GDPR) in 2016. The RfII recommendations should be considered as proposals to facilitate the scientific use of personal data to the extent appropriate. They do not pre-empt more extensive reforms, which the RfII nevertheless believes are necessary in the interest of science.
The problems related to the topics covered by the RfII are often complex and subject to numerous conditions. For this reason, the RfII will provide explanations of important terms at regular intervals. The use of consistent and coherent terminology is indispensable for a critical discourse.
This explanation of terms was revised and adopted by the RfII in 2017.
The term “data quality” refers to general, typical properties of the data itself, including those required due to the methodology, as well as its suitability for further use after the application of appropriate quality assurance measures. The evaluation of data quality is based on the requirements to be defined for the data, which in turn depend on the research question, and therefore on how the data will be used to obtain research results. These requirements concern the accuracy of measured values, the reliability of a result obtained empirically, the completeness or currency of the data, and the documentation on how the data was acquired and stored.
In addition, sustainability aspects are inextricably intertwined with the evaluation of the actual quality of the data. Such aspects include the properties of the data, the transferability of the data, the life expectancies of data media, etc. They affect in particular the preservation of research data for use in the future by science, business, and society, ideally in many different and possibly even currently unknown ways.
In terms of further use (“reuse”), data quality is determined by the ease with which databases and data collections can be searched and data found, as well as by whether or not they contain enough additional information. This additional information should be available, if possible, in the form of standardised technical and scientific metadata regarding quality aspects, provide information on how the data was generated and processed, and state which tools and methods were used. A prerequisite for the traceability and, if possible, reuse of digital research findings is that the corresponding data is fully documented in terms of the data models on which it is based (vocabularies used, formats, etc.) and the methods used to acquire it (e.g. measuring instruments, surveys, algorithms).
Wherever possible, not only the metadata but also additional and possibly even special documentation should follow recognised and available standards. The availability, accessibility, and citability of research data – including over the long term – are in turn quality aspects of the information infrastructures and services that allow the data to be stored securely, located quickly (retrieval), accessed, and reused (also in the context of long-term archiving). The clarification of the legal framework conditions under which the data can be used in connection with information infrastructure services is also a component of data quality.
RfII (2017) – Arbeitsthema Datenqualität (unpublished), p. 11
[information infrastructures (RfII), e-infrastructures (EU)]
Information infrastructures are technically and organisationally networked services and facilities for accessing and maintaining databases, information bases, and knowledge bases. In the context of the RfII’s advisory work, they primarily serve research purposes, are often objects of research, and always function as an enabler.
Information infrastructures must always take into account that knowledge bases in universities, research facilities, archives, libraries, and museums are available in purely analogue or digital form or in a combination of analogue and digital forms. The purpose of the digitisation of analogue knowledge bases is to integrate and merge digitised data and native digital data into uniform, integrated work environments with the goal of achieving dynamic knowledge integration. Like the term ‘e-infrastructures’, the term ‘information infrastructures’ commonly encountered in Germany is also increasingly being used to refer to the digital information and communication technologies employed in research.
The performance of digital information infrastructures depends significantly on the amount invested in digitising the content, user-friendly access methods, technical features, international standards, and effective tools. The level of information literacy of the users and personnel and the associated quality of the custom services provided are equally relevant.
[research data, research data management]
This explanation of terms was revised and adopted by the RfII in 2017.
RESEARCH DATA does not consist only of the (final) results of research. Instead, research data comprises all data generated in the course of scientific activity, including large amounts of data used for documentation purposes in scientific projects generated through measurements and through selecting, preparing, collecting, and storing information. Data that is not obtained through direct scientific activity, but that science uses for research purposes as the methodological foundation of a specific research process, is also research data. This is the case, for example, when official statistics or other data from public authorities or products from non-scientific service providers are processed scientifically. Research data also includes the research tools used as well as the continuously generated traces of scientific activity, i.e. the process data produced automatically and in large quantities through digital research. This is important wherever research processes and research data are documented and archived for quality assurance purposes and wherever this is advisable based on sustainability aspects or for the purpose of historical research.
In actual research, it is possible to differentiate, although not always clearly, between research data and METADATA. Metadata documents the process through which the research data was created and provides it with a context. In the research process, metadata can itself become the object of research, which is significant in terms of the life cycle of research data.
RESEARCH DATA MANAGEMENT includes all measures ‒ even organisational measures extending beyond research activity in the narrow sense ‒ that need to be taken in order to obtain high-quality data, to follow good scientific practice within the data life cycle, to make results reproducible, and, where applicable, to fulfil existing documentation requirements (e.g. in the health care sector). The availability (possibly across different domains) of data for reuse is an important issue, and data management plans are increasingly being used by scientific institutions. Data management plans, which are developed and written at the beginning of a project or are the result of a research project, are intended to describe the data to be used and generated as well as the documentation, metadata, and standards required, state the potential legal restrictions (e.g. data protection) early on, plan the storage resources necessary, and specify the criteria to determine which data should be made available externally, in which form, and how it could be stored in the long term. At the organisational level, research facilities (e.g. universities) must ensure access to the corresponding infrastructure services within the facility (e.g. by creating new capacities or expanding existing capacities) or in cooperation with external partners (through cooperation agreements, etc.). In this context, organisations should also actively work towards the overall goal of enabling the use of data across domains and scientific communities.
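As an illustration only, the elements of a data management plan listed above could be captured in a simple record. The following Python sketch is not an RfII specification: the class name, field names, and the `missing_elements` helper are hypothetical and were chosen here merely to mirror the wording of the definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataManagementPlan:
    """Hypothetical record mirroring the elements named in the definition above."""
    project: str
    datasets: List[str]            # data to be used and generated
    documentation: List[str]       # documentation and metadata required
    standards: List[str]           # standards the data should follow
    legal_restrictions: List[str] = field(default_factory=list)  # e.g. data protection
    storage_gb: float = 0.0        # storage resources planned
    sharing_criteria: str = ""     # which data is made available externally, in which form
    long_term_storage: str = ""    # how the data could be stored in the long term

    def missing_elements(self) -> List[str]:
        """Return the names of elements that are still empty.

        Assumption: an empty value means "not yet addressed".
        """
        return [name for name, value in vars(self).items() if not value]


# Example plan with made-up content
plan = DataManagementPlan(
    project="Example survey",
    datasets=["interview transcripts"],
    documentation=["codebook"],
    standards=[],
    legal_restrictions=["data protection"],
)
print(plan.missing_elements())
```

In this sketch, a plan written at the beginning of a project would typically still report several missing elements, which can then be completed as the project progresses.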
Begriffsklärungen: Bericht des Redaktionsausschusses Begriffe an den RfII (RfII Berichte No. 1), Göttingen 2016, 31 pp. (German only)
Some of the definitions have been translated into English; see the appendix to the position paper PERFORMANCE THROUGH DIVERSITY (2016), pp. 71-82.