FAIR data is data which meets the FAIR principles of findability, accessibility, interoperability, and reusability (FAIR). [1] [2] The acronym and principles were defined in a March 2016 paper in the journal Scientific Data by a consortium of scientists and organizations. [1]
The FAIR principles emphasize machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with no or minimal human intervention) because, as the volume, complexity, and rate of production of data increase, humans increasingly rely on computational support to work with them. [3]
The abbreviation FAIR/O data is sometimes used to indicate that the dataset or database in question complies with the FAIR principles and also carries an explicit data‑capable open license.
Findable
The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, and are therefore a key component of the FAIRification process. An illustrative sketch of such machine-readable metadata follows the list below.
F1. (Meta)data are assigned a globally unique and persistent identifier
F2. Data are described with rich metadata (defined by R1 below)
F3. Metadata clearly and explicitly include the identifier of the data they describe
F4. (Meta)data are registered or indexed in a searchable resource
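The sketch below shows one possible form a findable, machine-readable metadata record could take, using schema.org terms serialised as JSON-LD. It is only an illustration under assumed conventions: the DOI, URLs, and descriptive values are hypothetical placeholders, not a registered dataset.

```python
import json

# A minimal sketch of machine-readable dataset metadata (F1-F4), expressed as
# JSON-LD with schema.org terms. All identifiers and values are hypothetical.
dataset_record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    # F1: a globally unique, persistent identifier (here a placeholder DOI)
    "@id": "https://doi.org/10.1234/example-dataset",
    "identifier": "https://doi.org/10.1234/example-dataset",
    # F2: rich descriptive metadata
    "name": "Example ocean temperature measurements",
    "description": "Hourly sea-surface temperature readings from an example study.",
    "keywords": ["oceanography", "temperature", "FAIR"],
    # F3: the metadata explicitly carry the identifier of the data they describe
    "distribution": {
        "@type": "DataDownload",
        "contentUrl": "https://repository.example.org/datasets/42/data.csv",
        "encodingFormat": "text/csv",
    },
    # F4: the record would be deposited in a searchable registry or catalogue
    "includedInDataCatalog": {"@type": "DataCatalog", "name": "Example Repository"},
}

print(json.dumps(dataset_record, indent=2))
```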
Accessible
Once users find the required data, they need to know how the data can be accessed, possibly including authentication and authorisation. An illustrative retrieval sketch follows the list below.
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1 The protocol is open, free, and universally implementable
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
A2. Metadata are accessible, even when the data are no longer available
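As an illustration of principle A1, the sketch below retrieves metadata by identifier over an open, standardised protocol (HTTPS), relying on the content negotiation offered by DOI resolvers such as doi.org. The DOI shown is a hypothetical placeholder, and whether a given identifier returns machine-readable metadata depends on its registration agency.

```python
from urllib.request import Request, urlopen

# Sketch of A1: metadata retrievable by identifier over an open, standardised
# protocol (HTTPS). Requesting a machine-readable media type from the DOI
# resolver returns metadata rather than the human-facing landing page.
# The DOI below is a hypothetical placeholder.
doi = "10.1234/example-dataset"

request = Request(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
)

with urlopen(request, timeout=30) as response:
    metadata = response.read().decode("utf-8")

print(metadata)
```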
Interoperable
The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing. An illustrative sketch using shared vocabularies follows the list below.
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (Meta)data use vocabularies that follow FAIR principles
I3. (Meta)data include qualified references to other (meta)data
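One way to satisfy I1-I3 is to express metadata in RDF, a formal and widely shared knowledge-representation language, using community vocabularies such as DCAT and Dublin Core Terms. The sketch below assumes the third-party rdflib package; the identifiers are placeholders, not real records.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

# Sketch of I1-I3: RDF metadata using shared vocabularies (DCAT, Dublin Core
# Terms) and a qualified reference to another dataset. Identifiers are
# hypothetical placeholders; requires the rdflib package.
g = Graph()
dataset = URIRef("https://doi.org/10.1234/example-dataset")
source = URIRef("https://doi.org/10.1234/upstream-dataset")

g.add((dataset, RDF.type, DCAT.Dataset))                      # I1: formal language (RDF)
g.add((dataset, DCTERMS.title, Literal("Example dataset")))   # I2: shared vocabulary terms
g.add((dataset, DCTERMS.source, source))                      # I3: qualified reference to other (meta)data

print(g.serialize(format="turtle"))
```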
Reusable
The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well described so that they can be replicated and/or combined in different settings. An illustrative sketch of license and provenance metadata follows the list below.
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (Meta)data are released with a clear and accessible data usage license
R1.2. (Meta)data are associated with detailed provenance
R1.3. (Meta)data meet domain-relevant community standards
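The sketch below illustrates R1.1 and R1.2 by attaching an explicit usage license and machine-readable provenance to the metadata, again using rdflib with Dublin Core Terms and the W3C PROV vocabulary. The license URL, ORCID, dates, and DOIs are placeholders chosen for illustration.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS, PROV

# Sketch of R1.1-R1.2: the metadata carry a clear usage license and
# provenance statements. All identifiers are hypothetical placeholders;
# requires the rdflib package.
g = Graph()
dataset = URIRef("https://doi.org/10.1234/example-dataset")

# R1.1: a clear, resolvable data usage license
g.add((dataset, DCTERMS.license,
       URIRef("https://creativecommons.org/licenses/by/4.0/")))

# R1.2: provenance - who produced the data and what it was derived from
g.add((dataset, PROV.wasAttributedTo,
       URIRef("https://orcid.org/0000-0000-0000-0000")))
g.add((dataset, PROV.wasDerivedFrom,
       URIRef("https://doi.org/10.1234/upstream-dataset")))
g.add((dataset, DCTERMS.created, Literal("2016-03-15")))

print(g.serialize(format="turtle"))
```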
The principles refer to three types of entities: data (or any digital object), metadata (information about that digital object), and infrastructure. For instance, principle F4 defines that both metadata and data are registered or indexed in a searchable resource (the infrastructure component).
— GO FAIR Foundation, FAIR Principles, https://www.gofair.foundation/
A 2007 paper was the earliest to discuss similar ideas about data accessibility, predating the FAIR principles. [4]
At the 2016 G20 Hangzhou summit, the G20 leaders issued a statement endorsing the application of FAIR principles to research. [5] [6] Also in 2016, a group of Australian organisations developed a Statement on FAIR Access to Australia's Research Outputs, which aimed to extend the principles to research outputs more generally. [7] In 2017, Germany, the Netherlands, and France agreed to establish [8] an international office to support the FAIR initiative, the GO FAIR International Support and Coordination Office. [9]
Other international organisations active in the research data ecosystem, such as CODATA and the Research Data Alliance (RDA), also support FAIR implementation by their communities. Assessment of FAIR principles implementation is being explored by the FAIR Data Maturity Model Working Group of the RDA. [10] CODATA's strategic Decadal Programme "Data for Planet: Making data work for cross-domain challenges" [11] cites the FAIR data principles as a fundamental enabler of data-driven science. The Association of European Research Libraries recommends the use of FAIR principles. [12]
A 2017 paper by advocates of FAIR data reported that awareness of the FAIR concept was increasing among researchers and institutes, but also that understanding of the concept was becoming confused as different people applied their own perspectives to it. [13]
Guides on implementing FAIR data practices state that the cost of a FAIR-compliant data management plan should be 5% of the total research budget. [14]
In 2019 the Global Indigenous Data Alliance (GIDA) released the CARE Principles for Indigenous Data Governance as a complementary guide. [15] The CARE principles extend principles outlined in FAIR data to include Collective benefit, Authority to control, Responsibility, and Ethics to ensure data guidelines address historical contexts and power differentials. The CARE Principles for Indigenous Data Governance were drafted at the International Data Week and Research Data Alliance Plenary co-hosted event, "Indigenous Data Sovereignty Principles for the Governance of Indigenous Data Workshop", held 8 November 2018, in Gaborone, Botswana. [16]
The lack of information on how to implement the guidelines has led to inconsistent interpretations of them. [17]
In January 2020, representatives of nine groups of universities around the world produced the Sorbonne declaration on research data rights, [18] which included a commitment to FAIR data, and called on governments to provide support to enable it. [19] In 2021, researchers identified the FAIR principles as a conceptual component of data catalog software tools, with the other components being metadata management, business context and data responsibility roles. [20] In April 2022, Matthias Scheffler and colleagues argued in Nature that FAIR principles are "a must" so that data mining and artificial intelligence can extract useful scientific information from the data. [21]
However, making data (and research outcomes) FAIR is a demanding task, and assessing their FAIRness is itself challenging. [22]
The Committee on Data of the International Science Council (CODATA) was established in 1966 as the Committee on Data for Science and Technology, originally part of the International Council of Scientific Unions, now part of the International Science Council (ISC). Since November 2023, its president has been the Catalan researcher Mercè Crosas.
A data steward is an oversight or data governance role within an organization, and is responsible for ensuring the quality and fitness for purpose of the organization's data assets, including the metadata for those data assets. A data steward may share some responsibilities with a data custodian, such as the awareness, accessibility, release, appropriate use, security and management of data. A data steward would also participate in the development and implementation of data assets. A data steward may seek to improve the quality and fitness for purpose of other data assets their organization depends upon but is not responsible for.
Open data is data that is openly accessible, exploitable, editable and shareable by anyone for any purpose. Open data is licensed under an open license.
The Publications Office of the European Union is the official provider of publishing services and data, information and knowledge management services to all EU institutions, bodies and agencies. This makes it the central point of access to EU law, publications, open data, research results, procurement notices, and other official information.
Barend Mons is a molecular biologist by training and a leading FAIR data specialist. He spent the first decade of his scientific career on fundamental research on malaria parasites and later on translational research for malaria vaccines. In 2000 he switched to advanced data stewardship and (biological) systems analytics. He is currently a professor in Leiden and is best known for innovations in scholarly collaboration, especially nanopublications, knowledge-graph-based discovery and, most recently, the FAIR data initiative and GO FAIR. Since 2012 he has been a professor in biosemantics in the Department of Human Genetics at the Leiden University Medical Center (LUMC) in the Netherlands. In 2015 he was appointed chair of the High Level Expert Group on the European Open Science Cloud, and since 2017 he has headed the International Support and Coordination Office of the GO FAIR initiative. He is also the elected president of CODATA, the International Science Council's standing committee on research data issues. Mons is a member of the Netherlands Academy of Technology and Innovation (ACTI) and the European representative on the Board on Research Data and Information (BRDI) of the US National Academies of Sciences, Engineering, and Medicine. He is a frequent keynote speaker on FAIR and open science around the world and participates in various national and international boards.
Resource Description and Access (RDA) is a standard for descriptive cataloging initially released in June 2010, providing instructions and guidelines on formulating bibliographic data. Intended for use by libraries and other cultural organizations such as museums and archives, RDA is the successor to Anglo-American Cataloguing Rules, Second Edition (AACR2).
Metadata is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including:
Discoverability is the degree to which something, especially a piece of content or information, can be found in a search of a file, database, or other information system. Discoverability is a concern in library and information science, many aspects of digital media, software and web development, and in marketing, since products and services cannot be used if people cannot find them or do not understand what they can be used for.
Open scientific data or open research data is a type of open data focused on publishing observations and results of scientific activities available for anyone to analyze and reuse. A major purpose of the drive for open data is to allow the verification of scientific claims, by allowing others to look at the reproducibility of results, and to allow data from many sources to be integrated to give new knowledge.
An open repository or open-access repository is a digital platform that holds research output and provides free, immediate and permanent access to research results for anyone to use, download and distribute. To facilitate open access, such repositories must be interoperable according to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Search engines harvest the content of open access repositories, constructing a database of research that is freely available worldwide. Data repositories are a cornerstone of FAIR data practices and are widely used within the scientific community.
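OAI-PMH itself is a simple HTTP interface driven by request "verbs" such as ListRecords. The sketch below harvests Dublin Core records from a repository endpoint; the base URL is a hypothetical placeholder for a repository's actual OAI-PMH address.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Sketch of an OAI-PMH harvest: issue a ListRecords request for Dublin Core
# (oai_dc) metadata and print each record's title. The base URL is a
# hypothetical placeholder.
BASE_URL = "https://repository.example.org/oai"
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

params = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
with urlopen(f"{BASE_URL}?{params}", timeout=30) as response:
    tree = ET.parse(response)

# Print the Dublin Core title of each harvested record.
for record in tree.iter(f"{OAI_NS}record"):
    title = record.find(f".//{DC_NS}title")
    if title is not None:
        print(title.text)
```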
Before data.europa.eu, the EU Open Data Portal was the point of access to public data published by the EU institutions, agencies and other bodies. On 21 April 2021, it was consolidated into the data.europa.eu portal, together with the European Data Portal, a similar initiative aimed at the EU Member States.
The UK Data Service is the largest digital repository for quantitative and qualitative social science and humanities research data in the United Kingdom. The organisation is funded by the UK government through the Economic and Social Research Council and is led by the UK Data Archive at the University of Essex, in partnership with other universities.
CORE is a service provided by the Knowledge Media Institute based at The Open University, United Kingdom. The goal of the project is to aggregate all open access content distributed across different systems, such as repositories and open access journals, enrich this content using text mining and data mining, and provide free access to it through a set of services. The CORE project also aims to promote open access to scholarly outputs. CORE works closely with digital libraries and institutional repositories.
The Initiative for Open Citations (I4OC) is a project launched publicly in April 2017, that describes itself as: "a collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data and to make these data available." It is intended to facilitate improved citation analysis.
The cTuning Foundation is a global non-profit organization developing a common methodology and open-source tools to support sustainable, collaborative and reproducible research in computer science, and to organize and automate artifact evaluation and reproducibility initiatives at machine learning and systems conferences and journals.
Elizabeth 'Liddy' Nevile is an Australian academic and a pioneer in using computers and the World Wide Web for education in Australia. In 1989-1990 she was instrumental in establishing the first program in the world that required all students to have laptop computers, at Methodist Ladies' College, Melbourne, Australia.
The CARE Principles for Indigenous Data Governance are a set of principles intended to guide open data projects in engaging Indigenous Peoples rights and interests. CARE was created in 2019 by the International Indigenous Data Sovereignty Interest Group, a group that is a part of the Research Data Alliance. It outlines collective rights related to open data in the context of the United Nations Declaration on the Rights of Indigenous Peoples and Indigenous data sovereignty.
Susanna-Assunta Sansone is a British-Italian data scientist who is professor of data readiness at the University of Oxford where she leads the data readiness group and serves as associate director of the Oxford e-Research Centre. Her research investigates techniques for improving the interoperability, reproducibility and integrity of data.
The Microdata Information System (MISSY) is a database-driven online system, that provides structured metadata about selected research data of official statistics free of charge as part of the service infrastructure of the German Microdata Lab (GML) at GESIS – Leibniz Institute for the Social Sciences. MISSY is targeted at empirically-working scientists who use official microdata for their research.
The German Human Genome-Phenome Archive (GHGA) is a consortium within the national data infrastructure (NFDI). GHGA aims to create a secure national data infrastructure for human omics data in order to make these data available for scientific research while preventing the misuse of data.