In computing, a Research Object is a method for the identification, aggregation and exchange of scholarly information on the Web. The primary goal of the research object approach is to provide a mechanism to associate related resources about a scientific investigation so that they can be shared using a single identifier. As such, research objects are an advanced form of enhanced publication. [1]
Current implementations build upon existing Web technologies and methods including Linked Data, HTTP, Uniform Resource Identifiers (URIs), the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) and the Open Annotation model, as well as existing approaches for identification and knowledge representation in the scientific domain including Digital Object Identifiers for documents, ORCID identifiers for people, and the Investigation, Study, and Assay (ISA) data model.
The research object approach is primarily motivated by a desire to improve the reproducibility of scientific investigations. Central to the proposal is the need to share research artifacts that are commonly distributed across specialist repositories on the Web, including supporting data, software executables, source code, presentation slides and presentation videos. Research Objects are not one specific technology; instead, they are guided by three principles: identity, aggregation and annotation. [2]
A number of communities are developing the research object concept.
A W3C community group entitled the Research Objects for Scholarly Communication (ROSC) Community Group was started in April 2013. The community charter states that the goals of the ROSC activity are: [3] "to exchange requirements and expectations for supporting a new form of scholarly communication"
The Community Group aims to produce the following types of deliverables:
The FAIR digital object forum is a community that brings together experts from the FAIR data movement, semantic web, and digital publishing of scholarly work. The first conference on FAIR digital objects led the coalition to ratify the Leiden Declaration [4] on FAIR digital objects. The principles contained in the Leiden Declaration provide a prescriptive framework for infrastructure development around digital research objects. This framework draws from the FAIR data principles and ideas around distributed infrastructure that relies on open protocols to prevent vendor lock-in and ensure access that is "as open as possible, as restricted as necessary". These principles enable discovery and reuse of Research Objects, including computational workflows, by both humans and machines. To promote uptake and share experience in creating FAIR Digital Objects, case studies have been published showing how to create them with the necessary machine-understandable semantic metadata. Specifications such as the ISA metadata framework and RO-Crate support these ontology-based annotations of high-throughput experiments and analysis workflows, respectively. [5]
The Mozilla Science Lab has initiated an activity in collaboration with GitHub and Figshare to develop "Code as research object". The initial proposal of the activity is to allow users to transfer code from a GitHub repository to Figshare and to assign that code a Digital Object Identifier (DOI), providing a permanent record of the code that can be cited in future publications.
RO-Crate (Research Objects Crate) is a community initiative started around 2019 that provides an approach to package and aggregate research artefacts with their metadata and relationships. RO-Crate is based on Schema.org annotations in JSON-LD, aiming to establish best practices to formally describe metadata in an accessible and practical way. It has the intent of applying “just enough” Linked Data standards for making research outputs FAIR while also enhancing research reproducibility. [6] It seeks to bridge the complexity gap in the tooling for metadata specifications by following 4 principles:
While developing the standard, the base level for simplicity was friendliness to software developers. The team assumed a developer familiar with making Web applications with JSON data, which informed core design choices for the JSON-level documentation approach and RO-Crate serialization. [6] Additionally, in RO-Crate, a referenced contextual entity (e.g. a person identified by ORCID) should always be described within the RO-Crate Metadata File with at least a type and name, even where their persistent identifier (PID) might resolve to further Linked Data. This is so that clients are not required to follow every link for presentation purposes, for instance HTML rendering. [6]
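The requirement that referenced contextual entities carry at least a type and a name can be checked mechanically. The following is a minimal Python sketch, not part of any RO-Crate tooling: the function names, the sample graph, and the choice of which properties are assumed to point at contextual entities (`author`, `license`, `contentLocation`) are all illustrative assumptions.

```python
# Properties assumed (for illustration only) to point at contextual entities.
CONTEXTUAL_PROPERTIES = ("author", "license", "contentLocation")


def referenced_ids(value):
    """Return the @id of each reference in a property value (single or list)."""
    items = value if isinstance(value, list) else [value]
    return [item["@id"] for item in items if isinstance(item, dict) and "@id" in item]


def find_underdescribed(graph, properties=CONTEXTUAL_PROPERTIES):
    """Return sorted ids that are referenced but lack an entry with @type and name."""
    entities = {entity["@id"]: entity for entity in graph}
    missing = set()
    for entity in graph:
        for prop in properties:
            for ref in referenced_ids(entity.get(prop, [])):
                target = entities.get(ref, {})
                if "@type" not in target or "name" not in target:
                    missing.add(ref)
    return sorted(missing)


# Hypothetical graph: the license is described, the author is only referenced.
sample_graph = [
    {"@id": "./", "@type": "Dataset", "name": "Example crate",
     "author": {"@id": "https://orcid.org/0000-0002-1825-0097"},
     "license": {"@id": "https://spdx.org/licenses/CC-BY-4.0"}},
    {"@id": "https://spdx.org/licenses/CC-BY-4.0", "@type": "CreativeWork",
     "name": "Creative Commons Attribution 4.0"},
]

print(find_underdescribed(sample_graph))
```

Here the author is reported because its ORCID is referenced without a local description, so a client would have to resolve the link just to display a name.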
An RO-Crate is defined as a self-described "Root Data Entity" that describes and contains data entities, which are further described by referencing contextual entities. A "data entity" is either a file (i.e. a byte sequence stored on disk somewhere) or a directory (i.e. a set of named files and other directories). A file does not need to be stored inside the RO-Crate root; it can instead be referenced via a PID/IRI. A contextual entity exists outside the information system (e.g. a Person, a workflow language) and is represented solely by its metadata. The representation of a data entity as a byte sequence makes it possible to store a variety of research artefacts including not only data but also, for instance, software and text. [6] [7]
The Root Data Entity is a directory, the RO-Crate Root, identified by the presence of the RO-Crate Metadata File, ro-crate-metadata.json. RO-Crates can be stored, transferred or published in multiple ways, including downloadable ZIP archives in Zenodo or through dedicated online repositories, as well as published directly on the Web, e.g. using GitHub Pages. [6]
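Because the RO-Crate Root is recognised by the presence of the metadata file, a tool can locate it with a plain file-system check. The sketch below is one possible approach in Python; the function name `find_crate_root` and the decision to search upward through parent directories are assumptions made for illustration, not behaviour defined by RO-Crate.

```python
from pathlib import Path

METADATA_FILENAME = "ro-crate-metadata.json"


def find_crate_root(start):
    """Return the nearest directory at or above `start` that holds the metadata file.

    Returns None when no such directory is found.
    """
    start = Path(start).resolve()
    # Check the starting directory itself first, then each ancestor in turn.
    candidates = [start] if start.is_dir() else []
    candidates += list(start.parents)
    for directory in candidates:
        if (directory / METADATA_FILENAME).is_file():
            return directory
    return None
```

Calling `find_crate_root("data/raw/results.csv")` would then return the enclosing crate directory, if there is one, which is convenient when a tool is handed a single file from inside an unpacked crate.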
A simple RO-Crate metadata file describing data entities (CSV and JPG files) with contextual entities (authors identified by name or ORCID) can be seen below: [6]
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "./"}
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "A simplified RO-Crate",
      "author": {"@id": "#alice"},
      "license": {"@id": "https://spdx.org/licenses/CC-BY-4.0"},
      "datePublished": "2021-11-02T16:04:43Z",
      "hasPart": [
        {"@id": "survey-responses-2019.csv"},
        {"@id": "https://example.com/pics/5707039334816454031_o.jpg"}
      ]
    },
    {
      "@id": "survey-responses-2019.csv",
      "@type": "File",
      "about": {"@id": "https://example.com/pics/5707039334816454031_o.jpg"},
      "author": {"@id": "#alice"}
    },
    {
      "@id": "https://example.com/pics/5707039334816454031_o.jpg",
      "@type": ["File", "ImageObject"],
      "contentLocation": {"@id": "http://sws.geonames.org/8152662/"},
      "author": {"@id": "https://orcid.org/0000-0002-1825-0097"}
    },
    {
      "@id": "#alice",
      "@type": "Person",
      "name": "Alice"
    },
    {
      "@id": "https://orcid.org/0000-0002-1825-0097",
      "@type": "Person",
      "name": "Josiah Carberry"
    },
    {
      "@id": "http://sws.geonames.org/8152662/",
      "@type": "Place",
      "name": "Catalina Park"
    },
    {
      "@id": "https://spdx.org/licenses/CC-BY-4.0",
      "@type": "CreativeWork",
      "name": "Creative Commons Attribution 4.0"
    }
  ]
}
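Since the metadata file is ordinary JSON, a client can read it without a Linked Data library. The following Python sketch shows one way to do so, using only the standard `json` module and a trimmed copy of the example above; the helper names (`load_entities`, `root_data_entity`, `parts_with_authors`) are hypothetical. It follows the metadata descriptor's `about` link to the Root Data Entity, then resolves the author of each entry in `hasPart`.

```python
import json

# Trimmed copy of the example metadata file; in practice this would be read from disk.
METADATA_JSON = """
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}, "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "A simplified RO-Crate",
     "hasPart": [{"@id": "survey-responses-2019.csv"},
                 {"@id": "https://example.com/pics/5707039334816454031_o.jpg"}]},
    {"@id": "survey-responses-2019.csv", "@type": "File", "author": {"@id": "#alice"}},
    {"@id": "https://example.com/pics/5707039334816454031_o.jpg",
     "@type": ["File", "ImageObject"],
     "author": {"@id": "https://orcid.org/0000-0002-1825-0097"}},
    {"@id": "#alice", "@type": "Person", "name": "Alice"},
    {"@id": "https://orcid.org/0000-0002-1825-0097", "@type": "Person",
     "name": "Josiah Carberry"}
  ]
}
"""


def load_entities(text):
    """Parse the metadata and index every entity in @graph by its @id."""
    return {entity["@id"]: entity for entity in json.loads(text)["@graph"]}


def root_data_entity(entities):
    """Follow the metadata descriptor's `about` link to the Root Data Entity."""
    descriptor = entities["ro-crate-metadata.json"]
    return entities[descriptor["about"]["@id"]]


def parts_with_authors(entities):
    """Return (data entity id, author name) pairs for every entry in hasPart."""
    pairs = []
    for ref in root_data_entity(entities).get("hasPart", []):
        part = entities[ref["@id"]]
        author = entities[part["author"]["@id"]]
        pairs.append((part["@id"], author["name"]))
    return pairs


entities = load_entities(METADATA_JSON)
for part_id, author_name in parts_with_authors(entities):
    print(f"{part_id}: {author_name}")
```

Because both authors are described inside the file with a type and a name, the lookup succeeds without fetching the ORCID record, which is the design choice the flat `@graph` layout is meant to support.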