Minimum information required in the annotation of models


MIRIAM (Minimum Information Required In The Annotation of Models [1]) is a community-level effort to standardize the annotation and curation of quantitative models of biological systems. [2] It consists of a set of guidelines that can be applied to any structured model format, allowing different groups to collaborate on models and share the results. Adherence to these guidelines also makes it easier to share the software and service infrastructure built around modeling activities.

The idea of "a set of good practices" including "some obligatory metadata" was first proposed by Nicolas Le Novère in October 2004 as part of a discussion to develop a common database of models in systems biology (which led to the creation of BioModels Database). These initial ideas were further refined at a meeting in Heidelberg, during ICSB 2004, with representatives from many other interested groups.

MIRIAM is a registered project of MIBBI (Minimum Information for Biological and Biomedical Investigations). [3]

MIRIAM Guidelines

The MIRIAM Guidelines are composed of three parts: reference correspondence, attribution annotation, and external resource annotation. Each part deals with a different aspect of the information that should be included within a model.

Reference correspondence

'Reference correspondence' deals with the basic reference information needed to make use of the model. It describes, at a coarse level, the format of the model file and whether the model can be instantiated for simulation.

Attribution annotation

'Attribution annotation' deals with the attribution information that must be embedded within the model file.

External resource annotation

'External resource annotation' defines the manner in which annotations should be constructed. Those annotations contain references to entities in databases, classifications, ontologies, etc. One of the purposes of annotation is to allow unambiguous identification of the various model components.

More information about the existing qualifiers is available from BioModels.net. [4]
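As a rough illustration of how such an annotation can be represented, the sketch below models each annotation as a triple of model component, qualifier, and resource URI, and then lists the components that carry no external reference. The class and function names, the component identifiers, and the UniProt accession are hypothetical placeholders; the qualifier names follow the BioModels.net style but are used here only as examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    component: str  # identifier of the model element being annotated
    qualifier: str  # relationship between the element and the resource
    resource: str   # URI of the external database or ontology entry


def unannotated(components, annotations):
    """Return the components that carry no external reference at all."""
    annotated = {a.component for a in annotations}
    return [c for c in components if c not in annotated]


# Illustrative data only: the component names and the UniProt accession
# are placeholders, not taken from a real model.
annotations = [
    Annotation("model", "bqmodel:isDescribedBy",
               "https://identifiers.org/pubmed:16333295"),
    Annotation("species_1", "bqbiol:is",
               "https://identifiers.org/uniprot:P12345"),
]

print(unannotated(["model", "species_1", "species_2"], annotations))
```

A check of this kind shows which components cannot yet be identified unambiguously, which is the purpose of annotation stated above.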

Annotation is still mainly a manual task, so annotations must rely on perennial URIs if they are to remain valid over time. Generating valid and unique URIs for annotation was recognised to require a catalogue of shared namespaces for use by the community. This function is provided by the MIRIAM Registry. The Registry also provides a variety of auxiliary features that support automated procedures based on these URIs. Resolvable identifiers are generated through the resolving layer, Identifiers.org.
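The combination of a shared namespace with a record identifier can be sketched as follows. This is a minimal example under stated assumptions: the `NAMESPACES` table is a hypothetical, hand-written stand-in for the Registry catalogue, its identifier patterns are simplified guesses rather than the Registry's actual rules, and the `https://identifiers.org/<prefix>:<accession>` form is assumed as the resolvable URI layout.

```python
import re

# Hypothetical excerpt of a namespace catalogue: prefix -> identifier pattern.
# The patterns are simplified assumptions, not copied from the Registry.
NAMESPACES = {
    "pubmed": re.compile(r"^\d+$"),
    "uniprot": re.compile(r"^[A-Z0-9]{6,10}$"),
}

RESOLVER = "https://identifiers.org"  # assumed base of the resolving layer


def make_uri(prefix: str, accession: str) -> str:
    """Build a resolvable URI after checking the namespace and identifier."""
    pattern = NAMESPACES.get(prefix)
    if pattern is None:
        raise KeyError(f"unknown namespace: {prefix}")
    if not pattern.match(accession):
        raise ValueError(f"{accession!r} is not a valid {prefix} identifier")
    return f"{RESOLVER}/{prefix}:{accession}"


# The PubMed identifier of the MIRIAM article cited in this page.
print(make_uri("pubmed", "16333295"))
```

Validating the identifier against the namespace before building the URI mirrors the role of a shared catalogue: two groups annotating the same record independently arrive at the same string.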

See also

Related Research Articles

Dublin Core: Standardized set of metadata elements

The Dublin Core, also known as the Dublin Core Metadata Element Set (DCMES), is a set of fifteen main metadata items for describing digital or physical resources. The Dublin Core Metadata Initiative (DCMI) is responsible for formulating the Dublin Core; DCMI is a project of the Association for Information Science and Technology (ASIS&T), a non-profit organization.

Bitzi was a website, operating from 2001 to 2013, where volunteers shared reports about any kind of digital file, with identifying metadata, commentary, and other ratings.

Digital object identifier: ISO standard unique string identifier for a digital object

A digital object identifier (DOI) is a persistent identifier or handle used to uniquely identify various objects, standardized by the International Organization for Standardization (ISO). DOIs are an implementation of the Handle System; they also fit within the URI system. They are widely used to identify academic, professional, and government information, such as journal articles, research reports, data sets, and official publications. DOIs have also been used to identify other types of information resources, like commercial videos.

CellML

CellML is an XML based markup language for describing mathematical models. Although it could theoretically describe any mathematical model, it was originally created with the Physiome Project in mind, and hence used primarily to describe models relevant to the field of biology. This is reflected in its name CellML, although this is simply a name, not an abbreviation. CellML is growing in popularity as a portable description format for computational models, and groups throughout the world are using CellML for modelling or developing software tools based on CellML. CellML is similar to Systems Biology Markup Language SBML but provides greater scope for model modularity and reuse, and is not specific to descriptions of biochemistry.

The Systems Biology Markup Language (SBML) is a representation format, based on XML, for communicating and storing computational models of biological processes. It is a free and open standard with widespread software support and a community of users and developers. SBML can represent many different classes of biological phenomena, including metabolic networks, cell signaling pathways, regulatory networks, infectious diseases, and many others. It has been proposed as a standard for representing computational models in systems biology today.

PRONOM is a web-based technical registry to support digital preservation services, developed by The National Archives of the United Kingdom. PRONOM was the first and remains, to date, the only operational public file format registry in the world, although the "Magic File" repository of the File Command has served this role in a less formal capacity for two decades. Other projects to develop technical registries, including the UK Digital Curation Centre's Representation Information Registry, and the Global Digital Format Registry project at Harvard University, are now in progress.

Systems Biology Ontology

The Systems Biology Ontology (SBO) is a set of controlled, relational vocabularies of terms commonly used in systems biology, and in particular in computational modeling.

A metadata standard is a requirement which is intended to establish a common understanding of the meaning or semantics of the data, to ensure correct and proper use and interpretation of the data by its owners and users. To achieve this common understanding, a number of characteristics, or attributes of the data have to be defined, also known as metadata.

The Handle System is the Corporation for National Research Initiatives's proprietary registry assigning persistent identifiers, or handles, to information resources, and for resolving "those handles into the information necessary to locate, access, and otherwise make use of the resources".

Minimum information about a simulation experiment

The minimum information about a simulation experiment (MIASE) is a list of the common set of information a modeller needs to enable the execution and reproduction of a numerical simulation experiment, derived from a given set of quantitative models.

Archival Resource Key: Form of URLs used as persistent identifiers

An Archival Resource Key (ARK) is a multi-purpose URL suited to being a persistent identifier for information objects of any type. It is widely used by libraries, data centers, archives, museums, publishers, and government agencies to provide reliable references to scholarly, scientific, and cultural objects. In 2019 it was registered as a Uniform Resource Identifier (URI).

SABIO-Reaction Kinetics Database

SABIO-RK is a web-accessible database storing information about biochemical reactions and their kinetic properties.

MIRIAM Registry

The MIRIAM Registry, a by-product of the MIRIAM Guidelines, is a database of namespaces and associated information that is used in the creation of uniform resource identifiers. It contains the set of community-approved namespaces for databases and resources serving, primarily, the biological sciences domain. These shared namespaces, when combined with 'data collection' identifiers, can be used to create globally unique identifiers for knowledge held in data repositories. For more information on the use of URIs to annotate models, see the specification of SBML Level 2 Version 2.

The European Legislation Identifier (ELI) ontology is a vocabulary for representing metadata about national and European Union (EU) legislation. It is designed to provide a standardized way to identify and describe the context and content of national or EU legislation, including its purpose, scope, relationships with other legislations and legal basis. This will guarantee easier identification, access, exchange and reuse of legislation for public authorities, professional users, academics and citizens. ELI paves the way for knowledge graphs, based on semantic web standards, of legal gazettes and official journals.

Identifiers.org is a project providing stable and perennial identifiers for data records used in the Life Sciences. The identifiers are provided in the form of Uniform Resource Identifiers (URIs). Identifiers.org is also a resolving system, that relies on collections listed in the MIRIAM Registry to provide direct access to different instances of the identified records.

SED-ML

The Simulation Experiment Description Markup Language (SED-ML) is a representation format, based on XML, for the encoding and exchange of simulation descriptions on computational models of biological systems. It is a free and open community development project.

Nicolas Le Novère is a British and French biologist. His research focuses on modeling signaling pathways and developing tools to share mathematical models.

In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing, and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data Cloud was conceived and is being maintained by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation, but has been a point of focal activity for several W3C community groups, research projects, and infrastructure efforts since then.

Minimum information standards are sets of guidelines and formats for reporting data derived by specific high-throughput methods. Their purpose is to ensure the data generated by these methods can be easily verified, analysed and interpreted by the wider scientific community. Ultimately, they facilitate the transfer of data from journal articles into databases in a form that enables data to be mined across multiple data sets. Minimal information standards are available for a vast variety of experiment types including microarray (MIAME), RNAseq (MINSEQE), metabolomics (MSI) and proteomics (MIAPE).

References

  1. The original article used the verb "requested", but this has evolved through community use into "required", in line with other Minimum Information (MI) standards available through the MIBBI portal.
  2. Le Novère, Nicolas; Finney, Andrew; Hucka, Michael; Bhalla, Upinder S; Campagne, Fabien; Collado-Vides, Julio; Crampin, Edmund J; Halstead, Matt; et al. (2005). "Minimum information requested in the annotation of biochemical models (MIRIAM)". Nature Biotechnology. 23 (12): 1509–15. doi:10.1038/nbt1156. hdl:11858/00-001M-0000-0010-853F-C. PMID 16333295.
  3. http://www.mibbi.org/ Minimum Information for Biological and Biomedical Investigations
  4. "website". Biomodels.net. 1999-02-22. Retrieved 2012-10-09.