MIRIAM (Minimum Information Required In The Annotation of Models) [1] is a community-level effort to standardize the annotation and curation processes of quantitative models of biological systems. [2] It consists of a set of guidelines suitable for use with any structured format, allowing different groups to collaborate and share resulting models. Adherence to these guidelines also facilitates the sharing of software and service infrastructures built upon modeling activities.
The idea of "a set of good practices" including "some obligatory metadata" was first proposed by Nicolas Le Novère in October 2004 as part of a discussion to develop a common database of models in systems biology (which led to the creation of BioModels Database). These initial ideas were further refined at a meeting in Heidelberg, during ICSB 2004, with representatives from many other interested groups.
MIRIAM is a registered project of the MIBBI (minimum information for biological and biomedical investigations). [3]
The MIRIAM Guidelines are composed of three parts: reference correspondence, attribution annotation, and external resource annotation. Each part deals with a different aspect of the information that should be included within a model.
'Reference correspondence' deals with the basic reference information needed to make use of the model. It covers, at a coarse level, the format of the model file and whether the model can be instantiated for simulation purposes.
'Attribution annotation' deals with the attribution information that must be embedded within the model file.
'External resource annotation' defines the manner in which annotations should be constructed. Those annotations contain references to entities in databases, classifications, ontologies, etc. One of the purposes of annotation is to allow unambiguous identification of the various model components.
Each annotation is expressed as a triplet:
{data collection, collection-specific identifier, optional qualifier}
More information about the existing qualifiers is available from BioModels.net. [4]
Annotation is so far mainly manual work, so perennial URIs are needed to ensure that annotations remain valid over time. It was recognised that generating valid and unique URIs for annotation required a catalogue of shared namespaces for use by the community. This function is provided by the MIRIAM Registry, which also offers a variety of supporting features that enable automated procedures based upon these URIs. Resolvable identifiers are generated through the resolving layer, Identifiers.org.
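As a rough illustration of how an annotation triplet might be turned into a URI, the following Python sketch models the triplet as a small data class. The class and method names are hypothetical. The `urn:miriam:` and `https://identifiers.org/` layouts, the `uniprot` namespace, and the `bqbiol:is` qualifier are assumptions chosen for the example and are not taken from the guidelines text above.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass(frozen=True)
class Annotation:
    """Hypothetical container for one annotation triplet."""
    collection: str                   # data collection namespace (assumed: "uniprot")
    identifier: str                   # collection-specific identifier
    qualifier: Optional[str] = None   # optional qualifier (assumed: "bqbiol:is")

    def urn(self) -> str:
        # Assumed URN layout: urn:miriam:<collection>:<percent-encoded identifier>
        encoded = quote(self.identifier, safe="")
        return f"urn:miriam:{self.collection}:{encoded}"

    def url(self) -> str:
        # Assumed resolvable layout: https://identifiers.org/<collection>:<identifier>
        return f"https://identifiers.org/{self.collection}:{self.identifier}"


if __name__ == "__main__":
    annotation = Annotation("uniprot", "P12345", qualifier="bqbiol:is")
    print(annotation.urn())
    print(annotation.url())
```

Keeping the qualifier separate from the URI reflects the triplet structure: the URI identifies the external entity, while the qualifier describes how the model component relates to it.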
The Dublin Core vocabulary, also known as the Dublin Core Metadata Terms (DCMT), is a general purpose metadata vocabulary for describing resources of any type. It was first developed for describing web content in the early days of the World Wide Web. The Dublin Core Metadata Initiative (DCMI) is responsible for maintaining the Dublin Core vocabulary.
Bitzi was a website, operating from 2001 to 2013, where volunteers shared reports about any kind of digital file, with identifying metadata, commentary, and other ratings.
A digital object identifier (DOI) is a persistent identifier or handle used to uniquely identify various objects, standardized by the International Organization for Standardization (ISO). DOIs are an implementation of the Handle System; they also fit within the URI system. They are widely used to identify academic, professional, and government information, such as journal articles, research reports, data sets, and official publications.
CellML is an XML-based markup language for describing mathematical models. Although it could theoretically describe any mathematical model, it was originally created with the Physiome Project in mind, and hence is used primarily to describe models relevant to the field of biology. This is reflected in its name, CellML, although this is simply a name, not an abbreviation. CellML is growing in popularity as a portable description format for computational models, and groups throughout the world are using CellML for modelling or developing software tools based on CellML. CellML is similar to the Systems Biology Markup Language (SBML) but provides greater scope for model modularity and reuse, and is not specific to descriptions of biochemistry.
The Systems Biology Markup Language (SBML) is a representation format, based on XML, for communicating and storing computational models of biological processes. It is a free and open standard with widespread software support and a community of users and developers. SBML can represent many different classes of biological phenomena, including metabolic networks, cell signaling pathways, regulatory networks, infectious diseases, and many others. It has been proposed as a standard for representing computational models in systems biology today.
PRONOM is a web-based technical registry to support digital preservation services, developed by The National Archives of the United Kingdom. PRONOM was the first and remains, to date, the only operational public file format registry in the world, although the "Magic File" repository of the File Command has served this role in a less formal capacity for two decades. Other projects to develop technical registries, including the UK Digital Curation Centre's Representation Information Registry, and the Global Digital Format Registry project at Harvard University, are now in progress.
BioModels, created in 2006, is a free and open-source repository for storing, exchanging and retrieving quantitative models of biological interest. All the models in the curated section of BioModels Database have been described in peer-reviewed scientific literature.
The Systems Biology Ontology (SBO) is a set of controlled, relational vocabularies of terms commonly used in systems biology, and in particular in computational modeling.
Resource Description and Access (RDA) is a standard for descriptive cataloging initially released in June 2010, providing instructions and guidelines on formulating bibliographic data. Intended for use by libraries and other cultural organizations such as museums and archives, RDA is the successor to Anglo-American Cataloguing Rules, Second Edition (AACR2).
The Handle System is the Corporation for National Research Initiatives's proprietary registry assigning persistent identifiers, or handles, to information resources, and for resolving "those handles into the information necessary to locate, access, and otherwise make use of the resources".
The minimum information about a simulation experiment (MIASE) is a list of the common set of information a modeller needs to enable the execution and reproduction of a numerical simulation experiment, derived from a given set of quantitative models.
An Archival Resource Key (ARK) is a multi-purpose URL suited to being a persistent identifier for information objects of any type. It is widely used by libraries, data centers, archives, museums, publishers, and government agencies to provide reliable references to scholarly, scientific, and cultural objects. In 2019 it was registered as a Uniform Resource Identifier (URI) scheme.
SABIO-RK is a web-accessible database storing information about biochemical reactions and their kinetic properties.
The MIRIAM Registry, a by-product of the MIRIAM Guidelines, is a database of namespaces and associated information that is used in the creation of uniform resource identifiers. It contains the set of community-approved namespaces for databases and resources serving, primarily, the biological sciences domain. These shared namespaces, when combined with 'data collection' identifiers, can be used to create globally unique identifiers for knowledge held in data repositories. For more information on the use of URIs to annotate models, see the specification of SBML Level 2 Version 2.
The European Legislation Identifier (ELI) ontology is a vocabulary for representing metadata about national and European Union (EU) legislation. It is designed to provide a standardized way to identify and describe the context and content of national or EU legislation, including its purpose, scope, relationships with other legislations and legal basis. This will guarantee easier identification, access, exchange and reuse of legislation for public authorities, professional users, academics and citizens. ELI paves the way for knowledge graphs, based on semantic web standards, of legal gazettes and official journals.
Identifiers.org is a project providing stable and perennial identifiers for data records used in the life sciences. The identifiers are provided in the form of Uniform Resource Identifiers (URIs). Identifiers.org is also a resolving system that relies on collections listed in the MIRIAM Registry to provide direct access to different instances of the identified records.
The Simulation Experiment Description Markup Language (SED-ML) is a representation format, based on XML, for the encoding and exchange of simulation descriptions on computational models of biological systems. It is a free and open community development project.
Nicolas Le Novère is a British and French biologist. His research focuses on modeling signaling pathways and developing tools to share mathematical models.
Minimum information standards are sets of guidelines and formats for reporting data derived by specific high-throughput methods. Their purpose is to ensure the data generated by these methods can be easily verified, analysed and interpreted by the wider scientific community. Ultimately, they facilitate the transfer of data from journal articles into databases in a form that enables data to be mined across multiple data sets. Minimal information standards are available for a vast variety of experiment types including microarray (MIAME), RNAseq (MINSEQE), metabolomics (MSI) and proteomics (MIAPE).