Minimum information about a simulation experiment

The minimum information about a simulation experiment (MIASE) [1] lists the common set of information a modeller must provide to enable the execution and reproduction of a numerical simulation experiment derived from a given set of quantitative models.

MIASE is a registered project of MIBBI (Minimum Information for Biological and Biomedical Investigations). [2]

History

The MIASE project was launched in 2007 by Dagmar Köhn and Nicolas Le Novère and first presented at the 12th SBML Forum Meeting in October 2007. Since then, MIASE has been discussed at various meetings, not only within the SBML community. It has become a community effort involving people from various standardisation communities as well as developers of simulation tools. In April 2009, MIASE was part of the "CellML, SBGN, SBO, BioPAX, and MIASE Super-Workshop 2009".

The guidelines

The MIASE guidelines are composed of three parts: information about the models to use, information about the simulation steps, and information about the output.

Information about the models to use

All models used in the experiment must be identified, accessible, and fully described.

  1. The description of the simulation experiment must be provided together with the models necessary for the experiment, or with a precise and unambiguous way of accessing those models.
  2. The models required for the simulations must be provided with all governing equations, parameter values, and necessary conditions (initial state and/or boundary conditions).
  3. If a model is not encoded in a standard format, then the model code must be made available to the user. If a model is not encoded in an open format or code, its full description must be provided, sufficient to re-implement it.
  4. Any modification of a model (pre-processing) required before the execution of a step of the simulation experiment must be described.
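
As a rough illustration only, the rules above could be captured in a small record per model, with a check that flags missing information. This is a sketch under stated assumptions: `ModelReference`, its fields, and `missing_model_information` are hypothetical names invented for this example and are not part of MIASE or of any tool. Rule 3 is simplified to "standard format, or code, or a full description", and rule 2 (equations, parameters, conditions) cannot be verified from metadata alone, so it is not checked.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelReference:
    """Hypothetical record describing one model used by an experiment."""
    model_id: str
    source: str = ""                 # unambiguous way of accessing the model (rule 1)
    standard_format: bool = False    # encoded in a standard format? (rule 3)
    code_available: bool = False     # model code provided to the user? (rule 3)
    full_description: str = ""       # description sufficient to re-implement (rule 3)
    changes: List[str] = field(default_factory=list)  # pre-processing steps (rule 4)


def missing_model_information(model: ModelReference) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if not model.source:
        problems.append("no unambiguous way of accessing the model")
    if not (model.standard_format or model.code_available or model.full_description):
        problems.append(
            "not in a standard format, and neither code nor a full description is given"
        )
    return problems
```

A record with only an identifier is flagged on both counts, while one that names a source and a standard format passes this simplified check.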

Information about the simulation steps

A precise description of the simulation steps and other procedures used by the experiment must be provided.

  1. All simulation steps must be clearly described, including the simulation algorithms to be used, the models on which to apply each simulation, the order of the simulation steps, and the data processing to be done between the simulation steps.
  2. All information needed for the correct implementation of the necessary simulation steps must be included through precise descriptions or references to unambiguous information sources.
  3. If a simulation step is performed using a computer program for which source code is not available, all information needed to reproduce the simulation, and not just repeat it, must be provided, including the algorithms used by the original software and any information necessary to implement them, such as the discretization and integration methods.
  4. If it is known that a simulation step will produce different results when performed in a different simulation environment or on a different computational platform, an explanation must be given of how the model has to be run with the specified environment/platform in order to achieve the purpose of the experiment.
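
One way to picture these requirements is as an ordered list of step records that can be checked mechanically. The sketch below is an assumption-laden illustration: `SimulationStep`, its fields, and `check_steps` are hypothetical names, not defined by MIASE. The order of the list stands in for the order of the steps, and only rules 1 and 4 are partly covered; the sufficiency of an algorithm description (rules 2 and 3) still needs human judgement.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SimulationStep:
    """Hypothetical description of one simulation step."""
    name: str
    algorithm: str                    # named simulation algorithm (rule 1)
    model_id: str                     # model the step is applied to (rule 1)
    platform_dependent: bool = False  # results known to vary by environment? (rule 4)
    platform_note: str = ""           # how to run it to achieve the purpose (rule 4)


def check_steps(steps: List[SimulationStep], known_models: Set[str]) -> List[str]:
    """Flag steps that lack an algorithm, a known model, or a platform explanation."""
    problems = []
    for position, step in enumerate(steps, start=1):
        label = f"step {position} ({step.name})"
        if not step.algorithm:
            problems.append(f"{label}: no simulation algorithm named")
        if step.model_id not in known_models:
            problems.append(f"{label}: unknown model '{step.model_id}'")
        if step.platform_dependent and not step.platform_note:
            problems.append(f"{label}: platform dependence not explained")
    return problems
```

For example, a step that names no algorithm, or that is marked platform-dependent without an explanatory note, would be reported together with its position in the sequence.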

Information about the output

All information necessary to obtain the desired numerical results must be provided.

  1. All post-processing steps applied to the raw numerical results of simulation steps in order to generate the final results must be described in detail. This includes the identification of the data to process, the order in which changes were applied, and the nature of the changes.
  2. If the expected insights depend on the relation between different results, such as a plot of one against another, the results to be compared have to be specified.
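
The two output rules can be illustrated with a toy example: post-processing steps are recorded as an ordered list, each naming the data it changes and what the change is, and the final result states which series are compared. Everything here is hypothetical (the names `apply_post_processing` and `curve`, the series names, and the numbers); it is a minimal sketch, not a format prescribed by MIASE.

```python
from typing import Callable, Dict, List, Tuple

Series = List[float]
# (description of the change, name of the series it applies to, the change itself)
PostStep = Tuple[str, str, Callable[[Series], Series]]


def apply_post_processing(raw: Dict[str, Series], steps: List[PostStep]) -> Dict[str, Series]:
    """Apply the described changes in the stated order, leaving the raw data untouched."""
    results = {name: list(values) for name, values in raw.items()}
    for _description, target, transform in steps:
        results[target] = transform(results[target])
    return results


def curve(results: Dict[str, Series], x_name: str, y_name: str) -> List[Tuple[float, float]]:
    """Pair two named results, e.g. for a plot of one against the other."""
    return list(zip(results[x_name], results[y_name]))


# Made-up raw results of a simulation step.
raw = {"time": [0.0, 1.0, 2.0], "A": [2.0, 4.0, 8.0]}
steps: List[PostStep] = [
    ("scale A by 0.5", "A", lambda s: [0.5 * v for v in s]),
    ("subtract the first value of A", "A", lambda s: [v - s[0] for v in s]),
]
processed = apply_post_processing(raw, steps)
points = curve(processed, "time", "A")  # A plotted against time
```

Because the order of the steps is part of the description, swapping the two entries in `steps` would generally give a different final series, which is why the guidelines ask for the order to be stated.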

References

  1. D. Waltemath; Richard Adams; Daniel A. Beard; Frank T. Bergmann; Upinder S. Bhalla; Randall Britten; Vijayalakshmi Chelliah; Michael T. Cooling; Jonathan Cooper; Edmund J. Crampin; Alan Garny; Stefan Hoops; Michael Hucka; Peter Hunter; Edda Klipp; Camille Laibe; Andrew K. Miller; Ion Moraru; David Nickerson; Poul Nielsen; Macha Nikolski; Sven Sahle; Herbert M. Sauro; Henning Schmidt; Jacky L. Snoep; Dominic Tolle; Olaf Wolkenhauer; Nicolas Le Novère (2011). "Minimum Information About a Simulation Experiment (MIASE)". PLOS Computational Biology. 7 (4): e1001122. Bibcode:2011PLSCB...7E1122W. doi:10.1371/journal.pcbi.1001122. PMC 3084216. PMID 21552546.
  2. Minimum Information for Biological and Biomedical Investigations (MIBBI). http://www.mibbi.org/