CellML

Last updated December 10, 2024

CellML is an XML-based markup language for describing mathematical models. Although it could in principle describe any mathematical model, it was originally created with the Physiome Project in mind, and hence has been used primarily to describe models relevant to biology. This is reflected in the name CellML, although it is simply a name, not an abbreviation.^[1] CellML is growing in popularity as a portable description format for computational models, and groups throughout the world use it for modelling or develop software tools based on it. CellML is similar to the Systems Biology Markup Language (SBML) but provides greater scope for model modularity and reuse, and is not specific to descriptions of biochemistry.

History
The structure of a CellML model
Specifications
CellML 1.0
CellML 1.1
CellML 2.0
Metadata specifications
CellML.org
References
External links
See also

History

The CellML language grew from a need to share models of cardiac cell dynamics among researchers at a number of sites across the world. The original working group formed in 1998 consisted of David Bullivant, Warren Hedley, and Poul Nielsen; all three were at that time members of the Department of Engineering Science at the University of Auckland. The language was an application of the XML specification developed by the World Wide Web Consortium – the decision to use XML was based on late 1998 recommendations from Warren Hedley and André (David) Nickerson. Existing XML-based languages were leveraged to describe the mathematics (content MathML), metadata (RDF), and links between resources (XLink). The CellML working group first became aware of the SBML effort in late 2000, when Warren Hedley attended the 2nd workshop on Software Platforms for Systems Biology in Tokyo.

The working group collaborated with a number of researchers at Physiome Sciences Inc. (particularly Melanie Nelson, Scott Lett, Mark Grehlinger, Prasad Ramakrishna, Jeremy Rice, Adam Muzikant, and Kam-Chuen Jim) to draft the initial CellML 1.0 specification, which was published on 11 August 2001. This first draft was followed by specifications for CellML Metadata and an update to CellML that accommodated structured nesting of models through the addition of the <import> element. Physiome Sciences Inc. also produced the first CellML-capable software. The National Resource for Cell Analysis and Modeling (NRCAM) at the University of Connecticut Health Center produced another early CellML-capable program, Virtual Cell.

In 2002 the CellML 1.1 specification was written, in which imports were added. Imports provide the ability to incorporate external components into a model, enabling modular modelling. This specification was frozen in early 2006. Work has continued on metadata and other specifications.

In July 2009 the CellML website was completely revamped, and an initial version of the new CellML repository software (PMR2) was released.

In July 2020 the CellML 2.0^[2] specification was published, in which resets were added. A reset sets a variable to a specified value when a prescribed condition is met. The directionality of connections was dropped to make components easier to reuse, and the definition of units was restricted to the model level.

In support of the CellML 2.0 specification, the libCellML project was started in 2015 to provide application developers with a C++ library offering a model-based API. The library was soon extended to support JavaScript, Julia, and Python. The latest release of the software is version 0.5.0, which supports the majority of the CellML 2.0 specification and can also read CellML 1.1 and CellML 1.0 models. Notably, libCellML does not support writing to the older CellML standards.

The structure of a CellML model

A CellML model consists of a number of components, each described in its own component element. A component can be an entirely conceptual entity created for modelling convenience, or it can have some real physical interpretation (for example, it could represent the cell membrane).

Each component contains a number of variables, which must be declared by placing a variable element inside the component. For example, a component representing a cell membrane may have a variable called V representing the potential difference (voltage) across the cell membrane.
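As a rough illustration of the two paragraphs above, the following Python sketch builds a one-component model with the standard library's xml.etree.ElementTree. It assumes the CellML 2.0 namespace and attribute names; the model name, the component name, and the choice of units are hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/2.0#"  # assumed: CellML 2.0 namespace
ET.register_namespace("", CELLML_NS)


def q(tag):
    """Return the namespace-qualified form of a CellML tag name."""
    return f"{{{CELLML_NS}}}{tag}"


def build_membrane_model():
    """Build a model holding one 'membrane' component with a variable V."""
    model = ET.Element(q("model"), {"name": "example_model"})  # hypothetical name
    membrane = ET.SubElement(model, q("component"), {"name": "membrane"})
    # V is the potential difference across the membrane, declared in volts.
    ET.SubElement(membrane, q("variable"), {"name": "V", "units": "volt"})
    return model


model = build_membrane_model()
print(ET.tostring(model, encoding="unicode"))
```

The printed document contains a single component element with a single variable element nested inside it, mirroring the declaration rule described above.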

Mathematical relationships between variables are expressed within components, using MathML. MathML is used to make declarative statements (as opposed to the procedural statements of a computer programming language). However, most CellML processing software accepts only a limited range of mathematics (for example, some software requires equations with a single variable on one side of an equality). The choice of MathML makes CellML particularly suited to describing models containing differential equations. There is no mechanism for expressing stochastic models or any other form of randomness.
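To make the declarative style concrete, the sketch below holds a small content MathML fragment stating dV/dt = -I/C and lists the variables it refers to. The equation and the variable names t, I, and C are hypothetical examples, not taken from any particular model; only the MathML namespace and element names are standard.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Content MathML stating dV/dt = -I / C as a relationship, not an assignment.
EQUATION = f"""
<math xmlns="{MATHML_NS}">
  <apply><eq/>
    <apply><diff/>
      <bvar><ci>t</ci></bvar>
      <ci>V</ci>
    </apply>
    <apply><divide/>
      <apply><minus/><ci>I</ci></apply>
      <ci>C</ci>
    </apply>
  </apply>
</math>
"""


def referenced_variables(mathml_text):
    """Return the sorted variable names that appear in <ci> elements."""
    root = ET.fromstring(mathml_text)
    names = {ci.text.strip() for ci in root.iter(f"{{{MATHML_NS}}}ci")}
    return sorted(names)


print(referenced_variables(EQUATION))
```

The fragment only asserts that the two sides are equal; deciding which variable to solve for, and how, is left to the processing software.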

Components can be connected to other components using a connection element, which names the two components to be connected and maps variables in the first component to variables in the second. Such a connection states that a variable in one component is equivalent to a variable in the other.
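The following sketch reads a connection from a small model and reports which variables it declares equivalent. It assumes the CellML 2.0 form, in which the component names are attributes of the connection element itself (CellML 1.0 and 1.1 place them in a nested map_components element instead). The component and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"c": "http://www.cellml.org/cellml/2.0#"}  # assumed: CellML 2.0 namespace

MODEL = """
<model xmlns="http://www.cellml.org/cellml/2.0#" name="example_model">
  <component name="membrane">
    <variable name="V" units="volt" interface="public"/>
  </component>
  <component name="sodium_channel">
    <variable name="V_m" units="volt" interface="public"/>
  </component>
  <connection component_1="membrane" component_2="sodium_channel">
    <map_variables variable_1="V" variable_2="V_m"/>
  </connection>
</model>
"""


def equivalent_variables(model_text):
    """List ((component, variable), (component, variable)) pairs that are mapped."""
    root = ET.fromstring(model_text)
    pairs = []
    for conn in root.findall("c:connection", NS):
        comp_1, comp_2 = conn.get("component_1"), conn.get("component_2")
        for mapping in conn.findall("c:map_variables", NS):
            pairs.append(
                ((comp_1, mapping.get("variable_1")),
                 (comp_2, mapping.get("variable_2")))
            )
    return pairs


print(equivalent_variables(MODEL))
```

Here V in the membrane component and V_m in the sodium_channel component are treated as the same quantity, even though each component uses its own local name.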

CellML models also allow relationships between components to be expressed. The CellML specification defines two types of relationship, encapsulation and containment, although users can define more. The containment relationship expresses that one component is physically within another. The encapsulation relationship is special because it is the only relationship that affects the interpretation of the rest of the model. Its effect is that components encapsulated beneath other components are private and cannot be accessed except by the component directly above them in the encapsulation tree. The modeller is free to use encapsulation as a conceptual tool, and it does not necessarily have any physical interpretation.
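The access rule can be sketched as a check over a parent table. This is an illustrative simplification, not a validator: the tree below is hypothetical, and the rule coded here also lets sibling components reach each other, which is an assumption based on a reading of the specification rather than something stated in the paragraph above.

```python
# Hypothetical encapsulation tree: each child component maps to its parent.
PARENT = {
    "sodium_channel": "membrane",
    "potassium_channel": "membrane",
    "m_gate": "sodium_channel",
}


def can_connect(a, b):
    """Return True if components a and b may be connected directly.

    Assumed rule: a component sees its parent, its children, and its
    siblings (components sharing a parent, or both at the top level).
    """
    if a == b:
        return False
    parent_a, parent_b = PARENT.get(a), PARENT.get(b)
    return parent_a == b or parent_b == a or parent_a == parent_b


print(can_connect("membrane", "sodium_channel"))           # parent and child
print(can_connect("sodium_channel", "potassium_channel"))  # siblings
print(can_connect("membrane", "m_gate"))                   # grandparent and grandchild
```

In this sketch m_gate is hidden from membrane: only sodium_channel, which sits directly above it, can pass values in or out.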

Specifications

CellML is defined by core specifications as well as additional specifications for metadata, used to annotate models and specify simulations.

CellML 1.0

CellML 1.0 was the first final specification, and is used to describe many of the models in the CellML Model Repository.

CellML 1.0 has some biochemistry-specific elements for describing the role of variables in a reaction model.

CellML 1.1

CellML 1.1 introduced the ability to import components and units. In order to fully support this feature, variables in CellML 1.1 accept variable names as initial values.
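As an illustration, the sketch below parses an import element and lists what it brings into the model. It assumes the CellML 1.1 namespace and the XLink href attribute for the source location; the file name and the component and units names are hypothetical.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/1.1#"  # assumed: CellML 1.1 namespace
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical model importing one component and one units definition.
MODEL = f"""
<model xmlns="{CELLML_NS}" xmlns:xlink="{XLINK_NS}" name="importer">
  <import xlink:href="sodium_channel.cellml">
    <component name="na_channel" component_ref="sodium_channel"/>
    <units name="mV" units_ref="millivolt"/>
  </import>
</model>
"""


def list_imports(model_text):
    """Return (source file, kind, local name, remote name) for each import."""
    root = ET.fromstring(model_text)
    found = []
    for imp in root.findall(f"{{{CELLML_NS}}}import"):
        href = imp.get(f"{{{XLINK_NS}}}href")
        for child in imp:
            kind = child.tag.split("}")[1]  # drop the namespace part of the tag
            found.append((href, kind, child.get("name"), child.get(f"{kind}_ref")))
    return found


for entry in list_imports(MODEL):
    print(entry)
```

Each imported item gets a local name in the importing model, so the same external component can be reused under different names.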

CellML 2.0

CellML 2.0 made several significant changes:

Introduced the ability to reset state variables.
Dropped the directionality in connections.
Units can now only be defined at the model level.
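The behaviour of a reset, the first change listed above, can be pictured with a small procedural sketch. CellML itself is declarative and tools are free to implement resets differently; the function, its integer time steps, and the stand-in update rule below are all hypothetical and serve only to show a variable being set when a condition on a test variable is met.

```python
def simulate(steps, test_value, reset_value):
    """Grow x by one unit per step, resetting it when the condition is met.

    t plays the role of the test variable: when it equals test_value,
    x is set to reset_value instead of carrying on from its old value.
    """
    x = 0
    trace = []
    for t in range(steps):
        if t == test_value:
            x = reset_value  # the reset fires because the condition holds
        trace.append(x)
        x += 1  # stand-in for the model's ordinary equations
    return trace


print(simulate(steps=6, test_value=3, reset_value=0))
```

With these arguments x climbs for three steps, is reset to zero when t reaches 3, and then climbs again.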

Metadata specifications

CellML has several metadata specifications, used to annotate models or provide information for running and/or visualizing simulations of models.

The metadata 1.0 specification is used to annotate models with a variety of information: relevant references, authorship information, the species the model is relevant to, and so on.
Simulation metadata provides the information required to reproduce specific simulations using a CellML model.
Graphing metadata provides information to specify particular visualizations of simulation output, for example to reproduce a particular graph from a paper.

CellML.org

CellML.org aims to provide a focal point for the CellML community. Members can submit, review, and update models and receive feedback and help from the community. The cellml-discussion mailing list covers everything related to the development and use of CellML.

A repository of several hundred biological models encoded in CellML, the CellML Model Repository, can be found on the CellML community website. These models are being actively curated, with the aim of annotating them with biological ontologies such as the Gene Ontology and validating them against standards of unit balance and biophysical constraints such as conservation of mass, charge, and energy.

References

[1] "[cellml-discussion] Expanding the CellML abbreviation".
[2] Clerx, M.; Cooling, M. T.; Cooper, J.; Garny, A.; Moyle, K.; Nickerson, D. P.; Nielsen, P. M. F.; Sorby, H. (2020). "[cellml-2.0] CellML 2.0 Specification". Journal of Integrative Bioinformatics. 17 (2–3). doi:10.1515/jib-2020-0021. PMC 7756617. PMID 32759406.

External links

CellML homepage
IUPS Physiome Project
Physiome JAPAN Project
Interactive cell models: Java versions of many of the CellML cardiac models.
