CellML

CellML is an XML-based markup language for describing mathematical models. Although it could in principle describe any mathematical model, it was originally created with the Physiome Project in mind, and hence is used primarily to describe models relevant to biology. This is reflected in its name, although CellML is simply a name and not an abbreviation. [1] CellML is growing in popularity as a portable description format for computational models, and groups throughout the world use it for modelling or develop software tools based on it. CellML is similar to the Systems Biology Markup Language (SBML) but provides greater scope for model modularity and reuse, and is not specific to descriptions of biochemistry.

History

The CellML language grew from a need to share models of cardiac cell dynamics among researchers at a number of sites across the world. The original working group formed in 1998 consisted of David Bullivant, Warren Hedley, and Poul Nielsen; all three were at that time members of the Department of Engineering Science at the University of Auckland. The language was an application of the XML specification developed by the World Wide Web Consortium – the decision to use XML was based on late 1998 recommendations from Warren Hedley and André (David) Nickerson. Existing XML-based languages were leveraged to describe the mathematics (content MathML), metadata (RDF), and links between resources (XLink). The CellML working group first became aware of the SBML effort in late 2000, when Warren Hedley attended the 2nd workshop on Software Platforms for Systems Biology in Tokyo.

The working group collaborated with a number of researchers at Physiome Sciences Inc. (particularly Melanie Nelson, Scott Lett, Mark Grehlinger, Prasad Ramakrishna, Jeremy Rice, Adam Muzikant, and Kam-Chuen Jim) to draft the initial CellML 1.0 specification, which was published on 11 August 2001. This first draft was followed by specifications for CellML Metadata and an update to CellML to accommodate structured nesting of models with the addition of the <import> element. Physiome Sciences Inc. also produced the first CellML-capable software. The National Resource for Cell Analysis and Modeling (NRCAM) at the University of Connecticut Health Center also produced early CellML-capable software, called Virtual Cell.

The CellML 1.1 specification, which added imports, was written in 2002. Imports allow a model to incorporate external components, enabling modular modelling. This specification was frozen in early 2006. Work has continued on metadata and other specifications.

In July 2009 the CellML website was completely revamped, and an initial version of the new CellML repository software (PMR2) was released.

The structure of a CellML model

A CellML model consists of a number of components, each described in its own component element. A component can be an entirely conceptual entity created for modelling convenience, or it can have some real physical interpretation (for example, it could represent the cell membrane).

Each component contains a number of variables, which must be declared by placing a variable element inside the component. For example, a component representing a cell membrane may have a variable called V representing the potential difference (voltage) across the cell membrane.
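As a rough illustration of the component and variable structure described above, the following Python sketch parses a small hand-written CellML 1.0 fragment and lists the variables declared in each component. The model name, the initial value, and the helper list_variables are hypothetical, chosen only for this example; the element and attribute names (model, component, variable, name, units, initial_value) and the namespace URI are those of CellML 1.0.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/1.0#"

# Hypothetical minimal model: one component holding one variable.
MODEL_XML = f"""
<model xmlns="{CELLML_NS}" name="example_model">
  <component name="membrane">
    <variable name="V" units="volt" initial_value="-0.085"/>
  </component>
</model>
"""


def list_variables(xml_text):
    """Return {component name: [(variable name, units), ...]}."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for comp in root.findall(f"{{{CELLML_NS}}}component"):
        result[comp.get("name")] = [
            (var.get("name"), var.get("units"))
            for var in comp.findall(f"{{{CELLML_NS}}}variable")
        ]
    return result


variables = list_variables(MODEL_XML)
print(variables)
```

The sketch only reads the declarations; it does not check that the document is valid CellML.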

Mathematical relationships between variables are expressed within components, using MathML. MathML is used to make declarative expressions (as opposed to procedural statements as in a computer programming language). However, most CellML processing software will only accept a limited range of mathematics (for example, some processing software requires equations with a single variable on one side of an equality). The choice of MathML makes CellML particularly suited for describing models containing differential equations. There is no mechanism for the expression of stochastic models or any other form of randomness.
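To make the declarative nature of the mathematics concrete, the sketch below takes a content MathML expression of the kind that could appear inside a component and renders it as a readable string. The equation (dV/dt = -I / C), the variable names, and the render helper are assumptions made for illustration. The renderer handles only a handful of operators and is not a general MathML processor.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical equation in content MathML: dV/dt = -I / C
EQUATION_XML = f"""
<math xmlns="{MATHML_NS}">
  <apply><eq/>
    <apply><diff/>
      <bvar><ci>t</ci></bvar>
      <ci>V</ci>
    </apply>
    <apply><divide/>
      <apply><minus/><ci>I</ci></apply>
      <ci>C</ci>
    </apply>
  </apply>
</math>
"""

INFIX = {"eq": "=", "plus": "+", "minus": "-", "times": "*", "divide": "/"}


def local(tag):
    """Strip the '{namespace}' prefix from an element tag."""
    return tag.split("}", 1)[-1]


def render(node):
    """Render a small subset of content MathML as an infix string."""
    kind = local(node.tag)
    if kind in ("ci", "cn"):
        return node.text.strip()
    if kind != "apply":
        raise ValueError(f"unsupported element: {kind}")
    operator, *operands = list(node)
    op = local(operator.tag)
    if op == "diff":
        # Expect <bvar> (the variable of differentiation) then the target.
        bvar, target = operands
        return f"d{render(target)}/d{render(bvar[0])}"
    args = [render(child) for child in operands]
    if op == "minus" and len(args) == 1:
        return f"-{args[0]}"  # unary minus
    if op in INFIX:
        joined = f" {INFIX[op]} ".join(args)
        return joined if op == "eq" else f"({joined})"
    raise ValueError(f"unsupported operator: {op}")


equation = render(ET.fromstring(EQUATION_XML.strip())[0])
print(equation)
```

The MathML states a relationship between V, I, C and t without saying how to solve it; choosing an integration method is left to the processing software.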

Components can be connected to other components using a connection element, which names the two components to be connected and maps variables in the first component to variables in the second. Such a connection states that a variable in one component is equivalent to a variable in the other.
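A connection of this kind might look like the fragment embedded in the following sketch, which extracts the pairs of variables that a model declares equivalent. The component names and the variable_mappings helper are hypothetical. The map_components and map_variables elements and the public_interface attribute come from the CellML 1.0 specification rather than from the description above.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/1.0#"
NS = {"c": CELLML_NS}

# Hypothetical model: the membrane exposes V, the channel receives it.
MODEL_XML = f"""
<model xmlns="{CELLML_NS}" name="example_model">
  <component name="membrane">
    <variable name="V" units="volt" public_interface="out"/>
  </component>
  <component name="sodium_channel">
    <variable name="V" units="volt" public_interface="in"/>
  </component>
  <connection>
    <map_components component_1="membrane" component_2="sodium_channel"/>
    <map_variables variable_1="V" variable_2="V"/>
  </connection>
</model>
"""


def variable_mappings(xml_text):
    """Return a list of ('comp1.var1', 'comp2.var2') equivalences."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for conn in root.findall("c:connection", NS):
        comps = conn.find("c:map_components", NS)
        c1, c2 = comps.get("component_1"), comps.get("component_2")
        for mv in conn.findall("c:map_variables", NS):
            pairs.append(
                (f"{c1}.{mv.get('variable_1')}", f"{c2}.{mv.get('variable_2')}")
            )
    return pairs


mappings = variable_mappings(MODEL_XML)
print(mappings)
```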

CellML models also allow relationships between components to be expressed. The CellML specification defines two types of relationship, encapsulation and containment, but users can define more. The containment relationship is used to express that one component is physically within another. The encapsulation relationship is special because it is the only relationship that affects the interpretation of the rest of the model. The effect of encapsulation is that components encapsulated beneath other components are private and cannot be accessed except by the component directly above in the encapsulation tree. The modeller is free to use encapsulation as a conceptual tool, and it does not necessarily have any physical interpretation.
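The encapsulation tree can be sketched in code as follows. The example reads a group element that declares an encapsulation hierarchy and records, for each encapsulated component, the component directly above it. The component names and the encapsulation_parents helper are hypothetical, and the fragment omits the component declarations themselves for brevity, so it is not a complete model. The group, relationship_ref and component_ref elements are taken from the CellML 1.0 specification.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/1.0#"
NS = {"c": CELLML_NS}

# Hypothetical hierarchy; the components themselves are not declared here.
GROUP_XML = f"""
<model xmlns="{CELLML_NS}" name="example_model">
  <group>
    <relationship_ref relationship="encapsulation"/>
    <component_ref component="membrane">
      <component_ref component="sodium_channel">
        <component_ref component="m_gate"/>
      </component_ref>
      <component_ref component="potassium_channel"/>
    </component_ref>
  </group>
</model>
"""


def encapsulation_parents(xml_text):
    """Map each encapsulated component to the component directly above it."""
    root = ET.fromstring(xml_text.strip())
    parents = {}

    def walk(ref, parent):
        name = ref.get("component")
        if parent is not None:
            parents[name] = parent
        for child in ref.findall("c:component_ref", NS):
            walk(child, name)

    for group in root.findall("c:group", NS):
        kinds = {
            r.get("relationship")
            for r in group.findall("c:relationship_ref", NS)
        }
        if "encapsulation" in kinds:
            for top in group.findall("c:component_ref", NS):
                walk(top, None)
    return parents


parents = encapsulation_parents(GROUP_XML)
print(parents)
```

Under the rule described above, m_gate would be reachable only from sodium_channel, its direct parent, and not from membrane.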

Specifications

CellML is defined by core specifications as well as additional specifications for metadata, used to annotate models and specify simulations.

CellML 1.0

CellML 1.0 was the first final specification, and is used to describe many of the models in the CellML Model Repository.

CellML 1.0 has some biochemistry-specific elements for describing the role of variables in a reaction model.

CellML 1.1

CellML 1.1 introduced the ability to import components and units. In order to fully support this feature, variables in CellML 1.1 accept variable names as initial values.
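A minimal sketch of both features, assuming a hypothetical file sodium_channel.cellml and hypothetical component, units and variable names: the fragment imports a component and a units definition, and gives the variable V an initial value that names another variable in the same component. The two helper functions are illustrative only. The import element, its xlink:href attribute, and the component_ref and units_ref attributes follow the CellML 1.1 specification.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/1.1#"
XLINK_NS = "http://www.w3.org/1999/xlink"
NS = {"c": CELLML_NS}

# Hypothetical CellML 1.1 model; the imported file is not provided here.
MODEL_XML = f"""
<model xmlns="{CELLML_NS}" xmlns:xlink="{XLINK_NS}" name="importing_model">
  <import xlink:href="sodium_channel.cellml">
    <component name="INa" component_ref="sodium_channel"/>
    <units name="mV" units_ref="millivolt"/>
  </import>
  <component name="membrane">
    <variable name="V_init" units="mV" initial_value="-85"/>
    <variable name="V" units="mV" initial_value="V_init"/>
  </component>
</model>
"""


def list_imports(xml_text):
    """Return (kind, local name, name in imported model, href) tuples."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for imp in root.findall("c:import", NS):
        href = imp.get(f"{{{XLINK_NS}}}href")
        for comp in imp.findall("c:component", NS):
            found.append(
                ("component", comp.get("name"), comp.get("component_ref"), href)
            )
        for units in imp.findall("c:units", NS):
            found.append(("units", units.get("name"), units.get("units_ref"), href))
    return found


def symbolic_initial_values(xml_text):
    """Return {variable: initial_value} where the value is not a number."""
    root = ET.fromstring(xml_text.strip())
    symbolic = {}
    for var in root.iterfind(".//c:variable", NS):
        init = var.get("initial_value")
        if init is None:
            continue
        try:
            float(init)
        except ValueError:
            symbolic[var.get("name")] = init
    return symbolic


imports = list_imports(MODEL_XML)
symbolic = symbolic_initial_values(MODEL_XML)
print(imports)
print(symbolic)
```

A variable-valued initial condition lets the importing model set the starting state of an imported component through an ordinary connection, which is one reason the two features go together.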

Metadata specifications

CellML has several metadata specifications, used to annotate models or provide information for running and/or visualizing simulations of models.

CellML.org

CellML.org aims to provide a focal point for the CellML community. Members can submit, review, and update models and receive feedback and help from the community. The site also hosts the CellML-discussion mailing list, whose scope includes everything related to the development and use of CellML.

A repository of several hundred biological models encoded in CellML, the CellML Model Repository, can be found on the CellML community website. These models are undergoing curation that aims to annotate them with biological ontologies such as Gene Ontology and to validate them against standards of unit balance and biophysical constraints such as conservation of mass, charge, and energy.

References

  1. "[cellml-discussion] Expanding the CellML abbreviation".
