Chemical Markup Language

Filename extension: .cml
Internet media type: chemical/x-cml
Type of format: chemical file format

Chemical Markup Language (ChemML or CML) is an approach to managing molecular information using tools such as XML and Java. [1] It was the first domain-specific implementation based strictly on XML, initially defined by a DTD [2] and later by an XML Schema. [3] XML itself is a robust and widely used system for precise information management in many areas. CML has been developed over more than a decade by Peter Murray-Rust, Henry Rzepa and others, and has been tested in many areas and on a variety of machines.


Chemical information has traditionally been stored in many different file formats, which inhibits reuse of the documents. CML uses XML's portability to help developers and chemists design interoperable documents. A number of tools can generate, process and view CML documents. By using CML, publishers can distribute chemistry within XML documents, e.g. in RSS feeds. [4]
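To make the idea of an interoperable CML document more concrete, here is a minimal sketch that parses a small, hand-written CML file for water with Python's standard `xml.etree.ElementTree` and extracts its atoms and bonds. The element and attribute names (`molecule`, `atomArray`, `atom`, `elementType`, `x3`/`y3`/`z3`, `bondArray`, `bond`, `atomRefs2`, `order`) and the namespace URI follow common CML 2 usage but are assumptions here, not details given in this article; the helper `read_molecule` and the sample coordinates are hypothetical and for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URI commonly declared by CML 2 documents (an assumption in this sketch).
CML_NS = "http://www.xml-cml.org/schema"
NS = {"cml": CML_NS}

# Hand-written, illustrative CML document for water (approximate geometry, in angstroms).
WATER_CML = """<?xml version="1.0"?>
<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1" title="water">
    <atomArray>
      <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
      <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
      <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1"/>
      <bond atomRefs2="a1 a3" order="1"/>
    </bondArray>
  </molecule>
</cml>"""


def read_molecule(cml_text):
    """Return (element counts, bonds) for the first <molecule> in a CML string."""
    root = ET.fromstring(cml_text)
    # The document root may itself be a <molecule> or may wrap one (e.g. in <cml>).
    if root.tag == f"{{{CML_NS}}}molecule":
        molecule = root
    else:
        molecule = root.find(".//cml:molecule", NS)
    if molecule is None:
        raise ValueError("no CML <molecule> element found")

    # Map each atom id to its element symbol.
    atoms = {
        atom.get("id"): atom.get("elementType")
        for atom in molecule.findall("./cml:atomArray/cml:atom", NS)
    }

    # Each bond names its two atoms in a space-separated atomRefs2 attribute.
    bonds = []
    for bond in molecule.findall("./cml:bondArray/cml:bond", NS):
        first, second = bond.get("atomRefs2").split()
        bonds.append((first, second, bond.get("order")))

    return Counter(atoms.values()), bonds


counts, bonds = read_molecule(WATER_CML)
print(dict(counts))  # {'O': 1, 'H': 2}
print(bonds)         # [('a1', 'a2', '1'), ('a1', 'a3', '1')]
```

Because the structure is plain XML, any generic XML tool can read it; nothing here depends on a chemistry-specific library.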

CML is capable of supporting a wide range of chemical concepts, including molecules, chemical reactions [5] and spectral data. [6]

Details of CML and points under discussion are posted on the CML Blog.

Versioning

Versions of the schema are available at SourceForge. As of April 2012, the latest frozen schema is CML v2.4. Some constructs in CML v1 are now deprecated.

Tools

JUMBO began life as the Java Universal Molecular Browser for Objects. It is now a Java library that supports validation, reading and writing of CML, conversion of several legacy formats to CML and, for example, conversion of a reaction in CML to an animated SVG representation of that reaction. [7] It has evolved into an extensive library, CMLDOM, [8] supporting all elements in the schema. [9] Although JUMBO was originally a browser, the preferred approach for viewing CML is now to use the open-source tools Jmol and JChemPaint, which in some cases use alternative CML libraries. [10] See also Blue Obelisk.
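As a rough illustration of what converting a legacy format to CML involves, the sketch below turns the simple XYZ format (an atom count on the first line, a comment on the second, then one line per atom giving an element symbol and Cartesian coordinates) into a CML `molecule`. XYZ is chosen here only as an example; this article does not say which legacy formats JUMBO handles. The function `xyz_to_cml` is a hypothetical helper built on Python's standard library, not JUMBO's or CMLDOM's API, and the CML element names and namespace URI follow common CML 2 usage but are assumptions. Since XYZ carries no connectivity, no `bondArray` is emitted.

```python
import xml.etree.ElementTree as ET

# Namespace URI commonly declared by CML 2 documents (an assumption in this sketch).
CML_NS = "http://www.xml-cml.org/schema"


def xyz_to_cml(xyz_text, molecule_id="m1"):
    """Convert one XYZ record to a CML string (hypothetical helper: atoms only, no bonds)."""
    lines = xyz_text.strip().splitlines()
    natoms = int(lines[0].split()[0])                    # line 1: number of atoms
    title = lines[1].strip() if len(lines) > 1 else ""   # line 2: free-text comment
    atom_lines = lines[2:2 + natoms]                     # then one line per atom
    if len(atom_lines) != natoms:
        raise ValueError(f"expected {natoms} atom lines, found {len(atom_lines)}")

    # Serialize the CML namespace as the default (unprefixed) namespace.
    ET.register_namespace("", CML_NS)
    root = ET.Element(f"{{{CML_NS}}}cml")
    molecule = ET.SubElement(root, f"{{{CML_NS}}}molecule", id=molecule_id)
    if title:
        molecule.set("title", title)

    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for index, line in enumerate(atom_lines, start=1):
        symbol, x, y, z = line.split()[:4]
        # Coordinates are kept as the original strings to preserve their precision.
        ET.SubElement(atom_array, f"{{{CML_NS}}}atom", {
            "id": f"a{index}",
            "elementType": symbol,
            "x3": x, "y3": y, "z3": z,
        })
    return ET.tostring(root, encoding="unicode")


WATER_XYZ = """3
water
O  0.000  0.000  0.117
H  0.000  0.757 -0.469
H  0.000 -0.757 -0.469
"""
print(xyz_to_cml(WATER_XYZ))
```

Copying the coordinate strings verbatim, rather than round-tripping them through floats, avoids silently changing the precision recorded in the source file.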

Software support

Software importing and exporting a valid CML format

See also

References

  1. Murray-Rust, P.; Rzepa, H. S. (2011), "CML: Evolution and design", Journal of Cheminformatics, 3 (1): 44, doi:10.1186/1758-2946-3-44, PMC 3205047, PMID 21999549
  2. Murray-Rust, P.; Rzepa, H. S. (1999), "Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles", J. Chem. Inf. Comput. Sci., 39 (6): 928–942, CiteSeerX 10.1.1.40.8275, doi:10.1021/ci990052b
  3. Murray-Rust, P.; Rzepa, H. S. (2003), "Chemical Markup, XML and the World Wide Web. 4. CML Schema", J. Chem. Inf. Comput. Sci., 43 (3): 757–772, doi:10.1021/ci0256541, PMID 12767134
  4. Gkoutos, G. V.; Murray-Rust, P.; Rzepa, H. S.; Wright, M. (2001), "Chemical Markup, XML, and the World-Wide Web. 3. Toward a Signed Semantic Chemical Web of Trust", J. Chem. Inf. Comput. Sci., 41 (5): 1124–1130, doi:10.1021/ci000406v, PMID 11604013
  5. Holliday, G. L.; Murray-Rust, P.; Rzepa, H. S. (2006), "Chemical Markup, XML and the World Wide Web. Part 6. CMLReact; An XML Vocabulary for Chemical Reactions", J. Chem. Inf. Model., 46 (1): 145–157, doi:10.1021/ci0502698, PMID 16426051
  6. Kuhn, S.; Helmus, T.; Lancashire, R. J.; Murray-Rust, P.; Rzepa, H. S.; Steinbeck, C.; Willighagen, E. L. (2007), "Chemical Markup, XML, and the World Wide Web. 7. CMLSpect, an XML Vocabulary for Spectral Data", J. Chem. Inf. Model., 47 (6): 2015–2034, doi:10.1021/ci600531a, PMID 17887743
  7. JUMBO
  8. Murray-Rust, P.; Rzepa, H. S. (2001), "Chemical Markup, XML and the World-Wide Web. 2. Information Objects and the CMLDOM", J. Chem. Inf. Comput. Sci., 41 (5): 1113–1123, doi:10.1021/ci000404a, PMID 11604012
  9. CML home on Sourceforge
  10. Willighagen, E. L. (2001), "Processing CML Conventions in Java", Internet Journal of Chemistry, 4. Archived from the original on 2001-04-11.

Further reading