Field | Value |
---|---|
Developer(s) | Open Babel development team |
Initial release | 2 June 2005 |
Stable release | |
Repository | www |
Written in | C, C++ (wxWidgets [2]) |
Operating system | Windows, macOS, Linux, Android |
Platform | IA-32, x86-64 |
Available in | English |
Type | Cheminformatics, molecular modelling |
License | GPL 2.0 |
Website | www |
Open Babel is free cheminformatics software designed to facilitate the conversion of chemical file formats and the management of molecular data. [3] It serves as a chemical expert system and is widely used in fields such as cheminformatics, molecular modelling, and computational chemistry. Open Babel provides both a comprehensive library and command-line utilities, making it a versatile tool for researchers, developers, and professionals. [4]
Because of its strong relationship to informatics, the program belongs more to the category of cheminformatics than to molecular modelling. It is available for Windows, Unix, Linux, macOS, and Android. It is free and open-source software released under the GNU General Public License (GPL) 2.0.
The project's stated goal is: "Open Babel is a community-driven scientific project assisting both users and developers as a cross-platform program and library designed to support molecular modeling, chemistry, and many related areas, including interconversion of file formats and data."
Open Babel and JOELib were derived from the OELib cheminformatics library. In turn, OELib was based on ideas in the original chemistry program Babel and an unreleased object-oriented programming library called OBabel.
In cheminformatics, Open Babel facilitates the management of molecular data through substructure searching and molecular fingerprint calculations. These functionalities enable similarity analysis, dataset clustering, and efficient organization of chemical libraries, making it suitable for large-scale workflows.
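As a rough illustration of fingerprint-based similarity analysis and clustering, the sketch below assumes each molecule's fingerprint is already available as a set of "on" bit indices. The bit sets are made up and do not come from Open Babel's own fingerprint types (such as FP2); the function names `tanimoto` and `leader_cluster` and the 0.5 threshold are hypothetical. It computes the Tanimoto coefficient, a common similarity measure for binary fingerprints, and uses it in a simple greedy leader clustering.

```python
from typing import FrozenSet, List

Fingerprint = FrozenSet[int]  # indices of the bits set to 1 (hypothetical representation)

def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits present in either."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)

def leader_cluster(fps: List[Fingerprint], threshold: float) -> List[List[int]]:
    """Greedy leader clustering: join the first cluster whose leader is similar enough, else start a new one."""
    leaders: List[int] = []
    clusters: List[List[int]] = []
    for i, fp in enumerate(fps):
        for k, leader in enumerate(leaders):
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[k].append(i)
                break
        else:  # no existing leader was similar enough
            leaders.append(i)
            clusters.append([i])
    return clusters

if __name__ == "__main__":
    fps = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({10, 11, 12})]
    print(tanimoto(fps[0], fps[1]))            # 3 shared / 5 total = 0.6
    print(leader_cluster(fps, threshold=0.5))  # [[0, 1], [2]]
```

Leader clustering is order-dependent: presenting the molecules in a different order can produce different clusters. It was chosen here only because it fits in a few lines.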
In drug discovery, Open Babel supports tasks such as preparing chemical libraries for high-throughput virtual screening and standardizing molecular formats for structure-based drug design. The software's ability to generate 3D molecular coordinates and calculate molecular descriptors is particularly valuable in predicting properties such as solubility, reactivity, and toxicity. [7]
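Descriptor calculation can be illustrated with one of the simplest descriptors, molecular weight. This is a minimal sketch and not Open Babel's implementation: it assumes a flat molecular formula without parentheses, charges, or isotopes, and uses a small hypothetical table of average atomic masses; the name `molecular_weight` is likewise hypothetical.

```python
import re
from typing import Dict

# Hypothetical, deliberately small table of average atomic masses (g/mol).
ATOMIC_MASS: Dict[str, float] = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}
ELEMENT_COUNT = re.compile(r"([A-Z][a-z]?)(\d*)")

def molecular_weight(formula: str) -> float:
    """Molecular weight of a flat formula such as 'C2H6O' (no parentheses, charges or isotopes)."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in ELEMENT_COUNT.findall(formula):
        if symbol not in ATOMIC_MASS:
            raise KeyError(f"no mass tabulated for {symbol}")
        # A missing count means one atom of that element.
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total

if __name__ == "__main__":
    print(round(molecular_weight("C2H6O"), 3))  # ethanol: 46.069
```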
A chemical database is a database specifically designed to store chemical information. This information covers chemical and crystal structures, spectra, reactions and syntheses, and thermophysical data.
Cheminformatics refers to the use of physical chemistry theory together with computer and information science techniques (so-called "in silico" techniques) applied to a range of descriptive and prescriptive problems in chemistry, including its applications to biology and related molecular fields. Such in silico techniques are used, for example, by pharmaceutical companies and in academic settings to aid and inform the process of drug discovery, for instance in the design of well-defined combinatorial libraries of synthetic compounds or to assist in structure-based drug design. The methods can also be used in the chemical and allied industries, and in fields such as environmental science and pharmacology, where chemical processes are involved or studied.
Chemical Markup Language (CML) is an approach to managing molecular information using tools such as XML and Java. It was the first domain-specific implementation based strictly on XML, initially defined by a DTD and later by an XML Schema; XML is among the most robust and widely used systems for precise information management in many areas. CML has been developed over more than a decade by Murray-Rust, Rzepa and others and has been tested in many areas and on a variety of machines.
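To make the XML approach concrete, here is a minimal sketch that writes a two-atom molecule (the heavy atoms of methanol, hydrogens omitted) as CML using Python's standard `xml.etree.ElementTree`. It covers only a small subset of CML. The element and attribute names (`molecule`, `atomArray`, `atom`, `elementType`, `bondArray`, `bond`, `atomRefs2`, `order`) and the namespace URI follow common CML usage but should be checked against the CML schema; the function `molecule_to_cml`, the helper `q`, and the `a1`, `a2` atom ids are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)  # serialize CML elements with a default xmlns

def q(tag: str) -> str:
    """Qualify a tag name with the CML namespace (ElementTree's {uri}tag form)."""
    return f"{{{CML_NS}}}{tag}"

def molecule_to_cml(mol_id: str, elements: List[str], bonds: List[Tuple[int, int, int]]) -> str:
    """Build a minimal CML <molecule>; bonds are (atom1, atom2, order) with 1-based atom numbers."""
    mol = ET.Element(q("molecule"), id=mol_id)
    atom_array = ET.SubElement(mol, q("atomArray"))
    for i, element in enumerate(elements, start=1):
        ET.SubElement(atom_array, q("atom"), id=f"a{i}", elementType=element)
    bond_array = ET.SubElement(mol, q("bondArray"))
    for a, b, order in bonds:
        ET.SubElement(bond_array, q("bond"), atomRefs2=f"a{a} a{b}", order=str(order))
    return ET.tostring(mol, encoding="unicode")

if __name__ == "__main__":
    print(molecule_to_cml("methanol", ["C", "O"], [(1, 2, 1)]))
```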
A chemical file format is a type of data file used specifically to depict molecular data. One of the most widely used is the chemical table file (CTfile) format, which is closely related to Structure Data Format (SDF) files: text files that represent multiple chemical structure records and associated data fields. The XYZ file format is a simple format that usually gives the number of atoms on the first line and a comment on the second, followed by one line per atom with its atomic symbol and Cartesian coordinates. The Protein Data Bank (PDB) format is commonly used for proteins but is also used for other types of molecules. There are many other formats, and various software systems are available to convert from one format to another.
The International Chemical Identifier is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on the web. Initially developed by the International Union of Pure and Applied Chemistry (IUPAC) and National Institute of Standards and Technology (NIST) from 2000 to 2005, the format and algorithms are non-proprietary. Since May 2009, it has been developed by the InChI Trust, a nonprofit charity from the United Kingdom which works to implement and promote the use of InChI.
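InChI strings are divided into layers separated by `/`. After the `InChI=` prefix come the version (`1S` marks a standard InChI) and the molecular formula, followed by layers identified by a one-letter prefix, such as `c` for atom connections and `h` for hydrogen atoms. The sketch below splits a string along those boundaries, for example the standard InChI of ethanol, `InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3`. It is a simplification that does not interpret the layers or validate the string; `split_inchi` is a hypothetical name, and it stores a list per prefix in case a prefix letter recurs.

```python
from typing import Dict, List, Tuple

def split_inchi(inchi: str) -> Tuple[str, str, Dict[str, List[str]]]:
    """Split an InChI into (version, formula, {layer prefix: [layer contents]})."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    version = parts[0]  # "1S" denotes a standard InChI
    formula = parts[1] if len(parts) > 1 else ""
    layers: Dict[str, List[str]] = {}
    for part in parts[2:]:
        if part:
            # Later layers start with a one-letter prefix, e.g. c (connections), h (hydrogens).
            layers.setdefault(part[0], []).append(part[1:])
    return version, formula, layers

if __name__ == "__main__":
    print(split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```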
OpenEye Scientific Software is an American software company founded by Anthony Nicholls in 1997. It develops large-scale molecular modelling applications and toolkits. Following OpenEye's acquisition by Cadence Design Systems for $500 million in September 2022, the company was rebranded to OpenEye Cadence Molecular Sciences and operates as a business unit under Cadence.
JOELib is computer software, a chemical expert system used mainly to interconvert chemical file formats. Because of its strong relationship to informatics, this program belongs more to the category cheminformatics than to molecular modelling. It is available for Windows, Unix and other operating systems supporting the programming language Java. It is free and open-source software distributed under the GNU General Public License (GPL) 2.0.
The Chemistry Development Kit (CDK) is computer software, a library in the programming language Java, for chemoinformatics and bioinformatics. It is available for Windows, Linux, Unix, and macOS. It is free and open-source software distributed under the GNU Lesser General Public License (LGPL) 2.0.
Molecule mining is the process of data mining, or extracting and discovering patterns, as applied to molecules. Since molecules may be represented by molecular graphs, it is strongly related to graph mining and structured data mining. The main problem is how to represent molecules in a way that discriminates between data instances. One way to do this is to use chemical similarity metrics, which have a long tradition in cheminformatics.
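The simplest patterns in a molecular graph are single labeled bonds. The sketch below treats each bond as an element pair plus bond order, counts how many molecules contain each pattern, and keeps those that reach a minimum support. This is only the first level of frequent-subgraph mining; algorithms such as gSpan grow larger connected patterns. The molecule representation and the names `bond_patterns` and `frequent_bond_patterns` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical molecule representation: heavy-atom symbols plus bonds (i, j, bond order).
Molecule = Tuple[List[str], List[Tuple[int, int, int]]]
Pattern = Tuple[str, int, str]  # (element, bond order, element), elements sorted

def bond_patterns(mol: Molecule) -> Set[Pattern]:
    """Distinct labeled single-bond subgraphs present in one molecule."""
    atoms, bonds = mol
    patterns: Set[Pattern] = set()
    for i, j, order in bonds:
        a, b = sorted((atoms[i], atoms[j]))  # orientation-independent label
        patterns.add((a, order, b))
    return patterns

def frequent_bond_patterns(mols: List[Molecule], min_support: int) -> Dict[Pattern, int]:
    """Patterns occurring in at least `min_support` molecules (support counted once per molecule)."""
    support: Counter = Counter()
    for mol in mols:
        support.update(bond_patterns(mol))
    return {p: n for p, n in support.items() if n >= min_support}
```

Counting support per molecule rather than per occurrence keeps one molecule with many C-C bonds from dominating the counts.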
Peter Murray-Rust is a chemist currently working at the University of Cambridge. As well as his work in chemistry, Murray-Rust is also known for his support of open access and open data.
MDL Information Systems, Inc. was a provider of R&D informatics products for the life sciences and chemicals industries. The company was launched as a computer-aided drug design firm in January 1978 in Hayward, California, and was acquired by Symyx Technologies, Inc. in 2007. Subsequently, Accelrys merged with Symyx, and the Accelrys name was retained for the combined company. In 2014, Accelrys was acquired by Dassault Systèmes, and the Accelrys business unit was renamed BIOVIA.
Substructure search (SSS) is a method to retrieve from a database only those chemicals matching a pattern of atoms and bonds which a user specifies. It is an application of graph theory, specifically subgraph matching in which the query is a hydrogen-depleted molecular graph. The mathematical foundations for the method were laid in the 1870s, when it was suggested that chemical structure drawings were equivalent to graphs with atoms as vertices and bonds as edges. SSS is now a standard part of cheminformatics and is widely used by pharmaceutical chemists in drug discovery.
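The matching step of a substructure search can be sketched as a backtracking subgraph match on hydrogen-depleted graphs: query atoms are assigned one at a time to unused target atoms of the same element, and every query bond between assigned atoms must exist in the target with the same order. This is an illustrative sketch, not the algorithm of any particular system; production search engines typically screen candidates with fingerprints first and use more refined matching algorithms. The graph encoding and the names `bond_order` and `substructure_match` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hydrogen-depleted molecular graph: heavy-atom symbols plus bonds {(i, j): bond order}.
Graph = Tuple[List[str], Dict[Tuple[int, int], int]]

def bond_order(graph: Graph, i: int, j: int) -> Optional[int]:
    """Bond order between atoms i and j, or None if they are not bonded."""
    bonds = graph[1]
    return bonds.get((i, j), bonds.get((j, i)))

def substructure_match(query: Graph, target: Graph) -> Optional[Dict[int, int]]:
    """Return one query-atom -> target-atom mapping embedding query in target, or None."""
    q_atoms, t_atoms = query[0], target[0]
    mapping: Dict[int, int] = {}
    used = set()

    def extend(qi: int) -> bool:
        if qi == len(q_atoms):
            return True  # every query atom has been placed
        for ti, symbol in enumerate(t_atoms):
            if ti in used or symbol != q_atoms[qi]:
                continue  # elements must agree and target atoms may not be reused
            # Each query bond to an already-mapped atom must exist in the target with the same order.
            consistent = all(
                bond_order(query, qi, qj) is None
                or bond_order(query, qi, qj) == bond_order(target, ti, tj)
                for qj, tj in mapping.items()
            )
            if consistent:
                mapping[qi] = ti
                used.add(ti)
                if extend(qi + 1):
                    return True
                del mapping[qi]  # backtrack and try the next target atom
                used.discard(ti)
        return False

    return dict(mapping) if extend(0) else None
```

Query atoms that are not bonded to each other are left unconstrained, which is what a substructure (rather than exact-structure) search requires.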
BIOVIA is a software company headquartered in the United States, with representation in Europe and Asia. It provides software for chemical, materials and bioscience research for the pharmaceutical, biotechnology, consumer packaged goods, aerospace, energy and chemical industries.
ChemSpider is a freely accessible online database of chemicals owned by the Royal Society of Chemistry. It contains information on more than 100 million molecules from over 270 data sources, and each molecule receives a unique identifier called the ChemSpider Identifier (CSID).
Chemical similarity refers to the similarity of chemical elements, molecules or chemical compounds with respect to either structural or functional qualities, i.e. the effect that the chemical compound has on reaction partners in inorganic or biological settings. Biological effects and thus also similarity of effects are usually quantified using the biological activity of a compound. In general terms, function can be related to the chemical activity of compounds.
Blue Obelisk is an informal group of chemists who promote open data, open source, and open standards; it was initiated by Peter Murray-Rust and others in 2005. Multiple open-source cheminformatics projects associate themselves with the Blue Obelisk, including, in alphabetical order, Avogadro, Bioclipse, cclib, Chemistry Development Kit, GaussSum, JChemPaint, JOELib, Kalzium, Open Babel, OpenSMILES, and UsefulChem.
Avogadro is a molecule editor and visualizer designed for cross-platform use in computational chemistry, molecular modeling, bioinformatics, materials science, and related areas. It is extensible via a plugin architecture.
Christoph Steinbeck is a German chemist who holds a professorship in analytical chemistry, cheminformatics and chemometrics at the Friedrich-Schiller-Universität Jena in Thuringia.
A chemical graph generator is a software package that generates computer representations of chemical structures adhering to certain boundary conditions. The development of such software packages is a research topic in cheminformatics. Chemical graph generators are used in areas such as virtual library generation in drug design, molecular design with specified properties (called inverse QSAR/QSPR), organic synthesis design, retrosynthesis, and systems for computer-assisted structure elucidation (CASE). CASE systems have regained interest for the structure elucidation of unknowns in computational metabolomics, a current area of computational biology.
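As a toy example of a chemical graph generator with one set of boundary conditions, the sketch below enumerates the distinct carbon skeletons of acyclic alkanes with n carbons, that is, trees in which no atom has more than four neighbours. It brute-forces every labeled tree through its Prüfer sequence, discards trees that violate the valence limit, and removes duplicates using a canonical string (the AHU encoding rooted at the tree's centre). The approach grows exponentially and is only practical for very small n; real generators use far more efficient strategies. All names here are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List, Set, Tuple

def prufer_to_edges(seq: Tuple[int, ...], n: int) -> List[Tuple[int, int]]:
    """Decode a Prüfer sequence into the edges of a labeled tree on n >= 2 nodes."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(i for i in range(n) if degree[i] == 1)  # smallest current leaf
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [i for i in range(n) if degree[i] == 1]  # exactly two nodes remain
    edges.append((u, w))
    return edges

def tree_centers(adj: List[List[int]]) -> List[int]:
    """Return the one or two centre nodes of a tree by repeatedly stripping leaves."""
    degree = [len(nbrs) for nbrs in adj]
    layer = [v for v in range(len(adj)) if degree[v] <= 1]
    removed: Set[int] = set()
    remaining = len(adj)
    while remaining > 2:
        remaining -= len(layer)
        removed.update(layer)
        next_layer = []
        for leaf in layer:
            for nb in adj[leaf]:
                if nb not in removed:
                    degree[nb] -= 1
                    if degree[nb] == 1:
                        next_layer.append(nb)
        layer = next_layer
    return layer

def ahu_code(adj: List[List[int]], v: int, parent: int) -> str:
    """AHU canonical string of the subtree rooted at v (children sorted)."""
    children = sorted(ahu_code(adj, c, v) for c in adj[v] if c != parent)
    return "(" + "".join(children) + ")"

def carbon_skeletons(n: int, max_valence: int = 4) -> Set[str]:
    """Canonical forms of all non-isomorphic trees on n nodes with degree <= max_valence."""
    if n == 1:
        return {"()"}
    forms: Set[str] = set()
    for seq in product(range(n), repeat=n - 2):
        # In a Prüfer sequence, a node's degree is (occurrences in the sequence) + 1.
        if max(Counter(seq).values(), default=0) + 1 > max_valence:
            continue
        adj: List[List[int]] = [[] for _ in range(n)]
        for a, b in prufer_to_edges(seq, n):
            adj[a].append(b)
            adj[b].append(a)
        forms.add(min(ahu_code(adj, c, -1) for c in tree_centers(adj)))
    return forms
```

For n = 1 to 7 this yields 1, 1, 1, 2, 3, 5 and 9 skeletons, matching the numbers of constitutional isomers of methane through heptane.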