Chemical space

View of PubChem chemical space; a projection of the 42-dimensional molecular quantum numbers (MQN) properties of compounds in PubChem (5 virtually created libraries of compounds) using PCA. Color coding is according to fraction of ring atoms in molecules (blue 0, red 1).

Chemical space is a concept in cheminformatics referring to the property space spanned by all possible molecules and chemical compounds adhering to a given set of construction principles and boundary conditions. Commercially accessible chemical space contains millions of compounds that are readily available to researchers, and such libraries are used in molecular docking. [2]

Theoretical spaces

A chemical space often referred to in cheminformatics is that of potential pharmacologically active molecules. Its size is estimated to be on the order of 10^60 molecules, although there are no rigorous methods for determining the precise size of this space. The assumptions [3] used for this estimate draw on the Lipinski rules, in particular the molecular weight limit of 500 daltons. The estimate also restricts the chemical elements to carbon, hydrogen, oxygen, nitrogen and sulfur. It further assumes a maximum of 30 atoms to stay below 500 daltons, allows for branching and a maximum of 4 rings, and arrives at an estimate of 10^63. This number is often misquoted in subsequent publications as the estimated size of the whole organic chemistry space, [4] which would be much larger if halogens and other elements were included. In addition to the drug-like space and lead-like space that are, in part, defined by Lipinski's rule of five, the concept of known drug space (KDS), which is defined by the molecular descriptors of marketed drugs, has also been introduced. [5] [6] [7] KDS can be used to help predict the boundaries of chemical spaces for drug development by comparing the structure of the molecules that are undergoing design and synthesis to the molecular descriptor parameters that are defined by the KDS.
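The boundary conditions behind the estimate can be illustrated with a minimal Python sketch. It checks a molecular formula against the restricted element set, the 30-atom cap and the 500-dalton weight limit. The function names are hypothetical, the atomic masses are rounded standard values, and reading the 30-atom cap as a count of non-hydrogen atoms is an assumption made here; the ring limit is not checked because a formula alone does not encode rings.

```python
# Rounded standard atomic masses in daltons for the elements allowed by the estimate.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007, "S": 32.06}

MAX_HEAVY_ATOMS = 30   # assumption: the 30-atom cap counts non-hydrogen atoms
MAX_WEIGHT_DA = 500.0  # Lipinski molecular weight limit


def molecular_weight(formula):
    """Sum atomic masses for a formula given as {element symbol: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def within_estimate_bounds(formula):
    """Return True if the formula satisfies the element, atom-count and weight limits."""
    if any(element not in ATOMIC_MASS for element in formula):
        return False  # e.g. halogens fall outside the restricted element set
    heavy_atoms = sum(count for element, count in formula.items() if element != "H")
    return heavy_atoms <= MAX_HEAVY_ATOMS and molecular_weight(formula) <= MAX_WEIGHT_DA


aspirin = {"C": 9, "H": 8, "O": 4}
chloroform = {"C": 1, "H": 1, "Cl": 3}
print(within_estimate_bounds(aspirin))     # True
print(within_estimate_bounds(chloroform))  # False: chlorine is not an allowed element
```

A check of this kind only decides whether a single formula lies inside the bounds; it does not count the structures that share the formula, which is what drives the size estimate.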

Empirical spaces

As of July 2009, there were 49,037,297 organic and inorganic substances registered with the Chemical Abstracts Service, indicating that they have been reported in the scientific literature. [8] Chemical libraries used for laboratory-based screening for compounds with desired properties are examples of real-world chemical libraries of small size (a few hundred to hundreds of thousands of molecules).

Generation

Systematic exploration of chemical space is possible by creating in silico databases of virtual molecules, [9] which can be visualized by projecting the multidimensional property space of molecules into lower dimensions. [10] [11] Generation of chemical spaces may involve creating stoichiometric combinations of electrons and atomic nuclei to yield all possible topological isomers for the given construction principles. In cheminformatics, software programs called structure generators are used to generate the set of all chemical structures adhering to given boundary conditions. Constitutional isomer generators, for example, can generate all possible constitutional isomers of a given molecular gross formula.
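The projection of a multidimensional property space into two dimensions can be sketched with principal component analysis (PCA). The example below is a minimal illustration in plain Python, not the method used by any particular mapping tool: it finds the two leading components by power iteration with deflation, and the descriptor values are made-up numbers standing in for real molecular descriptors such as the MQN set.

```python
import math
import random


def center(rows):
    """Subtract each column mean so every descriptor has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def leading_eigenvector(m, iterations=500, seed=0):
    """Approximate the dominant eigenvector of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in m]
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v


def pca_2d(rows):
    """Project descriptor vectors onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    pc1 = leading_eigenvector(cov)
    eigenvalue1 = dot(pc1, mat_vec(cov, pc1))
    d = len(cov)
    # Deflation: remove the first component so the next power iteration finds the second.
    deflated = [[cov[i][j] - eigenvalue1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2 = leading_eigenvector(deflated, seed=1)
    return [(dot(r, pc1), dot(r, pc2)) for r in centered]


# Hypothetical descriptor vectors, one row per molecule (illustrative values only).
descriptors = [
    [12, 1, 0, 3],
    [14, 2, 1, 4],
    [20, 0, 2, 6],
    [25, 3, 1, 8],
    [30, 1, 3, 9],
    [8, 2, 0, 2],
]
for x, y in pca_2d(descriptors):
    print(f"{x:8.3f} {y:8.3f}")
```

Each molecule becomes a point in the plane, and points can then be colored by a further property, as in the PubChem map described in the figure caption. For realistic data sets a numerical library would normally replace the hand-written power iteration.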

In the real world, chemical reactions allow us to move in chemical space. The mapping between chemical space and molecular properties is often not unique, meaning that there can be very different molecules exhibiting very similar properties. Materials design and drug discovery both involve the exploration of chemical space.

References

  1. Reymond, J.-L.; Awale, M. (2012). "Exploring chemical space for drug discovery using the chemical universe database". ACS Chem. Neurosci. 3 (9): 649–657. doi:10.1021/cn3000422. PMC 3447393. PMID 23019491.
  2. Rudling, Axel; Gustafsson, Robert; Almlöf, Ingrid; Homan, Evert; Scobie, Martin; Warpman Berglund, Ulrika; Helleday, Thomas; Stenmark, Pål; Carlsson, Jens (2017-10-12). "Fragment-Based Discovery and Optimization of Enzyme Inhibitors by Docking of Commercial Chemical Space". Journal of Medicinal Chemistry. 60 (19): 8160–8169. doi:10.1021/acs.jmedchem.7b01006. ISSN 1520-4804. PMID 28929756.
  3. Bohacek, R. S.; McMartin, C.; Guida, W. C. (1996). "The art and practice of structure‐based drug design: A molecular modeling perspective". Medicinal Research Reviews. 16 (1): 3–50. doi:10.1002/(SICI)1098-1128(199601)16:1<3::AID-MED1>3.0.CO;2-6. PMID 8788213. S2CID 44271689.
  4. Kirkpatrick, P.; Ellis, C. (2004). "Chemical space". Nature. 432 (7019): 823–865. Bibcode:2004Natur.432..823K. doi:10.1038/432823a.
  5. Mirza, A.; Desai, R.; Reynisson, J. (2009). "Known drug space as a metric in exploring the boundaries of drug-like chemical space". Eur. J. Med. Chem. 44 (12): 5006–5011. doi:10.1016/j.ejmech.2009.08.014. PMID 19782440.
  6. Bade, R.; Chan, H. F.; Reynisson, J. (2010). "Characteristics of known drug space. Natural products, their derivatives and synthetic drugs". Eur. J. Med. Chem. 45 (12): 5646–5652. doi:10.1016/j.ejmech.2010.09.018. PMID 20888084.
  7. Matuszek, A. M.; Reynisson, J. (2016). "Defining Known Drug Space Using DFT". Mol. Inform. 35 (2): 46–53. doi:10.1002/minf.201500105. PMID 27491789. S2CID 21489164.
  8. "CAS, Chemical Abstracts Service - Database Counter". www.cas.org. Archived from the original on 24 July 2012. Retrieved 22 May 2022.
  9. Ruddigkeit, L.; van Deursen, R.; Blum, L. C.; Reymond, J.-L. (2012). "Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17". J. Chem. Inf. Model. 52 (11): 2864–2875. doi:10.1021/ci300415d. PMID 23088335.
  10. Awale, M.; van Deursen, R.; Reymond, J. L. (2013). "MQN-Mapplet: Visualization of Chemical Space with Interactive Maps of DrugBank, ChEMBL, PubChem, GDB-11, and GDB-13". J. Chem. Inf. Model. 53 (2): 509–518. doi:10.1021/ci300513m. PMID 23297797.
  11. Ruddigkeit, L.; Blum, L. C.; Reymond, J.-L. (2013). "Visualization and Virtual Screening of the Chemical Universe Database GDB-17". J. Chem. Inf. Model. 53 (1): 56–65. doi:10.1021/ci300535x. PMID 23259841. S2CID 18531792.