Molecular descriptor

Molecular descriptors play a fundamental role in chemistry, pharmaceutical sciences, environmental protection policy, and health research, as well as in quality control. They are the means by which molecules, thought of as real bodies, are transformed into numbers, allowing mathematical treatment of the chemical information contained in the molecule. Todeschini and Consonni define the molecular descriptor as follows:

"The molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment." [1]

By this definition, the molecular descriptors are divided into two main categories: experimental measurements, such as log P, molar refractivity, dipole moment, polarizability, and, in general, additive physico-chemical properties, and theoretical molecular descriptors, which are derived from a symbolic representation of the molecule and can be further classified according to the different types of molecular representation. [2]

The main classes of theoretical molecular descriptors are: 1) 0D-descriptors (i.e. constitutional descriptors, count descriptors); 2) 1D-descriptors (i.e. lists of structural fragments, fingerprints); 3) 2D-descriptors (i.e. graph invariants); 4) 3D-descriptors (for example 3D-MoRSE descriptors, WHIM descriptors, GETAWAY descriptors, quantum-chemical descriptors, and size, steric, surface and volume descriptors); 5) 4D-descriptors (such as those derived from GRID or CoMFA methods, or Volsurf). The spread of artificial intelligence and machine learning into computational chemistry has also led to various attempts to discover new descriptors or to identify the most predictive ones among a set of candidates. [3] [4]
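As an illustration of the simplest class, the following minimal Python sketch derives a few 0D (constitutional) descriptors from a molecular formula alone. The function names, the small table of rounded atomic weights, and the restriction to bracket-free formulas are assumptions made for this example; it is not how any particular descriptor package works.

```python
import re

# Rounded standard atomic weights for a few common elements (illustrative subset).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C2H6O' (no brackets)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def constitutional_descriptors(formula):
    """Compute 0D descriptors that need only the atom counts of the molecule."""
    counts = parse_formula(formula)
    return {
        "n_atoms": sum(counts.values()),
        "n_heavy_atoms": sum(n for el, n in counts.items() if el != "H"),
        "molecular_weight": sum(ATOMIC_WEIGHTS[el] * n for el, n in counts.items()),
    }


# Ethanol, C2H6O: nine atoms in total, three of them non-hydrogen.
ethanol = constitutional_descriptors("C2H6O")
```

Because these values depend only on composition, every isomer of C2H6O (for instance dimethyl ether) receives exactly the same 0D descriptors; distinguishing them requires descriptors from the higher-dimensional classes.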

Invariance properties of molecular descriptors

The invariance properties of molecular descriptors can be defined as the ability of the algorithm used for their calculation to give a descriptor value that is independent of the particular characteristics of the molecular representation, such as atom numbering or labeling, spatial reference frame, molecular conformations, etc. Invariance to atom numbering or labeling is regarded as a minimal requirement for any descriptor.
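Invariance to atom numbering can be checked numerically. The sketch below, written for illustration only, computes a graph invariant (the Wiener index, the sum of shortest-path lengths between all pairs of non-hydrogen atoms) on a hydrogen-depleted graph of 2-methylbutane before and after renumbering the atoms, and contrasts it with a quantity that depends on the labels. The adjacency-dictionary representation, the helper names, and the chosen renumbering are assumptions of this example.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths over all unordered pairs of vertices."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end


def relabel(adjacency, mapping):
    """Return the same graph with atoms renumbered according to `mapping`."""
    return {mapping[v]: [mapping[w] for w in nbrs] for v, nbrs in adjacency.items()}


# Hydrogen-depleted graph of 2-methylbutane; atom 1 is the branching carbon.
isopentane = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
renumbered = relabel(isopentane, {0: 3, 1: 0, 2: 4, 3: 1, 4: 2})

w_original = wiener_index(isopentane)
w_renumbered = wiener_index(renumbered)

# For contrast, a label-dependent quantity: the number assigned to the branching atom.
branch_label_original = max(isopentane, key=lambda v: len(isopentane[v]))
branch_label_renumbered = max(renumbered, key=lambda v: len(renumbered[v]))
```

The Wiener index is the same for both numberings, whereas the label of the branching atom changes, so only the former would qualify as a descriptor under the labeling-invariance requirement.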

Two other important invariance properties, translational invariance and rotational invariance, are the invariance of a descriptor value to any translation or rotation of the molecules in the chosen reference frame. These last invariance properties are required for the 3D-descriptors.
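Translational and rotational invariance can be tested in the same spirit. In this minimal sketch, a quantity built only from interatomic distances is compared with a naive sum of x-coordinates after the molecule is shifted and rotated. The three-atom geometry is a rough, hypothetical water-like arrangement chosen for illustration, and the function names are assumptions of this example.

```python
import math
from itertools import combinations


def sum_of_distances(coords):
    """Sum of all pairwise interatomic distances; depends only on internal geometry."""
    return sum(math.dist(a, b) for a, b in combinations(coords, 2))


def translate(coords, shift):
    """Shift every atom by the same vector."""
    return [tuple(c + s for c, s in zip(point, shift)) for point in coords]


def rotate_z(coords, angle):
    """Rotate every atom about the z axis by `angle` radians."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(cos_a * x - sin_a * y, sin_a * x + cos_a * y, z) for x, y, z in coords]


# Hypothetical water-like geometry (coordinates in angstroms, illustrative only).
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = rotate_z(translate(water, (1.5, -2.0, 0.7)), math.radians(40))

invariant_before = sum_of_distances(water)
invariant_after = sum_of_distances(moved)

# A frame-dependent quantity for contrast: the plain sum of x-coordinates.
naive_before = sum(x for x, _, _ in water)
naive_after = sum(x for x, _, _ in moved)
```

Up to floating-point rounding, the distance-based value is unchanged by the move, while the coordinate sum is not; a 3D-descriptor built directly on raw coordinates would therefore need a canonical reference frame or a distance-based formulation.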

Degeneracy of molecular descriptors

Degeneracy refers to how often a descriptor takes equal values for different molecules; a good descriptor avoids this. Descriptors can show no degeneracy at all, or low, intermediate, or high degeneracy. For example, the number of atoms in a molecule and the molecular weight are highly degenerate descriptors, while 3D-descriptors usually show low or no degeneracy.
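A small numerical example makes the idea concrete. The sketch below compares two descriptors on the three structural isomers of pentane, using hydrogen-depleted graphs: the heavy-atom count, which cannot tell the isomers apart, and the first Zagreb index (the sum of squared vertex degrees), which can. The `shared_fraction` measure is an ad hoc quantity introduced here for illustration, not a standard definition of degeneracy, and a 2D descriptor is used instead of a 3D one only to keep the example short.

```python
from collections import Counter


def heavy_atom_count(adjacency):
    """0D descriptor: number of non-hydrogen atoms."""
    return len(adjacency)


def first_zagreb_index(adjacency):
    """2D descriptor: sum of squared vertex degrees of the molecular graph."""
    return sum(len(neighbors) ** 2 for neighbors in adjacency.values())


def shared_fraction(values):
    """Fraction of molecules whose descriptor value also occurs for another molecule."""
    counts = Counter(values)
    return sum(n for n in counts.values() if n > 1) / len(values)


# Hydrogen-depleted graphs of the three C5H12 isomers.
isomers = {
    "n-pentane": {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]},
    "2-methylbutane": {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]},
    "2,2-dimethylpropane": {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]},
}

atom_counts = [heavy_atom_count(g) for g in isomers.values()]
zagreb_values = [first_zagreb_index(g) for g in isomers.values()]

degeneracy_atom_count = shared_fraction(atom_counts)
degeneracy_zagreb = shared_fraction(zagreb_values)
```

On this tiny set the atom count is fully degenerate and the Zagreb index is not, but on larger collections of molecules simple graph invariants also start to collide, which is why degeneracy is usually described in graded terms.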

Basic requirements for optimal descriptors

  1. Should have structural interpretation
  2. Should have good correlation with at least one property
  3. Should preferably discriminate among isomers
  4. Should be possible to apply to local structure
  5. Should be possible to generalize to "higher" descriptors
  6. Should be simple
  7. Should not be based on experimental properties
  8. Should not be trivially related to other descriptors
  9. Should be possible to construct efficiently
  10. Should use familiar structural concepts
  11. Should change gradually with gradual changes in structures
  12. Should have the correct size dependence, if related to the molecule size

Software for molecular descriptor calculation

The following is a selection of commercial and free descriptor calculation tools.

| Name | Descriptors | Fingerprints | CLI | GUI | KNIME | Comments | License | Website |
|---|---|---|---|---|---|---|---|---|
| alvaDesc [5] [6] | 5666 | Yes | Yes | Yes | Yes | Available for Windows, Linux and macOS | Proprietary, commercial | https://www.alvascience.com/alvadesc/ |
| Dragon [7] | 5270 | Yes | Yes | Yes | Yes | Discontinued | Proprietary, commercial | https://chm.kode-solutions.net/products_dragon.php |
| Mordred [8] | 1826 | No | Yes | No | No | Based on RDKit | Free open source | https://github.com/mordred-descriptor |
| PaDEL-descriptor [9] | 1875 | Yes | Yes | Yes | Yes | Based on CDK | Free open source | http://www.yapcwsoft.com/dd/padeldescriptor/ |

References

  1. Todeschini, Roberto; Consonni, Viviana (2000). Handbook of Molecular Descriptors. Methods and Principles in Medicinal Chemistry. Wiley. doi:10.1002/9783527613106. ISBN 978-3-527-29913-3.
  2. Mauri, Andrea; Consonni, Viviana; Todeschini, Roberto (2017). "Molecular Descriptors". Handbook of Computational Chemistry. Springer International Publishing. pp. 2065–2093. doi:10.1007/978-3-319-27282-5_51. ISBN 978-3-319-27282-5.
  3. Mueller, Tim; Kusne, Aaron Gilad; Ramprasad, Rampi (2016-04-01). "Machine Learning in Materials Science". In Parrill, Abby L.; Lipkowitz, Kenny B. (eds.). Reviews in Computational Chemistry. Vol. 29 (1st ed.). Wiley. pp. 186–273. doi:10.1002/9781119148739.ch4. ISBN 978-1-119-10393-6.
  4. Ghiringhelli, Luca M.; Vybiral, Jan; Levchenko, Sergey V.; Draxl, Claudia; Scheffler, Matthias (2015-03-10). "Big Data of Materials Science: Critical Role of the Descriptor". Physical Review Letters. 114 (10): 105503. arXiv:1411.7437. Bibcode:2015PhRvL.114j5503G. doi:10.1103/PhysRevLett.114.105503. PMID 25815947.
  5. Mauri, Andrea (2020). "alvaDesc: A Tool to Calculate and Analyze Molecular Descriptors and Fingerprints". Methods in Pharmacology and Toxicology. New York, NY: Springer US. pp. 801–820. doi:10.1007/978-1-0716-0150-1_32. ISBN 978-1-0716-0149-5. ISSN 1557-2153. S2CID 213896490.
  6. Mauri, Andrea; Bertola, Matteo (2022). "Alvascience: A New Software Suite for the QSAR Workflow Applied to the Blood–Brain Barrier Permeability". International Journal of Molecular Sciences. 23 (21): 12882. doi:10.3390/ijms232112882. PMC 9655980. PMID 36361669.
  7. Mauri, A.; Consonni, V.; Pavan, M.; Todeschini, R. (2006). "Dragon software: An easy approach to molecular descriptor calculations". MATCH Communications in Mathematical and in Computer Chemistry. 56 (2): 237–248.
  8. Moriwaki, H.; Tian, Y. S.; Kawashita, N.; Takagi, T. (2018). "Mordred: A molecular descriptor calculator". Journal of Cheminformatics. 10 (1): 1–14. doi:10.1186/s13321-018-0258-y.
  9. Yap, C. W. (2011). "PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints". Journal of Computational Chemistry. doi:10.1002/jcc.21707.
