Chemical similarity

Chemical similarity (or molecular similarity) refers to the similarity of chemical elements, molecules or chemical compounds with respect to either structural or functional qualities, where function means the effect that a compound has on reaction partners in inorganic or biological settings. Biological effects, and thus the similarity of those effects, are usually quantified using the biological activity of a compound. More generally, function can be related to the chemical activity of compounds, among other properties.

Figure: Amphetamine and methylhexanamine similarity.

The notion of chemical similarity (or molecular similarity) is one of the most important concepts in cheminformatics. [1] [2] It plays an important role in modern approaches to predicting the properties of chemical compounds, designing chemicals with a predefined set of properties and, especially, in conducting drug design studies by screening large databases containing structures of available (or potentially available) chemicals. These studies are based on the similar property principle of Johnson and Maggiora, which states: similar compounds have similar properties. [1]

Similarity measures

Chemical similarity is often described as the inverse of a distance measure in descriptor space. Examples of such inverse distance measures are molecule kernels, which measure the structural similarity of chemical compounds. [3]
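
As a rough illustration of the "inverse of a distance" idea, the sketch below maps a Euclidean distance between two descriptor vectors to a similarity score. The descriptor values, the function names, and the particular mapping 1 / (1 + d) are assumptions chosen for illustration; they are not taken from the cited sources, and other monotonically decreasing mappings would serve equally well.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_from_distance(d):
    """Map a distance d >= 0 to a similarity in (0, 1]; 1 means identical."""
    return 1.0 / (1.0 + d)


# Hypothetical descriptor vectors (illustrative numbers only).
mol_a = [1.0, 2.0, 0.5]
mol_b = [1.0, 2.0, 0.5]
mol_c = [4.0, 6.0, 0.5]

sim_ab = similarity_from_distance(euclidean_distance(mol_a, mol_b))
sim_ac = similarity_from_distance(euclidean_distance(mol_a, mol_c))
print(sim_ab, sim_ac)
```

Identical vectors have distance 0 and therefore similarity 1, while the score decreases towards 0 as the vectors move apart in descriptor space.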

Similarity search and virtual screening

Similarity-based virtual screening [4] (a kind of ligand-based virtual screening) assumes that all compounds in a database that are similar to a query compound have similar biological activity. Although this hypothesis is not always valid, [5] the set of retrieved compounds is quite often considerably enriched with actives. [6] To screen databases containing millions of compounds efficiently, molecular structures are usually represented by molecular screens (structural keys) or by fixed-size or variable-size molecular fingerprints. Molecular screens and fingerprints can contain both 2D and 3D information. However, 2D fingerprints, which are a kind of binary fragment descriptor, dominate in this area. Fragment-based structural keys, such as MDL keys, [7] are sufficiently good for handling small and medium-sized chemical databases, whereas large databases are processed with fingerprints that have a much higher information density. Fragment-based Daylight, [8] BCI, [9] and UNITY 2D (Tripos [10]) fingerprints are the best-known examples.

The most popular similarity measure for comparing chemical structures represented by fingerprints is the Tanimoto (or Jaccard) coefficient T. Two structures are usually considered similar if T > 0.85 (for Daylight fingerprints). However, it is a common misunderstanding that a similarity of T > 0.85 reflects similar bioactivities in general ("the 0.85 myth"). [11]
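
For binary fingerprints, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in at least one of them. The sketch below is a minimal illustration of a similarity search built on that definition. It assumes fingerprints are stored as Python sets of on-bit indices; the compound names and bit patterns are made up, and the function names are hypothetical rather than part of any fingerprinting toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similarity_search(query_fp, database, threshold=0.85):
    """Return (name, T) pairs with T > threshold, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [(name, t) for name, t in scored if t > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy fingerprints (illustrative only, not real Daylight output).
query = {1, 2, 3, 4, 5, 6, 7}
database = {
    "cmpd_a": {1, 2, 3, 4, 5, 6, 7},     # identical to the query
    "cmpd_b": {1, 2, 3, 4, 5, 6, 7, 8},  # one extra bit: T = 7/8
    "cmpd_c": {1, 2, 3, 9},              # mostly different: T = 3/8
}

hits = similarity_search(query, database)
print(hits)
```

With the default threshold of 0.85, only cmpd_a and cmpd_b are retrieved. As the text notes, passing such a threshold indicates structural similarity under a given fingerprint and does not by itself guarantee similar bioactivity.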

Chemical similarity network

The concept of chemical similarity can be extended to chemical similarity network theory, in which descriptive network properties and graph theory are applied to analyze large chemical spaces, estimate chemical diversity and predict drug targets. More recently, 3D chemical similarity networks based on 3D ligand conformations have also been developed, which can be used to identify scaffold-hopping ligands.
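
One simple way to picture a chemical similarity network is to treat compounds as nodes and to connect two compounds whenever their pairwise similarity reaches a chosen cut-off. The sketch below assumes 2D fingerprints stored as sets of on-bit indices, the Tanimoto coefficient as the similarity measure, and an arbitrary threshold of 0.5. These choices, the toy data, and the function names are illustrative assumptions and not the construction used in any particular published method.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_network(fingerprints, threshold):
    """Adjacency sets: an edge links two compounds whose Tanimoto >= threshold."""
    graph = {name: set() for name in fingerprints}
    for u, v in combinations(fingerprints, 2):
        if tanimoto(fingerprints[u], fingerprints[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_components(graph):
    """Group nodes into clusters using an iterative depth-first traversal."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical toy fingerprints (illustrative only).
fps = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {10, 11, 12},
    "D": {10, 11, 12, 13},
}

network = similarity_network(fps, threshold=0.5)
clusters = connected_components(network)
print(clusters)
```

Here the network splits into two clusters, {A, B} and {C, D}. Descriptive properties such as the number and size of connected components give a crude view of how diverse a compound set is at the chosen threshold.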

References

  1. Johnson, M. A.; Maggiora, G. M. (1990). Concepts and Applications of Molecular Similarity. New York: John Wiley & Sons. ISBN 978-0-471-62175-1.
  2. Nikolova, N.; Jaworska, J. (2003). "Approaches to Measure Chemical Similarity - a Review". QSAR & Combinatorial Science. 22 (9–10): 1006–1026. doi:10.1002/qsar.200330831.
  3. Ralaivola, Liva; Swamidass, Sanjay J.; Saigo, Hiroto; Baldi, Pierre (2005). "Graph kernels for chemical informatics". Neural Networks. 18 (8): 1093–1110. doi:10.1016/j.neunet.2005.07.009. PMID 16157471.
  4. Rahman, S. A.; Bashton, M.; Holliday, G. L.; Schrader, R.; Thornton, J. M. (2009). "Small Molecule Subgraph Detector (SMSD) toolkit". Journal of Cheminformatics. 1 (12): 12. doi:10.1186/1758-2946-1-12. PMC 2820491. PMID 20298518.
  5. Kubinyi, H. (1998). "Similarity and Dissimilarity: A Medicinal Chemist's View". Perspectives in Drug Discovery and Design. 9–11: 225–252. doi:10.1023/A:1027221424359.
  6. Martin, Y. C.; Kofron, J. L.; Traphagen, L. M. (2002). "Do structurally similar molecules have similar biological activity?". J. Med. Chem. 45 (19): 4350–4358. doi:10.1021/jm020155c. PMID 12213076.
  7. Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. (2002). "Reoptimization of MDL Keys for Use in Drug Discovery". J. Chem. Inf. Comput. Sci. 42 (6): 1273–1280. doi:10.1021/ci010132r. PMID 12444722.
  8. "Daylight Chemical Information Systems Inc". Archived from the original on 2012-12-05. Retrieved 2022-07-19.
  9. "Barnard Chemical Information Ltd". Archived from the original on 2008-10-11.
  10. "Tripos Inc". Archived from the original on 2012-04-19. Retrieved 2022-07-19.
  11. Maggiora, G.; Vogt, M.; Stumpfe, D.; Bajorath, J. (2014). "Molecular Similarity in Medicinal Chemistry". J. Med. Chem. 57 (8): 3186–3204. doi:10.1021/jm401411z. PMID 24151987. Retrieved 2023-11-13.