Matched molecular pair analysis

Matched molecular pair analysis (MMPA) is a method in cheminformatics that compares the properties of two molecules that differ only by a single chemical transformation, such as the substitution of a hydrogen atom by a chlorine atom. Such pairs of compounds are known as matched molecular pairs (MMPs). Because the structural difference between the two molecules is small, any experimentally observed change in a physical or biological property between the members of a matched molecular pair can be interpreted more easily. The term was coined by Kenny and Sadowski in the book Chemoinformatics in Drug Discovery. [1]

Introduction

An MMP can be defined as a pair of molecules that differ by only a minor, single-point change (see Fig 1). MMPs are widely used in medicinal chemistry to study how well-defined structural modifications change compound properties, including biological activity, toxicity and environmental hazards. The single-point change in a molecular pair is termed a chemical transformation or molecular transformation, and each molecular pair is associated with a particular transformation. An example of a transformation is the replacement of one functional group by another. More specifically, a molecular transformation can be defined as the replacement of a molecular fragment having one, two or three attachment points with another fragment. Transformations that are useful in a specified context are termed "significant" transformations; for example, a transformation may systematically decrease or increase a desired property of chemical compounds. A transformation is considered statistically significant for a particular property or activity if it increases the property value more often than it decreases it, or vice versa. That is, the observed counts of increasing and decreasing pairs should differ significantly from the binomial ("no effect") distribution at a chosen significance threshold (usually a p-value of 0.05).
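The significance criterion above can be illustrated with an exact two-sided sign test. This is only a minimal sketch: the function names are hypothetical, pairs showing no change are assumed to have been discarded beforehand, and the "no effect" hypothesis is taken to be a binomial distribution with probability 0.5.

```python
from math import comb


def sign_test_p_value(n_increase: int, n_decrease: int) -> float:
    """Exact two-sided binomial (sign) test against the 'no effect' case p = 0.5."""
    n = n_increase + n_decrease
    if n == 0:
        return 1.0  # no informative pairs, so no evidence of an effect
    k = min(n_increase, n_decrease)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double it for the two-sided test and cap at 1.
    return min(1.0, 2 * tail)


def is_significant(n_increase: int, n_decrease: int, alpha: float = 0.05) -> bool:
    """True if the increase/decrease split differs from 'no effect' at level alpha."""
    return sign_test_p_value(n_increase, n_decrease) < alpha


if __name__ == "__main__":
    # 9 pairs increase the property and 1 decreases it.
    print(sign_test_p_value(9, 1), is_significant(9, 1))
    # 6 increase and 4 decrease.
    print(sign_test_p_value(6, 4), is_significant(6, 4))
```

With ten pairs, a 9 to 1 split would pass the 0.05 threshold under this test while a 6 to 4 split would not, which matches the "more often" wording of the criterion.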

Fig 1: Exemplary MMPs (differences highlighted in orange).

Significance of MMP based analysis

MMP-based analysis is attractive for computational work because MMPs can be generated algorithmically and make it possible to associate defined structural modifications, at the level of compound pairs, with changes in chemical properties, including biological activity. [2] [3] [4]

Interpretable QSAR models

MMPA is useful in quantitative structure–activity relationship (QSAR) modelling. One issue with QSAR models is that they are difficult to interpret in a chemically meaningful manner. While simple linear regression models can be fairly easy to interpret, the most powerful algorithms, such as neural networks and support vector machines, behave like "black boxes" whose predictions cannot be easily interpreted. [5] This undermines the usefulness of a QSAR model in helping the medicinal chemist make decisions. If a compound is predicted to be active against some microorganism, what are the driving factors of its activity? If it is predicted to be inactive, how can its activity be modulated? The black-box nature of the QSAR model prevents it from addressing these crucial questions. Using predicted MMPs makes it possible to interpret models and to identify which MMPs were learned by the model. [6] MMPs that are not reproduced by the model could correspond to experimental errors or to deficiencies of the model (inappropriate descriptors, too little data, etc.).[citation needed]

Analysis of MMPs can be very useful for understanding the mechanism of action. A medicinal chemist might be particularly interested in "activity cliffs". An activity cliff is a minor structural modification that changes the target activity significantly.[citation needed]

Activity Cliff

Activity cliffs are pairs or groups of compounds that are highly similar in structure but differ greatly in potency towards the same target. [7] Activity cliffs have received great attention in computational chemistry and drug discovery because they represent a discontinuity in the structure–activity relationship (SAR). [7] This discontinuity also indicates high SAR information content, because small chemical changes within a set of similar compounds lead to large changes in activity. The assessment of activity cliffs requires careful consideration of similarity and potency difference criteria. [8] [9] [10]
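The two criteria mentioned above, a similarity criterion and a potency difference criterion, can be combined in a simple check. The sketch below is illustrative only: the feature sets stand in for real molecular fingerprints, the Tanimoto coefficient is one of several possible similarity measures, and the thresholds (0.55 similarity, 2 log units of potency) are assumptions chosen for the example, not values taken from the cited works.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_activity_cliff(
    features_a: frozenset,
    features_b: frozenset,
    potency_a: float,
    potency_b: float,
    min_similarity: float = 0.55,    # assumed similarity threshold
    min_potency_diff: float = 2.0,   # assumed threshold in log units (100-fold)
) -> bool:
    """True if the pair is similar enough AND differs enough in potency."""
    similar = tanimoto(features_a, features_b) >= min_similarity
    large_gap = abs(potency_a - potency_b) >= min_potency_diff
    return similar and large_gap


if __name__ == "__main__":
    # Hypothetical feature sets for two close analogs (not real fingerprints).
    mol_a = frozenset({"ring", "amide", "linker", "OH"})
    mol_b = frozenset({"ring", "amide", "linker", "OMe"})
    print(tanimoto(mol_a, mol_b))
    print(is_activity_cliff(mol_a, mol_b, potency_a=8.5, potency_b=6.0))
```

Tightening either threshold shrinks the set of pairs that count as cliffs, which is why the choice of criteria needs the careful consideration the text describes.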

Types of MMP based analysis

Matched molecular pair analyses (MMPA) can be classified into two types: supervised and unsupervised.

Supervised MMPA

In supervised MMPA, the chemical transformations are predefined; the corresponding matched pairs are then found within the data set, and the change in the end point is computed for each transformation.[citation needed]
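A minimal sketch of the supervised workflow is given below. It assumes the compounds have already been split into a constant core and a variable substituent, which in practice would be done by a cheminformatics toolkit; the record layout, the function name and the property values are hypothetical.

```python
from statistics import mean

# Hypothetical, pre-fragmented records: (constant core, substituent, property value).
DATA = [
    ("phenyl", "H", 2.1),
    ("phenyl", "Cl", 2.8),
    ("4-pyridyl", "H", 0.7),
    ("4-pyridyl", "Cl", 1.3),
    ("4-pyridyl", "F", 0.9),
]


def deltas_for_transformation(records, old, new):
    """Property changes for every matched pair realising the transformation old -> new."""
    by_key = {(core, sub): value for core, sub, value in records}
    deltas = []
    for (core, sub), value in by_key.items():
        # A matched pair exists when the same core carries both substituents.
        if sub == old and (core, new) in by_key:
            deltas.append(by_key[(core, new)] - value)
    return deltas


if __name__ == "__main__":
    h_to_cl = deltas_for_transformation(DATA, "H", "Cl")
    print(len(h_to_cl), "pairs, mean change", round(mean(h_to_cl), 2))
```

Here the predefined transformation is the hydrogen-to-chlorine substitution; only pairs that realise exactly this change contribute to the reported mean.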

Unsupervised MMPA

Unsupervised MMPA is also known as automated MMPA. A machine learning algorithm is used to find all possible matched pairs in a data set according to a set of predefined rules. This results in much larger numbers of matched pairs and unique transformations, which are typically filtered during the process to identify those transformations that correspond to statistically significant changes in the targeted property and are supported by a reasonable number of matched pairs.[citation needed]
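The enumerate-then-filter idea can be sketched as follows. Plain grouping by a shared core stands in here for whatever pair-finding algorithm is actually used, the records are assumed to be pre-fragmented into core and substituent, and the data, the function name and the `min_pairs` cut-off are hypothetical. A statistical significance filter would normally be applied as well; only the pair-count filter is shown.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Hypothetical, pre-fragmented records: (constant core, substituent, property value).
DATA = [
    ("phenyl", "H", 2.1),
    ("phenyl", "Cl", 2.8),
    ("4-pyridyl", "H", 0.7),
    ("4-pyridyl", "Cl", 1.3),
    ("4-pyridyl", "F", 0.9),
]


def mine_transformations(records, min_pairs=2):
    """Enumerate all matched pairs, then keep transformations seen in >= min_pairs pairs."""
    by_core = defaultdict(list)
    for core, sub, value in records:
        by_core[core].append((sub, value))

    deltas = defaultdict(list)
    for members in by_core.values():
        # Sorting gives each transformation one canonical direction (alphabetical).
        for (sub_a, val_a), (sub_b, val_b) in combinations(sorted(members), 2):
            if sub_a != sub_b:
                deltas[(sub_a, sub_b)].append(val_b - val_a)

    # Report (number of pairs, mean property change) per surviving transformation.
    return {t: (len(d), mean(d)) for t, d in deltas.items() if len(d) >= min_pairs}


if __name__ == "__main__":
    for (old, new), (n, avg) in mine_transformations(DATA).items():
        print(f"{old} -> {new}: {n} pairs, mean change {avg:.2f}")
```

No transformation is specified in advance: every pair sharing a core is considered, and transformations backed by too few pairs are dropped afterwards.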

Matched molecular series

Here, instead of a pair of molecules that differ at only one point, a series of more than two molecules that differ at a single point is considered. The concept of matching molecular series was introduced by Wawer and Bajorath. [11] It is argued that a longer matched series is more likely to exhibit a preferred molecular transformation, while matched pairs exhibit only a small preference. [12]
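As a rough illustration, a matched series can be extracted by collecting all compounds that share a core and keeping only groups with more than two members. The sketch assumes pre-fragmented records; the data, the function name and the choice to report each series ordered by increasing property value are hypothetical.

```python
from collections import defaultdict

# Hypothetical, pre-fragmented records: (constant core, substituent, property value).
DATA = [
    ("phenyl", "H", 2.1),
    ("phenyl", "Cl", 2.8),
    ("4-pyridyl", "H", 0.7),
    ("4-pyridyl", "Cl", 1.3),
    ("4-pyridyl", "F", 0.9),
]


def matched_series(records, min_length=3):
    """Group compounds by core; keep groups of at least min_length substituents."""
    by_core = defaultdict(list)
    for core, sub, value in records:
        by_core[core].append((sub, value))

    series = {}
    for core, members in by_core.items():
        if len(members) >= min_length:
            # Order the substituents by increasing property value.
            ranked = sorted(members, key=lambda member: member[1])
            series[core] = [sub for sub, _ in ranked]
    return series


if __name__ == "__main__":
    print(matched_series(DATA))
```

In this toy data set only the pyridyl core carries three substituents, so it forms the single series; the phenyl core yields just a matched pair.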

Limitations

Applying MMPA across large chemical databases to optimize ligand potency is problematic because the same structural transformation may increase, decrease or not affect the potency of different compounds in the dataset. Selecting practically significant transformations from a dataset of molecules is a challenging issue in MMPA. Moreover, the effect of a particular molecular transformation can depend significantly on its chemical context. [13] [14]

Besides these, MMPA can be limited by computational resources, especially when dealing with databases of compounds with a large number of breakable bonds. Furthermore, a larger number of atoms in the variable part of the molecule leads to combinatorial explosion. To deal with this, the number of breakable bonds and the number of atoms in the variable part can be used to pre-filter the database.

References

  1. Kenny, Peter W.; Sadowski, Jens (2005). "Chapter 11: Structure Modification in Chemical Databases". In Oprea, Tudor I. (ed.). Chemoinformatics in Drug Discovery. Wiley-VCH Verlag GmbH & Co. KGaA. pp. 271–285.
  2. Griffen, Ed; Leach, Andrew G.; Robb, Graeme R.; Warner, Daniel J. (2011). "Matched molecular pairs as a medicinal chemistry tool". J. Med. Chem. 54 (22): 7739–50. doi:10.1021/jm200452d. PMID 21936582.
  3. Wassermann, A.M.; Dimova, D.; Iyer, P.; et al. (2012). "Advances in computational medicinal chemistry: matched molecular pair analysis". Drug Development Research. 73 (8): 518–527. doi:10.1002/ddr.21045. S2CID 82321850.
  4. Dossetter, Alexander G.; Griffen, Edward J.; Leach, Andrew G. (2013). "Matched molecular pair analysis in drug discovery". Drug Discovery Today. 18 (15–16): 724–731. doi:10.1016/j.drudis.2013.03.003. PMID 23557664.
  5. Cumming, J.; et al. (2013). "Chemical predictive modelling to improve compound quality". Nature Reviews Drug Discovery. 12 (12): 948–962. doi:10.1038/nrd4128. PMID 24287782. S2CID 6218976.
  6. Sushko, Yurii; Novotarskyi, Sergii; Körner, Robert; Vogt, Joachim; Abdelaziz, Ahmed; Tetko, Igor V (2014-12-11). "Prediction-driven matched molecular pairs to interpret QSARs and aid the molecular optimization process". Journal of Cheminformatics. 6 (1): 48. doi:10.1186/s13321-014-0048-0. PMC 4272757. PMID 25544551.
  7. Stumpfe, Dagmar; Hu, Huabin; Bajorath, Jürgen (2019-09-10). "Evolving Concept of Activity Cliffs". ACS Omega. 4 (11): 14360–14368. doi:10.1021/acsomega.9b02221. ISSN 2470-1343. PMC 6740043. PMID 31528788.
  8. Stumpfe, D.; Bajorath, J. (2012). "Exploring activity cliffs in medicinal chemistry". J. Med. Chem. 55 (7): 2932–2942. PMID 22236250.
  9. Stumpfe, D.; Hu, Y.; Dimova, D.; et al. (2014). "Recent progress in understanding activity cliffs and their utility in medicinal chemistry". J. Med. Chem. 57 (1): 18–28. PMID 23981118.
  10. Hu, Y.; Stumpfe, D.; Bajorath, J. (2013). "Advancing the activity cliff concept" [v1; ref status: indexed, http://f1000r.es/1wf]. F1000Res. 2: 199. doi:10.12688/f1000research.2-199.v1.
  11. Wawer, Mathias; Bajorath, Jürgen (2011). "Local Structural Changes, Global Data Views: Graphical Substructure−Activity Relationship Trailing". J. Med. Chem. 54 (8): 2944–2951. doi:10.1021/jm200026b. PMID 21443196.
  12. O'Boyle, Noel M.; Boström, Jonas; Sayle, Roger A.; Gill, Adrian (2014). "Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity". J. Med. Chem. 57 (6): 2704–2713. doi:10.1021/jm500022q. PMC 3968889. PMID 24601597.
  13. Warner, D. J.; Bridgland-Taylor, M. H.; Sefton, C. E.; Wood, D. J. (2012). "Prospective prediction of antitarget activity by matched molecular pairs analysis". Mol. Inform. 31 (5): 365–368. doi:10.1002/minf.201200020. PMID 27477265. S2CID 5430494.
  14. Hajduk, P.J.; Sauer, D.R. (2008). "Statistical analysis of the effects of common chemical substituents on ligand potency". J. Med. Chem. 51 (3): 553–64. doi:10.1021/jm070838y. PMID 18173228.