Scoring functions for docking

Docking glossary
Receptor or host or lock
The "receiving" molecule, most commonly a protein or other biopolymer.
Ligand or guest or key
The complementary partner molecule which binds to the receptor. Ligands are most often small molecules but could also be another biopolymer.
Docking
Computational simulation of a candidate ligand binding to a receptor.
Binding mode
The orientation of the ligand relative to the receptor as well as the conformation of the ligand and receptor when bound to each other.
Pose
A candidate binding mode.
Scoring
The process of evaluating a particular pose by counting the number of favorable intermolecular interactions such as hydrogen bonds and hydrophobic contacts.
Ranking
The process of classifying which ligands are most likely to interact favorably with a particular receptor, based on the predicted free energy of binding.
Docking assessment (DA)
Procedure to quantify the predictive capability of a docking protocol.

In the fields of computational chemistry and molecular modelling, scoring functions are mathematical functions used to approximately predict the binding affinity between two molecules after they have been docked. Most commonly one of the molecules is a small organic compound such as a drug and the second is the drug's biological target such as a protein receptor. [1] Scoring functions have also been developed to predict the strength of intermolecular interactions between two proteins [2] or between protein and DNA. [3]
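
Binding affinity is usually reported either as a dissociation constant or as a standard binding free energy. The two are related by ΔG° = RT ln(Kd / 1 M). The Python sketch below converts one into the other. The function name and the default temperature of 298.15 K are illustrative assumptions, not part of any particular scoring function.

```python
import math

GAS_CONSTANT = 8.314462618   # J/(mol*K)
JOULES_PER_KCAL = 4184.0

def binding_free_energy_kcal(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L.

    Uses dG = R*T*ln(Kd / 1 M); a smaller Kd (tighter binding) gives a more
    negative free energy.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT * temperature_k * math.log(kd_molar) / JOULES_PER_KCAL

# A hypothetical ligand with Kd = 1 nM
print(round(binding_free_energy_kcal(1e-9), 2))
```

For a 1 nM ligand at room temperature this gives roughly -12.3 kcal/mol, which is the scale of quantity a scoring function tries to approximate.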

Utility

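Scoring functions are widely used in drug discovery and other molecular modelling applications. [4] These include virtual screening of small-molecule databases to identify candidate ligands for a protein target, [5] de novo design of novel small molecules that bind to a target, [6] and lead optimization of screening hits to improve their affinity and selectivity. [7]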

A potentially more reliable, but much more computationally demanding, alternative to scoring functions is free energy perturbation calculation. [8]

Prerequisites

Scoring functions are normally parameterized (or trained) against a data set consisting of experimentally determined binding affinities between molecular species similar to the species that one wishes to predict.
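
As a rough illustration of what parameterizing against experimental affinities can mean, the sketch below fits the weights of a linear model by ordinary least squares in plain Python. The descriptor columns (an intercept, a hydrogen-bond count, a rotatable-bond count) and all numbers are made up for the example. Real scoring functions use their own terms and fitting procedures.

```python
def fit_weights(features, affinities):
    """Least-squares weights w minimising sum((X w - y)^2) via the normal equations."""
    n = len(features[0])
    # Normal equations A w = b, with A = X^T X and b = X^T y.
    a = [[sum(row[i] * row[j] for row in features) for j in range(n)]
         for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(features, affinities))
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - tail) / a[i][i]
    return w

# Hypothetical training set: [intercept, H-bond count, rotatable bonds] per complex
X = [[1, 2, 1], [1, 0, 3], [1, 4, 0], [1, 1, 1]]
# Made-up "experimental" binding free energies for the same complexes
y = [-1.7, -0.1, -3.0, -1.2]

weights = fit_weights(X, y)
print([round(w, 3) for w in weights])
```

The training values here were generated from known weights, so the fit recovers them exactly. With real measurements the fit is only approximate, and the result depends on how similar the training complexes are to the ones being predicted.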

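For currently used methods that aim to predict the affinities of ligands for proteins, the following must first be known or predicted: the tertiary structure of the protein, the active conformation of the ligand (its three-dimensional shape when bound), and the binding mode (the orientation of the two binding partners relative to each other in the complex).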

The above information yields the three-dimensional structure of the complex. Based on this structure, the scoring function can then estimate the strength of the association between the two molecules in the complex using one of the methods outlined below. Finally, the scoring function itself may be used to help predict both the binding mode and the active conformation of the small molecule in the complex; alternatively, a simpler and computationally faster function may be used within the docking run.

Classes

There are four general classes of scoring functions: force-field, empirical, knowledge-based, and machine-learning. [9] [10] [11]

The first three types (force-field, empirical and knowledge-based) are commonly referred to as classical scoring functions. They are characterized by the assumption that the individual contributions to binding combine linearly. Because of this constraint, classical scoring functions are unable to take advantage of large amounts of training data. [35]
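
The linear-combination assumption can be written down in a few lines. The Python sketch below scores a pose as a weighted sum of interaction terms. The term names and weights are hypothetical and chosen only to show the functional form; they do not come from any published scoring function.

```python
def linear_score(terms, weights, intercept=0.0):
    """Classical-style score: intercept plus a weighted sum of interaction terms."""
    return intercept + sum(weights[name] * value for name, value in terms.items())

# Hypothetical weights (arbitrary energy units); negative means favorable
WEIGHTS = {"hbond": -1.0, "lipophilic": -0.1, "rotor": 0.5}

# Hypothetical descriptors computed for one pose
pose_terms = {"hbond": 3, "lipophilic": 10.0, "rotor": 2}

print(linear_score(pose_terms, WEIGHTS))
```

Because each term contributes independently, a model of this form cannot represent interactions between terms, which is one way to read the limitation described above.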

Refinement

Since different scoring functions are relatively co-linear, consensus scoring functions may not improve accuracy significantly. [36] This claim went somewhat against the prevailing view in the field, since previous studies had suggested that consensus scoring was beneficial. [37]
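
One simple form of consensus scoring averages the rank each ligand receives from several scoring functions. The Python sketch below assumes that lower scores are better and that there are no ties. The ligand names and scores are invented for illustration, and this is only one of several consensus schemes.

```python
def ranks(scores):
    """Map each ligand to its rank (1 = best), assuming a lower score is better."""
    ordered = sorted(scores, key=scores.get)
    return {ligand: position + 1 for position, ligand in enumerate(ordered)}

def consensus_rank(score_tables):
    """Average the per-function ranks of each ligand (rank-by-rank consensus)."""
    per_function = [ranks(table) for table in score_tables]
    ligands = per_function[0].keys()
    return {lig: sum(r[lig] for r in per_function) / len(per_function)
            for lig in ligands}

# Hypothetical scores from two scoring functions on different scales
function_a = {"lig1": -9.0, "lig2": -7.5, "lig3": -6.0}
function_b = {"lig1": -40.0, "lig2": -55.0, "lig3": -30.0}

print(consensus_rank([function_a, function_b]))
```

If the individual functions produce nearly the same ordering, as they would when strongly co-linear, the averaged ranks carry little information beyond any single function.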

A perfect scoring function would be able to predict the binding free energy between the ligand and its target. In reality, both the computational methods and the available computational resources place constraints on this goal. Methods are therefore most often selected to minimize the number of false positive and false negative ligands. Where an experimental training set of binding constants and structures is available, a simple method has been developed to refine the scoring function used in molecular docking. [38]
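
To make the false positive and false negative trade-off concrete, the Python sketch below counts both kinds of error for a given score cutoff. It assumes that lower scores indicate better predicted binding and that the set of experimentally confirmed binders is known. All names and values are hypothetical.

```python
def screening_errors(scores, actives, threshold):
    """Return (false_positives, false_negatives) for a score cutoff.

    Ligands scoring at or below the threshold are predicted to bind.
    """
    false_pos = sum(1 for lig, s in scores.items()
                    if s <= threshold and lig not in actives)
    false_neg = sum(1 for lig, s in scores.items()
                    if s > threshold and lig in actives)
    return false_pos, false_neg

# Hypothetical docking scores and experimentally confirmed binders
scores = {"a": -9.0, "b": -8.0, "c": -5.0, "d": -4.0}
actives = {"a", "c"}

for cutoff in (-7.0, -4.5):
    print(cutoff, screening_errors(scores, actives, cutoff))
```

Loosening the cutoff recovers the missed binder "c" but keeps the non-binder "b" among the predictions, so the choice of cutoff depends on which kind of error is more costly for the application.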

Related Research Articles

Structural bioinformatics: Bioinformatics subfield

Structural bioinformatics is the branch of bioinformatics that is related to the analysis and prediction of the three-dimensional structure of biological macromolecules such as proteins, RNA, and DNA. It deals with generalizations about macromolecular 3D structures such as comparisons of overall folds and local motifs, principles of molecular folding, evolution, binding interactions, and structure/function relationships, working both from experimentally solved structures and from computational models. The term structural has the same meaning as in structural biology, and structural bioinformatics can be seen as a part of computational structural biology. The main objective of structural bioinformatics is the creation of new methods of analysing and manipulating biological macromolecular data in order to solve problems in biology and generate new knowledge.

Binding site: Molecule-specific coordinate bonding area in biological systems

In biochemistry and molecular biology, a binding site is a region on a macromolecule such as a protein that binds to another molecule with specificity. The binding partner of the macromolecule is often referred to as a ligand. Ligands may include other proteins, enzyme substrates, second messengers, hormones, or allosteric modulators. The binding event is often, but not always, accompanied by a conformational change that alters the protein's function. Binding to protein binding sites is most often reversible, but can also be covalent reversible or irreversible.

Drug design: Inventive process of finding new medications based on the knowledge of a biological target

Drug design, often referred to as rational drug design or simply rational design, is the inventive process of finding new medications based on the knowledge of a biological target. The drug is most commonly an organic small molecule that activates or inhibits the function of a biomolecule such as a protein, which in turn results in a therapeutic benefit to the patient. In the most basic sense, drug design involves the design of molecules that are complementary in shape and charge to the biomolecular target with which they interact and therefore will bind to it. Drug design frequently but not necessarily relies on computer modeling techniques. This type of modeling is sometimes referred to as computer-aided drug design. Finally, drug design that relies on the knowledge of the three-dimensional structure of the biomolecular target is known as structure-based drug design. In addition to small molecules, biopharmaceuticals including peptides and especially therapeutic antibodies are an increasingly important class of drugs and computational methods for improving the affinity, selectivity, and stability of these protein-based therapeutics have also been developed.

Protein–protein interaction: Physical interactions and constructions between multiple proteins

Protein–protein interactions (PPIs) are physical contacts of high specificity established between two or more protein molecules as a result of biochemical events steered by interactions that include electrostatic forces, hydrogen bonding and the hydrophobic effect. Many are physical contacts with molecular associations between chains that occur in a cell or in a living organism in a specific biomolecular context.

Ligand (biochemistry): Substance that forms a complex with a biomolecule

In biochemistry and pharmacology, a ligand is a substance that forms a complex with a biomolecule to serve a biological purpose. The etymology stems from Latin ligare, which means 'to bind'. In protein-ligand binding, the ligand is usually a molecule which produces a signal by binding to a site on a target protein. The binding typically results in a change of conformational isomerism (conformation) of the target protein. In DNA-ligand binding studies, the ligand can be a small molecule, ion, or protein which binds to the DNA double helix. The relationship between ligand and binding partner is a function of charge, hydrophobicity, and molecular structure.

Docking (molecular): Prediction method in molecular modeling

In the field of molecular modeling, docking is a method which predicts the preferred orientation of one molecule to a second when a ligand and a target are bound to each other to form a stable complex. Knowledge of the preferred orientation in turn may be used to predict the strength of association or binding affinity between two molecules using, for example, scoring functions.

Macromolecular docking is the computational modelling of the quaternary structure of complexes formed by two or more interacting biological macromolecules. Protein–protein complexes are the most commonly attempted targets of such modelling, followed by protein–nucleic acid complexes.

Protein–ligand docking is a molecular modelling technique. The goal of protein–ligand docking is to predict the position and orientation of a ligand when it is bound to a protein receptor or enzyme. Pharmaceutical research employs docking techniques for a variety of purposes, most notably in the virtual screening of large databases of available chemicals in order to select likely drug candidates. There has been rapid development in computational ability to determine protein structure with programs such as AlphaFold, and the demand for the corresponding protein-ligand docking predictions is driving implementation of software that can find accurate models. Once the protein folding can be predicted accurately along with how the ligands of various structures will bind to the protein, the ability for drug development to progress at a much faster rate becomes possible.

Virtual screening

Virtual screening (VS) is a computational technique used in drug discovery to search libraries of small molecules in order to identify those structures which are most likely to bind to a drug target, typically a protein receptor or enzyme.

In molecular modelling, docking is a method which predicts the preferred orientation of one molecule to another when bound together in a stable complex. In the case of protein docking, the search space consists of all possible orientations of the protein with respect to the ligand. Flexible docking in addition considers all possible conformations of the protein paired with all possible conformations of the ligand.

The PDBbind database is a comprehensive collection of experimentally measured binding affinity data for the protein-ligand complexes deposited in the Protein Data Bank (PDB). It thus provides a link between energetic and structural information on protein-ligand complexes, which is of great value to studies of molecular recognition in biological systems.

AutoDock is a molecular modeling simulation software package. It is especially effective for protein-ligand docking. AutoDock 4 is available under the GNU General Public License. AutoDock is one of the most cited docking software applications in the research community. It is used by the FightAIDS@Home and OpenPandemics - COVID-19 projects run at World Community Grid to search for antivirals against HIV/AIDS and COVID-19. In February 2007, a search of the ISI Citation Index showed that more than 1,100 publications had cited the primary AutoDock method papers. As of 2009, this number had surpassed 1,200.

Computational Resources for Drug Discovery (CRDD) is one of the important in silico modules of Open Source for Drug Discovery (OSDD). The CRDD web portal provides computer resources related to drug discovery on a single platform. It provides computational resources for researchers in computer-aided drug design, a discussion forum, and resources to maintain a wiki related to drug discovery, predict inhibitors, and predict the ADME-Tox properties of molecules. One of the major objectives of CRDD is to promote open source software in the field of chemoinformatics and pharmacoinformatics.

Lead Finder is a computational chemistry tool designed for modeling protein-ligand interactions. This application is useful for conducting molecular docking studies and quantitatively assessing ligand binding and biological activity. Lead Finder offers free access to individual users, especially those in non-commercial and academic settings.

Jeffrey Skolnick is an American computational biologist. He is currently a Georgia Institute of Technology School of Biology Professor, the Director of the Center for the Study of Systems Biology, the Mary and Maisie Gibson Chair, the Georgia Research Alliance Eminent Scholar in Computational Systems Biology, the Director of the Integrative BioSystems Institute, and was previously the Scientific Advisor at Intellimedix.

Chemoproteomics entails a broad array of techniques used to identify and interrogate protein-small molecule interactions. Chemoproteomics complements phenotypic drug discovery, a paradigm that aims to discover lead compounds on the basis of alleviating a disease phenotype, as opposed to target-based drug discovery, in which lead compounds are designed to interact with predetermined disease-driving biological targets. As phenotypic drug discovery assays do not provide confirmation of a compound's mechanism of action, chemoproteomics provides valuable follow-up strategies to narrow down potential targets and eventually validate a molecule's mechanism of action. Chemoproteomics also attempts to address the inherent challenge of drug promiscuity in small molecule drug discovery by analyzing protein-small molecule interactions on a proteome-wide scale. A major goal of chemoproteomics is to characterize the interactome of drug candidates to gain insight into mechanisms of off-target toxicity and polypharmacology.

Molecular Operating Environment (MOE) is a drug discovery software platform that integrates visualization, modeling and simulations, as well as methodology development, in one package. MOE scientific applications are used by biologists, medicinal chemists and computational chemists in pharmaceutical, biotechnology and academic research. MOE runs on Windows, Linux, Unix, and macOS. Main application areas in MOE include structure-based design, fragment-based design, ligand-based design, pharmacophore discovery, medicinal chemistry applications, biologics applications, structural biology and bioinformatics, protein and antibody modeling, molecular modeling and simulations, virtual screening, cheminformatics & QSAR. The Scientific Vector Language (SVL) is the built-in command, scripting and application development language of MOE.

LeDock is a proprietary, flexible molecular docking software designed for the purpose of docking ligands with target proteins. It is available for Linux, macOS, and Windows.

Shaomeng Wang is a Chinese-American chemist who is currently the Warner-Lambert/Parke-Davis Professor in Medicine at the University of Michigan and a former Co-Editor-in-Chief of the American Chemical Society's Journal of Medicinal Chemistry. A cited expert in his field, his interests include the synthesis and design of molecules, neurological diseases, and computation and informatics. He was elected a Fellow of the National Academy of Inventors in 2014. Dr. Wang was named to the AAAS Fellows Section on Pharmaceutical Sciences in 2019, and is the recipient of the American Chemical Society's 2020 Division of Medicinal Chemistry Award.

FlexAID is a molecular docking program that can use small molecules and peptides as ligands, and proteins and nucleic acids as docking targets. As the name suggests, FlexAID supports full ligand flexibility as well as side-chain flexibility of the target. It does so using a soft scoring function based on the complementarity of the two surfaces.

References

  1. Jain AN (October 2006). "Scoring functions for protein-ligand docking". Current Protein & Peptide Science. 7 (5): 407–20. doi:10.2174/138920306778559395. PMID 17073693.
  2. Lensink MF, Méndez R, Wodak SJ (December 2007). "Docking and scoring protein complexes: CAPRI 3rd Edition". Proteins. 69 (4): 704–18. doi:10.1002/prot.21804. PMID 17918726. S2CID 25383642.
  3. Robertson TA, Varani G (February 2007). "An all-atom, distance-dependent scoring function for the prediction of protein-DNA interactions from structure". Proteins. 66 (2): 359–74. doi:10.1002/prot.21162. PMID 17078093. S2CID 24437518.
  4. Rajamani R, Good AC (May 2007). "Ranking poses in structure-based lead discovery and optimization: current trends in scoring function development". Current Opinion in Drug Discovery & Development. 10 (3): 308–15. PMID 17554857.
  5. Seifert MH, Kraus J, Kramer B (May 2007). "Virtual high-throughput screening of molecular databases". Current Opinion in Drug Discovery & Development. 10 (3): 298–307. PMID 17554856.
  6. Böhm HJ (July 1998). "Prediction of binding constants of protein ligands: a fast method for the prioritization of hits obtained from de novo design or 3D database search programs". Journal of Computer-Aided Molecular Design. 12 (4): 309–23. Bibcode:1998JCAMD..12..309B. doi:10.1023/A:1007999920146. PMID 9777490. S2CID 7474036.
  7. Joseph-McCarthy D, Baber JC, Feyfant E, Thompson DC, Humblet C (May 2007). "Lead optimization via high-throughput molecular docking". Current Opinion in Drug Discovery & Development. 10 (3): 264–74. PMID 17554852.
  8. Foloppe N, Hubbard R (2006). "Towards predictive ligand design with free-energy based computational methods?". Current Medicinal Chemistry. 13 (29): 3583–608. doi:10.2174/092986706779026165. PMID 17168725.
  9. Fenu LA, Lewis RA, Good AC, Bodkin M, Essex JW (2007). "Chapter 9: Scoring Functions: From Free-energies of Binding to Enrichment in Virtual Screening". In Dhoti H, Leach AR (eds.). Structure-Based Drug Discovery. Dordrecht: Springer. pp. 223–246. ISBN 978-1-4020-4407-6.
  10. Sotriffer C, Matter H (2011). "Chapter 7.3: Classes of Scoring Functions". In Sotriffer C (ed.). Virtual Screening: Principles, Challenges, and Practical Guidelines. Vol. 48. John Wiley & Sons, Inc. ISBN 978-3-527-63334-0.
  11. Ain QU, Aleksandrova A, Roessler FD, Ballester PJ (2015-11-01). "Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening". Wiley Interdisciplinary Reviews: Computational Molecular Science. 5 (6): 405–424. doi:10.1002/wcms.1225. PMC 4832270. PMID 27110292.
  12. Genheden S, Ryde U (May 2015). "The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities". Expert Opinion on Drug Discovery. 10 (5): 449–61. doi:10.1517/17460441.2015.1032936. PMC 4487606. PMID 25835573.
  13. Schneider N, Lange G, Hindle S, Klein R, Rarey M (January 2013). "A consistent description of HYdrogen bond and DEhydration energies in protein-ligand complexes: methods behind the HYDE scoring function". Journal of Computer-Aided Molecular Design. 27 (1): 15–29. Bibcode:2013JCAMD..27...15S. doi:10.1007/s10822-012-9626-2. PMID 23269578. S2CID 1545277.
  14. Lange G, Lesuisse D, Deprez P, Schoot B, Loenze P, Bénard D, Marquette JP, Broto P, Sarubbi E, Mandine E (November 2003). "Requirements for specific binding of low affinity inhibitor fragments to the SH2 domain of (pp60)Src are identical to those for high affinity binding of full length inhibitors". Journal of Medicinal Chemistry. 46 (24): 5184–95. doi:10.1021/jm020970s. PMID 14613321.
  15. Muegge I (October 2006). "PMF scoring revisited". Journal of Medicinal Chemistry. 49 (20): 5895–902. doi:10.1021/jm050038s. PMID 17004705.
  16. Ballester PJ, Mitchell JB (May 2010). "A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking". Bioinformatics. 26 (9): 1169–75. doi:10.1093/bioinformatics/btq112. PMC 3524828. PMID 20236947.
  17. Li H, Leung KS, Wong MH, Ballester PJ (February 2015). "Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets". Molecular Informatics. 34 (2–3): 115–26. doi:10.1002/minf.201400132. PMID 27490034. S2CID 3444365.
  18. Ashtawy HM, Mahapatra NR (2015-04-01). "A Comparative Assessment of Predictive Accuracies of Conventional and Machine Learning Scoring Functions for Protein-Ligand Binding Affinity Prediction". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 12 (2): 335–47. doi:10.1109/TCBB.2014.2351824. PMID 26357221.
  19. Zhan W, Li D, Che J, Zhang L, Yang B, Hu Y, Liu T, Dong X (March 2014). "Integrating docking scores, interaction profiles and molecular descriptors to improve the accuracy of molecular docking: toward the discovery of novel Akt1 inhibitors". European Journal of Medicinal Chemistry. 75: 11–20. doi:10.1016/j.ejmech.2014.01.019. PMID 24508830.
  20. Kinnings SL, Liu N, Tonge PJ, Jackson RM, Xie L, Bourne PE (February 2011). "A machine learning-based method to improve docking scoring functions and its application to drug repurposing". Journal of Chemical Information and Modeling. 51 (2): 408–19. doi:10.1021/ci100369f. PMC 3076728. PMID 21291174.
  21. Li H, Sze K-H, Lu G, Ballester PJ (2020-02-05). "Machine-Learning Scoring Functions for Structure-Based Drug Lead Optimization". Wiley Interdisciplinary Reviews: Computational Molecular Science. 10 (5). doi:10.1002/wcms.1465.
  22. Li L, Wang B, Meroueh SO (September 2011). "Support vector regression scoring of receptor-ligand complexes for rank-ordering and virtual screening of chemical libraries". Journal of Chemical Information and Modeling. 51 (9): 2132–8. doi:10.1021/ci200078f. PMC 3209528. PMID 21728360.
  23. Durrant JD, Friedman AJ, Rogers KE, McCammon JA (July 2013). "Comparing neural-network scoring functions and the state of the art: applications to common library screening". Journal of Chemical Information and Modeling. 53 (7): 1726–35. doi:10.1021/ci400042y. PMC 3735370. PMID 23734946.
  24. Ding B, Wang J, Li N, Wang W (January 2013). "Characterization of small molecule binding. I. Accurate identification of strong inhibitors in virtual screening". Journal of Chemical Information and Modeling. 53 (1): 114–22. doi:10.1021/ci300508m. PMC 3584174. PMID 23259763.
  25. Wójcikowski M, Ballester PJ, Siedlecki P (April 2017). "Performance of machine-learning scoring functions in structure-based virtual screening". Scientific Reports. 7: 46710. Bibcode:2017NatSR...746710W. doi:10.1038/srep46710. PMC 5404222. PMID 28440302.
  26. Ragoza M, Hochuli J, Idrobo E, Sunseri J, Koes DR (April 2017). "Protein-Ligand Scoring with Convolutional Neural Networks". Journal of Chemical Information and Modeling. 57 (4): 942–957. arXiv:1612.02751. doi:10.1021/acs.jcim.6b00740. PMC 5479431. PMID 28368587.
  27. Li H, Peng J, Leung Y, Leung KS, Wong MH, Lu G, Ballester PJ (March 2018). "The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction". Biomolecules. 8 (1): 12. doi:10.3390/biom8010012. PMC 5871981. PMID 29538331.
  28. Imrie F, Bradley AR, Deane CM (February 2021). "Generating Property-Matched Decoy Molecules Using Deep Learning". Bioinformatics. 37 (btab080): 2134–2141. doi:10.1093/bioinformatics/btab080. PMC 8352508. PMID 33532838.
  29. Adeshina YO, Deeds EJ, Karanicolas J (August 2020). "Machine learning classification can reduce false positives in structure-based virtual screening". Proceedings of the National Academy of Sciences of the United States of America. 117 (31): 18477–18488. Bibcode:2020PNAS..11718477A. doi:10.1073/pnas.2000585117. PMC 7414157. PMID 32669436.
  30. Xiong GL, Ye WL, Shen C, Lu AP, Hou TJ, Cao DS (June 2020). "Improving structure-based virtual screening performance via learning from scoring function components". Briefings in Bioinformatics. 22 (bbaa094). doi:10.1093/bib/bbaa094. PMID 32496540.
  31. Shen C, Ding J, Wang Z, Cao D, Ding X, Hou T (2019-06-27). "From Machine Learning to Deep Learning: Advances in Scoring Functions for Protein–ligand Docking". Wiley Interdisciplinary Reviews: Computational Molecular Science. 10. doi:10.1002/wcms.1429. S2CID 198336898.
  32. Yang X, Wang Y, Byrne R, Schneider G, Yang S (2019-07-11). "Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery". Chemical Reviews. 119 (18): 10520–10594. doi:10.1021/acs.chemrev.8b00728. PMID 31294972.
  33. Li H, Sze K-H, Lu G, Ballester PJ (2020-04-22). "Machine-Learning Scoring Functions for Structure-Based Virtual Screening". Wiley Interdisciplinary Reviews: Computational Molecular Science. 11. doi:10.1002/wcms.1478. S2CID 219089637.
  34. Ballester PJ (December 2019). "Selecting machine-learning scoring functions for structure-based virtual screening". Drug Discovery Today: Technologies. 32–33: 81–87. doi:10.1016/j.ddtec.2020.09.001. PMID 33386098. S2CID 224968364.
  35. Li H, Peng J, Sidorov P, Leung Y, Leung KS, Wong MH, Lu G, Ballester PJ (March 2019). "Classical scoring functions for docking are unable to exploit large volumes of structural and interaction data". Bioinformatics. Oxford, England. 35 (20): 3989–3995. doi:10.1093/bioinformatics/btz183. PMID 30873528.
  36. Englebienne P, Moitessier N (June 2009). "Docking ligands into flexible and solvated macromolecules. 4. Are popular scoring functions accurate for this class of proteins?". Journal of Chemical Information and Modeling. 49 (6): 1568–80. doi:10.1021/ci8004308. PMID 19445499.
  37. Oda A, Tsuchida K, Takakura T, Yamaotsu N, Hirono S (2006). "Comparison of consensus scoring strategies for evaluating computational models of protein-ligand complexes". Journal of Chemical Information and Modeling. 46 (1): 380–91. doi:10.1021/ci050283k. PMID 16426072.
  38. Hellgren M, Carlsson J, Ostberg LJ, Staab CA, Persson B, Höög JO (September 2010). "Enrichment of ligands with molecular dockings and subsequent characterization for human alcohol dehydrogenase 3". Cellular and Molecular Life Sciences. 67 (17): 3005–15. doi:10.1007/s00018-010-0370-2. PMID 20405162. S2CID 2391130.