SAMPL (Statistical Assessment of the Modeling of Proteins and Ligands) is a set of community-wide blind challenges aimed at advancing computational techniques as standard predictive tools in rational drug design. [1] [2] [3] [4] [5] A broad range of biologically relevant systems of different sizes and levels of complexity, including proteins, host–guest complexes, and drug-like small molecules, has been selected to test the latest modeling methods and force fields in SAMPL. New experimental data, such as binding affinities and hydration free energies, are withheld from participants until the prediction submission deadline, so that the true predictive power of methods can be revealed. The SAMPL5 challenge, for example, contained two prediction categories: the binding affinity of host–guest systems, and the distribution coefficients of drug-like molecules between water and cyclohexane. [6] [7] Since 2008, the SAMPL challenge series has attracted interest from scientists engaged in the field of computer-aided drug design (CADD). [8] [9] [10] The current SAMPL organizers include John Chodera, Michael K. Gilson, David Mobley, and Michael Shirts. [11]
The SAMPL challenge seeks to accelerate progress in developing quantitative, accurate drug discovery tools by providing prospective validation and rigorous comparisons for computational methodologies and force fields. Computer-aided drug design methods have improved considerably over time, along with the rapid growth of high-performance computing capabilities. However, their applicability in the pharmaceutical industry is still highly limited because of insufficient accuracy. Lacking large-scale prospective validation, methods tend to suffer from over-fitting to pre-existing experimental data. To overcome this, SAMPL challenges have been organized as blind tests: each time, new datasets are carefully designed and collected from academic or industrial research laboratories, and measurements are released shortly after the prediction submission deadline. Researchers can then compare those high-quality, prospective experimental data with the submitted estimates. A key emphasis is on lessons learned, allowing participants in future challenges to benefit from modeling improvements made based on earlier challenges.
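The comparison of submitted estimates with withheld measurements is usually summarized with a few error statistics. The following is a minimal sketch of such a comparison, not the analysis code used by the SAMPL organizers; the function name `error_metrics` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def error_metrics(predicted, experimental):
    """Return RMSE, MAE, mean signed error and Pearson r for paired values.

    Both arguments are equal-length sequences of numbers in the same units
    (for example, binding free energies in kcal/mol).
    """
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least two values")

    n = len(predicted)
    diffs = [p - e for p, e in zip(predicted, experimental)]

    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    mean_signed_error = sum(diffs) / n

    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    # Pearson r is undefined when either series is constant.
    if var_p == 0 or var_e == 0:
        pearson_r = float("nan")
    else:
        pearson_r = cov / math.sqrt(var_p * var_e)

    return {
        "rmse": rmse,
        "mae": mae,
        "mean_signed_error": mean_signed_error,
        "pearson_r": pearson_r,
    }


# Hypothetical values, for illustration only (not SAMPL data).
example = error_metrics([-5.0, -7.0, -9.0], [-6.0, -8.0, -10.0])
```

Reporting several statistics together is a deliberate choice in this sketch: a correlation coefficient alone can hide a constant offset, while an error magnitude alone can hide a good ranking.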
SAMPL has historically focused on the properties of host–guest systems and drug-like small molecules. These simple model systems require considerably fewer computational resources to simulate than protein systems, and thus converge more quickly. Through careful design, these model systems can be used to isolate one particular simulation challenge or a small subset of them. [12] The past several SAMPL host–guest, hydration free energy and log D challenges revealed the limitations of generalized force fields, [13] [14] facilitated the development of solvent models, [15] [16] and highlighted the importance of properly handling protonation states and salt effects. [17] [18]
Registration and participation are free for SAMPL challenges. Beginning with SAMPL7, challenge participation data was posted on the SAMPL website, [19] as well as the GitHub page for the specific challenge. Instructions, input files and results were then provided through GitHub (earlier challenges provided content primarily through D3R for SAMPL4-5, and via other means for earlier SAMPLs). Participants were allowed to submit multiple predictions through the D3R website, either anonymously or with research affiliation. Since the SAMPL2 challenge, all participants have been invited to attend the SAMPL workshops and submit manuscripts describing their results. After a peer-review process, the resulting papers, along with the overview papers that summarize all submitted data, were published in special issues of the Journal of Computer-Aided Molecular Design. [20]
The SAMPL project was funded by the NIH (grant GM124270-01A1) for the period from September 2018 through August 2022, to allow the design of future SAMPL challenges to drive advances in the areas where they are most needed for modeling efforts. [9] [10] The effort is spearheaded by David L. Mobley (UC Irvine) with co-investigators John D. Chodera (MSKCC), Bruce C. Gibb (Tulane), and Lyle Isaacs (Maryland). Currently, challenges and workshops are run in partnership with the NIH-funded Drug Design Data Resource (D3R), but this will likely change over time as funding for the two projects is not coupled.
Funding also allowed a broadening of the scope of SAMPL. Through SAMPL6, its role had been seen as primarily focused on physical properties, with D3R handling protein-ligand challenges. However, the funded effort broadened its focus to include systems that will drive improvements in modeling, potentially including suitable protein-ligand systems. This still contrasts with D3R, which relies on donated datasets of pharmaceutical interest, whereas SAMPL challenges are designed to target specific modeling challenges.
The first SAMPL exercise, SAMPL0 (2008), [21] focused on predicting the solvation free energies of 17 small molecules. A research group at Stanford University and scientists at OpenEye Scientific Software carried out the calculations. Despite the informal format, SAMPL0 laid the groundwork for the following SAMPL challenges.
The SAMPL1 (2009) [22] and SAMPL2 (2010) [1] challenges were organized by OpenEye and continued to focus on predicting solvation free energies of drug-like small molecules. Attempts were also made to predict binding affinities, binding poses and tautomer ratios. Both challenges attracted significant participation from computational scientists and researchers in academia and industry.
Blinded data sets for host–guest binding affinities were introduced for the first time in SAMPL3 (2011-2012), [3] along with solvation free energies for small molecules and binding affinity data for 500 fragment-like trypsin inhibitors. All three host molecules were from the cucurbituril family. The SAMPL3 challenge received 103 submissions from 23 research groups worldwide. [2]
Unlike the prior three SAMPL events, the SAMPL4 exercise (2013-2014) [4] [5] was coordinated by academic researchers, with logistical support from OpenEye. Datasets in SAMPL4 consisted of binding affinities for host–guest systems and HIV integrase inhibitors, as well as hydration free energies of small molecules. Host molecules included cucurbit[7]uril (CB7) and octa-acid. The SAMPL4 hydration challenge involved 49 submissions from 19 groups. Participation in the host–guest challenge also grew significantly compared to SAMPL3. The workshop was held at Stanford University in September 2013.
The protein-ligand challenges were separated from SAMPL in SAMPL5 (2015-2016) [6] [7] and were distributed as the new Grand Challenges of the Drug Design Data Resource (D3R). [23] SAMPL5 allowed participants to make predictions of the binding affinities of three sets of host–guest systems: an acyclic CB7 derivative and two hosts from the octa-acid family. Participants were also encouraged to submit predictions for binding enthalpies. A wide array of computational methods was tested, including density functional theory (DFT), molecular dynamics, docking, and metadynamics. Distribution coefficient predictions were introduced for the first time, receiving a total of 76 submissions from 18 research groups or scientists for a set of 53 small molecules. The workshop was held in March 2016 at the University of California, San Diego as part of the D3R workshop. The top-performing methods in the host–guest challenge yielded encouraging yet imperfect correlations with experimental data, accompanied by large, systematic shifts relative to experiment. [24] [25]
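A systematic shift of the kind described above can be separated from the remaining scatter by subtracting the mean signed error before computing the root-mean-square error. The sketch below illustrates that idea under simple assumptions; the function name `split_systematic_shift` and the numbers are hypothetical and are not taken from any SAMPL5 submission.

```python
import math


def split_systematic_shift(predicted, experimental):
    """Return (shift, raw_rmse, shift_corrected_rmse) for paired values.

    shift is the mean signed error (predicted minus experimental).
    shift_corrected_rmse is the RMSE that remains after removing that
    constant offset from every prediction.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")

    n = len(predicted)
    diffs = [p - e for p, e in zip(predicted, experimental)]
    shift = sum(diffs) / n
    raw_rmse = math.sqrt(sum(d * d for d in diffs) / n)
    corrected_rmse = math.sqrt(sum((d - shift) ** 2 for d in diffs) / n)
    return shift, raw_rmse, corrected_rmse


# Hypothetical binding free energies in kcal/mol, for illustration only.
shift, raw_rmse, corrected_rmse = split_systematic_shift(
    [-8.0, -10.5, -11.5],  # predicted
    [-5.0, -7.0, -9.0],    # experimental
)
```

In this made-up example the predictions are about 3 kcal/mol too favorable on average, so most of the raw error is the offset and only a small residual remains once it is removed. A constant offset of this kind does not change the correlation with experiment, which is why a method can rank compounds well while still showing a large absolute error.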
The SAMPL6 testing systems included cucurbit[8]uril, octa-acid, tetra-endo-methyl octa-acid, and a series of fragment-like small molecules. The host–guest, conformational sampling and pKa prediction challenges of SAMPL6 are now closed. The SAMPL6 workshop was run jointly with the D3R workshop in February 2018 at the Scripps Institution of Oceanography, [26] and a SAMPL special issue of the Journal of Computer-Aided Molecular Design reported many of the results. A SAMPL6 Part II challenge focused on a small octanol-water partition coefficient prediction set and was followed by a virtual workshop on May 16, 2019 and a joint D3R/SAMPL workshop in San Diego in August 2019. A special issue or special section of JCAMD is planned to report the results. SAMPL6 inputs and results are available via the SAMPL6 GitHub repository.
SAMPL7 again included host–guest challenges and a physical property challenge. A protein-ligand binding challenge on PHIPA fragments was also included. Host–guest binding focused on several small molecules binding to octa-acid and exo-octa-acid; binding of two compounds to a series of cyclodextrin derivatives; and binding of a series of small molecules to a clip-like host known as TrimerTrip. A SAMPL7 virtual workshop took place and is available online. A SAMPL7 physical properties challenge is currently ongoing. Plans for a EuroSAMPL in-person workshop in Fall 2020 were derailed by COVID-19 and the workshop is being conducted virtually. SAMPL7 inputs and, as challenge components are completed, results are available via the SAMPL7 GitHub repository.
SAMPL8 included host–guest components on the binding of drugs of abuse to CB8, and of a series of small molecules to Gibb Deep Cavity Cavitands (GDCCs), as detailed on the SAMPL8 GitHub repository. An additional challenge focused on pKa and logD prediction for a series of drug-like molecules.
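For a compound with a single ionizable group, pKa and logD are linked by a textbook relation, which is one reason the two properties are often studied together. The sketch below applies that relation under a loud assumption: only the neutral species partitions into the organic phase, and ion pairing is ignored. The function name `log_d_monoprotic` and the example values are hypothetical, and this is not the procedure prescribed by any SAMPL challenge.

```python
import math


def log_d_monoprotic(log_p, pka, ph, kind="acid"):
    """Estimate logD at a given pH from logP and one pKa.

    Assumes a single ionizable group and that only the neutral form
    enters the organic phase:
        acid: logD = logP - log10(1 + 10**(pH - pKa))
        base: logD = logP - log10(1 + 10**(pKa - pH))
    For a base, pka is the pKa of the conjugate acid.
    """
    if kind == "acid":
        exponent = ph - pka
    elif kind == "base":
        exponent = pka - ph
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_p - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical acid with logP = 3.0 and pKa = 4.4, evaluated at pH 7.4.
example_log_d = log_d_monoprotic(3.0, 4.4, 7.4, kind="acid")
```

When the pH is far on the neutral side of the pKa, the correction term vanishes and logD approaches logP; when the pH equals the pKa, half of the compound is ionized and logD is lower than logP by log10(2), about 0.3 log units.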
SAMPL9 is in the planning stages, except that a SAMPL9 host–guest challenge on a host from Lyle Isaacs' group is currently underway. Details are available on the SAMPL9 GitHub repository.
A relatively complete list of SAMPL-related publications is maintained by the SAMPL organizers; more than 150 related papers have been published.
SAMPL is slated to continue its focus on physical property prediction, including logP and logD values, pKa prediction, host–guest binding, and other properties, as well as broadening to include a protein-ligand component. [9] Some data is planned to be collected directly by the SAMPL co-investigators (Chodera, Gibb and Isaacs), but industry partnerships and internships are also proposed. [9]