GeNMR

[Figure: GeNMR method flow chart]
[Figure: Example of GeNMR output page]

The GeNMR method (GEnerate NMR structures) is the first fully automated, template-based method of protein structure determination that uses both NMR chemical shifts and NOE-based distance restraints. [1]

In addition to the template-based approach, the GeNMR web server also offers an ab initio protein folding mode that starts folding from an extended structure. The server produces an ensemble of PDB coordinates in 20 minutes to 4 hours, depending on protein size, server load, the quality and type of experimental information, and the selected protocol options. The GeNMR web server is composed of two parts: a front-end web interface (written in Perl and HTML) and a back end consisting of eight different alignment, structure generation and structure optimization programs along with three local databases.

Input

GeNMR accepts and processes backbone and side chain 1H, 13C or 15N chemical shift data in almost any combination (HA only, HN only, HA+HN only, HA+HN+sidechain H, CA only, CA+CB only, CA+CO only, HA+CA+CB, HN+CA+CB, HN+15N only, HN+15N+CA, HN+15N+CA+CB, etc.). This allows GeNMR to handle everything from small peptides (where only H shifts are typically measured) to large proteins (where only N or C shifts might be available). The input files must include chemical shift data in NMR-STAR 2.1 format and distance restraints in XPLOR/CNS format. The minimum sequence length is 30 residues.
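The input constraints above can be illustrated with a small pre-submission check. This is only a sketch: the function name `check_submission`, the tuple layout for shifts and the element test on the first letter of the atom name are assumptions made for this example and are not part of GeNMR.

```python
from typing import Dict, List, Tuple

MIN_SEQUENCE_LENGTH = 30            # minimum length stated in the text
ALLOWED_ELEMENTS = {"H", "C", "N"}  # 1H, 13C and 15N shifts

# Hypothetical in-memory layout: (residue number, atom name, shift in ppm)
Shift = Tuple[int, str, float]


def check_submission(sequence: str, shifts: List[Shift]) -> Dict[str, object]:
    """Report whether a sequence and its shift list look acceptable."""
    problems: List[str] = []
    if len(sequence) < MIN_SEQUENCE_LENGTH:
        problems.append(
            f"sequence has {len(sequence)} residues; "
            f"at least {MIN_SEQUENCE_LENGTH} are required"
        )
    for residue, atom, _ppm in shifts:
        # Atom names such as HA, HN, CA, CB, CO and N start with the element.
        if not atom or atom[0].upper() not in ALLOWED_ELEMENTS:
            problems.append(f"unsupported atom name: {atom!r}")
        if not 1 <= residue <= len(sequence):
            problems.append(f"residue {residue} is outside the sequence")
    atom_types = sorted({atom.upper() for _, atom, _ in shifts})
    return {"ok": not problems, "atom_types": atom_types, "problems": problems}


report = check_submission("M" + "A" * 29, [(2, "HA", 4.32), (2, "CA", 52.5)])
print(report["ok"], report["atom_types"])
```

The check deliberately does not require any particular combination of atom types, since the text states that almost any combination is accepted.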

Output

The output for a typical GeNMR structure calculation consists of a user-defined set of lowest energy PDB coordinates in a simple, downloadable text format. In addition, details about the overall energy score (prior to and following energy minimization) and chemical shift correlations (between the observed and calculated shifts) are provided at the top of the output page. If the score fails to decrease below a certain threshold, a warning is printed at the top of the page.
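The selection and warning logic described above might look roughly like the following. The function name `select_models`, the model labels and the threshold value are hypothetical; the article does not state the actual threshold GeNMR uses.

```python
from typing import Dict, List, Optional, Tuple


def select_models(
    energies: Dict[str, float], n_models: int, warn_threshold: float
) -> Tuple[List[str], Optional[str]]:
    """Pick the n lowest-energy models and flag a poorly converged run."""
    if n_models < 1:
        raise ValueError("n_models must be at least 1")
    # Lower energy is better, so sort ascending and keep the first n.
    ranked = sorted(energies, key=lambda name: energies[name])
    chosen = ranked[:n_models]
    warning = None
    if not chosen or energies[chosen[0]] > warn_threshold:
        warning = "Warning: score did not fall below the threshold"
    return chosen, warning


# Illustrative energies only; units and scale are assumptions.
models, note = select_models({"m1": -120.0, "m2": -150.0, "m3": -90.0}, 2, -100.0)
print(models, note)
```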

Sub-programs

The processing logic used in GeNMR is summarized in the GeNMR method flow chart. GeNMR makes use of a number of well-known programs and databases. These include Proteus2 to perform structural modeling, PREDITOR to calculate torsion angles from chemical shifts, PPT-DB for comparative modeling and alignment, and CS23D to calculate protein structures from chemical shifts only. GeNMR also uses several well-known external programs, including Rosetta for ab initio folding without NOEs and XPLOR-NIH for NOE-based simulated annealing and refinement. A more complete list of GeNMR sub-programs is given on the CS23D page.

Homology modelling

GeNMR uses homology modeling and sequence/structure threading to rapidly generate a first-pass model of the query protein. The use of homology modeling/threading in GeNMR allows a considerable speed-up in its structure calculations since homology models can often be generated and refined in a minute or two.

Genetic algorithm

GeNMR also makes use of genetic algorithms to allow configurational sampling and structural refinement using non-differentiable scores, such as ShiftX chemical shift scores. GeNMR's genetic algorithm creates a population of initial structures and then uses combinations of mutations, cross-overs, segment swaps and writhe movements to comprehensively sample conformation space. The 25 lowest energy structures are then selected, duplicated and carried to the next round of conformational sampling.
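A toy version of such a genetic algorithm is sketched below. It is not GeNMR's implementation: the torsion-angle encoding, the operator details, the population size and the number of rounds are all assumptions, and writhe movements are omitted. It only shows how a population can be refined against an arbitrary, non-differentiable score while the 25 best structures are duplicated into the next round.

```python
import random
from typing import Callable, List

Conformation = List[float]  # backbone torsion angles in degrees (assumed encoding)


def mutate(conf: Conformation, rng: random.Random, step: float = 30.0) -> Conformation:
    """Perturb one randomly chosen angle, wrapping into [-180, 180)."""
    child = conf[:]
    i = rng.randrange(len(child))
    child[i] = ((child[i] + rng.uniform(-step, step) + 180.0) % 360.0) - 180.0
    return child


def crossover(a: Conformation, b: Conformation, rng: random.Random) -> Conformation:
    """Single-point cross-over: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def segment_swap(a: Conformation, b: Conformation, rng: random.Random,
                 length: int = 4) -> Conformation:
    """Copy a short contiguous segment from parent b into parent a."""
    start = rng.randrange(0, len(a) - length + 1)
    child = a[:]
    child[start:start + length] = b[start:start + length]
    return child


def evolve(score: Callable[[Conformation], float], n_angles: int = 20,
           pop_size: int = 100, keep: int = 25, rounds: int = 50,
           seed: int = 0) -> List[Conformation]:
    """Return the `keep` lowest-scoring conformations (n_angles must be >= 4)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_angles)]
                  for _ in range(pop_size)]
    for _ in range(rounds):
        survivors = sorted(population, key=score)[:keep]
        population = [s[:] for s in survivors]  # duplicate the survivors
        while len(population) < pop_size:
            a, b = rng.sample(survivors, 2)
            op = rng.choice(("mutate", "crossover", "swap"))
            if op == "mutate":
                child = mutate(a, rng)
            elif op == "crossover":
                child = crossover(a, b, rng)
            else:
                child = segment_swap(a, b, rng)
            population.append(child)
    return sorted(population, key=score)[:keep]
```

Because the survivors are copied unchanged into each new round, the best score can never get worse from one round to the next. Any callable that maps a conformation to a number can serve as `score`, which is the property that makes this kind of search usable with scores that have no gradient.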

Scoring functions

The potential functions used in GeNMR are derived from those used in CS23D and Proteus2. The knowledge-based potentials include information on predicted/known secondary structure, radius of gyration, hydrogen bond energies, number of hydrogen bonds, allowed backbone and side chain torsion angles, atom contact radii (bump checks), disulfide bonding information and a modified threading energy based on the Bryant and Lawrence potential. The chemical shift component of the GeNMR potential uses weighted correlation coefficients calculated between the observed and SHIFTX calculated shifts of the structure being refined.
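The chemical shift component can be pictured as a weighted average of per-nucleus correlation coefficients. The sketch below assumes plain Pearson correlation and caller-supplied weights; the actual weights and any further terms used by GeNMR are not given in this article, and the names `pearson` and `shift_score` are illustrative.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def shift_score(observed: Dict[str, List[float]],
                calculated: Dict[str, List[float]],
                weights: Dict[str, float]) -> float:
    """Weighted mean of per-nucleus correlations (1.0 is perfect agreement)."""
    total, total_weight = 0.0, 0.0
    for nucleus, weight in weights.items():
        if nucleus not in observed or nucleus not in calculated:
            continue  # skip nuclei for which no shifts were supplied
        total += weight * pearson(observed[nucleus], calculated[nucleus])
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no overlapping nuclei to score")
    return total / total_weight
```

In a refinement loop, `calculated` would come from a shift predictor run on the current model, and a higher value of `shift_score` would indicate better agreement with experiment.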

Calculation scenarios

GeNMR can currently accommodate six calculation scenarios:

  1. chemical shift only—query has homologue in database;
  2. chemical shift only—query has no homologue in database;
  3. NOE only—query has homologue in database;
  4. NOE only—query has no homologue in database;
  5. NOE and chemical shift—query has homologue in database;
  6. NOE and chemical shift—query has no homologue in database.
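The six scenarios are the combinations of the available data type (shifts only, NOEs only, or both) with the presence or absence of a homologue. A minimal sketch of that mapping, using a hypothetical helper `classify_scenario` that returns the list number above:

```python
def classify_scenario(has_shifts: bool, has_noes: bool, has_homologue: bool) -> int:
    """Map the available inputs to the scenario number in the list above."""
    if not (has_shifts or has_noes):
        raise ValueError("at least one of shifts or NOEs is required")
    if has_shifts and has_noes:
        base = 5
    elif has_noes:
        base = 3
    else:
        base = 1
    # Odd numbers have a homologue in the database; even numbers do not.
    return base if has_homologue else base + 1


print(classify_scenario(has_shifts=True, has_noes=False, has_homologue=False))
```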

Related Research Articles

Structural alignment: Aligning molecular sequences using sequence and structural information

Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, the results should be treated with caution as evidence for shared evolutionary ancestry because of the possible confounding effects of convergent evolution, by which multiple unrelated amino acid sequences converge on a common tertiary structure.

Nuclear magnetic resonance spectroscopy: Laboratory technique

Nuclear magnetic resonance spectroscopy, most commonly known as NMR spectroscopy or magnetic resonance spectroscopy (MRS), is a spectroscopic technique to observe local magnetic fields around atomic nuclei. The sample is placed in a magnetic field and the NMR signal is produced by excitation of the nuclei in the sample with radio waves into nuclear magnetic resonance, which is detected with sensitive radio receivers. The intramolecular magnetic field around an atom in a molecule changes the resonance frequency, thus giving access to details of the electronic structure of a molecule and its individual functional groups. As the fields are unique or highly characteristic to individual compounds, in modern organic chemistry practice, NMR spectroscopy is the definitive method to identify monomolecular organic compounds.

Protein threading, also known as fold recognition, is a method of protein modeling which is used to model those proteins which have the same fold as proteins of known structures, but do not have homologous proteins with known structure. It differs from the homology modeling method of structure prediction as it is used for proteins which do not have their homologous protein structures deposited in the Protein Data Bank (PDB), whereas homology modeling is used for those proteins which do. Threading works by using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein which one wishes to model.

Nuclear magnetic resonance spectroscopy of proteins is a field of structural biology in which NMR spectroscopy is used to obtain information about the structure and dynamics of proteins, and also nucleic acids, and their complexes. The field was pioneered by Richard R. Ernst and Kurt Wüthrich at the ETH, and by Ad Bax, Marius Clore, Angela Gronenborn at the NIH, and Gerhard Wagner at Harvard University, among others. Structure determination by NMR spectroscopy usually consists of several phases, each using a separate set of highly specialized techniques. The sample is prepared, measurements are made, interpretive approaches are applied, and a structure is calculated and validated.

Homology modeling: Method of protein structure prediction using other known proteins

Homology modeling, also known as comparative modeling of proteins, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein. Homology modeling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. Protein structures are more conserved than protein sequences amongst homologues, but sequences falling below 20% sequence identity can have very different structures.

The Re-referenced Protein Chemical shift Database (RefDB) is an NMR spectroscopy database of carefully corrected or re-referenced chemical shifts, derived from the BioMagResBank (BMRB). The database was assembled by using a structure-based chemical shift calculation program to calculate expected protein 1H, 13C and 15N chemical shifts from X-ray or NMR coordinate data of previously assigned proteins reported in the BMRB. The comparison is automatically performed by a program called SHIFTCOR. The RefDB database currently provides reference-corrected chemical shift data on more than 2000 assigned peptides and proteins. Data from the database indicates that nearly 25% of BMRB entries with 13C protein assignments and 27% of BMRB entries with 15N protein assignments require significant chemical shift reference readjustments. Additionally, nearly 40% of protein entries deposited in the BioMagResBank appear to have at least one assignment error. Users may download, search or browse the database through a number of methods available through the RefDB website. RefDB provides a standard chemical shift resource for biomolecular NMR spectroscopists wishing to derive or compute chemical shift trends in peptides and proteins.

SHIFTCOR is a freely available web server as well as a stand-alone computer program for protein chemical shift re-referencing. Chemical shift referencing is a particularly widespread problem in biomolecular NMR, with up to 25% of existing NMR chemical shift assignments being improperly referenced. Some of these referencing problems can lead to systematic errors of between 1.0 and 2.5 ppm. Errors of this magnitude can play havoc with any attempt to compare assignments between proteins or to structurally interpret chemical shifts. Identifying which proteins are mis-assigned or improperly referenced can be challenging, as can correcting the errors once they are found. The SHIFTCOR program was designed to assist with identifying and fixing these chemical shift referencing problems. Specifically, it compares, identifies, corrects and re-references 1H, 13C and 15N backbone chemical shifts of peptides and proteins by comparing the observed chemical shifts with the predicted chemical shifts derived from the 3D structure of the protein(s) of interest [1]. The predicted chemical shifts are calculated using the ShiftX program. The SHIFTCOR program was originally used to construct a database of properly re-referenced protein chemical shift assignments called RefDB. RefDB is a web-accessible database of more than 2000 correctly referenced protein chemical shift assignments. While originally available as a stand-alone program only, SHIFTCOR has since been released for general use as a web server.
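The idea of comparing observed with structure-predicted shifts can be reduced to a very simple sketch: estimate a systematic offset as the mean difference for one nucleus type and subtract it when it is large. This is a simplification for illustration, not SHIFTCOR's algorithm; the function name `rereference` and the `min_offset` cut-off are assumptions.

```python
from statistics import mean
from typing import List, Tuple


def rereference(observed: List[float], predicted: List[float],
                min_offset: float = 0.5) -> Tuple[List[float], float]:
    """Return (corrected shifts, applied offset in ppm) for one nucleus type."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need matching, non-empty shift lists")
    # A referencing error shifts every value by roughly the same amount,
    # so the mean observed-minus-predicted difference estimates it.
    offset = mean(o - p for o, p in zip(observed, predicted))
    if abs(offset) < min_offset:
        return list(observed), 0.0  # too small to be treated as a referencing error
    return [o - offset for o in observed], offset
```

A production tool would also need to separate referencing offsets from individual mis-assignments, which a plain mean cannot do.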

Random coil index: Protocol in biochemistry

Random coil index (RCI) predicts protein flexibility by calculating an inverse weighted average of backbone secondary chemical shifts and predicting values of model-free order parameters as well as per-residue RMSD of NMR and molecular dynamics ensembles from this parameter.

The HH-suite is an open-source software package for sensitive protein sequence searching. It contains programs that can search for similar protein sequences in protein sequence databases. Sequence searches are a standard tool in modern biology with which the function of unknown proteins can be inferred from the functions of proteins with similar sequences. HHsearch and HHblits are two main programs in the package and the entry point to its search function, the latter being a faster iteration. HHpred is an online server for protein structure prediction that uses homology information from HH-suite.

Triple resonance experiments are a set of multi-dimensional nuclear magnetic resonance spectroscopy (NMR) experiments that link three types of atomic nuclei, most typically consisting of 1H, 15N and 13C. These experiments are often used to assign specific resonance signals to specific atoms in an isotopically-enriched protein. The technique was first described in papers by Ad Bax, Mitsuhiko Ikura and Lewis Kay in 1990, and further experiments were then added to the suite of experiments. Many of these experiments have since become the standard set of experiments used for sequential assignment of NMR resonances in the determination of protein structure by NMR. They are now an integral part of solution NMR study of proteins, and they may also be used in solid-state NMR.

Protein Structure Evaluation Suite & Server: System for validating protein structures

Protein Structure Evaluation Suite & Server (PROSESS) is a freely available web server for protein structure validation. It has been designed at the University of Alberta to assist with the process of evaluating and validating protein structures solved by NMR spectroscopy.

CS23D

CS23D is a web server to generate 3D structural models from NMR chemical shifts. CS23D combines maximal fragment assembly with chemical shift threading, de novo structure generation, chemical shift-based torsion angle prediction, and chemical shift refinement. CS23D makes use of RefDB and ShiftX.

Chemical shift index: Laboratory technique

The chemical shift index, or CSI, is a widely employed technique in protein nuclear magnetic resonance spectroscopy that can be used to display and identify the location as well as the type of protein secondary structure found in proteins using only backbone chemical shift data. The technique was invented by David S. Wishart in 1992 for analyzing 1Hα chemical shifts and then later extended by him in 1994 to incorporate 13C backbone shifts. The original CSI method makes use of the fact that 1Hα chemical shifts of amino acid residues in helices tend to be shifted upfield relative to their random coil values and downfield in beta strands. Similar kinds of upfield/downfield trends are also detectable in backbone 13C chemical shifts.
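The upfield/downfield rule for 1Hα shifts can be written as a three-state index. In the sketch below the random coil value is supplied by the caller, the ±0.1 ppm tolerance is treated as an adjustable assumption, and the function name `csi_index` is illustrative; the smoothing step that turns per-residue indices into secondary structure segments is not shown.

```python
def csi_index(observed_ha: float, random_coil_ha: float,
              tolerance: float = 0.1) -> int:
    """Return -1 (upfield, helix-like), +1 (downfield, strand-like) or 0."""
    delta = observed_ha - random_coil_ha
    if delta < -tolerance:
        return -1
    if delta > tolerance:
        return 1
    return 0


# Illustrative values in ppm; the random coil reference is an assumed input.
for shift in (4.00, 4.40, 4.90):
    print(shift, csi_index(shift, random_coil_ha=4.35))
```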

Protein chemical shift prediction is a branch of biomolecular nuclear magnetic resonance spectroscopy that aims to accurately calculate protein chemical shifts from protein coordinates. Protein chemical shift prediction was first attempted in the late 1960s using semi-empirical methods applied to protein structures solved by X-ray crystallography. Since that time protein chemical shift prediction has evolved to employ much more sophisticated approaches including quantum mechanics, machine learning and empirically derived chemical shift hypersurfaces. The most recently developed methods exhibit remarkable precision and accuracy.

Nuclear magnetic resonance chemical shift re-referencing is a chemical analysis method for chemical shift referencing in biomolecular nuclear magnetic resonance (NMR). It has been estimated that up to 20% of 13C and up to 35% of 15N shift assignments are improperly referenced. Given that the structural and dynamic information contained within chemical shifts is often quite subtle, it is critical that protein chemical shifts be properly referenced so that these subtle differences can be detected. Fundamentally, the problem with chemical shift referencing comes from the fact that chemical shifts are relative frequency measurements rather than absolute frequency measurements. Because of the historic problems with chemical shift referencing, chemical shifts are perhaps the most precisely measurable but the least accurately measured parameters in all of NMR spectroscopy.

Protein chemical shift re-referencing is a post-assignment process of adjusting the assigned NMR chemical shifts to match IUPAC and BMRB recommended standards in protein chemical shift referencing. In NMR chemical shifts are normally referenced to an internal standard that is dissolved in the NMR sample. These internal standards include tetramethylsilane (TMS), 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS) and trimethylsilyl propionate (TSP). For protein NMR spectroscopy the recommended standard is DSS, which is insensitive to pH variations. Furthermore, the DSS 1H signal may be used to indirectly reference 13C and 15N shifts using a simple ratio calculation [1]. Unfortunately, many biomolecular NMR spectroscopy labs use non-standard methods for determining the 1H, 13C or 15N “zero-point” chemical shift position. This lack of standardization makes it difficult to compare chemical shifts for the same protein between different laboratories. It also makes it difficult to use chemical shifts to properly identify or assign secondary structures or to improve their 3D structures via chemical shift refinement. Chemical shift re-referencing offers a means to correct these referencing errors and to standardize the reporting of protein chemical shifts across laboratories.
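The "simple ratio calculation" for indirect referencing multiplies the absolute 1H frequency of the DSS signal by a fixed frequency ratio to obtain the zero-ppm frequency of the other nucleus. The sketch below uses the ratios commonly quoted for DSS-based referencing; treat them as values to verify against the IUPAC recommendations, and the function names as illustrative.

```python
# Frequency ratios relative to the 1H signal of DSS (values to be verified).
XI = {"13C": 0.251449530, "15N": 0.101329118}


def zero_frequency_mhz(dss_1h_mhz: float, nucleus: str) -> float:
    """Zero-ppm frequency of `nucleus` from the measured DSS 1H frequency."""
    return dss_1h_mhz * XI[nucleus]


def shift_ppm(observed_mhz: float, zero_mhz: float) -> float:
    """Chemical shift of a signal relative to the zero-ppm frequency."""
    return (observed_mhz - zero_mhz) / zero_mhz * 1e6


# Example with an assumed DSS 1H frequency of exactly 600 MHz.
zero_13c = zero_frequency_mhz(600.0, "13C")
print(round(zero_13c, 6))
```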

PREDITOR is a freely available web server for the prediction of protein torsion angles from chemical shifts. For many years it has been known that protein chemical shifts are sensitive to protein secondary structure, which, in turn, is sensitive to backbone torsion angles. Torsion angles are internal coordinates that can be used to describe the conformation of a polypeptide chain. They can also be used as constraints to help determine or refine protein structures via NMR spectroscopy. In proteins there are four major torsion angles of interest: phi, psi, omega and chi-1. Traditionally protein NMR spectroscopists have used vicinal J-coupling information and the Karplus relation to determine approximate backbone torsion angle constraints for phi and chi-1 angles. However, several studies in the early 1990s pointed out the strong relationship between 1H and 13C chemical shifts and torsion angles, especially with backbone phi and psi angles. Later a number of other papers pointed out additional chemical shift relationships with chi-1 and even omega angles. PREDITOR was designed to exploit these experimental observations and to help NMR spectroscopists easily predict protein torsion angles from chemical shift assignments. Specifically, PREDITOR accepts protein sequence and/or chemical shift data as input and generates torsion angle predictions for phi, psi, omega and chi-1 angles. The algorithm that PREDITOR uses combines sequence alignment, chemical shift alignment and a number of related chemical shift analysis techniques to predict torsion angles. PREDITOR is unusually fast and exhibits a very high level of accuracy. In a series of tests 88% of PREDITOR’s phi/psi predictions were within 30 degrees of the correct values, 84% of chi-1 predictions were correct and 99.97% of PREDITOR’s predicted omega angles were correct.
PREDITOR also estimates the torsion angle errors so that its torsion angle constraints can be used with standard protein structure refinement software, such as CYANA, CNS, XPLOR and AMBER. PREDITOR also supports automated protein chemical shift re-referencing and the prediction of proline cis/trans states. PREDITOR is not the only torsion angle prediction software available. Several other computer programs including TALOS, TALOS+ and DANGLE have also been developed to predict backbone torsion angles from protein chemical shifts. These stand-alone programs exhibit similar prediction performance to PREDITOR but are substantially slower.

ShiftX is a freely available web server for rapidly calculating protein chemical shifts from protein X-ray coordinates. Protein chemical shift prediction is particularly useful in verifying protein chemical shift assignments, adjusting mis-referenced chemical shifts, refining NMR protein structures and assisting with the NMR assignment of unassigned proteins that have had their structures determined by either X-ray or NMR methods.

The Biological Magnetic Resonance Data Bank (BMRB) is an open access repository of nuclear magnetic resonance (NMR) spectroscopic data from peptides, proteins, nucleic acids and other biologically relevant molecules. The database is operated by the University of Wisconsin–Madison and is supported by the National Library of Medicine. The BMRB is part of the Research Collaboratory for Structural Bioinformatics and, since 2006, has been a partner in the Worldwide Protein Data Bank (wwPDB). The repository accepts NMR spectral data from laboratories around the world and, once the data is validated, makes it available online at the BMRB website. The database also has an FTP site, where data can be downloaded in bulk. The BMRB has two mirror sites, one at the Protein Data Bank Japan (PDBj) at Osaka University and one at the Magnetic Resonance Research Center (CERM) at the University of Florence in Italy. The site in Japan accepts and processes data depositions.

David S. Wishart is a Canadian researcher and a Distinguished University Professor in the Department of Biological Sciences and the Department of Computing Science at the University of Alberta. Wishart also holds cross appointments in the Faculty of Pharmacy and Pharmaceutical Sciences and the Department of Laboratory Medicine and Pathology in the Faculty of Medicine and Dentistry. Additionally, Wishart holds a joint appointment in metabolomics at the Pacific Northwest National Laboratory in Richland, Washington. Wishart is well known for his pioneering contributions to the fields of protein NMR spectroscopy, bioinformatics, cheminformatics and metabolomics. In 2011, Wishart founded the Metabolomics Innovation Centre (TMIC), which is Canada's national metabolomics laboratory.

References

  1. Berjanskii M; Tang P; Liang J; Cruz JA; Zhou J; Zhou Y; Bassett E; MacDonell C; Lu P; Lin G; Wishart DS (April 30, 2009). "GeNMR: a web server for rapid NMR-based protein structure determination". Nucleic Acids Res. 37 (Web Server issue): W670-7. doi:10.1093/nar/gkp280. PMC 2703936. PMID 19406927.