Protein chemical shift re-referencing


Protein chemical shift re-referencing is a post-assignment process of adjusting the assigned NMR chemical shifts to match the IUPAC and BMRB recommended standards for protein chemical shift referencing. In NMR, chemical shifts are normally referenced to an internal standard that is dissolved in the NMR sample. These internal standards include tetramethylsilane (TMS), 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS) and trimethylsilyl propionate (TSP). For protein NMR spectroscopy the recommended standard is DSS, which is insensitive to pH variations (unlike TSP). Furthermore, the DSS 1H signal may be used to indirectly reference 13C and 15N shifts using a simple ratio calculation [1]. Unfortunately, many biomolecular NMR spectroscopy labs use non-standard methods for determining the 1H, 13C or 15N “zero-point” chemical shift position. This lack of standardization makes it difficult to compare chemical shifts for the same protein between different laboratories. It also makes it difficult to use chemical shifts to properly identify or assign secondary structures or to improve protein 3D structures via chemical shift refinement. Chemical shift re-referencing offers a means to correct these referencing errors and to standardize the reporting of protein chemical shifts across laboratories.
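
The ratio calculation mentioned above can be illustrated with a minimal sketch. The frequency ratios (Ξ) are the values commonly quoted from the recommendations in reference [1] and should be checked against that source before use; the function names and the 600.13 MHz example frequency are illustrative assumptions, not part of any program discussed here.

```python
# Frequency ratios (Xi) relative to the DSS 1H methyl signal, as commonly
# quoted from reference [1]; verify against the original before relying on them.
XI = {
    "13C": 0.251449530,  # DSS-based carbon scale
    "15N": 0.101329118,  # liquid-ammonia-based nitrogen scale
}


def zero_ppm_frequency(dss_1h_hz: float, nucleus: str) -> float:
    """Absolute frequency (Hz) of 0 ppm for `nucleus`, given the measured
    absolute 1H frequency (Hz) of the DSS methyl signal."""
    return dss_1h_hz * XI[nucleus]


def shift_ppm(observed_hz: float, zero_hz: float) -> float:
    """Chemical shift (ppm) of a peak observed at `observed_hz`."""
    return (observed_hz - zero_hz) / zero_hz * 1e6


# Hypothetical example: DSS 1H signal measured at 600.130000 MHz.
dss_hz = 600.130000e6
c13_zero = zero_ppm_frequency(dss_hz, "13C")
n15_zero = zero_ppm_frequency(dss_hz, "15N")
```

Once the zero-point frequency of each heteronucleus is fixed in this way, every 13C or 15N peak is expressed relative to it with `shift_ppm`, so no separate carbon or nitrogen reference compound is needed in the sample.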


Importance of NMR chemical shift re-referencing in biomolecular NMR

Incorrect chemical shift referencing is a particularly acute problem in biomolecular NMR. [1] It has been estimated that up to 20% of 13C and up to 35% of 15N shift assignments are improperly referenced. [2] [3] [4] Given that the structural and dynamic information contained within chemical shifts is often quite subtle, it is critical that protein chemical shifts be properly referenced so that these subtle differences can be detected. Fundamentally, the problem with chemical shift referencing comes from the fact that chemical shifts are relative frequency measurements rather than absolute frequency measurements. Because of the historic problems with chemical shift referencing, chemical shifts are perhaps the most precisely measurable but the least accurately measured parameters in all of NMR spectroscopy. [5] [3]
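
The consequence of chemical shifts being relative measurements can be made explicit with a short worked equation. The symbols are introduced here for illustration only: ν is the observed resonance frequency, ν_ref the true frequency of the reference signal, and Δ a hypothetical error in the frequency taken as the zero point.

```latex
\delta = 10^{6}\,\frac{\nu - \nu_{\mathrm{ref}}}{\nu_{\mathrm{ref}}}
\qquad\Longrightarrow\qquad
\delta' = 10^{6}\,\frac{\nu - (\nu_{\mathrm{ref}} + \Delta)}{\nu_{\mathrm{ref}} + \Delta}
\approx \delta - 10^{6}\,\frac{\Delta}{\nu_{\mathrm{ref}}}
```

To first order in Δ/ν_ref, a mis-set zero point therefore moves every shift of that nucleus by the same constant amount. This is why a referencing error can be described, and corrected, as a single nucleus-specific offset.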

Programs for protein chemical shift re-referencing

Because of the magnitude and severity of the problems with chemical shift referencing in biomolecular NMR, a number of computer programs have been developed to help mitigate the problem (see Table 1 for a summary). The first program to comprehensively tackle chemical shift mis-referencing in biomolecular NMR was SHIFTCOR. [2]

Table 1. Summary and comparison of different chemical shift re-referencing and mis-assignment detection programs. [5]

| Program [Reference] | Detects or performs shift re-referencing | Detects gross assignment errors | Detects subtle assignment errors | Distinguishes assignment errors from referencing errors | Requires 3D structure |
|---|---|---|---|---|---|
| CheckShift [6] [7] | Yes | No | No | No | No |
| AVS [8] | No | Yes | No | No | No |
| LACS [4] [9] | Yes | Sometimes | No | No | No |
| PSSI [10] | Yes | No | No | No | No |
| SHIFTCOR [2] | Yes | Yes | Sometimes | Yes | Yes |
| PANAV [11] | Yes | Yes | Yes | Yes | No |

SHIFTCOR: A structure-based chemical shift correction program

SHIFTCOR is an automated protein chemical shift correction program that uses statistical methods to compare and correct predicted NMR chemical shifts (derived from the 3D structure of the protein) relative to an input set of experimentally measured chemical shifts. SHIFTCOR uses several simple statistical approaches and pre-determined cut-off values to identify and correct potential referencing, assignment and typographical errors. SHIFTCOR identifies potential chemical shift referencing problems by comparing the difference between the average value of each set of observed backbone (1Hα, 13Cα, 13Cβ, 13CO, 15N and 1HN) shifts and their corresponding predicted chemical shifts. The difference between these two averages results in a nucleus-specific chemical shift offset or reference correction (i.e. one for 1H, one for 13C and one for 15N). In order to ensure that certain extreme outliers do not unduly bias these average offset values, the average of the observed shifts is only calculated after excluding potential mis-assignments or typographical errors. [2]
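
The averaging step described above can be sketched as follows. This is not SHIFTCOR's actual code: the median-based outlier filter, the `cutoff` value and the toy data are assumptions made for illustration. The mean of the per-atom differences over the retained atoms equals the difference between their observed and predicted averages.

```python
from statistics import mean, median


def reference_offset(observed, predicted, cutoff):
    """Estimate a constant referencing offset (ppm) for one nucleus type.

    observed, predicted: equal-length lists of shifts for the same atoms.
    cutoff: hypothetical threshold (ppm); atoms whose difference lies further
            than this from the median difference are treated as possible
            mis-assignments or typos and left out of the average.
    Returns (offset, flagged_indices).
    """
    diffs = [o - p for o, p in zip(observed, predicted)]
    centre = median(diffs)
    kept = [d for d in diffs if abs(d - centre) <= cutoff]
    flagged = [i for i, d in enumerate(diffs) if abs(d - centre) > cutoff]
    return mean(kept), flagged


def apply_offset(observed, offset):
    """Subtract the estimated offset from every observed shift."""
    return [o - offset for o in observed]


# Toy 13C-alpha data: every shift is about 2.5 ppm too high and one entry
# (index 4) looks like a typographical error.
observed = [58.6, 60.4, 55.1, 65.2, 75.9, 57.0]
predicted = [56.1, 57.9, 52.6, 62.7, 53.4, 54.5]
offset, flagged = reference_offset(observed, predicted, cutoff=3.0)
corrected = apply_offset(observed, offset)
```

Filtering before averaging matters: if the suspect entry were included, it alone would raise the estimated offset by several ppm.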

SHIFTCOR output

SHIFTCOR generates and reports chemical shift offsets or differences for each nucleus. The results contain the chemical shift analyses (including lists of potential mis-assignments, the estimated referencing errors, the estimated error in the calculated reference offset (95% confidence interval), the applied or suggested reference offset, correlation coefficients, RMSD values) and the corrected BMRB formatted chemical shift file (see Figure 1 for details). [2]
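
The quantities listed in such a report can be computed as in the following sketch. The formulas are standard statistics rather than SHIFTCOR's documented internals; in particular, the normal-approximation confidence interval and the function name `offset_report` are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def offset_report(observed, predicted):
    """Summary statistics for observed vs predicted shifts of one nucleus."""
    n = len(observed)
    diffs = [o - p for o, p in zip(observed, predicted)]
    offset = mean(diffs)
    # Normal-approximation 95% interval on the mean difference (assumption).
    ci95 = 1.96 * stdev(diffs) / sqrt(n)
    rmsd = sqrt(mean(d * d for d in diffs))
    # Pearson correlation coefficient between observed and predicted shifts.
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    r = cov / sqrt(var_o * var_p)
    return {"offset": offset, "ci95": ci95, "rmsd": rmsd, "r": r}


# Toy 1H-alpha data with a uniform 0.1 ppm discrepancy.
report = offset_report([4.60, 4.10, 5.20, 4.35], [4.50, 4.00, 5.10, 4.25])
```

A high correlation combined with a non-zero offset and a narrow confidence interval is the signature of a referencing problem rather than of scattered assignment errors.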

SHIFTCOR uses the chemical shift calculation program SHIFTX [12] to predict 1Hα, 13Cα and 15N shifts based on the 3D structure coordinates of the protein being analyzed. By comparing the predicted shifts to the observed shifts, SHIFTCOR is able to accurately identify chemical shift reference offsets as well as potential mis-assignments. A key limitation of the SHIFTCOR approach is that it requires the 3D structure of the target protein to be available in order to assess the chemical shift reference offsets. Given that chemical shift assignments are typically made before the structure is determined, it was soon realized that structure-independent approaches needed to be developed. [5]

Structure-independent chemical shift correction programs

Several methods have been developed that make use of the estimated (via 1H or 13C shifts) or predicted (via sequence) secondary structure content of the protein being analyzed. These programs include PSSI, [10] CheckShift, [6] [7] LACS, [4] [9] and PANAV. [11] Both PANAV and CheckShift <http://checkshift.services.came.sbg.ac.at/> are also available as web servers.

The PSSI and PANAV programs use the secondary structure determined by 1H shifts (which are almost never mis-referenced) to adjust the target protein’s 13C and 15N shifts to match the 1H-derived secondary structure. LACS uses the difference between secondary 13Cα and 13Cβ shifts plotted against secondary 13Cα shifts or secondary 13Cβ shifts to determine reference offsets. A more recent version of LACS has been adapted to identify 15N chemical shift mis-referencing. [4] This new version of LACS exploits the well-known relationship between secondary 15N shifts and the secondary 13Cα and 13Cβ shifts of the preceding residue. [3] In contrast to LACS and PANAV/PSSI, CheckShift uses secondary structure predicted by high-performance secondary structure prediction programs such as PSIPRED [13] to iteratively adjust 13C and 15N chemical shifts so that their secondary shifts match the predicted secondary structure. These programs have all been shown to accurately identify mis-referenced protein chemical shifts deposited in the BMRB and to properly re-reference them. [7] [11] Note that both LACS and CheckShift are programmed to always predict the same offset for 13Cα and 13Cβ shifts, whereas PSSI and PANAV do not make this assumption. As a general rule, PANAV and PSSI typically exhibit a smaller spread (or standard deviation) in calculated reference offsets, indicating that these programs are slightly more precise than either LACS or CheckShift. Neither LACS nor CheckShift is able to handle proteins that have extremely large (above 40 ppm) reference offsets, whereas PANAV and PSSI seem to be able to deal with these kinds of anomalous proteins. [11]
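
The idea behind the LACS-style analysis can be sketched in a few lines. This is a deliberately simplified illustration, not the published LACS procedure: it fits a single straight line, it assumes (as LACS does) that 13Cα and 13Cβ share one offset, and the function names and toy data are hypothetical. Here the secondary 13Cα shift is regressed on the 13Cα minus 13Cβ difference purely for convenience of reading the offset from the intercept.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def lacs_like_offset(ca_secondary, cb_secondary):
    """Estimate a shared 13C offset (ppm) from secondary Ca/Cb shifts.

    A common offset cancels in (dCa - dCb), so for correctly referenced data
    dCa is expected to pass through zero where (dCa - dCb) is zero. The
    fitted intercept is therefore taken as the referencing offset.
    """
    diff = [a - b for a, b in zip(ca_secondary, cb_secondary)]
    _, intercept = fit_line(diff, ca_secondary)
    return intercept


# Toy secondary shifts (ppm) with an artificial +2.0 ppm referencing error
# added to both carbon types.
true_ca = [2.0, 1.0, -1.0, -2.0, 3.0]
true_cb = [-1.0, -0.5, 0.5, 1.0, -1.5]
shifted_ca = [v + 2.0 for v in true_ca]
shifted_cb = [v + 2.0 for v in true_cb]
estimated = lacs_like_offset(shifted_ca, shifted_cb)
```

Because the horizontal axis is unaffected by a shared offset, only the vertical position of the fitted line changes when the data are mis-referenced, which is what makes the intercept a usable estimate.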

In a recent study, [11] a chemical shift re-referencing program (PANAV) was run on a total of 2421 BMRB entries that had a sufficient proportion (>80%) of assigned chemical shifts to perform a robust chemical shift reference correction. A total of 243 entries were found with 13Cα shifts offset by more than 1.0 ppm, 238 entries with 13Cβ shifts offset by more than 1.0 ppm, 200 entries with 13C’ shifts offset by more than 1.0 ppm and 137 entries with 15N shifts offset by more than 1.5 ppm. From this study, 19.7% of the entries in the BMRB appear to be mis-referenced. Evidently, chemical shift referencing continues to be a significant and as yet unresolved problem for the biomolecular NMR community. [5] [11]


References

  1. Wishart, DS; Bigam CG; Yao J; Abildgaard F; et al. (1995). "1H, 13C and 15N chemical shift referencing in biomolecular NMR". Journal of Biomolecular NMR. 6 (2): 135–40. doi:10.1007/bf00211777. PMID 8589602.
  2. Zhang, H; Neal S; Wishart DS (Mar 2003). "RefDB: A database of uniformly referenced protein chemical shifts". Journal of Biomolecular NMR. 25 (3): 173–195. doi:10.1023/A:1022836027055. PMID 12652131.
  3. Wishart, DS; Case DA (2001). "Use of chemical shifts in macromolecular structure determination". Methods in Enzymology. 338. pp. 3–34. doi:10.1016/s0076-6879(02)38214-4. ISBN 9780121822392. PMID 11460554.
  4. Wang, L; Markley JL (2009). "Empirical correlation between protein backbone 15N and 13C secondary chemical shifts and its application to nitrogen chemical shift re-referencing". Journal of Biomolecular NMR. 44 (2): 95–99. doi:10.1007/s10858-009-9324-0. PMC 2782637. PMID 19436955.
  5. Wishart, DS (Feb 2011). "Interpreting protein chemical shift data". Progress in Nuclear Magnetic Resonance Spectroscopy. 58 (1–2): 62–87. doi:10.1016/j.pnmrs.2010.07.004. PMID 21241884.
  6. Ginzinger, SW; Gerick F; Coles M; Heun V (2007). "CheckShift: automatic correction of inconsistent chemical shift referencing". Journal of Biomolecular NMR. 39 (3): 223–227. doi:10.1007/s10858-007-9191-5. PMID 17899394.
  7. Ginzinger, SW; Skocibusić M; Heun V (2009). "CheckShift improved: fast chemical shift reference correction with high accuracy". Journal of Biomolecular NMR. 44 (4): 207–211. doi:10.1007/s10858-009-9330-2. PMID 19575298.
  8. Moseley, NH; Sahota G; Montelione TG (Jul 2004). "Assignment validation software suite for the evaluation and presentation of the protein resonance assignment data". Journal of Biomolecular NMR. 28 (4): 341–355. doi:10.1023/B:JNMR.0000015420.44364.06. PMID 14872126.
  9. Wang, L; Eghbalnia HR; Bahrami A; Markley JL (May 2005). "Linear analysis of carbon-13 chemical shift differences and its application to the detection and correction of errors in referencing and spin system identifications". Journal of Biomolecular NMR. 32 (1): 13–22. doi:10.1007/s10858-005-1717-0. PMID 16041479.
  10. Wang, Y; Wishart DS (2005). "A simple method to adjust inconsistently referenced 13C and 15N chemical shift assignments of proteins". Journal of Biomolecular NMR. 31 (2): 143–148. doi:10.1007/s10858-004-7441-3. PMID 15772753.
  11. Wang, B; Wang Y (2010). "A probabilistic approach for validating protein NMR chemical shift assignments". Journal of Biomolecular NMR. 47 (2): 85–99. doi:10.1007/s10858-010-9407-y. PMID 20446018.
  12. Neal, S; Nip AM; Zhang H; Wishart DS (Jul 2003). "Rapid and accurate calculation of protein 1H 13C and 15N chemical shifts". Journal of Biomolecular NMR. 26 (3): 215–240. doi:10.1023/A:1023812930288. PMID 12766419.
  13. McGuffin, LJ; Bryson K; Jones DT (2000). "The PSIPRED protein structure prediction server". Bioinformatics. 16 (4): 404–405. doi:10.1093/bioinformatics/16.4.404. PMID 10869041.
