The chemical shift index (CSI) is a widely employed technique in protein nuclear magnetic resonance spectroscopy that can be used to display and identify the location (i.e. start and end) as well as the type of protein secondary structure (beta strands, helices and random coil regions) found in proteins using only backbone chemical shift data. [1] [2] The technique was invented by David S. Wishart in 1992 for analyzing 1Hα chemical shifts and was extended by him in 1994 to incorporate 13C backbone shifts. The original CSI method makes use of the fact that 1Hα chemical shifts of amino acid residues in helices tend to be shifted upfield (i.e. towards the right side of an NMR spectrum) relative to their random coil values and downfield (i.e. towards the left side of an NMR spectrum) in beta strands. Similar kinds of upfield and downfield trends are also detectable in backbone 13C chemical shifts.
The CSI is a graph-based technique that essentially employs an amino acid-specific digital filter to convert every assigned backbone chemical shift value into a simple three-state (-1, 0, +1) index. This approach generates a more easily understood and much more visually pleasing graph of protein chemical shift values. In particular, if the 1Hα chemical shift of a residue lies more than 0.1 ppm upfield of its amino acid-specific random coil value, that residue is assigned a value of -1. Similarly, if the 1Hα chemical shift of a residue lies more than 0.1 ppm downfield of its random coil value, that residue is assigned a value of +1. If a residue's chemical shift is not shifted upfield or downfield by a sufficient amount (i.e. by less than 0.1 ppm), it is given a value of 0. When this three-state index is plotted as a bar graph over the full length of the protein sequence, simple inspection can allow one to identify beta strands (clusters of +1 values), alpha helices (clusters of -1 values), and random coil segments (clusters of 0 values). A list of the amino acid-specific random coil chemical shifts for CSI calculations is given in Table 1. An example of a CSI graph for a small protein is shown in Figure 1, with the arrows located above the black bars indicating the locations of the beta strands and the rectangular box indicating the location of a helix.
Amino Acid | 1Hα random coil shift (ppm) | Amino Acid | 1Hα random coil shift (ppm) |
---|---|---|---|
Ala (A) | 4.35 | Met (M) | 4.52 |
Cys (C) | 4.65 | Asn (N) | 4.75 |
Asp (D) | 4.76 | Pro (P) | 4.44 |
Glu (E) | 4.29 | Gln (Q) | 4.37 |
Phe (F) | 4.66 | Arg (R) | 4.38 |
Gly (G) | 3.97 | Ser (S) | 4.50 |
His (H) | 4.63 | Thr (T) | 4.35 |
Ile (I) | 3.95 | Val (V) | 3.95 |
Lys (K) | 4.36 | Trp (W) | 4.70 |
Leu (L) | 4.17 | Tyr (Y) | 4.60 |
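The three-state filter described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `csi_index` and `csi_profile` are hypothetical, the random coil values are copied from Table 1, and the handling of a secondary shift of exactly 0.1 ppm (treated here as 0) and of unassigned residues (returned as `None`) are assumptions, since the text does not specify them.

```python
# 1H-alpha random coil shifts (ppm) copied from Table 1, keyed by one-letter code.
RANDOM_COIL_HA = {
    "A": 4.35, "C": 4.65, "D": 4.76, "E": 4.29, "F": 4.66,
    "G": 3.97, "H": 4.63, "I": 3.95, "K": 4.36, "L": 4.17,
    "M": 4.52, "N": 4.75, "P": 4.44, "Q": 4.37, "R": 4.38,
    "S": 4.50, "T": 4.35, "V": 3.95, "W": 4.70, "Y": 4.60,
}


def csi_index(residue, observed_ppm, threshold=0.1):
    """Return -1, 0 or +1 for one residue's 1H-alpha shift.

    Upfield means a lower ppm value than random coil (helix-like, -1);
    downfield means a higher ppm value (strand-like, +1).
    """
    secondary_shift = observed_ppm - RANDOM_COIL_HA[residue]
    if secondary_shift > threshold:
        return 1
    if secondary_shift < -threshold:
        return -1
    return 0


def csi_profile(sequence, shifts):
    """Apply csi_index along a sequence; unassigned shifts (None) stay None."""
    return [
        None if shift is None else csi_index(residue, shift)
        for residue, shift in zip(sequence, shifts)
    ]


# Made-up shifts for a three-residue fragment, for illustration only.
print(csi_profile("AVG", [4.05, 4.40, 4.00]))  # [-1, 1, 0]
```

Plotting the resulting list as a bar graph against residue number gives the kind of CSI plot described in the text.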
Using only 1Hα chemical shifts and simple clustering rules (clusters of 3 or more vertical bars for beta strands and clusters of 4 or more vertical bars for alpha helices), the CSI is typically 75-80% accurate in the identification of secondary structures. [2] [3] [4] [5] This performance depends partly on the quality of the NMR data set as well as the technique (manual or programmatic) used to identify the protein secondary structures. As noted above, a consensus CSI method that filters upfield/downfield chemical shift changes in 13Cα, 13Cβ, and 13C' atoms in a similar manner to 1Hα shifts has also been developed. [2] The consensus CSI combines the CSI plots from backbone 1H and 13C chemical shifts to generate a single CSI plot. It can be up to 85-90% accurate. [5]
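The simple clustering rules mentioned above can be illustrated as follows. This sketch assumes the strictest reading of the rules, namely that a cluster is an uninterrupted run of identical index values; the published method may tolerate interruptions within a cluster, which is not modelled here. The function name `assign_secondary_structure` and the one-letter labels (H for helix, E for strand, C for coil) are hypothetical choices made for this example.

```python
from itertools import groupby


def assign_secondary_structure(indices, min_strand=3, min_helix=4):
    """Label each residue H, E or C from a list of CSI values (-1, 0, +1).

    A run of +1 values at least min_strand long becomes a strand (E);
    a run of -1 values at least min_helix long becomes a helix (H);
    everything else is labelled coil (C).
    """
    labels = []
    for value, run in groupby(indices):
        length = len(list(run))
        if value == 1 and length >= min_strand:
            labels.append("E" * length)
        elif value == -1 and length >= min_helix:
            labels.append("H" * length)
        else:
            labels.append("C" * length)
    return "".join(labels)


# Made-up index values, for illustration only.
example = [0, 1, 1, 1, 0, -1, -1, -1, -1, -1, 0, 1, 1, -1, -1, -1]
print(assign_secondary_structure(example))  # CEEECHHHHHCCCCCC
```

Note that the final two +1 values and three -1 values in the example fall below the minimum cluster sizes and are therefore labelled as coil.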
The link between protein chemical shifts and protein secondary structure (specifically alpha helices) was first described by John Markley and colleagues in 1967. [6] With the development of modern 2-dimensional NMR techniques, it became possible to measure more protein chemical shifts. As more peptides and proteins were assigned in the early 1980s, it soon became obvious that amino acid chemical shifts were sensitive not only to helical conformations, but also to β-strand conformations. Specifically, the secondary 1Hα chemical shifts of all amino acids exhibit a clear upfield trend on helix formation and an obvious downfield trend on β-sheet formation. [7] [8] By the early 1990s, a sufficient body of 13C and 15N chemical shift assignments for peptides and proteins had been collected to determine that similar upfield/downfield trends were evident for essentially all backbone 13Cα, 13Cβ, 13C', 1HN and 15N (weakly) chemical shifts. [9] [10] It was these rather striking chemical shift trends that were exploited in the development of the chemical shift index.
The CSI method is not without some shortcomings. In particular, its performance drops if chemical shift assignments are mis-referenced or incomplete. It is also quite sensitive to the choice of random coil shifts used to calculate the secondary shifts, [5] and it generally identifies alpha helices (>85% accuracy) better than beta strands (<75% accuracy) regardless of the choice of random coil shifts. [5] Furthermore, the CSI method does not identify other kinds of secondary structures, such as β-turns. Because of these shortcomings, a number of alternative CSI-like approaches have been proposed. These include: 1) a prediction method that employs statistically derived chemical shift/structure potentials (PECAN); [11] 2) a probabilistic approach to secondary structure identification (PSSI); [12] 3) a method that combines secondary structure predictions from sequence data and chemical shift data (PsiCSI); [13] 4) a secondary structure identification approach that uses pre-specified chemical shift patterns (PLATON); [14] and 5) a two-dimensional cluster analysis method known as 2DCSi. [15] The performance of these newer methods is generally slightly better (2-4%) than the original CSI method.
Since its original description in 1992, the CSI method has been used to characterize the secondary structure of thousands of peptides and proteins. Its popularity is largely due to the fact that it is easy to understand and can be implemented without the need for specialized computer programs. Even though the CSI method can be easily performed manually, a number of commonly used NMR data processing programs such as NMRView, [16] NMR structure generation web servers such as CS23D [17] as well as various NMR data analysis web servers such as RCI, [18] Preditor [19] and PANAV [20] have incorporated the CSI method into their software.
In nuclear magnetic resonance (NMR) spectroscopy, the chemical shift is the resonant frequency of an atomic nucleus relative to a standard in a magnetic field. Often the position and number of chemical shifts are diagnostic of the structure of a molecule. Chemical shifts are also used to describe signals in other forms of spectroscopy such as photoemission spectroscopy.
Nuclear magnetic resonance spectroscopy, most commonly known as NMR spectroscopy or magnetic resonance spectroscopy (MRS), is a spectroscopic technique based on re-orientation of atomic nuclei with non-zero nuclear spins in an external magnetic field. This re-orientation occurs with absorption of electromagnetic radiation in the radio frequency region from roughly 4 to 900 MHz, a frequency that depends on the isotopic nature of the nucleus and increases proportionally to the strength of the external magnetic field. Notably, the resonance frequency of each NMR-active nucleus depends on its chemical environment. As a result, NMR spectra provide information about individual functional groups present in the sample, as well as about connections between nearby nuclei in the same molecule. As NMR spectra are unique or highly characteristic of individual compounds and functional groups, NMR spectroscopy is one of the most important methods to identify molecular structures, particularly of organic compounds.
Nuclear magnetic resonance spectroscopy of proteins is a field of structural biology in which NMR spectroscopy is used to obtain information about the structure and dynamics of proteins, and also nucleic acids, and their complexes. The field was pioneered by Richard R. Ernst and Kurt Wüthrich at the ETH, and by Ad Bax, Marius Clore, Angela Gronenborn at the NIH, and Gerhard Wagner at Harvard University, among others. Structure determination by NMR spectroscopy usually consists of several phases, each using a separate set of highly specialized techniques. The sample is prepared, measurements are made, interpretive approaches are applied, and a structure is calculated and validated.
The heteronuclear single quantum coherence or heteronuclear single quantum correlation experiment, normally abbreviated as HSQC, is used frequently in NMR spectroscopy of organic molecules and is of particular significance in the field of protein NMR. The experiment was first described by Geoffrey Bodenhausen and D. J. Ruben in 1980. The resulting spectrum is two-dimensional (2D) with one axis for proton (1H) and the other for a heteronucleus, which is usually 13C or 15N. The spectrum contains a peak for each unique proton attached to the heteronucleus being considered. The 2D HSQC can also be combined with other experiments in higher-dimensional NMR experiments, such as NOESY-HSQC or TOCSY-HSQC.
The Journal of Biomolecular NMR publishes research on technical developments and innovative applications of nuclear magnetic resonance spectroscopy for the study of structure and dynamic properties of biopolymers in solution, liquid crystals, solids and mixed environments. Some of the main topics include experimental and computational approaches for the determination of three-dimensional structures of proteins and nucleic acids, advancements in the automated analysis of NMR spectra, and new methods to probe and interpret molecular motions.
Nucleic acid NMR is the use of nuclear magnetic resonance spectroscopy to obtain information about the structure and dynamics of nucleic acid molecules, such as DNA or RNA. It is useful for molecules of up to 100 nucleotides, and as of 2003, nearly half of all known RNA structures had been determined by NMR spectroscopy.
The Re-referenced Protein Chemical shift Database (RefDB) is an NMR spectroscopy database of carefully corrected or re-referenced chemical shifts, derived from the BioMagResBank (BMRB). The database was assembled by using a structure-based chemical shift calculation program to calculate expected protein (1)H, (13)C and (15)N chemical shifts from X-ray or NMR coordinate data of previously assigned proteins reported in the BMRB. The comparison is automatically performed by a program called SHIFTCOR. The RefDB database currently provides reference-corrected chemical shift data on more than 2000 assigned peptides and proteins. Data from the database indicates that nearly 25% of BMRB entries with (13)C protein assignments and 27% of BMRB entries with (15)N protein assignments require significant chemical shift reference readjustments. Additionally, nearly 40% of protein entries deposited in the BioMagResBank appear to have at least one assignment error. Users may download, search or browse the database through a number of methods available through the RefDB website. RefDB provides a standard chemical shift resource for biomolecular NMR spectroscopists, wishing to derive or compute chemical shift trends in peptides and proteins.
SHIFTCOR is a freely available web server as well as a stand-alone computer program for protein chemical shift re-referencing. Chemical shift referencing is a particularly widespread problem in biomolecular NMR, with up to 25% of existing NMR chemical shift assignments being improperly referenced. Some of these referencing problems can lead to systematic errors of between 1.0 and 2.5 ppm. Errors of this magnitude can play havoc with any attempt to compare assignments between proteins or to structurally interpret chemical shifts. Identifying which proteins are mis-assigned or improperly referenced can be challenging, as can correcting the errors once they are found. The SHIFTCOR program was designed to assist with identifying and fixing these chemical shift referencing problems. Specifically, it identifies, corrects and re-references 1H, 13C and 15N backbone chemical shifts of peptides and proteins by comparing the observed chemical shifts with the predicted chemical shifts derived from the 3D structure of the protein(s) of interest [1]. The predicted chemical shifts are calculated using the ShiftX program. The SHIFTCOR program was originally used to construct a database of properly re-referenced protein chemical shift assignments called RefDB. RefDB is a web-accessible database of more than 2000 correctly referenced protein chemical shift assignments. While originally available as a stand-alone program only, SHIFTCOR has since been released for general use as a web server.
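The general idea of comparing observed shifts with structure-derived predictions to detect a systematic referencing offset can be illustrated with a small sketch. This is not the SHIFTCOR algorithm, whose details are not given here; it simply assumes that a referencing error appears as a roughly constant difference between observed and predicted shifts for one nucleus type, and it estimates that difference with a median so that a few mis-assigned residues have little influence. The function names and all numbers are hypothetical.

```python
from statistics import median


def estimate_reference_offset(observed, predicted):
    """Estimate a constant offset (ppm) as the median of observed - predicted.

    Residues missing either value (None) are skipped.
    """
    differences = [
        obs - pred
        for obs, pred in zip(observed, predicted)
        if obs is not None and pred is not None
    ]
    if not differences:
        raise ValueError("no residues have both observed and predicted shifts")
    return median(differences)


def apply_reference_correction(observed, offset):
    """Subtract the estimated offset from every assigned shift."""
    return [None if obs is None else obs - offset for obs in observed]


# Made-up 13C-alpha shifts (ppm), for illustration only.
observed = [58.1, 54.6, None, 62.3, 56.0]
predicted = [56.0, 52.5, 57.2, 60.3, 53.9]
offset = estimate_reference_offset(observed, predicted)
corrected = apply_reference_correction(observed, offset)
```

In practice the predicted values would come from a chemical shift prediction program applied to the known 3D structure, and a correction would normally be applied only when the estimated offset is clearly larger than the expected prediction error.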
Random coil index (RCI) predicts protein flexibility by calculating an inverse weighted average of backbone secondary chemical shifts and predicting values of model-free order parameters as well as per-residue RMSD of NMR and molecular dynamics ensembles from this parameter.
Triple resonance experiments are a set of multi-dimensional nuclear magnetic resonance spectroscopy (NMR) experiments that link three types of atomic nuclei, most typically consisting of 1H, 15N and 13C. These experiments are often used to assign specific resonance signals to specific atoms in an isotopically-enriched protein. The technique was first described in papers by Ad Bax, Mitsuhiko Ikura and Lewis Kay in 1990, and further experiments were then added to the suite of experiments. Many of these experiments have since become the standard set of experiments used for sequential assignment of NMR resonances in the determination of protein structure by NMR. They are now an integral part of solution NMR study of proteins, and they may also be used in solid-state NMR.
The GeNMR method is the first fully automated template-based method of protein structure determination that utilizes both NMR chemical shifts and NOE-based distance restraints.
CS23D is a web server to generate 3D structural models from NMR chemical shifts. CS23D combines maximal fragment assembly with chemical shift threading, de novo structure generation, chemical shift-based torsion angle prediction, and chemical shift refinement. CS23D makes use of RefDB and ShiftX.
Protein chemical shift prediction is a branch of biomolecular nuclear magnetic resonance spectroscopy that aims to accurately calculate protein chemical shifts from protein coordinates. Protein chemical shift prediction was first attempted in the late 1960s using semi-empirical methods applied to protein structures solved by X-ray crystallography. Since that time protein chemical shift prediction has evolved to employ much more sophisticated approaches including quantum mechanics, machine learning and empirically derived chemical shift hypersurfaces. The most recently developed methods exhibit remarkable precision and accuracy.
Nuclear magnetic resonance chemical shift re-referencing is a chemical analysis method for chemical shift referencing in biomolecular nuclear magnetic resonance (NMR). It has been estimated that up to 20% of 13C and up to 35% of 15N shift assignments are improperly referenced. Given that the structural and dynamic information contained within chemical shifts is often quite subtle, it is critical that protein chemical shifts be properly referenced so that these subtle differences can be detected. Fundamentally, the problem with chemical shift referencing comes from the fact that chemical shifts are relative frequency measurements rather than absolute frequency measurements. Because of the historic problems with chemical shift referencing, chemical shifts are perhaps the most precisely measurable but the least accurately measured parameters in all of NMR spectroscopy.
Protein chemical shift re-referencing is a post-assignment process of adjusting the assigned NMR chemical shifts to match IUPAC and BMRB recommended standards in protein chemical shift referencing. In NMR chemical shifts are normally referenced to an internal standard that is dissolved in the NMR sample. These internal standards include tetramethylsilane (TMS), 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS) and trimethylsilyl propionate (TSP). For protein NMR spectroscopy the recommended standard is DSS, which is insensitive to pH variations. Furthermore, the DSS 1H signal may be used to indirectly reference 13C and 15N shifts using a simple ratio calculation [1]. Unfortunately, many biomolecular NMR spectroscopy labs use non-standard methods for determining the 1H, 13C or 15N “zero-point” chemical shift position. This lack of standardization makes it difficult to compare chemical shifts for the same protein between different laboratories. It also makes it difficult to use chemical shifts to properly identify or assign secondary structures or to improve their 3D structures via chemical shift refinement. Chemical shift re-referencing offers a means to correct these referencing errors and to standardize the reporting of protein chemical shifts across laboratories.
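The ratio calculation mentioned above can be sketched as follows. Under indirect referencing, the 0 ppm frequency of a heteronucleus is obtained by multiplying the absolute 1H frequency of the DSS signal by a fixed frequency ratio (often written Ξ). The ratio values below are quoted from the IUPAC-recommended table as the editor recalls it and should be verified against the published recommendations before use; the function names are hypothetical.

```python
# Frequency ratios relative to the DSS 1H signal (assumed values; verify
# against the published IUPAC/BMRB recommendations before relying on them).
XI_RATIO = {"13C": 0.251449530, "15N": 0.101329118}


def zero_point_frequency_mhz(dss_proton_mhz, nucleus):
    """Return the 0 ppm frequency (MHz) of a heteronucleus from the DSS 1H frequency."""
    return dss_proton_mhz * XI_RATIO[nucleus]


def chemical_shift_ppm(observed_mhz, zero_mhz):
    """Convert an absolute resonance frequency into a chemical shift in ppm."""
    return (observed_mhz - zero_mhz) / zero_mhz * 1e6


# Hypothetical spectrometer on which the DSS methyl 1H signal resonates at 600.0 MHz.
carbon_zero = zero_point_frequency_mhz(600.0, "13C")
nitrogen_zero = zero_point_frequency_mhz(600.0, "15N")
```

Because the heteronuclear zero points are derived from a single 1H measurement, 1H, 13C and 15N shifts referenced in this way remain mutually consistent within one sample.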
Probabilistic Approach for protein NMR Assignment Validation (PANAV) is a freely available stand-alone program that is used for protein chemical shift re-referencing. Chemical shift referencing is a problem in protein nuclear magnetic resonance, as >20% of reported NMR chemical shift assignments appear to be improperly referenced. For certain nuclei these referencing issues can cause systematic chemical shift errors of between 1.0 and 2.5 ppm. Chemical shift errors of this magnitude often make it very difficult to compare NMR chemical shift assignments between proteins. They also make it very hard to structurally interpret chemical shifts. Unlike most other chemical shift re-referencing tools, PANAV employs a structure-independent protocol. That is, with PANAV there is no need to know the structure of the protein in advance of correcting any chemical shift referencing errors. This makes PANAV particularly useful for NMR studies involving novel or newly assigned proteins, where the structure has yet to be determined. Indeed, this scenario represents the vast majority of assignment cases in biomolecular NMR. PANAV uses residue-specific and secondary structure-specific chemical shift distributions that were calculated over short fragments of correctly referenced proteins to identify mis-assigned resonances. More specifically, PANAV compares the initial chemical shift assignments to the expected chemical shifts based on their local sequence and expected/predicted secondary structure. In this way, PANAV is able to identify and re-reference mis-referenced chemical shift assignments. PANAV can also identify potentially mis-assigned resonances. PANAV has been extensively tested and compared against a large number of existing re-referencing or mis-assignment detection programs. These assessments indicate that PANAV is equal to or superior to existing approaches.
PREDITOR is a freely available web server for the prediction of protein torsion angles from chemical shifts. For many years it has been known that protein chemical shifts are sensitive to protein secondary structure, which in turn is sensitive to backbone torsion angles. Torsion angles are internal coordinates that can be used to describe the conformation of a polypeptide chain. They can also be used as constraints to help determine or refine protein structures via NMR spectroscopy. In proteins there are four major torsion angles of interest: phi, psi, omega and chi-1. Traditionally, protein NMR spectroscopists have used vicinal J-coupling information and the Karplus relation to determine approximate backbone torsion angle constraints for phi and chi-1 angles. However, several studies in the early 1990s pointed out the strong relationship between 1H and 13C chemical shifts and torsion angles, especially backbone phi and psi angles. Later, a number of other papers pointed out additional chemical shift relationships with chi-1 and even omega angles. PREDITOR was designed to exploit these experimental observations and to help NMR spectroscopists easily predict protein torsion angles from chemical shift assignments. Specifically, PREDITOR accepts protein sequence and/or chemical shift data as input and generates torsion angle predictions for phi, psi, omega and chi-1 angles. The algorithm that PREDITOR uses combines sequence alignment, chemical shift alignment and a number of related chemical shift analysis techniques to predict torsion angles. PREDITOR is unusually fast and exhibits a very high level of accuracy. In a series of tests, 88% of PREDITOR’s phi/psi predictions were within 30 degrees of the correct values, 84% of chi-1 predictions were correct and 99.97% of PREDITOR’s predicted omega angles were correct.
PREDITOR also estimates the torsion angle errors so that its torsion angle constraints can be used with standard protein structure refinement software, such as CYANA, CNS, XPLOR and AMBER. PREDITOR also supports automated protein chemical shift re-referencing and the prediction of proline cis/trans states. PREDITOR is not the only torsion angle prediction software available. Several other computer programs including TALOS, TALOS+ and DANGLE have also been developed to predict backbone torsion angles from protein chemical shifts. These stand-alone programs exhibit similar prediction performance to PREDITOR but are substantially slower.
ShiftX is a freely available web server for rapidly calculating protein chemical shifts from protein X-ray coordinates. Protein chemical shift prediction is particularly useful in verifying protein chemical shift assignments, adjusting mis-referenced chemical shifts, refining NMR protein structures and assisting with the NMR assignment of unassigned proteins that have either had their structures determined by X-ray or NMR methods.
Nitrogen-15 nuclear magnetic resonance spectroscopy is a version of nuclear magnetic resonance spectroscopy that examines samples containing the 15N nucleus. 15N NMR differs in several ways from the more common 13C and 1H NMR. To circumvent the difficulties associated with measurement of the quadrupolar, spin-1 14N nuclide, 15N is used for detection since it has a ground-state spin of ½. Since 14N is 99.64% abundant, incorporation of 15N into samples often requires novel synthetic techniques.
David S. Wishart is a Canadian researcher and a Distinguished University Professor in the Department of Biological Sciences and the Department of Computing Science at the University of Alberta. Wishart also holds cross appointments in the Faculty of Pharmacy and Pharmaceutical Sciences and the Department of Laboratory Medicine and Pathology in the Faculty of Medicine and Dentistry. Additionally, Wishart holds a joint appointment in metabolomics at the Pacific Northwest National Laboratory in Richland, Washington. Wishart is well known for his pioneering contributions to the fields of protein NMR spectroscopy, bioinformatics, cheminformatics and metabolomics. In 2011, Wishart founded the Metabolomics Innovation Centre (TMIC), which is Canada's national metabolomics laboratory.