Structure-based assignment

Last updated July 19, 2023

Structure-based assignment (SBA) is a technique for accelerating resonance assignment, a key bottleneck in NMR (nuclear magnetic resonance) structural biology.^[1] In SBA, a homologous (similar) protein serves as a template for the target protein. The template provides prior structural information about the target, which speeds up resonance assignment. By analogy, in X-ray crystallography, the molecular replacement technique allows solution of the crystallographic phase problem when a homologous structural model is known, thereby facilitating rapid structure determination.^[2] SBA algorithms include CAP, an RNA assignment algorithm that performs an exhaustive search over all permutations;^[3] MARS, a program for robust automatic backbone assignment;^[4] and Nuclear Vector Replacement (NVR), a molecular-replacement-like approach for SBA of resonances and sparse nuclear Overhauser effects (NOEs).^[5]^[6]^[7]
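The idea of matching unassigned peaks to a template can be illustrated with a minimal sketch. It assumes that one scalar value per residue (for example a chemical shift or a residual dipolar coupling) has been predicted from the template structure, and it searches every permutation for the peak-to-residue mapping with the smallest squared mismatch. The function name, the scoring function, and the toy numbers are hypothetical; this is not the published CAP, MARS, or NVR implementation, and exhaustive search is only feasible for very small systems.

```python
from itertools import permutations


def assign_exhaustive(observed, predicted):
    """Brute-force peak-to-residue assignment (illustrative only).

    observed  -- measured values, one per unassigned peak
    predicted -- values predicted from the template, one per residue
    Returns (best_perm, best_cost), where best_perm[i] is the residue
    index assigned to peak i.
    """
    if len(observed) != len(predicted):
        raise ValueError("this sketch assumes one peak per residue")
    best_perm, best_cost = None, float("inf")
    # Try every one-to-one mapping of peaks onto residues: n! candidates.
    for perm in permutations(range(len(predicted))):
        cost = sum((observed[i] - predicted[r]) ** 2
                   for i, r in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


# Hypothetical toy data: four residues, four peaks.
predicted = [8.2, 7.5, 9.1, 8.7]    # template-derived value for residues 0..3
observed = [9.0, 8.25, 7.6, 8.6]    # unassigned peaks 0..3
perm, cost = assign_exhaustive(observed, predicted)
print(perm, round(cost, 4))
```

Because the number of permutations grows factorially, practical algorithms replace the exhaustive loop with a polynomial-time matching or an integer-programming formulation, as the NVR papers cited above do.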

Related Research Articles

Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be used in using the results as evidence for shared evolutionary ancestry because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.

Structural bioinformatics is the branch of bioinformatics that is related to the analysis and prediction of the three-dimensional structure of biological macromolecules such as proteins, RNA, and DNA. It deals with generalizations about macromolecular 3D structures such as comparisons of overall folds and local motifs, principles of molecular folding, evolution, binding interactions, and structure/function relationships, working both from experimentally solved structures and from computational models. The term structural has the same meaning as in structural biology, and structural bioinformatics can be seen as a part of computational structural biology. The main objective of structural bioinformatics is the creation of new methods of analysing and manipulating biological macromolecular data in order to solve problems in biology and generate new knowledge.

Nuclear magnetic resonance spectroscopy of proteins is a field of structural biology in which NMR spectroscopy is used to obtain information about the structure and dynamics of proteins, nucleic acids, and their complexes. The field was pioneered by Richard R. Ernst and Kurt Wüthrich at the ETH, by Ad Bax, Marius Clore, and Angela Gronenborn at the NIH, and by Gerhard Wagner at Harvard University, among others. Structure determination by NMR spectroscopy usually consists of several phases, each using a separate set of highly specialized techniques. The sample is prepared, measurements are made, interpretive approaches are applied, and a structure is calculated and validated.

X-PLOR is a computer software package for computational structural biology originally developed by Axel T. Brunger at Yale University. It was first published in 1987 as an offshoot of CHARMM - a similar program that ran on supercomputers made by Cray Inc. It is used in the fields of X-ray crystallography and nuclear magnetic resonance spectroscopy of proteins (NMR) analysis.

Biomolecular structure is the intricate folded, three-dimensional shape that is formed by a molecule of protein, DNA, or RNA, and that is important to its function. The structure of these molecules may be considered at any of several length scales ranging from the level of individual atoms to the relationships among entire protein subunits. This useful distinction among scales is often expressed as a decomposition of molecular structure into four levels: primary, secondary, tertiary, and quaternary. The scaffold for this multiscale organization of the molecule arises at the secondary level, where the fundamental structural elements are the molecule's various hydrogen bonds. This leads to several recognizable domains of protein structure and nucleic acid structure, including such secondary-structure features as alpha helices and beta sheets for proteins, and hairpin loops, bulges, and internal loops for nucleic acids. The terms primary, secondary, tertiary, and quaternary structure were introduced by Kaj Ulrik Linderstrøm-Lang in his 1951 Lane Medical Lectures at Stanford University.

The residual dipolar coupling between two spins in a molecule occurs if the molecules in solution exhibit a partial alignment leading to an incomplete averaging of spatially anisotropic dipolar couplings.

Adriaan "Ad" Bax is a Dutch-American molecular biophysicist. He was born in the Netherlands and is the Chief of the Section on Biophysical NMR Spectroscopy at the National Institutes of Health. He is known for his work on the methodology of biomolecular NMR spectroscopy.

The Journal of Biomolecular NMR publishes research on technical developments and innovative applications of nuclear magnetic resonance spectroscopy for the study of structure and dynamic properties of biopolymers in solution, liquid crystals, solids and mixed environments. Some of the main topics include experimental and computational approaches for the determination of three-dimensional structures of proteins and nucleic acids, advancements in the automated analysis of NMR spectra, and new methods to probe and interpret molecular motions.

Bruce Randall Donald is an American computer scientist and computational biologist. He is the James B. Duke Professor of Computer Science and Biochemistry at Duke University. He has made numerous contributions to several fields in computer science, such as robotics, microelectromechanical systems (MEMS), geometric and physical algorithms, and computational geometry, as well as to areas of structural molecular biology and biochemistry, such as protein design, protein structure determination, and computational chemistry.

Nucleic acid NMR is the use of nuclear magnetic resonance spectroscopy to obtain information about the structure and dynamics of nucleic acid molecules, such as DNA or RNA. It is useful for molecules of up to 100 nucleotides, and as of 2003, nearly half of all known RNA structures had been determined by NMR spectroscopy.

The term macromolecular assembly (MA) refers to massive chemical structures such as viruses and non-biologic nanoparticles, cellular organelles and membranes and ribosomes, etc. that are complex mixtures of polypeptide, polynucleotide, polysaccharide or other polymeric macromolecules. They are generally of more than one of these types, and the mixtures are defined spatially, and with regard to their underlying chemical composition and structure. Macromolecules are found in living and nonliving things, and are composed of many hundreds or thousands of atoms held together by covalent bonds; they are often characterized by repeating units. Assemblies of these can likewise be biologic or non-biologic, though the MA term is more commonly applied in biology, and the term supramolecular assembly is more often applied in non-biologic contexts. MAs of macromolecules are held in their defined forms by non-covalent intermolecular interactions, and can be in either non-repeating structures, or in repeating linear, circular, spiral, or other patterns. The process by which MAs are formed has been termed molecular self-assembly, a term especially applied in non-biologic contexts. A wide variety of physical/biophysical, chemical/biochemical, and computational methods exist for the study of MA; given the scale of MAs, efforts to elaborate their composition and structure and discern mechanisms underlying their functions are at the forefront of modern structure science.

CS-ROSETTA is a framework for structure calculation of biological macromolecules on the basis of conformational information from NMR, which is built on top of the biomolecular modeling and design software called ROSETTA. The name CS-ROSETTA for this branch of ROSETTA stems from its origin in combining NMR chemical shift (CS) data with ROSETTA structure prediction protocols. The software package was later extended to include additional NMR conformational parameters, such as Residual Dipolar Couplings (RDC), NOE distance restraints, pseudocontact chemical shifts (PCS) and restraints derived from homologous proteins. This software can be used together with other molecular modeling protocols, such as docking to model protein oligomers. In addition, CS-ROSETTA can be combined with chemical shift resonance assignment algorithms to create a fully automated NMR structure determination pipeline. The CS-ROSETTA software is freely available for academic use and can be licensed for commercial use. A software manual and tutorials are provided on the supporting website https://csrosetta.chemistry.ucsc.edu/.

Macromolecular structure validation is the process of evaluating the reliability of three-dimensional atomic models of large biological molecules such as proteins and nucleic acids. These models, which provide 3D coordinates for each atom in the molecule, come from structural biology experiments such as X-ray crystallography or nuclear magnetic resonance (NMR). The validation has three aspects: 1) checking the validity of the thousands to millions of measurements in the experiment; 2) checking how consistent the atomic model is with those experimental data; and 3) checking the consistency of the model with known physical and chemical properties.

The chemical shift index (CSI) is a widely employed technique in protein nuclear magnetic resonance spectroscopy that can be used to display and identify the location and type of protein secondary structure using only backbone chemical shift data. The technique was invented by David S. Wishart in 1992 for analyzing ¹Hα chemical shifts and was extended by him in 1994 to incorporate ¹³C backbone shifts. The original CSI method makes use of the fact that ¹Hα chemical shifts of amino acid residues in helices tend to be shifted upfield relative to their random coil values, and downfield in beta strands. Similar upfield and downfield trends are also detectable in backbone ¹³C chemical shifts.
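The upfield/downfield comparison described above can be sketched in a few lines. This is a minimal illustration, not the published CSI program: the function name, the 0.1 ppm tolerance, and the reference values in the example are assumptions made here for demonstration, and real use would need tabulated residue-specific random coil shifts.

```python
def chemical_shift_index(observed_ha, random_coil_ha, tol=0.1):
    """Assign a three-state index to each residue from its 1H-alpha shift.

    -1 : shifted upfield of the random coil value (helix-like)
    +1 : shifted downfield of the random coil value (strand-like)
     0 : within +/- tol ppm of the random coil value
    The tolerance `tol` is an assumed threshold for this sketch.
    """
    index = []
    for obs, ref in zip(observed_ha, random_coil_ha):
        diff = obs - ref            # negative = lower ppm = upfield
        if diff < -tol:
            index.append(-1)
        elif diff > tol:
            index.append(1)
        else:
            index.append(0)
    return index


# Hypothetical shifts (ppm) for five residues with a made-up common reference.
observed = [4.00, 4.05, 4.30, 4.80, 4.75]
reference = [4.35] * 5
csi = chemical_shift_index(observed, reference)
print(csi)
```

Runs of consecutive -1 or +1 values would then be grouped into helix or strand segments; that grouping step is omitted from this sketch.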

Protein chemical shift prediction is a branch of biomolecular nuclear magnetic resonance spectroscopy that aims to accurately calculate protein chemical shifts from protein coordinates. Protein chemical shift prediction was first attempted in the late 1960s using semi-empirical methods applied to protein structures solved by X-ray crystallography. Since that time protein chemical shift prediction has evolved to employ much more sophisticated approaches including quantum mechanics, machine learning and empirically derived chemical shift hypersurfaces. The most recently developed methods exhibit remarkable precision and accuracy.

Nuclear magnetic resonance chemical shift re-referencing is a chemical analysis method for chemical shift referencing in biomolecular nuclear magnetic resonance (NMR). It has been estimated that up to 20% of 13C and up to 35% of 15N shift assignments are improperly referenced. Given that the structural and dynamic information contained within chemical shifts is often quite subtle, it is critical that protein chemical shifts be properly referenced so that these subtle differences can be detected. Fundamentally, the problem with chemical shift referencing comes from the fact that chemical shifts are relative frequency measurements rather than absolute frequency measurements. Because of the historic problems with chemical shift referencing, chemical shifts are perhaps the most precisely measurable but the least accurately measured parameters in all of NMR spectroscopy.

Protein chemical shift re-referencing is a post-assignment process of adjusting the assigned NMR chemical shifts to match the standards recommended by IUPAC and the BMRB for protein chemical shift referencing. In NMR, chemical shifts are normally referenced to an internal standard that is dissolved in the NMR sample. These internal standards include tetramethylsilane (TMS), 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS) and trimethylsilyl propionate (TSP). For protein NMR spectroscopy the recommended standard is DSS, which is insensitive to pH variations. Furthermore, the DSS ¹H signal may be used to indirectly reference ¹³C and ¹⁵N shifts using a simple ratio calculation. Unfortunately, many biomolecular NMR spectroscopy labs use non-standard methods for determining the ¹H, ¹³C or ¹⁵N "zero-point" chemical shift position. This lack of standardization makes it difficult to compare chemical shifts for the same protein between different laboratories. It also makes it difficult to use chemical shifts to identify or assign secondary structures, or to improve 3D structures via chemical shift refinement. Chemical shift re-referencing offers a means to correct these referencing errors and to standardize the reporting of protein chemical shifts across laboratories.
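The ratio calculation mentioned above can be sketched as follows. The frequency ratios (Ξ) are the values commonly quoted from the IUPAC recommendations for biomolecular NMR (¹³C relative to DSS, ¹⁵N relative to liquid ammonia) and should be checked against the current recommendation before use; the function names and the spectrometer frequency are hypothetical.

```python
# Frequency ratios (Xi) relative to the 1H signal of DSS.
# Assumed here from the commonly cited IUPAC recommendations.
XI = {"13C": 0.251449530, "15N": 0.101329118}


def zero_point_mhz(dss_1h_mhz, nucleus):
    """Absolute frequency (MHz) that corresponds to 0 ppm for `nucleus`."""
    return dss_1h_mhz * XI[nucleus]


def shift_ppm(observed_mhz, zero_mhz):
    """Chemical shift in ppm of a signal relative to the 0 ppm frequency."""
    return (observed_mhz - zero_mhz) / zero_mhz * 1e6


# Hypothetical measurement: DSS 1H resonance observed at 600.130000 MHz.
dss_1h_mhz = 600.130000
c13_zero = zero_point_mhz(dss_1h_mhz, "13C")
n15_zero = zero_point_mhz(dss_1h_mhz, "15N")
print(f"13C 0 ppm at {c13_zero:.6f} MHz, 15N 0 ppm at {n15_zero:.6f} MHz")
```

Only the ¹H frequency of DSS has to be measured in the sample; the heteronuclear zero points follow from it, which is why this scheme avoids adding separate ¹³C or ¹⁵N standards.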

Probabilistic Approach for protein NMR Assignment Validation (PANAV) is a freely available stand-alone program that is used for protein chemical shift re-referencing. Chemical shift referencing is a problem in protein nuclear magnetic resonance, as more than 20% of reported NMR chemical shift assignments appear to be improperly referenced. For certain nuclei these referencing issues can cause systematic chemical shift errors of between 1.0 and 2.5 ppm. Chemical shift errors of this magnitude often make it very difficult to compare NMR chemical shift assignments between proteins. They also make it very hard to structurally interpret chemical shifts. Unlike most other chemical shift re-referencing tools, PANAV employs a structure-independent protocol. That is, with PANAV there is no need to know the structure of the protein in advance of correcting any chemical shift referencing errors. This makes PANAV particularly useful for NMR studies involving novel or newly assigned proteins, where the structure has yet to be determined. Indeed, this scenario represents the vast majority of assignment cases in biomolecular NMR. PANAV uses residue-specific and secondary structure-specific chemical shift distributions that were calculated over short fragments of correctly referenced proteins. More specifically, PANAV compares the initial chemical shift assignments to the expected chemical shifts based on their local sequence and expected or predicted secondary structure. In this way, PANAV is able to identify and re-reference mis-referenced chemical shift assignments. PANAV can also identify potentially mis-assigned resonances. PANAV has been extensively tested and compared against a large number of existing re-referencing or mis-assignment detection programs. These assessments indicate that PANAV is equal to or superior to existing approaches.

Randy John Read is a Wellcome Trust Principal Research Fellow and professor of protein crystallography at the University of Cambridge.

David S. Wishart is a Canadian researcher and a Distinguished University Professor in the Department of Biological Sciences and the Department of Computing Science at the University of Alberta. Wishart also holds cross appointments in the Faculty of Pharmacy and Pharmaceutical Sciences and the Department of Laboratory Medicine and Pathology in the Faculty of Medicine and Dentistry. Additionally, Wishart holds a joint appointment in metabolomics at the Pacific Northwest National Laboratory in Richland, Washington. Wishart is well known for his pioneering contributions to the fields of protein NMR spectroscopy, bioinformatics, cheminformatics and metabolomics. In 2011, Wishart founded the Metabolomics Innovation Centre (TMIC), which is Canada's national metabolomics laboratory.

References

[1] Bartels, Christian; Billeter, Martin; Guentert, Peter; Wuethrich, Kurt (30 April 1996). "Automated sequence-specific NMR assignment of homologous proteins using the program GARANT". Journal of Biomolecular NMR. 7 (3): 207–13. doi:10.1007/BF00202037. PMID 22911044. S2CID 9450778.
[2] Rossmann, M. G.; Blow, D. M. (1962), "The detection of sub-units within the crystallographic asymmetric unit", Acta Crystallogr., 15: 24–31, CiteSeerX 10.1.1.319.3019, doi:10.1107/s0365110x62000067.
[3] Al-Hashimi, H. M.; Gorin, A.; Majumdar, A.; Gosser, Y.; Patel, D. J. (2002), "Towards structural genomics of RNA: Rapid NMR resonance assignment and simultaneous RNA tertiary structure determination using residual dipolar couplings", J. Mol. Biol., 318 (3): 637–649, doi:10.1016/s0022-2836(02)00160-2, PMID 12054812.
[4] Jung, Y.; Zweckstetter, M. (2004), "Mars - robust automatic backbone assignment of proteins", Journal of Biomolecular NMR, 30 (1): 11–23, doi:10.1023/b:jnmr.0000042954.99056.ad, hdl:11858/00-001M-0000-0012-EC52-9, PMID 15452431, S2CID 3006904.
[5] Langmead, C. J.; Yan, A.; Lilien, R.; Wang, L.; Donald, B. R. (2004), "A polynomial-time nuclear vector replacement algorithm for automated NMR resonance assignments", J. Comp. Bio., 11 (2–3): 277–98, CiteSeerX 10.1.1.15.8054, doi:10.1089/1066527041410436, PMID 15285893.
[6] Langmead, C. J.; Donald, B. R. (2004), "An Expectation/Maximization Nuclear Vector Replacement Algorithm for Automated NMR Resonance Assignments", Journal of Biomolecular NMR, 29 (2): 111–138, CiteSeerX 10.1.1.630.1110, doi:10.1023/b:jnmr.0000019247.89110.e6, PMID 15014227, S2CID 12443551.
[7] Apaydin, M. S.; Catay, B.; Patrick, N.; Donald, B. R. (2010), "NVR-BIP: nuclear vector replacement using binary integer programming for NMR structure-based assignments", The Computer Journal, 54 (January): 708–716, doi:10.1093/comjnl/bxp120, PMC 4287374, PMID 25580019.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
