Discrete optimized protein energy


DOPE, or Discrete Optimized Protein Energy, [1] is a statistical potential used to assess homology models in protein structure prediction. DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere whose radius depends on a sample native structure; it thus accounts for the finite size and spherical shape of native structures. It is implemented in the popular homology modeling program MODELLER, which produces homology models by satisfaction of spatial restraints, and is used to assess the energy of the protein models that MODELLER generates over many iterations. The models with the lowest molpdf values (the MODELLER objective function) can be chosen as the most probable structures and then evaluated further with the DOPE score. Like the current version of the MODELLER software, DOPE is implemented in Python and is run within the MODELLER environment. The DOPE method is generally used to assess the quality of a structure model as a whole. DOPE can also generate a residue-by-residue energy profile for the input model, which lets the user spot problematic regions in the structure model.
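The reference state described above can be illustrated with a short Python sketch. It evaluates the probability density of the distance between two points placed uniformly and independently inside a sphere of radius a, which is the standard geometric result for noninteracting points in a homogeneous sphere. Whether DOPE uses exactly this expression, and how the radius is derived from each native structure, should be checked against reference [1]; the function names and the radius value here are illustrative assumptions, not part of MODELLER.

```python
def sphere_pair_density(r, a):
    """Density of the distance r between two points drawn uniformly
    and independently inside a sphere of radius a (zero outside [0, 2a])."""
    if r < 0.0 or r > 2.0 * a:
        return 0.0
    x = r / a
    return (3.0 * x * x / a) * (1.0 - 0.75 * x + x ** 3 / 16.0)


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule integration; sufficient for this smooth polynomial."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


a = 20.0  # hypothetical sphere radius in angstroms (assumed value)

# The density should integrate to 1 over [0, 2a].
total = integrate(lambda r: sphere_pair_density(r, a), 0.0, 2.0 * a)

# The mean pair distance in a sphere is known to be 36a/35.
mean = integrate(lambda r: r * sphere_pair_density(r, a), 0.0, 2.0 * a)

print(total, mean)
```

Unlike a reference state for an infinite ideal gas, where the expected number of pairs grows as r squared without bound, this density falls back to zero at r = 2a, which is how the finite size of the structure enters.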

Related Research Articles

Protein tertiary structure: Three-dimensional shape of a protein

Protein tertiary structure is the three-dimensional shape of a protein. The tertiary structure has a single polypeptide chain "backbone" with one or more protein secondary structures, the protein domains. Amino acid side chains may interact and bond in a number of ways. The interactions and bonds of side chains within a particular protein determine its tertiary structure. The protein tertiary structure is defined by its atomic coordinates. These coordinates may refer either to a protein domain or to the entire tertiary structure. A number of tertiary structures may fold into a quaternary structure.

Protein engineering is the process of developing useful or valuable proteins through the design and production of unnatural polypeptides, often by altering amino acid sequences found in nature. It is a young discipline, with much research taking place into the understanding of protein folding and recognition for protein design principles. It has been used to improve the function of many enzymes for industrial catalysis. It is also a product and services market, with an estimated value of $168 billion by 2017.

Protein structure prediction: Type of biological prediction

Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence, that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology, and it is important in medicine and biotechnology.

Structural alignment: Aligning molecular sequences using sequence and structural information

Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be used in using the results as evidence for shared evolutionary ancestry because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.

Protein design is the rational design of new protein molecules to design novel activity, behavior, or purpose, and to advance basic understanding of protein function. Proteins can be designed from scratch or by making calculated variants of a known protein structure and its sequence. Rational protein design approaches make protein-sequence predictions that will fold to specific structures. These predicted sequences can then be validated experimentally through methods such as peptide synthesis, site-directed mutagenesis, or artificial gene synthesis.

Lattice proteins are highly simplified models of protein-like heteropolymer chains on lattice conformational space which are used to investigate protein folding. Simplification in lattice proteins is twofold: each whole residue is modeled as a single "bead" or "point" of a finite set of types, and each residue is restricted to be placed on vertices of a lattice. To guarantee the connectivity of the protein chain, adjacent residues on the backbone must be placed on adjacent vertices of the lattice. Steric constraints are expressed by imposing that no more than one residue can be placed on the same lattice vertex.
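The two constraints described above, chain connectivity and steric exclusion, can be sketched as a validity check in Python. This is a minimal illustration that assumes a 2D square lattice with one bead per residue; the function name and the coordinate representation are hypothetical choices, not taken from any particular lattice-model package.

```python
def is_valid_lattice_conformation(coords):
    """Check a chain of (x, y) integer positions on a 2D square lattice.

    Valid means: no two residues share a vertex (steric constraint) and
    consecutive residues sit on adjacent vertices (connectivity).
    """
    # Steric constraint: every vertex holds at most one residue.
    if len(set(coords)) != len(coords):
        return False
    # Connectivity: backbone neighbours are exactly one lattice step apart.
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            return False
    return True


folded = [(0, 0), (1, 0), (1, 1), (0, 1)]   # compact, self-avoiding walk
clashing = [(0, 0), (1, 0), (0, 0)]         # residue 3 revisits a vertex
broken = [(0, 0), (2, 0)]                   # chain neighbours not adjacent

print(is_valid_lattice_conformation(folded))
print(is_valid_lattice_conformation(clashing))
print(is_valid_lattice_conformation(broken))
```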

In molecular biology, protein threading, also known as fold recognition, is a method of protein modeling which is used to model those proteins which have the same fold as proteins of known structures, but do not have homologous proteins with known structure. It differs from the homology modeling method of structure prediction as it is used for proteins which do not have their homologous protein structures deposited in the Protein Data Bank (PDB), whereas homology modeling is used for those proteins which do. Threading works by using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein which one wishes to model.

Homology modeling: Method of protein structure prediction using other known proteins

Homology modeling, also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein. Homology modeling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. It has been seen that protein structures are more conserved than protein sequences amongst homologues, but sequences falling below a 20% sequence identity can have very different structure.

The contact order of a protein is a measure of the locality of the inter-amino acid contacts in the protein's native state tertiary structure. It is calculated as the average sequence distance between residues that form native contacts in the folded protein divided by the total length of the protein. Higher contact orders indicate longer folding times, and low contact order has been suggested as a predictor of potential downhill folding, or protein folding that occurs without a free energy barrier. This effect is thought to be due to the lower loss of conformational entropy associated with the formation of local as opposed to nonlocal contacts.
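The calculation described above can be written as a short Python sketch: the mean sequence separation over all native contacts, divided by the chain length. The contact list below is made up for illustration, and how contacts are defined (for example, by a distance cutoff between atoms) is left as an assumption outside this function.

```python
def relative_contact_order(contacts, length):
    """Mean sequence separation |i - j| over native contacts (i, j),
    divided by the number of residues in the protein."""
    if not contacts:
        raise ValueError("at least one native contact is required")
    if length <= 0:
        raise ValueError("length must be positive")
    mean_separation = sum(abs(i - j) for i, j in contacts) / len(contacts)
    return mean_separation / length


# Hypothetical contacts for a 10-residue chain (residue index pairs).
example_contacts = [(1, 4), (2, 10), (5, 9)]
co = relative_contact_order(example_contacts, 10)
print(co)
```

In this toy example the separations are 3, 8 and 4, so the mean is 5 and the relative contact order is 0.5.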

Statistical potential

In protein structure prediction, statistical potentials or knowledge-based potentials are scoring functions derived from an analysis of known protein structures in the Protein Data Bank (PDB).
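One common way to derive such a scoring function is the inverse Boltzmann relation, in which the energy of a distance bin is the negative logarithm of the ratio between the observed frequency and a reference frequency. The Python sketch below illustrates that general idea only; the counts, reference probabilities, pseudocount and unit choice (kT = 1) are made-up assumptions and do not reproduce any published potential.

```python
import math


def statistical_potential(observed_counts, reference_probs, kT=1.0,
                          pseudocount=1e-6):
    """Per-bin energies E = -kT * ln(p_obs / p_ref).

    observed_counts: pair counts per distance bin from known structures.
    reference_probs: expected probability per bin in the reference state.
    A small pseudocount avoids log(0) for empty bins.
    """
    n_bins = len(observed_counts)
    total = sum(observed_counts) + pseudocount * n_bins
    energies = []
    for n_obs, p_ref in zip(observed_counts, reference_probs):
        p_obs = (n_obs + pseudocount) / total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


# Hypothetical three-bin example: the middle bin is over-represented
# relative to the reference, so it receives a favourable (negative) energy.
energies = statistical_potential([10, 60, 30], [0.2, 0.3, 0.5])
print(energies)
```

Bins observed more often than the reference state predicts score as favourable, and bins observed less often score as unfavourable, which is why the choice of reference state matters so much for the resulting potential.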

In computational biology, de novo protein structure prediction refers to an algorithmic process by which protein tertiary structure is predicted from its amino acid primary sequence. The problem itself has occupied leading scientists for decades while still remaining unsolved. According to Science, the problem remains one of the top 125 outstanding issues in modern science. At present, some of the most successful methods have a reasonable probability of predicting the folds of small, single-domain proteins within 1.5 angstroms over the entire structure.

ESyPred3D is an automated homology modeling program. Alignments are obtained by combining, weighting and screening the results of several multiple alignment programs. The final three-dimensional structure is built using the modeling package MODELLER.

RAPTOR is protein threading software used for protein structure prediction. It has been replaced by RaptorX, which is much more accurate than RAPTOR.

Protein backbone fragment libraries have been used successfully in a variety of structural biology applications, including homology modeling, de novo structure prediction, and structure determination. By reducing the complexity of the search space, these fragment libraries enable more rapid search of conformational space, leading to more efficient and accurate models.

RaptorX is a software and web server for protein structure and function prediction that is free for non-commercial use. RaptorX is among the most popular methods for protein structure prediction. Like other remote homology recognition/protein threading techniques, RaptorX is able to regularly generate reliable protein models when the widely used PSI-BLAST cannot. However, RaptorX is also significantly different from those profile-based methods in that RaptorX excels at modeling of protein sequences without a large number of sequence homologs by exploiting structure information. RaptorX Server has been designed to ensure a user-friendly interface for users inexpert in protein structure prediction methods.

SWISS-MODEL is a structural bioinformatics web-server dedicated to homology modeling of 3D protein structures. Homology modeling is currently the most accurate method to generate reliable three-dimensional protein structure models and is routinely used in many practical applications. Homology modelling methods make use of experimental protein structures ("templates") to build models for evolutionary related proteins ("targets").

GeNMR

The GeNMR method is the first fully automated template-based method of protein structure determination that utilizes both NMR chemical shifts and NOE-based distance restraints.

CS23D

CS23D is a web server to generate 3D structural models from NMR chemical shifts. CS23D combines maximal fragment assembly with chemical shift threading, de novo structure generation, chemical shift-based torsion angle prediction, and chemical shift refinement. CS23D makes use of RefDB and ShiftX.

AlphaFold: Artificial intelligence program by DeepMind

AlphaFold is an artificial intelligence (AI) program developed by DeepMind, a subsidiary of Alphabet, which performs predictions of protein structure. The program is designed as a deep learning system.

Backbone-dependent rotamer library: Collection of data on conformations of a given protein's amino acid side chains

In biochemistry, a backbone-dependent rotamer library provides the frequencies, mean dihedral angles, and standard deviations of the discrete conformations of the amino acid side chains in proteins as a function of the backbone dihedral angles φ and ψ of the Ramachandran map. By contrast, backbone-independent rotamer libraries express the frequencies and mean dihedral angles for all side chains in proteins, regardless of the backbone conformation of each residue type. Backbone-dependent rotamer libraries have been shown to have significant advantages over backbone-independent rotamer libraries, principally when used as an energy term, by speeding up search times of side-chain packing algorithms used in protein structure prediction and protein design.

References

  1. Shen, Min-yi; Sali, Andrej (2006-11-01). "Statistical potential for assessment and prediction of protein structures". Protein Science. 15 (11): 2507–2524. doi:10.1110/ps.062416606. ISSN 1469-896X. PMC 2242414. PMID 17075131.