DSSP (algorithm)

DSSP
Original author(s)	Wolfgang Kabsch, Chris Sander
Developer(s)	Maarten Hekkelman
Initial release	1983
Stable release	4.4 / 19 July 2023
Repository	github.com/PDB-REDO/dssp
Written in	C++
Operating system	Linux, Windows
License	BSD-2-clause license
Website	pdb-redo.eu/dssp/

Last updated January 31, 2024

The DSSP algorithm is the standard method for assigning secondary structure to the amino acids of a protein, given the atomic-resolution coordinates of the protein. The abbreviation appears only once in the 1983 paper describing the algorithm,^[2] as the name of the Pascal program that implements it: Define Secondary Structure of Proteins.

Algorithm

DSSP begins by identifying the intra-backbone hydrogen bonds of the protein using a purely electrostatic definition, assigning partial charges of −0.42 e and +0.20 e to the carbonyl oxygen and amide hydrogen respectively, and their opposites to the carbonyl carbon and amide nitrogen. A hydrogen bond is identified if E in the following equation is less than −0.5 kcal/mol:

$$E = 0.084 \left\{ \frac{1}{r_{ON}} + \frac{1}{r_{CH}} - \frac{1}{r_{OH}} - \frac{1}{r_{CN}} \right\} \cdot 332\,\mathrm{kcal/mol}$$

where the $r_{AB}$ terms indicate the distance between atoms A and B, taken from the carbon (C) and oxygen (O) atoms of the C=O group and the nitrogen (N) and hydrogen (H) atoms of the N-H group.
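The energy criterion above can be sketched in a few lines of Python. This is a minimal illustration and not the code of the DSSP program itself: the function names `hbond_energy` and `is_hbond` are hypothetical, coordinates are assumed to be (x, y, z) tuples in ångströms, and the amide hydrogen position is assumed to be supplied by the caller.

```python
import math

CHARGE_PRODUCT = 0.084   # 0.42 e * 0.20 e, as in the equation above
CONVERSION = 332.0       # factor that yields kcal/mol in the equation above
HBOND_CUTOFF = -0.5      # kcal/mol; a bond is assigned below this energy


def hbond_energy(c, o, n, h):
    """Electrostatic energy between a C=O group and an N-H group.

    c, o: coordinates of the carbonyl carbon and oxygen.
    n, h: coordinates of the amide nitrogen and hydrogen.
    Coordinates are assumed to be in angstroms (an assumption of this sketch).
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return CHARGE_PRODUCT * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn) * CONVERSION


def is_hbond(c, o, n, h):
    """True if the pair satisfies the E < -0.5 kcal/mol criterion."""
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF


if __name__ == "__main__":
    # Idealised, made-up collinear geometry: C=O ... H-N with O...H = 1.9 A.
    c, o = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
    h, n = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
    print(hbond_energy(c, o, n, h), is_hbond(c, o, n, h))
```

The collinear geometry in the usage example is invented for illustration; with real structures the four positions would come from the coordinate file.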

Based on this, nine types of secondary structure are assigned. The 3₁₀ helix, α helix and π helix have symbols G, H and I and are recognized by a repetitive sequence of hydrogen bonds in which the bonded residues are three, four, or five residues apart, respectively. Two types of beta sheet structure exist: a beta bridge has symbol B, while longer sets of hydrogen bonds and beta bulges have symbol E. T is used for turns, which feature hydrogen bonds typical of helices, and S is used for regions of high curvature (where the angle between ${\overrightarrow {C_{i}^{\alpha }C_{i+2}^{\alpha }}}$ and ${\overrightarrow {C_{i-2}^{\alpha }C_{i}^{\alpha }}}$ is at least 70°). As of DSSP version 4, PPII helices are also detected, based on a combination of backbone torsion angles and the absence of hydrogen bonds compatible with other types; they have symbol P. A blank (or space) is used if no other rule applies, referring to loops.^[3] Excluding P, the remaining eight types are usually grouped into three larger classes: helix (G, H and I), strand (E and B) and loop (S, T, and C, where C is sometimes also written as a blank space).
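Two of the rules in this paragraph are simple enough to sketch directly: the curvature test behind symbol S and the grouping into three larger classes. The Python below is an illustrative sketch only. The function names are hypothetical, the output letters H, E and C for helix, strand and loop are a convention chosen here, and mapping P (or any unlisted symbol) to loop is an assumption of this sketch, since the grouping described above does not mention P.

```python
import math

# Grouping described in the text: helix (G, H, I), strand (E, B), loop (S, T, C/blank).
THREE_STATE = {
    "G": "H", "H": "H", "I": "H",
    "E": "E", "B": "E",
    "S": "C", "T": "C", "C": "C", " ": "C",
}


def reduce_to_three_state(dssp_string):
    """Map a per-residue DSSP string onto helix/strand/loop letters.

    Symbols not in the table (including P) fall back to loop; that
    fallback is an assumption of this sketch, not a DSSP rule.
    """
    return "".join(THREE_STATE.get(symbol, "C") for symbol in dssp_string)


def bend_angle(ca_prev2, ca_i, ca_next2):
    """Angle in degrees between CA(i-2)->CA(i) and CA(i)->CA(i+2)."""
    v1 = [b - a for a, b in zip(ca_prev2, ca_i)]
    v2 = [b - a for a, b in zip(ca_i, ca_next2)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def is_bend(ca_prev2, ca_i, ca_next2):
    """True if the curvature criterion for symbol S (at least 70 degrees) holds."""
    return bend_angle(ca_prev2, ca_i, ca_next2) >= 70.0


if __name__ == "__main__":
    print(reduce_to_three_state("GHIEBST C"))
    print(is_bend((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)))
```

In the actual assignment S is only given when no higher-priority symbol applies, so `is_bend` captures the geometric test alone, not the full decision.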

π helices

In the original DSSP algorithm, residues were preferentially assigned to α helices, rather than π helices. In 2011, it was shown that DSSP failed to annotate many "cryptic" π helices, which are commonly flanked by α helices.^[4] In 2012, DSSP was rewritten so that the assignment of π helices was given preference over α helices, resulting in better detection of π helices.^[3] Versions of DSSP from 2.1.0 onwards therefore produce slightly different output from older versions.

Variants

In 2002, a continuous DSSP assignment was developed by introducing multiple hydrogen bond thresholds; the new assignment was found to correlate with protein motion.^[5]

References

[1] "DSSP".
[2] Kabsch W, Sander C (1983). "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features". Biopolymers. 22 (12): 2577–2637. doi:10.1002/bip.360221211. PMID 6667333. S2CID 29185760.
[3] "DSSP manual". Archived 2015-05-22 at the Wayback Machine.
[4] Cooley RB, Arp DJ, Karplus PA (2010). "Evolutionary origin of a secondary structure: π-helices as cryptic but widespread insertional variations of α-helices enhancing protein functionality". J Mol Biol. 404 (2): 232–246. doi:10.1016/j.jmb.2010.09.034. PMC 2981643. PMID 20888342.
[5] Andersen CA, Palmer AG, Brunak S, Rost B (2002). "Continuum secondary structure captures protein flexibility". Structure. 10 (2): 175–184. doi:10.1016/S0969-2126(02)00700-1. PMID 11839303.
