Intrinsically disordered proteins

Conformational flexibility in SUMO-1 protein (PDB:1a5r). The central part shows relatively ordered structure. Conversely, the N- and C-terminal regions (left and right, respectively) show ‘intrinsic disorder’, although a short helical region persists in the N-terminal tail. Ten alternative NMR models were morphed. Secondary structure elements: α-helices (red), β-strands (blue arrows).

In molecular biology, an intrinsically disordered protein (IDP) is a protein that lacks a fixed or ordered three-dimensional structure, [2] [3] [4] typically in the absence of its macromolecular interaction partners, such as other proteins or RNA. IDPs range from fully unstructured to partially structured and include random coil, molten globule-like aggregates, or flexible linkers in large multi-domain proteins. They are sometimes considered as a separate class of proteins along with globular, fibrous and membrane proteins. [5]

IDPs are a very large and functionally important class of proteins, and their discovery has disproved the idea that the three-dimensional structures of proteins must be fixed to accomplish their biological functions. For example, IDPs have been found to participate in weak multivalent interactions that are highly cooperative and dynamic, lending them importance in DNA regulation and in cell signaling. [6] [7] Many IDPs can also adopt a fixed three-dimensional structure after binding to other macromolecules. Overall, IDPs differ from structured proteins in many ways and tend to have distinctive function, structure, sequence, interactions, evolution and regulation. [8]

History

An ensemble of NMR structures of the Thylakoid soluble phosphoprotein TSP9, which shows a largely flexible protein chain.

In the 1930s-1950s, the first protein structures were solved by protein crystallography. These early structures suggested that a fixed three-dimensional structure might be generally required to mediate the biological functions of proteins. These publications solidified the sequence-structure-function paradigm of molecular biology, in which the amino acid sequence of a protein determines its structure which, in turn, determines its function. In 1950, Karush wrote about 'Configurational Adaptability', contradicting this assumption. He was convinced that proteins have more than one configuration at the same energy level and can select one when binding to other substrates. In the 1960s, Levinthal's paradox suggested that a systematic conformational search of a long polypeptide is unlikely to yield a single folded protein structure on biologically relevant timescales (i.e. microseconds to minutes). Curiously, for many (small) proteins or protein domains, relatively rapid and efficient refolding can be observed in vitro. As stated in Anfinsen's dogma from 1973, the fixed 3D structure of these proteins is uniquely encoded in their primary structure (the amino acid sequence), is kinetically accessible and stable under a range of (near) physiological conditions, and can therefore be considered the native state of such "ordered" proteins. [10]

During the subsequent decades, however, many large protein regions could not be assigned in X-ray datasets, indicating that they occupy multiple positions, which average out in electron density maps. The lack of fixed, unique positions relative to the crystal lattice suggested that these regions were "disordered". Nuclear magnetic resonance spectroscopy of proteins also demonstrated the presence of large flexible linkers and termini in many solved structural ensembles.

In 2001, Dunker questioned whether the newly found information had been ignored for 50 years, [11] and more quantitative analyses became available in the 2000s. [12] In the 2010s it became clear that IDPs are common among disease-related proteins, such as alpha-synuclein and tau. [13]

Abundance

It is now generally accepted that proteins exist as an ensemble of similar structures with some regions more constrained than others. IDPs occupy the extreme end of this spectrum of flexibility and include proteins of considerable local structure tendency or flexible multidomain assemblies. [14] [15]

Intrinsic disorder is particularly elevated among proteins that regulate chromatin and transcription, [16] and bioinformatic predictions indicate that it is more common in genomes and proteomes than in known structures in the protein database. Based on DISOPRED2 prediction, long (>30 residue) disordered segments occur in 2.0% of archaeal, 4.2% of eubacterial and 33.0% of eukaryotic proteins, [12] including certain disease-related proteins. [13]

Biological roles

Highly dynamic disordered regions of proteins have been linked to functionally important phenomena such as allosteric regulation and enzyme catalysis. [14] [15] In many disordered proteins, the binding affinity for their receptors is regulated by post-translational modification. It has therefore been proposed that the flexibility of disordered proteins accommodates the different conformational requirements for binding the modifying enzymes as well as their receptors. [17] Intrinsic disorder is particularly enriched in proteins implicated in cell signaling and transcription, [16] as well as chromatin remodeling functions. [18] [19] Genes that have recently been born de novo tend to have higher disorder. [20] [21] In animals, genes with high disorder are lost at higher rates during evolution. [22]

Flexible linkers

Disordered regions are often found as flexible linkers or loops connecting domains. Linker sequences vary greatly in length but are typically rich in polar uncharged amino acids. Flexible linkers allow the connected domains to twist and rotate freely to recruit their binding partners via protein domain dynamics. They also allow their binding partners to induce larger-scale conformational changes by long-range allostery. [14] [2] For example, the flexible linker that connects the two domains of FKBP25 is important for the binding of FKBP25 to DNA. [23]

Linear motifs

Linear motifs are short disordered segments of proteins that mediate functional interactions with other proteins or other biomolecules (RNA, DNA, sugars etc.). [16] Many roles of linear motifs are associated with cell regulation, for instance in the control of cell shape, the subcellular localisation of individual proteins and regulated protein turnover. Post-translational modifications such as phosphorylation often tune the affinity of individual linear motifs for specific interactions, frequently by several orders of magnitude. Their relatively rapid evolution and the small number of structural restraints needed to establish novel (low-affinity) interfaces make linear motifs particularly challenging to detect. Their widespread biological roles, and the fact that many viruses mimic or hijack linear motifs to efficiently recode infected cells, underline the importance of research on this topic.

Pre-structured motifs

Unlike globular proteins, IDPs do not have spatially defined active pockets. About 80% of the target-unbound IDPs (roughly four dozen) that have been subjected to detailed structural characterization by NMR possess linear motifs termed PreSMos (pre-structured motifs), [24] which are transient secondary structural elements primed for target recognition. In several cases it has been demonstrated that these transient structures become full and stable secondary structures, e.g., helices, upon target binding. Hence, PreSMos are the putative active sites in IDPs.

Coupled folding and binding

Many unstructured proteins undergo transitions to more ordered states upon binding to their targets (e.g. Molecular Recognition Features (MoRFs) [25] ). The coupled folding and binding may be local, involving only a few interacting residues, or it might involve an entire protein domain. It has been shown that coupled folding and binding allows the burial of a large surface area that would be possible for fully structured proteins only if they were much larger. [26] Moreover, certain disordered regions might serve as "molecular switches" that regulate biological functions by switching to an ordered conformation upon molecular recognition, such as small-molecule binding, DNA/RNA binding or ion interactions. [27]

The ability of disordered proteins to bind, and thus to exert a function, shows that stability is not a required condition. Many short functional sites, for example short linear motifs, are over-represented in disordered proteins. Disordered proteins and short linear motifs are particularly abundant in many viruses, such as Hendra virus, HCV, HIV-1 and human papillomaviruses. This enables such viruses to overcome their informationally limited genomes by facilitating binding to, and manipulation of, a large number of host cell proteins. [28] [29]

Disorder in the bound state (fuzzy complexes)

Intrinsically disordered proteins can retain their conformational freedom even when they bind specifically to other proteins. The structural disorder in the bound state can be static or dynamic. In fuzzy complexes, structural multiplicity is required for function, and manipulation of the bound disordered region changes activity. The conformational ensemble of the complex is modulated via post-translational modifications or protein interactions. [30] The specificity of DNA-binding proteins often depends on the length of fuzzy regions, which is varied by alternative splicing. [31] Some fuzzy complexes may exhibit high binding affinity, [32] although other studies showed different affinity values for the same system in a different concentration regime. [33]

Structural aspects

Intrinsically disordered proteins adopt many different structures in vivo according to the cell's conditions, creating a structural or conformational ensemble. [34] [35]

Therefore, their structures are strongly function-related. However, only a few proteins are fully disordered in their native state. Disorder is mostly found in intrinsically disordered regions (IDRs) within an otherwise well-structured protein. The term intrinsically disordered protein (IDP) therefore includes proteins that contain IDRs as well as fully disordered proteins.

The existence and kind of protein disorder is encoded in its amino acid sequence. [2] In general, IDPs are characterized by a low content of bulky hydrophobic amino acids and a high proportion of polar and charged amino acids, usually referred to as low hydrophobicity. [34] This property leads to good interactions with water. Furthermore, high net charges promote disorder because of electrostatic repulsion between like-charged residues. [35] Thus disordered sequences cannot sufficiently bury a hydrophobic core to fold into stable globular proteins. In some cases, hydrophobic clusters in disordered sequences provide clues for identifying the regions that undergo coupled folding and binding (see Biological roles). Many disordered proteins contain regions without any regular secondary structure. These regions can be termed flexible, in contrast to structured loops. While the latter are rigid and contain only one set of Ramachandran angles, IDPs involve multiple sets of angles. [35] The term flexibility is also used for well-structured proteins, but it describes a different phenomenon in the context of disordered proteins. Flexibility in structured proteins is bound to an equilibrium state, while it is not so in IDPs. [35] Many disordered proteins also contain low-complexity sequences, i.e. sequences with an over-representation of a few residues. While low-complexity sequences are a strong indication of disorder, the reverse is not necessarily true; that is, not all disordered proteins have low-complexity sequences. Disordered proteins have a low content of predicted secondary structure.
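The two sequence properties described above, low hydrophobicity and high net charge, can be computed directly from a sequence. The following is a minimal sketch, not a published predictor: it assumes the standard Kyte-Doolittle hydropathy scale rescaled to the 0-1 range, counts lysine and arginine as +1 and aspartate and glutamate as -1, and treats histidine as neutral. The function names and the two toy sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean hydropathy, rescaled so that 0 is most hydrophilic and 1 most hydrophobic."""
    values = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq]
    return sum(values) / len(values)


def mean_net_charge(seq):
    """Absolute net charge per residue (K, R = +1; D, E = -1; His assumed neutral)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


# Hypothetical toy sequences, chosen only to contrast the two regimes.
polar_like = "EEKKSSPPEEKK"
hydrophobic_like = "LLVVIIFFLLVV"
print(mean_hydropathy(polar_like), mean_net_charge(polar_like))
print(mean_hydropathy(hydrophobic_like), mean_net_charge(hydrophobic_like))
```

A sequence with low mean hydropathy and high mean net charge is, by the argument above, less able to bury a hydrophobic core; any decision threshold would have to come from a calibrated dataset and is deliberately left out of this sketch.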

Due to the disordered nature of these proteins, topological approaches have been developed to search for conformational patterns in their dynamics. For instance, circuit topology has been applied to track the dynamics of disordered protein domains. [36] By employing a topological approach, one can categorize motifs according to their topological buildup and the timescale of their formation.

Experimental validation

IDPs can be validated in several contexts. Most approaches for the experimental validation of IDPs are restricted to extracted or purified proteins. Some newer experimental strategies aim to explore the in vivo conformations and structural variations of IDPs inside intact living cells, and to compare their dynamics in vivo and in vitro systematically.

In vivo approaches

The first direct evidence for the in vivo persistence of intrinsic disorder was obtained by in-cell NMR, following electroporation of a purified IDP and recovery of the cells to an intact state. [37]

Larger-scale in vivo validation of IDR predictions is now possible using biotin 'painting'. [38] [39]

In vitro approaches

Intrinsically unfolded proteins, once purified, can be identified by various experimental methods. The primary method to obtain information on disordered regions of a protein is NMR spectroscopy. The lack of electron density in X-ray crystallographic studies may also be a sign of disorder.

Folded proteins have a high density (partial specific volume of 0.72-0.74 mL/g) and a commensurately small radius of gyration. Hence, unfolded proteins can be detected by methods that are sensitive to molecular size, density or hydrodynamic drag, such as size exclusion chromatography, analytical ultracentrifugation, small angle X-ray scattering (SAXS), and measurements of the diffusion constant. Unfolded proteins are also characterized by their lack of secondary structure, as assessed by far-UV (170-250 nm) circular dichroism (esp. a pronounced minimum at ~200 nm) or infrared spectroscopy. Unfolded proteins also have backbone peptide groups exposed to solvent, so that they are readily cleaved by proteases, undergo rapid hydrogen-deuterium exchange and exhibit a small dispersion (<1 ppm) in their 1H amide chemical shifts as measured by NMR. (Folded proteins typically show dispersions as large as 5 ppm for the amide protons.) More recently, new methods including Fast parallel proteolysis (FASTpp) have been introduced, which allow the folded/disordered fraction to be determined without the need for purification. [40] [41] Even subtle differences in the stability of missense mutations, protein partner binding and (self)polymerisation-induced folding of (e.g.) coiled-coils can be detected using FASTpp, as demonstrated using the tropomyosin-troponin protein interaction. [42] Fully unstructured protein regions can be experimentally validated by their hypersusceptibility to proteolysis using short digestion times and low protease concentrations. [43]
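The amide chemical shift criterion mentioned above lends itself to a simple numerical check. The sketch below is illustrative only: it assumes that dispersion is measured as the spread between the largest and smallest 1H amide shift, uses the 1 ppm figure from the text as the default cutoff, and runs on made-up shift values rather than measured data. The function names are hypothetical.

```python
def amide_dispersion(shifts_ppm):
    """Spread (max minus min) of 1H amide chemical shifts, in ppm."""
    return max(shifts_ppm) - min(shifts_ppm)


def looks_unfolded(shifts_ppm, threshold_ppm=1.0):
    """True when the amide dispersion is below the cutoff quoted in the text."""
    return amide_dispersion(shifts_ppm) < threshold_ppm


# Made-up shift lists for illustration, not experimental values.
narrow = [8.0, 8.2, 8.5, 8.3]
wide = [6.5, 9.8, 8.1, 7.4]
print(looks_unfolded(narrow), looks_unfolded(wide))
```

In practice a single cutoff is only a first indication, and it would normally be read alongside the other observables listed above (circular dichroism, hydrodynamic size, protease sensitivity).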

Bulk methods to study IDP structure and dynamics include SAXS for ensemble shape information, NMR for atomistic ensemble refinement, fluorescence for visualising molecular interactions and conformational transitions, X-ray crystallography to highlight more mobile regions in otherwise rigid protein crystals, cryo-EM to reveal less fixed parts of proteins, light scattering to monitor size distributions of IDPs or their aggregation kinetics, and NMR chemical shifts and circular dichroism to monitor the secondary structure of IDPs.

Single-molecule methods to study IDPs include spFRET [44] to study the conformational flexibility of IDPs and the kinetics of structural transitions, optical tweezers [45] for high-resolution insights into the ensembles of IDPs and their oligomers or aggregates, nanopores [46] to reveal global shape distributions of IDPs, magnetic tweezers [47] to study structural transitions for long times at low forces, and high-speed AFM [48] to visualise the spatio-temporal flexibility of IDPs directly.

Disorder annotation

REMARK465 - missing electron densities in X-ray structure representing protein disorder (PDB: 1a22, human growth hormone bound to receptor). Compilation of screenshots from PDB database and molecule representation via VMD. Blue and red arrows point to missing residues on receptor and growth hormone, respectively.

Intrinsic disorder can be either annotated from experimental information or predicted with specialized software. Disorder prediction algorithms can predict intrinsic disorder (ID) propensity with high accuracy (approaching 80%) based on primary sequence composition, similarity to unassigned segments in protein X-ray datasets, flexible regions in NMR studies and the physico-chemical properties of amino acids.

Disorder databases

Databases have been established to annotate protein sequences with intrinsic disorder information. The DisProt database contains a collection of manually curated protein segments which have been experimentally determined to be disordered. MobiDB is a database combining experimentally curated disorder annotations (e.g. from DisProt) with data derived from missing residues in X-ray crystallographic structures and flexible regions in NMR structures.
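Annotations derived from missing residues in X-ray structures can be read from the REMARK 465 records of a PDB-format file. The following is a rough sketch of such a reader, written under stated assumptions: data rows are taken to end with a residue name, a chain identifier and a residue number, with an optional leading model number in NMR entries; rows with a blank chain identifier or an insertion code are skipped. The function name and the sample records are hypothetical and are not taken from entry 1a22.

```python
def missing_residues(pdb_text):
    """Collect (chain, residue_number, residue_name) from REMARK 465 rows."""
    found = []
    for line in pdb_text.splitlines():
        if not line.startswith("REMARK 465"):
            continue
        fields = line[10:].split()
        # Data rows have 3 fields, or 4 when a model number is present.
        if len(fields) not in (3, 4):
            continue
        name, chain, number = fields[-3:]
        try:
            seq_number = int(number)
        except ValueError:
            # Column-header row, or a residue number with an insertion code.
            continue
        found.append((chain, seq_number, name))
    return found


# Invented records in the REMARK 465 layout, for illustration only.
SAMPLE = """\
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465   M RES C SSSEQI
REMARK 465     MET A     1
REMARK 465     GLY A     2
REMARK 465     SER B    -1
ATOM      1  N   ALA A   3      11.104   6.134  -6.504  1.00  0.00           N
"""
print(missing_residues(SAMPLE))
```

For production use, an established structure-parsing library would be a safer choice than whitespace splitting, since fixed-column details of the format are ignored here.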

Predicting IDPs by sequence

Separating disordered from ordered proteins is essential for disorder prediction. One of the first steps in finding a factor that distinguishes IDPs from non-IDPs is to identify biases in the amino acid composition. The amino acids A, R, G, Q, S, P, E and K, most of which are hydrophilic or charged, have been characterized as disorder-promoting, while the order-promoting amino acids W, C, F, I, Y, V, L and N are mostly hydrophobic and uncharged. The remaining amino acids H, M, T and D are ambiguous, found in both ordered and unstructured regions. [2] A more recent analysis ranked amino acids by their propensity to form disordered regions as follows (order promoting to disorder promoting): W, F, Y, I, M, L, V, N, C, T, A, G, R, D, H, Q, K, S, E, P. [49] As can be seen from the list, small, charged, hydrophilic residues often promote disorder, while large and hydrophobic residues promote order.
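The compositional bias described above can be turned into a crude per-sequence and per-residue score. The sketch below uses only the residue classes and the ranking quoted in the text; the linear 0-1 scaling of the ranking, the window length and the function names are assumptions made for illustration, not part of any published method.

```python
# Residue classes as listed in the text.
DISORDER_PROMOTING = set("ARGQSPEK")
ORDER_PROMOTING = set("WCFIYVLN")

# Ranking quoted in the text, from order-promoting (W) to disorder-promoting (P).
RANKING = "WFYIMLVNCTAGRDHQKSEP"
RANK_SCORE = {aa: i / (len(RANKING) - 1) for i, aa in enumerate(RANKING)}


def composition_bias(seq):
    """Return the fractions of disorder- and order-promoting residues."""
    n = len(seq)
    disorder = sum(aa in DISORDER_PROMOTING for aa in seq) / n
    order = sum(aa in ORDER_PROMOTING for aa in seq) / n
    return disorder, order


def rank_profile(seq, window=5):
    """Sliding-window mean of the rank score (0 = order-like, 1 = disorder-like)."""
    scores = [RANK_SCORE[aa] for aa in seq]
    half = window // 2
    profile = []
    for i in range(len(scores)):
        # The window is truncated at the sequence ends.
        chunk = scores[max(0, i - half): i + half + 1]
        profile.append(sum(chunk) / len(chunk))
    return profile


# Hypothetical toy sequence: a hydrophobic stretch followed by a polar tail.
toy = "WFILVYWFSEPKSEPK"
print(composition_bias(toy))
print([round(s, 2) for s in rank_profile(toy)])
```

The profile rises along the toy sequence as the window moves from the hydrophobic stretch into the polar tail, which is the kind of signal that composition-based predictors build upon.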

This information is the basis of most sequence-based predictors. Regions with little to no secondary structure, also known as NORS (NO Regular Secondary structure) regions, [50] and low-complexity regions can easily be detected. However, not all disordered proteins contain such low complexity sequences.
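Low-complexity regions, as mentioned above, are straightforward to flag computationally. One simple way to do so, sketched here as an assumption rather than as the method of any particular tool, is to compute the Shannon entropy of the residue composition in a sliding window and report windows that fall at or below a cutoff. The window length and cutoff are arbitrary illustrative defaults, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, window=12, max_entropy=2.2):
    """Start indices of windows whose entropy is at or below the cutoff."""
    return [
        i
        for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) <= max_entropy
    ]


# Hypothetical sequence: a varied stretch followed by a glutamine/serine-rich run.
toy = "MKTAYIAKQRWDEVLNGHFPQQQQSQQQQSQQQQ"
print(low_complexity_windows(toy))
```

A homopolymer has an entropy of zero and a window containing twelve different residues has an entropy of about 3.58 bits, so the cutoff determines how biased a window must be before it is reported.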

Prediction methods

Determining disordered regions by biochemical methods is very costly and time-consuming. Due to the variable nature of IDPs, only certain aspects of their structure can be detected, so that a full characterization requires a large number of different methods and experiments. This further increases the expense of IDP determination. To overcome this obstacle, computer-based methods have been created for predicting protein structure and function. It is one of the main goals of bioinformatics to derive knowledge by prediction. Predictors for IDP function are also being developed, but they mainly use structural information such as linear motif sites. [4] [51] There are different approaches for predicting IDP structure, such as neural networks or matrix calculations, based on different structural and/or biophysical properties.

Many computational methods exploit sequence information to predict whether a protein is disordered. [52] Notable examples of such software include IUPred and DISOPRED. Different methods may use different definitions of disorder. Meta-predictors combine several primary predictors with the aim of producing a more accurate prediction than any single one of them.
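The idea behind a meta-predictor can be illustrated with a weighted average of per-residue scores. This is a minimal sketch under simplifying assumptions: each primary predictor is taken to return one score between 0 and 1 per residue, the scores are assumed to be comparable across predictors, and residues at or above a cutoff of 0.5 are called disordered. The predictor names and score values are invented, and no real tool's interface is implied.

```python
def consensus_scores(predictions, weights=None):
    """Weighted per-residue mean of the scores from several predictors."""
    names = list(predictions)
    lengths = {len(predictions[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same sequence length")
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n = lengths.pop()
    return [
        sum(weights[name] * predictions[name][i] for name in names) / total
        for i in range(n)
    ]


def call_disorder(scores, cutoff=0.5):
    """Binary disorder call per residue."""
    return [score >= cutoff for score in scores]


# Invented outputs of three hypothetical primary predictors for a 5-residue stretch.
predictions = {
    "predictor_a": [0.9, 0.8, 0.4, 0.2, 0.1],
    "predictor_b": [0.7, 0.6, 0.6, 0.3, 0.2],
    "predictor_c": [0.8, 0.9, 0.3, 0.1, 0.1],
}
print(call_disorder(consensus_scores(predictions)))
```

Real meta-predictors typically learn how to weight or combine their inputs from annotated data; the fixed weights here only show the mechanics of the combination.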

Because predictors of disordered proteins take different approaches, estimating their relative accuracy is fairly difficult. For example, neural networks are often trained on different datasets. The disorder prediction category is a part of the biennial CASP experiment, which is designed to test methods according to their accuracy in finding regions with missing 3D structure (marked in PDB files as REMARK465, missing electron densities in X-ray structures).
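Comparing predictions with residues marked as missing amounts to a per-residue binary classification problem. Because disordered residues are usually the minority class, the sketch below scores a prediction by balanced accuracy (the mean of sensitivity and specificity); this choice of metric is an assumption made for illustration, and the function name and example labels are hypothetical.

```python
def balanced_accuracy(predicted, observed):
    """Mean of sensitivity and specificity for per-residue disorder calls."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    tp = fp = tn = fn = 0
    for p, o in zip(predicted, observed):
        if o and p:
            tp += 1
        elif o and not p:
            fn += 1
        elif p:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both ordered and disordered residues are required")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Made-up labels: 1 = disordered (e.g. missing from the structure), 0 = ordered.
observed = [1, 1, 0, 0, 0, 0, 0, 1]
predicted = [1, 0, 0, 0, 1, 0, 0, 1]
print(balanced_accuracy(predicted, observed))
```

Unlike the plain fraction of correct residues, this measure does not reward a predictor for labelling every residue as ordered.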

Disorder and disease

Intrinsically unstructured proteins have been implicated in a number of diseases. [13] Aggregation of misfolded proteins is the cause of many synucleinopathies and of toxicity, as those proteins start binding to each other randomly, and it can lead to cancer or cardiovascular diseases. Misfolding can happen spontaneously because millions of copies of proteins are made during the lifetime of an organism. In synucleinopathies, the aggregation of the intrinsically unstructured protein α-synuclein is thought to be responsible. The structural flexibility of this protein, together with its susceptibility to modification in the cell, leads to misfolding and aggregation. Genetics, oxidative and nitrative stress as well as mitochondrial impairment affect the structural flexibility of the unstructured α-synuclein protein and the associated disease mechanisms. [53] Many key tumour suppressors have large intrinsically unstructured regions, for example p53 and BRCA1. These regions of the proteins are responsible for mediating many of their interactions. Taking the cell's native defense mechanisms as a model, drugs can be developed that try to block the binding sites of noxious substrates and inhibit them, thus counteracting the disease. [54]

Computer simulations

MD simulation of Glutaredoxin 1 from Trypanosoma brucei. The globular thioredoxin fold is depicted in blue and the disordered N-terminal tail in green. According to the MD results, the disordered tail may modulate the dynamics of the binding pocket.

Owing to the high structural heterogeneity, the experimental parameters obtained from NMR or SAXS are averages over a large number of highly diverse and disordered states (an ensemble of disordered states). Hence, to understand the structural implications of these experimental parameters, an accurate representation of these ensembles by computer simulations is needed. All-atom molecular dynamics simulations can be used for this purpose, but their use is limited by the accuracy of current force fields in representing disordered proteins. Nevertheless, some force fields have been explicitly developed for studying disordered proteins by optimising force-field parameters against available NMR data for disordered proteins (examples are CHARMM 22*, CHARMM 32, [56] and Amber ff03*).
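The statement that an experimental parameter is an average over an ensemble can be made concrete with the radius of gyration. The sketch below computes the radius of gyration of each conformer and then an ensemble value as the root of the (optionally weighted) mean of the squared radii. It assumes equal masses for all atoms and that the squared radius is the quantity that averages across conformers; the function names and the toy coordinates are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Radius of gyration of one conformer, treating all atoms as equal mass."""
    n = len(coords)
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


def ensemble_rg(conformers, weights=None):
    """Root of the weighted mean squared radius of gyration over an ensemble."""
    if weights is None:
        weights = [1.0] * len(conformers)
    total = sum(weights)
    mean_sq = sum(
        w * radius_of_gyration(c) ** 2 for w, c in zip(weights, conformers)
    ) / total
    return math.sqrt(mean_sq)


# Toy ensemble of two-atom "conformers" (coordinates in arbitrary units).
compact = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(radius_of_gyration(compact), radius_of_gyration(extended))
print(ensemble_rg([compact, extended]))
```

The ensemble value lies between those of the individual conformers and corresponds to no single structure, which is why a measured average cannot be interpreted without a model of the underlying ensemble.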

MD simulations restrained by experimental parameters (restrained MD) have also been used to characterise disordered proteins. [57] [58] [59] In principle, one can sample the whole conformational space provided an MD simulation (with an accurate force field) is run long enough. Because of the very high structural heterogeneity, the time scales required for this purpose are very long and are limited by computational power. However, other computational techniques such as accelerated MD simulations, [60] replica exchange simulations, [61] [62] metadynamics, [63] [64] multicanonical MD simulations, [65] or methods using coarse-grained representations with implicit and explicit solvents [66] [67] [68] have been used to sample a broader conformational space on shorter time scales.
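As an illustration of how one of these enhanced sampling techniques works, temperature replica exchange runs copies of the system at different temperatures and periodically attempts to swap neighbouring copies. The sketch below shows the standard Metropolis acceptance rule for such a swap; it is not taken from the cited studies. The energies are assumed to be potential energies in kcal/mol, and the function name and example values are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis acceptance probability for exchanging two replicas.

    energy_i is the current potential energy of the replica at temp_i,
    energy_j that of the replica at temp_j.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0:
        return 1.0  # Favourable swap, always accepted.
    return math.exp(delta)


# Hypothetical energies (kcal/mol) for replicas at 300 K and 310 K.
print(swap_probability(-100.0, -120.0, 300.0, 310.0))
print(swap_probability(-120.0, -100.0, 300.0, 310.0))
```

A swap that moves the lower-energy configuration to the colder replica is always accepted, while the reverse is accepted only with a Boltzmann-weighted probability; this is what lets configurations cross barriers at high temperature and then be sampled at the temperature of interest.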

Moreover, various protocols and methods of analyzing IDPs, such as studies based on quantitative analysis of GC content in genes and their respective chromosomal bands, have been used to understand functional IDP segments. [69] [70]

See also

Related Research Articles

<span class="mw-page-title-main">Protein</span> Biomolecule consisting of chains of amino acid residues

Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, responding to stimuli, providing structure to cells and organisms, and transporting molecules from one location to another. Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, and which usually results in protein folding into a specific 3D structure that determines its activity.

<span class="mw-page-title-main">Protein folding</span> Change of a linear protein chain to a 3D structure

Protein folding is the physical process by which a protein, after synthesis by a ribosome as a linear chain of amino acids, changes from an unstable random coil into a more ordered three-dimensional structure. This structure permits the protein to become biologically functional.

<span class="mw-page-title-main">Protein structure prediction</span> Type of biological prediction

Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; it is important in medicine and biotechnology.

An epitope, also known as antigenic determinant, is the part of an antigen that is recognized by the immune system, specifically by antibodies, B cells, or T cells. The part of an antibody that binds to the epitope is called a paratope. Although epitopes are usually non-self proteins, sequences derived from the host that can be recognized are also epitopes.

<span class="mw-page-title-main">Protein structure</span> Three-dimensional arrangement of atoms in an amino acid-chain molecule

Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers – specifically polypeptides – formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid monomer may also be called a residue, which indicates a repeating unit of a polymer. Proteins form by amino acids undergoing condensation reactions, in which the amino acids lose one water molecule per reaction in order to attach to one another with a peptide bond. By convention, a chain under 30 amino acids is often identified as a peptide, rather than a protein. To be able to perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of non-covalent interactions, such as hydrogen bonding, ionic interactions, Van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography, NMR spectroscopy, cryo-electron microscopy (cryo-EM) and dual polarisation interferometry, to determine the structure of proteins.

<span class="mw-page-title-main">Conformational change</span> Change in the shape of a macromolecule, often induced by environmental factors

In biochemistry, a conformational change is a change in the shape of a macromolecule, often induced by environmental factors.

<span class="mw-page-title-main">Protein domain</span> Self-stable region of a proteins chain that folds independently from the rest

In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of several domains, and a domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions. In general, domains vary in length from between about 50 amino acids up to 250 amino acids in length. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.

<span class="mw-page-title-main">Protein dynamics</span> Study of how proteins move and change shape

In molecular biology, proteins are generally thought to adopt unique structures determined by their amino acid sequences. However, proteins are not strictly static objects, but rather populate ensembles of conformations. Transitions between these states occur on a variety of length scales and time scales , and have been linked to functionally relevant phenomena such as allosteric signaling and enzyme catalysis.

<span class="mw-page-title-main">Short linear motif</span>

In molecular biology short linear motifs (SLiMs), linear motifs or minimotifs are short stretches of protein sequence that mediate protein–protein interaction.

<span class="mw-page-title-main">Fuzzy complex</span>

Fuzzy complexes are protein complexes, where structural ambiguity or multiplicity exists and is required for biological function. Alteration, truncation or removal of conformationally ambiguous regions impacts the activity of the corresponding complex. Fuzzy complexes are generally formed by intrinsically disordered proteins. Structural multiplicity usually underlies functional multiplicity of protein complexes following a fuzzy logic. Distinct binding modes of the nucleosome are also regarded as a special case of fuzziness.

<span class="mw-page-title-main">Protein fold class</span> Categories of protein tertiary structure

In molecular biology, protein fold classes are broad categories of protein tertiary structure topology. They describe groups of proteins that share similar amino acid and secondary structure proportions. Each class contains multiple, independent protein superfamilies.

<span class="mw-page-title-main">Fast parallel proteolysis</span>

Fast parallel proteolysis (FASTpp) is a method to determine the thermostability of proteins by measuring which fraction of protein resists rapid proteolytic digestion.

Molecular recognition features (MoRFs) are small intrinsically disordered regions in proteins that undergo a disorder-to-order transition upon binding to their partners. MoRFs are implicated in protein-protein interactions, which serve as the initial step in molecular recognition. MoRFs are disordered prior to binding to their partners, whereas they form a common 3D structure after interacting with their partners. As MoRF regions tend to resemble disordered proteins with some characteristics of ordered proteins, they can be classified as existing in an extended semi-disordered state.

A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred. Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similarity is evident. Sequence homology can then be deduced even if not apparent. Superfamilies typically contain several protein families which show sequence similarity within each family. The term protein clan is commonly used for protease and glycosyl hydrolases superfamilies based on the MEROPS and CAZy classification systems.

<span class="mw-page-title-main">Conformational ensembles</span> Computational models of intrinsically-disordered proteins

In computational chemistry, conformational ensembles, also known as structural ensembles, are experimentally constrained computational models describing the structure of intrinsically unstructured proteins. Such proteins are flexible in nature, lacking a stable tertiary structure, and therefore cannot be described with a single structural representation. The techniques of ensemble calculation are relatively new in the field of structural biology and still face limitations that need to be addressed before they become comparable to classical structural description methods such as biological macromolecular crystallography.

<span class="mw-page-title-main">Proline-rich protein 30</span>

Proline-rich protein 30 is a protein that in humans is encoded by the PRR30 gene. PRR30 is a member of the family of proline-rich proteins, which are characterized by their intrinsic lack of structure. Copy number variations in the PRR30 gene have been associated with an increased risk for neurofibromatosis.

<span class="mw-page-title-main">Protein tandem repeats</span>

An array of protein tandem repeats is defined as several adjacent copies having the same or similar sequence motifs. These periodic sequences are generated by internal duplications in both coding and non-coding genomic sequences. Repetitive units of protein tandem repeats are considerably diverse, ranging from the repetition of a single amino acid to domains of 100 or more residues.
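
The definition above can be illustrated with a minimal sketch that scans a sequence for runs of adjacent, exact copies of a unit of a given length. The function name `find_tandem_repeats` and its parameters are hypothetical and chosen for illustration only. The sketch assumes identical copies, so it does not detect the "similar" (degenerate) repeats that the definition also covers, and it is not the method of any published repeat-detection tool.

```python
def find_tandem_repeats(seq, unit_len, min_copies=2):
    """Return (start, unit, copies) for runs of exact adjacent copies of a unit.

    Assumption: only identical copies are counted; degenerate repeats are ignored.
    """
    hits = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        j = i + unit_len
        # Extend the run while the next block of unit_len residues matches the unit.
        while seq[j:j + unit_len] == unit:
            copies += 1
            j += unit_len
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i = j  # continue scanning after the repeat array
        else:
            i += 1
    return hits


if __name__ == "__main__":
    # Toy sequence with four adjacent "PQ" units starting at index 2.
    print(find_tandem_repeats("MKPQPQPQPQAA", 2))  # [(2, 'PQ', 4)]
```

Setting `unit_len=1` covers the single-amino-acid case mentioned above, while larger values approximate longer repeat units.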

Low complexity regions (LCRs) in protein sequences, also defined in some contexts as compositionally biased regions (CBRs), are regions whose amino acid composition and complexity differ from those normally associated with globular structure in most proteins. LCRs differ from other regions in structure, function and evolution.
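
One common way to make "complexity" concrete is the Shannon entropy of the residue composition within a sliding window. The sketch below is a simplified illustration under that assumption; the function names, the window size and the threshold are hypothetical placeholders, not values taken from the text or from any published tool.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a sequence window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, window=12, threshold=2.2):
    """Return start indices of windows whose entropy is at or below the threshold.

    Assumption: window and threshold are illustrative defaults, not calibrated values.
    """
    return [
        i
        for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) <= threshold
    ]
```

A window made of a single repeated residue has an entropy of zero, while a window in which every residue is different reaches the maximum for its length, so lower values indicate stronger compositional bias.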

The dark proteome is defined as proteins with no defined three-dimensional structure. It cannot be detected or analyzed by homology modeling or analytical quantification because the molecular conformation is unknown. Dark proteins are mostly composed of unknown unknowns.

Liquid–liquid phase separation (LLPS) often involves sequence regions with distinctive functional characteristics, as well as prion-like and RNA-binding domains. Only a few methods currently exist to predict the propensity of a protein to drive LLPS. The range of biological mechanisms involved, the limited knowledge about these mechanisms and the strong context dependence of LLPS make this problem challenging. Despite recent advances in the field, only a few LLPS-specific predictors have been developed to relate protein sequence properties to the capability to drive LLPS. Here we review the state-of-the-art sequence-based LLPS predictors, briefly introducing each and explaining which protein characteristics it identifies in the context of LLPS.

References

  1. Majorek K, Kozlowski L, Jakalski M, Bujnicki JM (December 18, 2008). "First Steps of Protein Structure Prediction" (PDF). In Bujnicki J (ed.). Prediction of Protein Structures, Functions, and Interactions. John Wiley & Sons, Ltd. pp. 39–62. doi:10.1002/9780470741894.ch2. ISBN   9780470517673.
  2. Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, Oldfield CJ, Campen AM, Ratliff CM, Hipps KW, Ausio J, Nissen MS, Reeves R, Kang C, Kissinger CR, Bailey RW, Griswold MD, Chiu W, Garner EC, Obradovic Z (2001). "Intrinsically disordered protein". Journal of Molecular Graphics & Modelling. 19 (1): 26–59. CiteSeerX   10.1.1.113.556 . doi:10.1016/s1093-3263(00)00138-8. PMID   11381529.
  3. Dyson HJ, Wright PE (March 2005). "Intrinsically unstructured proteins and their functions". Nature Reviews Molecular Cell Biology. 6 (3): 197–208. doi:10.1038/nrm1589. PMID   15738986. S2CID   18068406.
  4. Dunker AK, Silman I, Uversky VN, Sussman JL (December 2008). "Function and structure of inherently disordered proteins". Current Opinion in Structural Biology. 18 (6): 756–64. doi:10.1016/j.sbi.2008.10.002. PMID   18952168.
  5. Andreeva A, Howorth D, Chothia C, Kulesha E, Murzin AG (January 2014). "SCOP2 prototype: a new approach to protein structure mining". Nucleic Acids Research. 42 (Database issue): D310–4. doi:10.1093/nar/gkt1242. PMC   3964979 . PMID   24293656.
  6. Mir M, Stadler MR, Ortiz SA, Hannon CE, Harrison MM, Darzacq X, Eisen MB (December 2018). Singer RH, Struhl K, Crocker J (eds.). "Dynamic multifactor hubs interact transiently with sites of active transcription in Drosophila embryos". eLife. 7: e40497. doi: 10.7554/eLife.40497 . PMC   6307861 . PMID   30589412.
  7. Wright PE, Dyson HJ (January 2015). "Intrinsically disordered proteins in cellular signalling and regulation". Nature Reviews. Molecular Cell Biology. 16 (1): 18–29. doi:10.1038/nrm3920. PMC   4405151 . PMID   25531225.
  8. van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, et al. (July 2014). "Classification of intrinsically disordered regions and proteins". Chemical Reviews. 114 (13): 6589–6631. doi:10.1021/cr400525m. PMC   4095912 . PMID   24773235.
  9. Song J, Lee MS, Carlberg I, Vener AV, Markley JL (December 2006). "Micelle-induced folding of spinach thylakoid soluble phosphoprotein of 9 kDa and its functional implications". Biochemistry. 45 (51): 15633–43. doi:10.1021/bi062148m. PMC   2533273 . PMID   17176085.
  10. Anfinsen CB (July 1973). "Principles that govern the folding of protein chains". Science. 181 (4096): 223–230. Bibcode:1973Sci...181..223A. doi:10.1126/science.181.4096.223. PMID   4124164.
  11. Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, Oldfield CJ, Campen AM, Ratliff CM, Hipps KW, Ausio J, Nissen MS, Reeves R, Kang C, Kissinger CR, Bailey RW, Griswold MD, Chiu W, Garner EC, Obradovic Z (2001-01-01). "Intrinsically disordered protein". Journal of Molecular Graphics & Modelling. 19 (1): 26–59. CiteSeerX   10.1.1.113.556 . doi:10.1016/s1093-3263(00)00138-8. PMID   11381529.
  12. Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT (March 2004). "Prediction and functional analysis of native disorder in proteins from the three kingdoms of life". Journal of Molecular Biology. 337 (3): 635–45. CiteSeerX   10.1.1.120.5605 . doi:10.1016/j.jmb.2004.02.002. PMID   15019783.
  13. Uversky VN, Oldfield CJ, Dunker AK (2008). "Intrinsically disordered proteins in human diseases: introducing the D2 concept". Annual Review of Biophysics. 37: 215–46. doi:10.1146/annurev.biophys.37.032807.125924. PMID   18573080.
  14. Bu Z, Callaway DJ (2011). "Proteins MOVE! Protein dynamics and long-range allostery in cell signaling". Protein Structure and Diseases. Advances in Protein Chemistry and Structural Biology. Vol. 83. pp. 163–221. doi:10.1016/B978-0-12-381262-9.00005-7. ISBN   9780123812629. PMID   21570668.
  15. Kamerlin SC, Warshel A (May 2010). "At the dawn of the 21st century: Is dynamics the missing link for understanding enzyme catalysis?". Proteins. 78 (6): 1339–75. doi:10.1002/prot.22654. PMC   2841229 . PMID   20099310.
  16. Cermakova K, Hodges HC (May 2023). "Interaction modules that impart specificity to disordered protein". Trends in Biochemical Sciences. 48 (5): 477–490. doi:10.1016/j.tibs.2023.01.004. PMC   10106370 . PMID   36754681.
  17. Collins MO, Yu L, Campuzano I, Grant SG, Choudhary JS (July 2008). "Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder" (PDF). Molecular & Cellular Proteomics. 7 (7): 1331–48. doi: 10.1074/mcp.M700564-MCP200 . PMID   18388127. S2CID   22193414.
  18. Iakoucheva LM, Brown CJ, Lawson JD, Obradović Z, Dunker AK (October 2002). "Intrinsic disorder in cell-signaling and cancer-associated proteins". Journal of Molecular Biology. 323 (3): 573–84. CiteSeerX   10.1.1.132.682 . doi:10.1016/S0022-2836(02)00969-5. PMID   12381310.
  19. Sandhu KS (2009). "Intrinsic disorder explains diverse nuclear roles of chromatin remodeling proteins". Journal of Molecular Recognition. 22 (1): 1–8. doi:10.1002/jmr.915. PMID   18802931. S2CID   33010897.
  20. Wilson BA, Foy SG, Neme R, Masel J (June 2017). "Young Genes are Highly Disordered as Predicted by the Preadaptation Hypothesis of De Novo Gene Birth". Nature Ecology & Evolution. 1 (6): 0146–146. Bibcode:2017NatEE...1..146W. doi:10.1038/s41559-017-0146. PMC   5476217 . PMID   28642936.
  21. Willis S, Masel J (September 2018). "Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes". Genetics. 210 (1): 303–313. doi:10.1534/genetics.118.301249. PMC   6116962 . PMID   30026186.
  22. James, Jennifer E; Nelson, Paul G; Masel, Joanna (4 April 2023). "Differential Retention of Pfam Domains Contributes to Long-term Evolutionary Trends". Molecular Biology and Evolution. 40 (4). doi:10.1093/molbev/msad073. PMC   10089649 . PMID   36947137.
  23. Prakash A, Shin J, Rajan S, Yoon HS (April 2016). "Structural basis of nucleic acid recognition by FK506-binding protein 25 (FKBP25), a nuclear immunophilin". Nucleic Acids Research. 44 (6): 2909–2925. doi:10.1093/nar/gkw001. PMC   4824100 . PMID   26762975.
  24. Lee SH, Kim DH, Han JJ, Cha EJ, Lim JE, Cho YJ, Lee C, Han KH (February 2012). "Understanding pre-structured motifs (PresMos) in intrinsically unfolded proteins". Current Protein & Peptide Science. 13 (1): 34–54. doi:10.2174/138920312799277974. PMID   22044148.
  25. Mohan A, Oldfield CJ, Radivojac P, Vacic V, Cortese MS, Dunker AK, Uversky VN (October 2006). "Analysis of molecular recognition features (MoRFs)". Journal of Molecular Biology. 362 (5): 1043–59. doi:10.1016/j.jmb.2006.07.087. PMID   16935303.
  26. Gunasekaran K, Tsai CJ, Kumar S, Zanuy D, Nussinov R (February 2003). "Extended disordered proteins: targeting function with less scaffold". Trends in Biochemical Sciences. 28 (2): 81–5. doi:10.1016/S0968-0004(03)00003-3. PMID   12575995.
  27. Sandhu KS, Dash D (July 2007). "Dynamic alpha-helices: conformations that do not conform". Proteins. 68 (1): 109–22. doi:10.1002/prot.21328. PMID   17407165. S2CID   96719019.
  28. Tarakhovsky A, Prinjha RK (July 2018). "Drawing on disorder: How viruses use histone mimicry to their advantage". The Journal of Experimental Medicine. 215 (7): 1777–1787. doi:10.1084/jem.20180099. PMC   6028506 . PMID   29934321.
  29. Atkinson SC, Audsley MD, Lieu KG, Marsh GA, Thomas DR, Heaton SM, Paxman JJ, Wagstaff KM, Buckle AM, Moseley GW, Jans DA, Borg NA (January 2018). "Recognition by host nuclear transport proteins drives disorder-to-order transition in Hendra virus V". Scientific Reports. 8 (1): 358. Bibcode:2018NatSR...8..358A. doi:10.1038/s41598-017-18742-8. PMC   5762688 . PMID   29321677.
  30. Fuxreiter M (January 2012). "Fuzziness: linking regulation to protein dynamics". Molecular BioSystems. 8 (1): 168–77. doi:10.1039/c1mb05234a. PMID   21927770.
  31. Fuxreiter M, Simon I, Bondos S (August 2011). "Dynamic protein-DNA recognition: beyond what can be seen". Trends in Biochemical Sciences. 36 (8): 415–23. doi:10.1016/j.tibs.2011.04.006. PMID   21620710.
  32. Borgia A, Borgia MB, Bugge K, Kissling VM, Heidarsson PO, Fernandes CB, Sottini A, Soranno A, Buholzer KJ, Nettels D, Kragelund BB, Best RB, Schuler B (March 2018). "Extreme disorder in an ultrahigh-affinity protein complex". Nature. 555 (7694): 61–66. Bibcode:2018Natur.555...61B. doi:10.1038/nature25762. PMC   6264893 . PMID   29466338.
  33. Feng H, Zhou BR, Bai Y (November 2018). "Binding Affinity and Function of the Extremely Disordered Protein Complex Containing Human Linker Histone H1.0 and Its Chaperone ProTα". Biochemistry. 57 (48): 6645–6648. doi:10.1021/acs.biochem.8b01075. PMC   7984725 . PMID   30430826.
  34. Uversky VN (August 2011). "Intrinsically disordered proteins from A to Z". The International Journal of Biochemistry & Cell Biology. 43 (8): 1090–1103. doi:10.1016/j.biocel.2011.04.001. PMID   21501695.
  35. Oldfield CJ, Dunker AK (2014). "Intrinsically disordered proteins and intrinsically disordered protein regions". Annual Review of Biochemistry. 83: 553–584. doi:10.1146/annurev-biochem-072711-164947. PMID   24606139.
  36. Scalvini B, et al. (2023). "Circuit Topology Approach for the Comparative Analysis of Intrinsically Disordered Proteins". Journal of Chemical Information and Modeling. 63 (8): 2586–2602.
  37. Theillet FX, Binolfi A, Bekei B, Martorana A, Rose HM, Stuiver M, Verzini S, Lorenz D, van Rossum M, Goldfarb D, Selenko P (2016). "Structural disorder of monomeric α-synuclein persists in mammalian cells". Nature. 530 (7588): 45–50. Bibcode:2016Natur.530...45T. doi:10.1038/nature16531. PMID   26808899. S2CID   4461465.
  38. Minde DP, Ramakrishna M, Lilley KS (2018). "Biotinylation by proximity labelling favours unfolded proteins". bioRxiv. doi: 10.1101/274761 .
  39. Minde DP, Ramakrishna M, Lilley KS (2020). "Biotin proximity tagging favours unfolded proteins and enables the study of intrinsically disordered regions". Communications Biology. 3 (1): 38. doi: 10.1038/s42003-020-0758-y . PMC   6976632 . PMID   31969649.
  40. Minde DP, Maurice MM, Rüdiger SG (2012). Uversky VN (ed.). "Determining biophysical protein stability in lysates by a fast proteolysis assay, FASTpp". PLOS ONE. 7 (10): e46147. Bibcode:2012PLoSO...746147M. doi: 10.1371/journal.pone.0046147 . PMC   3463568 . PMID   23056252.
  41. Park C, Marqusee S (March 2005). "Pulse proteolysis: a simple method for quantitative determination of protein stability and ligand binding". Nature Methods. 2 (3): 207–12. doi:10.1038/nmeth740. PMID   15782190. S2CID   21364478.
  42. Robaszkiewicz K, Ostrowska Z, Cyranka-Czaja A, Moraczewska J (May 2015). "Impaired tropomyosin-troponin interactions reduce activation of the actin thin filament". Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics. 1854 (5): 381–90. doi:10.1016/j.bbapap.2015.01.004. PMID   25603119.
  43. Minde DP, Radli M, Forneris F, Maurice MM, Rüdiger SG (2013). Buckle AM (ed.). "Large extent of disorder in Adenomatous Polyposis Coli offers a strategy to guard Wnt signalling against point mutations". PLOS ONE. 8 (10): e77257. Bibcode:2013PLoSO...877257M. doi: 10.1371/journal.pone.0077257 . PMC   3793970 . PMID   24130866.
  44. Brucale M, Schuler B, Samorì B (March 2014). "Single-molecule studies of intrinsically disordered proteins". Chemical Reviews. 114 (6): 3281–317. doi:10.1021/cr400297g. PMID   24432838.
  45. Neupane K, Solanki A, Sosova I, Belov M, Woodside MT (2014). "Diverse metastable structures formed by small oligomers of α-synuclein probed by force spectroscopy". PLOS ONE. 9 (1): e86495. Bibcode:2014PLoSO...986495N. doi: 10.1371/journal.pone.0086495 . PMC   3901707 . PMID   24475132.
  46. Japrung D, Dogan J, Freedman KJ, Nadzeyka A, Bauerdick S, Albrecht T, Kim MJ, Jemth P, Edel JB (February 2013). "Single-molecule studies of intrinsically disordered proteins using solid-state nanopores". Analytical Chemistry. 85 (4): 2449–56. doi:10.1021/ac3035025. PMID   23327569.
  47. Min D, Kim K, Hyeon C, Cho YH, Shin YK, Yoon TY (2013). "Mechanical unzipping and rezipping of a single SNARE complex reveals hysteresis as a force-generating mechanism". Nature Communications. 4 (4): 1705. Bibcode:2013NatCo...4.1705M. doi:10.1038/ncomms2692. PMC   3644077 . PMID   23591872.
  48. Miyagi A, Tsunaka Y, Uchihashi T, Mayanagi K, Hirose S, Morikawa K, Ando T (September 2008). "Visualization of intrinsically disordered regions of proteins by high-speed atomic force microscopy". ChemPhysChem. 9 (13): 1859–66. doi:10.1002/cphc.200800210. PMID   18698566.
  49. Campen A, Williams RM, Brown CJ, Meng J, Uversky VN, Dunker AK (2008). "TOP-IDP-scale: a new amino acid scale measuring propensity for intrinsic disorder". Protein and Peptide Letters. 15 (9): 956–963. doi:10.2174/092986608785849164. PMC   2676888 . PMID   18991772.
  50. Schlessinger A, Schaefer C, Vicedo E, Schmidberger M, Punta M, Rost B (June 2011). "Protein disorder--a breakthrough invention of evolution?". Current Opinion in Structural Biology. 21 (3): 412–8. doi:10.1016/j.sbi.2011.03.014. PMID   21514145.
  51. Tompa P (June 2011). "Unstructural biology coming of age". Current Opinion in Structural Biology. 21 (3): 419–425. doi:10.1016/j.sbi.2011.03.012. PMID   21514142.
  52. Ferron F, Longhi S, Canard B, Karlin D (October 2006). "A practical overview of protein disorder prediction methods". Proteins. 65 (1): 1–14. doi:10.1002/prot.21075. PMID   16856179. S2CID   30231497.
  53. Wise-Scira O, Dunn A, Aloglu AK, Sakallioglu IT, Coskuner O (March 2013). "Structures of the E46K mutant-type α-synuclein protein and impact of E46K mutation on the structures of the wild-type α-synuclein protein". ACS Chemical Neuroscience. 4 (3): 498–508. doi:10.1021/cn3002027. PMC   3605821 . PMID   23374074.
  54. Dobson CM (December 2003). "Protein folding and misfolding". Nature. 426 (6968): 884–90. Bibcode:2003Natur.426..884D. doi:10.1038/nature02261. PMID   14685248. S2CID   1036192.
  55. Balatti GE, Barletta GP, Parisi G, Tosatto SC, Bellanda M, Fernandez-Alberti S (December 2021). "Intrinsically Disordered Region Modulates Ligand Binding in Glutaredoxin 1 from Trypanosoma Brucei". The Journal of Physical Chemistry B. 125 (49): 13366–13375. doi:10.1021/acs.jpcb.1c07035. PMID   34870419. S2CID   244942842.
  56. Best RB, Zhu X, Shim J, Lopes PE, Mittal J, Feig M, Mackerell AD (September 2012). "Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ(1) and χ(2) dihedral angles". Journal of Chemical Theory and Computation. 8 (9): 3257–3273. doi:10.1021/ct300400x. PMC   3549273 . PMID   23341755.
  57. Best RB (February 2017). "Computational and theoretical advances in studies of intrinsically disordered proteins". Current Opinion in Structural Biology. 42: 147–154. doi:10.1016/j.sbi.2017.01.006. PMID   28259050.
  58. Chong SH, Chatterjee P, Ham S (May 2017). "Computer Simulations of Intrinsically Disordered Proteins". Annual Review of Physical Chemistry. 68: 117–134. Bibcode:2017ARPC...68..117C. doi:10.1146/annurev-physchem-052516-050843. PMID   28226222.
  59. Fox SJ, Kannan S (September 2017). "Probing the dynamics of disorder". Progress in Biophysics and Molecular Biology. 128: 57–62. doi:10.1016/j.pbiomolbio.2017.05.008. PMID   28554553.
  60. Terakawa T, Takada S (September 2011). "Multiscale ensemble modeling of intrinsically disordered proteins: p53 N-terminal domain". Biophysical Journal. 101 (6): 1450–1458. Bibcode:2011BpJ...101.1450T. doi:10.1016/j.bpj.2011.08.003. PMC   3177054 . PMID   21943426.
  61. Fisher CK, Stultz CM (June 2011). "Constructing ensembles for intrinsically disordered proteins". Current Opinion in Structural Biology. 21 (3): 426–431. doi:10.1016/j.sbi.2011.04.001. PMC   3112268 . PMID   21530234.
  62. Apicella A, Marascio M, Colangelo V, Soncini M, Gautieri A, Plummer CJ (June 2017). "Molecular dynamics simulations of the intrinsically disordered protein amelogenin". Journal of Biomolecular Structure & Dynamics. 35 (8): 1813–1823. doi:10.1080/07391102.2016.1196151. hdl: 11311/1004711 . PMID   27366858. S2CID   205576649.
  63. Zerze GH, Miller CM, Granata D, Mittal J (June 2015). "Free energy surface of an intrinsically disordered protein: comparison between temperature replica exchange molecular dynamics and bias-exchange metadynamics". Journal of Chemical Theory and Computation. 11 (6): 2776–2782. doi:10.1021/acs.jctc.5b00047. PMID   26575570.
  64. Granata D, Baftizadeh F, Habchi J, Galvagnion C, De Simone A, Camilloni C, et al. (October 2015). "The inverted free energy landscape of an intrinsically disordered peptide by simulations and experiments". Scientific Reports. 5: 15449. Bibcode:2015NatSR...515449G. doi:10.1038/srep15449. PMC   4620491 . PMID   26498066.
  65. Iida S, Kawabata T, Kasahara K, Nakamura H, Higo J (April 2019). "Multimodal Structural Distribution of the p53 C-Terminal Domain upon Binding to S100B via a Generalized Ensemble Method: From Disorder to Extradisorder". Journal of Chemical Theory and Computation. 15 (4): 2597–2607. doi:10.1021/acs.jctc.8b01042. PMID   30855964. S2CID   75138292.
  66. Kurcinski M, Kolinski A, Kmiecik S (June 2014). "Mechanism of Folding and Binding of an Intrinsically Disordered Protein As Revealed by ab Initio Simulations". Journal of Chemical Theory and Computation. 10 (6): 2224–2231. doi:10.1021/ct500287c. PMID   26580746.
  67. Ciemny MP, Badaczewska-Dawid AE, Pikuzinska M, Kolinski A, Kmiecik S (January 2019). "Modeling of Disordered Protein Structures Using Monte Carlo Simulations and Knowledge-Based Statistical Force Fields". International Journal of Molecular Sciences. 20 (3): 606. doi: 10.3390/ijms20030606 . PMC   6386871 . PMID   30708941.
  68. Garaizar A, Espinosa JR (September 2021). "Salt dependent phase behavior of intrinsically disordered proteins from a coarse-grained model with explicit water and ions". The Journal of Chemical Physics. 155 (12): 125103. Bibcode:2021JChPh.155l5103G. doi:10.1063/5.0062687. PMID   34598583. S2CID   238249229.
  69. Uversky VN (2013). "Digested disorder: Quarterly intrinsic disorder digest (January/February/March, 2013)". Intrinsically Disordered Proteins. 1 (1): e25496. doi:10.4161/idp.25496. PMC   5424799 . PMID   28516015.
  70. Costantini S, Sharma A, Raucci R, Costantini M, Autiero I, Colonna G (March 2013). "Genealogy of an ancient protein family: the Sirtuins, a family of disordered members". BMC Evolutionary Biology. 13 (1): 60. Bibcode:2013BMCEE..13...60C. doi: 10.1186/1471-2148-13-60 . PMC   3599600 . PMID   23497088.