List of disorder prediction software

Computational methods exploit the sequence signatures of disorder to predict whether a protein is disordered, given its amino acid sequence. The table below, originally adapted from [1] and since updated, lists the main features of software for disorder prediction. Note that different programs use different definitions of disorder.

| Predictor | Year published | What is predicted | Based on | Generates and uses multiple sequence alignment? | Free for commercial use |
| --- | --- | --- | --- | --- | --- |
| PFVM [2] | 2023 | Predicts protein intrinsically disordered regions, degree of disorder and folding patterns. | Based on five amino acids, the folding variations along the sequence are presented by the Protein Folding Shape Code (PFSC) in the Protein Folding Variation Matrix (PFVM). | No | Yes (login: public; password: public; select “Prediction”) |
| SPOT-Disorder2 [3] | 2020 | Per-residue probability of a sequence residue being disordered. | Ensemble of Bidirectional Long Short-Term Memory and Inception-Residual Squeeze-and-Excitation Convolutional Neural Networks | Yes | No |
| DisProt [4] | 2019 | | | | |
| NetSurfP-2.0 [5] | 2019 | Secondary structure and disorder prediction method | Long Short-Term Memory and Convolutional Neural Networks | Yes | No |
| SPOT-Disorder-Single [6] | 2018 | Per-residue disorder predictor for a single-sequence input (i.e. no MSA profile). | An ensemble of Long Short-Term Memory Bidirectional Recurrent Neural Networks and residual convolutional networks. | No | No |
| IUPred | 2005-2018 | Regions that lack a well-defined 3D structure under native conditions | Energy resulting from inter-residue interactions, estimated from local amino acid composition | No | No |
| MobiDB-lite [7] | 2017 | Consensus-based prediction of residue disorder | Eight separate disorder predictors from various groups | No | No |
| SPOT-Disorder [8] | 2017 | Outputs the probability of each residue in a protein sequence being disordered or ordered. | A deep recurrent neural network architecture using Long Short-Term Memory (LSTM) cells. | Yes | No |
| Disopred2 [9] | 2004-2015 | Regions devoid of ordered regular secondary structure | Cascaded support vector machine classifiers trained on PSI-BLAST profiles | Yes | No |
| s2D | 2015 | Predicts secondary structure and intrinsic disorder in one unified statistical framework based on the analysis of NMR chemical shifts [10] | Neural networks trained on NMR solution-based data. | Yes | No |
| DisPredict_v1.0 [11] | 2015 | Assigns a binary order/disorder class and a corresponding confidence score to each protein residue, using an optimized SVM with a radial basis kernel, from the protein sequence | AA composition, physical properties, helix, strand and coil probability, accessible surface area, torsion angle fluctuation, monogram, bigram. | No | ? |
| SLIDER [12] | 2014 | A binary prediction of whether a protein has a long disordered region (>30 residues) | Physicochemical properties of amino acids, sequence complexity, and amino acid composition | No | ? |
| MFDp2 [13] | 2013 | Helix, strand and coil probability, relative entropy and per-residue disorder prediction. | A combination of the MFDp and DisCon predictors with unique post-processing. Improved prediction over MFDp. | Yes | No |
| ESpritz | 2012 | Disorder definitions include: missing X-ray atoms (short), DisProt-style disorder (long), and NMR flexibility. A probability of disorder is supplied with two decision thresholds, which depend on a user-preferred false positive rate. | Bi-directional neural networks with diverse and high-quality data derived from the Protein Data Bank and DisProt. Compares extremely well with other CASP 9 servers. The method was designed to be very fast. | No | No |
| GeneSilico Metadisorder [14] | 2012 | Regions that lack a well-defined 3D structure under native conditions (REMARK-465) | Meta method, which uses other disorder predictors (such as RONN, IUPred, POODLE, and many more). The consensus is calculated from them according to method accuracy (optimized using ANN, filtering and other techniques). Described as the best available method at the time of listing (first two places in the then most recent CASP experiment, a blind test). | Yes | No |
| SPINE-D [15] | 2012 | Outputs long/short disorder, semi-disorder (0.4-0.7) and full disorder (0.7-1.0). Semi-disorder is semi-collapsed with some secondary structure. | A neural-network-based three-state predictor using both local and global features. Ranked in the top 5 based on AUC in CASP 9. | Yes | No |
| CSpritz | 2011 | Disorder definitions include: missing X-ray atoms (short) and DisProt-style disorder (long). A probability of disorder is supplied with two decision thresholds, which depend on the false positive rate. Linear motifs within a disorder segment are determined by simple pattern matching from ELM. | Support Vector Machine and bi-directional neural networks with high-quality and diverse data derived from the Protein Data Bank and DisProt. Structural information is also supplied in the form of homologous templates. Compares extremely well with other CASP 9 servers. | Yes | No |
| PONDR | 1999-2010 | All regions that are not rigid, including random coils, partially unstructured regions, and molten globules | Local amino acid composition, flexibility, hydropathy, etc. | No | No |
| MFDp [16] | 2010 | Different types of disorder, including random coils, unstructured regions, molten globules, and REMARK-465-based regions. | An ensemble of 3 SVMs specialized for the prediction of short, long and generic disordered regions, which combines three complementary disorder predictors, sequence, sequence profiles, predicted secondary structure, solvent accessibility, backbone dihedral torsion angles, residue flexibility and B-factors. MFDp (unofficially) secured 3rd place in the then most recent CASP experiment. | Yes | No |
| FoldIndex [17] | 2005 | Regions that have a low hydrophobicity and high net charge (either loops or unstructured regions) | Charge/hydropathy analyzed locally using a sliding window | No | ? |
| RONN | 2005 | Regions that lack a well-defined 3D structure under native conditions | Bio-basis function neural network trained on disordered proteins | No | No |
| GlobPlot | 2003 | Regions with high propensity for globularity on the Russell/Linding scale (propensities for secondary structures and random coils) | Russell/Linding scale of disorder | No | Yes |
| DisEMBL | 2003 | LOOPS (regions devoid of regular secondary structure); HOT LOOPS (highly mobile loops); REMARK465 (regions lacking electron density in crystal structure) | Neural networks trained on X-ray structure data | No | Yes |
| SEG | 1994 | Low-complexity segments, that is, “simple sequences” or “compositionally biased regions”. | Locally optimized low-complexity segments are produced at defined levels of stringency and then refined according to the equations of Wootton and Federhen | No | ? |
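
To make the "charge/hydropathy analyzed locally using a sliding window" idea concrete, the following is a minimal sketch, not the FoldIndex implementation. It assumes the commonly cited charge-hydropathy boundary (score = 2.785·⟨H⟩ − |⟨R⟩| − 1.151, with ⟨H⟩ the mean Kyte-Doolittle hydropathy rescaled to 0..1 and ⟨R⟩ the mean net charge). The function names and the window size of 21 are hypothetical choices made for illustration.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified net charge at neutral pH: K/R = +1, D/E = -1, others 0.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def charge_hydropathy_score(segment):
    """Return the boundary score for one segment.

    Positive values suggest a compact (folded) segment, negative values
    suggest a disordered one. Non-standard residue letters are ignored.
    """
    residues = [aa for aa in segment.upper() if aa in KD]
    if not residues:
        raise ValueError("segment contains no standard amino acids")
    # Rescale hydropathy from [-4.5, 4.5] to [0, 1] before averaging.
    mean_h = sum((KD[aa] + 4.5) / 9.0 for aa in residues) / len(residues)
    mean_r = sum(CHARGE.get(aa, 0.0) for aa in residues) / len(residues)
    return 2.785 * mean_h - abs(mean_r) - 1.151


def sliding_window_scores(sequence, window=21):
    """Score every position using a window centred on it.

    The window is truncated at the sequence ends; `window` is an
    illustrative default, not a value taken from any published tool.
    """
    half = window // 2
    scores = []
    for i in range(len(sequence)):
        start = max(0, i - half)
        end = min(len(sequence), i + half + 1)
        scores.append(charge_hydropathy_score(sequence[start:end]))
    return scores
```

Calling `sliding_window_scores(seq)` returns one score per residue; stretches of negative scores are candidate disordered regions under this simple criterion. Real predictors in the table use considerably richer features and trained models.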

Methods that are no longer available:

| Predictor | What is predicted | Based on | Generates and uses multiple sequence alignment? |
| --- | --- | --- | --- |
| OnD-CRF [18] | The transition between structurally ordered and mobile or disordered amino acid intervals under native conditions. | OnD-CRF applies Conditional Random Fields (CRFs), which rely on features generated from the amino acid sequence and from secondary structure prediction. | No |
| NORSp | Regions with No Ordered Regular Secondary Structure (NORS). Most, but not all, are highly flexible. | Secondary structure and solvent accessibility | Yes |
| HCA (Hydrophobic Cluster Analysis) | Hydrophobic clusters, which tend to form secondary structure elements | Helical visualization of amino acid sequence | No |
| PreLink | Regions that are expected to be unstructured in all conditions, regardless of the presence of a binding partner | Compositional bias and low hydrophobic cluster content. | No |
| MD (Meta-Disorder predictor) [19] | Regions of different "types"; for example, unstructured loops and regions containing few stable intra-chain contacts | A neural-network-based meta-predictor that uses different sources of information predominantly obtained from orthogonal approaches | Yes |
| IUPforest-L | Long disordered regions in a set of proteins | Moreau-Broto auto-correlation function of amino acid indices (AAIs) | No |
| MeDor (Metaserver of Disorder) [20] | Regions of different "types". MeDor provides a unified view of multiple disorder predictors. | Meta method, which uses other disorder predictors (such as FoldIndex, DisEMBL REMARK465, IUPred, RONN ...) and provides additional features (such as HCA plot, secondary structure prediction, transmembrane domains ...) that together help the user define regions involved in disorder. | No |
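
Several entries in the tables above are meta or consensus methods that combine the output of other predictors. The sketch below illustrates the general idea with a plain per-residue majority vote followed by extraction of contiguous disordered regions. It is an assumption-laden illustration only: the function names, the strict-majority default and the `min_length` filter are hypothetical and do not reproduce the algorithm of any listed tool.

```python
def consensus_disorder(predictions, min_votes=None):
    """Combine binary per-residue calls (1 = disordered) from several predictors.

    `predictions` is a list of equal-length lists, one per predictor.
    By default a residue is called disordered when a strict majority agrees.
    """
    if not predictions:
        raise ValueError("at least one predictor output is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictor outputs must have the same length")
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1  # strict majority
    return [
        1 if sum(p[i] for p in predictions) >= min_votes else 0
        for i in range(length)
    ]


def disordered_regions(calls, min_length=1):
    """Return (start, end) index pairs, 0-based and inclusive, of disordered runs."""
    regions = []
    start = None
    for i, call in enumerate(calls):
        if call and start is None:
            start = i  # a new disordered run begins
        elif not call and start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(calls) - start >= min_length:
        regions.append((start, len(calls) - 1))
    return regions
```

For example, with three predictors, `consensus_disorder([[1, 1, 0, 0, 1], [1, 0, 0, 1, 1], [0, 1, 0, 0, 1]])` keeps only the positions where at least two agree, and `disordered_regions` then reports the resulting runs. Published meta-predictors typically go further, for instance by weighting each method by its accuracy.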

References

  1. Ferron F, Longhi S, Canard B, Karlin D (October 2006). "A practical overview of protein disorder prediction methods". Proteins. 65 (1): 1–14. doi:10.1002/prot.21075. PMID 16856179. S2CID 30231497.
  2. Yang, J; Cheng, WX; Zhao, XF; Wu, G; Sheng, ST; Hu, Q; Ge, H; Qin, Q; Jin, X; Zhang, L; Zhang, P (Nov 2022). "Comprehensive folding variations for protein folding". Proteins: Structure, Function, and Bioinformatics. 90 (11): 1851–1872. doi:10.1002/prot.26381. PMID 35514069.
  3. Hanson, Jack; Paliwal, Kuldip K.; Litfin, Thomas; Zhou, Yaoqi (2020-03-13). "SPOT-Disorder2: Improved Protein Intrinsic Disorder Prediction by Ensembled Deep Learning". Genomics, Proteomics & Bioinformatics. 17 (6): 645–656. doi:10.1016/j.gpb.2019.01.004. ISSN 1672-0229. PMC 7212484. PMID 32173600.
  4. Hatos, András; Hajdu-Soltész, Borbála; Monzon, Alexander M.; Palopoli, Nicolas; Álvarez, Lucía; Aykac-Fas, Burcu; Bassot, Claudio; Benítez, Guillermo I.; Bevilacqua, Martina; Chasapi, Anastasia; Chemes, Lucia (8 January 2020). "DisProt: intrinsic protein disorder annotation in 2020". Nucleic Acids Research. 48 (D1): D269–D276. doi:10.1093/nar/gkz975. ISSN 1362-4962. PMC 7145575. PMID 31713636.
  5. Klausen MS, Jespersen MC, Nielsen H, Jensen KK, Jurtz VI, Soenderby CK, Sommer M, Otto A, Winther O, Nielsen M, Petersen B, Marcatili P (2019). "NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning". Proteins: Structure, Function, and Bioinformatics. 87 (6): 520–527. doi:10.1002/prot.25674. PMID 30785653. S2CID 216629401.
  6. Hanson J, Paliwal K, Zhou Y (2018). "Accurate Single-Sequence Prediction of Protein Intrinsic Disorder by an Ensemble of Deep Recurrent and Convolutional Architectures". Journal of Chemical Information and Modeling. 58 (11): 2369–2376. doi:10.1021/acs.jcim.8b00636. hdl:10072/382201. PMID 30395465. S2CID 53235372.
  7. Necci, Marco; Piovesan, Damiano; Dosztányi, Zsuzsanna; Tosatto, Silvio C.E. (2017-01-18). "MobiDB-lite: Fast and highly specific consensus prediction of intrinsic disorder in proteins". Bioinformatics. 33 (9): 1402–1404. doi:10.1093/bioinformatics/btx015. ISSN 1367-4803. PMID 28453683.
  8. Hanson J, Yang Y, Paliwal K, Zhou Y (2016). "Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks". Bioinformatics. 33 (5): 685–692. doi:10.1093/bioinformatics/btw678. PMID 28011771.
  9. Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT (March 2004). "Prediction and functional analysis of native disorder in proteins from the three kingdoms of life". J. Mol. Biol. 337 (3): 635–45. CiteSeerX 10.1.1.120.5605. doi:10.1016/j.jmb.2004.02.002. PMID 15019783.
  10. Sormanni P, Camilloni C, Fariselli P, Vendruscolo M (February 2015). "The s2D Method: Simultaneous Sequence-Based Prediction of the Statistical Populations of Ordered and Disordered Regions in Proteins". J. Mol. Biol. 427 (4): 982–996. doi:10.1016/j.jmb.2014.12.007. PMID 25534081.
  11. Sumaiya Iqbal; Md Tamjidul Hoque (October 2015). "DisPredict: A Predictor of Disordered Protein Using Optimized RBF Kernel". PLOS ONE. 10 (10): e0141551. doi:10.1371/journal.pone.0141551. PMC 4627842. PMID 26517719.
  12. Peng Z, Mizianty MJ, Kurgan L (Jan 2014). "Genome-scale prediction of proteins with long intrinsically disordered regions". Proteins. 82 (1): 145–58. doi:10.1002/prot.24348. PMID 23798504. S2CID 21229963.
  13. Mizianty MJ, Peng Z, Kurgan L (April 2013). "Accurate predictor of disorder in proteins by fusion of disorder probabilities, content and profiles". Intrinsically Disordered Proteins. 1 (1): e24428. doi:10.4161/idp.24428. PMC 5424793. PMID 28516009.
  14. Kozlowski, L. P.; Bujnicki, J. M. (2012). "MetaDisorder: A meta-server for the prediction of intrinsic disorder in proteins". BMC Bioinformatics. 13: 111. doi:10.1186/1471-2105-13-111. PMC 3465245. PMID 22624656.
  15. Zhang T, Faraggi E, Xue B, Dunker K, Uversky VN, Zhou Y (February 2012). "SPINE-D: Accurate prediction of short and long disordered regions by a single neural-network based method". Journal of Biomolecular Structure and Dynamics. 29 (4): 799–813. doi:10.1080/073911012010525022. hdl:10072/57573. PMC 3297974. PMID 22208280.
  16. Mizianty MJ, Stach W, Chen K, Kedarisetti KD, Disfani FM, Kurgan L (September 2010). "Improved sequence-based prediction of disordered regions with multilayer fusion of multiple information sources". Bioinformatics. 26 (18): i489–96. doi:10.1093/bioinformatics/btq373. PMC 2935446. PMID 20823312.
  17. Prilusky J, Felder CE, Zeev-Ben-Mordehai T, et al. (August 2005). "FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded". Bioinformatics. 21 (16): 3435–8. doi:10.1093/bioinformatics/bti537. PMID 15955783.
  18. Wang L, Sauer UH (June 2008). "OnD-CRF: predicting order and disorder in proteins using conditional random fields". Bioinformatics. 24 (11): 1401–2. doi:10.1093/bioinformatics/btn132. PMC 2387219. PMID 18430742.
  19. Schlessinger A, Punta M, Yachdav G, Kajan L, Rost B (2009). Orgel JP (ed.). "Improved disorder prediction by combination of orthogonal approaches". PLOS ONE. 4 (2): e4433. Bibcode:2009PLoSO...4.4433S. doi:10.1371/journal.pone.0004433. PMC 2635965. PMID 19209228.
  20. Lieutaud P, Canard B, Longhi S (September 2008). "MeDor: a metaserver for predicting protein disorder". BMC Genomics. 9 (Suppl 2): S25. doi:10.1186/1471-2164-9-S2-S25. PMC 2559890. PMID 18831791.