Molecular recognition feature

Molecular recognition features (MoRFs) are small (10-70 residues) intrinsically disordered regions in proteins that undergo a disorder-to-order transition upon binding to their partners. MoRFs are implicated in protein-protein interactions, which serve as the initial step in molecular recognition. They are disordered before binding and adopt a common 3D structure once they interact with their partners. [1] [2] Because MoRF regions tend to resemble disordered proteins with some characteristics of ordered proteins, [2] they can be classified as existing in an extended semi-disordered state. [3]

Categorization

MoRFs can be separated into four categories according to the shape they form once bound to their partners. [2]

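The categories are:

α-MoRFs, which form alpha helices

β-MoRFs, which form beta sheets

irregular MoRFs (ι-MoRFs), which do not form a regular secondary structure

complex MoRFs, which combine the categories above
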
MoRF predictors

Determining protein structures experimentally is time-consuming and expensive, so recent years have seen a focus on computational methods for predicting protein structure and structural characteristics. Some aspects of protein structure, such as secondary structure and intrinsic disorder, have benefited greatly from applying deep learning to an abundance of annotated data. Computational prediction of MoRF regions, however, remains challenging because annotated data are limited and the MoRF class itself is rare. [4] Most current methods have been trained and benchmarked on the sets released by the authors of MoRFPred [5] in 2012, as well as another set released by the authors of MoRFChibi [6] [7] [8] based on experimentally annotated MoRF data. The table below lists some methods available as of 2019 for MoRF prediction, along with methods for related problems. [9]

| Predictor | Year published | Predicts | Methodology | Uses MSA |
| --- | --- | --- | --- | --- |
| ANCHOR [10] | 2009 | Protein binding regions | Amino acid propensity and energy estimation analysis. | N |
| ANCHOR2 [11] | 2018 | Protein binding regions | Amino acid propensity and energy estimation analysis. | N |
| DISOPRED3 [12] | 2015 | Protein intrinsic disorder and protein binding sites | Multistage component prediction (utilizing neural network, Support Vector Machine, and K-nearest neighbour models) for protein disorder prediction. Also uses an additional Support Vector Machine to interpolate binding regions from the disorder predictions. | Y |
| DisoRDPbind [13] | 2015 | RNA, DNA, and protein binding regions | Multiple logistic regression models based on predicted disorder, amino acid properties, and sequence composition. The result is aligned with transferred annotations from a functionally annotated database. | N |
| fMoRFPred [4] | 2016 | MoRFs | Faster version of MoRFPred without the use of multiple sequence alignments. | N |
| MoRFchibi SYSTEM [6] [7] [8] | 2015 | MoRFs | Hierarchy of different in-house MoRF prediction models. MoRFchibi: utilizes Bayes rule to combine the outcomes of two Support Vector Machine modules using amino acid composition (sigmoid kernel) and sequence similarity (RBF kernel). MoRFchibi_light: utilizes Bayes rule to combine MoRFchibi and disorder prediction hierarchically. MoRFchibi_web: utilizes Bayes rule to combine MoRFchibi, disorder prediction and PSSM (MSA) hierarchically. | N/Y |
| MoRFPred [5] | 2012 | MoRFs | Support Vector Machine based on predicted sequence characteristics and alignment of input sequence to known MoRF database. | Y |
| MoRFPred-Plus [14] | 2018 | MoRFs | Combined predictions from two Support Vector Machines, predicting for both MoRF regions and MoRF residues. | Y |
| OPAL [15] | 2018 | MoRFs | Support Vector Machine based on physicochemical properties and predicted structural attributes of protein residues. | Y |
| OPAL+ [16] | 2019 | MoRFs | Ensemble of Support Vector Machines trained individually for length-specific MoRF regions. Also incorporates other predictors as a metapredictor. | Y |
| SPINE-D [17] [18] | 2012 | Protein intrinsic disorder and semi-disorder | Neural network for predicting both long and short disordered regions. Semi-disorder can be linearly interpolated from its predicted disorder probabilities (0.4 <= P(D) <= 0.7). | Y |
| SPOT-Disorder [19] | 2017 | Protein intrinsic disorder and semi-disorder | Bidirectional Long Short-Term Memory network for predicting intrinsic disorder. Semi-disordered regions can be linearly interpolated from its predicted disorder probabilities (0.28 <= P(D) <= 0.69). | Y |
| SPOT-MoRF [20] | 2019 | MoRFs | Transfer learning from the large disorder prediction tool SPOT-Disorder2 [21] (which itself utilizes an ensemble of Bidirectional Long Short-Term Memory networks and Inception ResNets). | Y |
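
The semi-disorder ranges quoted for SPINE-D and SPOT-Disorder in the table above can be read as a simple thresholding step on per-residue disorder probabilities. The sketch below illustrates that reading only; it is not code from either tool. The function name `semi_disordered_segments`, the `min_len` parameter, the inclusive treatment of the bounds, and the example probabilities are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Probability windows quoted in the table above.
# Treating both bounds as inclusive is an assumption.
SEMI_DISORDER_RANGES: Dict[str, Tuple[float, float]] = {
    "SPINE-D": (0.40, 0.70),
    "SPOT-Disorder": (0.28, 0.69),
}


def semi_disordered_segments(
    probs: Sequence[float], predictor: str, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return 0-based inclusive (start, end) runs of residues whose
    disorder probability P(D) falls inside the predictor's window.

    `min_len` (hypothetical) drops runs shorter than the given length.
    """
    lo, hi = SEMI_DISORDER_RANGES[predictor]
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current run began, if any
    for i, p in enumerate(probs):
        inside = lo <= p <= hi
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(probs) - start >= min_len:
        segments.append((start, len(probs) - 1))
    return segments


# Made-up per-residue probabilities, for illustration only.
example = [0.1, 0.5, 0.6, 0.9, 0.3, 0.3, 0.05]
print(semi_disordered_segments(example, "SPINE-D"))        # [(1, 2)]
print(semi_disordered_segments(example, "SPOT-Disorder"))  # [(1, 2), (4, 5)]
```

The same probability track yields different segments under the two windows, so the range must be matched to the predictor that produced the probabilities.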

Databases

mpMoRFsDB [22]

Mutual Folding Induced by Binding (MFIB) database [23]

References

  1. van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, et al. (July 2014). "Classification of intrinsically disordered regions and proteins". Chemical Reviews. 114 (13): 6589–631. doi:10.1021/cr400525m. PMC 4095912. PMID 24773235.
  2. Mohan A, Oldfield CJ, Radivojac P, Vacic V, Cortese MS, Dunker AK, Uversky VN (October 2006). "Analysis of molecular recognition features (MoRFs)". Journal of Molecular Biology. 362 (5): 1043–59. doi:10.1016/j.jmb.2006.07.087. PMID 16935303.
  3. Zhang T, Faraggi E, Li Z, Zhou Y (2013-05-31). "Intrinsically semi-disordered state and its role in induced folding and protein aggregation". Cell Biochemistry and Biophysics. 67 (3): 1193–205. doi:10.1007/s12013-013-9638-0. PMC 3838602. PMID 23723000.
  4. Yan J, Dunker AK, Uversky VN, Kurgan L (March 2016). "Molecular recognition features (MoRFs) in three domains of life". Molecular BioSystems. 12 (3): 697–710. doi:10.1039/C5MB00640F. hdl:1805/11056. PMID 26651072.
  5. Disfani FM, Hsu WL, Mizianty MJ, Oldfield CJ, Xue B, Dunker AK, et al. (June 2012). "MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins". Bioinformatics. 28 (12): i75-83. doi:10.1093/bioinformatics/bts209. PMC 3371841. PMID 22689782.
  6. Malhis N, Gsponer J (June 2015). "Computational identification of MoRFs in protein sequences". Bioinformatics. 31 (11): 1738–44. doi:10.1093/bioinformatics/btv060. PMC 4443681. PMID 25637562.
  7. Malhis N, Wong ET, Nassar R, Gsponer J (2015). "Computational Identification of MoRFs in Protein Sequences Using Hierarchical Application of Bayes Rule". PLOS ONE. 10 (10): e0141603. Bibcode:2015PLoSO..1041603M. doi:10.1371/journal.pone.0141603. PMC 4627796. PMID 26517836.
  8. Malhis N, Jacobson M, Gsponer J (July 2016). "MoRFchibi SYSTEM: software tools for the identification of MoRFs in protein sequences". Nucleic Acids Research. 44 (W1): W488-93. doi:10.1093/nar/gkw409. PMC 4987941. PMID 27174932.
  9. Katuwawala A, Peng Z, Yang J, Kurgan L (2019). "Computational Prediction of MoRFs, Short Disorder-to-order Transitioning Protein Binding Regions". Computational and Structural Biotechnology Journal. 17: 454–462. doi:10.1016/j.csbj.2019.03.013. PMC 6453775. PMID 31007871.
  10. Mészáros B, Simon I, Dosztányi Z (May 2009). "Prediction of protein binding regions in disordered proteins". PLOS Computational Biology. 5 (5): e1000376. Bibcode:2009PLSCB...5E0376M. doi:10.1371/journal.pcbi.1000376. PMC 2671142. PMID 19412530.
  11. Mészáros B, Erdos G, Dosztányi Z (July 2018). "IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding". Nucleic Acids Research. 46 (W1): W329–W337. doi:10.1093/nar/gky384. PMC 6030935. PMID 29860432.
  12. Jones DT, Cozzetto D (March 2015). "DISOPRED3: precise disordered region predictions with annotated protein-binding activity". Bioinformatics. 31 (6): 857–63. doi:10.1093/bioinformatics/btu744. PMC 4380029. PMID 25391399.
  13. Peng Z, Kurgan L (October 2015). "High-throughput prediction of RNA, DNA and protein binding regions mediated by intrinsic disorder". Nucleic Acids Research. 43 (18): e121. doi:10.1093/nar/gkv585. PMC 4605291. PMID 26109352.
  14. Sharma R, Bayarjargal M, Tsunoda T, Patil A, Sharma A (January 2018). "MoRFPred-plus: Computational Identification of MoRFs in Protein Sequences using Physicochemical Properties and HMM profiles". Journal of Theoretical Biology. 437: 9–16. Bibcode:2018JThBi.437....9S. doi:10.1016/j.jtbi.2017.10.015. hdl:10072/376330. PMID 29042212.
  15. Sharma R, Raicar G, Tsunoda T, Patil A, Sharma A (June 2018). "OPAL: prediction of MoRF regions in intrinsically disordered protein sequences". Bioinformatics. 34 (11): 1850–1858. doi:10.1093/bioinformatics/bty032. hdl:10072/379824. PMID 29360926.
  16. Sharma R, Sharma A, Raicar G, Tsunoda T, Patil A (March 2019). "OPAL+: Length-Specific MoRF Prediction in Intrinsically Disordered Protein Sequences". Proteomics. 19 (6): e1800058. doi:10.1002/pmic.201800058. hdl:10072/382746. PMID 30324701. S2CID 53502553.
  17. Zhang T, Faraggi E, Xue B, Dunker AK, Uversky VN, Zhou Y (2012). "SPINE-D: accurate prediction of short and long disordered regions by a single neural-network based method". Journal of Biomolecular Structure & Dynamics. 29 (4): 799–813. doi:10.1080/073911012010525022. PMC 3297974. PMID 22208280.
  18. Zhang T, Faraggi E, Li Z, Zhou Y (2017). "Intrinsic Disorder and Semi-disorder Prediction by SPINE-D". In Zhou Y, Kloczkowski A, Faraggi E, Yang Y (eds.). Prediction of Protein Secondary Structure. Methods in Molecular Biology. Vol. 1484. New York: Springer. pp. 159–174. doi:10.1007/978-1-4939-6406-2_12. ISBN 9781493964048. PMID 27787826.
  19. Hanson J, Yang Y, Paliwal K, Zhou Y (March 2017). "Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks". Bioinformatics. 33 (5): 685–692. doi:10.1093/bioinformatics/btw678. PMID 28011771.
  20. Hanson J, Litfin T, Paliwal K, Zhou Y (2019-09-05). Gorodkin J (ed.). "Identifying Molecular Recognition Features in Intrinsically Disordered Regions of Proteins by Transfer Learning". Bioinformatics. 36 (4): 1107–1113. doi:10.1093/bioinformatics/btz691. ISSN 1367-4803. PMID 31504193.
  21. Hanson J, Paliwal KK, Litfin T, Zhou Y (2020-03-13). "SPOT-Disorder2: Improved Protein Intrinsic Disorder Prediction by Ensembled Deep Learning". Genomics, Proteomics & Bioinformatics. 17 (6): 645–656. doi:10.1016/j.gpb.2019.01.004. ISSN 1672-0229. PMC 7212484. PMID 32173600.
  22. Gypas F, Tsaousis GN, Hamodrakas SJ (October 2013). "mpMoRFsDB: a database of molecular recognition features in membrane proteins". Bioinformatics. 29 (19): 2517–8. doi:10.1093/bioinformatics/btt427. PMID 23894139.
  23. Fichó E, Reményi I, Simon I, Mészáros B (November 2017). "MFIB: a repository of protein complexes with mutual folding induced by binding". Bioinformatics. 33 (22): 3682–3684. doi:10.1093/bioinformatics/btx486. PMC 5870711. PMID 29036655.