I-TASSER

Developer(s) Yang Zhang Lab
Website zhanglab.ccmb.med.umich.edu/I-TASSER/

I-TASSER (Iterative Threading ASSEmbly Refinement) is a bioinformatics method for predicting three-dimensional structure models of protein molecules from amino acid sequences. [1] It detects structure templates in the Protein Data Bank using a technique called fold recognition (or threading). Full-length structure models are then constructed by reassembling structural fragments from the threading templates using replica-exchange Monte Carlo simulations. I-TASSER is one of the most successful protein structure prediction methods in the community-wide CASP experiments.
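The replica-exchange Monte Carlo idea can be illustrated with a minimal sketch. This is not I-TASSER's code: the one-dimensional double-well `energy` function, the temperature ladder, and all function names are assumptions chosen for illustration, standing in for the far richer force field and conformational moves of a real fragment-assembly simulation. The sketch shows only the general technique: several replicas sample at different temperatures and periodically exchange conformations using a Metropolis criterion.

```python
import math
import random


def energy(x):
    """Toy double-well energy with minima at x = -1 and x = +1 (illustrative only)."""
    return (x * x - 1.0) ** 2


def swap_probability(e_i, e_j, t_i, t_j):
    """Metropolis probability of exchanging the states of two replicas.

    Uses min(1, exp((1/t_i - 1/t_j) * (e_i - e_j))).
    """
    delta = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def replica_exchange(temperatures, n_sweeps=2000, step=0.5, seed=0):
    """Run a toy replica-exchange simulation; return the lowest-energy state seen."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in temperatures]
    best_x = min(xs, key=energy)
    for _ in range(n_sweeps):
        # Local Metropolis move in every replica at its own temperature.
        for k, t in enumerate(temperatures):
            trial = xs[k] + rng.uniform(-step, step)
            d_e = energy(trial) - energy(xs[k])
            if d_e <= 0 or rng.random() < math.exp(-d_e / t):
                xs[k] = trial
        # Attempt to exchange states between neighbouring temperatures.
        for k in range(len(temperatures) - 1):
            p = swap_probability(energy(xs[k]), energy(xs[k + 1]),
                                 temperatures[k], temperatures[k + 1])
            if rng.random() < p:
                xs[k], xs[k + 1] = xs[k + 1], xs[k]
        best_x = min([best_x] + xs, key=energy)
    return best_x, energy(best_x)


if __name__ == "__main__":
    x, e = replica_exchange([0.05, 0.2, 0.8, 3.0])
    print(f"lowest-energy state: x = {x:.3f}, E = {e:.5f}")
```

The high-temperature replicas cross the barrier between the two wells easily, while the exchange step lets low-temperature replicas inherit those states and refine them. This is the usual motivation for preferring replica exchange over a single fixed-temperature run.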

I-TASSER has been extended for structure-based protein function prediction, providing annotations of ligand-binding sites, Gene Ontology terms and Enzyme Commission numbers by structurally matching models of the target protein to known proteins in protein function databases. [2] [3] An online server, hosted by the Yang Zhang Lab at the University of Michigan, Ann Arbor, allows users to submit sequences and obtain structure and function predictions. A standalone package of I-TASSER is available for download from the I-TASSER website.

Ranking in CASP

I-TASSER (as 'Zhang-Server') has been consistently ranked as the top method in CASP, a community-wide experiment that benchmarks the best structure prediction methods in the field of protein folding and protein structure prediction. CASP has taken place every two years since 1994. [4]

Method and pipeline

I-TASSER is a template-based method for protein structure and function prediction. [1] The pipeline consists of six consecutive steps.

On-line Server

The I-TASSER server allows users to automatically generate protein structure and function predictions.

Standalone Suite

The I-TASSER Suite is a downloadable package of standalone computer programs, developed by the Yang Zhang Lab for protein structure prediction and refinement and for structure-based protein function annotation. [12] Researchers can access the standalone programs through the I-TASSER License.

Related Research Articles

Protein structure prediction: Type of biological prediction

Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence, that is, the prediction of its secondary and tertiary structure from its primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology, and it is important in medicine and biotechnology.

CASP: Protein structure prediction challenge

Critical Assessment of Structure Prediction (CASP), sometimes called Critical Assessment of Protein Structure Prediction, is a community-wide, worldwide experiment for protein structure prediction that has taken place every two years since 1994. CASP gives research groups an opportunity to test their structure prediction methods objectively and delivers an independent assessment of the state of the art in protein structure modeling to the research community and software users. Even though the primary goal of CASP is to help advance methods for identifying a protein's three-dimensional structure from its amino acid sequence, many view the experiment more as a "world championship" in this field of science. More than 100 research groups from all over the world participate in CASP on a regular basis, and it is not uncommon for entire groups to suspend their other research for months while they focus on getting their servers ready for the experiment and on performing the detailed predictions.

Structural alignment: Aligning molecular sequences using sequence and structural information

Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be used in using the results as evidence for shared evolutionary ancestry because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.

Binding site: Molecule-specific coordinate bonding area in biological systems

In biochemistry and molecular biology, a binding site is a region on a macromolecule such as a protein that binds to another molecule with specificity. The binding partner of the macromolecule is often referred to as a ligand. Ligands may include other proteins, enzyme substrates, second messengers, hormones, or allosteric modulators. The binding event is often, but not always, accompanied by a conformational change that alters the protein's function. Binding to protein binding sites is most often reversible, but can also be covalent reversible or irreversible.

In molecular biology, protein threading, also known as fold recognition, is a method of protein modeling which is used to model those proteins which have the same fold as proteins of known structures, but do not have homologous proteins with known structure. It differs from the homology modeling method of structure prediction as it is used for proteins which do not have their homologous protein structures deposited in the Protein Data Bank (PDB), whereas homology modeling is used for those proteins which do. Threading works by using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein which one wishes to model.

Rosetta@home: BOINC based volunteer computing project researching protein folding

Rosetta@home is a volunteer computing project researching protein structure prediction on the Berkeley Open Infrastructure for Network Computing (BOINC) platform, run by the Baker laboratory at the University of Washington. Rosetta@home aims to predict protein–protein docking and design new proteins with the help of about fifty-five thousand active volunteered computers processing at over 487,946 GigaFLOPS on average as of September 19, 2020. Foldit, a Rosetta@home videogame, aims to reach these goals with a crowdsourcing approach. Though much of the project is oriented toward basic research to improve the accuracy and robustness of proteomics methods, Rosetta@home also does applied research on malaria, Alzheimer's disease, and other pathologies.

Homology modeling: Method of protein structure prediction using other known proteins

Homology modeling, also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein. Homology modeling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. It has been seen that protein structures are more conserved than protein sequences amongst homologues, but sequences falling below a 20% sequence identity can have very different structure.
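The sequence identity mentioned above is computed from a query-to-template alignment. The following is a minimal sketch under stated assumptions: the function name `percent_identity`, the `-` gap character, and the choice of denominator (aligned, non-gap columns) are illustrative and not taken from any particular tool. Several definitions of sequence identity are in use, and they differ mainly in the denominator.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over aligned (non-gap) columns of a pairwise alignment.

    Both inputs are rows of the same alignment, so they must have equal length.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        # Columns containing a gap in either sequence are not counted.
        if q == gap or t == gap:
            continue
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


if __name__ == "__main__":
    # Hypothetical alignment rows, for illustration only.
    identity = percent_identity("MKT-AYIAK", "MRTLAY-AK")
    print(f"{identity:.1f}% identity over aligned columns")
```

A value computed this way can be compared with the roughly 20% level cited above, keeping in mind that the result depends on both the alignment and the definition chosen.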

In computational biology, de novo protein structure prediction refers to an algorithmic process by which protein tertiary structure is predicted from its amino acid primary sequence. The problem itself has occupied leading scientists for decades while still remaining unsolved. According to Science, the problem remains one of the top 125 outstanding issues in modern science. At present, some of the most successful methods have a reasonable probability of predicting the folds of small, single-domain proteins within 1.5 angstroms over the entire structure.

RAPTOR (software)

RAPTOR is protein threading software used for protein structure prediction. It has been replaced by RaptorX, which is much more accurate than RAPTOR.

Computational Resources for Drug Discovery (CRDD) is one of the important in silico modules of Open Source for Drug Discovery (OSDD). The CRDD web portal provides computer resources related to drug discovery on a single platform. It provides computational resources for researchers in computer-aided drug design, a discussion forum, and resources to maintain Wikipedia content related to drug discovery, predict inhibitors, and predict the ADME-Tox properties of molecules. One of the major objectives of CRDD is to promote open source software in the field of chemoinformatics and pharmacoinformatics.

Phyre and Phyre2 are free web-based services for protein structure prediction. Phyre is among the most popular methods for protein structure prediction having been cited over 1500 times. Like other remote homology recognition techniques, it is able to regularly generate reliable protein models when other widely used methods such as PSI-BLAST cannot. Phyre2 has been designed to ensure a user-friendly interface for users inexpert in protein structure prediction methods. Its development is funded by the Biotechnology and Biological Sciences Research Council.

RaptorX is a software and web server for protein structure and function prediction that is free for non-commercial use. RaptorX is among the most popular methods for protein structure prediction. Like other remote homology recognition/protein threading techniques, RaptorX is able to regularly generate reliable protein models when the widely used PSI-BLAST cannot. However, RaptorX is also significantly different from those profile-based methods in that RaptorX excels at modeling of protein sequences without a large number of sequence homologs by exploiting structure information. RaptorX Server has been designed to ensure a user-friendly interface for users inexpert in protein structure prediction methods.

The HH-suite is an open-source software package for sensitive protein sequence searching. It contains programs that can search for similar protein sequences in protein sequence databases. Sequence searches are a standard tool in modern biology with which the function of unknown proteins can be inferred from the functions of proteins with similar sequences. HHsearch and HHblits are two main programs in the package and the entry point to its search function, the latter being a faster iteration. HHpred is an online server for protein structure prediction that uses homology information from HH-suite.

Continuous Automated Model EvaluatiOn (CAMEO) is a community-wide project to continuously evaluate the accuracy and reliability of protein structure prediction servers in a fully automated manner. CAMEO is a continuous and fully automated complement to the biennial CASP experiment.

CS23D

CS23D is a web server to generate 3D structural models from NMR chemical shifts. CS23D combines maximal fragment assembly with chemical shift threading, de novo structure generation, chemical shift-based torsion angle prediction, and chemical shift refinement. CS23D makes use of RefDB and ShiftX.

Jeffrey Skolnick is an American computational biologist. He is currently a Georgia Institute of Technology School of Biology Professor, the Director of the Center for the Study of Systems Biology, the Mary and Maisie Gibson Chair, the Georgia Research Alliance Eminent Scholar in Computational Systems Biology, the Director of the Integrative BioSystems Institute, and was previously the Scientific Advisor at Intellimedix.

C2orf73: Protein-coding gene in the species Homo sapiens

Uncharacterized protein C2orf73 is a protein that in humans is encoded by the C2orf73 gene. The protein is predicted to be localized to the nucleus.

Proline-rich protein 30

Proline-rich protein 30 is a protein in humans that is encoded by the PRR30 gene. PRR30 is a member of the family of proline-rich proteins, which are characterized by their intrinsic lack of structure. Copy number variations in the PRR30 gene have been associated with an increased risk of neurofibromatosis.

SBK3: Protein-coding gene in the species Homo sapiens

SH3 Domain Binding Kinase Family Member 3 is an enzyme that in humans is encoded by the SBK3 gene. SBK3 is a member of the serine/threonine protein kinase family. The SBK3 protein is known to exhibit transferase activity, especially phosphotransferase activity, and tyrosine kinase activity. It is well-conserved throughout mammalian organisms and has two paralogs: SBK1 and SBK2.

IntFOLD is a fully automated, integrated pipeline for predicting 3D structure and function from amino acid sequences. The pipeline is deployed as a web server. The core of the server method is quality assessment with built-in accuracy self-estimates (ASE), which improves the prediction of 3D models using ModFOLD.

References

  1. Roy A, Kucukural A, Zhang Y (2010). "I-TASSER: a unified platform for automated protein structure and function prediction". Nature Protocols. 5 (4): 725–738. doi:10.1038/nprot.2010.5. PMC 2849174. PMID 20360767.
  2. Roy A, Yang J, Zhang Y (2012). "COFACTOR: An accurate comparative algorithm for structure-based protein function annotation". Nucleic Acids Research. 40 (Web Server issue): W471–W477. doi:10.1093/nar/gks372. PMC 3394312. PMID 22570420.
  3. Zhang C, Freddolino PL, Zhang Y (2017). "COFACTOR: improved protein function prediction by combining structure, sequence and protein-protein interaction information". Nucleic Acids Research. 45 (W1): W291–W299. doi:10.1093/nar/gkx366. PMC 5793808. PMID 28472402.
  4. Moult J, et al. (1995). "A large-scale experiment to assess protein structure prediction methods". Proteins. 23 (3): ii–iv. doi:10.1002/prot.340230303. PMID 8710822.
  5. Battey JN, et al. (2007). "Automated server predictions in CASP7". Proteins. 69 (Suppl 8): 68–82. doi:10.1002/prot.21761. PMID 17894354.
  6. Wu S, Zhang Y (2007). "LOMETS: A local meta-threading-server for protein structure prediction". Nucleic Acids Research. 35 (10): 3375–3382. doi:10.1093/nar/gkm251. PMC 1904280. PMID 17478507.
  7. Swendsen RH, Wang JS (1986). "Replica Monte Carlo simulation of spin glasses". Physical Review Letters. 57 (21): 2607–2609. doi:10.1103/physrevlett.57.2607. PMID 10033814.
  8. Zhang Y, Skolnick J (2004). "SPICKER: A Clustering Approach to Identify Near-Native Protein Folds". Journal of Computational Chemistry. 25 (6): 865–871. doi:10.1002/jcc.20011. PMID 15011258.
  9. Zhang J, Liang Y, Zhang Y (2011). "Atomic-Level Protein Structure Refinement Using Fragment-Guided Molecular Dynamics Conformation Sampling". Structure. 19 (12): 1784–1795. doi:10.1016/j.str.2011.09.022. PMC 3240822. PMID 22153501.
  10. Xu D, Zhang Y (2011). "Improving the Physical Realism and Structural Accuracy of Protein Models by a Two-step Atomic-level Energy Minimization". Biophysical Journal. 101 (10): 2525–2534. doi:10.1016/j.bpj.2011.10.024. PMC 3218324. PMID 22098752.
  11. Yang J, Roy A, Zhang Y (2013). "Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment". Bioinformatics. 29 (20): 2588–2595. doi:10.1093/bioinformatics/btt447. PMC 3789548. PMID 23975762.
  12. Yang J, Roy A, Zhang Y (2015). "The I-TASSER Suite: Protein structure and function prediction". Nature Methods. 12 (1): 7–8. doi:10.1038/nmeth.3213. PMC 4428668. PMID 25549265.