CASP

A target structure (ribbons) and 354 template-based predictions superimposed (gray Calpha backbones); from CASP8

Critical Assessment of Structure Prediction (CASP), [1] sometimes called Critical Assessment of Protein Structure Prediction, is a community-wide, worldwide experiment in protein structure prediction that has taken place every two years since 1994. [2] CASP gives research groups an opportunity to test their structure prediction methods objectively and delivers an independent assessment of the state of the art in protein structure modeling to the research community and software users. Although the primary goal of CASP is to help advance methods for identifying a protein's three-dimensional structure from its amino acid sequence, many view the experiment more as a “world championship” in this field of science. More than 100 research groups from all over the world participate in CASP on a regular basis, and it is not uncommon for entire groups to suspend their other research for months while they get their servers ready for the experiment and perform the detailed predictions.

Selection of target proteins

To ensure that no predictor has prior information about a protein's structure that would give them an advantage, the experiment is conducted in a double-blind fashion: neither the predictors nor the organizers and assessors know the structures of the target proteins at the time the predictions are made. Targets for structure prediction are either structures soon to be solved by X-ray crystallography or NMR spectroscopy, or structures that have just been solved (mainly by one of the structural genomics centers) and are kept on hold by the Protein Data Bank. If the given sequence is found to be related by common descent to a protein sequence of known structure (called a template), comparative protein modeling may be used to predict the tertiary structure. Templates can be found using sequence alignment methods (e.g. BLAST or HHsearch) or protein threading methods, which are better at finding distantly related templates. Otherwise, de novo protein structure prediction must be applied (e.g. Rosetta), which is much less reliable but can sometimes yield models with the correct fold (usually for proteins of fewer than 100–150 amino acids). Truly new folds are becoming quite rare among the targets, [3] [4] making that category smaller than desirable.

Evaluation

The primary method of evaluation [5] is a comparison of the predicted model's α-carbon positions with those in the target structure. The comparison is shown visually by cumulative plots of distances between pairs of equivalent α-carbons in the alignment of the model and the structure, such as shown in the figure (a perfect model would stay at zero all the way across), and is assigned a numerical score, GDT-TS (Global Distance Test Total Score), describing the percentage of well-modeled residues in the model with respect to the target. [6] Free modeling (template-free, or de novo) is also evaluated visually by the assessors, since the numerical scores do not work as well for finding loose resemblances in the most difficult cases. [7] High-accuracy template-based predictions were evaluated in CASP7 by whether they worked for molecular-replacement phasing of the target crystal structure, [8] with successes followed up later, [9] and in CASP8 by full-model (not just α-carbon) model quality and full-model match to the target. [10]
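As a rough illustration of the α-carbon comparison described above, the following Python sketch computes per-residue distances and a GDT-TS-style score. It rests on several assumptions that are not taken from the CASP evaluation software: the model and target are already superimposed, residues are listed in the same order, and the score uses the conventional 1, 2, 4 and 8 Å cutoffs. The function names and coordinates are hypothetical. The real procedure searches over many superpositions to maximize the residue count at each cutoff, so a single fixed superposition can only underestimate the score.

```python
import math

def ca_distances(model, target):
    """Distances between equivalent alpha-carbons (same residue order assumed)."""
    if len(model) != len(target):
        raise ValueError("model and target must list the same residues")
    return [math.dist(m, t) for m, t in zip(model, target)]

def percent_within(distances, cutoff):
    """Percentage of residues whose alpha-carbon lies within `cutoff` angstroms."""
    return 100.0 * sum(d <= cutoff for d in distances) / len(distances)

def gdt_ts(distances):
    """Mean of the percentages at the 1, 2, 4 and 8 angstrom cutoffs."""
    return sum(percent_within(distances, c) for c in (1.0, 2.0, 4.0, 8.0)) / 4.0

# Hypothetical four-residue example, coordinates in angstroms.
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

distances = ca_distances(model, target)
score = gdt_ts(distances)
print(distances, score)
```

Evaluating `percent_within` over a range of cutoffs gives one point per cutoff on a cumulative curve of the kind the assessors inspect visually.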

Evaluation of the results is carried out in the following prediction categories:

Tertiary structure prediction category was further subdivided into:

Starting with CASP7, the categories have been redefined to reflect developments in methods. The 'template-based modeling' category includes all former comparative modeling, homologous fold-based models and some analogous fold-based models. The 'template-free modeling (FM)' category includes models of proteins with previously unseen folds and hard analogous fold-based models. Because template-free targets are quite rare, the so-called CASP ROLL was introduced in 2011. This continuous (rolling) CASP experiment aims at a more rigorous evaluation of template-free prediction methods by assessing a larger number of targets outside the regular CASP prediction season. Unlike LiveBench and EVA, this experiment is in the blind-prediction spirit of CASP, i.e. all predictions are made on as-yet-unknown structures. [11]

The CASP results are published in special supplement issues of the scientific journal Proteins, all of which are accessible through the CASP website. [12] A lead article in each of these supplements describes specifics of the experiment [13] [14] while a closing article evaluates progress in the field. [15] [16]

AlphaFold

In December 2018, CASP13 made headlines when it was won by AlphaFold, an artificial intelligence program created by DeepMind. [17] In November 2020, an improved version, AlphaFold 2, won CASP14. [18] According to John Moult, one of CASP's co-founders, AlphaFold scored around 90 on a 100-point scale of prediction accuracy for moderately difficult protein targets. [19] AlphaFold was made open source in 2021. DeepMind did not enter CASP15 in 2022, but virtually all of the high-ranking teams used AlphaFold or modifications of it. [20]

Related Research Articles

Protein structure prediction: Type of biological prediction

Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence, that is, the prediction of its secondary and tertiary structure from its primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology, and it is important in medicine and biotechnology.

Structural alignment: Aligning molecular sequences using sequence and structural information

Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be used in using the results as evidence for shared evolutionary ancestry because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.

Structural bioinformatics: Bioinformatics subfield

Structural bioinformatics is the branch of bioinformatics that is related to the analysis and prediction of the three-dimensional structure of biological macromolecules such as proteins, RNA, and DNA. It deals with generalizations about macromolecular 3D structures such as comparisons of overall folds and local motifs, principles of molecular folding, evolution, binding interactions, and structure/function relationships, working both from experimentally solved structures and from computational models. The term structural has the same meaning as in structural biology, and structural bioinformatics can be seen as a part of computational structural biology. The main objective of structural bioinformatics is the creation of new methods of analysing and manipulating biological macromolecular data in order to solve problems in biology and generate new knowledge.

In molecular biology, protein threading, also known as fold recognition, is a method of protein modeling which is used to model those proteins which have the same fold as proteins of known structures, but do not have homologous proteins with known structure. It differs from the homology modeling method of structure prediction as it is used for proteins which do not have their homologous protein structures deposited in the Protein Data Bank (PDB), whereas homology modeling is used for those proteins which do. Threading works by using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein which one wishes to model.

Rosetta@home: BOINC-based volunteer computing project researching protein folding

Rosetta@home is a volunteer computing project researching protein structure prediction on the Berkeley Open Infrastructure for Network Computing (BOINC) platform, run by the Baker lab. Rosetta@home aims to predict protein–protein docking and design new proteins with the help of about fifty-five thousand active volunteered computers processing at over 487,946 GigaFLOPS on average as of September 19, 2020. Foldit, a Rosetta@home videogame, aims to reach these goals with a crowdsourcing approach. Though much of the project is oriented toward basic research to improve the accuracy and robustness of proteomics methods, Rosetta@home also does applied research on malaria, Alzheimer's disease, and other pathologies.

Homology modeling: Method of protein structure prediction using other known proteins

Homology modeling, also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein. Homology modeling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. It has been seen that protein structures are more conserved than protein sequences amongst homologues, but sequences falling below a 20% sequence identity can have very different structure.
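The sequence-identity threshold mentioned above can be made concrete with a small Python sketch that computes percent identity from a pairwise alignment. The aligned strings, the function name and the choice of denominator (aligned, non-gap columns only) are assumptions made for illustration; different tools use different denominators, so values are not directly comparable across programs.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    columns = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != gap and t != gap
    ]
    if not columns:
        return 0.0
    matches = sum(q == t for q, t in columns)
    return 100.0 * matches / len(columns)

# Hypothetical alignment of a query against a candidate template.
query = "MKT-AYIAKQR"
template = "MRTGAY-GKHR"

identity = percent_identity(query, template)
# 20% is the threshold quoted in the text above, used here only as a flag.
low_identity = identity < 20.0
print(round(identity, 2), low_identity)
```

In this made-up alignment, six of the nine gap-free columns match, so the pair sits well above the 20% level at which structures may diverge strongly.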

The global distance test (GDT), also written as GDT_TS to represent "total score", is a measure of similarity between two protein structures with known amino acid correspondences but different tertiary structures. It is most commonly used to compare the results of protein structure prediction to the experimentally determined structure as measured by X-ray crystallography, protein NMR, or, increasingly, cryoelectron microscopy. The metric was developed by Adam Zemla at Lawrence Livermore National Laboratory and originally implemented in the Local-Global Alignment (LGA) program. It is intended as a more accurate measurement than the common root-mean-square deviation (RMSD) metric, which is sensitive to outlier regions created, for example, by poor modeling of individual loop regions in a structure that is otherwise reasonably accurate. The conventional GDT_TS score is computed over the alpha carbon atoms and is reported as a percentage, ranging from 0 to 100. In general, the higher the GDT_TS score, the more closely a model approximates a given reference structure.
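The contrast with RMSD described above can be sketched numerically in Python. The example assumes a hypothetical 100-residue model in which 90 alpha carbons deviate by 0.5 Å and a badly modeled 10-residue loop deviates by 20 Å. It also assumes a single fixed superposition and the conventional 1, 2, 4 and 8 Å cutoffs; this is a simplification of what LGA does, since LGA searches over superpositions.

```python
import math

def rmsd(distances):
    """Root-mean-square of per-residue alpha-carbon deviations."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean percentage of residues within each distance cutoff."""
    n = len(distances)
    percentages = [100.0 * sum(d <= c for d in distances) / n for c in cutoffs]
    return sum(percentages) / len(cutoffs)

# Hypothetical deviations in angstroms: accurate core plus one bad loop.
deviations = [0.5] * 90 + [20.0] * 10

model_rmsd = rmsd(deviations)
model_gdt = gdt_ts(deviations)
print(round(model_rmsd, 2), model_gdt)
```

The ten outlier residues push the RMSD above 6 Å, which on its own would suggest a poor model. The GDT_TS-style score stays at 90, because 90% of the residues are within every cutoff.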

CAFASP, or the Critical Assessment of Fully Automated Structure Prediction, is a large-scale blind experiment in protein structure prediction that studies the performance of automated structure prediction webservers in homology modeling, fold recognition, and ab initio prediction of protein tertiary structures based only on amino acid sequence. The experiment runs once every two years in parallel with CASP, which focuses on predictions that incorporate human intervention and expertise. Compared to the related benchmarking techniques LiveBench and EVA, which run weekly against newly solved protein structures deposited in the Protein Data Bank, CAFASP generates much less data, but has the advantage of producing predictions that are directly comparable to those produced by human prediction experts. More recently, CAFASP has essentially been integrated into the CASP results rather than run as a separate experiment.

In computational biology, de novo protein structure prediction refers to an algorithmic process by which protein tertiary structure is predicted from its amino acid primary sequence. The problem itself has occupied leading scientists for decades while still remaining unsolved. According to Science, the problem remains one of the top 125 outstanding issues in modern science. At present, some of the most successful methods have a reasonable probability of predicting the folds of small, single-domain proteins within 1.5 angstroms over the entire structure.

Critical Assessment of PRediction of Interactions (CAPRI) is a community-wide experiment in modelling the molecular structure of protein complexes, otherwise known as protein–protein docking.

RAPTOR is protein threading software used for protein structure prediction. It has been replaced by RaptorX, which is much more accurate than RAPTOR.

Phyre and Phyre2 are free web-based services for protein structure prediction. Phyre is among the most popular methods for protein structure prediction, having been cited over 1500 times. Like other remote homology recognition techniques, it is able to regularly generate reliable protein models when other widely used methods such as PSI-BLAST cannot. Phyre2 has been designed to provide a user-friendly interface for users inexpert in protein structure prediction methods. Its development is funded by the Biotechnology and Biological Sciences Research Council.

RaptorX is a software and web server for protein structure and function prediction that is free for non-commercial use. RaptorX is among the most popular methods for protein structure prediction. Like other remote homology recognition/protein threading techniques, RaptorX is able to regularly generate reliable protein models when the widely used PSI-BLAST cannot. However, RaptorX is also significantly different from those profile-based methods in that RaptorX excels at modeling of protein sequences without a large number of sequence homologs by exploiting structure information. RaptorX Server has been designed to ensure a user-friendly interface for users inexpert in protein structure prediction methods.

SWISS-MODEL is a structural bioinformatics web server dedicated to homology modeling of 3D protein structures. Homology modeling is currently the most accurate method to generate reliable three-dimensional protein structure models and is routinely used in many practical applications. Homology modeling methods make use of experimental protein structures ("templates") to build models for evolutionarily related proteins ("targets").

The HH-suite is an open-source software package for sensitive protein sequence searching. It contains programs that can search for similar protein sequences in protein sequence databases. Sequence searches are a standard tool in modern biology with which the function of unknown proteins can be inferred from the functions of proteins with similar sequences. HHsearch and HHblits are two main programs in the package and the entry point to its search function, the latter being a faster iteration. HHpred is an online server for protein structure prediction that uses homology information from HH-suite.

Continuous Automated Model EvaluatiOn (CAMEO) is a community-wide project to continuously evaluate the accuracy and reliability of protein structure prediction servers in a fully automated manner. CAMEO is a continuous and fully automated complement to the biennial CASP experiment.

I-TASSER

I-TASSER is a bioinformatics method for predicting three-dimensional structure models of protein molecules from amino acid sequences. It detects structure templates from the Protein Data Bank by a technique called fold recognition. The full-length structure models are constructed by reassembling structural fragments from threading templates using replica exchange Monte Carlo simulations. I-TASSER is one of the most successful protein structure prediction methods in the community-wide CASP experiments.

AlphaFold: Artificial intelligence program by DeepMind

AlphaFold is an artificial intelligence (AI) program developed by DeepMind, a subsidiary of Alphabet, which performs predictions of protein structure. The program is designed as a deep learning system.

Lukasz Kurgan: Polish-Canadian academic (b. 1975)

Lukasz Kurgan is the Robert J. Mattauch Endowed Professor of Computer Science at the Virginia Commonwealth University, in Richmond, Virginia, U.S.A. He was a professor at the University of Alberta between 2003 and 2015. Dr. Kurgan earned his Ph.D. in computer science from the University of Colorado at Boulder in 2003 and his M.Sc. degree in automation and robotics from the AGH University of Science and Technology in 1999.

IntFOLD is a fully automated, integrated pipeline for the prediction of 3D structure and function from amino acid sequences. The pipeline is wrapped up and deployed as a web server. The core of the server method is quality assessment using built-in accuracy self-estimates (ASE), which improves the performance of 3D model prediction using ModFOLD.

References

  1. "Home - CASP15". predictioncenter.org. Retrieved 2022-12-14.
  2. Moult J, Pedersen JT, Judson R, Fidelis K (November 1995). "A large-scale experiment to assess protein structure prediction methods". Proteins. 23 (3): ii–v. doi:10.1002/prot.340230303. PMID 8710822. S2CID 11216440.
  3. Tress ML, Ezkurdia I, Richardson JS (2009). "Target domain definition and classification in CASP8". Proteins. 77 (Suppl 9): 10–7. doi:10.1002/prot.22497. PMC 2805415. PMID 19603487.
  4. Zhang Y, Skolnick J (January 2005). "The protein structure prediction problem could be solved using the current PDB library". Proceedings of the National Academy of Sciences of the United States of America. 102 (4): 1029–34. Bibcode:2005PNAS..102.1029Z. doi:10.1073/pnas.0407152101. PMC 545829. PMID 15653774.
  5. Cozzetto D, Kryshtafovych A, Fidelis K, Moult J, Rost B, Tramontano A (2009). "Evaluation of template-based models in CASP8 with standard measures". Proteins. 77 (Suppl 9): 18–28. doi:10.1002/prot.22561. PMC 4589151. PMID 19731382.
  6. Zemla A (July 2003). "LGA: A method for finding 3D similarities in protein structures". Nucleic Acids Research. 31 (13): 3370–4. doi:10.1093/nar/gkg571. PMC 168977. PMID 12824330.
  7. Ben-David M, Noivirt-Brik O, Paz A, Prilusky J, Sussman JL, Levy Y (2009). "Assessment of CASP8 structure predictions for template free targets". Proteins. 77 (Suppl 9): 50–65. doi:10.1002/prot.22591. PMID 19774550. S2CID 16517118.
  8. Read RJ, Chavali G (2007). "Assessment of CASP7 predictions in the high accuracy template-based modeling category". Proteins. 69 (Suppl 8): 27–37. doi:10.1002/prot.21662. PMID 17894351. S2CID 33172629.
  9. Qian B, Raman S, Das R, Bradley P, McCoy AJ, Read RJ, Baker D (November 2007). "High-resolution structure prediction and the crystallographic phase problem". Nature. 450 (7167): 259–64. Bibcode:2007Natur.450..259Q. doi:10.1038/nature06249. PMC 2504711. PMID 17934447.
  10. Keedy DA, Williams CJ, Headd JJ, Arendall WB, Chen VB, Kapral GJ, et al. (2009). "The other 90% of the protein: assessment beyond the Calphas for CASP8 template-based and high-accuracy models". Proteins. 77 (Suppl 9): 29–49. doi:10.1002/prot.22551. PMC 2877634. PMID 19731372.
  11. Kryshtafovych A, Monastyrskyy B, Fidelis K (February 2014). "CASP prediction center infrastructure and evaluation measures in CASP10 and CASP ROLL". Proteins. 82 (Suppl 2): 7–13. doi:10.1002/prot.24399. PMC 4396618. PMID 24038551.
  12. "CASP Proceedings".
  13. Moult J, Fidelis K, Kryshtafovych A, Rost B, Hubbard T, Tramontano A (2007). "Critical assessment of methods of protein structure prediction - Round VII". Proteins. 69 (Suppl 8): 3–9. doi:10.1002/prot.21767. PMC 2653632. PMID 17918729.
  14. Moult J, Fidelis K, Kryshtafovych A, Rost B, Tramontano A (2009). "Critical assessment of methods of protein structure prediction - Round VIII". Proteins. 77 (Suppl 9): 1–4. doi:10.1002/prot.22589. PMID 19774620. S2CID 9704851.
  15. Kryshtafovych A, Fidelis K, Moult J (2007). "Progress from CASP6 to CASP7". Proteins. 69 (Suppl 8): 194–207. doi:10.1002/prot.21769. PMID 17918728. S2CID 40200832.
  16. Kryshtafovych A, Fidelis K, Moult J (2009). "CASP8 results in context of previous experiments". Proteins. 77 (Suppl 9): 217–28. doi:10.1002/prot.22562. PMC 5479686. PMID 19722266.
  17. Sample, Ian (2 December 2018). "Google's DeepMind predicts 3D shapes of proteins". The Guardian. Retrieved 19 July 2019.
  18. "DeepMind's protein-folding AI has solved a 50-year-old grand challenge of biology". MIT Technology Review. Retrieved 30 November 2020.
  19. Callaway, Ewen (2020). "'It will change everything': DeepMind's AI makes gigantic leap in solving protein structures". Nature. 588 (7837): 203–204. doi:10.1038/d41586-020-03348-4. PMID 33257889. S2CID 227243204.
  20. Schreiner, Maximilian (2022-12-14). "CASP15: AlphaFold's success spurs new challenges in protein-structure prediction". The Decoder. Retrieved 2023-02-13.

Result ranking

Automated assessments for CASP15 (2022)

Automated assessments for CASP14 (2020)

Automated assessments for CASP13 (2018)

Automated assessments for CASP12 (2016)

Automated assessments for CASP11 (2014)

Automated assessments for CASP10 (2012)

Automated assessments for CASP9 (2010)

Automated assessments for CASP8 (2008)

Automated assessments for CASP7 (2006)