Phyre

Phyre2
Developer(s)
  • Lawrence Kelley
  • Bob MacCallum
  • Benjamin Jefferys
  • Alex Herbert
  • Riccardo Bennett-Lovsey
  • Michael Sternberg
Stable release
2.0 / 23 February 2011
Available in English
Type Bioinformatics tool for protein structure prediction
License Creative Commons Attribution-2.0
Website www.sbg.bio.ic.ac.uk/phyre2

Phyre and Phyre2 (Protein Homology/AnalogY Recognition Engine; pronounced 'fire') are free web-based services for protein structure prediction. [1] [2] [3] Phyre is among the most popular methods for protein structure prediction, having been cited over 1500 times. [4] Like other remote homology recognition techniques (see protein threading), it can regularly generate reliable protein models when other widely used methods such as PSI-BLAST cannot. Phyre2 is designed to provide a user-friendly interface for users who are not expert in protein structure prediction methods. Its development is funded by the Biotechnology and Biological Sciences Research Council. [5]

Description

The Phyre and Phyre2 servers predict the three-dimensional structure of a protein sequence using the principles and techniques of homology modeling. Because the structure of a protein is more conserved in evolution than its amino acid sequence, a protein sequence of interest (the target) can be modeled with reasonable accuracy on a very distantly related sequence of known structure (the template), provided that the relationship between target and template can be discerned through sequence alignment. Currently the most powerful and accurate methods for detecting and aligning remotely related sequences rely on profiles or hidden Markov models (HMMs). These profiles/HMMs capture the mutational propensity of each position in an amino acid sequence based on observed mutations in related sequences and can be thought of as an 'evolutionary fingerprint' of a particular protein.

Typically, the amino acid sequences of a representative set of all known three-dimensional protein structures are compiled, and these sequences are processed by scanning against a large protein sequence database. The result is a database of profiles or HMMs, one for each known 3D structure. A user sequence of interest is similarly processed to form a profile/HMM. This user profile is then scanned against the database of profiles using profile-profile or HMM-HMM alignment techniques. These alignments can also take into account patterns of predicted or known secondary structure elements and can be scored using various statistical models. See protein structure prediction for more information.
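
As a rough illustration of the profile idea described above, the following Python sketch builds a simple frequency profile from a small multiple sequence alignment and scores two profile columns against each other. It is not the algorithm used by Phyre or Phyre2; the function names, the uniform background frequency and the pseudocount are all assumptions chosen for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one {residue: frequency} dict per alignment column.

    `alignment` is a list of equal-length aligned sequences. Gap characters
    and other non-standard symbols are ignored. The pseudocount is an
    illustrative smoothing choice, not a value taken from any real server.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def column_score(col_a, col_b, background=1.0 / len(AMINO_ACIDS)):
    """Log-odds style similarity between two profile columns.

    The score is positive when both columns favour the same residues more
    than a uniform background would, and negative when they disagree.
    """
    odds = sum(col_a[aa] * col_b[aa] / background for aa in AMINO_ACIDS)
    return math.log(odds)


if __name__ == "__main__":
    profile = build_profile(["ACD", "ACE", "ACD"])
    print(column_score(profile[0], profile[0]))  # same conserved column
    print(column_score(profile[0], profile[1]))  # different conserved columns
```

A full profile-profile method would combine column scores like these with gap penalties in a dynamic programming alignment; that step is omitted here.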

The first Phyre server was released in June 2005 and uses a profile-profile alignment algorithm based on each protein's position-specific scoring matrix. [6] The Phyre2 server was publicly released in February 2011 as a replacement for the original Phyre server. Among other improvements, it provides extra functionality, a more advanced interface and a fully updated fold library, and it uses the HHpred / HHsearch package for homology detection.

Standard usage

After pasting a protein amino acid sequence into the Phyre or Phyre2 submission form, a user will typically wait between 30 minutes and several hours for a prediction to complete, depending on factors such as sequence length, the number of homologous sequences, and the frequency and length of insertions and deletions. An email containing summary information and the predicted structure in PDB format is sent to the user, together with a link to a web page of results. The Phyre2 results screen is divided into three main sections, described below.

Secondary structure and disorder prediction

Example Phyre2 output for secondary structure and disorder prediction

The user-submitted protein sequence is first scanned against a large sequence database using PSI-BLAST. The profile generated by PSI-BLAST is then processed by the neural network secondary structure prediction program PSIPRED [7] and the protein disorder predictor DISOPRED. [8] The predicted presence of alpha-helices, beta-strands and disordered regions is shown graphically together with a color-coded confidence bar.

Domain analysis

Example Phyre2 output showing multiple domains and pop-up model viewer

Many proteins contain multiple protein domains. Phyre2 provides a table of template matches color-coded by confidence and indicating the region of the user sequence matched. This can aid in the determination of the domain composition of a protein.

Detailed template information

Example Phyre2 detailed template information table

The main results table in Phyre2 provides confidence estimates, images of and links to the three-dimensional predicted models, and information derived from either the Structural Classification of Proteins (SCOP) database or the Protein Data Bank (PDB), depending on the source of the detected template. For each match, a link takes the user to a detailed view of the alignment between the user sequence and the sequence of known three-dimensional structure.

Alignment view

Example Phyre2 detailed view of the alignment between a user sequence and a known protein structure.

The detailed alignment view permits a user to examine individual aligned residues and matches between predicted and known secondary structure elements, and to toggle information regarding patterns of sequence conservation and secondary structure confidence. In addition, Jmol is used to permit interactive 3D viewing of the protein model.

Improvements in Phyre2

Phyre2 uses a fold library that is updated weekly as new structures are solved. It uses a more up-to-date interface and offers additional functionality over the Phyre server as described below.

Additional functionality

Batch processing

The batch processing feature permits users to submit more than one sequence to Phyre2 by uploading a file of sequences in FASTA format. By default, users have a limit of 100 sequences in a batch. This limit can be raised by contacting the administrator. Batch jobs are processed in the background on free computing power as it becomes available. Thus, batch jobs will often take longer than individually submitted jobs, but this is necessary to allow a fair distribution of computing resources to all Phyre2 users.
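
For users preparing large submissions, the per-batch limit suggests splitting a FASTA file before upload. The sketch below is a minimal, standard-library illustration of that idea; the function names are hypothetical, the default of 100 reflects the limit stated above, and nothing here calls or depends on the Phyre2 server itself.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_batches(records, max_per_batch=100):
    """Split records into consecutive batches of at most max_per_batch."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    return [
        records[i:i + max_per_batch]
        for i in range(0, len(records), max_per_batch)
    ]


def write_fasta(records, width=60):
    """Format (header, sequence) tuples as FASTA text with wrapped lines."""
    lines = []
    for header, sequence in records:
        lines.append(">" + header)
        for i in range(0, len(sequence), width):
            lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = "".join(f">seq{i}\nMKTAYIAKQR\n" for i in range(250))
    batches = split_into_batches(read_fasta(text))
    print([len(batch) for batch in batches])
```

Each batch can then be written to its own file with `write_fasta` and uploaded separately.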

One to one threading

One to one threading allows a user to upload both a sequence to be modelled and the template on which to model it. Users sometimes have a protein sequence that they wish to model on a specific template of their choice. This may be, for example, because the template is a newly solved structure that is not in the Phyre2 database, or because additional biological information indicates that the chosen template would produce a more accurate model than the one(s) automatically chosen by Phyre2.

BackPhyre

Instead of predicting the 3D structure of a protein sequence, users often have a solved structure and are interested in determining whether there is a related structure in a genome of interest. In Phyre2 an uploaded protein structure can be converted into a hidden Markov model and then scanned against a set of genomes (more than 20 genomes as of March 2011). This functionality is called "BackPhyre" to indicate that Phyre2 is being used in reverse.

Phyrealarm

Sometimes Phyre2 cannot detect any confident matches to known structures. However, the fold library grows by about 40-100 new structures each week, so even if there are no suitable templates this week, there may well be in the coming weeks. Phyrealarm allows users to submit a protein sequence to be automatically scanned against new entries added to the fold library every week. If a confident hit is detected, the user is automatically notified by email and sent the results of the Phyre2 search. Users can also control the level of alignment coverage and confidence in the match required to trigger an email alert.
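
The two user-controlled thresholds can be pictured as a simple filter over template hits. The following Python sketch is only an illustration of that logic: the `Hit` record, the field names and the default threshold values are assumptions, not the server's actual data model or settings.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothetical template hit against a query sequence."""
    template_id: str
    query_start: int   # first aligned query residue (1-based, inclusive)
    query_end: int     # last aligned query residue (1-based, inclusive)
    confidence: float  # confidence in the match, as a percentage


def coverage(hit, query_length):
    """Percentage of the query sequence spanned by the aligned region."""
    aligned = hit.query_end - hit.query_start + 1
    return 100.0 * aligned / query_length


def hits_triggering_alert(hits, query_length,
                          min_confidence=90.0, min_coverage=50.0):
    """Return the hits that meet both user-chosen thresholds.

    The default thresholds are placeholders for illustration only.
    """
    return [
        hit for hit in hits
        if hit.confidence >= min_confidence
        and coverage(hit, query_length) >= min_coverage
    ]


if __name__ == "__main__":
    hits = [
        Hit("templateA", 1, 150, 95.0),   # long, confident match
        Hit("templateB", 1, 50, 99.0),    # confident but short
        Hit("templateC", 10, 190, 60.0),  # long but not confident
    ]
    for hit in hits_triggering_alert(hits, query_length=200):
        print(hit.template_id)
```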

3DLigandSite

Phyre2 is coupled to the 3DLigandSite [9] server for protein binding site prediction. 3DLigandSite has been one of the top performing servers for binding site prediction at the Critical Assessment of Techniques for Protein Structure Prediction (CASP), in both CASP8 and CASP9. Confident models produced by Phyre2 (confidence >90%) are automatically submitted to 3DLigandSite.

Transmembrane topology prediction

The program memsat_svm [10] is used to predict the presence and topology of any transmembrane helices present in the user protein sequence.

Multi-template modelling

Phyre2 permits users to choose 'Intensive' modelling from the main submission screen. This mode:

  • Examines the list of hits and applies heuristics to select templates that maximise sequence coverage and confidence.
  • Constructs models for each selected template.
  • Uses these models to provide pairwise distance constraints that are input to the ab initio and multi-template modelling tool Poing. [11]
  • Synthesises the user protein with Poing in the context of these distance constraints, which are modelled by springs. Regions for which there is no template information are modelled by Poing's simplified ab initio physics model.
  • Combines the complete model generated by Poing with the original templates as input to MODELLER.
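
The idea of pairwise distance constraints modelled by springs can be sketched with a harmonic penalty. The Python example below is a generic illustration under stated assumptions: the distance cutoff, the spring constant and the function names are hypothetical, and the code does not reproduce Poing's actual force field or sampling procedure.

```python
import math
from itertools import combinations


def constraints_from_model(coords, max_distance=10.0):
    """Derive (i, j, target_distance) constraints from a template model.

    `coords` maps a residue index to an (x, y, z) position. Only residue
    pairs closer than `max_distance` (an illustrative cutoff) are kept.
    """
    constraints = []
    for i, j in combinations(sorted(coords), 2):
        distance = math.dist(coords[i], coords[j])
        if distance <= max_distance:
            constraints.append((i, j, distance))
    return constraints


def spring_energy(coords, constraints, k=1.0):
    """Total harmonic energy, 0.5 * k * (d - d0)^2 summed over constraints."""
    energy = 0.0
    for i, j, target in constraints:
        stretch = math.dist(coords[i], coords[j]) - target
        energy += 0.5 * k * stretch ** 2
    return energy


if __name__ == "__main__":
    template = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (30.0, 0.0, 0.0)}
    constraints = constraints_from_model(template)
    print(constraints)                           # only the close pair (1, 2)
    print(spring_energy(template, constraints))  # template satisfies itself

    # A candidate conformation that stretches the 1-2 spring.
    candidate = {1: (0.0, 0.0, 0.0), 2: (6.0, 8.0, 0.0)}
    print(spring_energy(candidate, constraints, k=2.0))
```

A conformation that reproduces the template distances has zero spring energy, so minimising this term pulls the model towards the geometry suggested by the selected templates.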

Applications

Applications of Phyre and Phyre2 include protein structure prediction, function prediction, domain prediction, domain boundary prediction, evolutionary classification of proteins, guiding site-directed mutagenesis and solving protein crystal structures by molecular replacement.

Two linked resources use Phyre predictions for the structure-based analysis of missense variants, which typically result from single-nucleotide polymorphisms: PhyreRisk [12] and Missense3D. [13]

History

Phyre and Phyre2 are the successors to the 3D-PSSM [14] protein structure prediction system, which has over 1400 citations to date. [15] 3D-PSSM was designed and developed by Lawrence Kelley [16] and Bob MacCallum [17] in the Biomolecular Modelling Laboratory [18] at Cancer Research UK. Phyre and Phyre2 were designed and developed by Lawrence Kelley in the Structural Bioinformatics Group, [19] Imperial College London. Components of the Phyre and Phyre2 systems were developed by Benjamin Jefferys, [20] Alex Herbert, [21] and Riccardo Bennett-Lovsey. [22] Research and development of both servers was supervised by Michael Sternberg.

References

  1. Lawrence Kelley; Riccardo Bennett-Lovsey; Alex Herbert; Kieran Fleming. "Phyre: Protein Homology/analogY Recognition Engine". Structural Bioinformatics Group, Imperial College, London. Retrieved 22 April 2011.
  2. Lawrence Kelley; Benjamin Jefferys. "Phyre2: Protein Homology/analogY Recognition Engine V 2.0". Structural Bioinformatics Group, Imperial College, London. Retrieved 22 April 2011.
  3. Kelley, L. A.; Sternberg, M. J. E. (2009). "Protein structure prediction on the Web: A case study using the Phyre server" (PDF). Nature Protocols. 4 (3): 363–71. doi:10.1038/nprot.2009.2. hdl:10044/1/18157. PMID 19247286. S2CID 12497300.
  4. Number of results returned from a search on Google Scholar (Google Scholar search)
  5. "Help: About PHYRE2". PHYRE Protein Fold Recognition Server. The work on developing the Phyre2 web server is supported by a BBSRC tools and resources grant
  6. Bennett-Lovsey, R. M.; Herbert, A. D.; Sternberg, M. J. E.; Kelley, L. A. (2007). "Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre". Proteins: Structure, Function, and Bioinformatics. 70 (3): 611–25. doi:10.1002/prot.21688. PMID 17876813. S2CID 23530683.
  7. McGuffin, L. J.; Bryson, K.; Jones, D. T. (2000). "The PSIPRED protein structure prediction server". Bioinformatics. 16 (4): 404–5. doi:10.1093/bioinformatics/16.4.404. PMID 10869041.
  8. Jones, D. T.; Ward, J. J. (2003). "Prediction of disordered regions in proteins from position specific score matrices". Proteins: Structure, Function, and Genetics. 53: 573–8. doi:10.1002/prot.10528. PMID 14579348. S2CID 6081008.
  9. Wass, M. N.; Kelley, L. A.; Sternberg, M. J. E. (2010). "3DLigandSite: Predicting ligand-binding sites using similar structures". Nucleic Acids Research. 38 (Web Server issue): W469–W473. doi:10.1093/nar/gkq406. PMC 2896164. PMID 20513649.
  10. Jones, D. T. (2007). "Improving the accuracy of transmembrane protein topology prediction using evolutionary information". Bioinformatics. 23 (5): 538–44. doi:10.1093/bioinformatics/btl677. PMID 17237066.
  11. Jefferys, B. R.; Kelley, L. A.; Sternberg, M. J. E. (2010). "Protein Folding Requires Crowd Control in a Simulated Cell". Journal of Molecular Biology. 397 (5): 1329–38. doi:10.1016/j.jmb.2010.01.074. PMC 2891488. PMID 20149797.
  12. Ofoegbu, Tochukwu C.; David, Alessia; Kelley, Lawrence A.; Mezulis, Stefans; Islam, Suhail A.; Mersmann, Sophia F.; Strömich, Léonie; Vakser, Ilya A.; Houlston, Richard S.; Sternberg, Michael J.E. (2019). "PhyreRisk: A Dynamic Web Application to Bridge Genomics, Proteomics and 3D Structural Data to Guide Interpretation of Human Genetic Variants". Journal of Molecular Biology. 431 (13): 2460–2466. doi:10.1016/j.jmb.2019.04.043. ISSN 0022-2836. PMC 6597944. PMID 31075275.
  13. Ittisoponpisan, Sirawit; Islam, Suhail A.; Khanna, Tarun; Alhuzimi, Eman; David, Alessia; Sternberg, Michael J.E. (2019). "Can Predicted Protein 3D Structures Provide Reliable Insights into whether Missense Variants Are Disease Associated?". Journal of Molecular Biology. 431 (11): 2197–2212. doi:10.1016/j.jmb.2019.04.009. ISSN 0022-2836. PMC 6544567. PMID 30995449.
  14. Kelley, L. A.; MacCallum, R. M.; Sternberg, M. J. E. (2000). "Enhanced genome annotation using structural profiles in the program 3D-PSSM". Journal of Molecular Biology. 299 (2): 501–522. doi:10.1006/jmbi.2000.3741. PMID 10860755.
  15. Number of results returned from a search on Google Scholar. (Google Scholar search)
  16. Dr. Lawrence Kelley
  17. Dr. Bob MacCallum
  18. "Biomolecular Modelling laboratory". Archived from the original on 2011-09-29. Retrieved 2011-03-09.
  19. Structural Bioinformatics Group
  20. "Dr. Benjamin Jefferys". Archived from the original on 2011-04-18. Retrieved 2011-03-28.
  21. Dr. Alex Herbert [permanent dead link]
  22. Dr. Riccardo Bennett-Lovsey