Swiss-model

Type: Structural bioinformatics tool
License: Freeware, source code unavailable
Website: swissmodel.expasy.org

Swiss-model (stylized as SWISS-MODEL) is a structural bioinformatics web server dedicated to homology modelling of 3D protein structures. [1] [2] As of 2024, homology modelling is the most accurate method to generate reliable three-dimensional protein structure models and is routinely used in many practical applications. Homology (or comparative) modelling methods use experimental protein structures (templates) to build models for evolutionarily related proteins (targets).

Today, Swiss-model consists of three tightly integrated components: (1) The Swiss-model pipeline – a suite of software tools and databases for automated protein structure modelling, [1] (2) The Swiss-model Workspace – a web-based graphical user interface workbench, [2] (3) The Swiss-model Repository – a continuously updated database of homology models for a set of model organism proteomes of high biomedical interest. [3]

Pipeline

The Swiss-model pipeline comprises the four main steps involved in building a homology model of a given protein structure:

  1. Identify structural template(s). BLAST and HHblits are used to identify templates, which are stored in the Swiss-model Template Library (SMTL), itself derived from the Protein Data Bank (PDB).
  2. Align target sequence and template structure(s).
  3. Build model and minimize energy. Swiss-model uses a rigid fragment assembly approach for modelling.
  4. Assess model quality using QMEAN, a statistical potential of mean force.
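Template selection in step 1 depends on how similar each candidate template is to the target. The following is a minimal sketch, not the Swiss-model implementation: it assumes the target-template alignments already exist (in practice they would come from tools such as BLAST or HHblits) and simply ranks hypothetical templates by sequence identity. The function names and the toy sequences are illustrative assumptions.

```python
def sequence_identity(aligned_target: str, aligned_template: str) -> float:
    """Fraction of gap-free alignment columns in which both residues match."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        aligned += 1
        if a == b:
            identical += 1
    return identical / aligned if aligned else 0.0


def rank_templates(alignments: dict[str, tuple[str, str]]) -> list[tuple[str, float]]:
    """Return (template name, identity) pairs sorted from most to least similar."""
    scored = [
        (name, sequence_identity(target, template))
        for name, (target, template) in alignments.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy pre-aligned pairs (aligned target, aligned template); names are hypothetical.
toy_alignments = {
    "tmplB": ("MKTAYIAK", "MRSAYLGK"),
    "tmplA": ("MKT-AYIAK", "MKTQAYLAK"),
}
ranking = rank_templates(toy_alignments)
for name, identity in ranking:
    print(f"{name}: {identity:.1%}")
```

Sequence identity is only one possible criterion; a real pipeline would also weigh factors such as alignment coverage and template quality.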

Workspace

The Swiss-model Workspace integrates the programs and databases required for protein structure prediction and modelling in a web-based workspace. Depending on the complexity of the modelling task, the user can choose between three modes that offer different levels of control over the individual modelling steps: automated mode, alignment mode, and project mode.

The automated mode is used when the sequence identity between target and template is high enough (>50%) that no human intervention is needed. In this case only the sequence or the UniProt accession code of the protein is required as input. The alignment mode lets users supply their own target-template alignment as the starting point for modelling (i.e. the template search step is skipped and only minor changes, if any, are made to the provided alignment). The project mode is used in more difficult cases, when manual corrections of the target-template alignment are needed to improve the quality of the resulting model. In this mode the input is a project file that can be generated by the DeepView (Swiss-PdbViewer) visualization and structural analysis tool, [4] which allows the user to examine and manipulate the target-template alignment in its structural context.

In all three cases the output is a PDB file with the atom coordinates of the model, or a DeepView project file. The four main steps of homology modelling may be repeated iteratively until a satisfactory model is obtained.
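The atom coordinates in a PDB-format output file can be read with a few lines of code. The sketch below assumes the standard fixed-column layout of ATOM records in the PDB format; the sample records and their values are invented for illustration and do not come from a real model. It also reads the temperature-factor column, which model files may reuse for a per-residue quality estimate (an assumption to check against the header of the actual file).

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    b_factor: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Parse ATOM records using the fixed column positions of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # ignore HETATM, TER, END and header records
        atoms.append(
            Atom(
                name=line[12:16].strip(),       # columns 13-16
                res_name=line[17:20].strip(),   # columns 18-20
                chain=line[21],                 # column 22
                res_seq=int(line[22:26]),       # columns 23-26
                x=float(line[30:38]),           # columns 31-38
                y=float(line[38:46]),           # columns 39-46
                z=float(line[46:54]),           # columns 47-54
                b_factor=float(line[60:66]),    # columns 61-66
            )
        )
    return atoms


# Invented sample records, for illustration only.
sample = "\n".join(
    [
        "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  0.72           N",
        "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00  0.72           C",
        "ATOM      3  CA  LYS A   2      24.368  28.651   3.511  1.00  0.85           C",
        "TER",
        "END",
    ]
)
atoms = parse_atoms(sample)
ca_atoms = [atom for atom in atoms if atom.name == "CA"]
for atom in ca_atoms:
    print(atom.res_name, atom.res_seq, atom.x, atom.y, atom.z, atom.b_factor)
```

For anything beyond a quick inspection, an established structure-parsing library is a safer choice than hand-written column slicing.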

The Swiss-model Workspace is accessible via the ExPASy web server, or it can be used as part of the program DeepView (Swiss-PdbViewer). As of September 2015 it had been cited 20,000 times in the scientific literature, [5] making it one of the most widely used tools for protein structure modelling. The tool is free for academic use.

Repository

The Swiss-model Repository provides access to an up-to-date collection of annotated three-dimensional protein models for a set of model organisms of high general interest. Model organisms include human, [6] mouse, [7] C. elegans, [8] E. coli, [9] and various pathogens including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). [10] The Swiss-model Repository is integrated with several external resources, such as UniProt, [11] InterPro, [12] STRING, [13] and the Nature Protein Structure Initiative (PSI) Structural Biology Knowledgebase (SBKB). [14]
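Because Repository entries are organized around UniProt accessions, programmatic lookups can be keyed on the accession code. The sketch below only builds a request URL and does not contact the server. The URL pattern (`/repository/uniprot/<accession>.json` or `.pdb`) is an assumption that should be verified against the current Repository documentation, and the helper name is hypothetical.

```python
from urllib.parse import quote

# Assumed URL pattern; verify against the Repository documentation before use.
REPOSITORY_BASE = "https://swissmodel.expasy.org/repository/uniprot"


def repository_url(accession: str, fmt: str = "json") -> str:
    """Build a Repository URL for a canonical UniProt accession (hypothetical helper)."""
    if fmt not in {"json", "pdb"}:
        raise ValueError("fmt must be 'json' or 'pdb'")
    acc = accession.strip().upper()
    if not acc.isalnum():
        # Isoform suffixes and other non-alphanumeric input are rejected here.
        raise ValueError(f"unexpected accession: {accession!r}")
    return f"{REPOSITORY_BASE}/{quote(acc)}.{fmt}"


url = repository_url("p07900")
print(url)
```

The resulting URL could then be fetched with a standard HTTP client such as `urllib.request.urlopen`, subject to the service's terms of use.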

Recent developments of the Swiss-model expert system include (1) automated modelling of homo-oligomeric assemblies; (2) modelling of essential metal ions and biologically relevant ligands in protein structures; (3) local (per-residue) model reliability estimates based on the QMEAN local score function; [15] and (4) mapping of UniProt features to models. Features (1) and (2) are available in the automated mode of the Swiss-model Workspace; (3) is always provided when a homology model is calculated with the Swiss-model Workspace; and (4) is available in the Swiss-model Repository.

Accuracy and reliability of the method

In the past, the accuracy, stability and reliability of the Swiss-model server pipeline were validated by the EVA-CM benchmark project. As of 2024, the Swiss-model server pipeline participates in the Continuous Automated Model EvaluatiOn (CAMEO3D) project, which continuously evaluates the accuracy and reliability of protein structure prediction services by fully automated means. [16]

References

  1. Schwede T, Kopp J, Guex N, Peitsch MC (2003). "Swiss-model: an automated protein homology-modeling server". Nucleic Acids Research. 31 (13): 3381–3385. doi:10.1093/nar/gkg520. PMC 168927. PMID 12824332.
  2. Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Cassarino TG, Bertoni M, Bordoli L, Schwede T (2014). "Swiss-model: modelling protein tertiary and quaternary structure using evolutionary information". Nucleic Acids Research. 42 (W1): 195–201. doi:10.1093/nar/gku340. PMC 4086089. PMID 24782522.
  3. Bienert S, Waterhouse A, de Beer TA, Tauriello G, Studer G, Bordoli L, Schwede T (2017). "The Swiss-model Repository-new features and functionality". Nucleic Acids Research. 45 (D1): D313–D319. doi:10.1093/nar/gkw1132. PMC 5210589. PMID 27899672.
  4. Guex N, Peitsch MC, Schwede T (2009). "Automated comparative protein structure modeling with Swiss-model and Swiss-PdbViewer: a historical perspective". Electrophoresis. 30 (Suppl 1): S162–173. doi:10.1002/elps.200900140. PMID 19517507. S2CID 39507113.
  5. Number of results returned from a search in Google Scholar. (Google Scholar)
  6. "Swiss-model – Homo sapiens". swissmodel.expasy.org. Retrieved 2020-02-14.
  7. "Swiss-model – Mus musculus". swissmodel.expasy.org. Retrieved 2020-02-14.
  8. "Swiss-model – Caenorhabditis elegans". swissmodel.expasy.org. Retrieved 2020-02-14.
  9. "Swiss-model – Escherichia coli". swissmodel.expasy.org. Retrieved 2020-02-14.
  10. "Swiss-model – SARS-CoV-2". swissmodel.expasy.org. Retrieved 2020-02-14.
  11. Wu CH, Apweiler R, Bairoch A, et al. (2006). "The Universal Protein Resource (UniProt): an expanding universe of protein information". Nucleic Acids Research. 34 (Database issue): D187–91. doi:10.1093/nar/gkj161. PMC 1347523. PMID 16381842.
  12. Wu CH, Apweiler R, Bairoch A, et al. (2007). "InterPro and InterProScan". Comparative Genomics. Methods in Molecular Biology. Vol. 396. pp. 59–70. doi:10.1007/978-1-59745-515-2_5. ISBN 978-1-934115-37-4. PMID 18025686.
  13. Szklarczyk D, Franceschini A, Kuhn M, et al. (2011). "The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored". Nucleic Acids Research. 39 (Database issue): D561–8. doi:10.1093/nar/gkq973. PMC 3013807. PMID 21045058.
  14. Gabanyi MJ, Adams PD, Arnold K, et al. (2011). "The Structural Biology Knowledgebase: a portal to protein structures, sequences, functions, and methods". Journal of Structural and Functional Genomics. 12 (2): 45–54. doi:10.1007/s10969-011-9106-2. PMC 3123456. PMID 21472436.
  15. Benkert P, Kunzli M, Schwede T (2009). "QMEAN server for protein model quality estimation". Nucleic Acids Research. 37 (Web Server issue): W510–4. doi:10.1093/nar/gkp322. PMC 2703985. PMID 19429685.
  16. "CAMEO: Continuous Automated Model EvaluatiOn". CAMEO3d.org.
