This list of structural comparison and alignment software is a compilation of software tools and web portals used in pairwise or multiple structural comparison and structural alignment.
Name | Description | Class | Type | Flexible | Link | Author | Year |
---|---|---|---|---|---|---|---|
ARTEMIS [1] | Topology-independent superposition of RNA/DNA 3D structures and structure-based sequence alignment | AllA | Pair | No | download | Bohdan D.R.; Bujnicki J.M.; Baulin E.F. | 2024 |
ARTEM [2] [3] | Superposition of two arbitrary RNA/DNA 3D structure fragments & 3D motif identification | AllA | Pair | No | download | Bohdan D.R.; Voronina V.V.; Bujnicki J.M.; Baulin E.F. | 2023 |
foldseek [4] | Fast and accurate protein structure alignment and visualisation | Seq | Pair | Yes | server download | M. van Kempen, S. Kim, C. Tumescheit, M. Mirdita, J. Lee, C. Gilchrist, J. Söding & M. Steinegger | 2023 |
3decision | Protein structure repository with visualisation and structural analytics tools | Seq | Multi | Yes | site | P. Schmidtke | 2015 |
MAMMOTH | MAtching Molecular Models Obtained from Theory | Cα | Pair | No | server download | CEM Strauss & AR Ortiz | 2002 |
CE | Combinatorial Extension | Cα | Pair | No | server | I. Shindyalov | 2000 |
CE-MC | Combinatorial Extension-Monte Carlo | Cα | Multi | No | server | C. Guda | 2004 |
DaliLite | Distance Matrix Alignment | C-Map | Pair | No | server and download | L. Holm | 1993 |
TM-align | TM-score based protein structure alignment | Cα | Pair | nil | server and download | Y. Zhang & J. Skolnick | 2005 |
mTM-align | Multiple protein structure alignment based on TM-align | Cα | Multi | No | server and download | R. Dong, Z. Peng, Y. Zhang & J. Yang | 2018 |
VAST | Vector Alignment Search Tool | SSE | Pair | nil | server | S. Bryant | 1996 |
PrISM | Protein Informatics Systems for Modeling | SSE | Multi | nil | server | B. Honig | 2000 |
MOE | Molecular Operating Environment. Extensive platform for protein and protein-ligand structure modelling. | Cα, AllA, Seq | Multi | No | site | Chemical Computing Group | 2000 |
SSAP | Sequential Structure Alignment Program | SSE | Multi | No | server | C. Orengo & W. Taylor | 1989 |
SARF2 | Spatial ARrangements of Backbone Fragments | SSE | Pair | nil | server | N. Alexandrov | 1996 |
KENOBI/K2 | NA | SSE | Pair | nil | server | Z. Weng | 2000 |
STAMP | STructural Alignment of Multiple Proteins | Cα | Multi | No | download server | R. Russell & G. Barton | 1992 |
MASS | Multiple Alignment by Secondary Structure | SSE | Multi | No | server | O. Dror & H. Wolfson | 2003 |
SCALI | Structural Core ALIgnment of proteins | Seq/C-Map | Pair | nil | server download | X. Yuan & C. Bystroff | 2004 |
DEJAVU | NA | SSE | Pair | nil | server | GJ. Kleywegt | 1997 |
SSM | Secondary Structure Matching | SSE | Multi | nil | server | E. Krissinel | 2003 |
SHEBA | Structural Homology by Environment-Based Alignment | Seq | Pair | nil | server | J Jung & B Lee | 2000 |
LGA [5] | Local-Global Alignment, and Global Distance Test (GDT-TS) structure similarity measure | Cα, AllA, any atom | Pair | nil | server and download | A. Zemla | 2003 |
POSA | Partial Order Structure Alignment | Cα | Multi | Yes | server | Y. Ye & A. Godzik | 2005 |
PyMOL | "super" command does sequence-independent 3D alignment | Protein | Hybrid | No | site | W. L. DeLano | 2007 |
FATCAT | Flexible Structure AlignmenT by Chaining Aligned Fragment Pairs Allowing Twists | Cα | Pair | Yes | server | Y. Ye & A. Godzik | 2003 |
deconSTRUCT | Database search on substructural level and pairwise alignment. | SSE | Multi | No | server | ZH. Zhang et al. | 2010 |
Matras | MArkovian TRAnsition of protein Structure | Cα & SSE | Pair | nil | server | K. Nishikawa | 2000 |
MAMMOTH-mult | MAMMOTH-based multiple structure alignment | Cα | Multi | No | server | D. Lupyan | 2005 |
Protein3Dfit | NA | C-Map | Pair | nil | server | D. Schomburg | 1994 |
PRIDE | PRobability of IDEntity | Cα | Pair | nil | server | S. Pongor | 2002 |
FAST | FAST Alignment and Search Tool | Cα | Pair | nil | server | J. Zhu | 2004 |
C-BOP | Coordinate-Based Organization of Proteins | N/A | Multi | nil | server | E. Sandelin | 2005 |
ProFit | Protein least-squares Fitting | Cα | Multi | nil | server | ACR. Martin | 1996 |
TOPOFIT | Alignment as a superimposition of common volumes at a topomax point | Cα | Pair | nil | server | VA. Ilyin | 2004 |
MUSTANG | MUltiple STructural AligNment AlGorithm | Cα & C-Map | Multi | nil | download | A.S. Konagurthu et al. | 2006 |
URMS | Unit-vector RMSD | Cα | Pair | nil | server | K. Kedem | 2003 |
LOCK | Hierarchical protein structure superposition | SSE | Pair | No | NA | AP. Singh | 1997 |
LOCK 2 | Improvements over LOCK | SSE | Pair | No | download | J. Shapiro | 2003 |
CBA | Consistency Based Alignment | SSE | Multi | nil | download | J. Ebert | 2006 |
TetraDA | Tetrahedral Decomposition Alignment | SSE | Multi | Yes | NA | J. Roach | 2005 |
STRAP | STRucture based Alignment Program | Cα | Multi | nil | server | C. Gille | 2006 |
LOVOALIGN | Low Order Value Optimization methods for Structural Alignment | Cα | Pair | nil | server | Andreani et al. | 2006 |
GANGSTA | Genetic Algorithm for Non-sequential, Gapped protein STructure Alignment | SSE/C-Map | Pair | No | server | B. Kolbeck | 2006 |
GANGSTA+ | Combinatorial algorithm for nonsequential and gapped structural alignment | SSE/C-Map | Pair | No | server | A. Guerler & E.W. Knapp | 2008 |
MatAlign [6] | Protein Structure Comparison by Matrix Alignment | C-Map | Pair | nil | site | Z. Aung & K.L. Tan | 2006 |
Vorolign | Fast structure alignment using Voronoi contacts | C-Map | Multi | Yes | server | F. Birzele et al. | 2006 |
EXPRESSO | Fast Multiple Structural Alignment using T-Coffee and Sap | Cα | Multi | nil | site | C. Notredame et al. | 2007 |
CAALIGN | Cα Align | Cα | Multi | nil | site | T.J. Oldfield | 2007 |
YAKUSA | Internal Coordinates and BLAST type algorithm | Cα | Pair | nil | site | M. Carpentier et al. | 2005 |
BLOMAPS | Conformation-based alphabet alignments | Cα | Multi | nil | server | W-M. Zheng & S. Wang | 2008 |
CLEPAPS | Conformation-based alphabet alignments | Cα | Pair | nil | server | W-M. Zheng & S. Wang | 2008 |
TALI F | Torsion Angle ALIgnment | Cα | Pair | No | NA | X. Mioa | 2006 |
MolCom | NA | Geometry | Multi | nil | NA | S.D. O'Hearn | 2003 |
MALECON | NA | Geometry | Multi | nil | NA | S. Wodak | 2004 |
FlexProt | Flexible Alignment of Protein Structures | Cα | Pair | Yes | server | M. Shatsky & H. Wolfson | 2002 |
MultiProt | Multiple Alignment of Protein Structures | Geometry | Multi | No | server | M. Shatsky & H. Wolfson | 2004 |
CTSS | Protein Structure Alignment Using Local Geometrical Features | Geometry | Pair | nil | site | T. Can | 2004 |
CURVE | NA | Geometry | Multi | No | site | D. Zhi | 2006 |
Matt | Multiple Alignment with Translations and Twists | Cα | Multi | Yes | server download | M. Menke | 2008 |
TopMatch [7] | Protein structure alignment and visualization of structural similarities; alignment of multiprotein complexes | Cα | Pair | No | server download | M. Sippl & M. Wiederstein | 2012 |
SSGS | Secondary Structure Guided Superimposition | Cα | Pair | No | site | G. Wainreb et al. | 2006 |
Matchprot | Comparison of protein structures by growing neighborhood alignments | Cα | Pair | No | server | S. Bhattacharya et al. | 2007 |
UCSF Chimera | see MatchMaker tool and "matchmaker" command | Seq & SSE | Multi | No | site | E. Meng et al. | 2006 |
FLASH | Fast aLignment Algorithm for finding Structural Homology of proteins | SSE | Pair | No | NA | E.S.C. Shih & M-J Hwang | 2003 |
RAPIDO | Rapid Alignment of Protein structures In the presence of Domain mOvements | Cα | Pair | Yes | server | R. Mosca & T.R. Schneider | 2008 |
ComSubstruct | Structural Alignment based on Differential Geometrical Encoding | Geometry | Pair | Yes | site | N. Morikawa | 2008 |
ProCKSI | Protein (Structure) Comparison, Knowledge, Similarity and Information | Other | Pair | No | site | D. Barthel et al. | 2007 |
SARST | Structure similarity search Aided by Ramachandran Sequential Transformation | Cα | Pair | nil | site | W-C. Lo et al. | 2007 |
Fr-TM-align | Fragment-TM-score based protein structure alignment | Cα | Pair | No | site | S.B. Pandit & J. Skolnick | 2008 |
TOPS+ COMPARISON | Comparing topological models of protein structures enhanced with ligand information | Topology | Pair | Yes | server | M. Veeramalai & D. Gilbert | 2008 |
TOPS++FATCAT | Flexible Structure AlignmenT by Chaining Aligned Fragment Pairs Allowing Twists derived from TOPS+ String Model | Cα | Pair | Yes | server | M. Veeramalai et al. | 2008 |
MolLoc | Molecular Local Surface Alignment | Surf | Pair | No | server | M.E. Bock et al. | 2007 |
FASE | Flexible Alignment of Secondary Structure Elements | SSE | Pair | Yes | NA | J. Vesterstrom & W. R. Taylor | 2006 |
SABERTOOTH | Protein Structural Alignment based on a vectorial Structure Representation | Cα | Pair | Yes | server | F. Teichert et al. | 2007 |
STON | NA | Cα | Pair | No | site | C. Eslahchi et al. | 2009 |
SALIGN | Sequence-Structure Hybrid Method | Seq | Multi | No | site | M.S. Madhusudhan et al. | 2007 |
MAX-PAIRS | NA | Cα | Pair | No | site | A. Poleksic | 2009 |
THESEUS | Maximum likelihood superpositioning | Cα | Multi | No | site | D.L. Theobald & D.S. Wuttke | 2006 |
TABLEAUSearch | Structural Search and Retrieval using a Tableau Representation of Protein Folding Patterns | SSE | Pair | No | server | A.S. Konagurthu et al. | 2008 |
QP Tableau Search | Tableau-based protein substructure search using quadratic programming | SSE | Pair | No | download server | A. Stivala et al. | 2009 |
ProSMoS | Protein Structure Motif Search | SSE | Pair | No | server download | S. Shi et al. | 2007 |
MISTRAL | Energy-based multiple structural alignment of proteins | Cα | Multi | No | server | C. Micheletti & H. Orland | 2009 |
MSVNS for MaxCMO | A simple and fast heuristic for protein structure comparison | C-Map | Pair | No | site | D. Pelta et al. | 2008 |
Structal | Least Squares Root Mean Square deviation minimization by dynamic programming | Cα | Pair | No | server download | Gerstein & Levitt | 2005 |
ProBiS [8] | Detection of Structurally Similar Protein Binding Sites by Local Structural Alignment | Surf | Pair | Yes | server download | J. Konc & D. Janezic | 2010 |
ALADYN | Dynamics-based Alignment: superposing proteins by matching their collective movements | Cα | Pair | No | server | Potestio et al. | 2010 |
SWAPSC | Sliding Window Analysis Procedure for detecting Selective Constraints for analysing genetic data structured for a family or phylogenetic tree using constraints in protein-coding sequence alignments. | Seq | Multi | Yes | server | Mario A. Fares | 2004 |
SA Tableau Search | Fast and accurate protein substructure searching with simulated annealing and GPUs | SSE | Pair | No | download server | A. Stivala et al. | 2010 |
RCSB PDB Protein Comparison Tool | Provides CE, FATCAT, CE variation for Circular Permutations, Sequence Alignments | Cα | Pair | Yes | server download | A. Prlic et al. | 2010 |
CSR | Maximal common 3D motif; non-parametric; outputs pairwise correspondence; works also on small molecules | SSE or Cα | Pair | No | server download | M. Petitjean | 1998 |
EpitopeMatch | discontinuous structure matching; induced fit consideration; flexible geometrical and physicochemical specificity definition; transplantation of similar spatial arrangements of amino acid residues | Cα-AllA | Multi | Yes | download | S. Jakuschev | 2011 |
CLICK | Topology-independent 3D structure comparison | SSE & Cα & SASA | Pair | Yes | server | M. Nguyen | 2011 |
Smolign | Spatial motifs based protein structural alignment | SSE & C-Map | Multi | Yes | download | H. Sun | 2010 |
3D-Blast | Comparing three-dimensional shape-density | Density | Pair | No | server | L. Mavridis et al. | 2011 |
DEDAL | DEscriptor Defined ALignment | SSE & Cα & C-Map | Pair | Yes | server | P. Daniluk & B. Lesyng | 2011 |
msTALI | multiple sTructure ALIgnment | Cα & Dihed & SSE & Surf | Multi | Yes | server | P. Shealy & H. Valafar | 2012 |
mulPBA | multiple PB sequence alignment | PB | Multi | Yes | NA | A.P. Joseph et al. | 2012 |
SAS-Pro | Simultaneous Alignment and Superimposition of PROteins | ??? | Pair | Yes | server | Shah & Sahinidis | 2012 |
MIRAGE-align | Match Index based structural alignment method | SSE & PPE | Pair | No | website | K. Hung et al. | 2012 |
SPalign | Structure Pairwise alignment | Cα | Pair | No | server download | Y. Yang et al. | 2012 |
Kpax [9] | Fast Pairwise or Multiple Alignments using Gaussian Overlap | Other | Pair | Yes | website | D.W. Ritchie | 2016 |
DeepAlign [10] | Protein structure alignment beyond spatial proximity (evolutionary information and hydrogen-bonding are taken into consideration) | Cα + Seq | Pair | No | download server | S. Wang and J. Xu | 2013 |
3DCOMB [11] | extension of DeepAlign | Cα | Multi | No | download server | S. Wang and J. Xu | 2012 |
TS-AMIR [12] | A topology string alignment method for intensive rapid protein structure comparison | SSE & Cα | Pair | No | NA | J. Razmara et al. | 2012 |
MICAN [13] | MICAN can handle Multiple-chains, Inverse alignments, Cα-only models, Alternative alignments, and Non-sequential alignments | Cα | Pair | No | download | S. Minami et al. | 2013 |
SPalignNS [14] | Structure Pairwise alignment Non-Sequential | Cα | Pair | No | server download | P. Brown et al. | 2015 |
Fit3D [15] | highly accurate screening for small structural motifs featuring definition of position-specific exchanges, detection of intra- and inter-molecular occurrences, definition of arbitrary atoms used for motif alignment | AllA, Cα | Multi | No | server download | F. Kaiser et al. | 2015 |
MMLigner [16] | Bayesian statistical inference of alignments based on information theory and compression. | Cα | Pair | Yes | server download | J. Collier et al. | 2017 |
RCSB PDB strucmotif-search [17] | Small structural motifs search that takes seconds to run on 180k or more structures, with nucleic acid & bioassembly support | AllA | Multi | No | server/documentation download | S. Bittrich et al. | 2020 |
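Several entries in the table score alignments with the TM-score (TM-align, mTM-align, Fr-TM-align) or with GDT-TS (LGA). As a rough illustration of what these two measures compute, the following Python sketch scores a set of aligned Cα pairs. It assumes the two structures have already been superposed and that the residue equivalences are known; finding those is the hard part that the listed tools solve. The function names, the 0.5 Å floor on the distance scale, and the use of a single fixed superposition for GDT-TS are simplifying assumptions of this sketch, not the behaviour of any listed program.

```python
import math


def pair_distances(coords_a, coords_b):
    """Distances (in Å) between equivalent Cα atoms of two superposed structures.

    coords_a and coords_b are equal-length sequences of (x, y, z) tuples,
    where position i in one is aligned to position i in the other.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("aligned coordinate lists must have equal length")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def tm_d0(target_length, floor=0.5):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.

    The floor for short chains is an assumption of this sketch.
    """
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = floor
    return max(d0, floor)


def tm_score(distances, target_length):
    """TM-score of the aligned pairs, normalised by the target length."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def gdt_ts(distances, target_length, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS (0 to 100) from a single fixed superposition.

    The full measure searches for the best superposition at each cutoff;
    here the same distances are reused for every cutoff.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    fractions = [
        sum(1 for d in distances if d <= cutoff) / target_length
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Toy coordinates: four aligned Cα pairs, already superposed.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    target = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]
    dists = pair_distances(model, target)
    print("TM-score:", round(tm_score(dists, len(target)), 3))
    print("GDT-TS:", gdt_ts(dists, len(target)))
```

Both scores are normalised by the length of the target rather than by the number of aligned pairs, so an alignment that covers only part of the target is penalised even when the aligned residues superpose well.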
Key map: