Protein superfamily

A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). Usually this common ancestry is inferred from structural alignment [1] and mechanistic similarity, even if no sequence similarity is evident. [2] Sequence homology can then be deduced even if it is not apparent (due to low sequence similarity). Superfamilies typically contain several protein families, which show sequence similarity within each family. The term protein clan is commonly used for protease and glycosyl hydrolase superfamilies, based on the MEROPS and CAZy classification systems. [2] [3]

Identification

Above, secondary structural conservation of 80 members of the PA protease clan (superfamily). H indicates α-helix, E indicates β-sheet, L indicates loop. Below, sequence conservation for the same alignment. Arrows indicate catalytic triad residues. Aligned on the basis of structure by DALI.

Superfamilies of proteins are identified using a number of methods. Closely related members can be identified by different methods to those needed to group the most evolutionarily divergent members.

Sequence similarity

A sequence alignment of mammalian histone proteins. The similarity of the sequences implies that they evolved by gene duplication. Residues that are conserved across all sequences are highlighted in grey. Below the protein sequences is a key denoting: [4]
* conserved sequence,
: conservative mutations,
. semi-conservative mutations, and
(space) non-conservative mutations.

Historically, the similarity of different amino acid sequences has been the most common method of inferring homology. [5] Sequence similarity is considered a good predictor of relatedness, since similar sequences are more likely the result of gene duplication and divergent evolution, rather than the result of convergent evolution. Amino acid sequence is typically more conserved than DNA sequence (due to the degenerate genetic code), so it is a more sensitive detection method. Since some of the amino acids have similar properties (e.g., charge, hydrophobicity, size), conservative mutations that interchange them are often neutral to function. The most conserved sequence regions of a protein often correspond to functionally important regions like catalytic sites and binding sites, since these regions are less tolerant to sequence changes.
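The ideas in this paragraph can be made concrete with a minimal sketch that labels alignment columns as identical, conservative or non-conservative, and computes pairwise percent identity. The residue classes below are a simplified grouping chosen for illustration (an assumption, not a standard substitution matrix or the Clustal scheme), and the toy alignment and all function names are hypothetical.

```python
# Simplified physico-chemical classes. This grouping is an illustrative
# assumption, not a standard scheme such as BLOSUM62 or the Clustal groups.
RESIDUE_CLASSES = {
    "hydrophobic": set("AVLIMC"),
    "aromatic": set("FWY"),
    "polar": set("STNQ"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "special": set("GP"),
}


def residue_class(residue):
    """Return the class name of a one-letter residue, or None if unknown."""
    for name, members in RESIDUE_CLASSES.items():
        if residue in members:
            return name
    return None


def classify_column(column):
    """Label one alignment column (a sequence of one-letter codes)."""
    if "-" in column:
        return "gap"
    if len(set(column)) == 1:
        return "identical"
    classes = {residue_class(r) for r in column}
    if len(classes) == 1 and None not in classes:
        return "conservative"
    return "non-conservative"


def percent_identity(seq_a, seq_b):
    """Percent identical residues over columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Hypothetical toy alignment (rows are already aligned, "-" marks a gap).
alignment = ["MKVLD-E", "MRVID-E", "MKILEGE"]
labels = [classify_column(col) for col in zip(*alignment)]
identity = percent_identity(alignment[0], alignment[1])
```

In this toy case most columns differ between rows yet stay within one residue class, which is the situation the paragraph describes as conservative mutation. Real analyses would normally rely on an established substitution matrix and alignment program rather than a hand-made grouping.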

Using sequence similarity to infer homology has several limitations. There is no minimum level of sequence similarity guaranteed to produce identical structures. Over long periods of evolution, related proteins may show no detectable sequence similarity to one another. Sequences with many insertions and deletions can also be difficult to align, which makes it hard to identify the homologous sequence regions. In the PA clan of proteases, for example, not a single residue is conserved through the superfamily, not even those in the catalytic triad. Conversely, the individual families that make up a superfamily are defined on the basis of their sequence alignment, for example the C04 protease family within the PA clan.

Nevertheless, sequence similarity is the most commonly used form of evidence to infer relatedness, since the number of known sequences vastly outnumbers the number of known tertiary structures. [6] In the absence of structural information, sequence similarity constrains the limits of which proteins can be assigned to a superfamily. [6]

Structural similarity

Structural homology in the PA superfamily (PA clan). The double β-barrel that characterises the superfamily is highlighted in red. Shown are representative structures from several families within the PA superfamily. Note that some proteins show partially modified structures. Chymotrypsin (1gg6), tobacco etch virus protease (1lvm), calicivirin (1wqs), West Nile virus protease (1fp7), exfoliatin toxin (1exf), HtrA protease (1l1j), snake venom plasminogen activator (1bqy), chloroplast protease (4fln) and equine arteritis virus protease (1mbm).

Structure is much more evolutionarily conserved than sequence, such that proteins with highly similar structures can have entirely different sequences. [7] Over very long evolutionary timescales, very few residues show detectable amino acid sequence conservation; however, secondary structural elements and tertiary structural motifs are highly conserved. Some protein dynamics [8] and conformational changes of the protein structure may also be conserved, as is seen in the serpin superfamily. [9] Consequently, protein tertiary structure can be used to detect homology between proteins even when no evidence of relatedness remains in their sequences. Structural alignment programs, such as DALI, use the 3D structure of a protein of interest to find proteins with similar folds. [10] However, on rare occasions, related proteins may evolve to be structurally dissimilar [11] and relatedness can only be inferred by other methods. [12] [13] [14]
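The contrast between sequence and structural conservation can be sketched by scoring each column of a structure-based alignment twice: once on the amino acid letters and once on secondary structure states (H, E, L, as in the PA clan figure). The alignment rows below are hypothetical, and the scoring function is a simple majority fraction assumed for illustration; it is not how DALI scores structures.

```python
from collections import Counter


def column_conservation(aligned_rows):
    """Fraction of rows sharing the most common symbol in each column.

    Gap characters ("-") never count as the conserved symbol.
    """
    if len({len(row) for row in aligned_rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    scores = []
    for column in zip(*aligned_rows):
        symbols = [s for s in column if s != "-"]
        if not symbols:
            scores.append(0.0)
            continue
        top_count = Counter(symbols).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def mean(values):
    return sum(values) / len(values)


# Hypothetical structure-based alignment of three proteins.
sequences = ["GSAKTW", "PTLRMF", "ANVEIY"]   # amino acid letters
secondary = ["LEEEEL", "LEEEEL", "LEEELL"]   # H = helix, E = strand, L = loop

sequence_score = mean(column_conservation(sequences))
structure_score = mean(column_conservation(secondary))
```

Here no column shares a residue, so the sequence score sits at its minimum of one third, while the secondary structure score stays close to 1. That is the pattern the paragraph describes for highly divergent superfamily members.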

Mechanistic similarity

The catalytic mechanism of enzymes within a superfamily is commonly conserved, although substrate specificity may be significantly different. [15] Catalytic residues also tend to occur in the same order in the protein sequence. [16] For the families within the PA clan of proteases, although there has been divergent evolution of the catalytic triad residues used to perform catalysis, all members use a similar mechanism to perform covalent, nucleophilic catalysis on proteins, peptides or amino acids. [17] However, mechanism alone is not sufficient to infer relatedness. Some catalytic mechanisms have been convergently evolved multiple times independently, and so form separate superfamilies, [18] [19] [20] and some superfamilies display a range of different (though often chemically similar) mechanisms. [15] [21]
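As a rough illustration of the point about residue order, the sketch below compares the N- to C-terminal order of catalytic triad residues in two enzymes. The residue numbers are the commonly quoted textbook numbering for chymotrypsin and subtilisin, supplied here for illustration rather than taken from this article; the function names are hypothetical. A different order is consistent with independent origins but, as the paragraph notes, neither mechanism nor residue order alone settles relatedness.

```python
def catalytic_order(residues):
    """Return residue names sorted by sequence position (N- to C-terminus).

    `residues` is a list of (residue_name, position) pairs.
    """
    return tuple(name for name, _pos in sorted(residues, key=lambda pair: pair[1]))


def same_catalytic_order(residues_a, residues_b):
    """True if both enzymes place their catalytic residues in the same order."""
    return catalytic_order(residues_a) == catalytic_order(residues_b)


# Commonly quoted numbering, used here as illustrative input data.
chymotrypsin = [("His", 57), ("Asp", 102), ("Ser", 195)]
subtilisin = [("Asp", 32), ("His", 64), ("Ser", 221)]

chymotrypsin_order = catalytic_order(chymotrypsin)
subtilisin_order = catalytic_order(subtilisin)
orders_match = same_catalytic_order(chymotrypsin, subtilisin)
```

Both enzymes use the same three residue types, yet they appear in a different order along the chain, so `orders_match` is false for this pair.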

Evolutionary significance

Protein superfamilies represent the current limits of our ability to identify common ancestry. [22] They are the largest evolutionary grouping based on direct evidence that is currently possible. They therefore reflect some of the most ancient evolutionary events currently studied. Some superfamilies have members present in all kingdoms of life, indicating that the last common ancestor of that superfamily was in the last universal common ancestor of all life (LUCA). [23]

Superfamily members may be in different species, with the ancestral protein being the form of the protein that existed in the ancestral species (orthology). Conversely, the proteins may be in the same species, but evolved from a single protein whose gene was duplicated in the genome (paralogy).

Diversification

A majority of proteins contain multiple domains. Between 66% and 80% of eukaryotic proteins have multiple domains, while about 40–60% of prokaryotic proteins have multiple domains. [5] Over time, many of the superfamilies of domains have mixed together. In fact, it is very rare to find “consistently isolated superfamilies”. [5] [1] When domains do combine, the N- to C-terminal domain order (the "domain architecture") is typically well conserved. Additionally, the number of domain combinations seen in nature is small compared to the number of possibilities, suggesting that selection acts on all combinations. [5]
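The quantities discussed above (the share of multidomain proteins, and observed versus possible domain combinations) can be sketched on a toy data set. Each protein is represented by its domain architecture, a tuple of superfamily labels in N- to C-terminal order. The proteins, the labels "A", "B", "C" and the function names are all hypothetical, and counting only adjacent ordered pairs is a simplifying assumption.

```python
def multidomain_fraction(architectures):
    """Fraction of proteins whose architecture has more than one domain."""
    multi = sum(1 for domains in architectures.values() if len(domains) > 1)
    return multi / len(architectures)


def observed_adjacent_pairs(architectures):
    """Set of ordered (N-terminal, C-terminal) pairs of neighbouring domains."""
    pairs = set()
    for domains in architectures.values():
        pairs.update(zip(domains, domains[1:]))
    return pairs


def possible_adjacent_pairs(architectures):
    """Number of ordered pairs that could be formed from the superfamilies seen."""
    superfamilies = {d for domains in architectures.values() for d in domains}
    return len(superfamilies) ** 2


# Hypothetical proteins; each tuple lists domain superfamilies N- to C-terminal.
architectures = {
    "protein1": ("A", "B", "C"),
    "protein2": ("B", "C"),
    "protein3": ("C",),
    "protein4": ("A", "B"),
}

fraction_multi = multidomain_fraction(architectures)
observed = observed_adjacent_pairs(architectures)
possible = possible_adjacent_pairs(architectures)
```

In this toy set only 2 of the 9 possible ordered pairs occur, and each pair always appears in the same order, mirroring the conserved domain architecture and the small observed combination space described above.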

Examples

α/β hydrolase superfamily
Members share an α/β sheet, containing 8 strands connected by helices, with catalytic triad residues in the same order. [24] Activities include proteases, lipases, peroxidases, esterases, epoxide hydrolases and dehalogenases. [25]
Alkaline phosphatase superfamily
Members share an αβα sandwich structure [26] as well as performing common promiscuous reactions by a common mechanism. [27]
Globin superfamily
Members share an 8-alpha helix globular globin fold. [28] [29]
Immunoglobulin superfamily
Members share a sandwich-like structure of two sheets of antiparallel β strands (Ig-fold), and are involved in recognition, binding, and adhesion. [30] [31]
PA clan
Members share a chymotrypsin-like double β-barrel fold and similar proteolysis mechanisms but sequence identity of <10%. The clan contains both cysteine and serine proteases (different nucleophiles). [2] [32]
Ras superfamily
Members share a common catalytic G domain of a 6-strand β sheet surrounded by 5 α-helices. [33]
RSH superfamily
Members share capability to hydrolyze and/or synthesize ppGpp alarmones in the stringent response. [34]
Serpin superfamily
Members share a high-energy, stressed fold which can undergo a large conformational change, which is typically used to inhibit serine and cysteine proteases by disrupting their structure. [9]
TIM barrel superfamily
Members share a large α8β8 barrel structure. It is one of the most common protein folds, and the monophyly of this superfamily is still contested. [35] [36]

Protein superfamily resources

Several biological databases document protein superfamilies and protein folds, for example:

Similarly there are algorithms that search the PDB for proteins with structural homology to a target structure, for example:

References

  1. Holm L, Rosenström P (July 2010). "Dali server: conservation mapping in 3D". Nucleic Acids Research. 38 (Web Server issue): W545–9. doi:10.1093/nar/gkq366. PMC 2896194. PMID 20457744.
  2. Rawlings ND, Barrett AJ, Bateman A (January 2012). "MEROPS: the database of proteolytic enzymes, their substrates and inhibitors". Nucleic Acids Research. 40 (Database issue): D343–50. doi:10.1093/nar/gkr987. PMC 3245014. PMID 22086950.
  3. Henrissat B, Bairoch A (June 1996). "Updating the sequence-based classification of glycosyl hydrolases". The Biochemical Journal. 316 (Pt 2): 695–6. doi:10.1042/bj3160695. PMC 1217404. PMID 8687420.
  4. "Clustal FAQ #Symbols". Clustal. Archived from the original on 24 October 2016. Retrieved 8 December 2014.
  5. Han JH, Batey S, Nickson AA, Teichmann SA, Clarke J (April 2007). "The folding and evolution of multidomain proteins". Nature Reviews Molecular Cell Biology. 8 (4): 319–30. doi:10.1038/nrm2144. PMID 17356578. S2CID 13762291.
  6. Pandit SB, Gosar D, Abhiman S, Sujatha S, Dixit SS, Mhatre NS, Sowdhamini R, Srinivasan N (January 2002). "SUPFAM--a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes". Nucleic Acids Research. 30 (1): 289–93. doi:10.1093/nar/30.1.289. PMC 99061. PMID 11752317.
  7. Orengo CA, Thornton JM (2005). "Protein families and their evolution-a structural perspective". Annual Review of Biochemistry. 74 (1): 867–900. doi:10.1146/annurev.biochem.74.082803.133029. PMID 15954844.
  8. Liu Y, Bahar I (September 2012). "Sequence evolution correlates with structural dynamics". Molecular Biology and Evolution. 29 (9): 2253–63. doi:10.1093/molbev/mss097. PMC 3424413. PMID 22427707.
  9. Silverman GA, Bird PI, Carrell RW, Church FC, Coughlin PB, Gettins PG, Irving JA, Lomas DA, Luke CJ, Moyer RW, Pemberton PA, Remold-O'Donnell E, Salvesen GS, Travis J, Whisstock JC (September 2001). "The serpins are an expanding superfamily of structurally similar but functionally diverse proteins. Evolution, mechanism of inhibition, novel functions, and a revised nomenclature". The Journal of Biological Chemistry. 276 (36): 33293–6. doi:10.1074/jbc.R100016200. PMID 11435447.
  10. Holm L, Laakso LM (July 2016). "Dali server update". Nucleic Acids Research. 44 (W1): W351–5. doi:10.1093/nar/gkw357. PMC 4987910. PMID 27131377.
  11. Pascual-García A, Abia D, Ortiz ÁR, Bastolla U (2009). "Cross-Over between Discrete and Continuous Protein Structure Space: Insights into Automatic Classification and Networks of Protein Structures". PLOS Computational Biology. 5 (3): e1000331. Bibcode:2009PLSCB...5E0331P. doi:10.1371/journal.pcbi.1000331. PMC 2654728. PMID 19325884.
  12. Li D, Zhang L, Yin H, Xu H, Satkoski Trask J, Smith DG, Li Y, Yang M, Zhu Q (June 2014). "Evolution of primate α and θ defensins revealed by analysis of genomes". Molecular Biology Reports. 41 (6): 3859–66. doi:10.1007/s11033-014-3253-z. PMID 24557891. S2CID 14936647.
  13. Krishna SS, Grishin NV (April 2005). "Structural drift: a possible path to protein fold change". Bioinformatics. 21 (8): 1308–10. doi:10.1093/bioinformatics/bti227. PMID 15604105.
  14. Bryan PN, Orban J (August 2010). "Proteins that switch folds". Current Opinion in Structural Biology. 20 (4): 482–8. doi:10.1016/j.sbi.2010.06.002. PMC 2928869. PMID 20591649.
  15. Dessailly, Benoit H.; Dawson, Natalie L.; Das, Sayoni; Orengo, Christine A. (2017), "Function Diversity within Folds and Superfamilies", From Protein Structure to Function with Bioinformatics, Springer Netherlands, pp. 295–325, doi:10.1007/978-94-024-1069-3_9, ISBN 9789402410679
  16. Echave J, Spielman SJ, Wilke CO (February 2016). "Causes of evolutionary rate variation among protein sites". Nature Reviews. Genetics. 17 (2): 109–21. doi:10.1038/nrg.2015.18. PMC 4724262. PMID 26781812.
  17. Shafee T, Gatti-Lafranconi P, Minter R, Hollfelder F (September 2015). "Handicap-Recover Evolution Leads to a Chemically Versatile, Nucleophile-Permissive Protease". ChemBioChem. 16 (13): 1866–1869. doi:10.1002/cbic.201500295. PMC 4576821. PMID 26097079.
  18. Buller AR, Townsend CA (February 2013). "Intrinsic evolutionary constraints on protease structure, enzyme acylation, and the identity of the catalytic triad". Proceedings of the National Academy of Sciences of the United States of America. 110 (8): E653–61. Bibcode:2013PNAS..110E.653B. doi:10.1073/pnas.1221050110. PMC 3581919. PMID 23382230.
  19. Coutinho PM, Deleury E, Davies GJ, Henrissat B (April 2003). "An evolving hierarchical family classification for glycosyltransferases". Journal of Molecular Biology. 328 (2): 307–17. doi:10.1016/S0022-2836(03)00307-3. PMID 12691742.
  20. Zámocký M, Hofbauer S, Schaffner I, Gasselhuber B, Nicolussi A, Soudi M, Pirker KF, Furtmüller PG, Obinger C (May 2015). "Independent evolution of four heme peroxidase superfamilies". Archives of Biochemistry and Biophysics. 574: 108–19. doi:10.1016/j.abb.2014.12.025. PMC 4420034. PMID 25575902.
  21. Akiva, Eyal; Brown, Shoshana; Almonacid, Daniel E.; Barber, Alan E.; Custer, Ashley F.; Hicks, Michael A.; Huang, Conrad C.; Lauck, Florian; Mashiyama, Susan T. (2013-11-23). "The Structure–Function Linkage Database". Nucleic Acids Research. 42 (D1): D521–D530. doi:10.1093/nar/gkt1130. ISSN 0305-1048. PMC 3965090. PMID 24271399.
  22. Shakhnovich BE, Deeds E, Delisi C, Shakhnovich E (March 2005). "Protein structure and evolutionary history determine sequence space topology". Genome Research. 15 (3): 385–92. arXiv:q-bio/0404040. doi:10.1101/gr.3133605. PMC 551565. PMID 15741509.
  23. Ranea JA, Sillero A, Thornton JM, Orengo CA (October 2006). "Protein superfamily evolution and the last universal common ancestor (LUCA)". Journal of Molecular Evolution. 63 (4): 513–25. Bibcode:2006JMolE..63..513R. doi:10.1007/s00239-005-0289-7. hdl:10261/78338. PMID 17021929. S2CID 25258028.
  24. Carr PD, Ollis DL (2009). "Alpha/beta hydrolase fold: an update". Protein and Peptide Letters. 16 (10): 1137–48. doi:10.2174/092986609789071298. PMID 19508187.
  25. Nardini M, Dijkstra BW (December 1999). "Alpha/beta hydrolase fold enzymes: the family keeps growing". Current Opinion in Structural Biology. 9 (6): 732–7. doi:10.1016/S0959-440X(99)00037-8. PMID 10607665.
  26. "SCOP". Archived from the original on 29 July 2014. Retrieved 28 May 2014.
  27. Mohamed MF, Hollfelder F (January 2013). "Efficient, crosswise catalytic promiscuity among enzymes that catalyze phosphoryl transfer". Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics. 1834 (1): 417–24. doi:10.1016/j.bbapap.2012.07.015. PMID 22885024.
  28. Branden C, Tooze J (1999). Introduction to protein structure (2nd ed.). New York: Garland Pub. ISBN 978-0815323051.
  29. Bolognesi M, Onesti S, Gatti G, Coda A, Ascenzi P, Brunori M (February 1989). "Aplysia limacina myoglobin. Crystallographic analysis at 1.6 A resolution". Journal of Molecular Biology. 205 (3): 529–44. doi:10.1016/0022-2836(89)90224-6. PMID 2926816.
  30. Bork P, Holm L, Sander C (September 1994). "The immunoglobulin fold. Structural classification, sequence patterns and common core". Journal of Molecular Biology. 242 (4): 309–20. doi:10.1006/jmbi.1994.1582. PMID 7932691.
  31. Brümmendorf T, Rathjen FG (1995). "Cell adhesion molecules 1: immunoglobulin superfamily". Protein Profile. 2 (9): 963–1108. PMID 8574878.
  32. Bazan JF, Fletterick RJ (November 1988). "Viral cysteine proteases are homologous to the trypsin-like family of serine proteases: structural and functional implications". Proceedings of the National Academy of Sciences of the United States of America. 85 (21): 7872–6. Bibcode:1988PNAS...85.7872B. doi:10.1073/pnas.85.21.7872. PMC 282299. PMID 3186696.
  33. Vetter IR, Wittinghofer A (November 2001). "The guanine nucleotide-binding switch in three dimensions". Science. 294 (5545): 1299–304. Bibcode:2001Sci...294.1299V. doi:10.1126/science.1062023. PMID 11701921. S2CID 6636339.
  34. Atkinson, Gemma C.; Tenson, Tanel; Hauryliuk, Vasili (2011-08-09). "The RelA/SpoT Homolog (RSH) Superfamily: Distribution and Functional Evolution of ppGpp Synthetases and Hydrolases across the Tree of Life". PLOS ONE. 6 (8): e23479. Bibcode:2011PLoSO...623479A. doi:10.1371/journal.pone.0023479. ISSN 1932-6203. PMC 3153485. PMID 21858139.
  35. Nagano N, Orengo CA, Thornton JM (August 2002). "One fold with many functions: the evolutionary relationships between TIM barrel families based on their sequences, structures and functions". Journal of Molecular Biology. 321 (5): 741–65. doi:10.1016/s0022-2836(02)00649-6. PMID 12206759.
  36. Farber G (1993). "An α/β-barrel full of evolutionary trouble". Current Opinion in Structural Biology. 3 (3): 409–412. doi:10.1016/S0959-440X(05)80114-9.