Evolutionary Classification of Protein Domains

Last updated
ECOD
Content
Data types
captured
Protein domains
Contact
Research center Grishin Lab, University of Texas Southwestern Medical Center
AuthorsH. Cheng, R. D. Schaeffer, Y. Liao, L. N. Kinch, J. Pei, S. Shi, B. H. Kim, N. V. Grishin.
Primary citation PMID   25474468
Release date2014
Access
Website http://prodata.swmed.edu/ecod/
Miscellaneous
Versioncontinuously updated
Curation policymanual for new proteins; automated for ones with close matches

The Evolutionary Classification of Protein Domains (ECOD) is a biological database that classifies protein domains available from the Protein Data Bank. The ECOD tries to determine the evolutionary relationships between proteins.

Similar to Pfam, CATH, and SCOP, ECOD compiles domains instead of whole proteins. However, ECOD focuses on evolutionary relationships more heavily: instead of grouping proteins by folds, which may simply represent convergent evolution, ECOD groups proteins by demonstratable homology only. [1]

Related Research Articles

<span class="mw-page-title-main">Gram-positive bacteria</span> Bacteria that give a positive result in the Gram stain test

In bacteriology, gram-positive bacteria are bacteria that give a positive result in the Gram stain test, which is traditionally used to quickly classify bacteria into two broad categories according to their type of cell wall.

In biology, taxonomy is the scientific study of naming, defining (circumscribing) and classifying groups of biological organisms based on shared characteristics. Organisms are grouped into taxa and these groups are given a taxonomic rank; groups of a given rank can be aggregated to form a more inclusive group of higher rank, thus creating a taxonomic hierarchy. The principal ranks in modern use are domain, kingdom, phylum, class, order, family, genus, and species. The Swedish botanist Carl Linnaeus is regarded as the founder of the current system of taxonomy, as he developed a ranked system known as Linnaean taxonomy for categorizing organisms and binomial nomenclature for naming organisms.

<span class="mw-page-title-main">Domain (biology)</span> Taxonomic rank

In biological taxonomy, a domain, also dominion, superkingdom, realm, or empire, is the highest taxonomic rank of all organisms taken together. It was introduced in the three-domain system of taxonomy devised by Carl Woese, Otto Kandler and Mark Wheelis in 1990.

<span class="mw-page-title-main">Protein family</span> Group of evolutionarily-related proteins

A protein family is a group of evolutionarily related proteins. In many cases, a protein family has a corresponding gene family, in which each gene encodes a corresponding protein with a 1:1 relationship. The term "protein family" should not be confused with family as it is used in taxonomy.

<span class="mw-page-title-main">Structural Classification of Proteins database</span> Biological database of proteins

The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences. A motivation for this classification is to determine the evolutionary relationship between proteins. Proteins with the same shapes but having little sequence or functional similarity are placed in different superfamilies, and are assumed to have only a very distant common ancestor. Proteins having the same shape and some similarity of sequence and/or function are placed in "families", and are assumed to have a closer common ancestor.

<span class="mw-page-title-main">Rossmann fold</span> Protein fold

The Rossmann fold is a tertiary fold found in proteins that bind nucleotides, such as enzyme cofactors FAD, NAD+, and NADP+. This fold is composed of alternating beta strands and alpha helical segments where the beta strands are hydrogen bonded to each other forming an extended beta sheet and the alpha helices surround both faces of the sheet to produce a three-layered sandwich. The classical Rossmann fold contains six beta strands whereas Rossmann-like folds, sometimes referred to as Rossmannoid folds, contain only five strands. The initial beta-alpha-beta (bab) fold is the most conserved segment of the Rossmann fold. The motif is named after Michael Rossmann who first noticed this structural motif in the enzyme lactate dehydrogenase in 1970 and who later observed that this was a frequently occurring motif in nucleotide binding proteins.

<span class="mw-page-title-main">CATH database</span>

The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues including Janet Thornton and David Jones, and continues to be developed by the Orengo group at University College London. CATH shares many broad features with the SCOP resource, however there are also many areas in which the detailed classification differs greatly.

<span class="mw-page-title-main">Protein structure</span> Three-dimensional arrangement of atoms in an amino acid-chain molecule

Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers – specifically polypeptides – formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid monomer may also be called a residue, which indicates a repeating unit of a polymer. Proteins form by amino acids undergoing condensation reactions, in which the amino acids lose one water molecule per reaction in order to attach to one another with a peptide bond. By convention, a chain under 30 amino acids is often identified as a peptide, rather than a protein. To be able to perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of non-covalent interactions, such as hydrogen bonding, ionic interactions, Van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography, NMR spectroscopy, cryo-electron microscopy (cryo-EM) and dual polarisation interferometry, to determine the structure of proteins.

<span class="mw-page-title-main">AAA proteins</span> Protein family

AAAproteins are a large group of protein family sharing a common conserved module of approximately 230 amino acid residues. This is a large, functionally diverse protein family belonging to the AAA+ protein superfamily of ring-shaped P-loop NTPases, which exert their activity through the energy-dependent remodeling or translocation of macromolecules.

<span class="mw-page-title-main">Conserved sequence</span> Similar DNA, RNA or protein sequences within genomes or among species

In evolutionary biology, conserved sequences are identical or similar sequences in nucleic acids or proteins across species, or within a genome, or between donor and receptor taxa. Conservation indicates that a sequence has been maintained by natural selection.

<span class="mw-page-title-main">Pfam</span> Database of protein families

Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. Last version of Pfam, 36.0, was released in September 2023 and contains 20,795 families. It is currently provided through InterPro database.

<span class="mw-page-title-main">Cytochrome b</span> Membrane protein involved in the electron transport chain

Cytochrome b within both molecular and cell biology, is a protein found in the membranes of aerobic cells. In eukaryotic mitochondria and in aerobic prokaryotes, cytochrome b is a component of respiratory chain complex III — also known as the bc1 complex or ubiquinol-cytochrome c reductase. In plant chloroplasts and cyanobacteria, there is an homologous protein, cytochrome b6, a component of the plastoquinone-plastocyanin reductase, also known as the b6f complex. These complexes are involved in electron transport, the pumping of protons to create a proton-motive force (PMF). This proton gradient is used for the generation of ATP. These complexes play a vital role in cells.

InterPro is a database of protein families, protein domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences in order to functionally characterise them.

<span class="mw-page-title-main">Protein domain</span> Self-stable region of a proteins chain that folds independently from the rest

In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of several domains, and a domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions. In general, domains vary in length from between about 50 amino acids up to 250 amino acids in length. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.

MEROPS is an online database for peptidases and their inhibitors. The classification scheme for peptidases was published by Rawlings & Barrett in 1993, and that for protein inhibitors by Rawlings et al. in 2004. The most recent version, MEROPS 12.4, was released in late October 2021.

Protein subfamily is a level of protein classification, based on their close evolutionary relationship. It is below the larger levels of protein superfamily and protein family.

SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common Ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.

<span class="mw-page-title-main">Eocyte hypothesis</span> Hypothesis in evolutionary biology

The eocyte hypothesis in evolutionary biology proposes that the eukaryotes originated from a group of prokaryotes called eocytes. After his team at the University of California, Los Angeles discovered eocytes in 1984, James A. Lake formulated the hypothesis as "eocyte tree" that proposed eukaryotes as part of archaea. Lake hypothesised the tree of life as having only two primary branches: prokaryotes, which include Bacteria and Archaea, and karyotes, that comprise Eukaryotes and eocytes. Parts of this early hypothesis were revived in a newer two-domain system of biological classification which named the primary domains as Archaea and Bacteria.

A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred. Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similarity is evident. Sequence homology can then be deduced even if not apparent. Superfamilies typically contain several protein families which show sequence similarity within each family. The term protein clan is commonly used for protease and glycosyl hydrolases superfamilies based on the MEROPS and CAZy classification systems.

References

  1. Cheng, H; Schaeffer, RD; Liao, Y; Kinch, LN; Pei, J; Shi, S; Kim, BH; Grishin, NV (December 2014). "ECOD: an evolutionary classification of protein domains". PLOS Computational Biology. 10 (12): e1003926. Bibcode:2014PLSCB..10E3926C. doi: 10.1371/journal.pcbi.1003926 . PMC   4256011 . PMID   25474468.