Field | Value |
---|---|
Content | |
Data types captured | Protein domains |
Contact | |
Research center | Grishin Lab, University of Texas Southwestern Medical Center |
Authors | H. Cheng, R. D. Schaeffer, Y. Liao, L. N. Kinch, J. Pei, S. Shi, B. H. Kim, N. V. Grishin |
Primary citation | PMID 25474468 |
Release date | 2014 |
Access | |
Website | http://prodata.swmed.edu/ecod/ |
Miscellaneous | |
Version | Continuously updated |
Curation policy | Manual for new proteins; automated for proteins with close matches |
The Evolutionary Classification of Protein Domains (ECOD) is a biological database that classifies protein domains from structures in the Protein Data Bank (PDB). ECOD aims to determine the evolutionary relationships between these domains.
Like Pfam, CATH, and SCOP, ECOD classifies domains rather than whole proteins. However, ECOD places greater weight on evolutionary relationships: rather than grouping domains by fold, since a shared fold may simply reflect convergent evolution, ECOD groups domains only by demonstrable homology. [1]
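To make the difference between fold-based and homology-based grouping concrete, here is a toy Python sketch. The domain records, the labels `fold_A`, `fold_B`, `H1`, and `H2`, and the rule that a domain without demonstrated homology stays in its own singleton group are all hypothetical; this is not ECOD's data format or its actual classification pipeline.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple, Optional


class Domain(NamedTuple):
    """A toy domain record (hypothetical fields, not ECOD's file format)."""
    domain_id: str
    fold: str                      # overall 3D shape; may be shared through convergent evolution
    homology_group: Optional[str]  # set only when homology has been demonstrated


def group_by(domains: List[Domain], key: Callable[[Domain], str]) -> Dict[str, List[str]]:
    """Collect domain IDs under the label returned by `key`, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for d in domains:
        groups[key(d)].append(d.domain_id)
    return dict(groups)


def homology_key(d: Domain) -> str:
    # A domain with no demonstrated homology is not merged with others just because
    # it shares their fold; in this toy model it stays in its own singleton group.
    if d.homology_group is not None:
        return d.homology_group
    return f"unassigned:{d.domain_id}"


domains = [
    Domain("d1", "fold_A", "H1"),
    Domain("d2", "fold_A", "H1"),
    Domain("d3", "fold_A", None),  # same shape as d1/d2, but no evidence of shared ancestry
    Domain("d4", "fold_B", "H2"),
]

by_fold = group_by(domains, lambda d: d.fold)
by_homology = group_by(domains, homology_key)

print(by_fold)      # {'fold_A': ['d1', 'd2', 'd3'], 'fold_B': ['d4']}
print(by_homology)  # {'H1': ['d1', 'd2'], 'unassigned:d3': ['d3'], 'H2': ['d4']}
```

Grouping by fold places d3 alongside d1 and d2 purely because of shape, whereas grouping by homology keeps d3 apart until shared ancestry is established.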