Proteins are a class of macromolecular organic compounds that are essential to life. They consist of a long polypeptide chain that usually adopts a single stable three-dimensional structure. They fulfill a wide variety of functions, including providing structural stability to cells, catalyzing chemical reactions that produce or store energy or synthesize other biomolecules (including nucleic acids and proteins), transporting essential nutrients, and serving other roles such as signal transduction. They are selectively transported to various compartments of the cell or, in some cases, secreted from the cell.
This list aims to organize information on how proteins are most often classified: by structure, by function, or by location.
Proteins may be classified according to their three-dimensional structure (also known as the protein fold). The two most widely used classification schemes are SCOP and CATH. [2]
Both classification schemes are based on a hierarchy of fold types. At the top level are all alpha proteins (domains consisting of alpha helices), all beta proteins (domains consisting of beta sheets), and mixed alpha helix/beta sheet proteins.
While most proteins adopt a single stable fold, a few proteins can rapidly interconvert between two or more folds. These are referred to as metamorphic proteins. [5] Finally, other proteins appear not to adopt any stable conformation and are referred to as intrinsically disordered. [6]
Proteins frequently contain two or more domains, each with a different fold, separated by intrinsically disordered regions. These are referred to as multi-domain proteins.
Proteins may also be classified based on their cellular function. A widely used classification is the PANTHER (protein analysis through evolutionary relationships) classification system. [7]
Enzymes are classified according to their Enzyme Commission (EC) number. Strictly speaking, an EC number corresponds to the reaction the enzyme catalyzes, not to the protein itself. However, each EC number has been mapped to one or more specific proteins.
Proteins may also be classified by the subcellular compartment in which they are found. [9] [10]
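Because an EC number is a four-level hierarchical identifier, it can be split into its levels programmatically. The following is a minimal sketch, not part of any official tool: the helper `parse_ec` and the `ECNumber` type are hypothetical names introduced here, the seven top-level class names follow the IUBMB scheme, and preliminary numbers whose serial field is not purely numeric are not handled.

```python
from dataclasses import dataclass
from typing import Optional

# Top-level EC classes in the IUBMB scheme.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


@dataclass(frozen=True)
class ECNumber:
    """Hypothetical container for the four levels of an EC number."""
    enzyme_class: int
    subclass: Optional[int]
    sub_subclass: Optional[int]
    serial: Optional[int]

    @property
    def class_name(self) -> str:
        return EC_CLASSES[self.enzyme_class]


def parse_ec(text: str) -> ECNumber:
    """Parse strings such as 'EC 2.7.7.7' or '3.1.21.-'."""
    body = text.strip()
    if body.upper().startswith("EC"):
        body = body[2:].strip()
    parts = body.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields in {text!r}")
    # A dash marks a level that is left unspecified.
    levels = [None if p == "-" else int(p) for p in parts]
    if levels[0] not in EC_CLASSES:
        raise ValueError(f"unknown top-level EC class in {text!r}")
    return ECNumber(*levels)


if __name__ == "__main__":
    ec = parse_ec("EC 2.7.7.7")  # DNA-directed DNA polymerase
    print(ec.class_name, ec.subclass, ec.sub_subclass, ec.serial)
```

The parsed object describes a reaction class, so several distinct proteins may share the same `ECNumber`.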
Mitochondrial DNA that encodes mitochondrial proteins (note that some mitochondrial proteins are encoded by nuclear DNA)
Chloroplast DNA that encodes chloroplast proteins
Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, responding to stimuli, providing structure to cells and organisms, and transporting molecules from one location to another. Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, and which usually results in protein folding into a specific 3D structure that determines its activity.
In biochemistry, a polymerase is an enzyme that synthesizes long chains of polymers or nucleic acids. DNA polymerase and RNA polymerase are used to assemble DNA and RNA molecules, respectively, by copying a DNA template strand using base-pairing interactions or RNA by half ladder replication.
In molecular biology, a transcription factor (TF) is a protein that controls the rate of transcription of genetic information from DNA to messenger RNA by binding to a specific DNA sequence. The function of TFs is to regulate (turn on and off) genes in order to make sure that they are expressed in the desired cells at the right time and in the right amount throughout the life of the cell and the organism. Groups of TFs function in a coordinated fashion to direct cell division, cell growth, and cell death throughout life; cell migration and organization during embryonic development; and intermittently in response to signals from outside the cell, such as a hormone. There are approximately 1600 TFs in the human genome. Transcription factors are members of the proteome as well as the regulome.
An integral, or intrinsic, membrane protein (IMP) is a type of membrane protein that is permanently attached to the biological membrane. All transmembrane proteins can be classified as IMPs, but not all IMPs are transmembrane proteins. IMPs comprise a significant fraction of the proteins encoded in an organism's genome. Proteins that cross the membrane are surrounded by annular lipids, which are defined as lipids that are in direct contact with a membrane protein. Such proteins can only be separated from the membranes by using detergents, nonpolar solvents, or sometimes denaturing agents.
In a chain-like biological molecule, such as a protein or nucleic acid, a structural motif is a common three-dimensional structure that appears in a variety of different, evolutionarily unrelated molecules. A structural motif does not have to be associated with a sequence motif; it can be represented by different and completely unrelated sequences in different proteins or RNA.
The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences. A motivation for this classification is to determine the evolutionary relationship between proteins. Proteins with the same shapes but having little sequence or functional similarity are placed in different superfamilies, and are assumed to have only a very distant common ancestor. Proteins having the same shape and some similarity of sequence and/or function are placed in "families", and are assumed to have a closer common ancestor.
The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues including Janet Thornton and David Jones, and continues to be developed by the Orengo group at University College London. CATH shares many broad features with the SCOP resource; however, there are also many areas in which the detailed classification differs greatly.
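CATH identifiers encode the hierarchy directly: a superfamily code has four dot-separated numbers for Class, Architecture, Topology and Homologous superfamily. The sketch below shows how two codes can be compared to find the deepest level they share. The function names are hypothetical helpers introduced for illustration, and the codes in the usage lines are only example inputs.

```python
from typing import Optional, Tuple

# The four levels that give CATH its name, from broadest to most specific.
LEVEL_NAMES = ("Class", "Architecture", "Topology", "Homologous superfamily")


def parse_cath(code: str) -> Tuple[int, ...]:
    """Split a superfamily code such as '3.40.50.300' into four integers."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields in {code!r}")
    return tuple(int(p) for p in parts)


def shared_levels(a: str, b: str) -> int:
    """Count how many leading hierarchy levels two codes have in common."""
    count = 0
    for x, y in zip(parse_cath(a), parse_cath(b)):
        if x != y:
            break
        count += 1
    return count


def deepest_shared_level(a: str, b: str) -> Optional[str]:
    """Name the most specific level shared by two codes, or None if none."""
    n = shared_levels(a, b)
    return LEVEL_NAMES[n - 1] if n else None


if __name__ == "__main__":
    # Same class, architecture and topology, but different superfamilies.
    print(deepest_shared_level("3.40.50.300", "3.40.50.720"))
    # Different classes: nothing shared.
    print(deepest_shared_level("1.10.8.10", "3.40.50.300"))
```

Comparing prefixes rather than whole codes reflects the idea that domains sharing only the upper levels are structurally similar without an established evolutionary relationship.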
BRENDA is the world's most comprehensive online database for functional, biochemical and molecular biological data on enzymes, metabolites and metabolic pathways. It contains data on the properties, function and significance of all enzymes classified by the Enzyme Commission of the International Union of Biochemistry and Molecular Biology (IUBMB). As an ELIXIR Core Data Resource and a Global Core Biodata Resource, BRENDA is considered a data resource of critical importance to the international life sciences research community. The database compiles a representative overview of enzymes and metabolites using current research data from the primary scientific literature, which makes this information easier for researchers to retrieve. BRENDA is subject to the terms of the Creative Commons license, is accessible worldwide and can be used free of charge. As one of the digital resources of the Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, BRENDA is part of the integrated biodata infrastructure DSMZ Digital Diversity.
A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence or have a general affinity to DNA. Some DNA-binding domains may also include nucleic acids in their folded structure.
InterPro is a database of protein families, protein domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences in order to functionally characterise them.
In molecular biology and genetics, transcription coregulators are proteins that interact with transcription factors to either activate or repress the transcription of specific genes. Transcription coregulators that activate gene transcription are referred to as coactivators, while those that repress it are known as corepressors. The mechanism of action of transcription coregulators is to modify chromatin structure and thereby make the associated DNA more or less accessible to transcription. In humans, several dozen to several hundred coregulators are known, depending on the level of confidence with which a protein can be characterised as a coregulator. One class of transcription coregulators modifies chromatin structure through covalent modification of histones. A second, ATP-dependent class modifies the conformation of chromatin.
The trefoil knot fold is a protein fold in which the protein backbone is twisted into a trefoil knot shape. "Shallow" knots, in which the tail of the polypeptide chain passes through a loop by only a few residues, are uncommon, and "deep" knots, in which many residues pass through the loop, are extremely rare. Deep trefoil knots have been found in the SPOUT superfamily, which includes methyltransferase proteins involved in posttranscriptional RNA modification in all three domains of life: bacteria (for example Thermus thermophilus), archaea, and eukaryota.
In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of several domains, and a domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks, and these may be recombined in different arrangements to create proteins with different functions. In general, domains vary in length from about 50 to 250 amino acids. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.
Restriction endonuclease (REase) EcoRII is an enzyme of the restriction modification (RM) system naturally found in Escherichia coli, a Gram-negative bacterium. It is composed of 402 amino acids and has a molecular mass of 45.2 kDa.
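A figure like this can be sanity-checked against the residue count. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from the text above; the function name is a hypothetical helper. An exact mass would require the actual amino acid sequence.

```python
# Rule-of-thumb average mass of one amino acid residue in a protein.
# This value is an assumption used for rough estimates only.
AVG_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues: int,
                      avg_residue_mass_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Estimate protein mass in kDa from the number of residues."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return n_residues * avg_residue_mass_da / 1000.0


if __name__ == "__main__":
    reported_kda = 45.2                  # value stated for EcoRII
    estimate = estimate_mass_kda(402)    # 402 residues
    deviation = abs(estimate - reported_kda) / reported_kda
    print(f"estimated {estimate:.1f} kDa, "
          f"{deviation:.1%} from the reported {reported_kda} kDa")
```

The estimate falls within a few percent of the stated mass, which is as close as a composition-independent approximation can be expected to get.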
The B3 DNA binding domain (DBD) is a highly conserved domain found exclusively in transcription factors, where it is combined with other domains. It consists of 100-120 residues and includes seven beta strands and two alpha helices that form a DNA-binding pseudobarrel protein fold; it interacts with the major groove of DNA.
SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.
In molecular biology, protein fold classes are broad categories of protein tertiary structure topology. They describe groups of proteins that share similar amino acid and secondary structure proportions. Each class contains multiple, independent protein superfamilies.
A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred. Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similarity is evident. Sequence homology can then be deduced even if not apparent. Superfamilies typically contain several protein families which show sequence similarity within each family. The term protein clan is commonly used for protease and glycosyl hydrolase superfamilies, based on the MEROPS and CAZy classification systems.
MEGF8, also known as Multiple Epidermal Growth Factor-like Domains 8, is a protein-coding gene that encodes a single-pass membrane protein known to participate in developmental regulation and cellular communication. It is located on chromosome 19 at the 49th open reading frame in humans (19q13.2). There are two known isoforms of MEGF8, which differ by a 67 amino acid indel. The isoform 2 splice version is 2785 amino acids long and is predicted to be 296.6 kDa in mass. Isoform 1 is composed of 2845 amino acids and is predicted to be 303.1 kDa in mass. Using BLAST searches, orthologs were found primarily in mammals, but MEGF8 is also conserved in invertebrates and fishes, and rarely in birds, reptiles, and amphibians. An important paralog of MEGF8 is ATRNL1, which is also a single-pass transmembrane protein with several of the same key features and motifs as MEGF8, as indicated by the Simple Modular Architecture Research Tool (SMART), which is hosted by the European Molecular Biology Laboratory in Heidelberg, Germany. MEGF8 has been predicted to be a key player in several developmental processes, such as left-right patterning and limb formation. MEGF8 SNP mutations have been identified as the cause of Carpenter syndrome subtype 2.