List of proteins

Schematic representation of structural classes of protein according to the CATH classification scheme.

Proteins are a class of macromolecular organic compounds that are essential to life. Each consists of a long polypeptide chain that usually adopts a single stable three-dimensional structure. Proteins fulfill a wide variety of functions: providing structural stability to cells, catalyzing chemical reactions that produce or store energy or synthesize other biomolecules (including nucleic acids and proteins), transporting essential nutrients, and serving other roles such as signal transduction. They are selectively transported to various compartments of the cell or, in some cases, secreted from the cell.

This list aims to organize information on how proteins are most often classified: by structure, by function, or by location.

Structure

Proteins may be classified according to their three-dimensional structure (also known as the protein fold). The two most widely used classification schemes are CATH [1] [3] and the Structural Classification of Proteins (SCOP). [2] [4]

Both classification schemes are based on a hierarchy of fold types. At the top level are all-alpha proteins (domains consisting of alpha helices), all-beta proteins (domains consisting of beta sheets), and mixed alpha-helix/beta-sheet proteins.
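The top-level split described above can be illustrated with a toy sketch. It assumes a per-residue secondary-structure string in which `H` marks a helical residue and `E` a strand residue, and it uses an arbitrary 10% cutoff. The function name `fold_class` and the cutoff are hypothetical choices for illustration; CATH and SCOP assign classes to domains through their own curated procedures, not by this rule.

```python
def fold_class(secondary_structure: str, minor_cutoff: float = 0.10) -> str:
    """Assign a coarse top-level class from a per-residue string.

    'H' = helix residue, 'E' = strand residue, anything else = coil.
    minor_cutoff is an illustrative threshold, not a value taken from
    CATH or SCOP.
    """
    if not secondary_structure:
        raise ValueError("empty secondary-structure string")

    n = len(secondary_structure)
    helix_fraction = secondary_structure.count("H") / n
    strand_fraction = secondary_structure.count("E") / n

    has_helix = helix_fraction >= minor_cutoff
    has_strand = strand_fraction >= minor_cutoff

    if has_helix and has_strand:
        return "mixed alpha/beta"
    if has_helix:
        return "all alpha"
    if has_strand:
        return "all beta"
    return "unclassified"  # too little regular secondary structure to call
```

For example, `fold_class("HHHHHHHHCC")` falls into the all-alpha class because no residue is marked as strand, while a string containing substantial runs of both `H` and `E` is reported as mixed.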

While most proteins adopt a single stable fold, a few proteins can rapidly interconvert between two or more folds. These are referred to as metamorphic proteins. [5] Finally, other proteins appear not to adopt any stable conformation and are referred to as intrinsically disordered. [6]

Proteins frequently contain two or more domains, each with a different fold, separated by intrinsically disordered regions. These are referred to as multi-domain proteins.

Function

The human genome, categorized by function of each gene product, given both as number of genes and as percentage of all genes.

Proteins may also be classified based on their cellular function. A widely used classification is the PANTHER (Protein Analysis Through Evolutionary Relationships) classification system. [7]

Structural

Structural proteins

Catalytic

Enzymes are classified according to their Enzyme Commission (EC) number. Strictly speaking, an EC number corresponds to the reaction the enzyme catalyzes, not to the protein per se. However, each EC number has been mapped to one or more specific proteins.
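The relationship between an EC number and the proteins mapped to it can be sketched as a small data structure. The names `ECNumber`, `parse_ec`, and `EC_TO_PROTEINS` are hypothetical and introduced only for illustration. The single mapping entry, human alcohol dehydrogenase genes under EC 1.1.1.1, is an example and not a complete annotation.

```python
from dataclasses import dataclass

# The seven top-level EC classes, keyed by the first field of an EC number.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


@dataclass(frozen=True)
class ECNumber:
    """A fully specified four-field EC number describing a reaction."""
    enzyme_class: int
    subclass: int
    sub_subclass: int
    serial: int

    @property
    def class_name(self) -> str:
        return EC_CLASSES[self.enzyme_class]


def parse_ec(text: str) -> ECNumber:
    """Parse strings such as 'EC 1.1.1.1' or '3.4.21.4'."""
    body = text.strip()
    if body.upper().startswith("EC"):
        body = body[2:].strip()
    parts = body.split(".")
    if len(parts) != 4 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"not a complete EC number: {text!r}")
    fields = [int(p) for p in parts]
    if fields[0] not in EC_CLASSES:
        raise ValueError(f"unknown EC class in {text!r}")
    return ECNumber(*fields)


# Illustrative only: one reaction (EC number) mapped to several gene products.
EC_TO_PROTEINS = {
    parse_ec("EC 1.1.1.1"): ["ADH1A", "ADH1B", "ADH1C"],
}
```

Keying the mapping by the reaction rather than by the protein mirrors the point made above: the EC number identifies what is catalyzed, and several distinct proteins can share it.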

Transport

Transport protein

Immune

Genetic

Signal transduction

Signal transduction

Sub-cellular distribution

The human genome, categorized by the predicted subcellular location distribution of each gene product.

Proteins may also be classified by the subcellular compartment in which they are found. [9] [10]

Nuclear

Nuclear proteins

Cytosolic

Cytosolic proteins

Cytoskeletal

Cytoskeletal proteins

Organelle

Endoplasmic reticulum

Endoplasmic reticulum resident protein

Lysosomal

Mitochondrial

Mitochondrial DNA that encodes mitochondrial proteins (note that most mitochondrial proteins are encoded by nuclear DNA)

Chloroplast

Chloroplast DNA that encodes chloroplast proteins

Cell membrane

Membrane protein

Extracellular matrix

Extracellular matrix proteins

Plasma

Blood protein

Species distribution


References

  1. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM (August 1997). "CATH--a hierarchic classification of protein domain structures". Structure. London, England. 5 (8): 1093–108. doi:10.1016/s0969-2126(97)00260-8. PMID 9309224.
  2. Csaba G, Birzele F, Zimmer R (April 2009). "Systematic comparison of SCOP and CATH: a new gold standard for protein structure analysis". BMC Structural Biology. 9: 23. doi:10.1186/1472-6807-9-23. PMC 2678134. PMID 19374763.
  3. Sillitoe I, Bordin N, Dawson N, Waman VP, Ashford P, Scholes HM, et al. (January 2021). "CATH: increased structural coverage of functional space". Nucleic Acids Research. 49 (D1): D266–D273. doi:10.1093/nar/gkaa1079. PMC 7778904. PMID 33237325.
  4. Andreeva A, Howorth D, Chothia C, Kulesha E, Murzin AG (January 2014). "SCOP2 prototype: a new approach to protein structure mining". Nucleic Acids Research. 42 (Database issue): D310–4. doi:10.1093/nar/gkt1242. PMC 3964979. PMID 24293656.
  5. Dishman AF, Volkman BF (June 2022). "Design and discovery of metamorphic proteins". Current Opinion in Structural Biology. 74: 102380. doi:10.1016/j.sbi.2022.102380. PMC 9664977. PMID 35561475.
  6. Trivedi R, Nagarajaram HA (November 2022). "Intrinsically Disordered Proteins: An Overview". International Journal of Molecular Sciences. 23 (22): 14050. doi:10.3390/ijms232214050. PMC 9693201. PMID 36430530.
  7. Thomas PD, Kejariwal A, Campbell MJ, Mi H, Diemer K, Guo N, et al. (January 2003). "PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification". Nucleic Acids Research. 31 (1): 334–341. doi:10.1093/nar/gkg115. PMC 165562. PMID 12520017.
  8. Zhou H, Yang Y, Shen HB (March 2017). "Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features". Bioinformatics. 33 (6): 843–853. doi:10.1093/bioinformatics/btw723. PMID 27993784.
  9. Trans J (2014). "Subcellular Compartments". Scitable. Nature Education.
  10. Thul PJ, Åkesson L, Wiking M, Mahdessian D, Geladaki A, Ait Blal H, et al. (May 2017). "A subcellular map of the human proteome". Science. 356 (6340). doi:10.1126/science.aal3321. PMID 28495876. S2CID 10744558.