A microprotein (miP) is a small protein encoded by a small open reading frame (smORF). [1] Microproteins are a class of single-domain proteins that are related to multidomain proteins. [2] They regulate larger multidomain proteins at the post-translational level. [3] Microproteins are analogous to microRNAs (miRNAs) and heterodimerize with their targets, exerting dominant-negative effects. [4] In both animals and plants, microproteins have been found to strongly influence biological processes. [2] Because of their dominant effects on their targets, microproteins are being studied for potential applications in biotechnology. [2]
The first microprotein was discovered in the early 1990s during research on genes for basic helix–loop–helix (bHLH) transcription factors from a murine erythroleukaemia cell cDNA library. [3] The protein was found to be an inhibitor of DNA binding (ID protein) that negatively regulated the transcription factor complex. [3] The ID protein was 16 kDa and consisted of a helix–loop–helix (HLH) domain. [2] The microprotein formed bHLH/HLH heterodimers that disrupted functional bHLH homodimers. [2]
The first microprotein discovered in plants was the LITTLE ZIPPER (ZPR) protein. [2] The LITTLE ZIPPER protein contains a leucine zipper domain but lacks the domains required for DNA binding and transcriptional activation. [2] LITTLE ZIPPER is therefore analogous to the ID protein. [2] Although not all members of this class are small, in 2011 they were named microproteins because their negative regulatory actions resemble those of miRNAs. [3]
Evolutionarily, ID proteins or ID-like proteins are found in all animals. [3] In plants, microproteins are found only in higher plants. [3] However, homeodomain transcription factors of the three-amino-acid loop-extension (TALE) family are targets of microproteins, and these homeodomain proteins are conserved in animals, plants, and fungi. [3]
Microproteins are generally small proteins with a single protein domain. [2] [4] The active forms of microproteins are translated from smORFs. [1] The smORFs from which microproteins are translated can be shorter than 100 codons. [1] However, not all microproteins are small; the name reflects the fact that their actions are analogous to those of miRNAs. [3]
Microproteins function as post-translational regulators. [3] They disrupt the formation of heterodimeric, homodimeric, or multimeric complexes. [4] Furthermore, microproteins can interact with any protein that requires a functional dimer to work normally. [3] Their primary targets are transcription factors that bind to DNA as dimers. [5] [3] Microproteins regulate these complexes by forming homotypic dimers with their targets, which inhibits the function of the protein complex. [3] There are two types of miP inhibition: homotypic and heterotypic. [4] In homotypic miP inhibition, microproteins interact with proteins that have a similar protein-protein interaction (PPI) domain. [4] In heterotypic miP inhibition, microproteins interact with proteins that have a different but compatible PPI domain. [4] In both types of inhibition, microproteins bind the PPI domains of their targets and prevent them from interacting with their normal partners. [4]
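Why sequestering a dimerization partner has a dominant effect can be illustrated with a deliberately simplified toy model. The model is an assumption of this sketch and is not taken from the cited sources: every monomer forms a dimer, partners pair at random, and pairing a target with a microprotein is exactly as favourable as pairing two targets. Under those assumptions, the fraction of dimers that are functional target homodimers is the square of the target's share of all monomers. The function name `functional_dimer_fraction` is hypothetical.

```python
def functional_dimer_fraction(target: float, microprotein: float) -> float:
    """Fraction of dimers that are functional target-target homodimers.

    Toy assumptions (not from the source): every monomer dimerizes,
    partners are chosen at random, and target-microprotein pairing is as
    favourable as target-target pairing. Any dimer containing a
    microprotein is counted as non-functional.
    """
    if target < 0 or microprotein < 0 or target + microprotein == 0:
        raise ValueError("amounts must be non-negative and not both zero")
    # Share of monomers that are the target protein.
    p_target = target / (target + microprotein)
    # Random pairing: both partners must be targets for a functional dimer.
    return p_target ** 2


for miP in (0.0, 0.5, 1.0, 2.0):
    print(miP, round(functional_dimer_fraction(1.0, miP), 3))
```

In this model, an equal amount of microprotein leaves only a quarter of dimers functional rather than half, which shows in a rough, qualitative way why a modest amount of microprotein can disproportionately suppress its target complex. Real affinities are not equal, so the numbers are illustrative only.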
In molecular biology, a transcription factor (TF) is a protein that controls the rate of transcription of genetic information from DNA to messenger RNA by binding to a specific DNA sequence. The function of TFs is to regulate genes, turning them on and off, so that they are expressed in the desired cells at the right time and in the right amount throughout the life of the cell and the organism. Groups of TFs function in a coordinated fashion to direct cell division, cell growth, and cell death throughout life; cell migration and organization during embryonic development; and intermittently in response to signals from outside the cell, such as a hormone. There are approximately 1600 TFs in the human genome. Transcription factors are members of the proteome as well as the regulome.
A homeobox is a DNA sequence, around 180 base pairs long, that regulates large-scale anatomical features in the early stages of embryonic development. Mutations in a homeobox may change large-scale anatomical features of the full-grown organism.
A basic helix–loop–helix (bHLH) is a protein structural motif that characterizes one of the largest families of dimerizing transcription factors. The word "basic" refers not to complexity but to the chemistry of the motif: transcription factors generally contain basic amino acid residues, which facilitate DNA binding.
Inhibitor of DNA-binding/differentiation proteins, also known as ID proteins, comprise a family of proteins that heterodimerize with basic helix-loop-helix (bHLH) transcription factors to inhibit their DNA binding. ID proteins contain the HLH dimerization domain but lack the basic DNA-binding domain, and thus regulate bHLH transcription factors when they heterodimerize with them. The first helix-loop-helix proteins identified were named E proteins because they bind to Ephrussi-box (E-box) sequences. E proteins are members of the class I bHLH family and form dimers with class II bHLH proteins to regulate transcription. In normal development, E proteins form dimers with other bHLH transcription factors, allowing transcription to occur. In cancerous phenotypes, however, ID proteins can regulate transcription by binding E proteins, so that functional dimers cannot form and transcription is inactive.
Four ID proteins exist in humans: ID1, ID2, ID3, and ID4. The ID homologue gene in Drosophila is called extramacrochaetae (EMC) and encodes a transcription factor of the helix-loop-helix family that lacks a DNA-binding domain. EMC regulates cell proliferation, the formation of organs such as the midgut, and wing development.
Because ID proteins are highly expressed in embryonic stem cells but not in differentiated adult cells, they could be targets for systemic cancer therapies that do not inhibit the functioning of most normal cells. Evidence suggests that ID proteins are overexpressed in many types of cancer. For example, ID1 is overexpressed in pancreatic, breast, and prostate cancers, and ID2 is upregulated in neuroblastoma, Ewing's sarcoma, and squamous cell carcinoma of the head and neck.
A leucine zipper is a common three-dimensional structural motif in proteins. Leucine zippers were first described by Landschulz and collaborators in 1988, who found that an enhancer-binding protein had a characteristic 30-amino-acid segment. Displaying this sequence on an idealized alpha helix revealed a periodic repetition of leucine residues at every seventh position over a distance covering eight helical turns. The polypeptide segments containing these periodic arrays of leucine residues were proposed to adopt an alpha-helical conformation in which the leucine side chains of one helix interdigitate with those of the helix of a second polypeptide, facilitating dimerization.
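The heptad periodicity described above can be checked mechanically on a sequence. The helper below, `longest_heptad_leucine_run`, is hypothetical and not a published tool: it scans a one-letter amino-acid sequence for the longest chain of leucines spaced exactly seven residues apart. It ignores helix geometry and the other heptad positions, so it only flags the leucine pattern itself. The example sequence is synthetic, built by repeating a made-up heptad, and is not a real protein.

```python
def longest_heptad_leucine_run(sequence: str) -> tuple[int, int]:
    """Return (start_index, run_length) for the longest chain of leucines
    (one-letter code 'L') spaced exactly seven residues apart.

    Returns (-1, 0) if the sequence contains no leucine.
    """
    seq = sequence.upper()
    best_start, best_len = -1, 0
    for start in range(len(seq)):
        length = 0
        pos = start
        # Follow the chain start, start+7, start+14, ... until a
        # non-leucine residue or the end of the sequence.
        while pos < len(seq) and seq[pos] == "L":
            length += 1
            pos += 7
        if length > best_len:
            best_start, best_len = start, length
    return best_start, best_len


# Synthetic example: a made-up heptad repeated four times (not a real protein).
synthetic = "LKEAEKA" * 4
print(longest_heptad_leucine_run(synthetic))  # (0, 4)
```

A brute-force scan from every start position keeps the sketch simple; for protein-length sequences the cost is negligible. It uses the `tuple[int, int]` annotation, which assumes Python 3.9 or later.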
Myogenin is a transcriptional activator encoded by the MYOG gene. Myogenin is a muscle-specific basic helix-loop-helix (bHLH) transcription factor involved in coordinating skeletal muscle development (myogenesis) and repair. Myogenin is a member of the MyoD family of transcription factors, which also includes MyoD, Myf5, and MRF4.
Microphthalmia-associated transcription factor, also known as class E basic helix-loop-helix protein 32 or bHLHe32, is a protein that in humans is encoded by the MITF gene.
The scleraxis protein is a member of the basic helix-loop-helix (bHLH) superfamily of transcription factors. Two genes that code for identical scleraxis proteins have been identified so far.
In molecular biology, myocyte enhancer factor-2 (Mef2) proteins are a family of transcription factors that, through control of gene expression, are important regulators of cellular differentiation and consequently play a critical role in embryonic development. In adult organisms, Mef2 proteins mediate the stress response in some tissues. Mef2 proteins contain both MADS-box and Mef2 DNA-binding domains.
The gene extramacrochaetae (emc) is a Drosophila melanogaster gene that codes for the Emc protein, which has a wide variety of developmental roles. As is common for Drosophila genes, it was named after the phenotypic change caused by a mutation in the gene (macrochaetae are the longer bristles on Drosophila).
DNA-binding protein inhibitor ID-2 is a protein that in humans is encoded by the ID2 gene.
DNA-binding protein inhibitor ID-1 is a protein that in humans is encoded by the ID1 gene.
Upstream stimulatory factor 1 is a protein that in humans is encoded by the USF1 gene.
MAX is a gene that in humans encodes the MAX transcription factor.
DNA-binding protein inhibitor ID-3 is a protein that in humans is encoded by the ID3 gene.
Upstream stimulatory factor 2 is a protein that in humans is encoded by the USF2 gene.
Transcription factor HES1 is a protein encoded by the Hes1 gene and is the mammalian homolog of the hairy gene in Drosophila. HES1 is one of the seven members of the Hes gene family (HES1-7). Hes genes encode nuclear proteins that suppress transcription.
ID4 is a protein-coding gene. In humans, it encodes the protein known as DNA-binding protein inhibitor ID-4. This protein is involved in the regulation of many cellular processes during both prenatal development and tumorigenesis, including embryonic cellular growth, senescence, cellular differentiation, and apoptosis; it also acts as an oncogene in angiogenesis.
Neurogenins, often abbreviated as Ngn, are a family of bHLH transcription factors involved in specifying neuronal differentiation. The family, consisting of Neurogenin-1, Neurogenin-2, and Neurogenin-3, plays a fundamental role in specifying neural precursor cells and regulating the differentiation of neurons during embryonic development. It is one of many gene families related to the atonal gene in Drosophila. Other positive regulators of neuronal differentiation that are also expressed during early neural development include NeuroD and ASCL1.
In molecular biology, the myogenic determination factor 5 proteins are a family of proteins found in eukaryotes. This family includes the Myf5 protein, which is responsible for directing cells to the skeletal myocyte lineage during development. Myf5 is likely to act in a similar way to the other myogenic regulatory factors, such as MyoD, that perform the same function. These factors work with histone acetyltransferases and histone deacetylases to activate and repress genes involved in the myocyte lineage.