MIF4GD | |
---|---|
Aliases | MIF4GD, AD023, MIFD, SLIP1, MIF4G domain containing |
External IDs | OMIM: 612072; MGI: 1916924; HomoloGene: 41389; GeneCards: MIF4GD |
MIF4GD, or MIF4G domain-containing protein, is a protein which in humans is encoded by the MIF4GD gene. [5] It is also known as SLIP1, SLBP (Stem-Loop Binding Protein)-interacting protein 1, AD023, and MIFD. [6] [7] MIF4GD is expressed ubiquitously in humans and has been found to be involved in activating proteins for histone mRNA translation, in alternative splicing and translation of mRNAs, and in the regulation of cell proliferation. [6] [8] [9] [10]
The MIF4GD gene is located on the minus strand of human chromosome 17, at 17q25.1, and spans about 5.1 kb, from base 75,266,228 to base 75,271,292. [6]
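Genome coordinates such as these are conventionally 1-based and inclusive, so the length of a region is the end position minus the start position plus one. The short Python sketch below shows that arithmetic; the helper name `span_length` is hypothetical, and the inclusive convention is an assumption about how the coordinates were reported.

```python
def span_length(start: int, end: int) -> int:
    """Length in base pairs of an inclusive, 1-based genomic interval (assumed convention)."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


if __name__ == "__main__":
    # Coordinates quoted for MIF4GD on chromosome 17
    length_bp = span_length(75_266_228, 75_271_292)
    print(length_bp, f"{length_bp / 1000:.1f} kb")  # prints: 5065 5.1 kb
```

Under a 0-based, half-open convention (used by some file formats) the "+ 1" would be dropped, so it is worth checking which convention a given database uses before comparing lengths.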
There are 11 alternatively spliced mRNA transcripts and 3 unspliced mRNA transcripts that can be transcribed from this gene; together they include 7 possible exons and 11 distinct introns. [6] [11]
There are 10 viable isoforms of the MIF4G domain-containing protein. [11] The longest is MIF4G domain-containing protein isoform 1, which is 263 amino acids long; however, the most common is isoform 4, which consists of 6 exons and is 222 amino acids in length. [6] [11]
MIF4G domain-containing protein isoform 1 has a predicted molecular weight of 30.1 kDa and a predicted isoelectric point of 5.2, indicating that it is an acidic protein. [12] Its amino acid composition is typical of the average human protein. [13] Additionally, MIF4GD is predicted to form 11 alpha helices. [14] [15] [16]
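Predicted molecular weights and isoelectric points like these are calculated from the amino acid sequence alone. As a rough sketch (not the exact method of the cited tool), the average mass can be taken as the sum of average residue masses plus one water molecule, and the isoelectric point as the pH at which the Henderson-Hasselbalch net charge is zero, found here by bisection. The pKa values are an assumed EMBOSS-style set, the function names are hypothetical, and the demo peptide is a toy sequence, not MIF4GD.

```python
"""Sketch: average molecular weight and isoelectric point of a protein sequence."""

# Average residue masses in daltons (free amino acid mass minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Assumed pKa set (EMBOSS-style values); other tools use slightly different ones.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average mass in Da: sum of residue masses plus one water for the termini."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and one free carboxyl end
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH where net charge is zero (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    demo = "MDEEKLSQRGWPY"  # hypothetical toy peptide, not the MIF4GD sequence
    print(f"MW = {molecular_weight(demo) / 1000:.2f} kDa, pI = {isoelectric_point(demo):.2f}")
```

Because programs differ in their pKa tables and residue masses, predicted isoelectric points for the same sequence can differ between tools by a few tenths of a pH unit.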
Antibody-based localization data show that MIF4GD is present in the cytoplasm and nucleoli of cells. [17] [18] Additionally, several bioinformatic programs predict that human MIF4GD, as well as several of its orthologs, is present in the cytoplasm, nucleus, and mitochondria of cells. [19]
Because of its presumed localization in the cytoplasm, MIF4GD is predicted to be a possible target of phosphorylation, acetylation, ubiquitination, or sumoylation. MIF4GD is also predicted to contain a "YinOYang" site at S61, which may be either O-GlcNAcylated or phosphorylated at different times for regulatory purposes. [20] The MIF4GD protein is unlikely to be lipid-linked or glycosylated. [21] [22] [23]
The MIF4GD protein contains an MIF4G domain, which is named after the middle domain of eukaryotic initiation factor 4G (eIF4G). [24]
The MIF4G domain of the MIF4GD protein has a molecular weight of 17.0 kDa and a predicted isoelectric point of 5.7. [19] Like the full-length protein, it has an amino acid composition typical of human proteins; however, it contains fewer negatively charged and more positively charged amino acids than the protein as a whole. The MIF4G domain is predicted to contain many alpha helices and is thought to contain alpha-helical repeats. [24]
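Comparing the charged residues of the MIF4G domain with those of the whole protein amounts to comparing residue fractions. A minimal sketch, assuming Asp/Glu count as negatively charged and Lys/Arg as positively charged (histidine is ignored); the function names are hypothetical, and the domain boundaries in the example are placeholders, since the text does not give them.

```python
"""Sketch: compare the share of charged residues in a domain versus a whole protein."""

ACIDIC = set("DE")  # negatively charged side chains at neutral pH
BASIC = set("KR")   # positively charged side chains at neutral pH (His ignored)


def charged_fractions(seq: str) -> tuple[float, float]:
    """Return (fraction acidic, fraction basic) residues in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    acidic = sum(aa in ACIDIC for aa in seq) / len(seq)
    basic = sum(aa in BASIC for aa in seq) / len(seq)
    return acidic, basic


def domain_vs_protein(protein: str, start: int, end: int) -> dict[str, float]:
    """Domain minus whole-protein fractions, for a 1-based inclusive domain range."""
    domain = protein[start - 1:end]
    dom_acid, dom_base = charged_fractions(domain)
    all_acid, all_base = charged_fractions(protein)
    return {"acidic_diff": dom_acid - all_acid, "basic_diff": dom_base - all_base}


if __name__ == "__main__":
    toy_protein = "MSEEDLKRKAEDE"  # hypothetical toy sequence, not MIF4GD
    print(domain_vs_protein(toy_protein, 6, 10))  # hypothetical domain range
```

A negative `acidic_diff` together with a positive `basic_diff` would correspond to the pattern described above, which is also consistent with the domain's higher predicted isoelectric point.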
MIF4GD is found only in animals and is expressed ubiquitously in the body, though it is expressed at somewhat higher levels in lymph nodes, bone marrow, and testes. [6] [24] [25] On average, MIF4GD is expressed at a level 1.7 times higher than that of the average gene. [6] [24]
The promoter region of MIF4GD is approximately 1,137 base pairs long and is predicted to interact with various transcription factors. [26] The 5' untranslated region (UTR) of MIF4GD mRNA transcripts is relatively short, at around 137 nucleotides, and is predicted to form stem-loops and interior loops to which RNA-binding proteins may bind. [27] [28] The 3' UTR is longer, at approximately 510 nucleotides. It is also predicted to form stem-loops, interior loops, and bulge loops, as well as more complex secondary structures, and to bind RNA-binding proteins and miRNAs at or near these sites. [27] [28] [29]
MIF4GD has been experimentally shown to bind various other proteins, many of which play a role in alternative splicing of pre-mRNAs and translation of mRNAs into proteins. [30] It is also known to interact with eukaryotic translation initiation factors, RNA, and DNA to form a translation initiation complex. [7] Some of the most notable proteins that interact with MIF4GD are:
ATP-dependent RNA helicases DDX19A and DDX19B, [31] which are involved in mRNA export from the nucleus; their helicase activity facilitates the dissociation of nuclear mRNA-binding proteins and their replacement with cytoplasmic mRNA-binding proteins. [32]
Cap binding complex dependent translation initiation factor, or CTIF, [33] which is a paralog of MIF4GD. CTIF binds cotranscriptionally to the cap end of the nascent mRNA, and is involved in simultaneous editing and translation of mRNA that happens directly after export from the nucleus. [34]
Histone RNA hairpin-binding protein, or SLBP, [8] [35] which is involved in histone pre-mRNA processing and movement of mRNAs from the nucleus to the cytoplasm of cells. [36]
Supervillin, or SVIL, [37] which is a peripheral membrane protein that forms a high-affinity link between the actin cytoskeleton and the membrane and contributes to myogenic membrane structure and differentiation. [38] Supervillin also regulates cell spreading and motility during the cell cycle. [37]
Two-hybrid bait-prey experiments have also verified that MIF4GD interacts with NSP7ab (non-structural protein 7) of SARS-CoV. [39]
MIF4GD has several known functions, including activating proteins that bind histone mRNAs for translation, and binding mRNAs for alternative splicing and translation into proteins. [6] [8] [9] Additionally, down-regulation of the SLIP1/MIF4GD gene and its protein reduces both the rate of histone mRNA translation and cell viability. [7] MIF4GD is therefore thought to be needed in eukaryotic cells for protein production and cell proliferation.
MIF4GD has been shown to bind and stabilize p27kip1, which plays an important role in regulating the cell cycle and in cancer progression. [10] Binding of MIF4GD suppresses phosphorylation of p27kip1 by CDK2 at T187, and this interaction controls the degree of cell proliferation in hepatocellular carcinoma (HCC). Regulation of this interaction is being studied as a potential therapeutic strategy for patients with hepatocellular carcinoma. [10] This provides further evidence that MIF4GD helps regulate cell proliferation, and suggests MIF4GD may play a role in immune response.
MIF4GD is found in Animalia and first appeared in Porifera, which diverged from the lineage leading to Homo sapiens around 777 million years ago. [48] Relative to humans, this gene is highly conserved (>80% identity and >90% similarity) in mammals and reptiles, moderately conserved (>70% identity and >85% similarity) in chordates, and poorly conserved (15-25% identity and 25-40% similarity) in the rest of Animalia. [49] [50] MIF4GD is not present in Trichoplax, fungi, plants, protists, archaea, or bacteria. [49]
There are currently 310 known and sequenced MIF4GD orthologs found in Animalia. [6] A select number of these orthologs have been analyzed for estimated time of divergence (in millions of years), amino acid sequence identity to humans, and amino acid sequence similarity to humans. The results are shown in the table below:
Genus and Species | Common Name | Accession Number [49] | Date of Divergence (MYA) [48] | Sequence Identity (%) [50] | Sequence Similarity (%) [50] |
---|---|---|---|---|---|
Homo sapiens | Human | NP_001229430 | 0 | 100 | 100 |
Pan paniscus | Bonobo | XP_034798762 | 6.4 | 100 | 100 |
Mus musculus | House mouse | NP_001230513 | 89 | 93.2 | 97.7 |
Vombatus ursinus | Common wombat | XP_027728462 | 160 | 91.0 | 95.9 |
Ornithorhynchus anatinus | Platypus | XP_028912780 | 180 | 77.9 | 90.5 |
Crocodylus porosus | Saltwater crocodile | XP_019398085 | 318 | 85.1 | 91.4 |
Gallus gallus | Chicken | XP_015150938 | 318 | 83.8 | 90.1 |
Xenopus tropicalis | Tropical clawed frog | NP_001016440 | 351.7 | 74.4 | 84.8 |
Danio rerio | Zebrafish | NP_001013302 | 433 | 73.9 | 86.0 |
Rhincodon typus | Whale shark | XP_020392528 | 465 | 71.2 | 85.1 |
Petromyzon marinus | Sea lamprey | XP_032832018 | 599 | 48.7 | 69.4 |
Exaiptasia pallida | Pale anemone | XP_020912437 | 687 | 22.6 | 37.3 |
Limulus polyphemus | Atlantic horseshoe crab | XP_013791968 | 736 | 22.5 | 39.5 |
Parasteatoda tepidariorum | Common house spider | XP_015912223 | 736 | 19.5 | 33.9 |
Drosophila virilis | Fruit fly | XP_015028674 | 736 | 16.2 | 29.6 |
Temnothorax curvispinosus | Ant | XP_024872082 | 736 | 14.1 | 25.6 |
Amphimedon queenslandica | Sponge | XP_011404567 | 777 | 20.4 | 39.6 |
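The identity and similarity percentages in the table come from pairwise alignments with the human sequence. As a simplified sketch of what the two figures measure: identity counts aligned positions with the same residue, while similarity also counts conservative substitutions. The code assumes the two sequences are already aligned, with "-" marking gaps, and divides by the full alignment length. The residue groups and the function name are assumptions; alignment programs typically count a substitution as similar when it has a positive score in a matrix such as BLOSUM62, so values from this sketch will not match the table exactly.

```python
"""Sketch: percent identity and similarity of two pre-aligned protein sequences."""

# Simplified groups of physico-chemically similar residues (an assumption; real
# tools usually score similarity with a substitution matrix such as BLOSUM62).
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_similarity(aln_a: str, aln_b: str) -> tuple[float, float]:
    """Return (% identity, % similarity) over all alignment columns, gaps included."""
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = similar = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count against both percentages
        if a == b:
            identical += 1
            similar += 1
        elif a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            similar += 1  # conservative substitution within one group
    n = len(aln_a)
    return 100 * identical / n, 100 * similar / n


if __name__ == "__main__":
    print(identity_similarity("MKV-", "MRVL"))  # (50.0, 75.0)
```

Similarity is always at least as high as identity, which matches every row of the table above.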
MIF4GD has two known paralogs, PAIP1 and CTIF. [51] Both have moderate to low conservation relative to MIF4GD, with less than 15% identity and 20-25% similarity. However, both genes are predicted to have diverged from MIF4GD before its orthologs evolved, and their alignments with MIF4GD have E-values of nearly zero, indicating a significant relationship.
MIF4GD is a slowly evolving gene, with an approximate average of 75 amino acid changes per hundred amino acids per million years. Multiple sequence alignments of human MIF4GD and its orthologs showed two amino acids that are conserved across all sequences: Gly200 and Glu241.
UPF0687 protein C20orf27 is a protein that in humans is encoded by the C20orf27 gene. It is expressed in most human tissues. One study of this protein revealed a role in regulating the cell cycle, apoptosis, and tumorigenesis by promoting activation of the NF-κB pathway.
Coiled-coil domain-containing protein 138, also known as CCDC138, is a human protein encoded by the CCDC138 gene. The exact function of CCDC138 is unknown.
Solute carrier family 46 member 3 (SLC46A3) is a protein that in humans is encoded by the SLC46A3 gene. Also referred to as FKSG16, the protein belongs to the major facilitator superfamily (MFS) and SLC46A family. Most commonly found in the plasma membrane and endoplasmic reticulum (ER), SLC46A3 is a multi-pass membrane protein with 11 α-helical transmembrane domains. It is mainly involved in the transport of small molecules across the membrane through the substrate translocation pores featured in the MFS domain. The protein is associated with breast and prostate cancer, hepatocellular carcinoma (HCC), papilloma, glioma, obesity, and SARS-CoV. Based on the differential expression of SLC46A3 in antibody-drug conjugate (ADC)-resistant cells and certain cancer cells, current research is focused on the potential of SLC46A3 as a prognostic biomarker and therapeutic target for cancer. While protein abundance is relatively low in humans, high expression has been detected particularly in the liver, small intestine, and kidney.
Zinc finger protein 226 is a protein that in humans is encoded by the ZNF226 gene.
FAM76A is a protein that in Homo sapiens is encoded by the FAM76A gene. Notable structural characteristics of FAM76A include an 83-amino-acid coiled-coil domain and a four-amino-acid poly-serine compositional bias. FAM76A is conserved in most chordates but is not found in other deuterostome phyla such as Echinodermata, Hemichordata, or Xenacoelomorpha, suggesting that FAM76A arose in the chordate lineage after it diverged from these groups. Furthermore, FAM76A is not found in fungi, plants, archaea, or bacteria. FAM76A is predicted to localize to the nucleus and may play a role in regulating transcription.
Zinc finger protein 684 is a protein that in humans is encoded by the ZNF684 gene.
Chromosome 15 open reading frame 52 is a human protein encoded by the C15orf52 gene; its function is poorly understood.
In epigenetics, proline isomerization is the effect that cis-trans isomerization of the amino acid proline has on the regulation of gene expression. Similar to aspartic acid, the amino acid proline has the rare property of being able to occupy both cis and trans isomers of its prolyl peptide bonds with ease. Peptidyl-prolyl isomerase, or PPIase, is an enzyme very commonly associated with proline isomerization because of its ability to catalyze the isomerization of prolines. PPIases fall into three families: cyclophilins, FK506-binding proteins, and parvulins. PPIase enzymes catalyze the transition of proline between cis and trans isomers and are essential to the numerous biological functions controlled and affected by prolyl isomerization. Without PPIases, prolyl peptide bonds switch only slowly between cis and trans isomers, a process that can lock proteins in a nonnative structure that renders the protein temporarily ineffective. Although this switch can occur on its own, PPIases are responsible for most isomerization of prolyl peptide bonds. The specific amino acid that precedes the prolyl peptide bond can also affect which conformation the bond assumes; for instance, when an aromatic amino acid is bonded to a proline, the cis conformation is more favorable. Cyclophilin A uses an "electrostatic handle" to pull proline into cis and trans conformations. Most of these biological functions are affected by proline isomerization when one isomer interacts differently than the other, commonly producing an activation/deactivation relationship. Because proline is present in many proteins, its isomerization can have a multitude of effects on different biological mechanisms and functions.
LOC101928193 is a protein which in humans is encoded by the LOC101928193 gene. There are no known aliases for this gene or protein. Similar copies of this gene, called orthologs, are known to exist in several different species across mammals, amphibians, fish, mollusks, cnidarians, fungi, and bacteria. The human LOC101928193 gene is located on the long (q) arm of chromosome 9, at cytogenetic location 9q34.2. Its molecular location is from base pair 133,189,767 to base pair 133,192,979 on chromosome 9, for an mRNA length of 3,213 nucleotides. The gene and protein are not yet well understood by the scientific community, but there are data on their genetic makeup and expression. The LOC101928193 protein is targeted to the cytoplasm and is most highly expressed in the thyroid, ovary, skin, and testes in humans.
WD repeat and coiled-coil containing protein (WDCP) is a protein which in humans is encoded by the WDCP gene. The function of the protein is not completely understood, but WDCP has been identified in a fusion protein with anaplastic lymphoma kinase found in colorectal cancer. WDCP has also been identified in the MRN complex, which processes double-strand breaks in DNA.
C7orf50 is a gene in humans that encodes a protein known as C7orf50. This gene is ubiquitously expressed, including in the kidneys, brain, fat, prostate, and spleen, among 22 other tissues, and shows low tissue specificity. C7orf50 is conserved in chimpanzees, rhesus monkeys, dogs, cows, mice, rats, and chickens, along with 307 other organisms from mammals to fungi. This protein is predicted to be involved in importing ribosomal proteins into the nucleus for assembly into ribosomal subunits as part of rRNA processing. Additionally, this gene is predicted to be a protein-coding microRNA (miRNA) host gene, meaning that it may contain miRNA genes in its introns and/or exons.
Serum amyloid A-like 1 is a protein in humans encoded by the SAAL1 gene.
Chromosome 9 open reading frame 85, commonly known as C9orf85, is a protein in Homo sapiens encoded by the C9orf85 gene. The gene is located at 9q21.13. Alternative splicing produces four different isoforms. C9orf85 has a predicted molecular weight of 20.17 kDa and an isoelectric point of 9.54. The function of the gene has not yet been confirmed; however, it has been found to show high levels of expression in highly differentiated cells.
C6orf136 is a protein in humans encoded by the C6orf136 gene. The gene is conserved in mammals, mollusks, and some poriferans. While the function of the gene is currently unknown, C6orf136 has been shown to be hypermethylated in response to FOXM1 expression in head and neck squamous cell carcinoma (HNSCC) cells. Additionally, elevated expression of C6orf136 has been associated with improved survival rates in patients with bladder cancer. C6orf136 has three known isoforms.
FAM120AOS, or family with sequence similarity 120A opposite strand, codes for the uncharacterized protein FAM120AOS, which currently has no known function. Its Gene Ontology annotation is protein binding. Across a majority of datasets, the thyroid and the placenta are the two tissues with the highest expression levels of FAM120AOS.
FAM237A is a protein-coding gene that encodes a protein of the same name. In Homo sapiens, FAM237A is believed to be expressed primarily in the brain, with moderate expression in the heart and lesser expression in the testes. FAM237A is hypothesized to act as a specific activator of the receptor GPR83.
Family with sequence similarity 166, member C (FAM166C), is a protein encoded by the FAM166C gene. The FAM166C protein is localized in the nucleus. It has a calculated molecular weight of 23.29 kDa. It also contains DUF2475, a domain of unknown function spanning amino acids 19–85. The FAM166C protein is nominally expressed in the testis, stomach, and thyroid.
Chromosome 5 open reading frame 22 (C5orf22) is a protein-coding gene of poorly characterized function in Homo sapiens. Its primary alias is uncharacterized protein family 0489 (UPF0489).
THAP domain-containing protein 3 (THAP3) is a protein that, in Homo sapiens (humans), is encoded by the THAP3 gene. The THAP3 protein is also known as MGC33488, LOC90326, and THAP domain-containing, apoptosis associated protein 3. This protein contains the Thanatos-associated protein (THAP) domain and a host cell factor C1 (HCFC1) binding motif. These domains allow THAP3 to influence a variety of processes, including transcription and neuronal development. THAP3 is ubiquitously expressed in H. sapiens, though expression is highest in the kidneys.