MIPOL1 | Value |
---|---|
Aliases | MIPOL1, CCDC193, mirror-image polydactyly 1 |
External IDs | OMIM: 606850; MGI: 1920740; HomoloGene: 16340; GeneCards: MIPOL1 |
Location (UCSC), human | Chr 14: 37.2 – 37.55 Mb |
Location (UCSC), mouse | Chr 12: 57.23 – 57.5 Mb |
PubMed search | [3] (human), [4] (mouse) |
MIPOL1 (Mirror Image Polydactyly 1), also known as CCDC193 (Coiled-coil domain containing 193), is a protein that in humans is encoded by the MIPOL1 gene. [5] [6] Mutation of this gene is associated with mirror-image polydactyly (also known as Laurin-Sandrow syndrome [7] ) in humans, a rare genetic condition characterized by mirror-image duplication of digits. [8]
The MIPOL1 gene is located at 14q13.3-q21.1 on the plus strand, spanning base pairs 37,197,888 to 37,579,207 of the human GRCh38 primary assembly (length: 381,320 base pairs), and consists of 15 exons and 11 introns. Notable genes in its neighborhood include SLC25A21 (mutation of which causes synpolydactyly [10] ) and FOXA1.
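The quoted length follows from the coordinates if both endpoints are counted as part of the gene. A minimal Python sketch of that arithmetic, assuming inclusive 1-based coordinates (the function name `span_length` is ours and purely illustrative):

```python
def span_length(start: int, end: int) -> int:
    """Length of a genomic interval whose start and end are both included."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# GRCh38 coordinates of MIPOL1 as quoted in the text above
mipol1_length = span_length(37_197_888, 37_579_207)
print(mipol1_length)  # 381320
```

The `+ 1` is what separates an inclusive interval from a simple difference of positions; without it the result would be one base pair short of the stated length.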
MIPOL1 has at least 15 known splice isoforms produced by alternative splicing. [11]
The unmodified MIPOL1 protein isoform 1 in humans has an isoelectric point of 5.6 and a molecular weight of 51.5 kDa. [12] Relative to other human proteins, MIPOL1 contains unusually low amounts of proline and glycine and higher amounts of glutamic acid and glutamine. [13]
There are at least three known isoforms of this protein in humans produced by alternative splicing: isoform 1, of length 442 amino acids, isoform 2 of length 261 amino acids and isoform 3 of length 169 amino acids. [5]
MIPOL1 contains two coiled-coil domains in its C-terminus at positions 107 – 212 and 253 – 435 [5] (shown in Fig.1). A bipartite nuclear localization signal is predicted at position 128 – 143. [16]
The following post-translational modifications are predicted for MIPOL1 using bioinformatics tools. [17] Multiple phosphorylation sites that are conserved in close orthologs are predicted for this protein, including one casein kinase 1 (CK1) site, three casein kinase 2 (CK2) sites, and three NEK2 sites. [18]
Post-translational modification | Amino acid site | Prediction tool |
---|---|---|
Phosphorylation | Ser (37, 42, 69, 75, 105, 126, 205, 275, 344, 350, 364, 412), Thr (80, 251, 259, 338, 365, 396, 435, 437, 440), Tyr (77, 83) | MyHits, [16] NetPhos [19] |
O-linked glycosylation | Ser (34, 105, 294, 344, 412), Thr (155, 293) | NetOGlyc [20] |
O-GlcNAcylation | Thr (104) | YinOYang [21] |
Glycation | Lys (6, 41, 133, 207, 347, 421) | NetGlycate [22] |
SUMOylation | Lys (136, 147) | GPS-SUMO [23] |
Ubiquitination | Lys (22, 47, 133, 162, 314, 418) | BDM-PUB [24] |
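Some positions in the table are predicted targets of more than one modification. The sketch below, which is only an illustrative way of reading the table and not part of the cited analyses, stores the predictions in a nested dictionary and lists the residues that appear under several modification types. The names `predicted_sites` and `shared_sites` are assumptions introduced for this example.

```python
from collections import defaultdict

# Predicted sites copied from the table above:
# modification -> residue type -> positions
predicted_sites = {
    "Phosphorylation": {
        "Ser": [37, 42, 69, 75, 105, 126, 205, 275, 344, 350, 364, 412],
        "Thr": [80, 251, 259, 338, 365, 396, 435, 437, 440],
        "Tyr": [77, 83],
    },
    "O-linked glycosylation": {"Ser": [34, 105, 294, 344, 412], "Thr": [155, 293]},
    "O-GlcNAcylation": {"Thr": [104]},
    "Glycation": {"Lys": [6, 41, 133, 207, 347, 421]},
    "SUMOylation": {"Lys": [136, 147]},
    "Ubiquitination": {"Lys": [22, 47, 133, 162, 314, 418]},
}


def shared_sites(table):
    """Return {(residue, position): [modifications]} for sites listed more than once."""
    by_site = defaultdict(list)
    for modification, residues in table.items():
        for residue, positions in residues.items():
            for position in positions:
                by_site[(residue, position)].append(modification)
    return {site: mods for site, mods in sorted(by_site.items()) if len(mods) > 1}


for (residue, position), mods in shared_sites(predicted_sites).items():
    print(f"{residue}{position}: {', '.join(mods)}")
```

With the table as given, the overlapping positions are Lys133 (glycation and ubiquitination) and Ser105, Ser344 and Ser412 (phosphorylation and O-linked glycosylation). Since every entry is a computational prediction, such overlaps are at most candidates for follow-up, not evidence that the modifications compete.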
The exact structure of MIPOL1 has not yet been characterized. Homology-based and de novo predictions of its tertiary structure suggest that it may consist of intertwined alpha helices forming coiled-coil domains (see Fig. 4). [26] [25]
Immunofluorescence imaging in the human U2OS cell line (bone osteosarcoma epithelial cells) shows localization in the cytosol. [27] Immunohistochemistry imaging of human prostate tissue also suggests cytosolic localization. [28] A bipartite nuclear localization signal, which is highly conserved in mammalian orthologs (see Fig. 2), is predicted at position 128 – 143, indicating possible localization in the nucleus. [16]
The predicted promoter sequence for this gene spans base pairs 37,196,852 to 37,198,126 (1,275 bp) and has multiple predicted binding sites for transcription factors such as GATA binding factors, SMAD3, TP63 and NRF1. [29]
MIPOL1 is ubiquitously expressed at low levels in humans, with highest expression in the prostate. [5]
Bioinformatics tools [30] predict that the RNA secondary structure is stabilized by multiple stem loops, which are conserved across closely related species. The mRNA also contains multiple binding targets for microRNAs such as MIR3163 and MIR190a, which could silence these regions and inhibit translation. [31]
Disease-associated mutations of the MIPOL1 gene are inherited in an autosomal dominant manner. [32] It is one of six genes in humans causing non-syndromic polydactyly (i.e. polydactyly occurring as a separate event with no other associated anomalies). [33] Mutation of this gene is associated with mirror-image polydactyly (also known as Laurin-Sandrow syndrome [32] ) in humans, a rare genetic condition characterized by mirror-image duplication of digits in the hands and feet. [8]
This gene has also been associated with central nervous system development, and the loss of this gene can cause craniofacial defects and agenesis of the corpus callosum. [34]
The gene has been shown to function as a tumor suppressor in nasopharyngeal carcinoma (NPC) through up-regulation of the p21 (WAF1/CIP1) and p27 pathways; both proteins are cyclin-dependent kinase inhibitors linked with tumor suppression via cell cycle arrest. [35] Another study investigating the role of the MIPOL1 gene in cancer progression reported that MIPOL1 was down-regulated in NPC tumor tissues, and that artificially re-expressing the gene caused tumor suppression by down-regulating angiogenic factors and reducing the phosphorylation of metastasis-associated proteins like AKT, p65 and FAK14. [36] MIPOL1 interacts with RhoB, another well-known tumor suppressor, and this interaction was confirmed to enhance RhoB activity.
In a study of pediatric high-grade glioma (pHGG), the MIPOL1 gene was found to be down-regulated 2.4-fold in high-vascularity tumors. [37]
The protein is known to interact with Replicase polyprotein 1ab of SARS-CoV-2, a protein involved in the transcription and replication of viral RNAs. [38]
This protein is known to interact with multiple human proteins, verified via two-hybrid screening. A few notable examples include:
LATS2: Negatively regulates YAP1 in the Hippo signaling pathway that plays a pivotal role in organ size control and tumor suppression by restricting cell proliferation and promoting apoptosis. [39]
ZGPAT (Zinc finger CCCH-type with G patch domain-containing protein): A transcription repressor that negatively regulates expression of EGFR, a gene involved in cell proliferation, survival and migration, suggesting that it may act as a tumor suppressor. [40]
RCOR3 (REST Corepressor 3): A protein that may act as a component of a co-repressor complex that represses transcription. [41]
It also interacts with viral proteins such as:
Replicase polyprotein 1ab (SARS-CoV-2): A multifunctional protein involved in the transcription and replication of viral RNAs. [38]
Protein E7 (Human Papillomavirus): Plays a role in viral genome replication by driving entry of quiescent cells into the cell cycle. [42]
The earliest known ortholog of this protein appeared around 948 million years ago in Trichoplax adhaerens in phylum Placozoa in kingdom Animalia. The next most distant orthologs appear in phylum Cnidaria, around 824 million years ago.
The MIPOL1 protein has no known paralogs in humans or in the other species for which orthologs have been found; it is therefore the only member of its gene family.
There are more than 300 known orthologs of the MIPOL1 protein in Animalia, ranging from primates to corals and sea anemones in phylum Cnidaria. [43] Orthologs of the protein were found in species as distant as Trichoplax adhaerens, a simple primitive invertebrate species. Table 2 shows a sample of the ortholog space.
Closely related orthologs are found in chordates such as mammals, reptiles, birds and amphibians, with sequence similarities greater than 70%. Sequence lengths of orthologs were similar to the human MIPOL1 protein, with no significant gene duplication observed.
Organisms with sequence similarities in the 55-70% range (moderately related orthologs) were found in bony fish, cartilaginous fish and coelacanths. Sequence length is generally longer in these species, with a longer amino acid sequence in the N-terminus (alignment with human protein occurs around amino acid 100).
Distantly related orthologs with similarities less than 50% (around 30 – 40%) are found in hemichordates, echinoderms, arthropods, molluscs, cnidaria and placozoa. Multiple sequence alignment with distant orthologs indicates poor alignment in the N-terminus of the protein.
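The three similarity bands used above can be written as a small classifier. This is a sketch of the thresholds as stated in the text, not a tool used by the cited sources; the function name `ortholog_category` and the sample values are hypothetical. The text does not assign the 50-55% range to any band, so the function returns `None` there instead of guessing.

```python
from typing import Optional


def ortholog_category(similarity_percent: float) -> Optional[str]:
    """Map sequence similarity to human MIPOL1 (in percent) to the bands in the text."""
    if not 0 <= similarity_percent <= 100:
        raise ValueError("similarity must be between 0 and 100")
    if similarity_percent > 70:
        return "closely related"
    if 55 <= similarity_percent <= 70:
        return "moderately related"
    if similarity_percent < 50:
        return "distantly related"
    return None  # 50-55% is not assigned to a band in the text


# Hypothetical similarity values, chosen only to exercise each branch
for value in (85.0, 60.0, 35.0, 52.0):
    print(value, ortholog_category(value))
```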
Two COG (Clusters of Orthologous Groups of proteins) domains were found in this protein (see Fig. 3): COG1196 at position 106 – 340 (chromosome segregation ATPase [44] ) and COG4372 at position 259 – 431 (uncharacterized conserved protein containing a DUF3084 domain [45] ). [46]
Linear regression of corrected percent divergence (amino acid changes per 100 amino acids) against date of divergence from humans for different MIPOL1 orthologs (see Fig. 5) gives an estimate of 5.68 million years for a 1% change in the amino acids of the MIPOL1 protein. The MIPOL1 protein is evolving at a moderate rate relative to fast-evolving proteins such as fibrinogen alpha and slow-evolving proteins such as cytochrome c.
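The relation between the fitted line and the quoted figure can be sketched as follows: the slope of the regression is the percent divergence gained per million years, and its reciprocal is the time needed for a 1% change. The example assumes an ordinary least-squares fit, which the text does not specify. The data points are synthetic, generated from the quoted rate itself, and are not the ortholog measurements behind Fig. 5; `fit_line` is an illustrative helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic points lying exactly on the line implied by the quoted rate
# (1% change per 5.68 million years); NOT measured ortholog data.
dates_mya = [100.0, 300.0, 500.0, 700.0]
divergence_percent = [d / 5.68 for d in dates_mya]

slope, intercept = fit_line(dates_mya, divergence_percent)
print(f"{1 / slope:.2f} million years per 1% change")
```

With real ortholog data the points would scatter around the line and the intercept would generally not be zero; the reciprocal of the slope is still the quantity reported in the text.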