Chromosome 1 open reading frame 112 is a protein that in humans is encoded by the C1orf112 gene, located at position 1q24.2. [5] C1orf112 gives rise to seventeen mRNA variants, fifteen of which encode functional proteins. C1orf112 has a calculated precursor molecular weight of 96.6 kDa and an isoelectric point of 5.62. [6] Experiments indicate that C1orf112 localizes to the mitochondria, although it does not contain a mitochondrial targeting sequence. [7] [8]
The gene spans 192,073 base pairs and contains 29 exons. C1orf112 shares antisense coding regions with C1orf156 and SCYL3. [9]
There are currently eight experimentally determined RefSeq isoforms. [9] C1orf112 contains a domain of unknown function, DUF4487. [9]
Compositional analysis through SAPS predicted much less glycine and much more leucine than expected relative to other human protein sequences. This characteristic is conserved across primate orthologs. A mixed charge cluster was found in Isoform X1 from position 747 to 805, indicating that this segment may be aqueous and tightly bound. This mixed charge cluster is only partially conserved across orthologs. [11]
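Tools such as SAPS compare a protein's amino acid composition against reference frequencies and flag unusual deviations, such as the leucine excess and glycine deficit described above. The Python below is a minimal sketch of only the first step, computing raw per-residue percentages and the difference from an expected percentage. It is not SAPS's actual algorithm (which also scores statistical significance), the function names are hypothetical, and the expected percentage is an input the caller must supply rather than a real reference frequency.

```python
from collections import Counter


def residue_composition(sequence: str) -> dict[str, float]:
    """Return the percentage of each amino acid in a protein sequence."""
    # Keep only letters, so whitespace or line breaks in pasted sequences are ignored.
    residues = [r for r in sequence.upper() if r.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


def deviation_from_expected(sequence: str, residue: str, expected_percent: float) -> float:
    """Observed minus expected percentage for one residue.

    A positive value means the residue is enriched relative to the
    caller-supplied expectation; a negative value means it is depleted.
    """
    observed = residue_composition(sequence).get(residue.upper(), 0.0)
    return observed - expected_percent
```

For example, `residue_composition("LLGAL")` reports 60% leucine, so against a hypothetical expectation of 10% leucine, `deviation_from_expected` returns a positive value, the same direction of deviation that SAPS reported for leucine in C1orf112.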
Ensembl annotates nine transcripts (splice variants) of C1orf112. [12]
Antibody-based immunocytochemistry and immunofluorescent staining of the human cell line A-431 indicate that C1orf112 is localized to the mitochondria. [13] [14]
Although C1orf112 is expressed ubiquitously at the tissue level, expression is highest in the testes, lymph nodes, bone marrow, and cerebellum, based on samples from 97 individuals across 27 tissues. [16] In situ hybridization of the human transcriptome indicates expression is highest in the atrioventricular node, followed by the testis, testis germ cells, and testis interstitial tissue. [17]
Transcription factor binding site analysis indicates many potential binding sites for TATA-binding protein and CCAAT-enhancer-binding proteins, along with sites for transcription factors associated with the testis, thymus, kidneys, and cardiac tissue. [18]
There are two ubiquitination sites on C1orf112, at lysine 73 and at position 783 of isoform X1. Downstream of the reading frame, there are three polyadenylation signals. In addition, there is an N6-acetyllysine site at lysine 747 and a phosphoserine site at serine 23. [12] C1orf112 has been found experimentally to interact with ATG1, a protein linked to aldosterone secretion whose overexpression characterizes certain forms of breast cancer. [19] Predicted post-translational modifications include sites for O-glycosyl-oligosaccharide-glycoprotein N-acetylglucosaminyltransferase III, sumoylation sites, and SUMO interaction sites. [20] [21]
C1orf112 is predicted to interact with a diverse range of proteins, including multiple mitosis-associated proteins. [22] C1orf112 is also predicted to interact with FIGNL1, a protein involved in the repair of DNA double-strand breaks by homologous recombination. [23] Experimental findings indicate C1orf112 interacts with NUF2, a spindle-pole body protein that plays a critical role in nuclear division, and TTK, a protein kinase capable of phosphorylating serine, threonine, and tyrosine. [24]
C1orf112 is highly conserved among mammals: orthologs in primates such as Pan troglodytes and Rhinopithecus bieti share more than 90% identity with human C1orf112, and orthologs in other mammals such as Castor canadensis and Miniopterus natalensis share more than 80%. [25] The orthologs with the earliest estimated divergence (speciation) dates from humans include Trichosporon asahii, a placozoan, and Amphimedon queenslandica, indicating that C1orf112 has been preserved over a long span of evolutionary time. [25]
Genus/Species | Common Name | Taxonomic Group | Date of Divergence (Estimated Time) | Accession # | Sequence Length (aa) | % Identity | % Similarity | E Value |
Homo sapiens | Humans | Primates | 0 | NP_001306976.1 | 853 | 100 | 100 | |
Pan troglodytes | Chimpanzees | Primates | 6.65 mya | XP_009436263.1 | 911 | 99 | 99 | 0 |
Rhinopithecus bieti | Black Snub-Nosed Monkey | Primates | 29.44 mya | XP_017723911.1 | 911 | 97 | 98 | 0 |
Castor canadensis | American Beaver | Rodentia | 90 mya | XP_020026631.1 | 908 | 87 | 92 | 0 |
Miniopterus natalensis | Natal Long-Fingered Bat | Chiroptera | 96 mya | XP_016077003.1 | 912 | 84 | 90 | 0 |
Condylura cristata | Star-Nosed Mole | Soricomorpha | 96 mya | XP_024409392.1 | 531 | 77 | 85 | 0 |
Bos indicus | Zebu | Cetartiodactyla | 96 mya | XP_019832063.1 | 875 | 84 | 90 | 0 |
Acinonyx jubatus | Cheetah | Carnivora | 96 mya | XP_026902260.1 | 912 | 86 | 90 | 0 |
Aptenodytes forsteri | Emperor Penguin | Aves | 312 mya | XP_009271565.1 | 839 | 59 | 76 | 0 |
Chelonia mydas | Green Sea Turtle | Reptilia | 312 mya | XP_007061247.1 | 849 | 64 | 78 | 0 |
Xenopus laevis | African Clawed Frog | Amphibia | 352 mya | XP_018114274.1 | 904 | 55 | 72 | 0 |
Nanorana parkeri | High Himalaya Frog | Amphibia | 352 mya | XP_018428126.1 | 698 | 52 | 70 | 0 |
Salmo salar | Atlantic Salmon | Actinopterygii | 435 mya | XP_013979201.1 | 924 | 47 | 64 | 0 |
Helobdella robusta | Leech | Clitellata | 797 mya | XP_009029571.1 | 1004 | 25 | 43 | 0 |
Amphimedon queenslandica | Sponge | Porifera | 951.8 mya | XP_019856681.1 | 903 | 28 | 46 | 0 |
Trichosporon asahii | Fungi | Fungi | 1105 mya | XP_014176969.1 | 2588 | 43 | 68 | 0.006 |
Date of divergence was calculated using TimeTree. [26] The E value indicates the number of hits one can expect to see by chance when searching the NCBI database, with a low E value indicating a significant result. Percent identity is the percentage of aligned positions at which the residues are identical to Homo sapiens C1orf112 Isoform X1, while percent similarity is the percentage of aligned positions occupied by identical or chemically similar residues. [27]
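To make the identity and similarity columns concrete, here is a minimal Python sketch that computes both from a gapped pairwise alignment. It assumes that gap columns count toward the alignment length, and it uses a simplified, hypothetical grouping of chemically similar residues. BLAST itself counts "positives" as aligned pairs with a positive score in its substitution matrix (BLOSUM62 by default for protein searches), so its similarity values will differ from this sketch.

```python
# Simplified, hypothetical groups of chemically similar amino acids.
# Real BLAST "positives" come from a substitution matrix such as BLOSUM62.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG"]


def _similar(a: str, b: str) -> bool:
    """Return True if residues a and b are identical or share a group."""
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_identity_similarity(seq1: str, seq2: str) -> tuple[float, float]:
    """Compute percent identity and percent similarity of two aligned sequences.

    Both strings must be the same length (one row each of a pairwise
    alignment), with '-' marking gaps. Gap columns count toward the
    alignment length but are never identical or similar.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    if not seq1:
        raise ValueError("alignment is empty")
    identical = similar = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue  # gap column: neither identical nor similar
        if a == b:
            identical += 1
        if _similar(a, b):
            similar += 1
    length = len(seq1)
    return 100.0 * identical / length, 100.0 * similar / length
```

For the toy alignment `MKLV-DE` against `MRIVADE`, four of seven columns are identical and two more (K/R and L/I) fall into the same hypothetical group, giving roughly 57.1% identity and 85.7% similarity. As in the table, similarity is always at least as high as identity, because every identical pair also counts as similar.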
C1orf112 secondary structure is predicted to be predominantly alpha-helical, with less than 5% of the protein forming beta sheets. I-TASSER predicts ligand-binding sites between positions 377 and 530 in Isoform X1. [28] MyHits predicts a leucine zipper motif in Isoform X1 at positions 831 to 852. [29]
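A leucine zipper is recognized as a sequence pattern, so the idea can be illustrated with a simple scan. The sketch below assumes the PROSITE-style leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L, that is, four leucines spaced seven residues apart, 22 residues in total, which matches the length of the reported 831 to 852 span. It is not MyHits's implementation, and the function name is hypothetical.

```python
import re

# Assumed leucine zipper pattern in PROSITE syntax: L-x(6)-L-x(6)-L-x(6)-L,
# i.e. four leucines spaced seven residues apart (22 residues in total).
# The lookahead lets overlapping matches all be reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(sequence: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) positions of every pattern match."""
    seq = sequence.upper()
    hits = []
    for match in LEUCINE_ZIPPER.finditer(seq):
        # Group 1 holds the 22-residue motif; convert 0-based start to 1-based.
        # The exclusive 0-based end equals the inclusive 1-based end.
        hits.append((match.start(1) + 1, match.end(1)))
    return hits
```

Running this over the Isoform X1 sequence would be expected to report a hit spanning 831 to 852 if MyHits used the same pattern; a pattern match alone does not establish that the region actually forms a coiled-coil dimer.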
C1orf112 was one of many genes found to be co-expressed with cancer-associated genes, and the knockdown of this gene in a HeLa cell line suppressed growth. [30]
Intermediate filament family orphan 1 is a protein that in humans is encoded by the IFFO1 gene. The function of IFFO1 is uncharacterized, and the protein has a molecular weight of 61.98 kDa. Intermediate filament proteins play an important role in the cytoskeleton and the nuclear envelope of most eukaryotic cell types.
Chromosome 9 open reading frame 152 is a protein that in humans is encoded by the C9orf152 gene. The exact function of the protein is not completely understood.
Uncharacterized protein C2orf73 is a protein that in humans is encoded by the C2orf73 gene. The protein is predicted to be localized to the nucleus.
Transmembrane protein 255A is a protein encoded by the TMEM255A gene. TMEM255A is also referred to as family with sequence similarity 70, member A (FAM70A). The TMEM255A protein is a transmembrane protein predicted to be located in the nuclear envelope of eukaryotic cells.
Proline-rich protein 30 is a protein that in humans is encoded by the PRR30 gene. PRR30 is a member of the proline-rich protein family, whose members are characterized by an intrinsic lack of structure. Copy number variations in the PRR30 gene have been associated with an increased risk of neurofibromatosis.
Chromosome 21 Open Reading Frame 58 (C21orf58) is a protein that in humans is encoded by the C21orf58 gene.
SHLD1 (shieldin complex subunit 1), also known as C20orf196, is a gene on chromosome 20. It encodes an mRNA that is 1,763 base pairs long and a protein that is 205 amino acids long.
Chromosome 19 open reading frame 44 is a protein that in humans is encoded by the C19orf44 gene. C19orf44 is an uncharacterized protein of unknown function. The protein is not limited to humans; orthologs exist in other species. The protein contains one domain of unknown function (DUF) that is highly conserved across its orthologs. This protein is most highly expressed in the testis and ovary, but also shows significant expression in the thyroid and parathyroid. It is also known as LOC84167.
WD repeat and coiled-coil containing protein (WDCP) is a protein that in humans is encoded by the WDCP gene. The function of the protein is not completely understood, but WDCP has been identified as part of a fusion protein with anaplastic lymphoma kinase in colorectal cancer. WDCP has also been identified in the MRN complex, which processes double-strand breaks in DNA.
C12orf24 is a gene in humans that encodes a protein known as FAM216A. This gene is primarily expressed in the testis and brain, but is constitutively expressed in 25 other tissues. FAM216A is an intracellular protein predicted to reside in the nucleus. The exact function of C12orf24 is unknown. FAM216A is highly expressed in Sertoli cells of the testis as well as in spermatids at different stages.
C6orf136 is a protein that in humans is encoded by the C6orf136 gene. The gene is conserved in mammals, mollusks, and some poriferans (sponges). While the function of the gene is currently unknown, C6orf136 has been shown to be hypermethylated in response to FOXM1 expression in head and neck squamous cell carcinoma (HNSCC) cells. Additionally, elevated expression of C6orf136 has been associated with improved survival in patients with bladder cancer. C6orf136 has three known isoforms.
FAM237A is a protein-coding gene that encodes a protein of the same name. In Homo sapiens, FAM237A is believed to be expressed primarily in the brain, with moderate expression in the heart and lower expression in the testes. FAM237A is hypothesized to act as a specific activator of the receptor GPR83.
TBC1D30 is a gene in the human genome that encodes a protein of the same name. This protein has two domains, one of which is involved in the processing of Rab proteins. Much of the function of this gene is not yet known, but it is expressed mostly in the brain and adrenal cortex.
TEKTIP1, also known as tektin-bundle interacting protein 1, is a protein that in humans is encoded by the TEKTIP1 gene.
Chromosome 4 Open Reading Frame 45 (C4orf45) is a protein that in humans is encoded by the C4orf45 gene. It is predicted to be localized to the cytoplasm and nucleus of the cell.
C13orf42 is a protein which, in humans, is encoded by the gene chromosome 13 open reading frame 42 (C13orf42). RNA sequencing data shows low expression of the C13orf42 gene in a variety of tissues. The C13orf42 protein is predicted to be localized in the mitochondria, nucleus, and cytosol. Tertiary structure predictions for C13orf42 indicate multiple alpha helices.
Chromosome 12 open reading frame 71 (C12orf71) is a protein that in humans is encoded by the C12orf71 gene. The protein is also known by the alias LOC728858.
THAP domain-containing protein 3 (THAP3) is a protein that, in Homo sapiens (humans), is encoded by the THAP3 gene. The THAP3 protein is also known as MGC33488, LOC90326, and THAP domain-containing, apoptosis associated protein 3. This protein contains the Thanatos-associated protein (THAP) domain and a host-cell factor 1C binding motif. These domains allow THAP3 to influence a variety of processes, including transcription and neuronal development. THAP3 is ubiquitously expressed in H. sapiens, though expression is highest in the kidneys.
Chromosome 13 Open Reading Frame 46 is a protein that in humans is encoded by the C13orf46 gene. In humans, C13orf46 is ubiquitously expressed at low levels in tissues, including the lungs, stomach, prostate, spleen, and thymus. This gene encodes eight alternatively spliced mRNA transcripts, which produce five different protein isoforms.
Chromosome 5 Open Reading Frame 47, or C5ORF47, is a protein which, in humans, is encoded by the C5ORF47 gene. It also goes by the alias LOC133491. The human C5ORF47 gene is primarily expressed in the testis.