C16orf86 | |
---|---|
Aliases | C16orf86, chromosome 16 open reading frame 86 |
External IDs | MGI: 1918296, HomoloGene: 19274, GeneCards: C16orf86 |
Uncharacterized protein C16orf86 is a protein that in humans is encoded by the C16orf86 gene. [5] It is composed mostly of alpha helices and is expressed most highly in the testes, with additional expression in other tissues such as the kidney, colon, brain, fat, spleen, and liver. [5] The function of C16orf86 is not well understood, but it could be a transcription factor in the nucleus that regulates G0/G1 in the cell cycle in tissues such as the kidney, brain, and skeletal muscle, as suggested by the DNA microarray data in the gene-level regulation section.
The function of the C16orf86 protein is still not well understood. However, based on the DNA microarray data and the post-translational modification data below, this protein could be a transcription factor in the nucleus that regulates G0/G1 in the cell cycle in tissues such as the kidney, brain, and skeletal muscle.
C16orf86 is highly expressed in the testes and is also expressed in tissues such as the kidney, colon, brain, fat, spleen, and liver. [5]
C16orf86 microarray data were found through NCBI UniGene and the GEO Profiles entries for C16orf86. [6] The data below show C16orf86 expression patterns related to cell cycle regulation in kidney cells, colon cancer cells, and adipose tissue.
The DNA microarray experiment shown below compared MIF-deficient cells with control cells using cDNA. [7] [8] The results showed that the cytoplasmic protein MIF promotes cell proliferation and cell cycle progression in kidney cells such as HEK293. [7] [8] [9] When MIF is inhibited, p53 blocks G1/S phase progression. In addition, inhibition of E2F and AP1 together with activation of p53 results in cell cycle arrest at the G0/G1 phase in MIF-deficient cells. E2F and AP1 are transcription factors with binding sites in the C16orf86 promoter. E2F, together with AP1, is important for cell cycle progression; when both are blocked in MIF-deficient cells, p53 takes over. C16orf86 could therefore be important in cell cycle progression in the kidney, where it is expressed.
The DNA microarray experiment shown below used purified T98G glioblastoma cells that were cycling, arrested in G0, or released into S phase for 10 to 16 hours. [10] [11] The researchers tested how pRB, p107, and p130 repress E2F target genes and how the p130 complex interacts with DP, RB-like, and other E2F transcription factors to help form the DREAM complex in cell cycle arrest. [12] The results showed that E2F4, along with p130 and other transcription factors, mediates the repression of the cell cycle from G1 to G0. Upon activation and entry into S phase, E2F1/2/3 bind with other transcription factors to activate transcription in the cell cycle. [12] C16orf86 could be important in cell cycle progression in the brain because binding sites for E2F4 and E2F1/2/3 are located in its promoter sequence.
The DNA microarray experiment shown below used Infinium HumanMethylation450 BeadChip arrays with GWAS to determine the DNA methylation profiles of skeletal myoblasts at day 3, day 8, and day 15. [13] [14] These methylation profiles were used to study myogenic cell differentiation. [15] The results showed that methylation patterns do affect myogenic cell differentiation. One of the transcription factors tested in this experiment, MYF6, has a binding site in the C16orf86 promoter. [15] This transcription factor is expected to be down-regulated during muscle cell differentiation. [15] This can be seen in the profile, which responds when the stimulus is first introduced but never reaches its top peak. This could mean that C16orf86 is involved in muscle cell differentiation in skeletal myoblast cells. [15]
The C16orf86 protein is predicted to localize mainly to the nucleus, and also to the cytoplasm, mitochondria, and endoplasmic reticulum. This result was found using PSORT II, a protein tool available through Expasy. [16] The human sequence was submitted to the tool, and the results were compared with those for its distant orthologs in the Weddell seal and red fox. [17] [18]
C16orf86 (Chromosome 16 Open Reading Frame 86) is a gene found on the long arm of chromosome 16 at position q22.11. [5] Its genomic sequence starts at base pair 67,667,030 and ends at base pair 67,668,590. [19] The sequence is read in the forward direction on the positive strand. [5]
C16orf86 is part of the ENKD1 region. [5] This region contains 3 genes, including the gene for the ENKD1 protein and its isoforms ENKD1 isoform X1 and ENKD1 isoform X2. [5] [20] Other genes located near C16orf86 are GFOD2 to the right and ACD and PARD6A to the left. [5]
C16orf86 has a total of 4 exons in its coding sequence. [5] [19] The first exon boundary is located between amino acids 34 and 35, at bases G and T. The second exon boundary is located between amino acids 111 and 112, at bases A and G. The third exon boundary is located between amino acids 184 and 186, at bases C and G. [19]
C16orf86 has a total of 3 introns. [19]
The C16orf86 protein is 317 amino acids long. Translation starts at methionine (amino acid 1) and continues through amino acid 317, which is followed by a stop codon. [19] [21]
There are 2 isoforms of C16orf86: uncharacterized protein C16orf86 isoform X1 and uncharacterized protein C16orf86 isoform X2. [5]
Uncharacterized protein C16orf86 isoform X1 is 332 amino acids long and has 2 exons and 1 intron. [22] [23]
Uncharacterized protein C16orf86 isoform X2 is 326 amino acids long and has 4 exons and 3 introns. [24] [25]
There are three different promoter sequences for C16orf86. These promoter sequences were found using the Genomatix tool Gene2Promoter. [26] Each promoter sequence was compared with the promoters of distant C16orf86 orthologs, using the human sequence as the reference, in a Clustal Omega multiple sequence alignment. [27] Promoter GXP_107609 matched more closely in sequence than the GXP_7544221 and GXP_6033384 promoters. [26]
Transcription factor binding sites in the C16orf86 promoter GXP_107609 were found using the "analyze binding sites" option of the Genomatix tool Gene2Promoter. [26] Binding sites were chosen based on a high matrix score and a high number of occurrences within the promoter. [26] The transcription factors with sites in the conserved regions of the promoter sequence were MYF3, MYF4, E2F, and CCCTC-binding factor. [26] These transcription factors are all involved in cell cycle regulation.
A multiple sequence alignment (MSA) of the C16orf86 5'UTRs of orangutans, gorillas, chimpanzees, macaques, and humans was done in Clustal Omega. [27] The results of the MSA were compared with figures of the predicted 5'UTR structure, which were created using the bioinformatics tool mfold. [28] The region that stood out in the 5'UTR within the MSA is bases 105 to 113. This region could form a stem-loop with a specific function or a role in protein interactions.
A multiple sequence alignment of the C16orf86 3'UTRs of orangutans, gorillas, chimpanzees, macaques, and humans was done in Clustal Omega. [27] The results of the MSA were compared with figures of the predicted 3'UTR structure, which were created using the bioinformatics tool mfold. [28] The region that stood out in the 3'UTR within the MSA is bases 1294 to 1300. This region could form a stem-loop with a specific function or a role in protein interactions.
C16orf86 has a predicted molecular weight of 33.5 kilodaltons and an isoelectric point (pI) of 5.30. [29]
The C16orf86 protein sequence is rich in proline and glutamate, with a total of 39 prolines (P) and 39 glutamates (E). [30] In addition, C16orf86 is low in asparagine (N), threonine (T), isoleucine (I), and phenylalanine (F), with 3 asparagines, 9 threonines, 2 isoleucines, and 1 phenylalanine. [30] The abundance of glutamate makes the protein acidic, with a low pI.
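The residue counts quoted above can be expressed as percentages of the 317-residue protein. The following is a minimal sketch: the counts and the protein length are those stated in this article, while the helper names `residue_percentages` and `count_residues` and the toy peptide are hypothetical and are not the real C16orf86 sequence.

```python
from collections import Counter

PROTEIN_LENGTH = 317  # length of human C16orf86 as stated in this article

# Residue counts quoted in the text (one-letter amino acid codes).
REPORTED_COUNTS = {"P": 39, "E": 39, "N": 3, "T": 9, "I": 2, "F": 1}


def residue_percentages(counts, length):
    """Return each residue count as a percentage of the protein length."""
    return {aa: round(100 * n / length, 1) for aa, n in counts.items()}


def count_residues(sequence):
    """Count one-letter amino acid codes in a protein sequence string."""
    return Counter(sequence.upper())


if __name__ == "__main__":
    for aa, pct in residue_percentages(REPORTED_COUNTS, PROTEIN_LENGTH).items():
        print(f"{aa}: {pct}%")
    # Toy peptide for illustration only, not the C16orf86 sequence.
    print(count_residues("PEPPEKRKP"))
```

With the reported counts, proline and glutamate each make up roughly 12% of the protein, while phenylalanine accounts for well under 1%.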
C16orf86 contains a domain of unknown function (DUF4691) from amino acids 1 to 184 and a nuclear localization signal at amino acids 105–109. [31] [32] This figure was created using the Expasy PROSITE tool. [33]
The C16orf86 protein has a nuclear localization signal that spans amino acids 105 to 109 and has the sequence PKRKP in the forward direction. [16] This pattern is conserved in humans and in distant orthologs such as the red fox and Weddell seal. [16]
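Locating a short motif such as PKRKP in a protein sequence amounts to a substring search reported in 1-based residue coordinates. The sketch below illustrates this; the function name `find_motif` and the toy sequence are hypothetical. The toy sequence is placeholder alanines arranged so that the motif lands at positions 105 to 109, and it is not the real C16orf86 sequence.

```python
def find_motif(sequence, motif):
    """Return 1-based (start, end) positions of every occurrence of motif."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        # Convert the 0-based index to 1-based residue numbering.
        hits.append((start + 1, start + len(motif)))
        start = sequence.find(motif, start + 1)
    return hits


NLS_MOTIF = "PKRKP"  # nuclear localization signal reported in the text

# Hypothetical toy sequence: 104 placeholder alanines, then the motif,
# built only so that the motif occupies positions 105-109.
toy_sequence = "A" * 104 + NLS_MOTIF + "A" * 10

print(find_motif(toy_sequence, NLS_MOTIF))
```

Running the same function on ortholog sequences would be one way to check whether the motif is present at a comparable position, which is the kind of conservation the text describes.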
Overall, C16orf86 has a high proportion of alpha helices compared with beta sheets. Phyre2 was used to predict the locations of alpha helices and beta sheets. Alpha helices are predicted with high confidence at amino acids 187–199, 231–244, 265–270, and 294–307. [34] A beta strand is also predicted with high confidence at amino acids 96–97. [34]
The PDB files for the C16orf86 tertiary structure were taken from Phyre2 and I-TASSER. [34] [35] The PDB files were loaded into the EzMol bioinformatics tool to create the tertiary structure figure. [36] In this figure, amino acids are labeled at sites that pertain to phosphorylation, nuclear localization signals, and nuclear export signals.
C16orf86 post-translational modifications were predicted using protein modification tools from Expasy. [32] The most notable sites in this protein were its nuclear export signals (leucine-rich regions), nuclear localization signals, and phosphorylation sites. The nuclear localization signals and export signals allow this protein to be localized within the cell's nucleus. In addition, the protein sequence has phosphorylation sites for CDK5, GSK3, p38 MAPK, PKA, PKC, CDC2, ATM, CKII, and DNA-PK. These kinases all play a role in cell cycle regulation. There is also a conceptual translation for C16orf86 below with the rest of the post-translational modifications.
The orthologs were sorted by increasing date of divergence and sequence similarity.
Genus | Species | Common name | Taxonomic group | Date of divergence (MYA) | Accession number | Sequence length (AA) | Sequence identity to human | Sequence similarity to human |
---|---|---|---|---|---|---|---|---|
Homo | sapiens | Humans | Primates | 0.00 | NP_001013002.2 | 317 | 100.00% | 100.00% |
Pongo | abelii | Sumatran orangutan | Primates | 15.20 | XP_002826596.1 | 318 | 95.00% | 96.00% |
Rhinopithecus | bieti | Black snub-nosed monkey | Primates | 28.10 | XP_017707751.1 | 314 | 92.00% | 94.00% |
Otolemur | garnettii | Northern greater galago | Primates | 73.00 | XP_003799435.1 | 319 | 74.00% | 79.00% |
Ochotona | princeps | American pika | Lagomorpha | 88.00 | XP_004584223. | 417 | 60.00% | 65.00% |
Cricetulus | griseus | Chinese hamster | Rodentia | 88.00 | XP_007647376.1 | 324 | 64.35% | 71.00% |
Castor | canadensis | American beaver | Rodentia | 88.00 | XP_020026748.1 | 328 | 67.00% | 73.00% |
Sorex | araneus | Common shrew | Soricomorpha | 94.00 | XP_004600963.1 | 320 | 63.87% | 70.00% |
Rousettus | aegyptiacus | Egyptian fruit bat | Chiroptera | 94.00 | XP_016019485.1 | 339 | 64.81% | 71.00% |
Leptonychotes | weddellii | Weddell seal | Carnivora | 94.00 | XP_006749032.1 | 324 | 67.68% | 72.00% |
Vulpes | vulpes | Red fox | Carnivora | 94.00 | XP_025867300.1 | 325 | 70.46% | 70.00% |
Ovis | aries | Sheep | Artiodactyla | 94.00 | XP_027833899.1 | 329 | 70.61% | 76.00% |
Elephantulus | edwardii | Cape elephant shrew | Macroscelidea | 102.00 | XP_006878955.1 | 298 | 58.12% | 61.00% |
Vombatus | ursinus | Common wombat | Marsupials | 160.00 | XP_027703451.1 | 281 | 52.00% | 61.00% |
Aptenodytes | forsteri | Emperor penguin | Birds | 320.00 | XP_009289088.1 | 262 | 37.00% | 42.00% |
Pogona | vitticeps | Central bearded dragon | Reptiles | 320.00 | XP_020667121.1 | 266 | 40.00% | 52.00% |
Notechis | scutatus | Tiger snake | Reptiles | 320.00 | XP_026531742.1 | 266 | 42.00% | 50.00% |
Python | bivittatus | Burmese python | Reptiles | 320.00 | XP_025026382.1 | 267 | 44.00% | 54.00% |
Latimeria | chalumnae | West Indian Ocean coelacanth | Fish | 414.00 | XP_014342026.1 | 275 | 40.00% | 48.00% |
Rhincodon | typus | Whale shark | Fish | 465.00 | XP_020387814.1 | 242 | 29.00% | 44.00% |
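The trend in the table, with sequence identity falling as divergence time grows, can be summarized by averaging identity at each divergence date. The sketch below is illustrative only: the values are copied from the table above, and the names `ORTHOLOGS`, `mean_identity_by_divergence`, and `is_sorted_by_divergence` are hypothetical helpers introduced for this example.

```python
from collections import defaultdict

# (common name, date of divergence in MYA, % identity to human), from the table.
ORTHOLOGS = [
    ("Human", 0.0, 100.0),
    ("Sumatran orangutan", 15.2, 95.0),
    ("Black snub-nosed monkey", 28.1, 92.0),
    ("Northern greater galago", 73.0, 74.0),
    ("American pika", 88.0, 60.0),
    ("Chinese hamster", 88.0, 64.35),
    ("American beaver", 88.0, 67.0),
    ("Common shrew", 94.0, 63.87),
    ("Egyptian fruit bat", 94.0, 64.81),
    ("Weddell seal", 94.0, 67.68),
    ("Red fox", 94.0, 70.46),
    ("Sheep", 94.0, 70.61),
    ("Cape elephant shrew", 102.0, 58.12),
    ("Common wombat", 160.0, 52.0),
    ("Emperor penguin", 320.0, 37.0),
    ("Central bearded dragon", 320.0, 40.0),
    ("Tiger snake", 320.0, 42.0),
    ("Burmese python", 320.0, 44.0),
    ("West Indian Ocean coelacanth", 414.0, 40.0),
    ("Whale shark", 465.0, 29.0),
]


def mean_identity_by_divergence(rows):
    """Average the % identity of all species sharing a divergence date."""
    groups = defaultdict(list)
    for _name, mya, identity in rows:
        groups[mya].append(identity)
    return {mya: sum(vals) / len(vals) for mya, vals in sorted(groups.items())}


def is_sorted_by_divergence(rows):
    """Check that rows are listed in order of non-decreasing divergence date."""
    dates = [mya for _name, mya, _identity in rows]
    return dates == sorted(dates)


if __name__ == "__main__":
    print("Sorted by divergence:", is_sorted_by_divergence(ORTHOLOGS))
    for mya, mean in mean_identity_by_divergence(ORTHOLOGS).items():
        print(f"{mya:6.1f} MYA: mean identity {mean:.2f}%")
```

Grouping by divergence date avoids over-weighting the dates that have several species, such as the five mammals listed at 94 MYA.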
Searches with NCBI BLAST and BLAT found no paralogous sequences similar to C16orf86, indicating that C16orf86 does not have any paralogs. The searches returned only isoforms of the sequence, with no full paralog sequences.
C16orf86 orthologs include dogs, chimpanzees, cows, rats, and mice. [37] [38]
Ortholog space: C16orf86 orthologs include only placental mammals. No other mammal groups, birds, reptiles, fungi, archaea, protists, plants, or invertebrate species have orthologs of C16orf86. The most distant ortholog group among the placental mammals, Macroscelidea, diverged from humans 102 million years ago. [39]
The most distant homologs with partial sequences matching C16orf86 include marsupial mammals, reptiles, and fish. The furthest homolog of C16orf86 was found in the whale shark, which diverged from humans 465 million years ago. [39]