C8orf34 identifiers | |
---|---|
Aliases | C8orf34, VEST-1, VEST1, chromosome 8 open reading frame 34 |
External IDs | MGI: 2444149; HomoloGene: 14194; GeneCards: C8orf34 |
C8orf34 is a protein that, in Homo sapiens, is encoded by the C8orf34 gene. [4] Aliases for C8orf34 include vestibule-1 and VEST-1. Within the cell, C8orf34 is localized to the nucleus and nucleoli, where it may play a role in the regulation of gene expression and of the cell cycle.
The C8orf34 gene is located on the positive-sense strand of chromosome 8 at locus 8q13.2. In the NCBI genome assembly GRCh38.p12, it spans positions 68330373 to 68819023. [5] It is 635 kbp in length and contains 14 exons. Of the seven possible C8orf34 transcripts, the longest is 2452 base pairs and encodes 538 amino acids. [6]
Several gene loci lie near the C8orf34 gene along chromosome 8. While many of these are non-functional pseudogenes, a few are functional and protein-coding. The nearest protein-encoding gene to C8orf34 is PREX2, which encodes a guanine-nucleotide exchange factor for the Rac family of G proteins. [7] The PREX2 protein is involved in insulin signalling pathways. Mutations in, and overexpression of, the PREX2 gene have been observed in some cancers. [8]
Gene | Location | Function | NCBI Gene ID |
---|---|---|---|
PREX2 | 67951918...68237033 | facilitates the exchange of GDP for GTP on Rac1 (a GTPase) | 80243 [7] |
LOC105375888 | 68082051...68095535 | uncharacterized | 105375888 [9] |
LOC107986951 | 68849606...68858076 | uncharacterized | 107986951 [10] |
LOC108004543 | 68973432...68976574 | non-coding, known to undergo non-allelic homologous recombination (NAHR) with another region | 108004543 [11] |
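The distances between C8orf34 and the loci in the table follow directly from the coordinates. The short Python sketch below is illustrative only: it assumes all coordinates are inclusive positions on the same chromosome 8 assembly (GRCh38.p12), and the names `gap` and `NEIGHBORS` are hypothetical helpers rather than part of any annotation tool.

```python
# Coordinates copied from the text and table above (assumed inclusive, GRCh38.p12).
C8ORF34 = (68330373, 68819023)

NEIGHBORS = {
    "PREX2": (67951918, 68237033),
    "LOC105375888": (68082051, 68095535),
    "LOC107986951": (68849606, 68858076),
    "LOC108004543": (68973432, 68976574),
}


def gap(a, b):
    """Number of bases separating two intervals; 0 if they overlap or touch."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return b_start - a_end - 1
    if b_end < a_start:
        return a_start - b_end - 1
    return 0


# Distance in base pairs from C8orf34 to each neighboring locus.
gaps = {name: gap(C8ORF34, interval) for name, interval in NEIGHBORS.items()}

for name, distance in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {distance} bp")
```

Under these assumptions, PREX2 ends roughly 93 kbp upstream of C8orf34. LOC107986951 lies closer (about 31 kbp downstream) but is uncharacterized, which is consistent with PREX2 being described as the nearest protein-encoding neighbor.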
Within the cell, C8orf34 is found primarily in the nucleus. The C8orf34 protein lacks a signal peptide that would direct it outside the nucleus or to other organelles. An analysis with PSORT II predicted that C8orf34 is localized to the nucleus with 94.1% reliability. [12] This nuclear localization suggests that the C8orf34 protein may have a function related to the expression and regulation of genes in the nucleus. Alternatively, it may be involved in the maintenance and protection of the cell's genetic material.
C8orf34 is expressed in a wide array of tissues, including the kidney, stomach, thymus, pituitary gland, ear, and brain. [6] [14] In the brain, C8orf34 is expressed in the dentate gyrus, epithalamus, and medulla. [15] In the mouse brain, the C8orf34 ortholog is highly expressed in the granule layer of the dentate gyrus, the somatosensory areas of the cerebral cortex, and the amygdala. [16]
Several different transcription factors regulate the expression of the C8orf34 gene. Many of these transcription factors are related to regulation of the cell's progression through the cell cycle and longevity, suggesting that C8orf34 performs a function related to these processes. [17]
Transcription factor | Function |
---|---|
OCT1 | Involved in the cell cycle regulation of histone H2B gene transcription and in the transcription of other cellular housekeeping genes. [18] |
STAT3 | Involved in the expression of genes that drive cell-cycle progression from G1 to S phase. Acts as a regulator of the inflammatory response by regulating differentiation of naive CD4+ T-cells into T-helper Th17 or regulatory T-cells (Treg). [19] |
HSF1 | Rapidly induced after temperature stress and binds heat shock promoter elements (HSE). This protein plays a role in the regulation of lifespan. [20] |
MZF1 | Expressed in hematopoietic progenitor cells that are committed to myeloid lineage differentiation. It contains 13 C2H2 zinc fingers arranged in two domains that are separated by a short glycine- and proline-rich sequence. [21] |
The protein product of the C8orf34 gene is 538 amino acids in length, with a predicted molecular weight of 59 kDa and an isoelectric point of 5.9. [22] At the cellular level, several pieces of evidence support the conclusion that C8orf34 plays a role in the regulation of gene expression and of the cell cycle.
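The predicted molecular weight can be sanity-checked from the sequence length alone. The sketch below uses the common rule of thumb of about 110 Da per amino acid residue, which is an approximation and not the method used by the cited prediction tool; the function name is hypothetical.

```python
# Assumption: average residue mass of ~110 Da, a widely used rule of thumb.
AVG_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues, avg_residue_mass_da=AVG_RESIDUE_MASS_DA):
    """Rough protein mass in kDa estimated from the residue count only."""
    return n_residues * avg_residue_mass_da / 1000.0


# C8orf34 is 538 amino acids long according to the text above.
estimate = estimate_mass_kda(538)
print(f"Estimated mass: {estimate:.1f} kDa")
```

For 538 residues this gives roughly 59 kDa, in line with the predicted value; an exact figure depends on the actual amino acid composition.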
C8orf34 has a domain annotated as "Dimerization-anchoring domain of cAMP-dependent protein kinase regulatory subunit" that spans residues 94 to 133. [23] Proteins with this domain are subunits of a multimeric protein kinase. [24] The negatively charged region in the middle of the protein may indicate a site of metal-ion coordination, a common feature of proteins that interact with DNA, including zinc-finger proteins. [25]
C8orf34 protein undergoes few modifications following translation and is not cleaved. There are eight sites along the protein that are likely candidates for glycosylation and 27 probable sites for phosphorylation. There are four predicted SUMOylation sites in C8orf34. [26] Each of these post-translational modifications is expected to have some effect on the protein. O-glycosylation may influence the sorting of a protein and the protein's conformation. [27] In some cases, glycosylation may play a role in adhesion and immunological processes. [28] Phosphorylation of amino acid residues may serve to activate or deactivate the functional domain of C8orf34. [29] SUMOylation sites are residues to which SUMO (small ubiquitin-like modifier) proteins can attach to modify the protein's function. [30] SUMO modification can affect many processes, including nuclear-cytosolic transport, transcriptional regulation, cell-cycle progression, and apoptosis. [31]
The secondary structure of C8orf34 is predicted to consist mostly of random coils, with alpha helices being the dominant organized structure. [33] Alpha helices are a common motif in proteins that regulate gene expression and may support this function in C8orf34. [34] The structure prediction and analysis application Phyre2 reported that a portion of C8orf34 has close structural similarity to a yeast histone H3 lysine 4 (H3K4) methyltransferase, an enzyme that influences gene expression by catalyzing methylation of histone H3. [35] [36]
Software-based predictions and experimental results suggest several possible functions for C8orf34. Alpha helices are commonly found in DNA-binding motifs of proteins, including helix-turn-helix motifs and zinc finger motifs. Because C8orf34 is localized to the nucleus, its high alpha-helix content further supports the possibility that it is involved in the regulation of gene expression. [37] The protein kinase dimerization domain within C8orf34, in combination with its presence in the nucleus, may indicate that it is a type of histone kinase. [38]
C8orf34 is conserved across several animal clades, where it is found as an orthologous protein. No paralogs of C8orf34, which would arise from a gene duplication event, have been observed in the human genome. [39]
Orthologs of C8orf34 exist in many species. C8orf34 seems to have appeared first in cnidarians, with sea anemones holding its most distant ortholog. An ortholog most similar in structure and function to human C8orf34 likely arose in aquatic chordates: sequence identity to the human protein is about 70% in cartilaginous and bony fishes such as the Australian ghostshark and zebrafish, compared with roughly 27% to 31% in invertebrate orthologs. No homolog of C8orf34 has been found in arthropods. [39] This clade may have evolved to no longer need C8orf34 for whatever function it served. Alternatively, arthropod species may have a substitute for C8orf34 that performs a similar function.
Organism | Scientific name | NCBI accession [39] | Identity (%) | Sequence length (aa) | Estimated time of divergence (MYA) [40] |
---|---|---|---|---|---|
Human | Homo sapiens | NP_443190.2 | 100.00% | 538 | 0.00 |
Gorilla | Gorilla gorilla gorilla | XP_004047177.2 | 99.44% | 538 | 9.06 |
Chimpanzee | Pan troglodytes | NP_001186058.1 | 99.26% | 538 | 6.65 |
Dog | Canis lupus familiaris | NP_001182595.1 | 91.59% | 451 | 96.00 |
Mouse | Mus musculus | NP_001153841.1 | 90.71% | 462 | 90.00 |
Chinchilla | Chinchilla lanigera | XP_013373625.1 | 90.48% | 456 | 90.00 |
Cat | Felis catus | XP_019678323.2 | 88.13% | 537 | 96.00 |
Horse | Equus caballus | XP_023504264.1 | 86.43% | 534 | 96.00 |
Thirteen-lined ground squirrel | Ictidomys tridecemlineatus | XP_021580557.1 | 85.53% | 538 | 90.00 |
Chicken | Gallus gallus | XP_025003758.1 | 83.73% | 620 | 312.00 |
American alligator | Alligator mississippiensis | XP_019354134.1 | 82.20% | 678 | 312.00 |
White-throated sparrow | Zonotrichia albicollis | XP_026647522.1 | 79.78% | 657 | 312.00 |
Western clawed frog | Xenopus tropicalis | XP_002935369.2 | 77.23% | 621 | 352.00 |
Common box turtle | Terrapene mexicana triunguis | XP_026503128.1 | 77.21% | 414 | 312.00 |
Australian ghostshark | Callorhinchus milii | XP_007885522.1 | 70.80% | 709 | 473.00 |
Zebrafish | Danio rerio | XP_005162763.1 | 70.65% | 626 | 435.00 |
Lamp shell | Lingula anatina | XP_013381780.1 | 30.73% | 517 | 797.00 |
C. teleta | Capitella teleta | ELU06153.1 | 29.00% | 516 | 797.00 |
Eastern oyster | Crassostrea virginica | XP_022341487.1 | 26.91% | 500 | 797.00 |
Exaiptasia (sea anemone) | Exaiptasia pallida | XP_020895362.1 | 26.65% | 548 | 824.00 |
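The relationship between divergence time and sequence identity in the table can be summarized by averaging identity over species that share an estimated divergence time. The Python sketch below is only illustrative; the values are copied from the table, and the grouping approach is an assumption rather than the analysis used by the cited sources.

```python
from collections import defaultdict

# (organism, identity %, estimated divergence from humans in MYA), from the table above.
ORTHOLOGS = [
    ("Human", 100.00, 0.00),
    ("Gorilla", 99.44, 9.06),
    ("Chimpanzee", 99.26, 6.65),
    ("Dog", 91.59, 96.00),
    ("Mouse", 90.71, 90.00),
    ("Chinchilla", 90.48, 90.00),
    ("Cat", 88.13, 96.00),
    ("Horse", 86.43, 96.00),
    ("Thirteen-lined ground squirrel", 85.53, 90.00),
    ("Chicken", 83.73, 312.00),
    ("American alligator", 82.20, 312.00),
    ("White-throated sparrow", 79.78, 312.00),
    ("Western clawed frog", 77.23, 352.00),
    ("Common box turtle", 77.21, 312.00),
    ("Australian ghostshark", 70.80, 473.00),
    ("Zebrafish", 70.65, 435.00),
    ("Lamp shell", 30.73, 797.00),
    ("C. teleta", 29.00, 797.00),
    ("Eastern oyster", 26.91, 797.00),
    ("Exaiptasia (sea anemone)", 26.65, 824.00),
]


def mean_identity_by_divergence(rows):
    """Average percent identity for each distinct divergence time."""
    groups = defaultdict(list)
    for _name, identity, mya in rows:
        groups[mya].append(identity)
    return {mya: sum(vals) / len(vals) for mya, vals in sorted(groups.items())}


means = mean_identity_by_divergence(ORTHOLOGS)
for mya, identity in means.items():
    print(f"{mya:7.2f} MYA  {identity:6.2f}%")
```

With these values, mean identity stays above 70% for every vertebrate group (up to 473 MYA) and falls below 30% for the invertebrate orthologs at 797 MYA and beyond.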
Yeast two-hybrid experiments have revealed that C8orf34 interacts with a number of proteins localized to the nucleus. [41] The protein has been shown to interact with ubiquitin C, a precursor of polyubiquitin, which has various effects on the cell cycle depending on the residues to which it is conjugated. C8orf34 has also demonstrated interactions with MTUS2 (microtubule associated tumor suppressor candidate 2). Little information is available about this candidate protein, but it is likely to be involved in tumor suppression and cell cycle regulation. [42] C8orf34 also interacts with MCM7 (mini chromosome maintenance complex component 7), part of a protein complex that functions in the initiation of eukaryotic genome replication during the cell cycle. [43] C8orf34's interactions with these proteins support the conclusion that it is involved in transcription regulation and cell cycle progression.
Studies have associated C8orf34 with several diseases. Mutations within C8orf34 are associated with risk for diarrhea and neutropenia in patients receiving chemotherapy. [44] A translocation causing a fusion of the C8orf34 gene with the MET proto-oncogene has been found in tissue samples from patients with papillary renal carcinoma. [45] A Japanese patent application describes a procedure that scans for mutations in C8orf34 as a method for detecting a congenital disease that causes hearing loss. [46]