C8orf33 | |
---|---|
Identifiers | |
Aliases | C8orf33, chromosome 8 open reading frame 33 |
External IDs | MGI: 2152337, HomoloGene: 11320, GeneCards: C8orf33 |

UPF0488 is a protein that in humans is encoded by the C8orf33 (chromosome 8 open reading frame 33) gene, a protein-coding gene of currently unknown function.
The UPF0488 protein is expressed at low to moderate levels in most tissues, with some exceptions. [5] It is predicted to localize to the nucleus and mitochondrion, though several orthologs are also predicted to localize to the cytosol; additionally, there is experimental evidence that human C8orf33 may localize to the peroxisomes. The expression of this gene is up-regulated after lithium exposure. C8orf33 is also significantly up-regulated during breast cancer drug treatment. [6]
Several post-translational modifications, including phosphorylation, methylation, and acetylation, are predicted. [7] The modified residues include N-acetylalanine, omega-N-methylarginine, and phosphoserine. [8]
This gene has 5 transcripts (splice variants) and 62 orthologues, and is a member of 1 Ensembl protein family. It is a member of the Human CCDS set: CCDS34974.1. [9] A C8orf33 expression profile revealed that this gene was over-expressed after lithium exposure. [10]
C8orf33 (UPF0488) has 31 alternatively spliced exons, which combine into 13 different transcript variants; the X1 variant is the longest and seems to have the greatest identity. (Figure: human tissue RNA sequencing of UPF0488.)
UPF0488 has 5 transcript splice variants. In terms of common gene haplotype alleles, the haplotype frequency is 96.3% for one variant site. The primary transcript is 3,593 bp, while a similar variant is 1,666 bp. The mRNA secondary structures of the 5’ and 3’ UTRs have different fold energies. The 5’ UTR consists of 54 bases and has a fold energy of -21.20, or -0.393 per base. The 3’ UTR consists of 1,873 bases and has a fold energy of -646.10, or -0.345 per base. [11]
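The per-base energies quoted above follow from dividing each fold energy by the length of the corresponding UTR. A minimal sketch of that arithmetic is shown here; the function name `energy_per_base` is a hypothetical helper introduced for illustration, and the source does not state the energy units, so none are assumed.

```python
def energy_per_base(fold_energy: float, length: int) -> float:
    """Return the fold energy divided by the number of bases."""
    if length <= 0:
        raise ValueError("length must be positive")
    return fold_energy / length


# Fold energies and UTR lengths as quoted in the text.
utr5 = energy_per_base(-21.20, 54)
utr3 = energy_per_base(-646.10, 1873)

print(f"5' UTR: {utr5:.3f} per base")  # 5' UTR: -0.393 per base
print(f"3' UTR: {utr3:.3f} per base")  # 3' UTR: -0.345 per base
```

Although the 3’ UTR has a far larger total fold energy, its per-base value is slightly smaller in magnitude than that of the 5’ UTR.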
According to microarray-assessed tissue expression analysis by NCBI GEO, the gene C8orf33 has average expression levels in most tissues, with exceptions including the thyroid gland and parathyroid gland. Expression seems to be low in the pancreas, small intestine, and other digestive organs, while expression in the kidney seems relatively higher. [11]
Approximate expression patterns have also been inferred from EST sources. In the Norway rat, the putative protein-coding gene is represented by 30 ESTs from 20 cDNA libraries, with EST representation biased toward the fetus. Gene expression seems to increase in the obesity-resistant categories.
The promoter region for C8orf33 covers 1,191 base pairs of DNA and contains over 700 potential transcription factor binding sites. Fifteen transcription factors with highly conserved binding sites across multiple species’ promoter regions for C8orf33 were selected and shown (see Annotated Promoter Section). One of these, CDF1 (Cycling DOF Factor 1), physically interacts with FKF1, and the CDF1 protein is more stable in FKF1 mutants. [12] Another, transcription factor II B (TFIIB), is a general transcription factor involved in the formation of the RNA polymerase II preinitiation complex (PIC). [13]
The isoelectric point (pI) of the UPF0488 protein is 9.16, based on a detailed analysis of isoelectric point according to different scales for individual proteins. This is greater than the average pI of 6.81 for the human proteome. The net charge was determined using the values given in Lehninger's biochemistry textbook. The precursor protein has a molecular weight of approximately 24.99 kDa. It contains repeats at positions 149 to 166 and 167 to 186; however, the repeats show a high degree of degeneracy. [14]
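To illustrate how an isoelectric point can be derived from a net-charge calculation, the sketch below sums Henderson-Hasselbalch terms for the ionizable groups and searches for the pH at which the net charge is zero. This is not the analysis used in the cited source: the pKa values are rounded, approximate textbook-style figures assumed for illustration, the function names are hypothetical, and the peptide is a made-up toy sequence rather than the UPF0488 sequence.

```python
# Approximate pKa values, ASSUMED for illustration only.
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}              # basic side chains
PKA_NEGATIVE = {"D": 3.9, "E": 4.3, "C": 8.3, "Y": 10.1}     # acidic side chains
PKA_N_TERM = 9.7
PKA_C_TERM = 2.3


def net_charge(seq: str, ph: float) -> float:
    """Estimate the net charge of a peptide at a given pH."""
    # Terminal groups: the N-terminus carries +1 when protonated,
    # the C-terminus carries -1 when deprotonated.
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa in seq:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Find the pH where the net charge crosses zero, by bisection."""
    lo, hi = 0.0, 14.0
    # Net charge decreases monotonically as pH rises, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


toy_peptide = "ARKRLPAAE"  # hypothetical sequence, not UPF0488
print(f"estimated pI: {isoelectric_point(toy_peptide):.2f}")
```

A sequence with more basic than acidic residues, like the toy peptide here, yields a pI above 7, which is the same qualitative pattern as the pI of 9.16 reported for UPF0488.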
UPF0488 is alanine-rich relative to other proteins and low in all other amino acids except arginine, leucine, and proline.
The evolutionary lineage of UPF0488 can be traced back as far as invertebrates, with a rate of evolution greater than that of fibrinogen.
(Figure: divergence of UPF0488 over a given time scale compared with fibrinogen and cytochrome c.) An alignment using the SDSC Biology Workbench gives a 27.7% match with Danio rerio. The ALIGN program calculates a global alignment of two sequences, giving a global alignment score of 215. [15]
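A global alignment scores two sequences end to end, rather than scoring only their best-matching local region. The sketch below computes such a score with the classic Needleman-Wunsch dynamic programming recurrence. It is an illustration only: the source does not give the scoring matrix or gap penalties behind the score of 215, so the simple match, mismatch, and gap values here are assumptions, and the input strings are toy examples rather than UPF0488 sequences.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Return the optimal global alignment score of two sequences.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic programming table to save memory.
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against empty `b`
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in `b`
            left = curr[j - 1] + gap  # gap in `a`
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("GATTACA", "GATACA"))   # 4
```

Protein alignment tools generally use substitution matrices and more elaborate gap penalties, so their scores are not directly comparable with this simplified scheme.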
The mRNA of UPF0488 has a very high level of degeneracy across organisms. Sequences with even very low identity to the human mRNA could be identified only in closely related organisms. However, the protein has far more distant relatives, including several invertebrates. Protein alignments for Homo sapiens UPF0488 were performed using the San Diego Workbench against several different taxa, including vertebrates such as Mammalia, Reptilia, and Aves, and invertebrates such as Insecta. The protein sequences for UPF0488 are very highly conserved among close relatives of Homo sapiens such as Gorilla gorilla gorilla (gorilla). Protein sequence similarity decreases as divergence time (MYA) increases (see table of homologs).
C8orf33 activity was found to be associated with the G protein-coupled receptor signaling pathway, neuroactive ligand-receptor interaction, the calcium signaling pathway, and the regulation of the actin cytoskeleton. The following substances interact with UPF0488: 7,8-dihydro-7,8-dihydroxybenzo(a)pyrene 9,10-oxide, benzo(a)pyrene, methotrexate, and vitamin E. [16] [17]
The expression of the UPF0488 gene increases after treatment with cephaloridine, a semisynthetic derivative of cephalosporin C that inhibits gluconeogenesis in both target (kidney) and non-target (liver) organs. [12]
C11orf49 is a protein coding gene that in humans encodes for the C11orf49 protein. It is heavily expressed in brain tissue and peripheral blood mononuclear cells, with the latter being an important component of the immune system. It is predicted that the C11orf49 protein acts as a kinase, and has been shown to interact with HTT and APOE2.
METTL26, previously designated C16orf13, is a protein-coding gene for Methyltransferase Like 26, also known as JFP2. Though the function of this gene is unknown, various data have revealed that it is expressed at high levels in various cancerous tissues. Underexpression of this gene has also been linked to disease consequences in humans.
Coiled Coil Domain Containing protein 42B, also known as CCDC42B, is a protein encoded by the protein-coding gene CCDC42B.
TMEM249 is a protein that in humans is encoded by the C8orfk29 gene.
Zinc finger protein 684 is a protein that in humans is encoded by the ZNF684 gene.
Fanconi Anemia Opposite Strand Transcript protein is a predicted protein that in humans is encoded by the FANCD2OS gene. The name is derived from mRNA transcribed from the strand complementary to the FANCD2 gene.
Uncharacterized protein Chromosome 16 Open Reading Frame 71 is a protein in humans, encoded by the C16orf71 gene. The gene is expressed in epithelial tissue of the respiratory system, adipose tissue, and the testes. Predicted associated biological processes of the gene include regulation of the cell cycle, cell proliferation, apoptosis, and cell differentiation in those tissue types. 1357 bp of the gene are antisense to spliced genes ZNF500 and ANKS3, indicating the possibility of regulated alternate expression.
BEND2 is a protein that in humans is encoded by the BEND2 gene. It is also found in other vertebrates, including mammals, birds, and reptiles. The expression of BEND2 in Homo sapiens is regulated and occurs at high levels in the skeletal muscle tissue of the male testis and in the bone marrow. The presence of the BEN domains in the BEND2 protein indicates that this protein may be involved in chromatin modification and regulation.
C11orf42 is an uncharacterized protein in Homo sapiens that is encoded by the C11orf42 gene. It is also known as chromosome 11 open reading frame 42 and uncharacterized protein C11orf42, with no other aliases. The gene is mostly conserved in mammals, but it has also been found in rodents, reptiles, fish and worms.
Chromosome 9 open reading frame 50 is a protein that in humans is encoded by the C9orf50 gene. C9orf50 has one other known alias, FLJ35803. In humans the gene coding sequence is 10,051 base pairs long, transcribing an mRNA of 1,624 bases that encodes a 431 amino acid protein.
WD Repeat and Coiled-coil containing protein (WDCP) is a protein which in humans is encoded by the WDCP gene. The function of the protein is not completely understood, but WDCP has been identified in a fusion protein with anaplastic lymphoma kinase found in colorectal cancer. WDCP has also been identified in the MRN complex, which processes double-stranded breaks in DNA.
Transmembrane protein 169 (TMEM169) in humans is encoded by the TMEM169 gene. The aliases of TMEM169 include FLJ34263, DKFZp781L2456, and LOC92691. TMEM169 has the highest expression in the brain, particularly the fetal brain. TMEM169 has homologs in mammals, reptiles, amphibians, birds, fish, chordates, and invertebrates. The most distantly related homolog of TMEM169 is found in Anopheles albimanus.
FAM214B, also known as family with sequence similarity 214, B, is a protein that, in humans, is encoded by the FAM214B gene located on human chromosome 9. The protein has 538 amino acids. The gene contains 9 exons. Studies have reported low expression of this gene in patients with major depressive disorder. In most organisms, such as mammals, amphibians, reptiles, and birds, there are high levels of gene expression in the bone marrow and blood. In humans during fetal development, FAM214B is mostly expressed in the brain and bone marrow.
FAM120AOS, or family with sequence similarity 120A opposite strand, codes for uncharacterized protein FAM120AOS, which currently has no known function. The gene ontology describes the gene to be protein binding. Overall, it appears that the thyroid and the placenta are the two tissues with the highest expression levels of FAM120AOS across a majority of datasets.
Family with Sequence Similarity 166, member C (FAM166C), is a protein encoded by the FAM166C gene. The protein FAM166C is localized in the nucleus. It has a calculated molecular weight of 23.29 kDa. It also contains DUF2475, a protein of unknown function from amino acid 19–85. The FAM166C protein is nominally expressed in the testis, stomach, and thyroid.
Solute carrier family 66 member 3 is a gene in humans that encodes the protein SLC66A3. The function of the SLC66A3 protein is not yet well understood, but it belongs to a family of five evolutionarily related proteins, the SLC66 lysosomal amino acid transporters. SLC66A3 is localized to the endoplasmic reticulum and has four transmembrane domains.
GPATCH2L is a protein that is encoded by the GPATCH2L human gene located at 14q24.3. In humans, the length of the GPATCH2L mRNA (NM_017926) is 14,021 base pairs, and the gene spans 62,422 nt between chr14: 76,151,922 - 76,214,343. GPATCH2L is on the positive strand. IFT43 is the gene directly before GPATCH2L on the positive strand, and LOC105370575 is the uncharacterized gene on the negative strand, which is approximately one and a half times the size of GPATCH2L. Known aliases for GPATCH2L include C14orf118, FLJ20689, FLJ10033, and KIAA1152. GPATCH2L produces 28 distinct introns, 17 different mRNAs, 14 alternatively spliced variants, and 3 unspliced forms. It has 5 probable alternative promoters, 7 validated polyadenylation sites, and 6 predicted promoters of varying lengths.
Chromosome 5 open reading frame 22 (C5orf22) is a protein-coding gene of poorly characterized function in Homo sapiens. The primary alias is unknown protein family 0489 (UPF0489).
THAP domain-containing protein 3 (THAP3) is a protein that, in Homo sapiens (humans), is encoded by the THAP3 gene. The THAP3 protein is also known as MGC33488, LOC90326, and THAP domain-containing, apoptosis associated protein 3. This protein contains the Thanatos-associated protein (THAP) domain and a host-cell factor 1C binding motif. These domains allow THAP3 to influence a variety of processes, including transcription and neuronal development. THAP3 is ubiquitously expressed in H. sapiens, though expression is highest in the kidneys.
Leucine-rich repeat-containing protein 74A (LRRC74A), is a protein encoded by the LRRC74A gene. The protein LRRC74A is localized in the cytoplasm. It has a calculated molecular weight of approximately 55 kDa. The LRRC74A protein is nominally expressed in the testis, salivary gland, and pancreas.