UPF0488

C8orf33
Identifiers
Aliases: C8orf33, chromosome 8 open reading frame 33
External IDs: MGI: 2152337; HomoloGene: 11320; GeneCards: C8orf33
Orthologs
Species: Human; Mouse
Entrez
Ensembl
UniProt
RefSeq (mRNA): Human NM_023080; Mouse NM_054099, NM_001347540
RefSeq (protein): Human NP_075568; Mouse NP_001334469, NP_473440
Location (UCSC): Human Chr 8: 145.05–145.07 Mb; Mouse Chr 15: 76.83–76.84 Mb
PubMed search: Human [3]; Mouse [4]

UPF0488 is a protein that in humans is encoded by the C8orf33 (chromosome 8 open reading frame 33) gene. C8orf33 is a human protein-coding gene whose function is currently unknown.

Tissue and subcellular distribution

The UPF0488 protein is expressed at low to moderate levels in most tissues, with some exceptions. [5] It is predicted to localize to the nucleus and mitochondrion, though several orthologs are also predicted to localize to the cytosol; additionally, there is experimental evidence that human C8orf33 may localize to peroxisomes. Expression of the gene is up-regulated after lithium exposure, and C8orf33 is significantly up-regulated in breast cancer drug treatment. [6]

Post-translational modification

Several post-translational modifications, including phosphorylation, methylation, and acetylation, are predicted. [7] The modified residues reported include N-acetylalanine, omega-N-methylarginine, and phosphoserine. [8]

Gene

This gene has 5 transcripts (splice variants) and 62 orthologues, and is a member of 1 Ensembl protein family. It is a member of the Human CCDS set: CCDS34974.1. [9] The C8orf33 expression profile revealed that the gene was over-expressed after lithium exposure. [10]

C8orf33 (UPF0488) has 31 alternatively spliced exons, which combine in 13 different transcript variants; the X1 variant is the longest and seems to have the greatest identity. Human tissue RNA sequencing of UPF0488.

Transcript

UPF0488 has 5 transcript splice variants. In terms of common gene haplotype alleles, the haplotype frequency is 96.3% for one variant site. The primary transcript is 3,593 bp, while a similar variant is 1,666 bp. The predicted mRNA secondary structures of the 5’ and 3’ UTRs have different fold energies. The 5’ UTR consists of 54 bases and has a fold energy of -21.20, or -0.393 per base. The 3’ UTR consists of 1,873 bases and has a fold energy of -646.10, or -0.345 per base. [11]
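
The smaller values quoted for each UTR (-0.393 and -0.345) are consistent with the fold energy divided by the number of bases in the region. A minimal Python sketch of that arithmetic, assuming this interpretation is correct (the function name `energy_per_base` is illustrative and does not come from the source):

```python
def energy_per_base(fold_energy: float, length: int) -> float:
    """Return the fold energy divided by the region length, rounded to 3 decimals."""
    if length <= 0:
        raise ValueError("length must be a positive number of bases")
    return round(fold_energy / length, 3)


# Values taken from the text: (fold energy, number of bases) for each UTR.
utr5 = energy_per_base(-21.20, 54)
utr3 = energy_per_base(-646.10, 1873)
print(utr5, utr3)
```

Normalizing by length makes the two regions comparable: the 3’ UTR has a far larger total fold energy mainly because it is much longer.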

Expression

According to microarray-assessed tissue expression analysis by NCBI GEO, C8orf33 has average expression levels in most tissues, with exceptions including the thyroid gland and parathyroid gland. Expression seems to be low in the pancreas, small intestine and other digestive organs, while expression in the kidney seems relatively higher. [11]

For the Norway rat ortholog, a putative protein-coding gene, approximate expression patterns have been inferred from EST sources. The gene is represented by 30 ESTs from 20 cDNA libraries, with EST representation biased toward the fetus. Gene expression seems to increase in the obesity-resistant categories.

Promoter

The promoter region for C8orf33 covers 1,191 base pairs of DNA and contains over 700 potential transcription factor binding sites. Fifteen transcription factors with binding sites highly conserved across the C8orf33 promoter regions of multiple species were selected and shown (see Annotated Promoter Section). One of these, CDF1 (Cycling DOF Factor 1), physically interacts with FKF1, and the CDF1 protein is more stable in FKF1 mutants. [12] Another, transcription factor II B (TFIIB), is a general transcription factor involved in the formation of the RNA polymerase II preinitiation complex (PIC). [13]

Protein

The isoelectric point (pI) of the UPF0488 protein is 9.16, based on an analysis of the isoelectric point according to different scales for individual proteins. This is greater than the average pI of 6.81 for the human proteome. The net charge was determined using the values available in Lehninger's biochemistry textbook. The precursor protein has a molecular weight of approximately 24.99 kDa. It contains repeats at residues 149 to 166 and 167 to 186; however, the repeats show a high degree of degeneracy. [14]
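
A pI calculation of this kind typically sums the fractional charges of the ionizable groups at a given pH and searches for the pH at which the net charge is zero. The Python sketch below illustrates the idea under stated assumptions: the pKa values are approximate textbook-style values and may not match the exact scale used for the figure above, the function names are illustrative, and the example sequences are hypothetical rather than the UPF0488 sequence.

```python
# Approximate pKa values (an assumption, not taken from the source).
PKA_POSITIVE = {"K": 10.5, "R": 12.4, "H": 6.0}               # basic side chains
PKA_NEGATIVE = {"D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.0}   # acidic side chains
PKA_N_TERM = 9.69
PKA_C_TERM = 2.34


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch fractions)."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))   # N-terminal amino group
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # C-terminal carboxyl group
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return positive - negative


def isoelectric_point(sequence: str, low: float = 0.0, high: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Find the pH where the net charge crosses zero, by bisection."""
    # Net charge decreases monotonically with pH, so bisection is sufficient.
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Hypothetical peptides: a lysine-rich one should have a higher pI than an aspartate-rich one.
print(isoelectric_point("AKKRA"), isoelectric_point("ADDEA"))
```

A protein rich in basic residues such as arginine is pushed toward a high pI, which is in line with the basic pI reported for UPF0488.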

UPF0488 is an alanine-rich protein relative to other proteins, and is low in all other amino acids besides arginine, leucine, and proline.

Homology and evolution

The evolutionary lineage of UPF0488 can be traced back as far as invertebrates, with a rate of evolution greater than that of fibrinogen.

Graph shows divergence of UPF0488 over a given time scale compared to fibrinogen and cytochrome c. An alignment using the SDSC Biology Workbench gives a 27.7% match with Danio rerio. The ALIGN program calculates a global alignment of two sequences; here it gives a global alignment score of 215. [15]
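
To illustrate what a global alignment score and a percent match mean, the Python sketch below aligns two short sequences end to end. It is not the ALIGN program: it uses the standard quadratic-space Needleman-Wunsch dynamic programming rather than the linear-space method cited above, and the match, mismatch and gap values, the function names and the example sequences are all assumptions chosen for illustration, so it will not reproduce the score of 215.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap            # prefix of a aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap            # prefix of b aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding the same residue in both sequences."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


# Hypothetical toy sequences, not UPF0488 or its Danio rerio ortholog.
total, row_a, row_b = global_align("GATTACA", "GATACA")
print(total, row_a, row_b, round(percent_identity(row_a, row_b), 1))
```

Because a global alignment must span both sequences in full, the score and percent match depend heavily on the scoring scheme, which is why figures from different tools are not directly comparable.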

The mRNA of UPF0488 has a very high level of degeneracy across organisms. Sequences with even very low identity to the human mRNA could only be identified in closely related organisms. However, the protein has far more distant relatives, including several invertebrates. Protein alignments for Homo sapiens UPF0488 were performed using the San Diego Workbench against several different taxa, including vertebrates such as Mammalia, Reptilia and Aves, and invertebrates such as Insecta. The protein sequences for UPF0488 are very highly conserved amongst close relatives of Homo sapiens such as Gorilla gorilla gorilla (gorilla). Protein sequence similarity decreases as divergence time (MYA) increases (table of homologs).

Function

C8orf33 activity was found to be associated with the G protein-coupled receptor signaling pathway, neuroactive ligand-receptor interaction, the calcium signaling pathway and the regulation of the actin cytoskeleton. The following substances interact with UPF0488: 7,8-dihydro-7,8-dihydroxybenzo(a)pyrene 9,10-oxide, benzo(a)pyrene, methotrexate, and vitamin E. [16] [17]

Pathology

The expression of the UPF0488 gene increases after treatment with cephaloridine, a semisynthetic derivative of cephalosporin C that inhibits gluconeogenesis in both target (kidney) and non-target (liver) organs. [12]

References

  1. GRCh38: Ensembl release 89: ENSG00000182307 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000063236 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "C8orf33 chromosome 8 open reading frame 33 [Homo sapiens]". Entrez Gene.
  6. Ma C, Chen HI, Flores M, Huang Y, Chen Y (2013). "BRCA-Monet: a breast cancer specific drug treatment mode-of-action network for treatment effective prediction using large scale microarray database". BMC Systems Biology. 7 (Suppl 5): S5. doi:10.1186/1752-0509-7-S5-S5. PMC 4029357. PMID 24564956.
  7. "UPF0488 protein C8orf33 [Homo sapiens]". Entrez Protein.
  8. "C8orf33". Gene Cards.
  9. "Gene: C8orf33 ENSG00000182307". Ensembl. European Bioinformatics Institute – European Molecular Biology Laboratory.
  10. Aitchison K, Serretti A, Goldman D, Curran S, Drago A, Malhotra AK (2009). "The 8th annual pharmacogenetics in psychiatry meeting report". The Pharmacogenomics Journal. 9 (6): 358–61. doi:10.1038/tpj.2009.47. PMC 2945913. PMID 19841640.
  11. "C8orf33 tissue". The Human Protein Atlas.
  12. Goldstein RS, Smith PF, Tarloff JB, Contardi L, Rush GF, Hook JB (1988). "Biochemical mechanisms of cephaloridine nephrotoxicity". Life Sciences. 42 (19): 1809–16. doi:10.1016/0024-3205(88)90018-5. PMID 3285106.
  13. Lewin B (2004). Genes VIII (8th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall. pp. 636–637. ISBN 978-0-13-143981-8.
  14. "Detection and alignment of repeats in protein sequences". Radar.
  15. Myers EW, Miller W (1988). "Optimal alignments in linear space". Computer Applications in the Biosciences. 4 (1): 11–7. doi:10.1093/bioinformatics/4.1.11. PMID 3382986.
  16. "C8ORF33". Comparative Toxicogenomics Database.
  17. Squassina A, Manchia M, Del Zompo M (2010). "Pharmacogenomics of mood stabilizers in the treatment of bipolar disorder". Human Genomics and Proteomics. 2010: 159761. doi:10.4061/2010/159761. PMC 2958627. PMID 20981231.