Putative uncharacterized protein C6orf52 (C6orf52) is a protein that in humans is encoded by the C6orf52 gene and has six known isoforms. [1] C6orf52 was identified in 2002 by the National Institutes of Health Mammalian Gene Collection (MGC) Program. [2] C6orf52 has one known paralog, tRNA selenocysteine 1-associated protein 1 (TRNAU1AP). [3]
The cytogenetic location of C6orf52 is 6p24.2, on the short arm of chromosome 6. [4] The gene is 23,379 nucleotides long, spanning from nucleotide 10671418 to 10694797, and contains 9 exons; its protein product has a molecular weight of 17,383 Da. C6orf52 has no common aliases, although the major protein product is sometimes referred to as "Q5T4I8". [5]
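The quoted gene length can be checked directly from the two coordinates. The following is a minimal sketch; the function name `span_length` and the `inclusive` flag are illustrative choices, not part of any genome-browser API. Note that the figure of 23,379 corresponds to the plain difference of the coordinates; counting both endpoints would give one more.

```python
# Coordinates quoted in the text for C6orf52 on chromosome 6.
GENE_START = 10_671_418
GENE_END = 10_694_797


def span_length(start: int, end: int, inclusive: bool = False) -> int:
    """Return the distance between two coordinates.

    With inclusive=True, both endpoint nucleotides are counted.
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + (1 if inclusive else 0)


print(span_length(GENE_START, GENE_END))                  # 23379
print(span_length(GENE_START, GENE_END, inclusive=True))  # 23380
```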
C6orf52 undergoes alternative splicing, and its protein product Q5T4I8 has six known isoforms of varying amino acid length.
Isoform | Polypeptide Length (amino acids) |
---|---|
X1 | 207 |
X2 | 182 |
X3 | 177 |
X4 | 176 |
X5 | 126 |
X6 | 93 |
Compared with the average human protein, C6orf52 is relatively rich in glutamic acid and serine residues and relatively poor in tryptophan and arginine. [6] [7]
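A composition comparison of this kind rests on per-residue fractions. The sketch below shows one way such fractions could be computed from a one-letter amino acid sequence. The helper `residue_fractions` and the short peptide are hypothetical; the peptide is not the C6orf52 sequence, and no reference values for the average human protein are assumed here.

```python
from collections import Counter


def residue_fractions(sequence: str) -> dict[str, float]:
    """Return the fraction of each one-letter residue code in a sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {residue: n / len(seq) for residue, n in counts.items()}


# Hypothetical 10-residue peptide used only for illustration.
example = "MESSEEKSLW"
fractions = residue_fractions(example)

# Glutamic acid (E) and serine (S) each occur 3 times out of 10 residues.
print(fractions["E"])
print(fractions["S"])
```

Fractions obtained this way for a real sequence could then be set against a published background composition to decide which residues are over- or under-represented.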
C6orf52 has two commonly predicted post-translational modifications within the highly conserved domain. [8] [9] The lysine at position 123 (of the major protein) is predicted to undergo sumoylation, while the tyrosine at position 128 is predicted to undergo phosphorylation. Sumoylation sites allow for the binding of SUMO (small ubiquitin-like modifier protein), which is known to alter functional parameters of proteins such as subcellular localization, protein partnering, DNA binding and the transactivation functions of transcription factors. [10] Tyrosine phosphorylation is associated with growth factor signaling and with cell differentiation during development, which are recurring aspects of C6orf52. [11]
The secondary structure of C6orf52 consists mostly of coiled regions; however, there is an extended alpha helix region within the highly conserved domain. [12] [13]
It is predicted to be a non-transmembrane protein that is located within the nucleus. [14]
Tissue expression is highest within the oocyte, with high expression in the testes and female gonad. [15]
Expression is extremely high (2000-3000 transcripts per million) in the first stages of embryonic development, up to the blastocyst stage.
Two proteins in cattle that have been linked to fat or energy metabolism were predicted to be similar to C6orf52; however, no known clinical study has examined C6orf52. [16]
C6orf52 has one identified paralog, tRNA selenocysteine 1-associated protein 1 (TRNAU1AP), which is located on chromosome 1 at 1p35.3. [3] TRNAU1AP is involved in selenocysteine biosynthesis and in enhancing the efficiency of selenoprotein synthesis, and may be involved in the methylation of tRNA(Sec). [17]
C6orf52 is conserved across many species. It can be found in many mammals, reptiles, and birds, such as the zebra finch. [18]
Scientific Name | Common Name | Accession | Sequence Similarity (%) | Estimated Date of Divergence (MYA) [19] |
---|---|---|---|---|
Sus scrofa | Wild Boar | NP_001138494.1 | 60.976 | 96 |
Taeniopygia guttata | Zebra Finch | XP_004175377.1 | 56.90 | 312 |
Ailuropoda melanoleuca | Giant Panda | XP_019651607.1 | 64.43 | 96 |
Pelodiscus sinensis | Chinese Softshell Turtle | XP_006138812.1 | 42.55 | 312 |
Canis lupus familiaris | Dog | XP_005640089.1 | 64.29 | 96 |
Pan troglodytes | Common Chimpanzee | XP_009448762.2 | 98.03 | 6.65 |
Macaca mulatta | Rhesus macaque | NP_001180810.1 | 94.08 | 29.44 |
There is a domain of high conservation across species starting near the last third of the polypeptide.
Uncharacterized protein C1orf21, also known as Proliferation-Inducing Protein 13, is a protein that in humans is encoded by the C1orf21 gene. C1orf21 is an intracellular protein that moves between the nucleus and the cytoplasm. It has been linked with cell growth and reproduction, and it has been strongly linked with various types of cancer. There are no paralogs for this gene; however, many conserved orthologs have been found in all invertebrates. C1orf21 has a low to moderate level of expression in most human tissues, with the highest expression in the skin, lung and prostate.
Chromosome 20 open reading frame 111, or C20orf111, is a hypothetical protein that in humans is encoded by the C20orf111 gene. C20orf111 is also known as Perit1, HSPC207, and dJ1183I21.1. It was originally located using genomic sequencing of chromosome 20. The National Center for Biotechnology Information (NCBI) shows that it is located at q13.11 on chromosome 20; however, the genome browser at the University of California-Santa Cruz (UCSC) website shows that it is at location q13.12, within a million base pairs of the adenosine deaminase locus. Its expression was also found to increase in cells undergoing hydrogen peroxide (H₂O₂)-induced apoptosis. Analysis of the amino acid content of C20orf111 showed it to be rich in serine residues.
Transmembrane protein 251, also known as C14orf109 or UPF0694, is a protein that in humans is encoded by the TMEM251 gene. One notable feature of this protein is the presence of proline residues in one of its predicted transmembrane domains, which is a determinant of the intramitochondrial sorting of inner membrane proteins.
WD repeat-containing protein 90 is a protein that in humans is encoded by the WDR90 gene (16p13.3). The human protein is 1750 amino acids long and has a molecular weight of 187.7 kDa. It contains multiple WD40 repeat domains and one domain of unknown function. The protein is conserved as far back as invertebrates. Proteins containing WD transducin repeat domains have been found to play a role in a variety of functions ranging from signal transduction and transcription regulation to cell cycle control, autophagy and apoptosis.
Chromosome 15 open reading frame 52 is a human protein encoded by the C15orf52 gene. Its function is poorly understood.
PRR29 is a protein encoded by the PRR29 gene located in humans on chromosome 17 at 17q23.
C14orf93 is a protein that is encoded in humans by the C14orf93 gene. It is a globular protein with a conserved C-terminus that is localized to the nucleus. While expressed relatively highly in all tissues except nervous tissue, it is expressed particularly highly in T cells and other immune tissues.
Chromosome 8 open reading frame 58 is an uncharacterised protein that in humans is encoded by the C8orf58 gene. The protein is predicted to be localized in the nucleus.
Chromosome 21 Open Reading Frame 58 (C21orf58) is a protein that in humans is encoded by the C21orf58 gene.
Uncharacterized protein C17orf50 is a protein which in humans is encoded by the C17orf50 gene.
Chromosome 1 open reading frame 112 is a protein that in humans is encoded by the C1orf112 gene, which is located at position 1q24.2. C1orf112 encodes seventeen mRNA variants, fifteen of which produce functional proteins. C1orf112 has a determined precursor molecular weight of 96.6 kDa and an isoelectric point of 5.62. C1orf112 has been experimentally determined to localize to the mitochondria, although it does not contain a mitochondrial targeting sequence.
Chromosome 3 open reading frame 67 or C3orf67 is a protein that in humans is encoded by the gene C3orf67. The function of C3orf67 is not yet fully understood.
Chromosome 19 open reading frame 44 is a protein that in humans is encoded by the C19orf44 gene. C19orf44 is an uncharacterized protein with an unknown function in humans. It is not limited to humans; the protein also exists in other species. The protein contains one domain of unknown function (DUF) that is highly conserved throughout its orthologs. It is most highly expressed in the testis and ovary, but also has significant expression in the thyroid and parathyroid. The protein is also known as LOC84167.
Chromosome 4 open reading frame 51 (C4orf51) is a protein which in humans is encoded by the C4orf51 gene.
Cilia- and flagella-associated protein 299 (CFAP299) is a protein that in humans is encoded by the CFAP299 gene. CFAP299 is predicted to play a role in spermatogenesis and cell apoptosis.
Chromosome 1 open reading frame 167 (C1orf167) is a protein which in humans is encoded by the C1orf167 gene. The NCBI accession number is NP_001010881. The protein is 1468 amino acids in length with a molecular weight of 162.42 kDa. The mRNA sequence was found to be 4689 base pairs in length.
C7orf50 is a gene in humans that encodes a protein known as C7orf50. The gene is ubiquitously expressed, with low tissue specificity, in the kidneys, brain, fat, prostate, and spleen, among 22 other tissues. C7orf50 is conserved in chimpanzees, Rhesus monkeys, dogs, cows, mice, rats, and chickens, along with 307 other organisms from mammals to fungi. The protein is predicted to be involved in the import of ribosomal proteins into the nucleus, where they are assembled into ribosomal subunits as part of rRNA processing. Additionally, the gene is predicted to be a microRNA (miRNA) protein coding host gene, meaning that it may contain miRNA genes in its introns and/or exons.
C6orf136 is a protein in humans encoded by the C6orf136 gene. The gene is conserved in mammals and mollusks, as well as some porifera. While the function of the gene is currently unknown, C6orf136 has been shown to be hypermethylated in response to FOXM1 expression in head and neck squamous cell carcinoma (HNSCC) tissue cells. Additionally, elevated expression of C6orf136 has been associated with improved survival rates in patients with bladder cancer. C6orf136 has three known isoforms.
Chromosome 13 Open Reading Frame 46 is a protein which in humans is encoded by the C13orf46 gene. In humans, C13orf46 is ubiquitously expressed at low levels in tissues including the lungs, stomach, prostate, spleen, and thymus. The gene encodes eight alternatively spliced mRNA transcripts, which produce five different protein isoforms.