UPF0602 protein C4orf47 | |
---|---|
Identifiers | |
Symbol | C4orf47 |
Alt. symbols | LOC441054 |
HGNC | 34346 |
RefSeq | NM_00107829 |
UniProt | A7E2U8 |
Other data | |
Locus | Chr. 4 q35.1 |
UPF0602 is a protein that in humans is encoded by the chromosome 4 open reading frame 47 (C4orf47) gene. [2]
The C4orf47 gene is located at 4q35.1 on the plus strand and spans 44,602 base pairs (185,405,227 to 185,449,828). The gene is made up of 12 exons and 11 introns. [3]
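The stated length follows from the coordinates if they are read as 1-based and inclusive at both ends, so both endpoint bases are counted. That coordinate convention is an assumption here, but the minimal Python sketch below shows that it reproduces the stated 44,602 bp; `span_length` is a hypothetical helper name.

```python
def span_length(start: int, end: int) -> int:
    """Length of a genomic interval given 1-based, fully closed (inclusive) coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    # Both endpoints belong to the interval, hence the +1.
    return end - start + 1


length = span_length(185_405_227, 185_449_828)
print(f"{length:,} bp")  # prints "44,602 bp"
```

With 0-based, half-open coordinates (as in BED files) the start would be 185,405,226 and the length would simply be `end - start`.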
The gene overlaps two other genes located on the minus strand: UFM1-specific peptidase 2 (UFSP2) and coiled-coil domain containing 110 (CCDC110).
Another alias for the C4orf47 gene is LOC441054.
Transcript variant 1 is the longest experimentally validated C4orf47 mRNA variant and encodes UPF0602 protein isoform 1. It contains 8 exons, with an upstream in-frame stop codon in the first exon, and the protein it encodes contains a disordered region and a domain of unknown function. The mRNA is 1,333 nucleotides long and encodes a 309-amino-acid polypeptide. [4]
Compared with variant 1, transcript variant 2 differs in the 5' UTR, uses an alternate translation start site, and lacks two alternate exons in the 5' coding region. The encoded protein, isoform 2, is shorter than isoform 1 and has a distinct N-terminus. The mRNA is 1,037 nucleotides long and encodes a 183-amino-acid polypeptide. [5]
C4orf47 mRNA is ubiquitously expressed in all tissues, with higher expression in the choroid plexus, retina, fallopian tubes, and testis. [6]
UPF0602 protein isoform 1 has a molecular weight of 34.4 kDa and a predicted isoelectric point (pI) of 9.64. [7] It contains a domain of unknown function, DUF4586. [8] This domain belongs to pfam15239, the only member of protein superfamily cl21099. [9]
The protein contains more basic amino acids than average for a protein of its size, consistent with its high predicted pI, and it contains two motifs that each occur twice. [7]
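A predicted pI is usually found by searching for the pH at which the protein's net charge is zero. The Python sketch below illustrates that idea with the Henderson-Hasselbalch relation and bisection. The pKa table is an illustrative assumption (prediction tools each use their own values), so it is not expected to reproduce 9.64 exactly, and the example peptide is hypothetical rather than the real UPF0602 sequence.

```python
# Illustrative pKa values (assumption; different pI tools use different sets).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Approximate net charge of a peptide at a given pH."""
    seq = seq.upper()
    # Positive groups: fraction protonated = 1 / (1 + 10**(pH - pKa)).
    charge = 1 / (1 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    for aa in "KRH":
        charge += seq.count(aa) / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
    # Negative groups: fraction deprotonated = 1 / (1 + 10**(pKa - pH)).
    charge -= 1 / (1 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in "DECY":
        charge -= seq.count(aa) / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisection on pH; net charge decreases monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point("PGKKSHSAD"), 2))  # hypothetical peptide
```

A lysine-rich peptide gives a high pI, which is the same reasoning that links the protein's excess of basic residues to its predicted pI of 9.64.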
Position (residues) | Motif |
---|---|
145 - 148 | PGKK |
235 - 238 | PGKK |
164 - 168 | SHSAD |
252 - 256 | SHSAD |
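Motif positions such as those in the table are 1-based, inclusive residue ranges. The Python sketch below shows one way to locate every occurrence of a motif and to compute the fraction of basic residues (Lys, Arg, His). It runs on a short hypothetical sequence, since the full UPF0602 sequence is not reproduced here; the function names are illustrative.

```python
BASIC_RESIDUES = frozenset("KRH")


def find_motif(seq: str, motif: str) -> list[tuple[int, int]]:
    """Return every occurrence of motif as (start, end), 1-based and inclusive.

    Overlapping occurrences are also reported, because the search restarts
    one residue after each hit.
    """
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append((i + 1, i + len(motif)))
        i = seq.find(motif, i + 1)
    return hits


def basic_fraction(seq: str) -> float:
    """Fraction of residues that are basic (K, R or H)."""
    return sum(aa in BASIC_RESIDUES for aa in seq) / len(seq)


# Hypothetical toy sequence, NOT the real UPF0602 protein.
toy = "AAPGKKAASHSADAPGKK"
for motif in ("PGKK", "SHSAD"):
    print(motif, find_motif(toy, motif))
# PGKK [(3, 6), (15, 18)]
# SHSAD [(9, 13)]
print(f"basic fraction: {basic_fraction(toy):.2f}")  # basic fraction: 0.28
```

Running the same functions on the real sequence (for example, the UniProt A7E2U8 entry) would be one way to check the table's positions.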
This protein contains no signal peptide and has been shown to localize within the cell to cytoplasmic microtubules, centrosomes, and non-motile cilia. [10] [3] [11] [12]
UPF0602 is ubiquitously expressed in all tissues, with higher expression in the lungs, fallopian tubes, and testis. In the lungs and fallopian tubes, the protein is most abundant in ciliated cells, specifically at the ciliary tip and in the ciliary axoneme. Within the testis, protein abundance is highest in elongated or late spermatids. [6] [10]
UPF0602 protein has no paralogs. However, homologs are found in most ciliated eukaryotes. Exceptions include reptiles other than turtles, salamanders, and lobe-finned fishes other than the West Indian Ocean coelacanth. A UPF0602 protein homolog is also found in Chytridiomycetes, a class of fungi.
The following table represents a small selection of homologs found using BLAST. [13]
Genus and Species | Common Name | Taxonomic Group | Estimated Divergence Time (MYA) | Accession Number | Sequence Length (aa) | Sequence Identity (%) | Sequence Similarity (%) |
---|---|---|---|---|---|---|---|
Homo sapiens | Human | Primates | 0 | NP_001107829.1 | 309 | 100 | 100 |
Gallus gallus | Domestic chicken | Aves | 312 | XP_004936032.2 | 311 | 67.2 | 78.8 |
Chrysemys picta bellii | Painted turtle | Reptilia | 312 | XP_005282053.1 | 311 | 66.6 | 79.7 |
Rhinatrema bivittatum | Two-lined caecilian | Amphibia | 351.8 | XP_029442782.1 | 308 | 61.1 | 74.3 |
Xenopus tropicalis | Western clawed frog | Amphibia | 351.8 | XP_002934310.1 | 307 | 58.3 | 75.1 |
Latimeria chalumnae | West Indian Ocean coelacanth | Coelacanthiformes | 413 | XP_014353738.1 | 311 | 55 | 70.4 |
Danio rerio | Zebrafish | Actinopterygii | 435 | NP_001038879.1 | 312 | 54.5 | 70.2 |
Rhincodon typus | Whale shark | Chondrichthyes | 473 | XP_020367910.1 | 319 | 50.2 | 63.3 |
Lytechinus variegatus | Green sea urchin | Temnopleuroida | 684 | XP_041485424.1 | 316 | 52.8 | 67.1 |
Pomacea canaliculata | Channeled applesnail | Mollusca | 797 | XP_025090509.1 | 321 | 51.1 | 67 |
Amphibalanus amphitrite | Acorn barnacle | Arthropoda | 797 | KAF0292396.1 | 334 | 30.8 | 47.9 |
Powellomyces hirtus | Chytrids | Chytridiomycetes | 1017 | TPX58729.1 | 353 | 32.3 | 45.3 |
The C4orf47 gene has evolved relatively slowly compared with the evolutionary rates of fibrinogen alpha and cytochrome c, which suggests that the encoded protein has a conserved function.
The cellular function of this protein is not well understood, although evidence suggests that it is related to cilium and flagellum assembly. [10] [14]
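The text does not say how the rate comparison was made. One common approach, assumed here, converts each homolog's percent identity into a corrected divergence d = -ln(1 - p), where p is the fraction of differing sites (a simple Poisson correction for repeated substitutions at the same site), and then fits d against divergence time; the slope approximates the rate. The Python sketch below applies that to the identity values in the homolog table. The fitted number is illustrative only, and a real comparison would require computing fibrinogen alpha and cytochrome c the same way.

```python
import math

# (estimated divergence time in MYA, % identity to human), from the homolog table.
HOMOLOGS = {
    "Gallus gallus": (312, 67.2),
    "Chrysemys picta bellii": (312, 66.6),
    "Rhinatrema bivittatum": (351.8, 61.1),
    "Xenopus tropicalis": (351.8, 58.3),
    "Latimeria chalumnae": (413, 55.0),
    "Danio rerio": (435, 54.5),
    "Rhincodon typus": (473, 50.2),
    "Lytechinus variegatus": (684, 52.8),
    "Pomacea canaliculata": (797, 51.1),
    "Amphibalanus amphitrite": (797, 30.8),
    "Powellomyces hirtus": (1017, 32.3),
}


def corrected_divergence(identity_pct: float) -> float:
    """Poisson-corrected substitutions per site, d = -ln(1 - p)."""
    p = 1 - identity_pct / 100  # observed fraction of differing sites
    return -math.log(1 - p)


def slope_through_origin(points: list[tuple[float, float]]) -> float:
    """Least-squares slope b for y = b * x, with the line forced through the origin."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx


points = [(t, corrected_divergence(ident)) for t, ident in HOMOLOGS.values()]
rate = slope_through_origin(points)
print(f"~{rate:.5f} substitutions per site per million years")
```

Forcing the fit through the origin reflects the assumption that human versus human has zero divergence at time zero, which is the first row of the table.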
High throughput evidence supports physical interaction between UPF0602 protein and nucleophosmin (NPM1), [15] as well as with ubiquitin-specific peptidase 9, Y-linked (USP9Y). [14]
Single nucleotide polymorphisms (SNPs) within regions of the UFSP2 gene that overlap C4orf47 have been linked to Beukes hip dysplasia, spondyloepimetaphyseal dysplasia (Di Rocco type), microcephaly, and other developmental anomalies. [16] [17] [18]
UPF0687 protein C20orf27 is a protein that in humans is encoded by the C20orf27 gene. It is expressed in most human tissues. One study of this protein reported a role in regulating the cell cycle, apoptosis, and tumorigenesis by promoting activation of the NF-κB pathway.
TMEM143 is a protein that in humans is encoded by the TMEM143 gene. TMEM143, a dual-pass transmembrane protein, is predicted to reside in the mitochondria, and high expression has been found in both human skeletal muscle and heart. Interactions with other proteins indicate that TMEM143 could play a role in tumor suppression/expression and cancer regulation.
Glutamate-rich protein 3, also known as uncharacterized protein C1orf173, is a protein encoded by the ERICH3 gene. ERICH3 was originally named “chromosome 1 open reading frame 173” (C1orf173) based on its map location in the human genome. It was subsequently renamed “E-rich 3” because of the high glutamate (E) content of its encoded amino acid sequence. Single-nucleotide polymorphisms (SNPs) in the ERICH3 gene were identified as one of the top signals in a genome-wide association study (GWAS) of plasma serotonin concentrations, which were themselves associated with selective serotonin reuptake inhibitor (SSRI) response in patients with major depressive disorder (MDD). The same ERICH3 SNP was later shown to be significantly associated with SSRI treatment outcomes in three independent MDD trials: STAR*D, ISPC, and PReDICT. Based on GTEx RNA-seq data, ERICH3 is most highly expressed in several regions of the human brain, including the nucleus accumbens and frontal cortex. Single-cell RNA-seq data from human brain samples revealed that ERICH3 is expressed predominantly in neurons rather than in other CNS cell types. ERICH3 was found to interact with proteins that function in vesicle biogenesis and may play a significant role in vesicular function in serotonergic and other neuronal cell types, which might help explain its association with antidepressant treatment response. Proteomic studies also found ERICH3 protein to be abundant in blood platelets and cilia. Its function in platelets is thought to be related to serotonin storage, because more than 99% of blood serotonin is stored in platelets and ERICH3 SNPs have been associated with plasma serotonin concentration in MDD patients. In primary cilia, ERICH3 may regulate cilium formation and the localization of ciliary transport proteins.
C12orf40, also known as Chromosome 12 Open Reading Frame 40, HEL-206, and Epididymis Luminal Protein 206, is a protein that in humans is encoded by the C12orf40 gene.
Chromosome 6 open reading frame 201, C6orf201, is a protein that in humans is encoded by the C6orf201 gene. In humans this gene encodes a nuclear protein that is primarily expressed in the testis.
KIAA1107 is a protein that in humans is encoded by the KIAA1107 gene. KIAA1107 is a serine-rich protein whose expression was found to increase in the white matter of multiple sclerosis brain lesions.
Chromosome 6 open reading frame 62 (C6orf62), also known as X-trans-activated protein 12 (XTP12), is a gene that encodes a protein of the same name. The encoded protein is predicted to have a subcellular location within the cytosol.
Chromosome 8 open reading frame 58 is an uncharacterized protein that in humans is encoded by the C8orf58 gene. The protein is predicted to be localized in the nucleus.
Cilia- and flagella-associated protein 299 (CFAP299) is a protein that in humans is encoded by the CFAP299 gene. CFAP299 is predicted to play a role in spermatogenesis and cell apoptosis.
Uncharacterized protein C17orf78 is a protein encoded by the C17orf78 gene in humans. The name denotes the location of the parent gene: the 78th open reading frame on human chromosome 17. The protein is highly expressed in the small intestine, especially the duodenum. The function of C17orf78 is not well defined.
FAM120AOS (family with sequence similarity 120A opposite strand) codes for uncharacterized protein FAM120AOS, which currently has no known function. Gene Ontology annotations describe the gene product as having protein binding activity. Across most datasets, the thyroid and the placenta are the two tissues with the highest FAM120AOS expression.
Transmembrane protein 101 (TMEM101) is a protein that in humans is encoded by the TMEM101 gene. The TMEM101 protein has been demonstrated to activate the NF-κB signaling pathway. High levels of expression of TMEM101 have been linked to breast cancer.
C5orf24 is a protein encoded by the C5orf24 gene (5q31.1) in humans. C5orf24 is primarily localized to the nucleus and is highly conserved with orthologs in mammals, birds, reptiles, amphibians, and fish.
C1orf159 is a protein that in humans is encoded by the C1orf159 gene, located on chromosome 1. This gene has also been identified as an unfavorable prognostic marker for renal and liver cancer and a favorable prognostic marker for urothelial cancer.
TEKTIP1, also known as tektin-bundle interacting protein 1, is a protein that in humans is encoded by the TEKTIP1 gene.
C13orf42 is a protein which, in humans, is encoded by the gene chromosome 13 open reading frame 42 (C13orf42). RNA sequencing data shows low expression of the C13orf42 gene in a variety of tissues. The C13orf42 protein is predicted to be localized in the mitochondria, nucleus, and cytosol. Tertiary structure predictions for C13orf42 indicate multiple alpha helices.
Chromosome 13 Open Reading Frame 46 is a protein which in humans is encoded by the C13orf46 gene. In humans, C13orf46 is ubiquitously expressed at low levels in tissues, including the lungs, stomach, prostate, spleen, and thymus. This gene encodes eight alternatively spliced mRNA transcripts, which produce five different protein isoforms.
Chromosome 5 Open Reading Frame 47, or C5ORF47, is a protein which, in humans, is encoded by the C5ORF47 gene. It also goes by the alias LOC133491. The human C5ORF47 gene is primarily expressed in the testis.
Secernin-3 (SCRN3) is a protein that is encoded by the human SCRN3 gene. SCRN3 belongs to the peptidase C69 family and the secernin subfamily. As a member of this family, the protein is predicted to enable cysteine-type exopeptidase activity and dipeptidase activity, as well as to be involved in proteolysis. It is broadly expressed, including in the brain, thyroid, and 25 other tissues. Additionally, SCRN3 is conserved in a variety of species, including mammals, birds, fish, amphibians, and invertebrates. SCRN3 is predicted to be located in the cytoplasm.