UPF0602

UPF0602 Protein C4orf47
AlphaFold prediction of UPF0602 protein c4orf47's tertiary structure [1]
Identifiers
Symbol c4orf47
Alt. symbols LOC441054
HGNC 34346
RefSeq NM_00107829
UniProt A7E2U8
Other data
Locus Chr. 4 q35.1

UPF0602 is a protein in humans that is encoded by the chromosome 4 open reading frame 47 (c4orf47) gene. [2]

Gene

The c4orf47 gene is located at 4q35.1 on the plus strand and spans 44,602 base pairs (185,405,227 to 185,449,828). The gene comprises 12 exons and 11 introns. [3]
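The quoted span follows from the two coordinates when both endpoint bases are counted. A minimal Python sketch of that arithmetic is shown here; the function name `inclusive_span` is illustrative and does not come from any cited database.

```python
def inclusive_span(start: int, end: int) -> int:
    """Length of a genomic interval when both endpoint bases are counted."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Coordinates quoted in the text for c4orf47 on chromosome 4.
gene_length = inclusive_span(185_405_227, 185_449_828)
print(gene_length)  # 44602
```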

The gene overlaps two genes on the minus strand: UFM1 specific peptidase 2 (UFSP2) and coiled-coil domain containing 110 (CCDC110).

Another alias for the c4orf47 gene is LOC441054.

Transcript

Transcript variant 1 is the longest experimentally validated variant of c4orf47 mRNA and encodes UPF0602 protein isoform 1. This variant contains 8 exons, with an upstream in-frame stop codon located within the first exon; the encoded protein contains a disordered region and a domain of unknown function. The mRNA is 1,333 nucleotides long and encodes a 309-amino-acid polypeptide. [4]

Transcript variant 2 differs in the 5' UTR, uses an alternate translation start site, and lacks two alternate exons in the 5' coding region compared to variant 1. The encoded protein, isoform 2, is shorter than isoform 1 and has a distinct N-terminus. The mRNA is 1,037 nucleotides long and encodes a 183-amino-acid polypeptide. [5]

C4orf47 mRNA is expressed in all tissues, with higher expression in the choroid plexus, retina, fallopian tubes, and testis. [6]

Protein

UPF0602 protein isoform 1 has a molecular weight of 34.4 kDa and a predicted isoelectric point (pI) of 9.64. [7] It contains a domain of unknown function, DUF4586. [8] This domain corresponds to pfam15239, which is the only member of protein superfamily cl21099. [9]

Relative to its size, the protein contains a higher-than-average proportion of basic amino acids. It also contains two short sequence motifs that each occur twice. [7]

| Location | Amino acids |
|---|---|
| 145-148 | PGKK |
| 235-238 | PGKK |
| 164-168 | SHSAD |
| 252-256 | SHSAD |
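Repeat locations of this kind are 1-based, inclusive residue ranges. As a rough sketch of how such positions can be found, the helper below scans a sequence for every occurrence of a motif. The function name `motif_positions` and the toy sequence are assumptions made for illustration; the toy string is not the UPF0602 sequence.

```python
def motif_positions(sequence: str, motif: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges for every occurrence of motif."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        # Resume one residue later so overlapping occurrences are also found.
        start = sequence.find(motif, start + 1)
    return hits


# Toy sequence for illustration only; it is NOT the UPF0602 protein sequence.
toy = "MAPGKKTTSHSADLLPGKKQSHSAD"
print(motif_positions(toy, "PGKK"))   # [(3, 6), (16, 19)]
print(motif_positions(toy, "SHSAD"))  # [(9, 13), (21, 25)]
```

Run against the real isoform 1 sequence, the same function would be expected to report the ranges listed in the table, although that has not been checked here.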

Localization

This protein contains no signal peptide and has been shown to localize within the cell to cytoplasmic microtubules, centrosomes, and non-motile cilia. [10] [3] [11] [12]

Expression

UPF0602 protein is expressed in all tissues, with higher expression in the lungs, fallopian tubes, and testis. In the lungs and fallopian tubes, protein abundance is greatest in ciliated cells, specifically in the ciliary tip and the axoneme. Within the testis, protein abundance is highest in elongated or late spermatids. [6] [10]

Homology

UPF0602 protein has no paralogs. However, homologs are found in most ciliated eukaryotes. No homologs have been found in reptiles other than turtles, in salamanders, or in lobe-finned fishes other than the West Indian Ocean coelacanth. A UPF0602 protein homolog is also found in Chytridiomycetes, a class of fungi.

The following table lists a small selection of homologs found using BLAST. [13]

| Genus and species | Common name | Taxonomic group | Estimated divergence (MYA) | Accession number | Sequence length (aa) | Sequence identity (%) | Sequence similarity (%) |
|---|---|---|---|---|---|---|---|
| Homo sapiens | Human | Primates | 0 | NP_001107829.1 | 309 | 100 | 100 |
| Gallus gallus | Domestic chicken | Aves | 312 | XP_004936032.2 | 311 | 67.2 | 78.8 |
| Chrysemys picta bellii | Painted turtle | Reptilia | 312 | XP_005282053.1 | 311 | 66.6 | 79.7 |
| Rhinatrema bivittatum | Two-lined caecilian | Amphibia | 351.8 | XP_029442782.1 | 308 | 61.1 | 74.3 |
| Xenopus tropicalis | Western clawed frog | Amphibia | 351.8 | XP_002934310.1 | 307 | 58.3 | 75.1 |
| Latimeria chalumnae | West Indian Ocean coelacanth | Coelacanthiformes | 413 | XP_014353738.1 | 311 | 55 | 70.4 |
| Danio rerio | Zebrafish | Actinopterygii | 435 | NP_001038879.1 | 312 | 54.5 | 70.2 |
| Rhincodon typus | Whale shark | Chondrichthyes | 473 | XP_020367910.1 | 319 | 50.2 | 63.3 |
| Lytechinus variegatus | Green sea urchin | Temnopleuroida | 684 | XP_041485424.1 | 316 | 52.8 | 67.1 |
| Pomacea canaliculata | Channeled applesnail | Mollusca | 797 | XP_025090509.1 | 321 | 51.1 | 67 |
| Amphibalanus amphitrite | Acorn barnacle | Arthropoda | 797 | KAF0292396.1 | 334 | 30.8 | 47.9 |
| Powellomyces hirtus | Chytrids | Chytridiomycetes | 1017 | TPX58729.1 | 353 | 32.3 | 45.3 |
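The relationship between divergence time and sequence identity in the table can be summarized numerically. The sketch below, an illustration only and not the method used to produce any figure in this article, copies the divergence and identity columns from the table, converts identity to percent difference, and computes a Pearson correlation with a small hand-written helper (`pearson` is a name chosen here).

```python
from math import sqrt

# (species, estimated divergence in MYA, sequence identity in %), copied from the table.
homologs = [
    ("Homo sapiens", 0.0, 100.0),
    ("Gallus gallus", 312.0, 67.2),
    ("Chrysemys picta bellii", 312.0, 66.6),
    ("Rhinatrema bivittatum", 351.8, 61.1),
    ("Xenopus tropicalis", 351.8, 58.3),
    ("Latimeria chalumnae", 413.0, 55.0),
    ("Danio rerio", 435.0, 54.5),
    ("Rhincodon typus", 473.0, 50.2),
    ("Lytechinus variegatus", 684.0, 52.8),
    ("Pomacea canaliculata", 797.0, 51.1),
    ("Amphibalanus amphitrite", 797.0, 30.8),
    ("Powellomyces hirtus", 1017.0, 32.3),
]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Percent of aligned positions that differ from the human sequence.
differences = {name: round(100.0 - identity, 1) for name, _, identity in homologs}

times = [mya for _, mya, _ in homologs]
identities = [identity for _, _, identity in homologs]
r = pearson(times, identities)

print(differences["Gallus gallus"])  # 32.8
print(f"Pearson r between divergence time and identity: {r:.2f}")
```

A negative coefficient reflects the expected trend that identity to the human protein falls as divergence time increases.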

Evolution

The c4orf47 gene has been evolving at a relatively slow rate when compared to the evolutionary rates of fibrinogen alpha and cytochrome c. This suggests that the encoded protein has a conserved function.

Graph showing UPF0602 protein c4orf47's evolutionary history

Function

The function of this protein within the cell is not well understood; however, evidence suggests it is related to cilia and flagella assembly. [10] [14]

Interacting proteins

High-throughput evidence supports physical interaction between UPF0602 protein and nucleophosmin (NPM1), [15] as well as ubiquitin-specific peptidase 9, Y-linked (USP9Y). [14]

Clinical significance

Single nucleotide polymorphisms (SNPs) within regions of the UFSP2 gene that overlap c4orf47 have been linked to Beukes hip dysplasia, spondyloepimetaphyseal dysplasia (Di Rocco type), microcephaly, and other developmental anomalies. [16] [17] [18]

References

  1. "AlphaFold Protein Structure Database". alphafold.ebi.ac.uk.
  2. "UPF0602 protein C4orf47 isoform 1 [Homo sapiens] - Protein". National Center for Biotechnology Information. Retrieved 4 October 2021.
  3. "C4orf47 chromosome 4 open reading frame 47 [Homo sapiens (human)] - Gene". National Center for Biotechnology Information.
  4. "Homo sapiens chromosome 4 open reading frame 47 (C4orf47), transcript variant 1, mRNA". National Center for Biotechnology Information. 2 July 2021.
  5. "Homo sapiens chromosome 4 open reading frame 47 (C4orf47), transcript variant 2, mRNA". National Center for Biotechnology Information. 18 December 2020.
  6. "Tissue expression of C4orf47 - Summary". Human Protein Atlas. Retrieved 18 December 2021.
  7. "SAPS < Sequence Statistics". European Bioinformatics Institute. Retrieved 18 December 2021.
  8. "UPF0602 protein C4orf47 isoform 1 [Homo sapiens] - Protein". National Center for Biotechnology Information.
  9. "CDD Conserved Protein Domain Family: DUF4586". National Center for Biotechnology Information.
  10. Urizar-Arenaza I, Osinalde N, Akimov V, Puglia M, Muñoa-Hoyos I, Gómez-Giménez B, et al. (September 2020). "Kappa-opioid receptor regulates human sperm functions via SPANX-A/D protein family". Reproductive Biology. 20 (3): 300–306. doi:10.1016/j.repbio.2020.07.003. PMID 32684427. S2CID 220652968.
  11. Firat-Karalar EN, Sante J, Elliott S, Stearns T (October 2014). "Proteomic analysis of mammalian sperm cells identifies new components of the centrosome". Journal of Cell Science. 127 (Pt 19): 4128–4133. doi:10.1242/jcs.157008. PMC 4179487. PMID 25074808.
  12. Sigg MA, Menchen T, Lee C, Johnson J, Jungnickel MK, Choksi SP, et al. (December 2017). "Evolutionary Proteomics Uncovers Ancient Associations of Cilia with Signaling Pathways". Developmental Cell. 43 (6): 744–762.e11. doi:10.1016/j.devcel.2017.11.014. PMC 5752135. PMID 29257953.
  13. "BLAST: Basic Local Alignment Search Tool". National Center for Biotechnology Information. Retrieved 15 December 2021.
  14. Huttlin EL, Bruckner RJ, Navarrete-Perea J, Cannon JR, Baltier K, Gebreab F, et al. (May 2021). "Dual proteome-scale networks reveal cell-specific remodeling of the human interactome". Cell. 184 (11): 3022–3040.e28. doi:10.1016/j.cell.2021.04.011. PMC 8165030. PMID 33961781.
  15. Fasci D, van Ingen H, Scheltema RA, Heck AJ (October 2018). "Histone Interaction Landscapes Visualized by Crosslinking Mass Spectrometry in Intact Cell Nuclei". Molecular & Cellular Proteomics. 17 (10): 2018–2033. doi:10.1074/mcp.RA118.000924. PMC 6166682. PMID 30021884.
  16. Watson CM, Crinnion LA, Gleghorn L, Newman WG, Ramesar R, Beighton P, Wallis GA (September 2015). "Identification of a mutation in the ubiquitin-fold modifier 1-specific peptidase 2 gene, UFSP2, in an extended South African family with Beukes hip dysplasia". South African Medical Journal = Suid-Afrikaanse Tydskrif vir Geneeskunde. 105 (7): 558–563. doi:10.7196/SAMJnew.7917. PMID 26428751.
  17. Di Rocco M, Rusmini M, Caroli F, Madeo A, Bertamino M, Marre-Brunenghi G, Ceccherini I (March 2018). "Novel spondyloepimetaphyseal dysplasia due to UFSP2 gene mutation". Clinical Genetics. 93 (3): 671–674. doi:10.1111/cge.13134. PMID 28892125. S2CID 3587666.
  18. Ni M, Afroze B, Xing C, Pan C, Shao Y, Cai L, et al. (May 2021). "A pathogenic UFSP2 variant in an autosomal recessive form of pediatric neurodevelopmental anomalies and epilepsy". Genetics in Medicine. 23 (5): 900–908. doi:10.1038/s41436-020-01071-z. PMC 8105169. PMID 33473208.