Chromosome 5 open reading frame 47

C5orf47
Identifiers
Aliases: C5orf47, chromosome 5 open reading frame 47
External IDs: MGI: 1914842; HomoloGene: 12161; GeneCards: C5orf47
Orthologs (values listed as human, then mouse)
Ensembl: ENSG00000185056 [1]; ENSMUSG00000020299 [2]
RefSeq (mRNA): NM_001144954; NM_026262
RefSeq (protein): NP_001138426; NP_080538
Location (UCSC): Chr 5: 173.97 – 174.01 Mb; Chr 11: 31.92 – 31.93 Mb
PubMed search: [3]; [4]

Chromosome 5 Open Reading Frame 47, or C5ORF47, is a protein which, in humans, is encoded by the C5ORF47 gene. [5] It also goes by the alias LOC133491. [6] The human C5ORF47 gene is primarily expressed in the testis. [5]

Gene

C5ORF47 is located at 5q35.2. [5] The full gene spans 16,911 nucleotides, and the mRNA transcript, made up of 5 exons, spans 2,511 nucleotides. [7]

Gene expression

Human C5ORF47 is expressed primarily in the testis. It is also expressed at low levels in many other tissues, including the stomach, lung, kidney, intestine, heart, and adrenal gland, at levels that vary over the course of fetal development. [5]

Transcript

The mRNA sequence of C5ORF47 is 2,511 nucleotides long and consists of 5 exons; the gene is reported to contain 6 introns. [8]

Bird's eye view of human C5ORF47 promoter and gene. The promoter, start of transcription, and exons 1-5 are labeled.

The human C5ORF47 gene has two known isoforms. The first (XP_016864517.1) [9] encodes a protein of 176 amino acids, and the second (XP_011532733.1) [10] encodes a protein of 150 amino acids. This article focuses primarily on the first, more common isoform.

Protein

The molecular weight of the unmodified precursor human C5ORF47 protein is 19.2 kDa, and the isoelectric point is predicted to be 10.49. [11]

The protein is basic, with a relatively high proportion of the positively charged amino acids lysine and arginine compared with the negatively charged amino acids aspartic acid and glutamic acid. The repeated amino acid sequence “SQLR” occurs at two locations in the protein. [12]
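As an illustration of the kind of sequence statistics described above, the following minimal Python sketch counts basic and acidic residues and lists the positions of a short repeat. The function names and the toy peptide are hypothetical; the toy peptide is not the C5ORF47 sequence, and this is not the method used by the cited SAPS tool.

```python
def charge_summary(seq):
    """Count basic (K, R) and acidic (D, E) residues in a protein sequence."""
    seq = seq.upper()
    basic = sum(seq.count(aa) for aa in "KR")
    acidic = sum(seq.count(aa) for aa in "DE")
    return {"basic": basic, "acidic": acidic, "net": basic - acidic}


def motif_positions(seq, motif):
    """Return the 1-based start position of every occurrence of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)               # convert 0-based index to 1-based
        start = seq.find(motif, start + 1)   # allow overlapping occurrences
    return hits


# Hypothetical toy peptide for demonstration only (NOT the real C5ORF47 sequence).
toy = "MKSQLRAKRDESQLRKK"
print(charge_summary(toy))           # basic vs. acidic residue counts
print(motif_positions(toy, "SQLR"))  # start positions of the repeat
```

A positive net count of basic over acidic residues is only a rough indicator of a basic protein; a predicted isoelectric point, such as the one cited above, also depends on the pKa values of the individual residues.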

Domains

The human C5ORF47 protein contains DUF4680, a domain of unknown function characterized by two conserved amino acid sequence motifs: VISRM and ENE. [13]

Structure

Predicted tertiary structure of C5ORF47 protein. The 10 most conserved amino acids are labeled in yellow.

Within the predicted tertiary structure of C5ORF47, the most conserved amino acids fall within the Domain of Unknown Function: DUF4680.

Cellular Localization

The human C5ORF47 protein is predicted to be localized in the nucleus. [15] [16]

A stretch of five positively charged lysines at positions 133-137 of the amino acid sequence is predicted to act as a nuclear localization sequence. [15]
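A run of consecutive lysines like the one described above can be located with a simple pattern search. The sketch below is an illustrative heuristic only, not the PSORT II algorithm; the function name is hypothetical, and the input is a synthetic placeholder built to mirror the reported positions rather than the actual protein sequence.

```python
import re


def lysine_runs(seq, min_len=5):
    """Return (start, end) 1-based inclusive positions of runs of >= min_len lysines."""
    pattern = re.compile(r"K{%d,}" % min_len)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq.upper())]


# Synthetic 176-residue placeholder (NOT the real C5ORF47 sequence):
# alanines everywhere except five lysines placed at positions 133-137.
placeholder = "A" * 132 + "KKKKK" + "A" * 39
print(lysine_runs(placeholder))
```

A lysine-rich stretch is suggestive of a nuclear localization sequence but does not establish one; experimental localization data are still needed to confirm the prediction.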

Immunohistochemical staining of human testis with a Sigma-Aldrich anti-C5orf47 antibody shows moderate nuclear positivity in cells of the seminiferous ducts. [17]

Post-Translational Modifications

Human C5ORF47 protein diagram depicting DUF4680 highlighted in blue, nuclear localization sequence highlighted in pink, conserved phosphorylation sites denoted by yellow circles, and conserved O-linked glycosylation sites denoted by green squares.

The predicted phosphorylation sites T19, S23, S96, S129, S149, Y158, S168, and S169 and the predicted O-linked glycosylation sites S96, S105, and S168 are conserved among most mammalian orthologs. [18] [19]

Homology

Conceptual translation of the C5ORF47 coding sequence. Exon boundaries are marked in blue. A domain of unknown function is identified with brackets. A disordered region is shown in gray. Poly(A) signals and poly(A) sites are shown in yellow. The ten most conserved amino acids are bolded. The eight most conserved phosphorylation sites are highlighted in gold. Repeat sequences are highlighted in green. The FxxP domain is highlighted in purple. The nuclear localization sequence is highlighted in pink. The underlined segments are the two conserved amino acid sequence motifs that characterize DUF4680.

Orthologs of the human C5ORF47 gene can be found in mammals, birds, and reptiles, but not in amphibians, fish, or invertebrates. No paralogs of the human C5ORF47 gene are known. [22] [23] [24]

| Class | Order | Genus and species | Common name | Median date of divergence (MYA) | Sequence length (amino acids) | Sequence identity to human protein (%) | Sequence similarity to human protein (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mammalia | Primates | Homo sapiens | Human | 0 | 176 | 100 | 100 |
| Mammalia | Primates | Gorilla gorilla | Gorilla | 8.6 | 176 | 97.7 | 98 |
| Mammalia | Rodentia | Mus musculus | House mouse | 87 | 165 | 50 | 63.1 |
| Mammalia | Chiroptera | Pteropus giganteus | Indian flying fox | 94 | 176 | 58 | 64.9 |
| Mammalia | Carnivora | Neomonachus schauinslandi | Hawaiian monk seal | 94 | 178 | 56.7 | 63.9 |
| Aves | Passeriformes | Onychostruthus taczanowskii | White-rumped snowfinch | 319 | 173 | 33.2 | 42.8 |
| Aves | Casuariiformes | Dromaius novaehollandiae | Emu | 319 | 182 | 32 | 44.5 |
| Aves | Passeriformes | Taeniopygia guttata | Zebra finch | 319 | 177 | 30.8 | 39.3 |
| Aves | Apterygiformes | Apteryx rowi | Okarito kiwi | 319 | 180 | 30.5 | 41 |
| Aves | Caprimulgiformes | Antrostomus carolinensis | Chuck-will's-widow | 319 | 229 | 26.8 | 37 |
| Aves | Apodiformes | Calypte anna | Anna's hummingbird | 319 | 228 | 24 | 35.6 |
| Aves | Galliformes | Phasianus colchicus | Ring-necked pheasant | 319 | 263 | 20.5 | 30.9 |
| Reptilia | Squamata | Zootoca vivipara | Viviparous lizard | 319 | 205 | 30 | 42.3 |
| Reptilia | Squamata | Podarcis muralis | Common wall lizard | 319 | 243 | 28.9 | 39.9 |
| Reptilia | Testudines | Dermochelys coriacea | Leatherback sea turtle | 319 | 242 | 27.6 | 41.3 |
| Reptilia | Testudines | Gopherus flavomarginatus | Bolson tortoise | 319 | 235 | 26.6 | 38.1 |
| Reptilia | Testudines | Chelonoidis abingdonii | Pinta Island tortoise | 319 | 234 | 26.6 | 35.5 |
| Reptilia | Squamata | Sceloporus undulatus | Eastern fence lizard | 319 | 241 | 24.7 | 34.9 |
| Reptilia | Crocodilia | Alligator mississippiensis | American alligator | 319 | 307 | 22.8 | 32.2 |
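Sequence identity values of the kind tabulated above are derived from pairwise alignments. The sketch below shows one common way to compute percent identity from two already-aligned sequences, assuming identity is counted as identical columns divided by the full alignment length, gaps included. The function name and the toy alignment are hypothetical, and this is not the cited EMBOSS Needle implementation. Similarity is not computed here because it additionally requires a substitution matrix.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must be rows of the same pairwise alignment (equal length,
    gaps written as `gap`). The denominator is the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy alignment for demonstration only (not real ortholog data).
row_a = "MKT-LLR"
row_b = "MKSALLK"
print(round(percent_identity(row_a, row_b), 1))
```

Other conventions divide by the length of the shorter sequence or by the number of ungapped columns, so identity values from different tools are not always directly comparable.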

Interacting Proteins

Proteins predicted to interact with the human C5ORF47 protein tend to be testis-specific, associated with sperm or spermatogenesis, or involved in cilia or flagella formation. [25] [26]

| Interacting protein | Full name | Cellular compartment | Function |
| --- | --- | --- | --- |
| CCDC185 | Coiled-coil domain-containing protein 185 | Cellular localization unknown | Has a role in ciliogenesis (by similarity). Required for proper cephalic and left/right axis development. [27] |
| C10orf120 | Uncharacterized protein C10orf120 | Cellular localization unknown | Associated with congenital bilateral aplasia of the vas deferens, which occurs in males when the tubes that carry sperm out of the testes (the vas deferens) fail to develop properly. [28] [29] |
| C4orf22 | Uncharacterized protein C4orf22 | Predicted to be located in cytoplasm. [30] | Cilia and flagella associated protein. [30] |
| TGIF2LX | TGFB induced factor homeobox 2 like, X-linked; Homeobox protein TGIF2LX | Predicted to be located in the nucleus. [31] | May have a transcription role in testis. Testis-specific expression suggests that this gene may play a role in spermatogenesis. [31] |
| ZPLD1 | Zona pellucida-like domain-containing protein 1 | Predicted to be an extracellular matrix structural constituent; predicted to be located in cytoplasmic vesicle membrane; predicted to be an integral component of membrane; predicted to be active in cell surface and extracellular space. [32] | Glycoprotein that is a component of the gelatinous extracellular matrix in the cupulae of the vestibular organ. [32] |
| SPERT | Spermatid-associated protein | Predicted to be located in cytoplasmic vesicle. [33] | Enables identical protein binding activity. [33] |
| ZNF606 | Zinc finger protein 606 | Predicted to be located in the nucleus. [34] | Nuclear protein that can act as a transcriptional repressor of growth factor-mediated signaling pathways. Reduced expression of this gene promotes chondrocyte differentiation. [34] |
| C3orf20 | Uncharacterized protein C3orf20 | Predicted to be located in cytoplasm. [35] | Unknown function |
| C14orf119 | Uncharacterized protein C14orf119 | Located in cytosol and mitochondria. [36] | Unknown function |

Clinical Significance

In a study conducted to identify rare genetic variants contributing to neuromyelitis optica in Finland, four missense variants in C3ORF20, PDZD2, C5ORF47, and ZNF606 were shared by two patients. [37]

Microarray data show that human C5ORF47 expression is low in an individual with teratozoospermia, a condition in which more than 85% of spermatozoa have abnormal morphology. [38] [39]

Microarray data show that human C5ORF47 expression is lower in p63-depleted cells. [40] The p63 protein functions as a transcription factor that helps regulate numerous cell activities, including cell proliferation, cell maintenance, differentiation, cell adhesion, and apoptosis. The p63 protein also plays a critical role in the formation of ectodermal structures in early development. Studies suggest that it also plays essential roles in the development of the limbs, facial features, urinary system, and other organs and tissues. [41]

References

  1. GRCh38: Ensembl release 89: ENSG00000185056 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000020299 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "C5orf47 chromosome 5 open reading frame 47 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2022-12-15.
  6. "AceView: Gene:C5orf47, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTs". www.ncbi.nlm.nih.gov. Retrieved 2022-10-03.
  7. "User Sequence vs Genomic". genome.ucsc.edu. Retrieved 2022-12-15.
  8. "Homo sapiens chromosome 5 open reading frame 47 (C5orf47), mRNA". U.S. National Library of Medicine. 2022-08-13.
  9. "uncharacterized protein C5orf47 isoform X1 [Homo sapiens] - Protein - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2022-12-15.
  10. "uncharacterized protein C5orf47 isoform X2 [Homo sapiens] - Protein - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2022-12-15.
  11. "Expasy - Compute pI/Mw tool". web.expasy.org. Retrieved 2022-12-15.
  12. "SAPS < Sequence Statistics < EMBL-EBI". www.ebi.ac.uk. Retrieved 2022-12-15.
  13. "MOTIF: Searching Protein Sequence Motifs". www.genome.jp. Retrieved 2022-12-15.
  14. "AlphaFold Protein Structure Database". alphafold.ebi.ac.uk. Retrieved 2022-12-15.
  15. "PSORT II Prediction". psort.hgc.jp. Retrieved 2022-12-15.
  16. "DeepLoc2.0". www.healthtech.dtu.dk. 2022-12-15. Retrieved 2022-12-15.
  17. SigmaAldrich (2022-12-15). "Anti-C5orf47 antibody produced in rabbit".
  18. "C5orf47 (human)". www.phosphosite.org. Retrieved 2022-12-15.
  19. "Services". www.healthtech.dtu.dk. Retrieved 2022-12-15.
  20. "IBS - Online". ibs.biocuckoo.org. Retrieved 2022-12-15.
  21. "Six-Frame Translation". www.bioline.com. Retrieved 2022-12-16.
  22. "BLAST: Basic Local Alignment Search Tool". blast.ncbi.nlm.nih.gov. Retrieved 2022-12-15.
  23. "TimeTree :: The Timescale of Life". timetree.org. Retrieved 2022-12-15.
  24. "EMBOSS Needle". 2022-12-15.
  25. "C5orf47 protein (human) - STRING interaction network". string-db.org. Retrieved 2022-12-16.
  26. Abe T, Lee A, Sitharam R, Kesner J, Rabadan R, Shapira SD (April 2017). "Germ-Cell-Specific Inflammasome Component NLRP14 Negatively Regulates Cytosolic Nucleic Acid Sensing to Promote Fertilization". Immunity. 46 (4): 621–634. doi:10.1016/j.immuni.2017.03.020. PMC 5674777. PMID 28423339.
  27. "UniProt". www.uniprot.org. Retrieved 2022-12-16.
  28. "C10orf120 Gene - GeneCards | CJ120 Protein | CJ120 Antibody". www.genecards.org. Retrieved 2022-12-16.
  29. "Congenital bilateral absence of the vas deferens - About the Disease - Genetic and Rare Diseases Information Center". rarediseases.info.nih.gov. Retrieved 2022-12-16.
  30. "CFAP299 Gene - GeneCards | CF299 Protein | CF299 Antibody". www.genecards.org. Retrieved 2022-12-16.
  31. "TGIF2LX Gene - GeneCards | TF2LX Protein | TF2LX Antibody". www.genecards.org. Retrieved 2022-12-16.
  32. "ZPLD1 Gene - GeneCards | ZPLD1 Protein | ZPLD1 Antibody". www.genecards.org. Retrieved 2022-12-16.
  33. "CBY2 Gene - GeneCards | CBY2 Protein | CBY2 Antibody". www.genecards.org. Retrieved 2022-12-16.
  34. "ZNF606 Gene - GeneCards | ZN606 Protein | ZN606 Antibody". www.genecards.org. Retrieved 2022-12-16.
  35. "C3orf20 Gene - GeneCards | CC020 Protein | CC020 Antibody". www.genecards.org. Retrieved 2022-12-16.
  36. "C14orf119 Gene - GeneCards | CN119 Protein | CN119 Antibody". www.genecards.org. Retrieved 2022-12-16.
  37. Siuko M, Valori M, Kivelä T, Setälä K, Morin A, Kwan T, et al. (December 2015). "Exome and regulatory element sequencing of neuromyelitis optica patients". Journal of Neuroimmunology. 289: 139–142. doi:10.1016/j.jneuroim.2015.11.002. PMID 26616883. S2CID 24479999.
  38. "GDS2697 / 1557057_a_at". www.ncbi.nlm.nih.gov. Retrieved 2022-12-16.
  39. De Braekeleer M, Nguyen MH, Morel F, Perrin A (April 2015). "Genetic aspects of monomorphic teratozoospermia: a review". Journal of Assisted Reproduction and Genetics. 32 (4): 615–623. doi:10.1007/s10815-015-0433-2. PMC 4380889. PMID 25711835.
  40. "GDS2534 / 1557056_at". www.ncbi.nlm.nih.gov. Retrieved 2022-12-16.
  41. "TP63 gene: MedlinePlus Genetics". medlineplus.gov. Retrieved 2022-12-16.