C5orf22

Identifiers
Aliases: C5orf22, chromosome 5 open reading frame 22
External IDs: MGI: 1925127; HomoloGene: 10149; GeneCards: C5orf22
Orthologs (human; mouse)
RefSeq (mRNA): NM_018356 (human); NM_001166360, NM_029998, NM_001357761 (mouse)
RefSeq (protein): NP_060826 (human); NP_001159832, NP_084274, NP_001344690 (mouse)
Location (UCSC): Chr 5: 31.53 – 31.56 Mb (human); Chr 15: 12.81 – 12.82 Mb (mouse)
PubMed search: [3] (human); [4] (mouse)

Chromosome 5 open reading frame 22 (C5orf22) is a protein-coding gene of poorly characterized function in Homo sapiens. [5] Its primary alias is uncharacterized protein family 0489 (UPF0489). [5]


Gene

C5orf22 is located on the positive strand of chromosome 5 at 5p13.3, spanning 22,779 nucleotides from base pair 31,532,275 to 31,555,053. [6] The gene contains 9 exons and produces 7 transcript isoforms. [5] The isoforms differ in their exon configuration and untranslated regions. Transcript variant 1 is the canonical isoform, encoding 442 amino acids across 9 exons. [7]
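The stated span follows directly from the two coordinates. The sketch below reproduces that arithmetic, assuming the coordinates are 1-based and inclusive of both endpoints; the function name `gene_span` is chosen here for illustration only.

```python
def gene_span(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based interval that includes both endpoints."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Coordinates for human C5orf22 as given in the text.
C5ORF22_START = 31_532_275
C5ORF22_END = 31_555_053

span = gene_span(C5ORF22_START, C5ORF22_END)
print(f"C5orf22 spans {span:,} nucleotides")
```

Under a 0-based, half-open convention the `+ 1` would be dropped, so it is worth checking which convention a given coordinate source uses.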

Annotated human chromosome 5. Retrieved from NCBI Gene.
C5orf22 gene diagram. Human C5orf22 is located on chromosome 5 (5p13.3) at base pairs 31,532,275 to 31,555,053. Transcript variant 1 (depicted) contains 9 exons. Promoter prediction is from Genomatix; the promoter ID is GXP_55076. Pro1 is the assigned promoter for all transcript variants. This promoter lies directly upstream of the 5' UTR and spans 1,081 base pairs. The promoter is labeled in green and exons (Ex) are denoted in dark blue. Illustration was created using Domain Illustrator.

Expression and regulation

C5orf22 RNA is expressed ubiquitously, in tissues derived from all 3 germ layers and at all stages of development, in humans, mice, chickens, and zebrafish. [5] RNA expression differs significantly between select tissues, with skeletal muscle showing the greatest abundance (7.8 RPKM). [5] [9]

C5orf22 has 1 predicted promoter directly upstream of the gene (GXP_55076). [8] This promoter is 1,081 base pairs long and partially overlaps the 5' untranslated region. [8] GXP_55076 is assigned to all transcript variants. [8] Predicted transcription factor binding sites include those for TATA-box binding factors, SMAD transcription factors, and MAF/AP1 factors, among others. [8]

Neighboring elements

The closest neighboring gene to C5orf22 is DROSHA, which encodes a ribonuclease and lies on the minus strand proximal to C5orf22. [5] [10] Drosha is an endoribonuclease specific for double-stranded RNA that assists with the first step of microRNA biogenesis. [11]

Structure

The C5orf22 protein contains 2 globular domains and 3 small disordered regions. [12] Its molecular weight is approximately 50 kDa and its isoelectric point is 4.7. [13] Its amino acid composition is close to that of an average protein, with no significant outliers in the abundance of any individual amino acid. [14] C5orf22 contains several predicted post-translational modification sites and motifs, including phosphorylation, ubiquitination, and glycosylation sites, an SH2 domain, and a myristoylation site. [12]

C5orf22 protein structure contains 2 globular domains and 3 disordered regions.

Subcellular distribution

C5orf22 is predicted to be a soluble protein located in the cytoplasm and nucleus. [15] Both amino acid sequence predictions and immunohistochemical staining support this localization. [9] [16] In addition, sequence analysis predicts a partial nuclear localization signal (NLS) at amino acids 175–185. [17]

Function

The precise function of C5orf22 is still unknown; however, it is hypothesized to be a component of an RNA splicing complex. [18] Proteomic research implicated the protein product as a novel component of the WBP11/PQBP1 splicing complex, which regulates expression of genes involved in processes ranging from DNA repair to immunomodulation. [18] C5orf22 knockdown was associated with downregulation of alternative splicing events, leading to aberrant expression of select genes and ultimately to cell cycle dysfunction. [18] The evidence for nuclear localization and the presence of a predicted NLS are consistent with this hypothesized function.

Interacting proteins

Experimental evidence indicates that more than 20 proteins interact with C5orf22. [19] [20] [21] These interactors are localized to both the nucleus and the cytoplasm. [22] The most likely interactors are WBP11, OSM, SURF2, ELOF1, and DDIT4L. [20]

Evolution & homology

C5orf22 first appeared in invertebrates approximately 797 million years ago. [23] It is the only member of its gene family, and human UPF0489 C5orf22 is conserved through invertebrates. [23] C5orf22 orthologs show conservation of both globular domains through bony fish and conservation of 1 globular domain in arthropods. [12] Through bony fish, the isoelectric points and molecular weights of C5orf22 orthologs are within ±0.15 and ±3 kDa, respectively, of the human values. [12] There are no paralogs of C5orf22 in humans. [23]

UPF0489 C5orf22 is a slowly evolving protein, based on comparisons of the percent corrected divergence of orthologous proteins. [24]

Table 1: C5orf22 orthologs [24]
| Taxonomic Class | Common Name | Genus species | Date of Divergence (MYA) | Sequence Identity (%) | Sequence Similarity (%) | Sequence Length (AA) | Query Coverage (%) | Accession Number |
|---|---|---|---|---|---|---|---|---|
| Mammal | Human | Homo sapiens | N/A | 100 | 100 | 442 | 100 | NP_060826.2 |
| Mammal | Mouse | Mus musculus | 90 | 78 | 86 | 442 | 100 | NP_084274.1 |
| Mammal | Whale | Balaenoptera musculus | 96 | 89 | 94 | 467 | 100 | XP_036705025.1 |
| Aves | Chicken | Gallus gallus | 312 | 68 | 79 | 446 | 98 | XP_418996.3 |
| Reptile | Tiger rattlesnake | Crotalus tigris | 312 | 65 | 75 | 476 | 98 | XP_039212189.1 |
| Amphibian | African clawed frog | Xenopus laevis | 352 | 67 | 78 | 459 | 95 | XP_018121838.1 |
| Fish | Zebrafish | Danio rerio | 435 | 57 | 71 | 439 | 95 | NP_956625.1 |
| Fish | Sea lamprey | Petromyzon marinus | 615 | 51 | 69 | 589 | 89 | XP_032827184.1 |
| Invertebrate | Fruit fly | Drosophila suzukii | 797 | 33 | 50 | 481 | 95 | XP_036671373.1 |

MYA: millions of years ago. AA: amino acids.
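
The article does not define how percent corrected divergence is calculated. The sketch below assumes one commonly used correction, m = −100 · ln(1 − n/100), where n is the observed percent difference (100 minus sequence identity), and applies it to a few identity values from Table 1. The formula choice, the function name `corrected_divergence`, and the simple divergence-per-time ratio are assumptions for illustration and may not match the method behind the cited analysis.

```python
import math


def corrected_divergence(identity_pct: float) -> float:
    """Assumed correction: m = -100 * ln(1 - n/100), with n = 100 - identity."""
    if not 0 < identity_pct <= 100:
        raise ValueError("identity must be in (0, 100]")
    # 1 - n/100 simplifies to identity/100.
    return -100.0 * math.log(identity_pct / 100.0)


# (common name, divergence date in MYA, sequence identity %) taken from Table 1.
ORTHOLOGS = [
    ("Mouse", 90, 78),
    ("Chicken", 312, 68),
    ("African clawed frog", 352, 67),
    ("Zebrafish", 435, 57),
]

for name, mya, identity in ORTHOLOGS:
    m = corrected_divergence(identity)
    # m / mya is a crude per-ortholog rate, not a fitted regression slope.
    print(f"{name:22s} {mya:4d} MYA  m = {m:5.1f}  m/MYA = {m / mya:.3f}")
```

A fitted slope across all orthologs, compared against reference proteins, would be the more usual way to judge the rate of evolution; the per-ortholog ratio here is only a quick check.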
Conceptual translation of human C5orf22 isoform X1. C5orf22 isoform 1 nucleotide sequence overlying protein translation. Features and sequences are indicated in respective colors. Figure legend: Start: first ATG encoding methionine. Disordered: disordered region. GlobD: globular domain. Ex*|ex*: border of two exons. M-alt term: alternate methionine N-terminus. Phos site: phosphorylation site. Ubq site: ubiquitination site. Sumo site: sumoylation site. SNP: single nucleotide polymorphism. Myrstyl: myristoylation site. PDphos: proline-dependent phosphorylation site. MAPK: MAPK domain. A-hlx: alpha-helix. B-sheet: beta-pleated sheet. NLS: nuclear localization signal. Stop: stop codon. miRNA site: miRNA site with target score of 98%, indicated by miRDB. PolyA signal: polyadenylation regulatory signal.

Clinical significance

A 2021 study of the role of miRNAs in breast cancer pathogenesis correlated upregulation of C5orf22 with reduced survival of breast cancer patients. [26]

Patients with tibial muscular dystrophy exhibit decreased expression of C5orf22. [27] Patients with non-ischemic cardiomyopathy exhibit increased expression of C5orf22.

References

  1. GRCh38: Ensembl release 89: ENSG00000082213 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000022195 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "C5orf22 chromosome 5 open reading frame 22 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2021-12-18.
  6. "Human C5orf22". www.genecards.org. Archived from the original on 2011-11-26. Retrieved 2021-09-20.
  7. "Transcript: ENST00000325366.14 (C5orf22-201) - Summary - Homo_sapiens - Ensembl genome browser 105". useast.ensembl.org. Retrieved 2021-12-18.
  8. "Genomatix Annotation (ElDorado)". Genomatix. Archived from the original on 2012-01-14.
  9. "Tissue expression of C5orf22 - Summary - The Human Protein Atlas". www.proteinatlas.org. Retrieved 2021-12-18.
  10. "DROSHA drosha ribonuclease III [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2021-12-18.
  11. "DROSHA - Ribonuclease 3 - Homo sapiens (Human) - DROSHA gene & protein". www.uniprot.org. Retrieved 2021-12-18.
  12. "ELM - Search the ELM resource". elm.eu.org. Retrieved 2021-12-18.
  13. "C5orf22 - UPF0489 protein C5orf22 - Homo sapiens (Human) - C5orf22 gene & protein". www.uniprot.org. Retrieved 2021-12-18.
  14. "SAPS < Sequence Statistics < EMBL-EBI". www.ebi.ac.uk. Retrieved 2021-12-18.
  15. "PSORT II Prediction". psort.hgc.jp. Retrieved 2021-12-18.
  16. "DeepLoc1.0 C5orf22". DTU Health Services. Archived from the original on 2020-08-15.
  17. "NLS Mapper". nls-mapper.iab.keio.ac.jp. Archived from the original on 2021-11-22. Retrieved 2021-12-18.
  18. Zi Z, Zhang Y, Zhang P, Ding Q, Chu M, Chen Y, et al. (January 2020). "A Proteomic Connectivity Map for Characterizing the Tumor Adaptive Response to Small Molecule Chemical Perturbagens". ACS Chemical Biology. 15 (1): 140–150. doi:10.1021/acschembio.9b00694. PMC 7268550. PMID 31846293.
  19. "IntAct Portal". www.ebi.ac.uk. Retrieved 2021-12-18.
  20. "C5orf22 Result Summary | BioGRID". thebiogrid.org. Retrieved 2021-12-18.
  21. "Results - mentha: the interactome browser". www.mentha.uniroma2.it. Retrieved 2021-12-18.
  22. "Motif Scan". myhits.sib.swiss. Retrieved 2021-12-18.
  23. "BLAST: Basic Local Alignment Search Tool". blast.ncbi.nlm.nih.gov. Retrieved 2021-12-18.
  24. "Protein BLAST: search protein databases using a protein query". blast.ncbi.nlm.nih.gov. Retrieved 2021-12-18.
  25. "miRDB - MicroRNA Target Prediction Database". www.mirdb.org. Retrieved 2021-12-18.
  26. Shinden Y, Hirashima T, Nohata N, Toda H, Okada R, Asai S, et al. (May 2021). "Molecular pathogenesis of breast cancer: impact of miR-99a-5p and miR-99a-3p regulation on oncogenic genes". Journal of Human Genetics. 66 (5): 519–534. doi:10.1038/s10038-020-00865-y. PMID 33177704. S2CID 226312590.
  27. "GDS4843 / 1552660_a_at". www.ncbi.nlm.nih.gov. Retrieved 2021-12-18.
UPF0489 C5orf22 rate of evolution. Estimated time of divergence from human C5orf22 (millions of years ago; MYA) versus % corrected divergence of orthologous protein (m; total # of AA changes/100 residues). Slopes for fibrinogen alpha, C5orf22, and cytochrome C are 0.24, 0.09, and 0.03, respectively. Orthologs are monkey (Callithrix jacchus), mouse (Mus musculus), bird (Merops nubicus), frog (Xenopus laevis), and fish (Danio rerio). Data points for C5orf22 are displayed in blue. Data points for cytochrome C are shown in red. Data points for fibrinogen alpha are indicated in yellow. All data was collected from NCBI BLASTP.