C5orf52

Identifiers
Aliases: C5orf52, chromosome 5 open reading frame 52
External IDs: MGI: 1914680; HomoloGene: 12129; GeneCards: C5orf52; OMA: C5orf52 - orthologs
Orthologs
Species: Human; Mouse
RefSeq (mRNA): NM_001145132 (human); NM_026150 (mouse)
RefSeq (protein): NP_001138604 (human); NP_080426 (mouse)
Location (UCSC): Chr 5: 157.67 – 157.68 Mb (human); Chr 11: 3.84 – 3.85 Mb (mouse)
PubMed search: [3] (human); [4] (mouse)

Chromosome 5 open reading frame 52 (C5orf52) is a gene of unknown function. It encodes the protein A6NGY3. The C5orf52 protein is strongly predicted to localize to the cytoplasm. [5]

Gene

The gene is located on the positive strand of chromosome 5 at 5q33.3 and spans 9,218 nucleotides. [6] C5orf52 contains 3 exons and 2 introns. A 537-base-pair stretch of the gene is antisense to the spliced gene SOX30, which raises the possibility of regulated alternate expression. [7]

Gene expression

Multiple sources predict that C5orf52 expression is tissue-specific in normal tissues. [8] It is expressed in the appendix, brain, colon, duodenum, endometrium, gall bladder, kidney, lung, lymph node, prostate, small intestine, spleen, and urinary bladder, and at higher levels in the testis.

RNA

There is only one known isoform (C5orf52 isoform X1). [9] Its mRNA sequence is 1023 base pairs long and is assembled from 3 exons. Translation starts at the 385th base pair and stops at the 864th base pair. The transcript contains both a 5' UTR (length of 387 nucleotides) and a 3' UTR (length of 162 nucleotides). [10]
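The coordinates above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming the quoted positions are 1-based and inclusive and that position 864 is the last base of the stop codon; the function name `transcript_layout` is hypothetical and not taken from any database tool.

```python
def transcript_layout(mrna_length, cds_start, cds_end):
    """Derive region lengths from 1-based, inclusive coding-region coordinates.

    Assumes cds_end is the last base of the stop codon.
    """
    cds_length = cds_end - cds_start + 1
    if cds_length % 3 != 0:
        raise ValueError("coding region is not a whole number of codons")
    return {
        "five_prime_utr": cds_start - 1,          # bases before the start codon
        "cds": cds_length,                        # start codon through stop codon
        "three_prime_utr": mrna_length - cds_end, # bases after the stop codon
        "protein_length": cds_length // 3 - 1,    # the stop codon is not translated
    }


layout = transcript_layout(mrna_length=1023, cds_start=385, cds_end=864)
print(layout)
```

Under these assumptions the coding region is 480 nucleotides (160 codons including the stop), which agrees with the 159-amino-acid protein length given in the Protein section. The strict UTR lengths come out as 384 and 159 nucleotides, each 3 fewer than the quoted 387 and 162. One possible explanation is that the quoted figures count the start and stop codons, but the source does not say.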

Protein

The protein A6NGY3, a member of the DUF5528 (domain of unknown function) family, is encoded by the C5orf52 gene and is 159 amino acids long. [11] [12] Its molecular mass is 17.9 kDa and its isoelectric point is 10.8. [13] [14]

Function

Although the exact function of C5orf52 in humans is unknown, there is considerable evidence that the gene is associated with spermatogenesis: expression is very high in the testis and lower in the brain, colon, duodenum, and small intestine. [15] The C5orf52 protein does not have any transmembrane domains or signal sequences. [16]

Structure

The protein is slightly serine-rich, with the serines concentrated toward the beginning of the sequence, and is slightly deficient in aspartic acid overall. [17] Positively and negatively charged amino acids are evenly distributed, so there are no large charge clusters. [18] The tertiary structure of the human protein was predicted with multiple bioinformatic tools. All of the tools predicted a long run of alpha helices near the C-terminus and extended strands near the N-terminus. [19] [20]

Protein level regulation

Several sites were identified on the protein: an N-myristoylation site, an amidation site, an N-glycosylation site, a cAMP- and cGMP-dependent protein kinase phosphorylation site, a casein kinase II phosphorylation site, and a protein kinase C phosphorylation site. [21] Additional phosphorylation sites were predicted at serine, threonine, and tyrosine residues. [22] Only two serines and one threonine were strongly conserved in close orthologs.
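Site types like these are usually found by matching short sequence patterns. The sketch below scans a sequence for three of them using regular-expression versions of the widely used PROSITE consensus patterns (N-glycosylation `N-{P}-[ST]-{P}`, protein kinase C `[ST]-x-[RK]`, casein kinase II `[ST]-x(2)-[DE]`). It is an illustration only: the toy sequence is made up and is not the C5orf52 protein, and the function name `scan_motifs` is hypothetical rather than part of Motif Scan or any other tool cited here.

```python
import re

# Regex forms of three PROSITE-style consensus patterns.
MOTIFS = {
    "N-glycosylation": r"N[^P][ST][^P]",
    "PKC phosphorylation": r"[ST].[RK]",
    "CK2 phosphorylation": r"[ST]..[DE]",
}


def scan_motifs(sequence):
    """Return 1-based start positions of every motif match, overlaps included."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead lets finditer report overlapping matches.
        lookahead = re.compile(f"(?=({pattern}))")
        hits[name] = [m.start() + 1 for m in lookahead.finditer(sequence)]
    return hits


# Made-up peptide for illustration; NOT the A6NGY3 sequence.
toy_sequence = "MASNRKNGTLSPKDSAAE"
for motif, positions in scan_motifs(toy_sequence).items():
    print(motif, positions)
```

Short patterns like these match frequently by chance, which is one reason to weigh the predicted sites against conservation in orthologs, as the paragraph above does.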

Homology and evolution

Paralogs

There are no predicted paralogs for C5orf52 in Homo sapiens.

Orthologs

Orthologs were identified by comparing C5orf52 against the sequences of other species in NCBI's database. Twenty organisms from a variety of orders were selected for further comparison. [23] These species included mammals, reptiles, amphibians, birds, and an invertebrate. The data in the table was sorted by percent sequence identity to the human protein and then by date of divergence.

Orthologs of Homo sapiens C5orf52. Data is sorted by sequence identity and then by date of divergence. Shading indicates the grouping of organisms.
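The two-level ordering described for the ortholog table can be sketched as a sort on a compound key. The rows below are placeholders with invented values, not entries from the actual table, and the sort directions (highest identity first, then most recent divergence first) are an assumption, since the source does not state them.

```python
from typing import NamedTuple


class Ortholog(NamedTuple):
    species: str
    identity_pct: float    # percent identity to the human protein
    divergence_mya: float  # estimated date of divergence, million years ago


def sort_orthologs(rows):
    """Sort by identity (descending), then by divergence date (ascending)."""
    return sorted(rows, key=lambda r: (-r.identity_pct, r.divergence_mya))


# Placeholder rows with invented values, for illustration only.
rows = [
    Ortholog("species_c", 45.0, 319.0),
    Ortholog("species_a", 80.0, 94.0),
    Ortholog("species_d", 45.0, 180.0),
    Ortholog("species_b", 80.0, 87.0),
]
ordered = [r.species for r in sort_orthologs(rows)]
print(ordered)
```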

Phylogeny

The most distant ortholog of human C5orf52 was found in Paralvinella palmiformis, an invertebrate that diverged from humans around 686 million years ago. [24] Branch length in the tree is proportional to the date of divergence from humans, so orthologs more closely related to humans have shorter branches. The upper cluster consists entirely of mammals, which follows the trend that higher sequence identity corresponds to a smaller distance from the human gene.

Phylogenetic tree of species with orthologs of the human gene C5orf52. The tree is in radial format, meaning the distance of each line from the main branch describes species divergence. Source: Phylogeny.fr [25]

Protein divergence

When the divergence of C5orf52 from its orthologs was compared with that of human cytochrome c and the fibrinogen alpha chain, the C5orf52 trendline for m (corrected percent divergence) was very similar to that of the fibrinogen alpha chain. The fibrinogen alpha chain sequence changes quickly over time, which indicates that human C5orf52 does as well.
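The source does not give the formula behind m. One common choice in this kind of molecular-clock comparison is a Poisson-style correction for multiple substitutions at the same site, m = -100 · ln(1 - n/100), where n is the observed percent difference (100 minus percent identity). The sketch below uses that formula as an assumption; the function name `corrected_divergence` is hypothetical.

```python
import math


def corrected_divergence(identity_pct):
    """Corrected percent divergence m from percent identity.

    Assumes the correction m = -100 * ln(1 - n/100), where
    n = 100 - identity_pct is the observed percent difference.
    """
    n = 100.0 - identity_pct
    if not 0.0 <= n < 100.0:
        raise ValueError("identity must be greater than 0 and at most 100 percent")
    return -100.0 * math.log1p(-n / 100.0)


# m grows faster than n as sequences diverge, because repeated
# substitutions at one site are counted only once in n.
for identity in (90.0, 70.0, 50.0):
    print(identity, round(corrected_divergence(identity), 1))
```

Plotting m against the date of divergence for each ortholog gives the kind of trendline compared in the paragraph above; a steeper line means faster change.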

Conserved regions

Multiple sequence alignments with close orthologs showed amino acid conservation throughout C5orf52. The most highly conserved region lay in the middle of the protein, around amino acid 90. Clusters of conserved residues also occurred toward the C-terminus, although that region was not strongly conserved overall.

Interactions

C5orf52 is not predicted to have any binary interactions with other proteins. [26] [27] The reason is currently unknown. One possible explanation is the lack of transmembrane domains. Another is the limited information available on C5orf52, which may act in specialized pathways or conditions not yet covered by the databases. SOX30, a neighboring gene located upstream on the negative strand, has 63 binary interactions listed in PSICQUIC. [28]

References

  1. GRCh38: Ensembl release 89: ENSG00000187658. Ensembl, May 2017.
  2. GRCm38: Ensembl release 89: ENSMUSG00000020434. Ensembl, May 2017.
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "DeepLoc protein location prediction".
  6. "C5orf52 Chromosomal location". GeneCards.
  7. "C5orf52 Splicing". NCBI AceView. Retrieved December 4, 2024.
  8. "NCBI Gene entry of C5orf52 Chromosome 5 open reading frame 52 [Homo sapiens]". NCBI Gene.
  9. "NCBI C5orf52 Gene Information". NCBI.
  10. "C5orf52 Nucleotide". NCBI. 25 August 2024.
  11. "A6NGY3 Classification". InterPro.
  12. "C5orf52 Protein". NCBI.
  13. "Protein Information". GeneCards.
  14. "Isoelectric Point of A6NGY3". Archive Ensembl.
  15. "NCBI Gene entry of C5orf52 Chromosome 5 open reading frame 52 [Homo sapiens]". NCBI Gene.
  16. "SOSUI Transmembrane domains". SOSUI.
  17. "Protein compositional Tool". SAPS.
  18. "Protein Structure Results". iCn3D.
  19. "Protein Structure Database". AlphaFold.
  20. "Protein Structure and Orientation Prediction". I-TASSER.
  21. "C5orf52 Sites". MyHit Motif Scan.
  22. "Phosphorylation Prediction". NetPhos.
  23. "Basic Local Alignment Search Tool". NCBI BLAST.
  24. "C5orf52 farthest ortholog". TimeTree.
  25. "Phylogenetic Tree Tool". Phylogeny.fr.
  26. "C5orf52 Binary Interactions". PSICQUIC View.
  27. "C5orf52 Interactions". IntAct.
  28. "SOX30 Binary Interactions". PSICQUIC View.