Proser2

PROSER2
Identifiers
Aliases: PROSER2, C10orf47, proline and serine rich 2
External IDs: MGI: 2442238; HomoloGene: 51648; GeneCards: PROSER2
Orthologs (Human; Mouse)
RefSeq (mRNA): Human NM_153256; Mouse NM_001159657, NM_144883
RefSeq (protein): Human NP_694988; Mouse NP_001153129, NP_659132
Location (UCSC): Human Chr 10: 11.82 – 11.87 Mb; Mouse Chr 2: 6.1 – 6.14 Mb
PubMed search: Human [3]; Mouse [4]

PROSER2, also known as proline and serine rich 2, is a protein that in humans is encoded by the PROSER2 gene. The gene, also designated C10orf47 (chromosome 10 open reading frame 47), is found in band 14 of the short arm of chromosome 10 (10p14) and contains a highly conserved SARG domain. [5] [6] It is a fast-evolving gene with two paralogs, c1orf116 and specifically androgen-regulated gene protein isoform 1. [5] [7] [8] The function of the PROSER2 protein is currently uncharacterized; in humans, however, it may play a role in cell cycle regulation and reproductive functioning, and it is a potential biomarker of cancer. [9] [10] [11] [12] [13] [14]

Gene

The gene spans 48,880 bases, and its mRNA transcript is 3,360 bases long. [15] [16] PROSER2 has 5 splice variants, 3 of which are alternatively spliced and 2 of which are unspliced forms. [17] It contains an upstream in-frame stop codon and a 2,000 bp promoter. [15] [18]

Locus

PROSER2 is located on the tenth chromosome (10p14). It is oriented on the plus strand of DNA and has 5 exons. [16] [19]

PROSER2 Transcript

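Columns: splice variant [18]; name [20]; sequence length (bp) [18]; protein length (aa) [18]; mass (Da) [20]; type [18]; features.

Splice variant 1 (primary transcript)
Name: PROSER2; c10orf47
Sequence length: 3,360 bp [15]
Protein length: 435 aa
Mass: 45,802 Da
Type: Protein coding (full length)
Features: 5 exons; 4 coding exons [18]. Predicted: 1 dimethylated arginine; 1 omega-N-methylarginine [20]. 37 phosphorylation sites; [21] 2 SUMO interaction motifs; [22] 2 S-palmitoylation sites; [23] 2 GalNAc O-glycosylation sites. [24] Isoelectric point: 6.81. Charge: 3.0 [18]

Splice variant 2
Sequence length: 1,646 bp
Protein length: 341 aa
Type: Protein coding
Features: 5 exons; 4 coding exons [18]

Splice variant 3
Name: c10orf47 isoform CRA_b
Sequence length: 1,232 bp
Protein length: 239 aa
Mass: 24,793 Da
Type: Protein coding
Features: 1 exon; 1 coding exon. Isoelectric point: 12.21. Charge: 19.0 [18]

Splice variant 4
Sequence length: 476 bp
Protein length: 85 aa
Mass: 9,554 Da
Type: Protein coding
Features: 3 exons; 2 coding exons. 3' truncated in transcript; [18] 85th aa is non-terminal residue. [20] Isoelectric point: 4.25. Charge: -8.0 [18]

Splice variant 5
Sequence length: 728 bp
Protein length: No protein
Type: Processed transcript
Features: 4 exons; 0 coding exons [18]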
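As a rough illustration of how the transcript and protein lengths above relate, the sketch below computes how many bases are needed to encode each protein and what share of the transcript that represents. The lengths are copied from the transcript table; the function and variable names are illustrative only, and the calculation ignores stop codons and untranslated regions, so it is an approximation rather than an annotated coding sequence length.

```python
# Transcript and protein lengths as listed in the transcript table (variants 1-4).
# Variant 5 is a processed transcript with no protein product and is omitted.
VARIANTS = {
    1: {"mrna_bp": 3360, "protein_aa": 435},
    2: {"mrna_bp": 1646, "protein_aa": 341},
    3: {"mrna_bp": 1232, "protein_aa": 239},
    4: {"mrna_bp": 476, "protein_aa": 85},
}


def coding_bases(protein_aa: int) -> int:
    """Bases needed to encode the residues alone (3 per codon, stop codon excluded)."""
    return 3 * protein_aa


def coding_fraction(mrna_bp: int, protein_aa: int) -> float:
    """Share of the transcript taken up by the residue-encoding bases."""
    return coding_bases(protein_aa) / mrna_bp


for variant, v in VARIANTS.items():
    bases = coding_bases(v["protein_aa"])
    frac = coding_fraction(v["mrna_bp"], v["protein_aa"])
    print(f"variant {variant}: {bases} coding bases, {frac:.1%} of transcript")
```

For the primary transcript this gives 1,305 coding bases out of 3,360, so most of that mRNA is untranslated sequence.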

Homology and Evolution

Figure: The evolutionary history of PROSER2 compared with cytochrome C and fibrinogen alpha subunit. Each data point represents a homolog of the human gene in a different species, identified by a BLAST search. The percent identity from each search is plotted against the estimated time of divergence from humans, obtained from TimeTree. The graph shows that PROSER2 is a fast-evolving gene, similar to the fibrinogen alpha subunit.

Orthologous Space

The orthologous space for PROSER2 is fairly large: 143 organisms are reported to have orthologs of human PROSER2. [16] The most distant ortholog of human PROSER2 is found in the elephant shark, Callorhinchus milii. Because the most distant relatives of humans that carry PROSER2 are fish and sharks (cartilaginous fishes), it can be inferred that PROSER2 originated in vertebrates. [7]

Paralogous Space

The human PROSER2 gene has two paralogs: c1orf116 and specifically androgen-regulated gene protein isoform 1. [7]

Conserved Regions

Multiple sequence alignments demonstrated that the C-terminal end of the proline and serine rich 2 protein (encoded by the 3’ end of the gene) is highly conserved in both distant and close homologs. The widely conserved amino acids, found in all primates, mammals, reptiles, birds, amphibians, fish, and sharks for which sequences are available, include G406, V409, R421, A424, L425, L428, G429, and L430. These residues make up much of the C-terminal end of the specifically androgen-regulated gene protein (SARG) domain. The N-terminal end of the protein (encoded by the 5’ end of the gene) is highly conserved in close relatives of humans, including all primates, mammals, reptiles, and birds for which sequences are available. PROSER2 has an even balance of basic and acidic residues, which is conserved throughout all homologs. [27] [28]

Evolutionary Pattern

PROSER2 is a fast-evolving gene, similar to fibrinogen alpha subunit (FGA). Its divergence over time closely tracks that of FGA and differs markedly from that of cytochrome C (CYCS), which is evolving more slowly than either PROSER2 or FGA. The gene duplication of PROSER2 is estimated to have occurred around the divergence of fish from the human lineage, 436.8 million years ago (MYA). [8]
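One way such a comparison of evolutionary rates could be made quantitative is sketched below. The article only states that percent identity was plotted against divergence time; the Poisson correction used here is a standard textbook adjustment and is an assumption, not the method used for the figure. The function names are hypothetical, and the percent identities in the usage lines are placeholder values chosen for illustration, not measurements for PROSER2, FGA, or CYCS. Only the 436.8 MYA divergence time comes from the text.

```python
import math


def corrected_divergence(percent_identity: float) -> float:
    """Poisson-corrected substitutions per site, estimated from percent identity."""
    p = 1.0 - percent_identity / 100.0  # observed fraction of differing sites
    if not 0.0 <= p < 1.0:
        raise ValueError("percent identity must be in the range (0, 100]")
    return -math.log(1.0 - p)


def rate_per_site_per_my(percent_identity: float, divergence_mya: float) -> float:
    """Substitutions per site per million years.

    The divergence is divided by 2 * t because both lineages have been
    accumulating changes since they split t million years ago.
    """
    return corrected_divergence(percent_identity) / (2.0 * divergence_mya)


# Placeholder identities (NOT data from the article) at the 436.8 MYA split.
fast = rate_per_site_per_my(40.0, 436.8)  # hypothetical fast-evolving protein
slow = rate_per_site_per_my(80.0, 436.8)  # hypothetical slow-evolving protein
print(f"fast: {fast:.5f}  slow: {slow:.5f}")
```

Under this model, a protein that retains lower identity over the same divergence time has a higher estimated rate, which is the sense in which PROSER2 and FGA are described as faster evolving than CYCS.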

Proline and Serine Rich 2 Protein

In Homo sapiens, PROSER2 encodes the proline and serine rich 2 protein, which is 435 amino acids in length and has a molecular weight of 45,802 Da. [5] The protein has a near-neutral isoelectric point of 6.81. [29] It contains a conserved SARG (specifically androgen-regulated gene protein) domain that spans 388 amino acids within PROSER2. The SARG domain belongs to the pfam15385 protein family. Its function has yet to be elucidated, but it is thought to be androgen-responsive, as it is up-regulated in the presence of androgens but not glucocorticoids. [6] SARG is highly expressed in the prostate, where PROSER2 has also been reported. [9]

Figure: Human PROSER2 protein internal structure and features.

Protein Internal Structure

Features and their locations:
Ser-rich region [20]: aa 8-43 [20]
Region of low complexity [18]: aa 27-43 [18]
Region of low complexity [18]: aa 87-105 [18]
Region of low complexity [18]: aa 113-126 [18]
Region of low complexity [18]: aa 143-171 [18]
Pro-rich region [20]: aa 147-254 [20]
Region of low complexity [18]: aa 228-246 [18]
Region of low complexity [18]: aa 291-310 [18]
SARG domain (specifically androgen-regulated gene protein) [6]: aa 44-433 [6]
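Because several of these features overlap, it can help to treat them as inclusive residue intervals. The following is a minimal sketch using the coordinates listed above; the `Feature` type and the helper names are illustrative and do not come from Ensembl, UniProt, or CDD.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # first residue, 1-based inclusive
    end: int    # last residue, 1-based inclusive


# Coordinates copied from the feature list above.
FEATURES = [
    Feature("Ser-rich region", 8, 43),
    Feature("Low complexity", 27, 43),
    Feature("Low complexity", 87, 105),
    Feature("Low complexity", 113, 126),
    Feature("Low complexity", 143, 171),
    Feature("Pro-rich region", 147, 254),
    Feature("Low complexity", 228, 246),
    Feature("Low complexity", 291, 310),
    Feature("SARG domain", 44, 433),
]


def overlap(a: Feature, b: Feature) -> int:
    """Number of residues shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def overlapping(target: Feature, features: List[Feature]) -> List[Feature]:
    """All other features that share at least one residue with the target."""
    return [f for f in features if f is not target and overlap(target, f) > 0]


pro_rich = FEATURES[5]
for f in overlapping(pro_rich, FEATURES):
    print(f"{f.name} (aa {f.start}-{f.end}) shares {overlap(pro_rich, f)} residues")
```

With these coordinates, the Pro-rich region overlaps two low-complexity regions and lies entirely inside the SARG domain, while the Ser-rich region ends one residue before the SARG domain begins.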

Post-Translational Modifications

PROSER2 contains 35 predicted serine phosphorylation sites and 2 predicted threonine phosphorylation sites on conserved residues. [21] It is also predicted to contain 2 SUMO interaction motifs, 2 S-palmitoylation sites, and 2 GalNAc O-glycosylation sites, all located on or near highly conserved amino acids. [22] [23] [24] These modifications may change the folding and function of the protein, which is predicted to be localized to the nucleus. [32]

Predicted Secondary Structure

PROSER2’s structure is currently uncharacterized. However, it is predicted to contain 4 alpha helices and 5 beta-sheet regions, which are conserved across all mammalian homologs. [33] Based on its structural features and post-translational modifications, it is predicted to be a soluble protein secreted from the nucleus via a non-classical secretion pathway. [32] [34] [35]

Function

The function of proline and serine rich 2 protein is currently unknown. However, it is listed in several U.S. patents as a potential biomarker of cancer. [9] [10] [11] [12] [13] [14]

Interacting Proteins

Previous experiments have found that PROSER2 interacts with several other proteins, including STK24, ESR2, POT1, ACTB, and EPS8. [19] [36] These interacting proteins are involved in control of apoptosis, reproductive cell differentiation, telomere maintenance, cell integrity, and cell cycle progression, respectively. These interactions suggest that PROSER2 may be involved in the regulation of cell differentiation and apoptosis. [19]

Expression

PROSER2 expression is highly tissue specific and often low. In humans, PROSER2 is most highly expressed in the bone marrow, fetal brain, fetal kidney, liver, fetal liver, lung, fetal lung, lymph node, prostate, stomach, thymus, and trachea (GEO Profile ID: 69555271). It has also been found to be highly expressed in the colon, testes, parotid gland, and uterus (GEO Profile ID: 10034772). The high expression in testicular and prostate tissue is expected given the presence of the SARG domain and its association with androgens. PROSER2 is least expressed in the heart, spinal cord, and several areas of the adult brain (GEO Profile ID: 69555271). PROSER2 has higher expression in ETP-ALL (early T-cell precursor acute lymphoblastic leukemia) patients than in controls (GEO Profile ID: 92018456) and is highly expressed in primary tumors of the prostate compared to benign and malignant samples (GEO Profile ID: 14264706). PROSER2 is underexpressed in males with androgen insensitivity syndrome (AIS), which is consistent with the evidence described above regarding the SARG domain. Treatment with dihydrotestosterone has been found to increase PROSER2 expression in genital fibroblasts, further supporting that it is an androgen-responsive gene (GEO Profile ID: 20808032). [37]

Transcription Factor Interactions

PROSER2 interacts most strongly with the following transcription factors: Vertebrate TATA binding protein factor, ZF5 POZ domain zinc finger, CTCF and BORIS gene family transcriptional regulators, E2F-myc activator/cell cycle regulator, MYT1 C2HC zinc finger protein, SOX/SRY-sex/testis determining and related HMG box factors, CCAAT binding factors, HOX-PBX complexes, and C2HC zinc finger transcription factors 13. [38] The SRY gene is the primary factor in determining testicular formation during development, so it is plausible that PROSER2's association with androgens is controlled by transcription factors in the SOX/SRY-sex/testis determining and related HMG box factors family. [38] The E2F-myc activator/cell cycle regulator is also notable because Myc has been implicated in cancer pathways, so this relationship provides supplementary evidence for PROSER2 as a potential biomarker of cancer. [9] [10] [11] [12] [13] [14]

Clinical Significance

Although PROSER2’s function in humans has not been elucidated, its SARG domain, interacting proteins and transcription factors, and expression patterns suggest that PROSER2 is involved in cell cycle control and apoptosis and is androgen-responsive. PROSER2 may be a biomarker of epithelial cell, breast, prostate, ovarian, lung, brain, and blood cancers, as described in several US patents. [9] [10] [11] [12] [13] [14]

Mutations

Human PROSER2 contains the following non-conservative single nucleotide polymorphisms (SNPs) within its exons, at positions that are conserved among all close homologs: D9N, C21G, G30S, R35Q, R40H, L46F, I69T, S71F, D83N, D109N, K200M, F345S, T408A, A412V, and L425P. [39]
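Each entry in this list uses the common shorthand of original residue, position, and substituted residue, so D9N denotes aspartate at position 9 replaced by asparagine. The sketch below parses the listed substitutions, sorts them by position, and checks which fall inside the SARG domain, using the aa 44-433 coordinates given in the protein internal structure section. The type and function names are illustrative only.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    ref: str       # original amino acid (one-letter code)
    position: int  # 1-based residue position
    alt: str       # substituted amino acid (one-letter code)


_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code: str) -> Substitution:
    """Parse shorthand such as 'D9N' into (ref, position, alt)."""
    match = _PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


# Substitutions as listed in the text.
SNPS = ("D9N C21G G30S R35Q R40H I69T S71F L46F D83N D109N "
        "K200M A412V F345S T408A L425P").split()

# SARG domain coordinates from the protein internal structure section.
SARG_START, SARG_END = 44, 433

parsed = sorted((parse_substitution(s) for s in SNPS), key=lambda s: s.position)
in_sarg = [s for s in parsed if SARG_START <= s.position <= SARG_END]

print(f"{len(in_sarg)} of {len(parsed)} substitutions fall within the SARG domain")
```

Sorting by position also makes it easy to see that the first five substitutions cluster in the N-terminal region before the SARG domain begins.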

References

  1. GRCh38: Ensembl release 89: ENSG00000148426 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000045319 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. NCBI (National Center for Biotechnology Information) Protein
  6. Marchler-Bauer A et al. (2015), "CDD: NCBI's conserved domain database.", Nucleic Acids Res. 43(Database issue):D222-6.
  7. NCBI BLAST (National Center for Biotechnology Information Basic Local Alignment Search Tool) [http://blast.ncbi.nlm.nih.gov/Blast.cgi]
  8. Time Tree [http://www.timetree.org/]
  9. Pawlowski, T., Yeatts, K., and Akhavan, R. (2012). Circulating biomarkers for cancer.
  10. Birrer, M.J., Bonome, T.A., Sood, A., and Lu, C. (2013). Pro-angiogenic genes in ovarian tumor endothelial cell isolates.
  11. Nguyen, L.S., Kim, H.-G., Rosenfeld, J.A., Shen, Y., Gusella, J.F., Lacassie, Y., Layman, L.C., Shaffer, L.G., and Gécz, J. (2013). Contribution of copy number variants involving nonsense-mediated mRNA decay pathway genes to neuro-developmental disorders. Hum. Mol. Genet. 22, 1816–1825.
  12. Schettini, J., Hornung, T., Holterman, D., and Spetzler, D. (2014). Biomarker compositions and methods.
  13. Seto, M., Tagawa, H., Yoshida, Y., and Kira, S. (2008). Methods for Diagnosis and Prognosis of Malignant Lymphoma.
  14. Zarbl, H., and Graham, J. (2014). Novel Method of Cancer Diagnosis and Prognosis and Prediction of Response to Therapy.
  15. NCBI (National Center for Biotechnology Information) Nucleotide
  16. NCBI (National Center for Biotechnology Information) Gene
  17. NCBI (National Center for Biotechnology Information) AceView [https://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/]
  18. Ensembl
  19. GeneCards [http://www.genecards.org/]
  20. UniProt [http://www.uniprot.org/uniprot/]
  21. NetPhos. ExPASy BioInformatics Resource Portal [http://expasy.org/]
  22. GPS-SUMO. ExPASy BioInformatics Resource Portal [http://expasy.org/]
  23. GPS-Lipid. ExPASy BioInformatics Resource Portal [http://expasy.org/]
  24. YinOYang. ExPASy BioInformatics Resource Portal [http://expasy.org/]
  25. NCBI BLAST (National Center for Biotechnology Information Basic Local Alignment Search Tool) [http://blast.ncbi.nlm.nih.gov/Blast.cgi]
  26. Time Tree [http://www.timetree.org/]
  27. SDSC (San Diego Supercomputer Center) Biology Workbench. BOXSHADE [http://workbench.sdsc.edu/]
  28. SDSC (San Diego Supercomputer Center) Biology Workbench. ClustalW Multiple Sequence Alignment [http://workbench.sdsc.edu/]
  29. Isoelectric Point Determination. Biology WorkBench [http://workbench.sdsc.edu]
  30. Ensembl
  31. UniProt [http://www.uniprot.org/uniprot/]
  32. Predict Protein. [https://www.predictprotein.org/]
  33. PELE. SDSC Biology Workbench [http://workbench.sdsc.edu/]
  34. SOSUI. ExPASy BioInformatics Resource Portal [http://expasy.org/]
  35. Secretome. ExPASy BioInformatics Resource Portal [http://expasy.org/]
  36. STRING 10: Known and Predicted Protein-Protein Interactions. [http://string-db.org/]
  37. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. (2013). "NCBI GEO: archive for functional genomics data sets--update". Nucleic Acids Research. 41 (Database issue): D991–5. doi:10.1093/nar/gks1193. PMC 3531084. PMID 23193258.
  38. Genomatix. ElDorado. [https://www.genomatix.de/cgi-bin//eldorado/eldorado.pl]
  39. dbSNP. NCBI (National Center for Biotechnology Information) [https://www.ncbi.nlm.nih.gov/projects/SNP]
