C2orf16

Identifiers
Aliases: C2orf16, chromosome 2 open reading frame 16
External IDs: HomoloGene: 82476; GeneCards: C2orf16
RefSeq (mRNA): NM_032266 (human); XM_017321194 (mouse)
RefSeq (protein): NP_115642 (human); n/a (mouse)
Location (UCSC): Chr 2: 27.54 – 27.58 Mb (human); n/a (mouse)
PubMed search: [2] (human); [3] (mouse)

C2orf16 is a protein that in humans is encoded by the C2orf16 gene. Isoform 2 of this protein (NCBI ID: CAH18189.1, [4] henceforth referred to as C2orf16) is 1,984 amino acids long. [5] The gene region encoding this isoform contains 1 exon and is located at 2p23.3. [6] Aliases for C2orf16 include Open Reading Frame 16 on Chromosome 2 and P-S-E-R-S-H-H-S Repeats Containing Sequence. [7]

68 orthologs are known for this gene, including in mice and sheep, but no paralogs have been found. [8]

Gene

C2orf16 isoform 2 is encoded by a 6.2 kb, single-exon gene at locus 2p23.3. The protein contains P-S-E-R-S-H-H-S repeats toward its C-terminus, from amino acid 1,559 to 1,903. These repeats appear to have arisen from a transposable element. Primate orthologs carry more P-S-E-R-S-H-H-S repeats than other mammalian orthologs do. [6]
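A repeat tract like the one described above can be located by sliding the motif along a protein sequence and counting mismatches. The following is a minimal sketch, not the method used by the cited sources: the function name `find_repeat_hits`, the mismatch tolerance, and the toy sequence are all assumptions made for illustration, and the toy string is not the real C2orf16 sequence.

```python
def find_repeat_hits(seq, motif="PSERSHHS", max_mismatches=1):
    """Return (1-based start, matched window, mismatch count) for each hit.

    Hypothetical helper: compares every window of len(motif) residues
    against the motif and keeps windows within the mismatch tolerance.
    """
    hits = []
    m = len(motif)
    for i in range(len(seq) - m + 1):
        window = seq[i:i + m]
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
    return hits


# Toy sequence (NOT the real protein): one exact repeat and two
# single-substitution variants, separated by short spacers.
toy = "MKT" + "PSERSHHS" + "PSERSHRS" + "AAG" + "PSEKSHHS"
hits = find_repeat_hits(toy)
for start, window, mismatches in hits:
    print(start, window, mismatches)
```

Allowing one mismatch lets the scan pick up variants such as an R-to-K substitution; setting `max_mismatches=0` restricts it to exact copies of the motif.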

Expression

C2orf16 is highly expressed in the testes [9] and in a human embryonic stem cell line treated with retinoic acid and mitogen, [10] but is not known to be differentially expressed with age or disease phenotype. [11] C2orf16 is also highly expressed in the pre-implantation embryo from the 4-cell stage to the blastocyst stage. [12]

C2orf16 expression does not appear to be rapamycin-sensitive. [13] Its expression increases significantly in c-MYC knockdown breast cancer cells. [14]

mRNA

Isoforms

Two isoforms of C2orf16 exist. Isoform 1 is 5,388 amino acids long and is encoded in 5 exons over 16,401 base pairs. Isoform 2 uses an alternate transcription start site and is considerably shorter, at 1,984 amino acids, encoded in 1 exon over 6,200 base pairs. [8]

Expression Regulation

One miRNA (miRBase accession MI0005564) is predicted to bind the 3' UTR of C2orf16. [15] [16]

Protein

C2orf16 has a predicted molecular weight of 224 kDa and a predicted isoelectric point of 10.08, [17] values that are relatively constant among orthologs. The protein has a higher than average content of serine, histidine, and arginine and a lower than average content of alanine. [18]
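Amino acid composition of the kind summarized above is a simple frequency count over the sequence. The sketch below is illustrative only: `composition` and `rough_mass_kda` are hypothetical helper names, the toy sequence is not C2orf16, and the figure of roughly 110 Da per residue is a common rule of thumb assumed here, not the calculation performed by the cited Compute pI/Mw tool.

```python
from collections import Counter


def composition(seq):
    """Return the fraction of each residue in a one-letter protein string."""
    counts = Counter(seq)
    total = len(seq)
    return {aa: n / total for aa, n in counts.items()}


def rough_mass_kda(n_residues, avg_residue_da=110.0):
    """Back-of-envelope mass estimate (assumption: ~110 Da per residue)."""
    return n_residues * avg_residue_da / 1000.0


# Toy sequence (NOT the real protein), rich in S, H and R.
toy = "SSHRSHRSAS"
comp = composition(toy)
print(sorted(comp.items(), key=lambda kv: -kv[1]))

# Rough size check for a 1,984-residue protein.
estimate = rough_mass_kda(1984)
print(round(estimate, 2))  # 218.24
```

The rough estimate of about 218 kDa is in the same range as the predicted 224 kDa; a residue-by-residue calculation would account for the heavier-than-average residues the protein is enriched in.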

Compositional Features

A positive charge cluster is found from amino acid residues 1,274 to 1,302. [18]

An arginine rich region is found from amino acids 1,545 to 1,933, a serine rich region is found from amino acids 1,568 to 1,934, and a histidine rich region is found from amino acids 1,630 to 1,853. [18]
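Residue-rich regions such as these are typically found by scanning a window along the sequence and flagging stretches where one residue exceeds some fraction. The following is a sketch under stated assumptions: the function name `rich_regions`, the window size, and the threshold are hypothetical choices, not the parameters of the tool behind reference [18], and the toy sequence is not C2orf16.

```python
def rich_regions(seq, residue, window=10, min_fraction=0.3):
    """Return merged 1-based (start, end) spans of windows enriched in residue.

    Hypothetical helper: a window qualifies when the residue makes up at
    least min_fraction of it; overlapping or adjacent windows are merged.
    """
    regions = []
    for start in range(len(seq) - window + 1):
        frac = seq[start:start + window].count(residue) / window
        if frac >= min_fraction:
            first, last = start + 1, start + window  # 1-based, inclusive
            if regions and first <= regions[-1][1] + 1:
                regions[-1] = (regions[-1][0], last)  # extend previous span
            else:
                regions.append((first, last))
    return regions


# Toy sequence (NOT the real protein): an S/R-rich block inside alanines.
toy = "A" * 20 + "SRSRSRSRSR" + "A" * 20
print(rich_regions(toy, "S"))  # [(16, 34)]
print(rich_regions(toy, "R"))
```

Reported boundaries are only as precise as the window: the serine span above extends a few residues beyond the block at positions 21 to 30, because any window containing at least three serines qualifies.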

A dot matrix analysis [19] reveals a heavily repeated region from approximately residue 1,500 to 1,984, corresponding to the P-S-E-R-S-H-H-S repeat. A small band of dots at approximately amino acid 1,200 denotes a half repeat of the P-S-E-R-S-H-H-S sequence.

Dot matrix analysis of uncharacterized protein C2orf16 isoform 2. The P-S-E-R-S-H-H-S repeat sequence is visualized via the darker area of the matrix from amino acid 1500–1984, and a half P-S-E-R-S-H-H-S repeat sequence is seen as a band near amino acid 1200.
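The idea behind a self dot matrix can be sketched in a few lines: compare every window of the sequence with every later window and record a dot where they match. This is a simplified illustration that assumes identity matching, whereas tools such as dotmatcher score windows with a substitution matrix. The function name `self_dots`, the parameters, and the toy sequence are assumptions.

```python
def self_dots(seq, window=8, min_matches=8):
    """Return 1-based (i, j) pairs, i < j, whose windows match.

    Hypothetical helper: two windows produce a dot when at least
    min_matches of their aligned residues are identical.
    """
    dots = []
    n_windows = len(seq) - window + 1
    for i in range(n_windows):
        for j in range(i + 1, n_windows):
            matches = sum(
                1 for a, b in zip(seq[i:i + window], seq[j:j + window]) if a == b
            )
            if matches >= min_matches:
                dots.append((i + 1, j + 1))
    return dots


# Toy sequence (NOT the real protein): two tandem copies of the motif.
toy = "MKT" + "PSERSHHS" + "PSERSHHS" + "GG"
dots = self_dots(toy)
print(dots)  # [(4, 12)]
```

In a real plot, a long tandem repeat produces many such off-diagonal dots at multiples of the repeat length, which is what appears as the dense block described above. The nested loop is quadratic in sequence length, which is acceptable for a sketch on a protein of about 2,000 residues.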

C2orf16 isoform 2 has no transmembrane domains [20] and is predicted to localize to the nucleus after translation, owing to two nuclear localization sequences predicted at residues 1,233 and 1,281. [21] No nuclear export sequence is conserved among orthologs, [22] suggesting that C2orf16 remains in the nucleus after import. No N- or C-terminal modifications were predicted. [23] [24] [25] [26]

Sub-cellular Localization

C2orf16 is predicted to be localized to the nucleus after translation. [8]

Structure

C2orf16 Isoform 2 predicted 3D structure showing the three major domains of the protein. Domain 3 contains the P-S-E-R-S-H-H-S repeat sequence.

The 3D structure of C2orf16 is predicted to have three major domains. Domain 1 spans amino acids 1 to 662, domain 2 spans amino acids 674 to 1,487, and domain 3 spans amino acids 1,488 to 1,984. [27] Domains 1 and 2 are predicted to be connected by a stretch of 12 amino acids that is not otherwise organized into secondary structure, allowing flexibility between the two domains. Domain 2 is predicted to contain protein-interaction domains for transcription factors. [27] Domain 3 is predicted to follow a "balls on a string" structure [27] and has many possible phosphorylation sites. [28]

Protein Interactions

C2orf16 has been shown by tandem affinity purification to interact physically with the proto-oncogene Myc. [29]

Ortholog Phylogeny

68 orthologs are known for C2orf16. [8] The protein appears to have arisen about 320 million years ago, around the divergence of mammals from reptiles. This timing would explain why orthologs do not exist in amphibians, reptiles, birds, or other more distantly related species. [30]

Any orthologs from species more distant from humans than other mammals are likely not related in function. However, the P-S-E-R-S-H-H-S repeat is present in bony fishes, crustaceans, stramenopiles including the potato blight pathogen, plants, and prokaryotes. [30]

The transposon repeat may have been reintroduced to mammals by a viral vector.

Repeat Sequence

P-S-E-R-S-H-H-S Repeat Sequence Logo

The P-S-E-R-S-H-H-S repeat sequence is conserved in C2orf16 orthologs and also appears in organisms as distantly related as the oomycete Phytophthora infestans [31] and plants, including the chloroplasts of Ashby's wattle. [32] The S-P-S-E-R portion of the repeat is the most strongly conserved, as shown by alignment with these orthologs and by a sequence logo. [33]

Conservation analysis of the repeat shows that the initial S-P-S is highly conserved, possibly for phosphorylation (S) and structure (P). The R is almost completely conserved, mutating to a lysine in some orthologs, [32] which implies that the positive charge is necessary for the function of the repeat.

The 3D shape of the repeat sequence is unclear, as it has been predicted to form either a balls-on-a-string [34] or an antiparallel beta-sheet [6] structure.
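Per-position conservation of the kind a sequence logo displays can be expressed as information content: the maximum entropy for a 20-letter alphabet minus the observed entropy of each column. The sketch below assumes this standard formula without a small-sample correction. The function name `column_information` and the toy set of repeat copies are illustrative and are not the alignment used for the logo.

```python
import math
from collections import Counter


def column_information(repeats, alphabet_size=20):
    """Return information content in bits for each aligned column.

    Hypothetical helper: information = log2(alphabet_size) minus the
    Shannon entropy of the residues observed in that column.
    """
    length = len(repeats[0])
    if any(len(r) != length for r in repeats):
        raise ValueError("all repeat copies must have the same length")
    info = []
    for pos in range(length):
        counts = Counter(r[pos] for r in repeats)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


# Toy repeat copies (NOT real alignment data): position 4 varies R/K,
# position 7 varies H/R, the rest are invariant.
toy_repeats = ["PSERSHHS", "PSEKSHHS", "PSERSHRS", "PSERSHHS"]
info = column_information(toy_repeats)
for pos, bits in enumerate(info, start=1):
    print(pos, round(bits, 2))
```

Invariant columns reach the maximum of log2(20), about 4.32 bits, while a column that tolerates an R-to-K change scores lower even though the positive charge is preserved; a logo built on raw residue identity does not capture that kind of chemical conservation.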

Function

C2orf16 isoform 2 is predicted to have a possible role in regulating mitosis, based on its nuclear localization, [8] [21] predicted transcription factor binding site, [27] physical association with Myc, [29] and increased expression in c-MYC knockdown breast cancer cells. [14]

Clinical Significance

There are four patents on record for C2orf16, one each involving cancerous PPP2R1A and ARID1A mutations, [35] Alzheimer's predisposition, [36] viral vaccine diversity, [37] and the relation of copy number variation to common variable immunodeficiency. [38] C2orf16 also shows increased expression in some breast cancer lines [14] and interacts with Myc, [29] a common oncogene, which makes C2orf16 a candidate oncogene to target in cancer treatments.

References

  1. GRCh38: Ensembl release 89: ENSG00000221843 - Ensembl, May 2017.
  2. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  3. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "hypothetical protein [Homo sapiens] – Protein – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-05-02.
  5. "C2orf16 Gene". GeneCards Human Gene Database. Weizmann Institute of Science, Life Map Sciences. Retrieved 2019-02-05.
  6. "C2orf16 – Uncharacterized protein C2orf16 – Homo sapiens (Human) – C2orf16 gene & protein". www.uniprot.org. Retrieved 2019-05-02.
  7. "C2orf16 Gene". GeneCards Human Gene Database. Weizmann Institute of Science, Life Map Sciences. Retrieved 2019-02-05.
  8. Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, et al. (January 2018). "Ensembl 2018". Nucleic Acids Research. 46 (D1): D754–D761. doi:10.1093/nar/gkx1098. PMC 5753206. PMID 29155950.
  9. Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, Armour CD, Santos R, Schadt EE, Stoughton R, Shoemaker DD (December 2003). "Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays". Science. 302 (5653): 2141–4. Bibcode:2003Sci...302.2141J. doi:10.1126/science.1090100. PMID 14684825. S2CID 10007258.
  10. "Homo sapiens gene C2orf16, encoding chromosome 2 open reading frame 16". AceView. NCBI National Institute of Health. Retrieved 2019-02-05.
  11. "EST Profile – Hs.131021". www.ncbi.nlm.nih.gov. Retrieved 2019-05-02.
  12. Xie D, Chen CC, Ptaszek LM, Xiao S, Cao X, Fang F, Ng HH, Lewin HA, Cowan C, Zhong S (June 2010). "Rewirable gene regulatory networks in the preimplantation embryonic development of three mammalian species". Genome Research. 20 (6): 804–15. doi:10.1101/gr.100594.109. PMC 2877577. PMID 20219939.
  13. Petrich AM, Leshchenko V, Kuo PY, Xia B, Thirukonda VK, Ulahannan N, Gordon S, Fazzari MJ, Ye BH, Sparano JA, Parekh S (May 2012). "Akt inhibitors MK-2206 and nelfinavir overcome mTOR inhibitor resistance in diffuse large B-cell lymphoma". Clinical Cancer Research. 18 (9): 2534–44. doi:10.1158/1078-0432.CCR-11-1407. PMC 3889476. PMID 22338016.
  14. Cappellen D, Schlange T, Bauer M, Maurer F, Hynes NE (January 2007). "Novel c-MYC target genes mediate differential effects on cell proliferation and migration". EMBO Reports. 8 (1): 70–6. doi:10.1038/sj.embor.7400849. PMC 1796762. PMID 17159920.
  15. Agarwal V, Bell GW, Nam JW, Bartel DP (August 2015). Izaurralde E (ed.). "Predicting effective microRNA target sites in mammalian mRNAs". eLife. 4: e05005. doi:10.7554/eLife.05005. PMC 4532895. PMID 26267216.
  16. "miRNA Entry for MI0005564". www.mirbase.org. Retrieved 2019-05-05.
  17. "ExPASy – Compute pI/Mw tool". web.expasy.org. Retrieved 2019-05-02.
  18. Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey AR, Potter SC, Finn RD, Lopez R (April 2019). "The EMBL-EBI search and sequence analysis tools APIs in 2019". Nucleic Acids Research. 47 (W1): W636–W641. doi:10.1093/nar/gkz268. PMC 6602479. PMID 30976793.
  19. "EMBOSS: dotmatcher". www.bioinformatics.nl. Retrieved 2019-05-02.
  20. Cserzö M, Eisenhaber F, Eisenhaber B, Simon I (September 2002). "On filtering false positive transmembrane protein predictions". Protein Engineering. 15 (9): 745–52. doi:10.1093/protein/15.9.745. PMID 12456873.
  21. Sigrist CJ, Cerutti L, de Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, Hulo N (January 2010). "PROSITE, a protein domain database for functional characterization and annotation". Nucleic Acids Research. 38 (Database issue): D161-6. doi:10.1093/nar/gkp885. PMC 2808866. PMID 19858104.
  22. la Cour T, Kiemer L, Mølgaard A, Gupta R, Skriver K, Brunak S (June 2004). "Analysis and prediction of leucine-rich nuclear export signals". Protein Engineering, Design & Selection. 17 (6): 527–36. doi:10.1093/protein/gzh062. PMID 15314210.
  23. Kiemer L, Bendtsen JD, Blom N (April 2005). "NetAcet: prediction of N-terminal acetylation sites". Bioinformatics. 21 (7): 1269–70. doi:10.1093/bioinformatics/bti130. PMID 15539450.
  24. Bologna G, Yvon C, Duvaud S, Veuthey AL (June 2004). "N-Terminal myristoylation predictions by ensembles of neural networks". Proteomics. 4 (6): 1626–32. doi:10.1002/pmic.200300783. PMID 15174132. S2CID 20289352.
  25. Chuang GY, Boyington JC, Joyce MG, Zhu J, Nabel GJ, Kwong PD, Georgiev I (September 2012). "Computational prediction of N-linked glycosylation incorporating structural properties and patterns". Bioinformatics. 28 (17): 2249–55. doi:10.1093/bioinformatics/bts426. PMC 3426846. PMID 22782545.
  26. Julenius K (August 2007). "NetCGlyc 1.0: prediction of mammalian C-mannosylation sites". Glycobiology. 17 (8): 868–76. doi:10.1093/glycob/cwm050. PMID 17494086.
  27. Yang J, Zhang Y (July 2015). "I-TASSER server: new development for protein structure and function predictions". Nucleic Acids Research. 43 (W1): W174-81. doi:10.1093/nar/gkv342. PMC 4489253. PMID 25883148.
  28. Blom N, Gammeltoft S, Brunak S (December 1999). "Sequence and structure-based prediction of eukaryotic protein phosphorylation sites". Journal of Molecular Biology. 294 (5): 1351–62. doi:10.1006/jmbi.1999.3310. PMID 10600390.
  29. "PSICQUIC View". www.ebi.ac.uk. Retrieved 2019-05-02.
  30. Madden, Tom (2003-08-13). The BLAST Sequence Analysis Tool. National Center for Biotechnology Information (US).
  31. "cyst germination specific acidic repeat protein precursor [Phytophthora infestans]". NCBI. Retrieved 2019-05-02.
  32. "accD (chloroplast) [Acacia ashbyae] – Protein – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-05-02.
  33. "WebLogo – Create Sequence Logos". weblogo.berkeley.edu. Retrieved 2019-05-02.
  34. Zhang Y (January 2008). "I-TASSER server for protein 3D structure prediction". BMC Bioinformatics. 9 (1): 40. doi:10.1186/1471-2105-9-40. PMC 2245901. PMID 18215316.
  35. Vogelstein B, Kinzler KW, Velculescu V, Papadopoulous N, Jones S (May 31, 2012). "US Patent 9,982,304" (US 20130210900 A1). Retrieved 2019-02-05.
  36. Nagy, Zsuzsanna (Jan 16, 2014). "US Patent 9,944,986" (US 20150141491 A1). Retrieved 2019-02-05.
  37. Shenk T, Wang D (April 16, 2009). "US Patent 9,439,960" (US 20100285059 A1). Retrieved 2019-02-05.
  38. Hakonarson H, Glessner J, Orange J (Nov 28, 2013). "US Patent 9,109,254" (US 20130315858 A1). Retrieved 2019-02-05.