CXorf66

Identifiers
Aliases: CXorf66, SGPX, chromosome X open reading frame 66
External IDs: MGI: 3779666; HomoloGene: 82551; GeneCards: CXorf66; OMA: CXorf66 - orthologs
Orthologs
Species | Human | Mouse
RefSeq (mRNA) | NM_001013403 | NM_001039240, NM_001382307
RefSeq (protein) | NP_001013421 | n/a
Location (UCSC) | Chr X: 139.96 – 139.97 Mb | Chr X: 59.48 – 59.58 Mb
PubMed search | [3] | [4]

CXorf66, also known as chromosome X open reading frame 66, is a 361-amino-acid (aa) protein that in humans is encoded by the CXorf66 gene. The protein is predicted to be a type I transmembrane protein; its exact function is currently unknown. [5]


CXorf66 is covered by US patent 8586006, held by the Institute for Systems Biology and Integrated Diagnostics, Inc. [6]

The CXorf66 protein has been proposed as a potential novel cancer biomarker. [7]

Gene

CXorf66 is located on chromosome X at Xq27.1, on the complement strand. [8] Its neighboring genes include ATP11C (an ATPase), MIR505, and HNRNPA3P3. [8] According to OMIM, nearby genes also include SOX3, SPANXB1, and CDR1. [9]

Gene locus for human gene CXorf66 on Chromosome X

mRNA

Splice variants

Only one splice variant of CXorf66 is known. It has three exons (nucleotides 1-117, 118-271, and 272-1288) separated by two introns. [10] The exon-exon junctions fall at amino acid 30 (G) and amino acid 81 (M). [10]
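The article does not say where the coding sequence (CDS) starts in the mRNA, but the exon boundaries and junction residues above constrain it. The sketch below is a back-of-the-envelope consistency check, not a result from the cited sources. It assumes that each junction is reported at the last nucleotide of the upstream exon and that the start codon lies in exon 1, then searches for CDS start positions that reproduce residues 30 and 81 while leaving room for 361 codons plus a stop codon.

```python
# Exon boundaries (1-based, inclusive, mRNA coordinates) as reported above.
EXONS = [(1, 117), (118, 271), (272, 1288)]
# Reported amino-acid positions of the two exon-exon junctions.
JUNCTION_AA = [30, 81]
PROTEIN_LENGTH = 361  # residues


def codon_index(nt: int, cds_start: int) -> int:
    """1-based amino-acid number of the codon containing mRNA nucleotide `nt`."""
    return (nt - cds_start) // 3 + 1


def consistent_cds_starts() -> list[int]:
    """CDS start positions that reproduce the reported junction residues.

    Assumption: each junction is reported at the last nucleotide of the
    upstream exon, and the start codon lies in exon 1.
    """
    junction_nts = [end for _, end in EXONS[:-1]]  # 117 and 271
    mrna_end = EXONS[-1][1]
    hits = []
    for start in range(EXONS[0][0], EXONS[0][1] + 1):
        aa = [codon_index(nt, start) for nt in junction_nts]
        cds_end = start + 3 * (PROTEIN_LENGTH + 1) - 1  # 361 codons + stop codon
        if aa == JUNCTION_AA and cds_end <= mrna_end:
            hits.append(start)
    return hits


print(consistent_cds_starts())
```

Under these assumptions the search returns start positions 29 and 30. Counting each junction from the first nucleotide of the downstream exon instead shifts the answer by one, so the result is only a plausibility check to be compared against the RefSeq record (NM_001013403).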

Splicing of CXorf66 gene

Only one polyadenylation site has been identified for CXorf66. [11]

Protein

Composition

With 57 serines and 42 lysines, the CXorf66 protein is rich in both serine and lysine. [12] It has a calculated molecular weight of 39.9 kDa and an isoelectric point of 9.89. [12]
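The composition figures above can be put in proportion with a little arithmetic. This sketch uses only the numbers stated in the text (361 residues, 57 Ser, 42 Lys, 39.9 kDa). For comparison, an even share among the 20 standard amino acids would be 5% each, and roughly 110 Da per residue is a common rule of thumb rather than a CXorf66-specific value.

```python
PROTEIN_LENGTH = 361  # residues
RESIDUE_COUNTS = {"Ser": 57, "Lys": 42}
MOLECULAR_WEIGHT_KDA = 39.9


def percent_of_length(count: int, length: int = PROTEIN_LENGTH) -> float:
    """Share of the sequence made up by one residue type, in percent."""
    return 100 * count / length


def mean_residue_mass_da(mw_kda: float, length: int = PROTEIN_LENGTH) -> float:
    """Average mass per residue, in daltons."""
    return mw_kda * 1000 / length


for name, count in RESIDUE_COUNTS.items():
    print(f"{name}: {count}/{PROTEIN_LENGTH} = {percent_of_length(count):.1f}%")
print(f"Mean residue mass: {mean_residue_mass_da(MOLECULAR_WEIGHT_KDA):.1f} Da")
```

This gives about 15.8% serine and 11.6% lysine, roughly three times and a little over twice the 5% even share, and a mean residue mass of about 110.5 Da, in line with the rule of thumb. The lysine excess is also consistent with the basic isoelectric point.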

Domains

The CXorf66 protein has a predicted signal peptide (residues 1-19), a first topological domain (20-47), a transmembrane domain (48-68), and a second topological domain (69-361). [13] SignalP predicts a signal peptide cleavage site between residues 17 and 18. [14] Based on the protein's composition (serine- and lysine-rich) and its predicted post-translational modifications (extensive phosphorylation), the first topological domain (20-47) is predicted to be extracellular and the second (69-361) cytoplasmic, as shown in Figure II. [15]
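The predicted topology can be written down as a list of contiguous segments, which makes it easy to check that the boundaries tile the full 361-residue chain and to look up which segment a residue falls in. This is a sketch using the UniProt-style boundaries above (signal peptide 1-19); SignalP's alternative cleavage point between residues 17 and 18 would shorten the signal peptide by two residues. The "extracellular" and "cytoplasmic" labels restate the prediction in the text, not an experimental result.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


PROTEIN_LENGTH = 361
SEGMENTS = [
    Segment("signal peptide", 1, 19),
    Segment("topological domain 1 (predicted extracellular)", 20, 47),
    Segment("transmembrane domain", 48, 68),
    Segment("topological domain 2 (predicted cytoplasmic)", 69, 361),
]


def tiles_protein(segments: list[Segment], length: int) -> bool:
    """True if segments start at 1, have no gaps or overlaps, and end at `length`."""
    expected_start = 1
    for seg in segments:
        if seg.start != expected_start or seg.end < seg.start:
            return False
        expected_start = seg.end + 1
    return expected_start == length + 1


def segment_of(position: int, segments: list[Segment] = SEGMENTS) -> Optional[str]:
    """Name of the segment containing a 1-based residue position, or None."""
    for seg in segments:
        if seg.start <= position <= seg.end:
            return seg.name
    return None


for seg in SEGMENTS:
    print(f"{seg.name}: {seg.start}-{seg.end} ({seg.length} aa)")
print("Tiles full chain:", tiles_protein(SEGMENTS, PROTEIN_LENGTH))
```

The segment lengths come out as 19, 28, 21, and 293 residues, summing to 361. A lookup such as `segment_of(241)` places a residue in the predicted cytoplasmic domain.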

Figure I. Candidate phosphorylation sites in human CXorf66 protein
Figure II. SOSUI Prediction of CXorf66 protein transmembrane topology

Three tetrapeptide motifs each occur twice in the human CXorf66 protein: DKPV (residues 31-34 and 204-207), SEAK (97-100 and 287-290), and PKRS (161-164 and 245-248). These repeats are conserved in other primates such as Gorilla gorilla gorilla and Macaca mulatta, but are not present in other mammals. [16]
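Repeats like these can be found by counting every overlapping tetrapeptide and keeping those that occur more than once. The sketch below does exactly that. It is a simple exact-match scan, not the statistical method of the cited reference, and because the CXorf66 sequence is not reproduced here the demo runs on a short hypothetical peptide.

```python
from collections import defaultdict


def repeated_kmers(seq: str, k: int = 4, min_count: int = 2) -> dict[str, list[int]]:
    """Map each k-mer occurring at least `min_count` times to its 1-based start positions."""
    starts: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        starts[seq[i:i + k]].append(i + 1)
    return {kmer: pos for kmer, pos in starts.items() if len(pos) >= min_count}


# Hypothetical toy peptide, NOT the CXorf66 sequence.
toy = "MDKPVAGSEAKTTDKPVLLSEAKR"
print(repeated_kmers(toy))  # {'DKPV': [2, 14], 'SEAK': [8, 20]}
```

Run on the human sequence (UniProt Q5JRM2), the same function would be expected to list DKPV, SEAK, and PKRS at the reported positions, possibly alongside other repeated tetrapeptides.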

SNPs

One natural variant is common in the population (frequency 0.436): a proline-to-leucine substitution at residue 233, with proline as the ancestral amino acid. No effects of this missense variant have been observed. [13] [17]
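In HGVS sequence-variant nomenclature, this substitution is written with three-letter amino-acid codes as p.Pro233Leu, or p.(Pro233Leu) when the protein change is predicted from DNA rather than observed directly. A minimal formatter, assuming only the standard single-letter to three-letter mapping for the 20 amino acids:

```python
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def hgvs_missense(ref: str, position: int, alt: str, predicted: bool = False) -> str:
    """Format a single amino-acid substitution in HGVS protein notation."""
    change = f"{THREE_LETTER[ref]}{position}{THREE_LETTER[alt]}"
    return f"p.({change})" if predicted else f"p.{change}"


print(hgvs_missense("P", 233, "L"))                  # p.Pro233Leu
print(hgvs_missense("P", 233, "L", predicted=True))  # p.(Pro233Leu)
```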

Interacting proteins

STRING predicts medium-confidence interactions between CXorf66 and the proteins listed in Figure III. [18] None of these interactions has been experimentally determined.
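STRING reports a combined score between 0 and 1 for each predicted interaction, and its interface names cutoffs of 0.150 (low), 0.400 (medium), 0.700 (high), and 0.900 (highest confidence). The sketch below maps scores onto those tiers and keeps medium-or-better partners. The partner names and scores are hypothetical placeholders, since the article does not list the actual values.

```python
def string_confidence(score: float) -> str:
    """Map a STRING combined score (0-1) onto STRING's named confidence cutoffs."""
    for cutoff, label in ((0.900, "highest"), (0.700, "high"), (0.400, "medium"), (0.150, "low")):
        if score >= cutoff:
            return label
    return "below low"


# Hypothetical partners and scores, for illustration only.
predicted = {"PARTNER_A": 0.52, "PARTNER_B": 0.41, "PARTNER_C": 0.23}

medium_or_better = {name: score for name, score in predicted.items() if score >= 0.400}

for name, score in predicted.items():
    print(name, score, string_confidence(score))
print(sorted(medium_or_better))
```

A medium score can rest entirely on indirect evidence channels such as text mining or co-expression, which is why such partners remain predictions until tested experimentally.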

Regulation

Transcription

Promoter

Genomatix predicts only one promoter for the CXorf66 gene: a 745 bp region on the negative strand spanning 139047554-139048298. [19] A BLAT search with this promoter sequence returned numerous high-identity hits to various genes on other chromosomes. Several of the top-scoring hits are listed below: [20]

Name | Gene ID | Score | Span (bp, of 745) | Identity | Chromosome | Strand | Start | End
ZBTB8A | 653121 | 282 | 656 | 88.2% | 1 | - | 32994892 | 32995547
TESK2 | 10420 | 263 | 624 | 90.3% | 1 | - | 45843093 | 45843716
TBCK | 93627 | 244 | 639 | 91.5% | 4 | + | 107146630 | 107147268
USP48 | 84196 | 241 | 631 | 89.0% | 1 | + | 22014725 | 22015355
PTPN22 | 26191 | 227 | 281 | 90.0% | 1 | - | 114365307 | 114365587
PSPH | 5723 | 220 | 605 | 90.6% | 7 | - | 56098319 | 56098923
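The promoter and BLAT coordinates use inclusive, 1-based intervals, so a region's length is end - start + 1. The sketch below checks that the 745 bp promoter length and each reported hit span agree with their start and end coordinates, and expresses each span as a share of the promoter. All numbers are taken from the text and table above.

```python
PROMOTER = (139047554, 139048298)  # negative strand, per Genomatix

# (gene, reported span in bp, start, end) from the BLAT hit table
HITS = [
    ("ZBTB8A", 656, 32994892, 32995547),
    ("TESK2", 624, 45843093, 45843716),
    ("TBCK", 639, 107146630, 107147268),
    ("USP48", 631, 22014725, 22015355),
    ("PTPN22", 281, 114365307, 114365587),
    ("PSPH", 605, 56098319, 56098923),
]


def inclusive_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval."""
    return end - start + 1


promoter_len = inclusive_length(*PROMOTER)
print("promoter length:", promoter_len)
for gene, span, start, end in HITS:
    computed = inclusive_length(start, end)
    status = "ok" if computed == span else f"mismatch ({computed})"
    print(f"{gene}: span {span} bp, {100 * span / promoter_len:.1f}% of promoter, {status}")
```

All six spans match their coordinates. PTPN22 stands out as covering only about 38% of the promoter, so its high identity applies to a much shorter stretch than the other hits.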

Notably, TESK2 encodes a testis-specific protein kinase, which is consistent with the predicted testis expression of CXorf66.

Transcription factors

Genomatix was also used to generate a table of the top 20 transcription factors with predicted binding sites in the CXorf66 promoter (Figure IV). [19]

Figure IV: Generated list of transcription factor binding sites in CXorf66 promoter

Translation

Two miRNAs, hsa-mir-1290 and hsa-miR-4446-5p, are predicted to bind to the 3' UTR of the CXorf66 mRNA. [21]
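Many miRNA target predictors start from seed complementarity, in which nucleotides 2-8 of the mature miRNA pair with a site in the 3' UTR. The sketch below finds exact 7mer-m8 seed sites by reverse-complementing the seed and scanning the UTR. It illustrates the seed rule only; it is not the method behind miRBase or any particular prediction tool, and both demo sequences are hypothetical rather than hsa-mir-1290, hsa-miR-4446-5p, or the CXorf66 3' UTR.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick A-U, C-G pairing only)."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def seed_sites(mirna: str, utr: str) -> list[int]:
    """0-based start positions of exact 7mer-m8 seed matches in a 3' UTR."""
    seed = mirna[1:8]                # miRNA nucleotides 2-8
    site = reverse_complement(seed)  # sequence the UTR must contain, 5'->3'
    hits, start = [], utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)
    return hits


# Hypothetical sequences for illustration only.
toy_mirna = "UCAGCAUUGGAACCUGAUCGAA"
toy_utr = "GGGAAUGCUGCCCAAUGCUGA"
print(seed_sites(toy_mirna, toy_utr))  # [3, 13]
```

Real predictors also weigh G:U wobble pairs, site context, and conservation, so an exact seed match is a starting point rather than a prediction.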

Post-translational modifications

Expasy's NetNGlyc predicts an N-glycosylation site at NGSS (residue 24), with a possible secondary site at NGTN (residue 21). [22] NetPhos predicts a total of 48 phosphorylation sites (41 serines, 2 threonines, and 5 tyrosines), all located after the predicted transmembrane domain, which supports a cytoplasmic topology for that region. [23] YinOYang predicts many O-GlcNAc sites, and all of the high-potential sites lie after the transmembrane region (48-68). [24] SUMOplot analysis of human CXorf66 found a high-probability sumoylation motif at K241 and low-probability motifs at K186 and K316. Because sumoylation takes part in processes such as nuclear-cytosolic transport and transcriptional regulation, CXorf66 may be modified by SUMO proteins after translation. [25]
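Two of the motifs above have simple sequence consensuses that can be scanned with regular expressions: the N-glycosylation sequon N-X-S/T (X not proline) and the SUMO consensus ψ-K-x-E, where ψ is a large hydrophobic residue. The sketch uses lookahead patterns so that overlapping motifs, such as the adjacent NGTN and NGSS sites, are both reported. It is only a consensus scan: NetNGlyc and SUMOplot also weigh sequence context, so a regex match is not a prediction by those tools. Restricting ψ to I, L, V, M, or F is an assumption of this sketch. The glycosylation demo uses the stretch NGTNGSS implied by the sites at residues 21 and 24; the SUMO demo uses a hypothetical peptide.

```python
import re

# N-X-S/T with X != P; the zero-width lookahead lets overlapping sequons both match.
SEQUON = re.compile(r"(?=N[^P][ST])")
# psi-K-x-E, with psi restricted to I/L/V/M/F (assumption).
SUMO_CONSENSUS = re.compile(r"(?=[ILVMF]K.E)")


def sequon_positions(seq: str, first_residue: int = 1) -> list[int]:
    """Residue numbers of the Asn in each N-X-S/T sequon."""
    return [m.start() + first_residue for m in SEQUON.finditer(seq)]


def sumo_lysines(seq: str, first_residue: int = 1) -> list[int]:
    """Residue numbers of the Lys in each psi-K-x-E consensus motif."""
    return [m.start() + 1 + first_residue for m in SUMO_CONSENSUS.finditer(seq)]


# Residues 21-27 as implied by the NGTN (21) and NGSS (24) sites in the text.
print(sequon_positions("NGTNGSS", first_residue=21))  # [21, 24]
# Hypothetical peptide, NOT from CXorf66.
print(sumo_lysines("AAVKTEAALKQEA"))                  # [4, 10]
```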

Subcellular localization

Figure V. CXorf66 Nuclear Localization Signals Across Homologs

PSORT II identifies a nuclear localization signal, PYKKKHL, at residue 268. [26] This signal is conserved in other primates but is absent from other mammals. In addition, the SAPS k-NN prediction in SDSC's Biology Workbench gives the human protein and its mouse homolog a 47.8% likelihood of nuclear localization. For more distant homologs that lack the nuclear localization signal, such as that of Bos taurus, the predicted likelihood of localization to the extracellular region (including the cell wall) or to the plasma membrane is 34.8%. [12] [26] Figure V shows the nuclear localization signals of several homologs.
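As we understand PSORT II's documentation, its "pat7" nuclear localization rule looks for a proline followed within three residues by a four-residue basic segment containing at least three K/R residues. PYKKKHL fits that shape (P, then Y, then KKKH). The sketch below re-implements the rule as described, not PSORT II's actual code, and the demo peptide is hypothetical apart from the embedded PYKKKHL signal.

```python
BASIC = set("KR")


def pat7_sites(seq: str) -> list[tuple[int, str]]:
    """(1-based position of P, 7-residue motif) for each pat7-like match.

    Rule assumed here: P, then a 4-residue segment starting 1-3 residues
    later that contains at least 3 K/R.
    """
    hits = []
    for i, residue in enumerate(seq):
        if residue != "P":
            continue
        for gap in range(1, 4):
            segment = seq[i + gap:i + gap + 4]
            if len(segment) == 4 and sum(aa in BASIC for aa in segment) >= 3:
                hits.append((i + 1, seq[i:i + 7]))
                break
    return hits


# Hypothetical peptide embedding the reported signal, NOT the full protein.
print(pat7_sites("AAPYKKKHLAA"))  # [(3, 'PYKKKHL')]
```

Applied to the full human sequence, a match starting at residue 268 would correspond to the reported signal; running it over homolog sequences is one way to reproduce the comparison in Figure V.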

Homology

CXorf66 has no known paralogs in humans, but it has conserved homologs throughout the class Mammalia. Although it is highly conserved among primates, CXorf66 appears to have evolved rapidly (Figure VI), which would explain why more orthologs are found in mammals than in invertebrates, birds, and reptiles. [27]

Figure VI. CXorf66 Protein Homolog Divergence
CXorf66 protein | Species | Date of divergence (MYA) [28] | NCBI accession number | Query cover | E value | Identity
CXorf66 homolog | Chimpanzee (Pan troglodytes) | 6.3 | XP_001139133.1 | 100% | 0 | 98%
CXorf66 homolog | Gorilla (Gorilla gorilla gorilla) | 8.8 | XP_004065002.1 | 100% | 0 | 98%
LOC631784 isoform X1 | Mouse (Mus musculus) | 92.3 | XP_006528296.1 | 98% | 2E-41 | 34%
CXorf66-like isoform X1 | Rat (Rattus norvegicus) | 92.3 | XP_001068529.2 | 84% | 6E-32 | 32%
CXorf66 homolog | Cow (Bos taurus) | 94.2 | XP_005200949.1 | 96% | 2E-46 | 35%
CXorf66 homolog | White rhino (Ceratotherium simum simum) | 94.2 | XP_004441715.1 | 100% | 8E-86 | 48%
CXorf66 homolog | Horse (Equus caballus) | 94.2 | XP_005614614.1 | 96% | 8E-58 | 44%
Neurofilament medium polypeptide | Zebra finch (Taeniopygia guttata) | 296 | XP_002197538.1 | 44% | 2E-08 | 30%
Triadin-like, partial | Alligator (Alligator mississippiensis) | 296 | XP_006271227.1 | 53% | 2E-12 | 23%
LOC590028 | Sea urchin (Strongylocentrotus purpuratus) | 742.9 | XP_794743.3 | 45% | 2E-05 | 35.4%
Alpha-L-fucosidase | Streptococcus mitis | 2535.8 | WP_001083113.1 | 47% | 7E-38 | 22%
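Figure VI plots percent non-identity (100 minus percent identity) against divergence time. The sketch below rebuilds that series from the homolog table above, using only the tabulated values, and also flags hits whose query cover is below 50%, since their identity figures describe only part of the protein.

```python
# (species, divergence in MYA, percent identity, percent query cover) from the homolog table
HOMOLOGS = [
    ("Chimpanzee", 6.3, 98.0, 100),
    ("Gorilla", 8.8, 98.0, 100),
    ("Mouse", 92.3, 34.0, 98),
    ("Rat", 92.3, 32.0, 84),
    ("Cow", 94.2, 35.0, 96),
    ("White rhino", 94.2, 48.0, 100),
    ("Horse", 94.2, 44.0, 96),
    ("Zebra finch", 296.0, 30.0, 44),
    ("Alligator", 296.0, 23.0, 53),
    ("Sea urchin", 742.9, 35.4, 45),
    ("Streptococcus mitis", 2535.8, 22.0, 47),
]


def non_identity_series(rows):
    """(divergence in MYA, percent non-identity, species), sorted by divergence."""
    return sorted((mya, round(100 - identity, 1), species) for species, mya, identity, _ in rows)


for mya, non_identity, species in non_identity_series(HOMOLOGS):
    print(f"{mya:>7.1f} MYA  {non_identity:>5.1f}% non-identity  {species}")

partial = [species for species, _, _, cover in HOMOLOGS if cover < 50]
print("query cover below 50%:", partial)
```

The jump from about 2% non-identity in great apes to roughly 52-68% in other mammals is the pattern Figure VI illustrates. The partial-coverage flag is a design choice: it keeps readers from over-reading the identity values of the most distant hits.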

Expression

According to UniGene's EST cDNA tissue abundance data and the Human Protein Atlas, CXorf66 is expressed at moderately high levels in the testes and at higher levels in fetal tissue than at other developmental stages. [29] [30] CXorf66 expression is notably low in both control endometrium and endometriosis total RNA samples. [31] CXorf66 has also been reported in plasma and platelets. [5] In PaxDb, CXorf66 ranks in the top 5% of proteins by abundance in one study of human plasma and in the top 25% in another study of human platelets. [32] GEO profiles further show a notable 60–100% CXorf66 presence in both non-failing and dilated cardiomyopathy septum tissue [33], and about 75% presence in peripheral blood mononuclear cells. [34]

Unigene EST Tissue Expression Data for Human CXorf66 Protein
GeneCards Predicted Tissue Expression of Human CXorf66 Protein

References

  1. GRCh38: Ensembl release 89: ENSG00000203933 Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000079583 Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. GeneCard for CXorf66
  6. Leroy Hood; Patricia M. Beckmann; Richard Johnson; Marcello Marelli; Xiaojun Li. "Organ-specific proteins and methods of their use". US8586006 Patent. Institute For Systems Biology, Integrated Diagnostics, Inc. Retrieved 2015-03-11.
  7. Delgado AP, Hamid S, Brandao P, Narayanan R (2014). "A novel transmembrane glycoprotein cancer biomarker present in the X chromosome". Cancer Genomics & Proteomics. 11 (2): 81–92. PMID 24709545.
  8. "NCBI CXorf66 protein". Conserved Domain Database. National Center for Biotechnology Information. Retrieved 2015-03-11.
  9. "OMIM CXorf66 protein". Conserved Domain Database. National Center for Biotechnology Information. Retrieved 2015-03-11.
  10. "NCBI AceView CXorf66 protein". Conserved Domain Database. National Center for Biotechnology Information. Retrieved 2015-03-11.
  11. "Softberry". Softberry.com. Retrieved 2015-03-11.
  12. Department of Bioengineering. "SDSC Biology Workbench". University of California, San Diego. Retrieved 2015-03-11.
  13. "UniProtKB/Swiss-Prot Q5JRM2". UniProt consortium. Retrieved 2015-03-11.
  14. Thomas Nordahl Petersen; Søren Brunak; Gunnar von Heijne; Henrik Nielsen. "SignalP 4.0: discriminating signal peptides from transmembrane regions". Nature Methods. Retrieved 2015-03-11.
  15. Hirokawa T.; Boon-Chieng S.; Mitaku S. (1998). "SOSUI: classification and secondary structure prediction system for membrane proteins". Bioinformatics. 14 (4). Journal of Bioinformatics: 378–379. doi:10.1093/bioinformatics/14.4.378. PMID 9632836. Retrieved 2015-03-11.
  16. Brendel, V., Bucher, P., Nourbakhsh, I.R., Blaisdell, B.E. & Karlin, S. (1992). "Methods and algorithms for statistical analysis of protein sequences". Proceedings of the National Academy of Sciences. 89 (6): 2002–2006. Bibcode:1992PNAS...89.2002B. doi:10.1073/pnas.89.6.2002. PMC 48584. PMID 1549558. Retrieved 2015-03-11.
  17. "dbSNP". Conserved Domain Database. National Center for Biotechnology Information. Retrieved 2015-03-11.
  18. "CXorf66 Protein Interactions". STRING - Known and Predicted Protein-Protein Interactions. String-db.org. Retrieved 2015-03-11.
  19. Genomatix Software. "Genomatix ElDorado". Archived from the original on 2021-12-02. Retrieved 2015-03-11.
  20. Jim Kent. "BLAT". UCSC Genome Bioinformatics. Retrieved 2015-03-11.
  21. "miRBase: the microRNA database". University of Manchester. Retrieved 2015-03-11.
  22. Blom, N.; Gammeltoft, S. & Brunak, S. (1999). "Sequence- and structure-based prediction of eukaryotic protein phosphorylation sites". Journal of Molecular Biology. 294 (5): 1351–1362. doi:10.1006/jmbi.1999.3310. PMID 10600390. Retrieved 2015-03-11.
  23. R Gupta. "Prediction of glycosylation sites in proteomes: from post-translational modifications to protein function". Cbs.dtu.dk. Retrieved 2015-03-11.
  24. Qi Zhao; Yubin Xie; Yueyuan Zheng; Shuai Jiang; Wenzhong Liu; Weiping Mu; Yong Zhao; Yu Xue; Jian Ren. "GPS-SUMO: a tool for the prediction of sumoylation sites and SUMO-interaction motifs". Nucleic Acids Research. Archived from the original on 2013-05-10.
  25. Paul Horton. "PSORT II". Psort.hgc.jp. Retrieved 2015-03-11.
  26. "BLAST: Basic Local Alignment Search Tool". Conserved Domain Database. National Center for Biotechnology Information. Retrieved 2015-03-11.
  27. "The Timescale of Life". TimeTree. Retrieved 2015-03-11.
  28. "Unigene". Conserved Domain Database. National Center for Biotechnology Information. Retrieved 2015-03-11.
  29. Uhlen M, Oksvold P, Fagerberg L, Lundberg E, Jonasson K, Forsberg M, Zwahlen M, Kampf C, Wester K, Hober S, Wernerus H, Björling L, Ponten F (2010). "Towards a knowledge-based Human Protein Atlas". The Human Protein Atlas. 28 (12). Nat Biotechnology: 1248–1250. doi:10.1038/nbt1210-1248. PMID 21139605. S2CID 26688909. Retrieved 2015-03-11.
  30. "CXorf66 -Endometriosis". NCBI GEO Profile. National Center for Biotechnology Information. Retrieved 2015-03-11.
  31. Wang, M.; et al. "PaxDb CXorf66". PaxDb: Protein Abundance Across Organisms. Mol Cell Proteomics. Retrieved 2015-03-11.
  32. "CXorf66 - Dilated cardiomyopathy: septum". NCBI GEO Profile. National Center for Biotechnology Information. Retrieved 2015-03-11.
  33. "Occupational benzene exposure: peripheral blood mononuclear cells (HumanRef-8)". NCBI GEO Profile. National Center for Biotechnology Information. Retrieved 2015-03-11.