CXorf38 | Identifiers |
---|---|
Aliases | CXorf38, chromosome X open reading frame 38, CXorf38 Isoform 1 |
External IDs | MGI: 1916405; HomoloGene: 17013; GeneCards: CXorf38; OMA: CXorf38 - orthologs |
Chromosome X Open Reading Frame 38 (CXorf38) is a protein which, in humans, is encoded by the CXorf38 gene. [5] CXorf38 appears in multiple studies of escape from X-chromosome inactivation (see Clinical Significance). [6] [7] [8]
The CXorf38 gene is located on chromosome X at p11.4. [9] Including the 5' and 3' untranslated regions, isoform 1 is 18,515 base pairs long, spanning chromosome X at 40,626,921 - 40,647,554 on the minus strand. [10] Neighboring genes include MPC1L and MED14, which encode mitochondrial pyruvate carrier 1-like protein and mediator of RNA polymerase II transcription subunit 14, respectively. [11]
The CXorf38 gene encodes 8 mRNA variants, each of which encodes a protein isoform. Isoform 1, the canonical sequence, has 7 exons. [12] The remaining isoforms lack various exons and/or have 5'UTRs or 3'UTRs of different lengths.
Isoform | Length (amino acids) | Exons present | Notes |
---|---|---|---|
1 | 319 | 7 | |
X1 | 319 | 7 | Extended 5'UTR, shortened 3'UTR |
2 | 200 | 5 | Extended 5'UTR, shortened 3'UTR |
X2 | 330 | 6 | Exon 1 has an entirely different sequence |
X3 | 274 | 6 | |
X4 | 275 | 6 | Shortened 3'UTR |
X5 | 259 | 6 | Extended 5'UTR |
X6 | 274 | 5 | Extended 5'UTR |
The CXorf38 gene encodes a protein of 319 amino acids. [5] The predicted precursor molecular weight is approximately 36.65 kDa, [13] and the predicted isoelectric point is approximately 6. [13] Compositional analysis shows that CXorf38 is threonine-poor (1.9%) relative to other human proteins. [14]
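The molecular weight and composition figures above come from the cited prediction tools. The Python sketch below illustrates roughly how such numbers are derived. It sums average residue masses, adds one water for the free termini, and computes per-residue percentages. The mass table uses commonly tabulated average residue masses and is an assumption here. The function names are illustrative only, and the inputs are hypothetical toy peptides, not the CXorf38 sequence.

```python
from collections import Counter

# Average residue masses in daltons (free amino acid minus one water).
# Assumed values, as commonly tabulated; check against the tool you compare with.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the N- and C-terminal groups


def molecular_weight(seq: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def composition_percent(seq: str) -> dict[str, float]:
    """Percentage of each residue type (one-letter code) in seq."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


if __name__ == "__main__":
    toy = "MTEELLKKAT"  # hypothetical toy peptide, NOT CXorf38
    print(round(composition_percent(toy)["T"], 1))  # 20.0
    print(round(molecular_weight("GG"), 2))          # 132.12
    # Back-of-envelope: 1.9% threonine in a 319-residue protein is about 6 residues.
    print(round(319 * 1.9 / 100))                    # 6
```

Predicted weights for the real protein also depend on post-translational modifications, which this sketch ignores.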
CXorf38 has one conserved domain, DUF4559 (Arg9 - Asp298), which belongs to Pfam family PF15112. [5] This domain of unknown function (DUF) spans nearly the entire 319-residue protein.
About two-thirds of the protein's secondary structure is predicted to consist of alpha helices, [15] with the remaining third predicted to be random coil. [15] Analysis of the secondary structure of CXorf38 isoform 1 orthologs from mammals to invertebrates revealed similar results, suggesting that secondary structure is largely conserved (see Homology and Evolution for ortholog details).
The space-filling model predicted by I-TASSER suggests an overall elongated shape. [16] The ribbon structure shows multiple alpha helices, coiled coils, and random coils. There is a known coiled-coil region from Pro82 - Gln88, as well as a predicted coiled-coil region from approximately Asn240 - Tyr255. Overlapping this second region is a predicted nuclear export signal (NES) from Lys247 - Leu256. [17] Folding of the protein is predicted to leave ~30% of amino acids buried, ~60% exposed to solvent, and ~10% in an intermediate state. [15] CXorf38 has no predicted high-scoring hydrophobic segments or transmembrane segments. [14] [18]
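Predicted NESs are often described using the classical leucine-rich consensus Φ-x(2,3)-Φ-x(2,3)-Φ-x-Φ. Here Φ is a large hydrophobic residue (L, I, V, F, or M) and x is any residue. The Python sketch below scans a sequence for that consensus with a regular expression. It is a simplification for illustration only, not the predictor behind the cited result, whose method may differ considerably. The function name is hypothetical. The example peptide is modelled on the well-studied PKI export signal rather than taken from CXorf38.

```python
import re

# Simplified classical NES consensus: PHI-x(2,3)-PHI-x(2,3)-PHI-x-PHI,
# with PHI = L, I, V, F or M and x = any residue.
PHI = "[LIVFM]"
# Zero-width lookahead so candidates starting at different positions may overlap.
NES_PATTERN = re.compile(f"(?=({PHI}.{{2,3}}{PHI}.{{2,3}}{PHI}.{PHI}))")


def find_nes_like(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start residue, matched motif) for each consensus match."""
    return [(m.start() + 1, m.group(1)) for m in NES_PATTERN.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_nes_like("NSNELALKLAGLDINK"))  # [(5, 'LALKLAGLDI')]
```

Positions are reported 1-based to match residue numbering such as Lys247.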
Immunocytochemistry has experimentally localized CXorf38 to the cytoplasm, though not exclusively. [9] PSORT II also predicts a 13% probability of localization to the nucleus and 13% to the mitochondria. [19] [20] Any nuclear localization likely precedes nuclear export, consistent with the predicted nuclear export signal. [17] Further, immunohistochemical staining of the human colon showed moderate expression of CXorf38 in the cytoplasm and nucleus of glandular cells. [9]
CXorf38 has moderate expression across nearly all tissues. [21] The highest expression occurs in the lymph node, thyroid, spleen, thymus, bone marrow, and various female reproductive tissues. [21] Apart from the thyroid and the female reproductive tissues, all of these tissues have functions related to the immune and/or lymphatic systems. Moreover, computational analysis revealed that CXorf38 is overexpressed in B lymphoblasts and CD56+ NK cells, both of which have important roles in the vertebrate immune response. [22] CXorf38 has the lowest expression in the fetal brain, testis, and pancreas.
CXorf38 is also expressed at all stages of development. [23] Microarray analysis shows evidence of CXorf38 expression in blood at all life stages, in amniotic fluid during the late embryonic stage, in oviduct epithelium of women aged 25-44, and in vaginal epithelium of women aged 25-44 and 65-79. [23]
Genomatix predicts three promoter regions. [24] One (GXP_261939) lies upstream of the coding region, and the other two lie in the 3'UTR. The 3'UTR also contains two predicted polyadenylation sites and two predicted microRNA binding sites. [25]
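Polyadenylation sites are commonly flagged by the hexamer signal AATAAA (AAUAAA in RNA) or its frequent variant ATTAAA, located upstream of the cleavage site. The Python sketch below only finds those two hexamers. It is a deliberate simplification, not the method behind the predictions cited above. It expects the 3'UTR in transcript orientation: for a minus-strand gene such as CXorf38, that means the reverse complement of the plus-strand reference. The function names and the toy sequence are hypothetical.

```python
import re

# Canonical poly(A) signal and its most common variant, in the DNA alphabet.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas(utr: str) -> list[tuple[int, str]]:
    """Return sorted (0-based start, hexamer) hits, overlaps included."""
    utr = utr.upper()
    hits = []
    for hexamer in PAS_HEXAMERS:
        # Zero-width lookahead so overlapping occurrences are all reported.
        for m in re.finditer(f"(?={hexamer})", utr):
            hits.append((m.start(), hexamer))
    return sorted(hits)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


if __name__ == "__main__":
    toy_utr = "GCAATAAAGCTTATTAAACG"  # hypothetical, NOT the CXorf38 3'UTR
    print(find_pas(toy_utr))  # [(2, 'AATAAA'), (12, 'ATTAAA')]
```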
Some of the transcription factors (TFs) predicted by Genomatix have functions associated with the cardiovascular, lymphatic, and reproductive systems, as well as intrauterine development. [24] Binding sites for TFIIB and NRF1 each occur twice within the first 100 base pairs upstream of the transcription start site.
CXorf38 isoform 1 is predicted to undergo various post-translational modifications, including N-terminal methionine cleavage, phosphorylation, palmitoylation, sumoylation, O-GlcNAcylation, glycation, and acetylation. [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] There is one predicted Yin-Yang site, a residue that can be either O-GlcNAcylated or phosphorylated. [36] There is an experimentally determined omega-N-methylarginine site at Arg75 and a phosphothreonine site at Thr314. [5] Post-translational modifications were largely conserved across orthologs (see Homology and Evolution for ortholog details).
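In practice, a Yin-Yang site is a serine or threonine that appears in both the phosphorylation and the O-GlcNAcylation predictions, so the two modifications may compete for it. The Python sketch below expresses that as a set intersection of predicted residue positions. The positions are hypothetical placeholders, not the actual predictions for CXorf38, and the function name is illustrative.

```python
def yin_yang_sites(phospho: set[int], o_glcnac: set[int]) -> list[int]:
    """Residue positions predicted for both phosphorylation and O-GlcNAcylation."""
    return sorted(phospho & o_glcnac)


if __name__ == "__main__":
    # Hypothetical predicted Ser/Thr positions, for illustration only.
    phospho_pred = {12, 45, 88, 150}
    oglcnac_pred = {45, 200}
    print(yin_yang_sites(phospho_pred, oglcnac_pred))  # [45]
```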
CXorf38 has been experimentally shown to interact with NFYC, a protein involved in binding CCAAT motifs. CXorf38 is also predicted, via two-hybrid array, to interact with proteins associated with regulation of intrauterine development, immune system development, and reproductive development (see table below). [37] [38] PAX5 in particular touches on all of these areas: it plays a role in regulating early development, encodes a B-cell-specific activator protein expressed early in B-cell differentiation, and has been detected in developing testis. [39] MEOX2 and PAX6 also have functions related to early development, including regulation of limb myogenesis and development of neural tissues, respectively. [40] [41] PAX6, PAX5, and NFYC are predicted to physically interact with CXorf38 in the nucleus, while CDHR3, MEOX2, and DDIT4L are predicted to physically interact with CXorf38 in the cytosol. [37]
Protein | Location of Interaction | Function |
---|---|---|
CDHR3 | Cytosol | Calcium ion binding [42] |
MEOX2 | Cytosol | Limb myogenesis regulation [40] |
DDIT4L | Cytosol | Regulation of cell growth [43] |
NFYC | Nucleus | Binding of CCAAT motifs [44] |
PAX5 | Nucleus | Early development regulation; B-cell lineage-specific activator protein expressed at early stages of B-cell differentiation; detected in developing testis [39] |
PAX6 | Nucleus | Development of neural tissues, especially the eye [41] |
*All the above interactions have been determined via two-hybrid array, with the exception of NFYC, the interaction of which has been experimentally determined.
The CXorf38 gene has no paralogs. [45] Orthologs of CXorf38 have been found in some invertebrates and nearly all vertebrates. [45] Among invertebrates sequenced to date, CXorf38 has been found only in the phyla Cnidaria and Mollusca. [45] It has not been found in Porifera, Ctenophora, Echinodermata, Platyhelminthes, Nematoda, Annelida, or Arthropoda. [45] The most distant ortholog of CXorf38 is found in the invertebrate Stylophora pistillata (hood coral), whose lineage is estimated to have diverged from the human lineage approximately 824 million years ago. [45] [46] Of note, most invertebrate orthologs have disproportionately long protein sequences relative to human CXorf38.
Among vertebrates sequenced to date, CXorf38 has been found in all vertebrate taxonomic orders except Pilosa and Peramelemorphia. [45] Notably, CXorf38 is absent from all birds sequenced to date except two flightless species, the emu and the kiwi. Further, these bird orthologs have much shorter sequences than other CXorf38 orthologs.
The CXorf38 gene is known to escape X-chromosome inactivation (XCI), though at different rates in different populations. [7] [8] For example, it escapes XCI in 20-40% of Europeans and 40-60% of Yorubans. [7] There is also evidence that this escape is at least partially conserved: CXorf38 is one of eight genes, out of eleven tested, found to escape XCI in both mice and humans. [47] However, unlike in mice, escape genes in humans tend to cluster, which suggests that human XCI escape may be regulated at the level of chromatin domains rather than individual genes. [47] Consistent with this clustering, a computational analysis found that CXorf38 is part of an escape gene cluster that includes MED14, USP9X, and DDX3X. [48] CXorf38 is also one of five genes (XIST, KDM6A, DDX3X, KDM5C, CXorf38) that are known to escape XCI and that show female-biased expression in the human liver, which suggests that these five genes also escape XCI in liver tissue. [49]
In an analysis of DNA copy number variation (CNV) associated with premature ovarian failure, CXorf38 was identified as a gene affected by sizeable CNV loss. [50] CXorf38 was also found to be hypomethylated in smokers relative to non-smokers, which may have implications for early-stage lung cancer. [51] In summary, CXorf38 has been associated with XCI escape, CNV loss in premature ovarian failure, and smoking-related hypomethylation.
RNA-seq data show increased CXorf38 expression in a variety of cancers, with the greatest expression in endometrial, colorectal, and urothelial cancer. [52] There is also experimental evidence that CXorf38 is one of 163 genes upregulated in ovarian cancer cell lines (OVCAR-3 and OV-90) overexpressing CD157, an ectoenzyme that regulates leukocyte diapedesis. [53] High CD157 expression promotes processes that favor tumor progression, such as cell motility, and suppresses processes that inhibit it, such as apoptosis. [53]