PBDC1 | |
---|---|
Aliases | PBDC1, CXorf26, Cxorf26, polysaccharide biosynthesis domain containing 1 |
External IDs | MGI: 1914933, HomoloGene: 9542, GeneCards: PBDC1 |
CXorf26 (Chromosome X Open Reading Frame 26), also known as MGC874, is a well-conserved human gene found on the plus strand of the long arm of the X chromosome. The exact function of the gene is poorly understood, but the polysaccharide biosynthesis domain that spans a major portion of its protein product (known as UPF0368), together with the yeast homolog YPL225W, offers insight into its possible function.
Based on the available data, the function of CXorf26 is likely related to RNA polymerase II, ubiquitination, and ribosomes in the cytoplasm. This argument rests on interaction data for human CXorf26 and its yeast homolog, YPL225W. Both homologs interact with multiple ubiquitinated proteins as well as the transcriptional enzyme RNA polymerase II. For example, ubiquitination and subsequent degradation by the 26S proteasome serves an important function in regulating transcription in eukaryotes. [5] The yeast protein RPN11, which interacts with YPL225W, has a human homolog that is a metalloprotease component of the 26S proteasome, which degrades proteins targeted for destruction by the ubiquitin pathway. [6] These functions do not appear to relate to polysaccharide biosynthesis, as would be assumed from the conserved domain, but the domain may still play a role in secondary structure or provide sites of phosphorylation.
Further experimentation could clarify the exact role of CXorf26 in these key cellular processes. For example, measuring CXorf26 expression after treatment with an RNA polymerase II inhibitor could shed light on its function, as could a complete knockout of YPL225W in yeast.
CXorf26 is found on the plus strand of the long arm of the X chromosome, at locus Xq13.3, spanning bases 75,393,420-75,397,740. [7] The primary mRNA transcript has 1214 base pairs, and its protein product, UPF0368, is composed of 233 amino acids with a predicted mass of 26,057 Da. [7] The Xq13.3 locus has known associations with X-linked mental retardation. [8] The third gene upstream of CXorf26 is ATRX, which encodes a protein with an ATPase/helicase domain; when mutated, it causes an X-linked mental retardation syndrome along with alpha thalassemia syndrome, both of which are known to cause changes in DNA methylation patterns. [9] Furthermore, the third gene downstream of CXorf26, ZDHHC15, causes mental retardation X-linked type 91 when mutated. [10] One noteworthy nearby gene is Xist, which plays a role in the inactivation of the X chromosome. X inactivation relates to CXorf26, and is discussed below in the relevant research section.
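The figures quoted for the locus and its protein product can be cross-checked with simple arithmetic. The Python sketch below assumes the base range is inclusive at both ends (a common convention that the source does not state explicitly); the variable names are illustrative only.

```python
# Figures quoted in the text; variable names are illustrative.
GENE_START = 75_393_420   # first base of the locus on the X chromosome
GENE_END = 75_397_740     # last base of the locus
PROTEIN_LENGTH_AA = 233   # residues in UPF0368
PROTEIN_MASS_DA = 26_057  # predicted molecular mass in daltons

# Assumption: the coordinate range is inclusive at both ends.
genomic_span_bp = GENE_END - GENE_START + 1

# Average mass per residue, a quick plausibility check on the predicted mass.
mean_residue_mass_da = PROTEIN_MASS_DA / PROTEIN_LENGTH_AA

print(f"Genomic span: {genomic_span_bp} bp")
print(f"Mean residue mass: {mean_residue_mass_da:.1f} Da")
```

Under these assumptions the locus covers 4,321 bp, and the mean residue mass of about 111.8 Da is close to the roughly 110 Da per residue commonly used as a rule of thumb for proteins.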
Expression data for CXorf26 show that it is expressed at high levels in nearly all human tissues, as reflected in both tissue profiles and ESTs. The GEO profile to the right shows expression levels for CXorf26 in common human tissues to be consistently around the 75th percentile, suggesting it may have a housekeeping function. If the conserved domain does indeed play a role in polysaccharide biosynthesis of some sort, this high expression would be consistent with that function.
Gene expression profiles in the Gene Expression Omnibus (GEO) repository at NCBI show that few treatments changed the expression of CXorf26 in the tissues examined. However, one experiment compared CXorf26 expression in lung adenocarcinoma CL1-5 cells either overexpressing or underexpressing Claudin-1 (CLDN1). The results indicated that CXorf26 expression drops sharply when CLDN1 is overexpressed. [12] CLDN1 is a major component of the tight junction complexes that foster cell-cell adhesion between cell membranes. [13] More tight junctions formed by CLDN1 would likely result in decreased expression of CXorf26, since the cell membrane would be used for tight junctions instead of its normal function related to heparan sulfate.
There is only one alternative splice form of CXorf26. Its mRNA is considerably shorter, at 977 base pairs, but it still yields a protein product of 232 amino acids. [14] This alternative splice form appears to be missing exon 5 of the transcript, but the missing sequence may be added onto exon 6, creating a larger exon than in the consensus transcript.
There were no other predicted exons within the genomic CXorf26 sequence when 3000 base pairs were added on either side in the search. [15]
The promoter for CXorf26 is predicted to be located from bases 75392235 to 75393075 on the plus strand of the X chromosome. [16] The promoter region is extensively conserved in all primates and most mammalian homologs, but conservation is weaker in more distantly related species. Given the primary transcript begins at base 7539277, the promoter overlaps with it by 304 bases. Twenty predicted transcription factor binding sites, along with their transcription factor families, were also collected. Many of the transcription factors are zinc finger factors, which have the function of stabilizing protein folds, while none of the factors seem to relate to a potential polysaccharide biosynthesis function. One transcription factor family predicted to bind the promoter region, V$CHRF, is involved in regulation of the cell cycle. This regulation could be related to ubiquitin function, since proteins with ubiquitination-type functions were found to interact with CXorf26.
The CXorf26 protein is predicted to be 56.5% likely to localize to the cytoplasm [17] and 17.4% likely to localize to the mitochondria. CXorf26's yeast homolog, YPL225W, was GFP tagged and its location was determined to be in the cytoplasm. [18] A cytoplasmic rather than transmembrane location is further supported by the absence of a hydrophobic signal peptide sequence and by TMAP, [19] which predicted no potential transmembrane segments in CXorf26 or any of its homologs in other species.
CXorf26 was found to have a conserved domain known as DUF757 within its sequence. [20] The conserved domain spans a majority of the protein sequence, from amino acids 39-159. Conservation of the domain is strong throughout all homologs compared, including mammals, invertebrates such as insects, and even sponges. The yeast homolog, YPL225W, shows 42.4% identity and 62% similarity in this domain. Conservation is especially high in regions that contain one of the multiple alpha helices or beta sheets. There are also multiple conserved phosphorylation sites in the amino acid sequence, at tyrosine 72 and serine 126.
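The statement that the domain spans a majority of the protein can be checked against the 233-residue protein length given elsewhere in this article. The short Python sketch below assumes the residue range 39-159 is inclusive; the variable names are illustrative.

```python
# Residue range of the conserved domain and protein length, as quoted in the article.
DOMAIN_START, DOMAIN_END = 39, 159
PROTEIN_LENGTH_AA = 233

# Assumption: the residue range is inclusive at both ends.
domain_length = DOMAIN_END - DOMAIN_START + 1
coverage = domain_length / PROTEIN_LENGTH_AA

print(f"Domain length: {domain_length} residues")
print(f"Fraction of protein covered: {coverage:.1%}")
```

With these inputs the domain is 121 residues long, or about 51.9% of the protein.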
According to NCBI, [21] this domain is in the Pfam PF04669 family of proteins, which are expected to play a role in xylan biosynthesis in plant cell walls, although their exact role in the synthesis pathway is unknown. As animal cells do not have cell walls, the function of the domain in other organisms such as humans is also unknown.
Xylan is made from units of the pentose sugar xylose, which is the first saccharide in the biosynthetic pathways of several anionic polysaccharides such as heparan sulfate and chondroitin sulfate. Like xylan, heparan sulfate is found on the cell surface; [22] since it is needed for both the cell surface and the extracellular matrix, this may explain CXorf26's high expression in nearly all human tissues. Heparan sulfate biosynthesis occurs in the lumen of the endoplasmic reticulum [23] and is initiated by the transfer of a xylose from UDP-xylose by xylosyltransferase to specific serine residues within the protein core. PSORTII predicts the presence of a KKXX-like motif, GEKA, near the C-terminus of CXorf26. KKXX-like motifs are predicted endoplasmic reticulum membrane retention signals. This motif is conserved only in primates. However, another KKXX-like motif, QDKE, is found at the end of the domain, and the K in this motif is highly conserved back to most invertebrates. In contrast, NetNGlyc predicted no N-glycosylation sites, suggesting CXorf26 does not undergo special folding in the endoplasmic reticulum lumen. [24] Given that the conserved domain cannot function to create xylan, since animal cells have no cell walls, its function may instead be related to the heparan sulfate pathway.
Predictions across multiple programs suggest the presence of 7 alpha helices and 2 beta sheets in CXorf26; the majority of these secondary structures are in the conserved domain. Experimental evidence for the yeast homolog shows 4 alpha helices and 2 beta sheets, all in the polysaccharide domain, [25] just as the predicted SWISS model above shows for humans. The locations of the secondary structures are also conserved.
Pepsin (pH 1.3), Asp-N endopeptidase, N-terminal Glutamate and Proteinase K all had 50 or more cleavage sites within the protein, but none of the 10 caspases had any cleavage sites. [26] This suggests CXorf26 is not likely to be cleaved or degraded during apoptosis, which is consistent with the observation that CXorf26 is expressed highly in nearly all tissues and experimental conditions.
Lysines 63 and 66 are potential sites of glycation of the epsilon amino group. [27] Lysine 63 is conserved in both Macaca mulatta and Bombus impatiens. There are 10 serine, 3 threonine, and 6 tyrosine phosphorylation sites predicted within the CXorf26 protein. Of these predicted sites, the table below lists those conserved in Macaca mulatta as well as Bombus impatiens. S127 was left in the table even though Homo sapiens and Macaca mulatta did not have scores above threshold at that position. Over evolutionary time, the serine in Bombus was replaced by a tyrosine in Homo sapiens and Macaca mulatta. Because tyrosine can still be phosphorylated, the substitution would likely not result in a large change to the protein and its function.
Bombus impatiens | Homo sapiens & Macaca mulatta |
---|---|
Serine 20 | Serine 23 |
Serine 91 | Serine 94 |
Tyrosine 69 | Tyrosine 72 |
Tyrosine 126 | Tyrosine 129 |
Serine 127* | Tyrosine 130* |
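The positions in the table above differ by a constant shift between the two columns, which a few lines of Python make explicit. This is a minimal sketch over the table's own values; the tuple layout and variable names are illustrative assumptions.

```python
# (Bombus residue, Bombus position, primate residue, primate position),
# copied from the table of conserved phosphorylation sites.
conserved_sites = [
    ("Serine", 20, "Serine", 23),
    ("Serine", 91, "Serine", 94),
    ("Tyrosine", 69, "Tyrosine", 72),
    ("Tyrosine", 126, "Tyrosine", 129),
    ("Serine", 127, "Tyrosine", 130),
]

# Collect the distinct position offsets between the two columns.
offsets = {p_pos - b_pos for _, b_pos, _, p_pos in conserved_sites}

# List the sites where the residue type differs between the columns.
substitutions = [site for site in conserved_sites if site[0] != site[2]]

print("Position offsets:", offsets)
print("Residue substitutions:", substitutions)
```

Every site is shifted by the same three positions, and the only residue change is the serine-to-tyrosine substitution at the starred position.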
CXorf26 is strongly evolutionarily conserved, [28] with homologs found as far back as the fungus Batrachochytrium dendrobatidis. A multiple sequence alignment of 20 orthologous protein sequences reveals very strong conservation of the polysaccharide biosynthesis domain, but conservation after the domain is essentially non-existent in invertebrates. [29] In those vertebrates that have sequence after the conserved domain, it is of low complexity and filled with repeats of the amino acid motif 'GEK', corresponding to glycine, glutamic acid, and lysine. Glutamic acid and lysine are both charged, which contributes to the overall hydrophilicity of the region after the conserved domain.
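As an illustration of how such a low-complexity tail could be characterised, the Python sketch below counts 'GEK' repeats and the fraction of charged residues. The sequence is a made-up example, not the real CXorf26 tail, and treating only D, E, K, and R as charged is a simplifying assumption.

```python
# Hypothetical low-complexity tail; NOT the actual CXorf26 sequence.
example_tail = "GEKGEKAEGEKGEKS"

# Simplifying assumption: D, E, K and R are counted as charged residues.
CHARGED_RESIDUES = set("DEKR")

# str.count returns the number of non-overlapping occurrences of the motif.
motif_count = example_tail.count("GEK")

charged_count = sum(1 for residue in example_tail if residue in CHARGED_RESIDUES)
charged_fraction = charged_count / len(example_tail)

print(f"GEK repeats: {motif_count}")
print(f"Charged fraction: {charged_fraction:.2f}")
```

Applied to a real ortholog sequence, the same two measures would give a rough sense of how repetitive and how hydrophilic the region after the domain is.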
Species | Common name | Accession number | Length | Protein identity | Protein similarity |
---|---|---|---|---|---|
Homo sapiens | Human | NP_057584.2 | 233aa | 100% | 100% |
Nomascus leucogenys | Gibbon | XP_003269034.1 | 233aa | 99% | 99% |
Macaca mulatta | Rhesus monkey | NP_001181035.1 | 233aa | 98% | 98% |
Callithrix jacchus | Marmoset | XP_002763066.1 | 232aa | 95% | 97% |
Mus musculus | Mouse | NP_080588.1 | 198aa | 80% | 85% |
Loxodonta africana | African elephant | XP_003412818.1 | 202aa | 80% | 88% |
Ailuropoda melanoleuca | Giant panda | XP_002930750.1 | 219aa | 80% | 84% |
Bos taurus | Cattle | XP_002700032.1 | 219aa | 78% | 86% |
Monodelphis domestica | Opossum | XP_001381973.1 | 226aa | 59% | 89% |
Oreochromis niloticus | Nile tilapia | XP_003453679.1 | 169aa | 46% | 83% |
Bombus impatiens | Bumblebee | XP_003487356.1 | 168aa | 38% | 74% |
Acromyrmex echinatior | Ant | EGI60293.1 | 197aa | 32% | 74% |
Amphimedon queenslandica | Sponge | XP_003383281.1 | 159aa | 31% | 74% |
Saccharomyces cerevisiae | Yeast | NP_015099.1 | 146aa | 27% | 62% |
Batrachochytrium dendrobatidis | Fungus | EGF83065.1 | 74aa | 16% | 65% |
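The table above shows similarity staying fairly high while identity falls in more distant species. The Python sketch below, which simply re-enters the table's percentages, computes the gap between the two measures for each ortholog; the data structure and names are illustrative.

```python
# (species, protein identity %, protein similarity %), copied from the table.
orthologs = [
    ("Homo sapiens", 100, 100),
    ("Nomascus leucogenys", 99, 99),
    ("Macaca mulatta", 98, 98),
    ("Callithrix jacchus", 95, 97),
    ("Mus musculus", 80, 85),
    ("Loxodonta africana", 80, 88),
    ("Ailuropoda melanoleuca", 80, 84),
    ("Bos taurus", 78, 86),
    ("Monodelphis domestica", 59, 89),
    ("Oreochromis niloticus", 46, 83),
    ("Bombus impatiens", 38, 74),
    ("Acromyrmex echinatior", 32, 74),
    ("Amphimedon queenslandica", 31, 74),
    ("Saccharomyces cerevisiae", 27, 62),
    ("Batrachochytrium dendrobatidis", 16, 65),
]

# Gap between similarity and identity for each species.
gaps = {species: similarity - identity for species, identity, similarity in orthologs}
widest_species = max(gaps, key=gaps.get)

# Check whether the table is ordered by non-increasing identity.
identities = [identity for _, identity, _ in orthologs]
identity_is_non_increasing = identities == sorted(identities, reverse=True)

print(f"Widest gap: {widest_species} ({gaps[widest_species]} points)")
print("Rows ordered by identity:", identity_is_non_increasing)
```

The rows are ordered by decreasing identity, and the gap between similarity and identity is widest for the most distant ortholog listed.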
The CXorf26 homolog in yeast, YPL225W, has an overall identity of 27% but shows 42.4% identity and 62% similarity within the polysaccharide biosynthesis domain. Like the predicted human secondary structure, YPL225W is experimentally verified to contain four alpha helices and two beta sheets within the biosynthesis domain. [30] As with CXorf26, the function of YPL225W in yeast is unknown, but co-purification experiments suggest it may interact with ribosomes, since many of its 18 interacting proteins are related to RNA and ribosomes. Several of the interacting proteins are associated with RNA polymerase, which carries out transcription, and several others are involved in ubiquitination. The interacting yeast proteins with the higher interaction scores include UBI4, RPB8, SRO9, and NAB2.
Potential interacting proteins were identified using the tools provided at the I2D Interlogous Interaction Database [31] and the STRING 9.0 program. [32] Although more proteins were predicted, those shown below had the highest scores and showed the greatest possibility of relating to potential CXorf26 function.
SMAD2, PHB, and CTNNB1 were found in an experiment investigating transcriptional factor networks. [33] The BABAM1 interaction was found in both databases using an anti-tag coimmunoprecipitation assay [34] while POLR2H was based on a tandem affinity purification assay using the yeast homolog, YPL225W. [35]
Interacting Protein | Accession Number | Protein Function |
---|---|---|
SMAD2 | AAC39657.1 | Part of family acting as signal transducer and transcriptional modulator |
PHB | CAG46507.1 | Evolutionarily conserved, ubiquitously expressed, negative regulator of cell proliferation |
CTNNB1 | NP_001091679.1 | Catenin associated, part of protein complex that constructs adherens junctions |
BABAM1 | NP_001028721.1 | Part of complex that recognizes Lys-63 ubiquitinated histones |
BRIX1 | NP_060791.3 | Required for biogenesis of 60S large eukaryotic ribosomal subunit |
POLR2H | NP_006223.2 | Encodes essential subunit of RNA Polymerase II |