WDCP identifiers | |
---|---|
Aliases | WDCP, C2orf44, PP384, WD repeat and coiled coil containing, MMAP |
External IDs | OMIM: 616234; MGI: 3040699; HomoloGene: 49822; GeneCards: WDCP |
WD repeat and coiled-coil-containing protein (WDCP) is a protein that in humans is encoded by the WDCP gene. The function of the protein is not completely understood, but WDCP has been identified in a fusion protein with anaplastic lymphoma kinase (ALK) found in colorectal cancer. [5] WDCP has also been identified in the MRN complex, which processes double-stranded breaks in DNA. [6]
In humans, WDCP is located on the minus strand of chromosome 2 at locus 2p23.3. The total gene is reported as 20,235 bp long, from 24,029,340 – 24,047,575. WDCP lies between the MFSD2B and FKBP1B genes. [7] The gene contains 4 exons, which are detailed in the table below. [8]
Exon Number | Length (bp) | Start and End Positions |
---|---|---|
Exon 1 | 78 | 24047393-24047343 |
Exon 2 | 1836 | 24039512-24037677 |
Exon 3 | 118 | 24032946-24032829 |
Exon 4 | 1816 | 24031162-24029347 |
Table 1. Exons of WDCP and their various lengths.
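The lengths in Table 1 can be cross-checked against the listed coordinates. The following is a minimal sketch, assuming the positions are 1-based inclusive coordinates on the minus strand (so the start is the larger number). The names `EXONS`, `LISTED_LENGTHS`, and `inclusive_length` are hypothetical and used only for illustration.

```python
# Exon coordinates as listed in Table 1: (start, end) on the minus strand.
EXONS = {
    "Exon 1": (24047393, 24047343),
    "Exon 2": (24039512, 24037677),
    "Exon 3": (24032946, 24032829),
    "Exon 4": (24031162, 24029347),
}

# Exon lengths as listed in Table 1 (bp).
LISTED_LENGTHS = {"Exon 1": 78, "Exon 2": 1836, "Exon 3": 118, "Exon 4": 1816}


def inclusive_length(start: int, end: int) -> int:
    """Length of a span when both endpoints are counted (assumed convention)."""
    return abs(start - end) + 1


if __name__ == "__main__":
    for name, (start, end) in EXONS.items():
        computed = inclusive_length(start, end)
        print(f"{name}: listed {LISTED_LENGTHS[name]} bp, "
              f"from coordinates {computed} bp")
    print("Sum of listed lengths:", sum(LISTED_LENGTHS.values()), "bp")
```

Under this convention the coordinates for exons 2 to 4 reproduce the listed lengths, while the exon 1 coordinates give 51 bp rather than the listed 78 bp, so one of the exon 1 figures may need checking against the source database. The listed lengths sum to 3,848 bp.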
Common aliases of the gene include chromosome 2 open reading frame 44 (C2orf44), MMAP, and PP384. [9]
WDCP isoform 1 is encoded by the mRNA WD repeat and coiled-coil containing, transcript variant 1. The total RNA transcript is 18,045 bp long and is transcribed from nucleotides 24,029,347 – 24,047,391 of the WDCP gene. [9] The coding DNA sequence is 3,848 nucleotides long. The 5' UTR contains 7,897 nucleotides, and the 3' UTR contains 1,597 nucleotides.
There are two other known transcript variants of WDCP: WDCP transcript variant 2 and WDCP transcript variant X1. Information about the two transcripts is given below. [8]
Transcript Variant | Accession Number [9] | Alternative Splicing Pattern | Transcript Length (bp) | 5' UTR Length | 3' UTR Length |
---|---|---|---|---|---|
WDCP Transcript Variant 2 [10] | NP_001142319 | Removal of Exon 3 | 1869 | 7932 | 1776 |
WDCP Transcript Variant X1 [11] | XM_017005029 | Removal of Exon 4 | 2039 | 96 | 89 |
Table 2. Transcript Variants of WDCP with their alternative splicing pattern in comparison to WDCP transcript variant 1.
WDCP protein isoform 1 is 721 amino acids in length. Its molecular weight is 79 kDa and the theoretical isoelectric point is 6.2. [12] The protein sequence for WDCP Protein Isoform 1 is shown below. [13]
1 MELGKGKLLR TGLNALHQAV HPIHGLAWTD GNQVVLTDLR LHSGEVKFGD SKVIGQFECV
61 CGLSWAPPVA DDTPVLLAVQ HEKHVTVWQL CPSPMESSKW LTSQTCEIRG SLPILPQGCV
121 WHPKCAILTV LTAQDVSIFP NVHSDDSQVK ADINTQGRIH CACWTQDGLR LVVAVGSSLH
181 SYIWDSAQKT LHRCSSCLVF DVDSHVCSIT ATVDSQVAIA TELPLDKICG LNASETFNIP
241 PNSKDMTPYA LPVIGEVRSM DKEATDSETN SEVSVSSSYL EPLDLTHIHF NQHKSEGNSL
301 ICLRKKDYLT GTGQDSSHLV LVTFKKAVTM TRKVTIPGIL VPDLIAFNLK AHVVAVASNT
361 CNIILIYSVI PSSVPNIQQI RLENTERPKG ICFLTDQLLL ILVGKQKLTD TTFLPSSKSD
421 QYAISLIVRE IMLEEEPSIT SGESQTTYST FSAPLNKANR KKLIESLSPD FCHQNKGLLL
481 TVNTSSQNGR PGRTLIKEIQ SPLSSICDGS IALDAEPVTQ PASLPRHSST PDHTSTLEPP
541 RLPQRKNLQS EKETYQLSKE VEILSRNLVE MQRCLSELTN RLHNGKKSSS VYPLSQDLPY
601 VHIIYQKPYY LGPVVEKRAV LLCDGKLRLS TVQQTFGLSL IEMLHDSHWI LLSADSEGFI
661 PLTFTATQEI IIRDGSLSRS DVFRDSFSHS PGAVSSLKVF TGLAAPSLDT TGCCNHVDGM
721 A
Figure 1. Protein sequence of WDCP protein isoform 1.
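Figure 1 gives the sequence in numbered blocks of ten residues, a layout most analysis tools do not accept directly. The sketch below shows one way to strip the numbering, using only the first 70 residues of the figure as an excerpt. The function name `strip_numbering` is hypothetical.

```python
import re

# First 70 residues of Figure 1, in the numbered ten-residue block layout.
EXCERPT = (
    "1 MELGKGKLLR TGLNALHQAV HPIHGLAWTD GNQVVLTDLR LHSGEVKFGD SKVIGQFECV "
    "61 CGLSWAPPVA"
)


def strip_numbering(text: str) -> str:
    """Drop position numbers and whitespace, keeping only residue letters."""
    return "".join(re.findall(r"[A-Za-z]+", text)).upper()


if __name__ == "__main__":
    plain = strip_numbering(EXCERPT)
    print(plain)
    print("Residues in excerpt:", len(plain))
```

Applied to the full figure, the same function would be expected to return a string whose length matches the stated 721 amino acids.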
Compositional analysis of WDCP Isoform 1 shows no extremely high or low levels of particular amino acids. The protein contains no positive, negative, or mixed charged clusters. [14]
There are two additional isoforms of WDCP, as seen in the table below.
Isoform Name | Accession No. | Length (aa) | Molecular Weight (kDa) [12] | Isoelectric Point [12] |
---|---|---|---|---|
WDCP Protein Isoform 2 [15] | NP_001135791 | 622 | 69 | 6.5 |
PREDICTED: WDCP Protein Isoform X1 [16] | XP_016860518 | 617 | 68 | 6.4 |
Table 3. Table of WDCP protein Isoforms and Protein Information.
The secondary structure of WDCP Protein Isoform 1 consists of 47 random coils (429 residues, 59.5%), 19 alpha-helices (160 residues, 22.19%), and 31 extended strands (132 residues, 18.31%). [17]
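The percentages above follow directly from the residue counts and the 721-residue length of isoform 1. A minimal sketch of that arithmetic, with hypothetical names `COUNTS` and `percentages`:

```python
# Residue counts per predicted secondary-structure class, as reported above.
COUNTS = {"random coil": 429, "alpha-helix": 160, "extended strand": 132}

# Length of WDCP isoform 1 in amino acids.
PROTEIN_LENGTH = 721


def percentages(counts: dict, total: int) -> dict:
    """Share of residues in each class, as a percentage rounded to 2 places."""
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


if __name__ == "__main__":
    # The three classes should account for every residue exactly once.
    assert sum(COUNTS.values()) == PROTEIN_LENGTH
    for name, pct in percentages(COUNTS, PROTEIN_LENGTH).items():
        print(f"{name}: {pct}%")
```

The three counts sum to 721, so every residue is assigned to exactly one class.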
There are two predicted disulfide bonds in WDCP, one between cysteine residues 574 and 623, and the other between cysteine residues 713 and 714. [18]
WDCP protein domains include two tryptophan-aspartic acid (WD) repeat sites, multiple phosphorylation sites, and a domain that interacts with the hematopoietic cell kinase (HCK). [19]
Across various tissue types, WDCP shows increased mRNA expression in white blood cells (3.0 RPKM), thymus (3.6 RPKM), lymph nodes, bone marrow, and testes. [9] WDCP exhibits increased protein expression in endocrine tissues, as well as the kidney and urinary bladder. [20] Across multiple tissue lines in the GTEx database, WDCP expression is highest in Epstein-Barr virus-transformed lymphocytes and lowest in the pancreas. [21] NCBI GEO records reveal that overall WDCP expression is in the 65th to 70th percentile relative to the Universal Human Reference RNA. [22]
In fetal tissue, WDCP mRNA expression is highest in the lung at 17 weeks (3.75 RPKM), the heart at 10 weeks (3.5 RPKM), and the intestine at 11 weeks (3.0 RPKM). By 17 weeks, WDCP expression in the intestine drops from 3.0 RPKM to 0.75 RPKM. The fetal kidney at 20 weeks exhibits the lowest WDCP expression, at 0.5 RPKM. [9]
WDCP does not have any CpG islands associated with its promoter. WDCP has relatively low levels of H3K27ac, but higher levels of H3K4me1 and H3K4me3 across various cell types, including HeLa, HUVEC, and leukemia cell lines. [8]
The GeneHancer promoter for WDCP is listed as GH02J024045. The transcription factor binding sites associated with this promoter and confirmed with a ChIP signal include HNF4A, CEBPB, EGR1, FOS, ETS1, and E2F6. [8] The binding sites for FOS, EGR1, and ETS1 are located in a DNase hypersensitive site.
There are two transcript variants of WDCP detailed in the table in the mRNA section.
The mRNA secondary structures of the UTR regions exhibited a high number of predicted stem-loop structures in the WDCP transcript. The 5' UTR region closest to the start codon contained about 22 predicted loops. Stem loops in the 5' UTR near the start codon could indicate lower levels of expression. [23] There are 108 predicted loops in the 3' UTR region. [24] There are no known miRNA targets in the 3' UTR.
WDCP Isoform 1 contains the following post-translational modifications:
Glycation is the addition of a sugar molecule to an amino acid and is associated with pathologies including renal failure and diabetes. [25] Glycation is predicted to occur at lysine residues: 5, 7, 83, 189, 244, 262, 294, 325, 389, 405, 407, 461, 552, and 617.
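A quick consistency check on residue-specific site lists is to confirm that each listed position actually holds the expected amino acid. The sketch below does this for the glycation sites, using the isoform 1 sequence from Figure 1 with numbering and spaces removed. The helper names `residues_at` and `mismatches` are hypothetical.

```python
# WDCP isoform 1 sequence as shown in Figure 1 (60 residues per line).
WDCP_ISOFORM_1 = (
    "MELGKGKLLRTGLNALHQAVHPIHGLAWTDGNQVVLTDLRLHSGEVKFGDSKVIGQFECV"
    "CGLSWAPPVADDTPVLLAVQHEKHVTVWQLCPSPMESSKWLTSQTCEIRGSLPILPQGCV"
    "WHPKCAILTVLTAQDVSIFPNVHSDDSQVKADINTQGRIHCACWTQDGLRLVVAVGSSLH"
    "SYIWDSAQKTLHRCSSCLVFDVDSHVCSITATVDSQVAIATELPLDKICGLNASETFNIP"
    "PNSKDMTPYALPVIGEVRSMDKEATDSETNSEVSVSSSYLEPLDLTHIHFNQHKSEGNSL"
    "ICLRKKDYLTGTGQDSSHLVLVTFKKAVTMTRKVTIPGILVPDLIAFNLKAHVVAVASNT"
    "CNIILIYSVIPSSVPNIQQIRLENTERPKGICFLTDQLLLILVGKQKLTDTTFLPSSKSD"
    "QYAISLIVREIMLEEEPSITSGESQTTYSTFSAPLNKANRKKLIESLSPDFCHQNKGLLL"
    "TVNTSSQNGRPGRTLIKEIQSPLSSICDGSIALDAEPVTQPASLPRHSSTPDHTSTLEPP"
    "RLPQRKNLQSEKETYQLSKEVEILSRNLVEMQRCLSELTNRLHNGKKSSSVYPLSQDLPY"
    "VHIIYQKPYYLGPVVEKRAVLLCDGKLRLSTVQQTFGLSLIEMLHDSHWILLSADSEGFI"
    "PLTFTATQEIIIRDGSLSRSDVFRDSFSHSPGAVSSLKVFTGLAAPSLDTTGCCNHVDGM"
    "A"
)

# Lysine positions listed above as predicted glycation sites (1-based).
GLYCATION_SITES = [5, 7, 83, 189, 244, 262, 294, 325, 389, 405, 407, 461, 552, 617]


def residues_at(sequence: str, positions: list) -> dict:
    """Map each 1-based position to the residue found there."""
    return {pos: sequence[pos - 1] for pos in positions}


def mismatches(sequence: str, positions: list, expected: str) -> list:
    """Positions whose residue differs from the expected one-letter code."""
    found = residues_at(sequence, positions)
    return [pos for pos, aa in found.items() if aa != expected]


if __name__ == "__main__":
    print("Sequence length:", len(WDCP_ISOFORM_1))
    print("Non-lysine glycation positions:",
          mismatches(WDCP_ISOFORM_1, GLYCATION_SITES, "K"))
```

The same helpers can be pointed at any other residue-specific list by changing the positions and the expected one-letter code.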
Acetylation is the addition of an acetyl group, in this case to the N-terminal (starting) methionine residue. This modification is usually associated with metabolism-related pathways. WDCP has one confirmed acetylation site, at the starting methionine residue. [26]
Phosphorylation is the addition of a phosphate group to amino acids. It is mainly associated with cellular signaling pathways and can instigate tumor development. Serine, Threonine, and Tyrosine phosphorylation sites were identified in 27 residues at a NetPhos threshold of 0.9. [27] Phosphorylation was detected at:
Possible kinases that interact with WDCP include Casein kinase 1, Casein kinase 2, cAMP, cGMP, P38MAPK, DNAPK, Protein kinase A, and Protein kinase C. [27]
SUMOylation is the addition of a small ubiquitin-like modifier to lysine residues in proteins. SUMOylation sites in WDCP include lysine residues 47, 152, 298, 310, and 709, with lysine residues 47 and 152 having the highest probability of SUMOylation. [28] SUMOylation can affect protein-protein interactions and protein ubiquitination. [29]
Palmitoylation is the addition of a fatty acid chain to cysteine residues. There is one confirmed site of palmitoylation at cysteine residue 714. [30]
GalNAc O-glycosylation is the addition of a sugar molecule to a serine or threonine residue, which possibly increases structural stability. [31] Some of these residues overlap with phosphorylation sites, indicating that these residues can switch between phosphorylation and glycosylation. [32] [33] These sites were detected at:
N-glycosylation is the addition of a sugar molecule to an asparagine residue. Asparagine residue 483 is the only detected N-glycosylation site in WDCP. [34]
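N-glycosylation predictors generally start from the N-X-S/T sequon, where X is any residue except proline. The sketch below scans the ten-residue stretch of Figure 1 that begins at residue 481 for that motif. It is only an illustration of the motif test, not the prediction method behind the cited result: a sequon is necessary but not sufficient for a predicted site. The names `FRAGMENT`, `FRAGMENT_START`, and `sequon_positions` are hypothetical.

```python
import re

# Residues 481-490 of WDCP isoform 1, taken from Figure 1.
FRAGMENT = "TVNTSSQNGR"
FRAGMENT_START = 481  # 1-based position of the first residue in FRAGMENT

# N, then any residue except P, then S or T. The lookahead lets
# overlapping sequons be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(fragment: str, start: int = 1) -> list:
    """1-based positions of asparagines that begin an N-X-S/T sequon."""
    return [start + m.start() for m in SEQUON.finditer(fragment)]


if __name__ == "__main__":
    print(sequon_positions(FRAGMENT, FRAGMENT_START))
```

In this stretch the only asparagine that starts a sequon is residue 483 (N-T-S), while asparagine 488 is followed by G-R and is skipped.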
No sites were predicted for amidation, C-linked mannosylation, GPI modification, non-classical protein secretion, transmembrane helices or regions, arginine and lysine cleavage, lipoprotein signal peptides, tyrosine sulfation, or twin-arginine signal peptides. [35]
WDCP isoform 1 has no transmembrane domains, actin-binding motifs, ER retention motifs, or Golgi transport signals. The protein is most likely located in the nucleus, with a reliability score of 47.8%, and has a 30.4% chance of being located in the cytoplasm. [36] [37] Close orthologs of WDCP isoform 1 give similar predictions, with the nucleus as the most likely location. [37] In addition, there are two predicted nuclear localization sequences in WDCP, starting at residues 401 and 581. [38]
Immunostaining of WDCP has shown localization in the nucleoli of osteosarcoma cells, as well as the cytoplasm of kidney cells. [39]
The function of WDCP is currently not well understood, but its increased expression in the bone marrow and thymus suggests a possible role in immune function and development. Its location in the nucleus, relation to the MRN complex, abundance of phosphorylation sites, and associations with various cancers could indicate a role in cell growth regulation or a proto-oncogenic function.
WDCP has known interactions with HCK, where a proline-rich region of WDCP binds to the Src homology 3 domain of HCK. As mentioned above, WDCP has been found in a fusion with ALK. This fusion changes the structure of ALK, which results in constitutive signaling. [40]
Studies have confirmed interactions between WDCP and RuvB-like proteins 1 and 2 in human embryonic kidney cells; these proteins belong to a family of AAA proteins associated with ATPase activity. Interactions with C1q and tumor necrosis factor related protein 2 and with DYNLT1 have also been confirmed. [41] [42] [43]
Based on the transcription factor binding sites listed in the transcriptional regulation section, WDCP could have possible interactions with the following transcription factors:
Studies have linked WDCP to various cancers, including colorectal cancer, leukemia, and osteosarcoma. WDCP levels are higher in colorectal cancer metastases than in the primary tumor. [51] GEO records show elevated levels of WDCP in leukemia cell lines, which are regulated with imatinib, a drug used to treat chronic myelogenous leukemia. [52] This pattern is also seen in HeLa cell lines treated with Casiopeínas, small molecules with an active Cu2+ that allow the molecule to bind to tumors and induce apoptosis. [53] [54]
There are no paralogs of WDCP, but orthologs of this gene were found in primates, rodents, reptiles, birds, fish, amphibians, echinoderms, and possibly fungi. There are no orthologs in prokaryotes or plants. There were no organisms with proteins containing homologous domains. [55]
The graph to the right compares the rate of evolution of WDCP with that of the fibrinogen alpha-chain (NCBI: NP_068657) and cytochrome c (NCBI: NP_061820). WDCP evolves faster than cytochrome c, but slower than the fibrinogen alpha-chain.
While there are some sequences in WDCP that are conserved (which can be seen in the conceptual translation), there are very few known conserved domains among the various orthologs. There is one conserved glycation site detected through a multiple sequence alignment, lysine 389. [56] The table below shows a list of orthologs, the evolutionary date of divergence between the organism and humans, and the % identity between WDCP Isoform 1 and the orthologous protein sequence.
Organism | Accession number [55] | Date of divergence (MYA) [57] | % ID (compared to Homo sapiens) |
---|---|---|---|
Chimpanzee | XP_001143574 | 6 | 99 |
Rhesus monkey | NP_001181022.2 | 29 | 95 |
Mouse | NP_001164329.1 | 89 | 65 |
Common wall lizard | XP_028578194 | 318 | 51 |
Crested ibis | XP_009465592 | 324 | 51 |
Central bearded dragon | XP_020643724 | 318 | 49 |
African clawed frog | NP_001090236 | 352 | 51 |
Barn owl | KFV58830 | 318 | 49 |
Whale shark | XP_020374495 | 465 | 48 |
Zebrafish | NP_001013552 | 433 | 46 |
Tropical clawed frog | XP_017949975 | 352 | 40 |
Black-legged tick | XP_022781485.1 | 736 | 29 |
Octopus | XP_029634208 | 736 | 27 |
Table 4. Table of organisms with a WDCP orthologous protein.
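Percent identity saturates as sequences diverge, so comparisons across very different divergence dates are often made on a corrected scale. The sketch below applies the standard Poisson correction, d = -ln(p), where p is the fraction of identical residues, to a few rows of Table 4. This is an assumption chosen for illustration: the source does not state how its own evolution-rate comparison was computed. The names `ORTHOLOGS` and `poisson_distance` are hypothetical.

```python
import math

# (organism, divergence date in MYA, % identity to human) from Table 4.
ORTHOLOGS = [
    ("Chimpanzee", 6, 99),
    ("Rhesus monkey", 29, 95),
    ("Mouse", 89, 65),
    ("African clawed frog", 352, 51),
    ("Zebrafish", 433, 46),
    ("Octopus", 736, 27),
]


def poisson_distance(percent_identity: float) -> float:
    """Poisson-corrected distance: expected substitutions per site."""
    p_identical = percent_identity / 100
    if not 0 < p_identical <= 1:
        raise ValueError("percent identity must be in (0, 100]")
    return -math.log(p_identical)


if __name__ == "__main__":
    for organism, mya, identity in ORTHOLOGS:
        d = poisson_distance(identity)
        print(f"{organism}: {mya} MYA, {identity}% identity, d = {d:.3f}")
```

Plotting d against divergence date for WDCP and for reference proteins is one way to compare rates of change, since a steeper slope corresponds to faster evolution.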