YJU2 | |
---|---|
Aliases | YJU2, coiled-coil domain containing 94, CCDC94, YJU2 splicing factor homolog |
External IDs | MGI: 1920136; HomoloGene: 6350; GeneCards: YJU2 |

Coiled-coil domain containing 94 (CCDC94) is a protein that in humans is encoded by the CCDC94 gene. [5] The CCDC94 protein contains a coiled-coil domain, a domain of unknown function (DUF572), and an uncharacterized conserved protein domain (COG5134); it lacks a transmembrane domain.
CCDC94 is a 21,975 base-pair gene oriented on the plus strand (see Sense) of chromosome 19, spanning positions 4,247,111-4,269,085. [5] The gene product is a 1,441 base-pair mRNA with 8 predicted exons in the human gene. Ensembl predicts one protein-coding alternative splice form. [7] This splice form contains 5 exons, 4 of which are coding exons. Promoter prediction and analysis were carried out using ElDorado. [8] The predicted promoter region spans 714 base pairs, from 4,246,532 to 4,247,245, on the plus strand of chromosome 19.
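The lengths quoted above follow from treating the coordinates as inclusive, 1-based intervals. The following is a minimal sketch of that arithmetic, assuming this convention; the function and variable names are illustrative and do not come from any genome-browser API.

```python
# Inclusive, 1-based intervals on chromosome 19 (plus strand), as quoted in the text.
# The inclusive convention is an assumption; it reproduces the stated lengths.
GENE = (4_247_111, 4_269_085)
PROMOTER = (4_246_532, 4_247_245)


def span_length(interval):
    """Length in base pairs of an inclusive interval."""
    start, end = interval
    return end - start + 1


def overlap_length(a, b):
    """Number of positions shared by two inclusive intervals (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


gene_bp = span_length(GENE)                  # 21975
promoter_bp = span_length(PROMOTER)          # 714
shared_bp = overlap_length(GENE, PROMOTER)   # 135
upstream_bp = promoter_bp - shared_bp        # 579

print(gene_bp, promoter_bp, shared_bp, upstream_bp)
```

Under these assumptions, the predicted promoter begins 579 base pairs upstream of the annotated gene start and extends 135 base pairs into the gene.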
CCDC94 is located directly adjacent to the EBI3 gene (4,229,540-4,237,525), which lies upstream on the plus strand. The SH2 domain gene (4,278,598-4,290,720) lies downstream of CCDC94 on the plus strand. [9]
CCDC94 is expressed at low to moderate levels throughout most regions of the body, with slightly elevated levels in the thyroid, lung, dendritic cells, and lymphoblasts. Expression data are available at BioGPS. [10] GEO expression data are available from NCBI. [11]
CCDC94 belongs to the CWC16 family, [12] and its function is not well understood. The human form has 323 amino acid residues, with an isoelectric point of 5.618 and a molecular mass of 37,086 daltons. There are no predicted transmembrane domains. [13] The one alternative splice form of CCDC94 encodes a protein of 161 amino acids. [14] The DUF572 and COG5134 domains are located at residues 1–319 and 7–108, respectively. [15] The coiled-coil region is located at residues 105–206. [16] The intracellular localization of CCDC94 has not yet been determined experimentally, but bioinformatic analysis using PSORT strongly suggests that CCDC94 resides in the nucleus, owing to the presence of nuclear localization signals. [17]
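The residue ranges above can be compared directly to see how much of the protein each annotated region covers and where the regions overlap. This is a small illustrative sketch, assuming the ranges are inclusive and 1-based; the dictionary and helper names are hypothetical.

```python
# Annotated regions of human CCDC94 as inclusive residue ranges (from the text).
PROTEIN_LENGTH = 323
DOMAINS = {
    "DUF572": (1, 319),
    "COG5134": (7, 108),
    "coiled-coil": (105, 206),
}


def length(region):
    """Number of residues in an inclusive range."""
    start, end = region
    return end - start + 1


def overlap(a, b):
    """Shared inclusive range of two regions, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


# Fraction of the full-length protein covered by each region.
coverage = {name: length(r) / PROTEIN_LENGTH for name, r in DOMAINS.items()}

# Residues shared by the COG5134 domain and the coiled-coil region.
shared = overlap(DOMAINS["COG5134"], DOMAINS["coiled-coil"])

for name, frac in coverage.items():
    print(f"{name}: {length(DOMAINS[name])} residues, {frac:.1%} of the protein")
print("COG5134 / coiled-coil overlap:", shared)
```

With these ranges, DUF572 spans nearly the whole protein, and the COG5134 domain and coiled-coil region share only residues 105–108.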
Protein interaction analysis for CCDC94 has been carried out using computational tools. No interactions were identified through the MINT database. [18] The highest-scoring reported interactions are with CDC5L, PLRG1, and PRPF19, based on an anti-tag coimmunoprecipitation assay. [19] Six additional interacting proteins were found, but closer analysis shows very little potential for these interactions to be real, so none of them should be considered actual protein-protein interactions. The protein interaction from the STRING analysis is shown.
CCDC94 has a promoter region that contains sites for transcription factor binding. Notable transcription factors, as generated by the ElDorado program on Genomatix: [20]
Bioinformatic analysis of CCDC94 using NetPhos [21] predicted 7 phosphorylation sites at serine residues, 3 at threonine residues, and 3 at tyrosine residues. Two of the threonine residues and all of the tyrosine residues predicted to be phosphorylated are highly conserved, as supported by their occurrence at the same positions in several analyzed orthologs. High-scoring predicted tyrosine sites fall in the N-terminal half of CCDC94, while the predicted serine sites fall in the C-terminal half. Sulfinator predicted only one tyrosine sulfation site, at amino acid 98. [22] Highly probable sumoylation sites at residues 90, 24, and 270 were predicted by SUMOplot. [23]
The tertiary structure of CCDC94 is predicted to have several beta sheet regions and only one confidently predicted alpha helix region. PHYRE2 modeled 65 residues of CCDC94, about 20% of the amino acid sequence, with 87.9% confidence. [24]
CCDC94 is very well conserved in many species, and the entire protein is conserved throughout all of its orthologs. [25] However, conservation does not extend as far back as bacteria. A phylogenetic tree generated with Biology WorkBench [26] shows the evolutionary relationships between Homo sapiens CCDC94 and its orthologs. The table below shows CCDC94 conservation among orthologs:
Genus Species | Organism Common Name | Divergence from Humans (MYA) [27] | NCBI Protein Accession | Sequence Similarity [25] | Protein Length |
---|---|---|---|---|---|
Pan paniscus | Bonobo | 6.3 | XP_003819321.1 | 99% | 323 |
Gorilla gorilla gorilla | Gorilla | 8.8 | XP_004059817.1 | 98% | 286 |
Callithrix jacchus | Common marmoset | 42.6 | XP_002761642.1 | 83% | 278 |
Mus musculus | Mouse | 92.3 | NP_082657.1 | 87% | 314 |
Rattus norvegicus | Rat | 92.4 | NP_001103143.1 | 87% | 313 |
Cricetulus griseus | Chinese hamster | 92.4 | XP_003501789.1 | 85% | 321 |
Bos taurus | Cow | 94.4 | NP_001069159.1 | 89% | 320 |
Felis catus | Cat | 94.4 | XP_003981794.1 | 73% | 363 |
Sarcophilus harrisii | Tasmanian devil | 163.9 | XP_003760628.1 | 78% | 326 |
Monodelphis domestica | Opossum | 163.9 | XP_001374444.1 | 86% | 326 |
Gallus gallus | Red junglefowl | 296.4 | XP_423475.3 | 84% | 291 |
Anolis carolinensis | Lizard | 324.5 | XP_003230268.1 | 72% | 311 |
Xenopus tropicalis | Western clawed frog | 342.7 | NP_001017176.1 | 73% | 345 |
Xenopus laevis | African clawed frog | 371.2 | NP_001087648.1 | 83% | 280 |
Takifugu rubripes | Puffer fish | 454.6 | XP_003962830.1 | 64% | 348 |
Acyrthosiphon pisum | Pea aphid (insect) | 910 | NP_001155925.1 | 49% | 278 |
Harpegnathos saltator | Ant | 910 | EFN80619.1 | 47% | 351 |
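
The trend in the table, with sequence similarity generally falling as divergence time grows, can be summarized with a short script. This is only an illustrative sketch: the values are transcribed from the table above, and the 100 MYA cutoff is an arbitrary choice made here to separate the closer mammals from the more distant species.

```python
# (species, divergence from humans in MYA, sequence similarity in %), from the table.
ORTHOLOGS = [
    ("Pan paniscus", 6.3, 99),
    ("Gorilla gorilla gorilla", 8.8, 98),
    ("Callithrix jacchus", 42.6, 83),
    ("Mus musculus", 92.3, 87),
    ("Rattus norvegicus", 92.4, 87),
    ("Cricetulus griseus", 92.4, 85),
    ("Bos taurus", 94.4, 89),
    ("Felis catus", 94.4, 73),
    ("Sarcophilus harrisii", 163.9, 78),
    ("Monodelphis domestica", 163.9, 86),
    ("Gallus gallus", 296.4, 84),
    ("Anolis carolinensis", 324.5, 72),
    ("Xenopus tropicalis", 342.7, 73),
    ("Xenopus laevis", 371.2, 83),
    ("Takifugu rubripes", 454.6, 64),
    ("Acyrthosiphon pisum", 910, 49),
    ("Harpegnathos saltator", 910, 47),
]

CUTOFF_MYA = 100  # arbitrary illustrative threshold, not from the source


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


recent = [sim for _, mya, sim in ORTHOLOGS if mya < CUTOFF_MYA]
distant = [sim for _, mya, sim in ORTHOLOGS if mya >= CUTOFF_MYA]
mean_recent = mean(recent)
mean_distant = mean(distant)

# Ortholog with the lowest reported similarity to the human protein.
least = min(ORTHOLOGS, key=lambda row: row[2])

print(f"mean similarity < {CUTOFF_MYA} MYA: {mean_recent:.1f}%")
print(f"mean similarity >= {CUTOFF_MYA} MYA: {mean_distant:.1f}%")
print("least similar ortholog:", least[0])
```

The comparison is coarse and the trend is not strictly monotonic: the cat sequence, for example, is less similar to the human protein than several more distantly related species.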
CCDC94 has only one paralog, CCDC130 (also known as MGC10471). [28] CCDC130 is very similar to CCDC94, as it contains both the DUF572 and COG5134 domains. [29]