C1orf94 | | |
---|---|---|
Identifiers | | |
Aliases | C1orf94, chromosome 1 open reading frame 94 | |
External IDs | MGI: 3616080; HomoloGene: 57187; GeneCards: C1orf94 | |
Orthologs | | |
Species | Human | Mouse |
Location (UCSC) | Chr 1: 34.17 – 34.22 Mb | Chr 4: 127.93 – 127.97 Mb |
PubMed search | [3] | [4] |
Chromosome 1 Open Reading Frame 94, or C1orf94, is a protein that in humans is encoded by the C1orf94 gene. [5] The function of this protein is still poorly understood.
The C1orf94 gene is also known as Q6P1W5, B3KVT1, D3DPR3, E9PJ76, Q96IC8, and MGC15882.
FLJ20508 is also listed as an alias of C1orf94. [5]
C1orf94 is located on the short arm of chromosome 1, at 1p34.3 (chr1:34,166,883-34,219,131), near the HSPD1P14 gene. It is encoded on the sense strand. [6]
The gene has 7 exons, of which only 6 are coding. [7]
Exon | Start | End | Size |
---|---|---|---|
ENSE00001207243 (non-coding) | 34,166,883 | 34,167,171 | 289 |
ENSE00003530680 | 34,197,225 | 34,197,913 | 689 |
ENSE00002095077 | 34,200,772 | 34,201,032 | 261 |
ENSE00002136629 | 34,202,084 | 34,202,259 | 176 |
ENSE00002136447 | 34,208,157 | 34,208,234 | 78 |
ENSE00002125161 | 34,212,210 | 34,212,406 | 197 |
ENSE00001460399 | 34,218,686 | 34,219,131 | 446 |
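The sizes in the exon table can be reproduced from the start and end coordinates. The sketch below assumes the coordinates are 1-based and inclusive, so that size = end - start + 1; the helper name `interval_length` is illustrative and not taken from any cited tool.

```python
# Exon coordinates on chr1, copied from the exon table.
EXONS = [
    ("ENSE00001207243", 34_166_883, 34_167_171),
    ("ENSE00003530680", 34_197_225, 34_197_913),
    ("ENSE00002095077", 34_200_772, 34_201_032),
    ("ENSE00002136629", 34_202_084, 34_202_259),
    ("ENSE00002136447", 34_208_157, 34_208_234),
    ("ENSE00002125161", 34_212_210, 34_212_406),
    ("ENSE00001460399", 34_218_686, 34_219_131),
]


def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval (assumed convention)."""
    return end - start + 1


# Size of each exon, keyed by exon ID.
sizes = {exon_id: interval_length(start, end) for exon_id, start, end in EXONS}

# Total exonic sequence and the full span from first exon start to last exon end.
total_exonic = sum(sizes.values())
gene_span = interval_length(EXONS[0][1], EXONS[-1][2])

for exon_id, size in sizes.items():
    print(f"{exon_id}\t{size}")
print("total exonic bp:", total_exonic)
print("gene span bp:", gene_span)
```

Under this convention the seven exons sum to 2,136 bp, which equals the length listed for transcript C1orf94-201, and the gene spans 52,249 bp.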
The protein has two isoforms, a and b; isoform a is the longer of the two (598 aa). [8]
Name | Transcript ID | Base pairs | Protein type | Protein length |
---|---|---|---|---|
C1orf94-202 | ENST00000488417.2 | 3050 | Protein Coding | 598 aa |
C1orf94-201 | ENST00000373374.7 | 2136 | Protein Coding | 408 aa |
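As a rough consistency check on the transcript table, the coding sequence needed for each protein can be compared with the transcript length. The sketch below assumes a single open reading frame of three nucleotides per residue plus one stop codon; the remainder is treated as untranslated sequence. The function name `cds_length` is illustrative only.

```python
STOP_CODON_NT = 3  # one stop codon closes the open reading frame (assumption)

# (transcript name, transcript length in bp, protein length in aa) from the table.
TRANSCRIPTS = [
    ("C1orf94-202", 3050, 598),
    ("C1orf94-201", 2136, 408),
]


def cds_length(protein_aa: int) -> int:
    """Nucleotides needed to encode a protein of the given length, plus a stop codon."""
    return protein_aa * 3 + STOP_CODON_NT


for name, transcript_bp, protein_aa in TRANSCRIPTS:
    cds = cds_length(protein_aa)
    untranslated = transcript_bp - cds
    print(f"{name}: CDS {cds} nt, untranslated {untranslated} nt")
```

Both coding sequences fit within their transcripts under this assumption, leaving roughly 1.25 kb and 0.9 kb of untranslated sequence respectively.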
There are two promoters predicted for C1orf94. Only one of them is predicted for the transcript used for the analysis. The transcription factors predicted to bind this promoter include: [9]
ZF02 (C2H2 zinc finger transcription factors 2)
Cart1 Sequence-specific DNA-binding transcription factor
HTLV-I U5 repressive element-binding protein 1
NKX homeodomain factors
AARE binding factors
PREB core-binding element
DUF4688 is a large region found within the C1orf94 protein sequence and is present in both isoforms a and b. [10] This sequence is conserved in eukaryotes. [11]
C1orf94 is a protein tissue co-expression partner for RBBP8NL. [12] The isoelectric point is 8.56 and the molecular weight is approximately 65.4 kDa (65,353 Da). Proline is the most abundant amino acid in the protein sequence (11.7%), followed closely by leucine (10.4%). [13]
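A quick order-of-magnitude check on these figures can be done from the isoform length alone. The sketch below assumes the common rule of thumb of about 110 Da per amino acid residue, which is an approximation and not the method used by the cited tool, and converts the reported percentages into approximate residue counts for the 598 aa isoform.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (assumption)
LENGTH_AA = 598              # length of isoform a, as stated in the article


def approx_mass_kda(length_aa: int) -> float:
    """Approximate protein mass in kDa from its length."""
    return length_aa * AVG_RESIDUE_MASS_DA / 1000


def residue_count(percent: float, length_aa: int) -> int:
    """Approximate number of residues corresponding to a composition percentage."""
    return round(length_aa * percent / 100)


print(f"approximate mass: {approx_mass_kda(LENGTH_AA):.1f} kDa")
print("proline residues:", residue_count(11.7, LENGTH_AA))
print("leucine residues:", residue_count(10.4, LENGTH_AA))
```

The estimate lands in the tens of kilodaltons, as expected for a protein of about 600 residues, and the percentages correspond to roughly 70 proline and 62 leucine residues.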
Seven PEST motifs, which are sequences rich in proline (P), glutamic acid (E), serine (S), and threonine (T), were identified between positions 1 and 598.
Of these, only one was predicted to be a potential PEST motif, with 21 amino acids between positions 133 and 155. Such sequences are associated with proteins that have a short intracellular half-life. [14]
C1orf94 undergoes palmitoylation, [16] phosphorylation, [17] and glycation, [18] mainly near the N-terminus. A mitochondrial processing peptidase cleavage site is also predicted at the first methionine.
According to CFSSP, [19] the secondary structure of C1orf94 contains alpha helices, extended strands, beta turns, and random coils.
The tertiary structures predicted by both Phyre2 [20] and SWISS-MODEL [15] show that C1orf94 is a monomer.
According to I-TASSER, [21] the closest structural analogs identified for C1orf94 are 3IXZ (pig gastric H+/K+-ATPase complexed with aluminum fluoride) and 3B8E (crystal structure of the sodium-potassium pump).
Mentha [22] proposed a strong physical interaction with ATXN1, a chromatin-binding factor that represses Notch signaling in the absence of the Notch intracellular domain.
According to PSICQUIC, [23] C1orf94 and MMADHC have physical interactions that were demonstrated through affinity chromatography technology. MMADHC is a gene that encodes a mitochondrial protein that is involved in early steps of vitamin B12 metabolism. [24]
According to STRING, [25] RFX2 is a possible functional partner of C1orf94 and appears in the first shell of interactors. RFX2 is a transcription factor that acts as a key regulator of spermatogenesis.
According to AceView, this gene is well expressed, 0.5 times the average gene in this release. [26]
According to PSORT II, [27] C1orf94 is predicted to be nuclear (69.6%).
Data from NCBI show that C1orf94 is primarily expressed in testis tissue. [28]
According to the Human Protein Atlas, [29] C1orf94 is slightly expressed in brain tissue.
According to GEO profiles, [30] increased C1orf94 expression is highly correlated with morbid obesity. C1orf94 expression also increased after depletion of a related coactivator.
The function of C1orf94 is not yet fully understood, and no experiments have yet established it. However, HPA RNA-seq data show higher C1orf94 expression in normal tissues than in tissues during fetal development. [28]
According to GWAS data, [31] C1orf94 was identified as an OncoORF (oncogenic open reading frame). According to the Colorectal Cancer Atlas, [32] C1orf94 takes part in a protein-protein interaction network of 50 nodes associated with colorectal cancer. The most notable of these interactions is with the A-kinase anchor protein AKAP9, which promotes colorectal cancer development by regulating Cdc42-interacting protein. [33]
C1orf94 evolved faster than cytochrome c but more slowly than fibrinopeptides.
C1orf94 has no paralogs. Orthologs were identified using NCBI BLASTp. [34] Mammals showed the greatest conservation, and the most distant orthologs were found in fish.
After running SAPS on a group of orthologs (gorilla, rat, dog, and bat), the protein's composition showed only minor variations compared to the human sequence: proline remained the most abundant amino acid, followed by leucine, and tryptophan remained the least abundant. [13]