C17orf98 | |
---|---|
Identifiers | |
Aliases | C17orf98, chromosome 17 open reading frame 98 |
External IDs | MGI: 1919465 HomoloGene: 19140 GeneCards: C17orf98 |
C17orf98 is a protein that in humans is encoded by the C17orf98 gene, located on chromosome 17. [5] The C17orf98 gene consists of a 6,302 base sequence. Its mRNA has three exons and no alternative splice sites. The protein has 154 amino acids, none of which occur at unusually high or low levels. [6] C17orf98 contains a domain of unknown function (DUF4542) and has a molecular weight of 17.6 kDa. [7] [8] C17orf98 does not belong to any other protein families, nor does it have any isoforms. [9] The protein has orthologs with high percent similarity in mammals and reptiles, and additional distantly related orthologs across Metazoa, the most distant being in sponges. [10]
Like most proteins, C17orf98 is known to be highly expressed in the testes. [11] The protein has also been reported at elevated levels in cancer. [11] It has been shown to be expressed in proximity to or within intermediate filaments and the nucleolus. [11] Additionally, the transcription factors predicted to bind the C17orf98 promoter are also active in hematopoietic stem cells, the immune system, and the cardiovascular system, among others. [12] The gene is over-expressed in many cancer types, including kidney renal clear cell carcinoma and lung squamous cell carcinoma. [13] Motif and transcription factor analysis points towards C17orf98 playing a role in proliferation, especially in immune cell proliferation.
The C17orf98 gene consists of 6,303 bases. It has three exons and two large introns. The gene has no alternative splice sites. [14] The 5' UTR sequence of C17orf98 is highly conserved in primates. No non-mammalian 5' UTR matches could be identified. [15] [16] C17orf98 has 11 Alu repeats. [17]
GeneCards lists five enhancer sequences for C17orf98. The role of these sequences may provide insight into the function of C17orf98. Four of the five enhancers are active in the thymus. All five enhancers are active in H1 hESC cells. Additionally, all five enhancers are active in iPS DF 19.11 cells, which are derived from foreskin fibroblasts. [18]
The C17orf98 promoter has many transcription factor binding sites. [19] The transcription factors predicted to bind it are commonly found in hematopoietic cells, connective tissue, cardiovascular tissue, and the immune system. The presence of Krueppel-like transcription factors suggests a role for C17orf98 in proliferation or apoptosis. The presence of SMAD sites indicates an involvement in the TGF-β pathway, while the presence of Myc-related transcription factors indicates a potential proliferation function of the protein. Additionally, other C17orf98 transcription factors, such as RBPJ-Kappa, are involved in proliferation and signalling.
Numerous SNPs were found in the 5' UTR, 3' UTR, and coding region of C17orf98. [20] Few SNPs were found in highly conserved regions. In all, four SNPs were found in codons for highly conserved amino acids, and one SNP was found in the start codon. Of these five SNPs, three fall on the third position of the codon. Because of third-position (wobble) degeneracy, these three SNPs would likely have no effect on the overall protein structure.
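The third-position argument can be checked codon by codon. The following is a minimal sketch, assuming the standard genetic code; the codons used in the example are hypothetical and are not the actual C17orf98 SNPs, which are not listed here. It also shows that a third-position change is not always silent, as in the case of the start codon ATG.

```python
# Standard genetic code, built from the compact TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def is_synonymous(codon: str, position: int, new_base: str) -> bool:
    """Return True if replacing the base at `position` (0-based) with
    `new_base` leaves the encoded amino acid unchanged."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


if __name__ == "__main__":
    # Hypothetical examples, not real C17orf98 variants.
    print(is_synonymous("GGT", 2, "C"))  # glycine codon, third position
    print(is_synonymous("ATG", 2, "A"))  # start codon, third position
```

The glycine example is silent, while the start codon example changes methionine to isoleucine, so each SNP still has to be checked against its own codon.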
C17orf98 does not have any miRNA binding sites. [21] Its mRNA has low abundance (0.44%). [22] The mRNA sequence has three hexaloops, none of which are significant. [23]
C17orf98 is a 17.6 kDa protein. [8] Distant orthologs are 5 to 6 kDa larger, but some of the discrepancy comes from an added NLS sequence, which the Homo sapiens protein does not have. There are no positive or negative charge clusters. There are no transmembrane components. The predicted isoelectric point (pI) is 9.80 and the molecular weight (Mw) is 17,564.67 Da. [24] C17orf98 is hydrophobic and soluble.
The predicted secondary structure of C17orf98 consists of both beta sheets and alpha helices (see diagram on right). The tertiary structure model is consistent with this, although the numbers of alpha helices and beta sheets differ slightly (see diagram on right).
There are no N-terminal signal peptides. Cleavage motifs were not found. There are no ER membrane retention signals or peroxisomal targeting signals. SKL2 is not present, so there is no secondary peroxisomal signal. There are no vacuolar targeting signals. There are no RNA binding motifs or actinin-type actin binding motifs. There are no N-myristoylation or prenylation patterns. [25]
The kinase finder from Cuckoo predicted kinase phosphorylation sites for C17orf98. There are many serine/threonine and tyrosine kinase phosphorylation sites. [26] Serine and threonine kinase sites are the most prevalent above the statistically significant threshold. There are no SUMOylation sites. [27] The C17orf98 protein has six possible O-GlcNAc sites. [28] The highly conserved O-GlcNAc sites are at amino acids 24, 32, 117, and 142. O-GlcNAc post-translational modification occurs on Ser/Thr residues and is found on oncogene products, tumor suppressors, and proteins involved in growth factor signaling. [29]
C17orf98 has a Caspase3/7 motif, where either caspase 3 or caspase 7 would cleave. [30] This is consistent with C17orf98 being involved in proliferation, as a proapoptotic caspase would be expected to cleave proteins that drive proliferation. The protein also has a motif where peptidyl-prolyl cis-trans isomerase NIMA-interacting 1 (Pin1) binds. [30] Pin1 upregulation is involved in cancer and immune disorders. [31] This is consistent with C17orf98 being involved in cancer, immune cells, and perhaps cancers of the immune system. Additionally, the C17orf98 protein has an IBM site, where inhibitors of apoptosis (IAPs) bind. [30] This again suggests that C17orf98 may be involved in inhibiting apoptosis and, by extension, in promoting cancer. Furthermore, C17orf98 has motifs where the SH2 domain of GRB2 binds. GRB2 is an adapter protein involved in the RAS signaling pathway, a pathway that drives uncontrolled proliferation when deregulated.
A duplication may have occurred at positions 59–71.
Homo sapiens
MAYLSECRLRLEKGFILDGVAVSTAARAYGRSRPKLWSAIPPYNAQQDYHARSYFQ SHVVPPLLRVVPPLLRKTDQDHGGTGRDGWIVDYIHIFGQGQRYLNRRNWAGTGHS LQQVTGHDHYNADLKPIDGFNGRFGYRRNTPALRQSTSVFGEVTHFPLF
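The possible duplication can be located with a simple scan for adjacent identical segments. The following is a minimal sketch, assuming the sequence exactly as printed above (with the spaces removed); the function name and the unit-length limits are illustrative choices, not part of any cited tool.

```python
# Sequence as printed above, with line-break spaces removed.
SEQ = (
    "MAYLSECRLRLEKGFILDGVAVSTAARAYGRSRPKLWSAIPPYNAQQDYHARSYFQ"
    "SHVVPPLLRVVPPLLRKTDQDHGGTGRDGWIVDYIHIFGQGQRYLNRRNWAGTGHS"
    "LQQVTGHDHYNADLKPIDGFNGRFGYRRNTPALRQSTSVFGEVTHFPLF"
)


def find_tandem_repeats(seq: str, min_unit: int = 4, max_unit: int = 12):
    """Return (start, end, unit) for every segment that is immediately
    followed by an identical copy of itself. Positions are 1-based and
    inclusive, and `end` marks the last residue of the second copy."""
    hits = []
    for k in range(min_unit, max_unit + 1):
        for i in range(len(seq) - 2 * k + 1):
            if seq[i:i + k] == seq[i + k:i + 2 * k]:
                hits.append((i + 1, i + 2 * k, seq[i:i + k]))
    return hits


if __name__ == "__main__":
    for start, end, unit in find_tandem_repeats(SEQ):
        print(f"{unit} x2 at residues {start}-{end}")
```

For the sequence as printed, the scan reports the unit VVPPLLR repeated twice starting at residue 59, which matches the region named above.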
Protein abundance in the Homo sapiens whole organism is quite low. No data are available for other species. [36] The Allen Brain Atlas has no data for C17orf98. [37]
C17orf98 protein has been found to be expressed in the intermediate filaments and the nucleoli. [38] A C17orf98 antibody is available from Sigma-Aldrich. [39] Additionally, C17orf98 localizes to the cytoplasm. Distantly related C17orf98 orthologs in organisms such as Macrostomum lignano and Amphimedon queenslandica exhibit nuclear expression. [40] Nuclear localization signals are present in distantly related organisms at non-conserved sites. The k-NN prediction indicates cytoplasmic localization. [41] C17orf98 does not have a signal peptide. [42] The protein is soluble. [43]
Like most proteins, C17orf98 protein is highly expressed in the testes. [44] The protein is expressed in adult tissues as well as fetal tissue. The protein has been found to be mildly expressed in connective tissue. [45] Additionally, expression has been seen in sperm, breast epithelial cells, and various cells of the immune system. [46]
Protein expression is elevated in many cancer patients. Specifically, protein expression has been shown to be high in colorectal, breast, prostate, and lung cancers. [47] C17orf98 is expressed in papillary thyroid cancer as well. [48] Additionally, mutations in C17orf98 were found in endometrial, stomach, colorectal, and kidney cancer. [49] C17orf98 expression is elevated in cancer patients with BRCA. In kidney renal clear cell carcinoma patients, C17orf98 expression decreased dramatically compared to the non-cancerous state. [13] In 80% of chromophobe renal cell carcinoma patients, at least one duplication of the C17orf98 gene was present. [13]
Protein expression is lower in males with teratozoospermia than in those without. [50] Many GEO Profiles experiments include C17orf98; however, none show a significant change in expression. [51]
C17orf98 is a slowly mutating protein. Its rate of divergence, as determined by the molecular clock equations, resembles that of cytochrome c. [52]
There are no known Homo sapiens paralogs for C17orf98. [53]
C17orf98 protein has additional distantly related orthologs across Metazoa. Its most distant relative is found in sponges. There is no known ortholog in ctenophores, nematodes, bacteria, fungi, plants, or zebrafish. [10] Only two fish have the C17orf98 gene. Model organisms such as Caenorhabditis elegans and Drosophila melanogaster do not have the gene.
C17orf98 Orthologs [10]
Sequence # | Genus and species | Common name | Accession # | Protein length (aa) | Divergence (MYA) | Sequence identity | E-value |
---|---|---|---|---|---|---|---|
1 | Homo sapiens | Human | NP_001073934 | 154 | 0 | 100% | na |
2 | Camelus ferus | Wild Bactrian camel | XP_006176436 | 154 | 96 | 83% | 2.00E-94 |
3 | Pteropus alecto | Black flying fox | XP_006924784 | 154 | 96 | 81% | 1.00E-92 |
4 | Lipotes vexillifer | Yangtze river dolphin | XP_007465208 | 154 | 96 | 81% | 6.00E-89 |
5 | Condylura cristata | Star-nosed mole | XP_004684322 | 154 | 96 | 75% | 5.00E-78 |
6 | Myotis brandtii | Brandt's bat | EPQ05064 | 171 | 96 | 78% | 6.00E-78 |
7 | Marmota marmota marmota | Alpine marmot | XP_015362150.1 | 154 | 90 | 81% | 3.00E-94 |
8 | Octodon degus | Chilean rodent | XP_004633931 | 153 | 90 | 73% | 1.00E-76 |
9 | Alligator sinensis | Chinese alligator | XP_006022630 | 154 | 312 | 63% | 8.00E-68 |
10 | Anolis carolinensis | Lizard | XP_003222553 | 154 | 312 | 62% | 6.00E-67 |
11 | Xenopus laevis | African clawed frog | XP_018090228 | 244 | 352 | 51% | 4.00E-38 |
12 | Rhincodon typus | Whale shark | XP_020388051.1 | 164 | 476 | 53% | 5.00E-52 |
13 | Acanthaster planci | Starfish | XP_022086463 | 209 | 684 | 48% | 1.00E-37 |
14 | Mizuhopecten yessoensis | Scallop | XP_021340301 | 275 | 797 | 45% | 5.00E-06 |
15 | Lottia gigantea | Sea snail | XP_009063876 | 173 | 797 | 45% | 2.00E-37 |
16 | Lingula anatina | Lamp shell | XP_013388744.1 | 211 | 797 | 43% | 2.00E-35 |
17 | Biomphalaria glabrata | Freshwater snail | XP_013088317 | 198 | 797 | 41% | 6.00E-15 |
18 | Nematostella vectensis | Sea anemone | XP_001629616 | 173 | 824 | 48% | 2.00E-35 |
19 | Stylophora pistillata | Coral | XP_022795125 | 226 | 824 | 46% | 3.00E-38 |
20 | Macrostomum lignano | Flatworm | PAA73615 | 235 | 824 | 36% | 4.00E-25 |
21 | Amphimedon queenslandica | Sponge | XP_003389909 | 275 | 951.8 | 32% | 2.00E-12 |
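The divergence-time and sequence-identity columns of the ortholog table above are the inputs to a molecular clock comparison. The article does not state which equations were used, so the following is only a sketch under an assumption: it applies a commonly used correction for multiple substitutions, m = -100 ln(1 - n/100), where n is the observed percent difference. The function name and the selection of rows are illustrative.

```python
import math

# (species, divergence time in MYA, percent identity to human),
# copied from a few rows of the ortholog table above.
ORTHOLOGS = [
    ("Camelus ferus", 96, 83),
    ("Alligator sinensis", 312, 63),
    ("Xenopus laevis", 352, 51),
    ("Rhincodon typus", 476, 53),
    ("Amphimedon queenslandica", 951.8, 32),
]


def corrected_divergence(identity_pct: float) -> float:
    """Corrected percent divergence m = -100 * ln(1 - n/100), where
    n = 100 - identity_pct is the observed percent difference.
    This correction is an assumption, not taken from the article."""
    if not 0 < identity_pct <= 100:
        raise ValueError("identity must be in (0, 100]")
    n = 100 - identity_pct
    return -100 * math.log(1 - n / 100)


if __name__ == "__main__":
    for species, mya, identity in ORTHOLOGS:
        m = corrected_divergence(identity)
        print(f"{species}: {mya} MYA, m = {m:.1f}")
```

Plotting m against divergence time for C17orf98 next to the same curve for a reference protein such as cytochrome c is one way to compare their rates of divergence.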