CFAP95 | |
---|---|
Aliases | CFAP95, chromosome 9 open reading frame 135, C9orf135, cilia and flagella associated protein 95 |
External IDs | MGI: 1914733; HomoloGene: 49850; GeneCards: CFAP95 |
C9orf135 is a gene that encodes a 229-amino-acid protein. It is located on chromosome 9 of the Homo sapiens genome at 9q12.21. [5] The protein has a transmembrane domain spanning amino acids 124-140 and a glycosylation site at amino acid 75. C9orf135 is annotated on chromosome 9 in the GRCh37 assembly and belongs to the domain of unknown function superfamily 4572 (DUF4572). [6] C9orf135 is also known as LOC138255, a designation that refers to the gene's locus on chromosome 9. [7]
There is some evidence associating the c9orf135 gene with premature ovarian failure. [8] In affected women, an autosomal recessive microduplication has been reported that may be linked to the condition. A single nucleotide polymorphism (SNP) in the c9orf135 gene has also been linked to Parkinson's disease, appearing as a statistically significant signal on a Manhattan plot. [9] Further research is required to establish whether c9orf135 is related to Parkinson's disease. [9]
The mRNA of c9orf135 is 906 nucleotides in length. [10] The 5' and 3' untranslated regions (UTRs) both contain hairpin loops. [11] The 3' UTR comprises 123 nucleotides and the 5' UTR comprises 18 nucleotides. The mRNA encodes a protein whose secondary structure is composed of both beta-sheets and alpha-helices. [12]
c9orf135 is likely a nuclear protein, because its properties match those of nuclear proteins rather than those of proteins in the secretory pathway. [13] [14] Furthermore, c9orf135 carries a nuclear localization signal (PEKVKKL) at amino acids 67 to 73. [15] C9orf135 is soluble, with an average hydrophobicity of -0.772. The negative hydrophobicity value is due to its slightly acidic properties. [16]
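An average hydrophobicity figure of this kind is usually computed as a mean per-residue hydropathy score. The sketch below assumes the Kyte-Doolittle scale (a GRAVY score). The source does not name the scale it used, so this is an assumption. The function name `average_hydropathy` is illustrative, and the input is the seven-residue nuclear localization signal quoted above, not the full protein sequence.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
# ASSUMPTION: the article does not say which scale produced -0.772.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def average_hydropathy(sequence: str) -> float:
    """Return the mean Kyte-Doolittle hydropathy of a protein sequence."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)


# Demo input: the nuclear localization signal from the text (residues 67-73).
NLS = "PEKVKKL"
nls_score = average_hydropathy(NLS)
print(f"{NLS}: {nls_score:.3f}")
```

A negative mean indicates that hydrophilic residues dominate, which is the sense in which the text calls the protein soluble. Running the same function over the full 229-residue sequence would be needed to compare against the reported value.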
Serine phosphorylation sites occur at amino acid positions 7, 50, 86, 98, and 194. Threonine phosphorylation sites occur at 34, 129, 155, and 201. Tyrosine phosphorylation sites occur at 78, 160, 177, and 209. An N-terminal acetylation site is present at amino acid 3, and a signal cleavage site is present between amino acids 11 and 12. [17]
A yeast two-hybrid assay found that c9orf135 interacts with PB2 (polymerase basic protein 2), a viral protein of the influenza A virus. PB2 is primarily involved in cap stealing, in which it binds the pre-mRNA cap and ultimately cleaves off 10-13 nucleotides. PB2 is also important for starting the replication of viral genomes, and it is known to inhibit type 1 interferon by inhibiting the mitochondrial antiviral signalling protein MAVS. [18]
Eleven common DNA variants of the human c9orf135 gene have been identified. The mutations within those variants are compiled in the following table. [19] Only mutations with a frequency of 0.01 or higher are included; synonymous mutations are excluded.
Location | Nucleotide Position | Mutation | Frequency |
---|---|---|---|
5' UTR | 57 | N/A | 0.386 |
5' UTR | 93 | N/A | 0.047 |
5' UTR | 138 | N/A | 0.018 |
Exon | 169 | Missense K30T | 0.018 |
Exon | 237 | Missense R53K | 0.047 |
Exon | 456 | Missense E126K | 0.01 |
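The inclusion rule described above (frequency of at least 0.01, synonymous mutations excluded) can be sketched as a simple filter. The `Variant` class and the `reportable` function are illustrative names, not part of any cited tool. The first six records are copied from the table, and the two extra records are hypothetical, added only to show what the rule discards.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    location: str            # "5' UTR" or "Exon"
    position: int            # position value as given in the table
    mutation: Optional[str]  # e.g. "K30T"; None where the table says N/A
    frequency: float
    synonymous: bool = False


# Rows copied from the table above.
TABLE_ROWS = [
    Variant("5' UTR", 57, None, 0.386),
    Variant("5' UTR", 93, None, 0.047),
    Variant("5' UTR", 138, None, 0.018),
    Variant("Exon", 169, "K30T", 0.018),
    Variant("Exon", 237, "R53K", 0.047),
    Variant("Exon", 456, "E126K", 0.01),
]

# HYPOTHETICAL rows, invented only to demonstrate the exclusions.
HYPOTHETICAL_ROWS = [
    Variant("Exon", 300, "L74L", 0.120, synonymous=True),  # synonymous
    Variant("Exon", 410, "A111V", 0.004),                  # below cutoff
]


def reportable(variants: List[Variant], min_frequency: float = 0.01) -> List[Variant]:
    """Keep non-synonymous variants at or above the frequency cutoff."""
    return [v for v in variants
            if v.frequency >= min_frequency and not v.synonymous]


kept = reportable(TABLE_ROWS + HYPOTHETICAL_ROWS)
for v in kept:
    print(v.location, v.position, v.mutation or "N/A", v.frequency)
```

The cutoff is inclusive, so the E126K variant at exactly 0.01 is retained, matching the table.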
c9orf135 is expressed at high levels in connective tissue and testicular tissue. [20] It is likely expressed at low levels throughout human cells. c9orf135 is also found at significantly higher levels in the adult human umbilical cord than in the foetal human umbilical cord. [21] Furthermore, in women with ovarian adenocarcinoma, expression of c9orf135 is much higher in the epithelial cells of the ovaries. [22] Women with polycystic ovarian syndrome have lower expression of c9orf135 than women without the condition. [23]
The amino acid composition of c9orf135 from Mus musculus (house mouse) and Pteropus alecto (black flying fox) was compared with that of the human protein. In the mouse protein, no amino acid differed significantly in abundance from the mouse average. The black flying fox protein, however, was valine-poor and tryptophan-rich. Of these, only the tryptophan surplus was shared with the human results. The house mouse and black flying fox were chosen because their c9orf135 sequences share 64% and 79% identity with the human sequence, respectively. In the human protein, alanine and tyrosine are also of potential interest because their levels differ from the human protein averages. [16]
Amino Acid of Interest | Compositional Percentage | Compared to normal protein amounts in H. sapiens |
---|---|---|
Alanine | 3.1% | Less than Average |
Tyrosine | 5.2% | More than Average |
Tryptophan | 3.1% | More than Average |
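The compositional percentages in the table are residue counts divided by protein length. A minimal sketch of that calculation follows. The function name `composition_percent` and the toy sequence are hypothetical, since the full c9orf135 sequence is not reproduced here. The last lines check that counts of 7 and 12 residues in a 229-residue protein would be consistent with the tabulated 3.1% and 5.2%; those counts are inferred from the percentages, not quoted from the source.

```python
from collections import Counter
from typing import Dict


def composition_percent(sequence: str) -> Dict[str, float]:
    """Return each residue's share of the sequence, in percent."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in counts.items()}


# HYPOTHETICAL 10-residue toy sequence, for illustration only.
toy = "MAWYKLAWYT"
toy_composition = composition_percent(toy)
print({aa: round(pct, 1) for aa, pct in sorted(toy_composition.items())})

# Inferred counts (an assumption) for a 229-residue protein:
PROTEIN_LENGTH = 229
seven_residues = round(100 * 7 / PROTEIN_LENGTH, 1)    # alanine, tryptophan
twelve_residues = round(100 * 12 / PROTEIN_LENGTH, 1)  # tyrosine
print(seven_residues, twelve_residues)
```

Deciding whether a value is above or below average additionally requires a reference composition for human proteins, which the source takes from its cited analysis tool. [16]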
c9orf135 is conserved across animals, from mammals and reptiles to annelids.
Orthologs of c9orf135 were identified with BLAST, and 20 were selected. All of the orthologs came from multicellular animals, including aquatic animals, reptiles, amphibians, and warm-blooded animals. No orthologs were found in protists, bacteria, archaea, or fungi, and no paralogs of c9orf135 were found. The selected orthologs are listed in the table below. Divergence times, given in millions of years ago (MYA), were estimated with the TimeTree program. [24]
Genus/Species | Common name | Divergence from Humans (MYA) | Accession number | Amino Acid Length | Sequence identity | Sequence similarity |
---|---|---|---|---|---|---|
Homo sapiens | Human | -- | Q5VTT2 | 229 | -- | -- |
Pongo abelii | Sumatran Orangutan | 15.8 | XP_002819904 | 206 | 86% | 87% |
Rhinopithecus roxellana | Golden Snub-Nosed Monkey | 29.1 | XP_010361250 | 229 | 93% | 95% |
Mus musculus | House Mouse | 90.9 | EDL41604 | 228 | 64% | 73% |
Pteropus alecto | Black Flying Fox | 97.5 | XP_785964 | 230 | 79% | 86% |
Equus przewalskii | Przewalski's horse | 97.5 | XP_008504806 | 183 | 77% | 86% |
Panthera tigris altaica | Siberian Tiger | 97.5 | XP_007077537 | 187 | 73% | 83% |
Ovis aries | Sheep | 97.5 | XP_014948670 | 207 | 69% | 77% |
Elephantulus edwardii | Cape Elephant Shrew | 105 | XP_006894485 | 254 | 72% | 82% |
Pelodiscus sinensis | Chinese Softshell Turtle | 320.5 | XP_006137902 | 217 | 55% | 68% |
Gekko japonicus | Gekko | 320.5 | XP_015275999 | 221 | 52% | 64% |
Alligator mississippiensis | American Alligator | 320.5 | XP_014464144 | 212 | 51% | 64% |
Ophiophagus hannah | King Cobra | 320.5 | ETE61720 | 215 | 43% | 59% |
Salmo salar | Atlantic Salmon | 429.6 | XP_013998840 | 99 | 34% | 55% |
Esox lucius | Northern Pike | 429.6 | XP_010901691 | 154 | 30% | 47% |
Branchiostoma floridae | Lancelet | 733 | XP_002591786 | 221 | 45% | 59% |
Strongylocentrotus purpuratus | Sea Urchin | 747.8 | XP_785964 | 241 | 47% | 62% |
Saccoglossus kowalevskii | Acorn Worm | 747.8 | XP_002733410 | 153 | 38% | 58% |
Lingula anatina | Ocean Clam | 847 | XP_013398605 | 220 | 43% | 59% |
Crassostrea gigas | Pacific Oyster | 847 | XP_011426944 | 215 | 40% | 57% |
Helobdella robusta | Leech | 847 | XP_009019861 | 256 | 29% | 44% |
A divergence comparison of c9orf135 with fast-diverging cytochrome C and slow-diverging fibrinogen is shown in the chart. Overall, c9orf135 has diverged significantly more quickly than fibrinogen and slightly more slowly than cytochrome C.
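The c9orf135 side of such a comparison can be approximated from the ortholog table by treating divergence as 100% minus sequence identity. This is a rough sketch using uncorrected divergence. The method behind the original chart is not stated, so the formula is an assumption, and only a subset of table rows is shown.

```python
# (common name, divergence from humans in MYA, sequence identity in %)
# Values copied from a subset of the ortholog table.
ORTHOLOGS = [
    ("Sumatran Orangutan", 15.8, 86),
    ("Golden Snub-Nosed Monkey", 29.1, 93),
    ("House Mouse", 90.9, 64),
    ("Chinese Softshell Turtle", 320.5, 55),
    ("Atlantic Salmon", 429.6, 34),
    ("Lancelet", 733.0, 45),
    ("Leech", 847.0, 29),
]


def observed_divergence(identity_percent: float) -> float:
    """Uncorrected divergence: the share of aligned positions that differ."""
    return 100.0 - identity_percent


# Map each species to its uncorrected divergence from the human protein.
divergence = {name: observed_divergence(identity)
              for name, _, identity in ORTHOLOGS}

# Print in order of increasing divergence time.
for name, mya, _ in sorted(ORTHOLOGS, key=lambda row: row[1]):
    print(f"{mya:7.1f} MYA  {divergence[name]:5.1f}%  {name}")
```

Plotting these pairs against the equivalent values for cytochrome C and fibrinogen orthologs would reproduce the kind of comparison described above.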