C9orf85

Identifiers
Aliases: C9orf85, chromosome 9 open reading frame 85
External IDs: MGI: 1913456; HomoloGene: 11933; GeneCards: C9orf85

Orthologs

| | Human | Mouse |
|---|---|---|
| RefSeq (mRNA) | NM_182505, NM_198394 | NM_025423 |
| RefSeq (protein) | NP_872311, NP_001351982, NP_001351984, NP_001351986 | NP_079699 |
| Location (UCSC) | Chr 9: 71.91 – 71.99 Mb | Chr 19: 21.56 – 21.63 Mb |
| PubMed search | [3] | [4] |

Chromosome 9 open reading frame 85, commonly known as C9orf85, is a protein that in Homo sapiens is encoded by the C9orf85 gene. The gene is located at 9q21.13. [5] Alternative splicing produces four different isoforms. C9orf85 has a predicted molecular weight of 20.17 kDa [6] and an isoelectric point of 9.54. [6] The function of the gene has not yet been confirmed; however, it has been found to be highly expressed in highly differentiated cells. [7]

Background

Protein Sequence

The sequence for C9orf85 isoform 1 in Homo sapiens, derived from NCBI: [5]

MSSQKGNVARSRPQKHQNTFSFKNDKFDKSVQTKKINAKLHDGVCQRCKEVLEWRVKYSKYKPLSKPKKCVKCLQKTVKDSYHIMCRPCACELEVCAKCGKKEDIVIPWSLPLLPRLECSGRILAHHNLRLPCSSDSPASASRVAGTTGAHHHAQLIFVFLVEMGFHYVGQAGLELLTS

Aliases

Isoforms

Table showing the lengths of all C9orf85 isoforms

| Isoform # | mRNA Length (bp) | Amino Acid Length (aa) |
|---|---|---|
| 1 [11] | 3821 | 179 |
| 2 [12] | 1185 | 157 |
| 3 [13] | 1316 | 138 |
| 4 [14] | 3707 | 69 |

Isoform 1 is the major isoform of the gene and contains 4 exons. Its accession number is NM_001365053.2. [5]

Homology

Orthologs

Orthologs of the C9orf85 gene were found across all major groups of organisms, from vertebrates to bacteria. Among protists, however, no ortholog of the human gene was found except in Plasmodium.

A list of 20 orthologs for the gene C9orf85 in Homo sapiens [15]

| Genus species | Common Name | Taxonomic Group | Date of Divergence (MYA) | Accession Number | Length (aa) | Identity | Similarity |
|---|---|---|---|---|---|---|---|
| Homo sapiens | Human | Chordata | 0 | NP_001351982 | 179 | 100% | 100% |
| Meriones unguiculatus | Mongolian gerbil | Rodentia | 90 | XP_021514638 | 154 | 74% | 84% |
| Gallus gallus | Chicken | Chordata | 312 | XP_001233821 | 166 | 78.70% | 60% |
| Terrapene carolina triunguis | Three-toed box turtle | Chordata | 312 | XP_024066792 | 171 | 85.45% | 61% |
| Chelonia mydas | Green sea turtle | Chordata | 312 | XP_007065676 | 178 | 84.31% | 61% |
| Calidris pugnax | Ruff bird | Chordata | 312 | XP_014813985 | 166 | 77.78% | 60% |
| Microcaecilia unicolor | Tiny cayenne caecilian | Chordata | 351.8 | XP_030049723 | 178 | 76.15% | 60% |
| Xenopus tropicalis | Western clawed frog | Chordata | 351.8 | KAE8633085 | 133 | 78.18% | 61% |
| Electrophorus electricus | Electric eel | Chordata | 435 | XP_026886158 | 156 | 55.13% | 87% |
| Oncorhynchus mykiss | Rainbow trout | Chordata | 435 | XP_021461156 | 177 | 60.77% | 72% |
| Acanthaster planci | Crown-of-thorns starfish | Echinodermata | 684 | XP_022096254 | 197 | 56.76% | 62% |
| Photinus pyralis | Big dipper firefly | Arthropoda | 797 | XP_031346726 | 183 | 58.04% | 62% |
| Pomacea canaliculata | Golden apple snail | Mollusca | 797 | XP_025077101 | 208 | 47.33% | 73% |
| Drosophila melanogaster | Fruit fly | Arthropoda | 797 | NP_573209 | 234 | 49.58% | 65% |
| Acropora millepora | Coral | Cnidaria | 824 | XP_029187517 | 190 | 57.14% | 62% |
| Salpingoeca rosetta | Choanoflagellate | Choanoflagellate | 1023 | XP_004995700 | 286 | 41.67% | 60% |
| Apophysomyces ossiformis | Fungi | Mucoromycota | 1105 | KAF7725139 | 181 | 40% | 72% |
| Ricinus communis | Castor oil plant | Spermatophyta | 1496 | XP_002530997 | 227 | 34.21% | 78% |
| Plasmodium ovale wallikeri | Malarian protist | Apicomplexa | 1768 | SBT56954 | 680 | 68.75% | 17% |
| Bacillus cereus | Bacteria | Firmicutes | 4290 | KXI72539 | 83 | 73.61% | 39% |

Paralogs

5 possible paralogs for the gene C9orf85 in Homo sapiens [15]

| Paralog | Accession Number | Length (aa) | Identity | Similarity | Location |
|---|---|---|---|---|---|
| CCDC198 | XP_005267863 | 290 | 44.38% | 89% | Chromosome 14 |
| Retbindin | EAW84316 | 224 | 60% | 51% | Chromosome 19 |
| hCG2038446 | EAX11460 | 135 | 68.54% | 49% | Chromosome 2 |
| hCG1820974 | EAW94215 | 143 | 72.58% | 41% | Chromosome 17 |
| O-phosphoseryl-tRNA(Sec) selenium transferase isoform X1 | XP_016863766 | 586 | 70.67% | 41% | Chromosome 4 |

Rate of Molecular Evolution

A graph depicting the rate of divergence for the human gene C9orf85 in comparison to Homo sapiens Cytochrome C and Fibrinogen Alpha Chain.

A rate of divergence can be estimated using the molecular clock hypothesis. As the graph shows, C9orf85 lies between the slowly evolving Cytochrome C and the rapidly evolving Fibrinogen Alpha Chain, with a slope closer to that of Cytochrome C. C9orf85 is therefore possibly evolving at a slower rate than most proteins.
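One common way to put numbers on such a comparison is sketched below in Python: each ortholog's percent identity is converted into a corrected divergence m = -ln(1 - n), where n is the fraction of differing residues, and m is then regressed on the date of divergence. The correction formula, the function names, and the choice of a least-squares line through the origin are assumptions made for illustration; the source does not state how the graph was produced. The sample values are taken from the ortholog table.

```python
import math

def corrected_divergence(percent_identity):
    """Corrected divergence m = -ln(1 - n), with n the fraction of differing sites."""
    n = 1.0 - percent_identity / 100.0
    if n >= 1.0:
        raise ValueError("an identity of 0% gives an undefined correction")
    return -math.log(1.0 - n)

def slope_through_origin(dates_mya, divergences):
    """Least-squares slope of divergence versus time, forced through (0, 0)."""
    num = sum(t * m for t, m in zip(dates_mya, divergences))
    den = sum(t * t for t in dates_mya)
    return num / den

# (date of divergence in MYA, percent identity) for a few vertebrate orthologs
samples = [(90, 74.0), (312, 78.70), (351.8, 76.15), (435, 60.77)]
dates = [t for t, _ in samples]
m_values = [corrected_divergence(p) for _, p in samples]
rate = slope_through_origin(dates, m_values)
print(f"approximate corrected divergence per million years: {rate:.5f}")
```

Repeating the same calculation for Cytochrome C and Fibrinogen Alpha Chain orthologs would give the two reference slopes against which the C9orf85 slope is compared.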

Conservation

Multiple Sequence Alignment

A multiple sequence alignment (MSA) [16] was performed on Homo sapiens C9orf85 and its 15 most closely related orthologs. Twenty amino acids were found to be conserved among all 15 sequences, all at the beginning of the protein sequence, within the first three exons.

In an MSA of distantly related homologs, 5 amino acids were observed to be conserved between exons two and three.

However, in an alignment between Homo sapiens C9orf85 and its homolog in the extremely distant Bacillus cereus, 53 amino acids were observed to be conserved, primarily in the fourth exon.

Cysteine

Multiple sequence alignment of C9orf85 showcasing the most significant & conserved cysteines.

Cysteine is the most frequently conserved amino acid in the protein sequence, accounting for 8 of the 20 conserved positions. Cysteines 48, 70, 89, and 96 and tryptophan 54 are conserved in all groups of species (vertebrates, invertebrates, fungi, plants, and protists) except bacteria.

Using the Statistical Analysis of Protein Sequences (SAPS) tool, [6] 5 cysteine spacings were found. Four follow the C-X-X-C pattern, starting at amino acids 45, 70, 86, and 96, and the fifth is a C-X-C spacing at amino acid 89 (CAC). The C-X-X-C pattern is known to be present in metal-binding proteins and oxidoreductases. [17] Additionally, three of the five cysteine spacings begin at residues that are also among the most conserved amino acids in the most closely related orthologs: C70, C89, and C96.
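The cysteine spacings can be checked directly against the isoform 1 sequence. The Python sketch below is an illustration, not the SAPS algorithm: it simply scans the sequence with regular expressions, and the helper name `motif_starts` is hypothetical.

```python
import re

# C9orf85 isoform 1 (179 aa), as given in the Protein Sequence section
SEQ = (
    "MSSQKGNVARSRPQKHQNTFSFKNDKFDKSVQTKKINAKLHDGVCQRCKEVLEWRVKYSK"
    "YKPLSKPKKCVKCLQKTVKDSYHIMCRPCACELEVCAKCGKKEDIVIPWSLPLLPRLECS"
    "GRILAHHNLRLPCSSDSPASASRVAGTTGAHHHAQLIFVFLVEMGFHYVGQAGLELLTS"
)

def motif_starts(seq, pattern):
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # The lookahead keeps each match zero-width, so overlapping hits are not skipped.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]

cxxc = motif_starts(SEQ, "C..C")  # C-X-X-C spacings
cxc = motif_starts(SEQ, "C.C")    # C-X-C spacings such as CAC

print(len(SEQ), cxxc, cxc)  # → 179 [45, 70, 86, 96] [89]
```

The positions returned agree with the four C-X-X-C spacings and the single CAC spacing listed above.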

Localization

Gene Localization in Humans

C9orf85 has been found to be highly expressed in epithelial cells [18] of the pancreas. [19] Additionally, high levels of expression have been established in the urinary bladder and thymus of the adult human, while expression levels were significant in the intestine of a 20-week-old fetus. [5]

Subcellular Localization

k-nearest neighbors (k-NN) results predict C9orf85 to be 78.3% nuclear, 8.7% mitochondrial, 8.7% cytoplasmic, and 4.3% vacuolar. [20]
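These percentages are consistent with simple vote fractions among 23 nearest neighbors (18, 2, 2, and 1 votes). The Python sketch below shows that arithmetic only; the value k = 23 and the individual vote counts are inferred assumptions, not figures reported in the source, and the function name is hypothetical.

```python
from collections import Counter

def knn_vote_percentages(neighbor_labels):
    """Percentage of nearest neighbors carrying each localization label."""
    counts = Counter(neighbor_labels)
    k = len(neighbor_labels)
    return {label: round(100.0 * n / k, 1) for label, n in counts.items()}

# Hypothetical neighbor labels: 18 + 2 + 2 + 1 = 23 neighbors (assumed k)
neighbors = (
    ["nuclear"] * 18
    + ["mitochondrial"] * 2
    + ["cytoplasmic"] * 2
    + ["vacuolar"]
)
print(knn_vote_percentages(neighbors))
```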

Promoter

C9orf85 has 3 predicted promoters. [21] The promoter chosen for analysis was GXP_18858 on the plus strand, selected for its large number of CAGE tags and because it lies furthest upstream. It starts at position 71909780 and ends at position 71911841, spanning 2062 base pairs, and has 13 transcripts. The last 500 base pairs of the double-stranded promoter are shown below:

5' GCAGGAGGCGGGGATTGCGGAAAAGAAGAACCAATAGGAACAAAGGTTCC 3' 3' CGTCCTCCGCCCCTAACGCCTTTTCTTCTTGGTTATCCTTGTTTCCAAGG 5'
5' CCGCCCCTTTGATTTGATGGACTACACATTCGGGCCAATGGGGGAATTCT 3' 3' GGCGGGGAAACTAAACTACCTGATGTGTAAGCCCGGTTACCCCCTTAAGA 5'
5' CATTTCGAAGAAAGTGGGACTTGTTCTCCGGGTTTGAGAAAGAGGCTGCG 3' 3' GTAAAGCTTCTTTCACCCTGAACAAGAGGCCCAAACTCTTTCTCCGACGC 5'
5' CGGAGCCGGAGGGGTCGAGGCTGCGCCGCGTGGAGTGGCTTGGCTTAACA 3' 3' GCCTCGGCCTCCCCAGCTCCGACGCGGCGCACCTCACCGAACCGAATTGT 5'
5' GCAGGGAGGGCAGAGCGATGCTCTTTGACCTCCCAGAAGAGTCACGTGGG 3' 3' CGTCCCTCCCGTCTCGCTACGAGAAACTGGAGGGTCTTCTCAGTGCACCC 5'
5' CTGACCCAGAGCCGGGGCGGAAAGGCTGCGTTTGTTTCTTCCGGGTCATT 3' 3' GACTGGGTCTCGGCCCCGCCTTTCCGACGCAAACAAAGAAGGCCCAGTAA 5'
5' GACAGAAGCGTCAATTCCTGGGAGTAGTTCGTTGGTTTTCTTTCCCCTCA 3' 3' CTGTCTTCGCAGTTAAGGACCCTCATCAAGCAACCAAAAGAAAGGGGAGT 5'
5' TCCTTTTGCCTGCTCCCGGCGAGGGGTGGCTTTGATTTCGGCGATGAGCT 3' 3' AGGAAAACGGACGAGGGCCGCTCCCCACCGAAACTAAAGCCGCTACTCGA 5'
5' CCCAGAAAGGCAACGTGGCTCGTTCCAGACCTCAGAAGCACCAGAATACG 3' 3' GGGTCTTTCCGTTGCACCGAGCAAGGTCTGGAGTCTTCGTGGTCTTATGC 5'
5' TTTAGCTTCAAAAATGACAAGTTCGATAAAAGTGTGCAGACCAAGGTAGG 3' 3' AAATCGAAGTTTTTACTGTTCAAGCTATTTTCACACGTCTGGTTCCATCC 5'
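As a quick consistency check on the promoter data, the Python sketch below derives the bottom strand of the first 50-base row from its top strand and recomputes the promoter length from its coordinates. The function names are hypothetical, and the coordinates are assumed to be 1-based and inclusive.

```python
# Watson-Crick pairing used to derive the bottom strand from the top strand
PAIR = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Base-by-base complement, read in the same left-to-right order."""
    return strand.translate(PAIR)

def region_length(start, end):
    """Length of a region given inclusive start and end coordinates."""
    return end - start + 1

# First row of the promoter sequence: top (5'->3') and bottom (3'->5') strands
top = "GCAGGAGGCGGGGATTGCGGAAAAGAAGAACCAATAGGAACAAAGGTTCC"
bottom = "CGTCCTCCGCCCCTAACGCCTTTTCTTCTTGGTTATCCTTGTTTCCAAGG"

print(complement(top) == bottom)          # → True
print(region_length(71909780, 71911841))  # → 2062
```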
A table of 16 possible transcription factors predicted to bind to the promoter [22]

| Transcription Factor | Detailed Matrix Information | Matrix Score |
|---|---|---|
| CLOX | Transcriptional repressor CDP | 0.962 |
| KLFS | Gut-enriched Krueppel-like factor | 1.000 |
| CAAT | Nuclear factor Y (Y-box binding factor) | 0.940 |
| HIFF | Aryl hydrocarbon receptor nuclear translocator-like, homodimer | 1.000 |
| MZF1 | Myeloid zinc finger protein | 0.992 |
| STAT | STAT5: signal transducer and activator of transcription 5 | 0.944 |
| ETSF | ETS-like gene 1 (ELK-1) | 0.958 |
| CREB | Tax/CREB complex | 0.834 |
| P53F | Tumor suppressor p53 (3' half site) | 0.921 |
| TCFF | TCF11/LCP-F1/Nrf1 homodimers | 1.000 |
| FKHD | Fork head homologous X binds DNA with a dual sequence specificity (FHXA and FHXB) | 0.870 |
| MIRF | Zinc finger protein 768 | 0.819 |
| BCL6 | B-cell CLL/lymphoma 6, member B (BCL6B) | 0.878 |
| AP2F | Transcription factor AP-2, alpha | 0.931 |
| EBOX | MYC associated factor X | 0.926 |
| GCMF | Glial cells missing homolog 1, chorion-specific transcription factor GCMa | 0.942 |

Regulation

Transmembrane Domain

Although the protein sequence contains hydrophobic regions, [6] [23] [24] no transmembrane domains have been confirmed. [25]

Phosphorylation

A protein kinase C phosphorylation site is predicted at amino acids 3–5. [26] There is also a possible CK2 phosphorylation site at amino acids 77–80. [26]
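Both predicted sites can be located in the isoform 1 sequence by simple pattern matching. The Python sketch below assumes the commonly cited PROSITE consensus patterns, [ST]-x-[RK] for protein kinase C and [ST]-x(2)-[DE] for CK2; the source does not state which patterns the prediction tool applied, and the helper name `motif_spans` is hypothetical.

```python
import re

# C9orf85 isoform 1 (179 aa), as given in the Protein Sequence section
SEQ = (
    "MSSQKGNVARSRPQKHQNTFSFKNDKFDKSVQTKKINAKLHDGVCQRCKEVLEWRVKYSK"
    "YKPLSKPKKCVKCLQKTVKDSYHIMCRPCACELEVCAKCGKKEDIVIPWSLPLLPRLECS"
    "GRILAHHNLRLPCSSDSPASASRVAGTTGAHHHAQLIFVFLVEMGFHYVGQAGLELLTS"
)

# Assumed consensus patterns, written as regular expressions
PKC_SITE = "[ST].[RK]"   # [ST]-x-[RK]
CK2_SITE = "[ST]..[DE]"  # [ST]-x(2)-[DE]

def motif_spans(seq, pattern):
    """Return 1-based inclusive (start, end) spans of all overlapping matches."""
    spans = []
    for m in re.finditer(f"(?=({pattern}))", seq):
        start = m.start() + 1
        spans.append((start, start + len(m.group(1)) - 1))
    return spans

pkc = motif_spans(SEQ, PKC_SITE)
ck2 = motif_spans(SEQ, CK2_SITE)
print((3, 5) in pkc, (77, 80) in ck2)  # → True True
```

Consensus matching of this kind is permissive and may return additional candidate positions beyond the two sites reported above.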

SUMOylation

There is one predicted SUMO site at position 23. [27] The result is significant with a p-value of 0.041.

Function

The variation in its level of expression across tissue samples indicates that C9orf85 is a regulated gene rather than a constitutive gene. [5]

Additionally, urinary bladder epithelial cells help shape the immune response to an infection. [28] The thymus is a primary lymphoid organ of the immune system, composed of T cells and epithelial cells. Research has found that the thymus plays a role in the development of intestinal immunity. [29] Both tissues are elements of the immune system and help ensure that it functions properly.

References

  1. GRCh38: Ensembl release 89: ENSG00000155621 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000035171 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "C9orf85 chromosome 9 open reading frame 85 [Homo sapiens (human)] – Gene – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-09-30.
  6. EMBL-EBI. (2020). SAPS Results. Ebi.Ac.Uk. https://www.ebi.ac.uk/Tools/services/web/toolresult.ebi?jobId=saps-I20201219-191317-0344-54841082-p1m
  7. Chen BZ, Yu SL, Singh S, Kao LP, Tsai ZY, Yang PC, et al. (January 2011). "Identification of microRNAs expressed highly in pancreatic islet-like cell clusters differentiated from human embryonic stem cells". Cell Biology International. 35 (1): 29–37. doi:10.1042/CBI20090081. PMID 20735361. S2CID 30538749.
  8. C9orf85 - Uncharacterized protein C9orf85 - Homo sapiens (Human) - C9orf85 gene & protein. (2020). Uniprot.Org. https://www.uniprot.org/uniprot/Q96MD7
  9. (2020). Genenames.Org. https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/28784
  10. "AceView: a comprehensive annotation of human and worm genes with mRNAs or ESTs". www.ncbi.nlm.nih.gov. Retrieved 2020-09-30.
  11. "uncharacterized protein C9orf85 isoform 1 [Homo sapiens] – Protein – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-09-30.
  12. "uncharacterized protein C9orf85 isoform 2 [Homo sapiens] – Protein – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-12-19.
  13. "uncharacterized protein C9orf85 isoform 3 [Homo sapiens] – Protein – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-12-19.
  14. "uncharacterized protein C9orf85 isoform 4 [Homo sapiens] – Protein – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-12-19.
  15. "Protein BLAST: search protein databases using a protein query". blast.ncbi.nlm.nih.gov. Retrieved 2020-10-26.
  16. "Clustal Omega < Multiple Sequence Alignment < EMBL-EBI". www.ebi.ac.uk. Retrieved 2020-10-26.
  17. Miseta A, Csutora P (August 2000). "Relationship between the occurrence of cysteine in proteins and the complexity of organisms". Molecular Biology and Evolution. 17 (8): 1232–9. doi:10.1093/oxfordjournals.molbev.a026406. PMID 10908643.
  18. GENEVESTIGATOR Team at Nebion AG. (2020). Genevisible. Genevisible.com; genevisible. https://genevisible.com/tissues/HS/UniProt/Q96MD7
  19. "Gene: C9orf85 – ENSG00000155621". bgee.org. Retrieved 2020-09-30.
  20. PSORT II Prediction. (2020). Psort.Hgc.Jp. https://psort.hgc.jp/form2.html
  21. Genomatix: Gene2Promoter Result. (2020). Genomatix.De. https://www.genomatix.de/cgi-bin/c2p/c2p.pl?s=c5402bf929e4d6000dfc7ce8c56fa1e6;TASK=c2p;SHOW=TempSeq_kd0ZKohP.html
  22. Genomatix: MatInspector Result. (2019). Genomatix.De. https://www.genomatix.de/cgi-bin/eldorado/eldorado.pl?s=c5402bf929e4d6000dfc7ce8c56fa1e6;PROM_ID=GXP_18858;GROUP=vertebrates;GROUP=others;ELDORADO_VERSION=E35R1911
  23. "ProtScale". Expasy. Archived from the original on 2019-01-08.
  24. TMPred results. (2020). Vital-It.Ch. https://embnet.vital-it.ch/cgi-bin/TMPRED_form_parser
  25. "SOSUI/submit a protein sequence". harrier.nagahama-i-bio.ac.jp. Retrieved 2020-12-19.
  26. "Motif Scan". myhits.sib.swiss. Retrieved 2020-12-19.
  27. "GPS-SUMO: Prediction of SUMOylation Sites & SUMO-interaction Motifs". sumosp.biocuckoo.org. Retrieved 2020-12-19.
  28. Abraham SN, Miao Y (October 2015). "The nature of immune responses to urinary tract infections". Nature Reviews. Immunology. 15 (10): 655–63. doi:10.1038/nri3887. PMC 4926313. PMID 26388331.
  29. Falk W (July 2006). "A ticket to the gut for thymic T cells". Gut. 55 (7): 910–2. doi:10.1136/gut.2005.087288. PMC 1856347. PMID 16766746.