C9orf50

Identifiers
Aliases: C9orf50, chromosome 9 open reading frame 50
External IDs MGI: 1923631; HomoloGene: 18859; GeneCards: C9orf50; OMA:C9orf50 - orthologs
Orthologs
Species: Human, Mouse
RefSeq (mRNA): NM_199350 (human), NM_198000 (mouse)
RefSeq (protein): NP_955382 (human), NP_932117 (mouse)
Location (UCSC): Chr 9: 129.61 – 129.62 Mb (human), Chr 2: 30.68 – 30.69 Mb (mouse)
PubMed search: [3] (human), [4] (mouse)

Chromosome 9 open reading frame 50 is a protein that in humans is encoded by the C9orf50 gene. [5] C9orf50 has one other known alias, FLJ35803. [6] In humans the gene sequence is 10,051 base pairs long and is transcribed into an mRNA of 1,624 bases that encodes a 431 amino acid protein.

Gene

Location

In humans the gene is located on the negative strand at 9q34.11 and spans 8,552 base pairs. [7] On human chromosome 9, the gene covers bases chr9:132,374,504-132,383,055. [8] ASB6 is the gene directly before C9orf50 on the negative strand, and NTMT1, on the positive strand, is more than double the size of C9orf50. [1] [2]
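The 8,552 base pair figure is consistent with the coordinates quoted above if the interval is read as 1-based and inclusive of both endpoints. A minimal Python sketch of that arithmetic follows; the function name `span_length` is illustrative and does not come from any of the cited tools.

```python
def span_length(start: int, end: int) -> int:
    """Length in base pairs of a 1-based genomic interval that includes both endpoints."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates quoted in the text for C9orf50 on human chromosome 9
c9orf50_start = 132_374_504
c9orf50_end = 132_383_055

print(span_length(c9orf50_start, c9orf50_end))  # 8552
```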

Protein

C9orf50 schematic illustration made using DOG 2.0. The 431 amino acid protein is displayed with regions of disorder in yellow, the polyampholyte region in red, and DUF4685 in teal. Other motifs are shown and labelled at the corresponding amino acid positions. Pink circles represent sites of acetylation, teal circles represent glycation, and blue circles represent sumoylation.

The C9orf50 protein has a molecular weight of 47,639 Da (47.6 kDa) and consists of 431 amino acids, with a predicted isoelectric point of 10.38. [7] It contains the conserved domain DUF4685 (pfam15737), whose function is not well understood and which is conserved in vertebrates. The protein is encoded by 7 exons.

Isoforms

C9orf50 has 9 different splice isoforms (SI) and 11 different transcript variants (TV). The most common are isoform 1 and transcript variant 1. [9]

C9orf50 Isoform table. Author Hannah Berhow
C9orf50 Isoforms and Transcript Variants

Domains

The protein can be analyzed as a whole or split into 3 parts: the N-terminal domain (NTD) of 193 residues, DUF4685 of 103 residues, and the C-terminal domain (CTD) of 135 residues. The pI of the full protein is similar to the average pI of the NTD, DUF4685, and CTD. Of these sections, the NTD has the highest pI and molecular weight (mW), and it also has the most residues, at 193 of 431. [10] [11]

| C9orf50 | pI | mW (kDa) | Residues |
|---|---|---|---|
| Human Whole Protein | 10.38 | 47.6 | 431 |
| NTD | 11.14 | 21.1 | 193 |
| DUF4685 | 10.8 | 11.8 | 103 |
| CTD | 9.47 | 14.7 | 135 |
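As a quick consistency check on the table, the three sections can be summed and averaged. The sketch below uses only the tabulated values. The unweighted mean pI is a rough comparison rather than a rigorous way to derive the pI of the whole protein, and the variable names are illustrative.

```python
# Values copied from the table: (pI, molecular weight in kDa, residues)
sections = {
    "NTD": (11.14, 21.1, 193),
    "DUF4685": (10.8, 11.8, 103),
    "CTD": (9.47, 14.7, 135),
}
whole_protein = (10.38, 47.6, 431)

total_residues = sum(res for _, _, res in sections.values())
total_mw = round(sum(mw for _, mw, _ in sections.values()), 1)
mean_pi = round(sum(pi for pi, _, _ in sections.values()) / len(sections), 2)

print(total_residues)  # 431, matches the whole protein
print(total_mw)        # 47.6, matches the whole protein
print(mean_pi)         # 10.47, close to the whole-protein pI of 10.38
```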

Composition

Compositional analysis of the C9orf50 protein reveals low amounts of I, M, Y, and the group FIKMNY relative to other human proteins, and high amounts of R and KR-ED. No charge clusters, high-scoring charged or uncharged segments, charge runs, charge patterns, or high-scoring hydrophobic or transmembrane segments were found. Three distinct spacings of C were found, at positions 161, 190, and 342. C9orf50 also has 3 repetitive structures: the sequence PRLP_KLT occurs at position 30 and is repeated at position 78, SLLP occurs at positions 99 and 398, and KAAL occurs at positions 250 and 303. [12]

Tertiary Structure

Tertiary structures of the C9orf50 protein can be predicted using I-TASSER. The tool returns 5 visualized models; the two highest C-scores are -1.27 and -3.25.

Gene level regulation

Promoter

The promoter region for C9orf50 was found using the Genomatix Gene2Promoter search engine. [13] The search returned 6 promoter regions, only 2 of which were supported by transcripts and CAGE tags. The best-supported promoter region spans 1,962 bases, has 945 CAGE tags, and is conserved in 6 of 8 orthologous loci. The transcription start site was determined to be at position 1,503, based on a transcript with 7 exons supported by 118 CAGE tags. [13]

Transcription factor binding sites

Hundreds of transcription factors are predicted to bind the promoter region. The promoter region transcription factors table highlights 20 of these.

Transcript Level Regulation

The lowest-energy base-paired structure predicted for the C9orf50 5' UTR has a ΔG of -323.4 kcal/mol. [14] For the 3' UTR, the lowest-energy structure has a ΔG of -127.5 kcal/mol, indicating that it is not as stable as the 5' UTR structure.

C9orf50 3' UTR Stem Loop Structures
C9orf50 5' UTR Stem Loop Structures

Tissue expression

RNA-seq data show that C9orf50 is expressed at a low level, in the 25th to 50th percentile, in most human tissues relative to other human genes. [15] However, it is most highly expressed in testes, brain, and gallbladder. [9] C9orf50 protein expression is higher than C9orf50 RNA expression. [16] In in situ hybridization data, the mouse C9orf50 ortholog, symbol 1700001O22Rik, was compared against beta-actin, which is ubiquitously expressed, and the analysis shows similar expression patterns in the mouse brain. [17] During development, the protein can be found in the fetal stages. [18]

Subcellular expression

The protein is predicted to localize primarily to the nucleus and, to a lesser extent, to the mitochondria and cytosol. [19]

C9orf50 Promoter Region Transcription Factors

Orthologs

There are no known paralogs of C9orf50. Orthologs of C9orf50 are conserved across most groups of mammals; the most distant is the opossum, of the infraclass Marsupialia, which diverged from humans 159 million years ago. [20] The gene is not found in reptiles, amphibians, birds, or any other organisms that diverged before mammals. A list of mammals in which C9orf50 is conserved is shown below.

C9orf50 Orthologs

| Common Name | Taxonomic Group | Divergence from Humans (MYA) | NCBI Accession # | Protein Length (AA) | Sequence Identity to Humans (%) |
|---|---|---|---|---|---|
| Human | Hominini | 0 | NP_955382.3 | 431 | 100 |
| Chimpanzee | Primates | 6.65 | XP_016817319.1 | 431 | 97.22 |
| Gorilla | Primates | 9.06 | XP_018889539.1 | 435 | 93.17 |
| Deer Mouse | Rodentia | 90 | XP_006983488.1 | 391 | 46.14 |
| Prairie Vole | Rodentia | 90 | XP_005346778.1 | 370 | 45.18 |
| American Pika | Lagomorpha | 90 | XP_004593748.1 | 579 | 38.11 |
| Narrow Ridged Finless Porpoise | Cetacea | 96 | XP_024617982.1 | 473 | 56.71 |
| Killer Whale | Cetacea | 96 | XP_012388229.1 | 343 | 59.34 |
| Alpaca | Artiodactyla | 96 | XP_006205645.1 | 399 | 53.83 |
| Black Flying Fox | Chiroptera | 96 | XP_015449607.1 | 432 | 53.21 |
| Egyptian Fruit Bat | Chiroptera | 96 | XP_015989428.1 | 431 | 53.01 |
| Goat | Artiodactyla | 96 | XP_017910228.1 | 438 | 52.4 |
| Northern Fur Seal | Carnivora | 96 | XP_025744313.1 | 441 | 52.36 |
| Grizzly Bear | Carnivora | 96 | XP_026369526.1 | 447 | 50.63 |
| European Hedgehog | Soricomorpha | 96 | XP_007527129.1 | 419 | 51.42 |
| Star Nosed Mole | Soricomorpha | 96 | XP_012576659.1 | 383 | 48.68 |
| Southern White Rhinoceros | Perissodactyla | 96 | XP_014637447.1 | 489 | 47.25 |
| African Bush Elephant | Proboscidea | 105 | XP_023401069.1 | 527 | 49.31 |
| Nine-Banded Armadillo | Cingulata | 105 | XP_023443586.1 | 476 | 46.72 |
| Gray Short Tailed Opossum | Didelphimorphia | 159 | XP_007475193.1 | 583 | 32.56 |

Evolution

C9orf50 is predicted to evolve more quickly than other common proteins including cytochrome C, hemoglobin beta, and fibrinogen alpha chain.

C9orf50 Molecular Clock
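A molecular clock comparison of this kind relates sequence divergence to time since divergence. The sketch below, for illustration only, takes a few rows from the C9orf50 ortholog table and reports the observed percent difference alongside a Poisson-corrected distance. The correction formula, -ln(identity), is a common textbook choice assumed here; the source does not state which model was used for the figure.

```python
import math

# (common name, divergence from humans in MYA, % identity to human),
# copied from a subset of rows in the ortholog table
orthologs = [
    ("Chimpanzee", 6.65, 97.22),
    ("Gorilla", 9.06, 93.17),
    ("Deer Mouse", 90, 46.14),
    ("Goat", 96, 52.4),
    ("African Bush Elephant", 105, 49.31),
    ("Gray Short Tailed Opossum", 159, 32.56),
]


def corrected_divergence(identity_percent: float) -> float:
    """Poisson-corrected substitutions per site (assumed model, not from the source)."""
    return -math.log(identity_percent / 100)


for name, mya, identity in orthologs:
    observed = 100 - identity  # uncorrected percent difference
    print(f"{name}: {mya} MYA, {observed:.2f}% observed difference, "
          f"{corrected_divergence(identity):.2f} corrected substitutions per site")
```

Plotting the corrected value against divergence time for C9orf50 and for reference proteins would give slopes that can be compared, which is the usual way such a figure is read.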

Amino acid conservation

Important amino acids were identified as those on the 100% consensus line produced by MView from the multiple sequence alignment of strict orthologs. [21] Amino acids in red represent conserved amino acids in DUF4685. 14 of the 22 highly conserved amino acids are found within this domain. Leucine accounts for the largest number of conserved positions in the C9orf50 protein.

| Conserved Amino Acids | C9orf50 AA Position |
|---|---|
| Proline | 33, 325 |
| Leucine | 147, 155, 158, 280, 285, 321, 328 |
| Phenylalanine | 231, 275 |
| Arginine | 272, 286 |
| Valine | 273, 313 |
| Alanine | 267 |
| Aspartic Acid | 277 |
| Glutamic Acid | 278, 289 |
| Threonine | 279 |
| Tyrosine | 287 |
| Tryptophan | 288 |
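The counts quoted for conserved residues can be checked against the table. The sketch below assumes that the NTD (193 residues), DUF4685 (103 residues), and CTD (135 residues) are contiguous and in that order, which would place DUF4685 at residues 194 to 296. The source does not give the domain boundaries, so these coordinates are an assumption made for illustration.

```python
# Conserved positions copied from the table
conserved = {
    "Proline": [33, 325],
    "Leucine": [147, 155, 158, 280, 285, 321, 328],
    "Phenylalanine": [231, 275],
    "Arginine": [272, 286],
    "Valine": [273, 313],
    "Alanine": [267],
    "Aspartic Acid": [277],
    "Glutamic Acid": [278, 289],
    "Threonine": [279],
    "Tyrosine": [287],
    "Tryptophan": [288],
}

# Assumed boundaries: residues 1-193 NTD, 194-296 DUF4685, 297-431 CTD.
# These are inferred from the section lengths, not stated in the source.
DUF_START, DUF_END = 194, 296

all_positions = [p for positions in conserved.values() for p in positions]
in_domain = [p for p in all_positions if DUF_START <= p <= DUF_END]
most_conserved = max(conserved, key=lambda aa: len(conserved[aa]))

print(len(all_positions), len(in_domain), most_conserved)  # 22 14 Leucine
```

Under that assumption the tally reproduces the figures in the text: 22 conserved positions, 14 of them inside DUF4685, with leucine the most frequent conserved residue.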

Mutations

Post-translational modifications (PTMs) and secondary structure of C9orf50. PTMs for C9orf50 were found using the tools listed on the Expasy Protein Modifications site. The secondary structure was predicted using GOR, COILS, CFSSP, JPRED, and SOPMA. Helices (green cylinders), beta sheets (blue arrows), and turns (pink arrows) were included in the conceptual translation if they had a high prediction score; structures found by more than one analysis tool were also kept. The protein has no transmembrane sequences.

Common variants in C9orf50 were found with NCBI SNPGeneView. [22]

| dbSNP rs# Cluster ID | Function | dbSNP Allele | Amino Acid Position |
|---|---|---|---|
| rs146521610 | Synonymous | V → G | 317 |
| rs566893379 | Synonymous | S → T | 310 |
| rs111868243 | Synonymous | S → A | 258 |
| rs918165 | Missense | K → A | 248 |
| rs141573674 | Missense | S → A | 201 |
| rs759058008 | Frameshift | Deleted L | 189 |
| rs111606531 | Synonymous | A → T | 86 |
| rs146618124 | Missense | S → C | 52 |
| rs372378735 | Synonymous | G → A | 45 |
| rs751493011 | Nonsense | Insert T | 11 |

References

  1. GRCh38: Ensembl release 89: ENSG00000179058. Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000044320. Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "uncharacterized protein C9orf50 [Homo sapiens] - Protein - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-02-25.
  6. "Gene: C9orf50 (ENSG00000179058) - Summary - Homo sapiens - Ensembl genome browser 95". uswest.ensembl.org. Retrieved 2019-02-25.
  7. "C9orf50 Gene". www.genecards.org. Retrieved 2019-02-25.
  8. "C9orf50 chromosome 9 open reading frame 50 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-02-25.
  9. "C9orf50 chromosome 9 open reading frame 50 [Homo sapiens (human)] - Gene - NCBI".
  10. Gene https://www.ncbi.nlm.nih.gov/gene/375759
  11. "ExPASy - Compute pI/Mw tool".
  12. "EBI Tools: Job not available".
  13. "Genomatix: Login Page".
  14. "The Mfold Web Server | mfold.rit.albany.edu".
  15. "Gds3113 / 115495".
  16. "Anti-C9orf50 antibody produced in rabbit Prestige Antibodies Powered by Atlas Antibodies, affinity isolated antibody, buffered aqueous glycerol solution | Sigma-Aldrich".
  17. "Gene Detail :: Allen Brain Atlas: Mouse Brain".
  18. "EST Profile - Hs.124223".
  19. "WoLF PSORT: Advanced Protein Subcellular Localization Prediction Tool - GenScript".
  20. "Protein BLAST: search protein databases using a protein query". blast.ncbi.nlm.nih.gov. Retrieved 2019-02-25.
  21. "EBI Tools: Error".
  22. "SNP linked to Gene (geneID:375759) Via Contig Annotation".