C9orf50 | Identifiers |
---|---|
Aliases | C9orf50, chromosome 9 open reading frame 50 |
External IDs | MGI: 1923631; HomoloGene: 18859; GeneCards: C9orf50 |
Chromosome 9 open reading frame 50 is a protein that in humans is encoded by the C9orf50 gene. [5] C9orf50 has one other known alias, FLJ35803. [6] In humans the gene has been reported as 10,051 base pairs long; it is transcribed into an mRNA of 1,624 bases that encodes a 431-amino-acid protein.
The gene is located on the negative strand of chromosome 9 at band 9q34.11, with a genomic span of 8,552 base pairs [7] covering chr9:132,374,504-132,383,055. [8] Its neighboring genes include ASB6, which lies immediately before C9orf50 on the negative strand, and NTMT1, on the positive strand, which is more than twice the size of C9orf50. [1] [2]
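The length figures are easier to compare once the arithmetic is explicit. A minimal sketch, assuming the coordinates are 1-based and inclusive (the usual reading of chr:start-end notation): the inclusive span works out to 8,552 bp, while a 431-residue protein needs only 3 nucleotides per residue plus a 3-nucleotide stop codon, so most of the gene span must be non-coding (introns and UTRs). The helper names are illustrative, not from any tool.

```python
# Coordinates quoted in the text; assumed 1-based and inclusive.
START = 132_374_504
END = 132_383_055


def span_length(start: int, end: int) -> int:
    """Length of an inclusive, 1-based genomic interval."""
    return end - start + 1


def min_cds_length(n_amino_acids: int) -> int:
    """Nucleotides needed to encode a protein: one codon per residue plus a stop codon."""
    return 3 * (n_amino_acids + 1)


gene_span = span_length(START, END)  # 8552
cds = min_cds_length(431)            # 1296
print(gene_span, cds, f"{100 * cds / gene_span:.1f}% of the span is coding")
```

The 1,296-nucleotide coding region also fits inside the 1,624-base mRNA, leaving a few hundred bases for the 5' and 3' UTRs.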
The C9orf50 protein consists of 431 amino acids, with a molecular weight of 47,639 Da (about 47.6 kDa) and a predicted isoelectric point of 10.38. [7] It contains the conserved domain DUF4685 (pfam15737), whose function is not well understood and which is conserved in vertebrates. The protein is encoded by 7 exons.
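Molecular weight and pI values like these come from standard sequence-based calculations. The sketch below shows the general approach under stated assumptions: average residue masses of the kind tabulated by ExPASy, and an EMBOSS-like pKa set. It is not the tool used by the cited source, and because tools differ in their pKa tables it will not reproduce 10.38 exactly. The full C9orf50 sequence is not given here, so the example uses toy peptides.

```python
# Average residue masses in daltons (assumed values, ExPASy-style averages).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water per chain (free N- and C-termini)

# Assumed EMBOSS-like pKa values; other tools use different sets.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight in Da: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    seq = seq.upper()
    pos = 1 / (1 + 10 ** (ph - PKA_POS["Nterm"]))
    pos += sum(seq.count(aa) / (1 + 10 ** (ph - PKA_POS[aa])) for aa in "KRH")
    neg = 1 / (1 + 10 ** (PKA_NEG["Cterm"] - ph))
    neg += sum(seq.count(aa) / (1 + 10 ** (PKA_NEG[aa] - ph)) for aa in "DECY")
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH: net charge falls monotonically as pH rises, so find the zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"{molecular_weight('GA'):.2f} Da")  # dipeptide toy example
print(f"{isoelectric_point('KKKK'):.2f}", f"{isoelectric_point('DDDD'):.2f}")
```

Dividing the reported 47,639 Da by 1,000 gives the 47.6 kDa used in the table that follows.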
C9orf50 has 9 splice isoforms and 11 transcript variants; the most common are isoform 1 and transcript variant 1. [9]
The protein can be analyzed as a whole or split into three parts: the N-terminal domain (NTD) of 193 residues, DUF4685 of 103 residues, and the C-terminal domain (CTD) of 135 residues. The pI of the full protein (10.38) is similar to the mean pI of the NTD, DUF4685 and CTD (10.47). Of the three segments, the NTD has the highest pI and the highest molecular weight; the latter reflects its length, since it contains 193 of the 431 residues. [10] [11]
Segment | pI | Molecular weight (kDa) | Residues |
---|---|---|---|
Whole protein (human) | 10.38 | 47.6 | 431 |
NTD | 11.14 | 21.1 | 193 |
DUF4685 | 10.8 | 11.8 | 103 |
CTD | 9.47 | 14.7 | 135 |
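A quick arithmetic check of the table, as a sketch: the segment lengths sum to the full 431 residues, the segment molecular weights sum to the whole-protein value, and the simple mean of the three segment pI values is 10.47. A plain average is only a rough comparison, because the pI of the whole protein is set by the combined charge of all its residues, not by the mean of the segment pIs.

```python
# (pI, molecular weight in kDa, residues) copied from the table.
SEGMENTS = {
    "NTD": (11.14, 21.1, 193),
    "DUF4685": (10.8, 11.8, 103),
    "CTD": (9.47, 14.7, 135),
}
WHOLE = (10.38, 47.6, 431)

total_residues = sum(r for _, _, r in SEGMENTS.values())
total_mw = round(sum(mw for _, mw, _ in SEGMENTS.values()), 1)
mean_pi = round(sum(pi for pi, _, _ in SEGMENTS.values()) / len(SEGMENTS), 2)

print(total_residues, total_mw, mean_pi)  # compare with WHOLE
```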
Compositional analysis of the C9orf50 protein shows low amounts of I, M and Y, and of the combined FIKMNY group, relative to human proteins in general, and high amounts of R together with a high KR-ED value (more basic than acidic residues). No charge clusters, high-scoring charged or uncharged segments, charge runs, patterns, or high-scoring hydrophobic or transmembrane segments were found. Three unique cysteine spacings were found, at positions 161, 190 and 342. The protein also contains three repeated motifs: PRLP_KLT, starting at positions 30 and 78; SLLP, at positions 99 and 398; and KAAL, at positions 250 and 303. [12]
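The composition and repeat results above can be illustrated with a small sketch. It assumes that "_" in PRLP_KLT stands for any single residue (written as `.` in a regular expression) and that positions are 1-based. The toy sequences are hypothetical, not C9orf50, and this is not the algorithm of the tool cited.

```python
import re
from collections import Counter, defaultdict


def composition(seq: str) -> dict:
    """Percentage of each residue in the sequence."""
    counts = Counter(seq)
    return {aa: 100 * n / len(seq) for aa, n in sorted(counts.items())}


def repeated_kmers(seq: str, k: int) -> dict:
    """Map every k-mer that occurs more than once to its 1-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def motif_positions(seq: str, pattern: str) -> list:
    """1-based starts of a regex motif; the lookahead also catches overlapping hits."""
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]


print(repeated_kmers("ASLLPGGSLLPA", 4))
print(motif_positions("PRLPAKLTXXPRLPGKLT", "PRLP.KLT"))
```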
Tertiary structures of the C9orf50 protein can be predicted with I-TASSER. The tool returned 5 models; the two with the highest C-scores (I-TASSER's confidence estimate, where higher values indicate greater confidence) scored -1.27 and -3.25.
The promoter region of C9orf50 was identified with the Genomatix Gene2Promoter search engine. [13] The search returned 6 candidate promoter regions, only 2 of which were supported by transcripts and CAGE tags. The best-supported promoter region spans 1,962 bases, is conserved in 6 of 8 orthologous loci, and is supported by 945 CAGE tags. Its transcription start site lies at position 1,503, based on a 7-exon transcript supported by 118 CAGE tags. [13]
Hundreds of transcription factors are predicted to bind the promoter region; the promoter region transcription factors table highlights 20 of them.
The most stable predicted intramolecular base-paired structure of the C9orf50 5' UTR, that is, the one with the lowest (most negative) free energy, has ΔG = -323.4 kcal/mol. [14] The most stable structure predicted for the 3' UTR has ΔG = -127.5 kcal/mol, indicating that it is less stable than the 5' UTR.
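Structure-prediction tools like the one cited search for the fold with minimum free energy using nearest-neighbor thermodynamic parameters. The sketch below swaps in a simpler classic technique, Nussinov base-pair maximization, which uses the same kind of dynamic programming over subsequences but counts base pairs instead of computing ΔG. It is meant only to show how such a recursion works, not to reproduce the values above.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, with hairpin loops of at least min_loop bases."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: best pair count for seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov("GGGAAACCC"))  # a short hairpin: three G-C pairs around an AAA loop
```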
RNA-seq data show that C9orf50 is expressed at low levels in most human tissues, in the 25th to 50th percentile relative to all human genes. [15] It is most highly expressed in the testes, brain and gallbladder. [9] Protein-level expression of C9orf50 is reported to be higher than its RNA-level expression. [16] In situ hybridization data for the mouse ortholog, 1700001O22Rik, were compared with those for the ubiquitously expressed beta-actin, and the analysis shows similar expression patterns in the mouse brain. [17] The protein is also detected at fetal stages of development. [18]
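A percentile statement such as "25th to 50th percentile" ranks one gene's expression against all genes in a tissue. A minimal sketch of one common definition (percentage of values strictly below the query); the expression values are hypothetical, and expression atlases may use a different convention.

```python
from bisect import bisect_left


def percentile_rank(values: list, x: float) -> float:
    """Percentage of values strictly below x."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    return 100 * bisect_left(ordered, x) / len(ordered)


# Hypothetical per-gene expression values (e.g. TPM) in a single tissue.
tissue_tpm = [0.0, 0.5, 1.2, 3.4, 8.0, 15.0, 40.0, 120.0]
print(percentile_rank(tissue_tpm, 2.0))  # a gene at 2.0 TPM
```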
The protein is found primarily in the nucleus and, to a lesser extent, in mitochondria and the cytosol. [19]
There are no known paralogs of C9orf50. Orthologs are conserved across most mammalian lineages; the most distant is the opossum (infraclass Marsupialia), which diverged from the human lineage about 159 million years ago. [20] The gene has not been found in reptiles, amphibians, birds, or any other lineage that diverged before mammals. A list of mammals in which C9orf50 is conserved is shown below.
Common Name | Taxonomic Group | Divergence from Humans (MYA) | NCBI Accession # | Protein Length (AA) | Sequence Identity to Humans (%) |
---|---|---|---|---|---|
Human | Hominini | 0 | NP_955382.3 | 431 | 100 |
Chimpanzee | Primates | 6.65 | XP_016817319.1 | 431 | 97.22 |
Gorilla | Primates | 9.06 | XP_018889539.1 | 435 | 93.17 |
Deer Mouse | Rodentia | 90 | XP_006983488.1 | 391 | 46.14 |
Prairie Vole | Rodentia | 90 | XP_005346778.1 | 370 | 45.18 |
American Pika | Lagomorpha | 90 | XP_004593748.1 | 579 | 38.11 |
Narrow-Ridged Finless Porpoise | Cetacea | 96 | XP_024617982.1 | 473 | 56.71 |
Killer Whale | Cetacea | 96 | XP_012388229.1 | 343 | 59.34 |
Alpaca | Artiodactyla | 96 | XP_006205645.1 | 399 | 53.83 |
Black Flying Fox | Chiroptera | 96 | XP_015449607.1 | 432 | 53.21 |
Egyptian Fruit Bat | Chiroptera | 96 | XP_015989428.1 | 431 | 53.01 |
Goat | Artiodactyla | 96 | XP_017910228.1 | 438 | 52.4 |
Northern Fur Seal | Carnivora | 96 | XP_025744313.1 | 441 | 52.36 |
Grizzly Bear | Carnivora | 96 | XP_026369526.1 | 447 | 50.63 |
European Hedgehog | Erinaceomorpha | 96 | XP_007527129.1 | 419 | 51.42 |
Star-Nosed Mole | Soricomorpha | 96 | XP_012576659.1 | 383 | 48.68 |
Southern White Rhinoceros | Perissodactyla | 96 | XP_014637447.1 | 489 | 47.25 |
African Bush Elephant | Proboscidea | 105 | XP_023401069.1 | 527 | 49.31 |
Nine-Banded Armadillo | Cingulata | 105 | XP_023443586.1 | 476 | 46.72 |
Gray Short-Tailed Opossum | Didelphimorphia | 159 | XP_007475193.1 | 583 | 32.56 |
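Percent identity depends on how it is counted. A minimal sketch of one common convention (identical, non-gap columns divided by the full alignment length, so gaps count as mismatches); the cited values may have been computed with a different denominator, such as the length of the shorter sequence. The aligned strings are toy inputs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical non-gap columns divided by alignment length, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100 * matches / len(aligned_a)


print(f"{percent_identity('MKT-LA', 'MKTALA'):.2f}")  # one gap column out of six
```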
C9orf50 is predicted to evolve more quickly than other well-characterized proteins, including cytochrome c, hemoglobin beta, and fibrinogen alpha chain.
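A common way to compare evolutionary rates is to convert percent identity into a corrected distance and divide by divergence time. A minimal sketch, assuming the simple Poisson correction d = -ln(1 - p), where p is the fraction of differing sites, and using a few rows of the ortholog table; the source does not state which correction it used.

```python
import math

# (species, divergence from human in MYA, % identity to human) from the ortholog table.
ORTHOLOGS = [
    ("Chimpanzee", 6.65, 97.22),
    ("Deer Mouse", 90, 46.14),
    ("African Bush Elephant", 105, 49.31),
    ("Gray Short-Tailed Opossum", 159, 32.56),
]


def poisson_distance(identity_fraction: float) -> float:
    """Poisson-corrected substitutions per site: d = -ln(1 - p) with p = 1 - identity."""
    return -math.log(identity_fraction)


for name, mya, pct in ORTHOLOGS:
    d = poisson_distance(pct / 100)
    # Both lineages accumulate changes after the split, hence 2 * T.
    rate = d / (2 * mya)
    print(f"{name:26s} d = {d:.3f}  rate = {rate:.5f} substitutions/site/MY")
```

Plotting d against divergence time for C9orf50 and for slowly evolving proteins such as cytochrome c gives steeper or shallower slopes; a steeper slope indicates faster evolution.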
Important amino acids were defined as those on the 100% consensus line produced by MView from a multiple sequence alignment of strict orthologs. [21] Amino acids in red represent conserved amino acids in DUF4685. Of the 22 highly conserved amino acids, 14 lie within this domain. Leucine occupies the most conserved positions (7 of 22).
Conserved Amino Acids | C9orf50 AA Position |
---|---|
Proline | 33, 325 |
Leucine | 147, 155, 158, 280, 285, 321, 328 |
Phenylalanine | 231, 275 |
Arginine | 272, 286 |
Valine | 273, 313 |
Alanine | 267 |
Aspartic Acid | 277 |
Glutamic Acid | 278, 289 |
Threonine | 279 |
Tyrosine | 287 |
Tryptophan | 288 |
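The "14 of 22" figure can be checked against the domain lengths given earlier. A sketch, assuming the NTD, DUF4685 and CTD are contiguous, so that DUF4685 occupies residues 194 to 296; the exact domain boundaries are not stated in the text and are inferred here.

```python
# Fully conserved positions from the table, keyed by one-letter residue code.
CONSERVED = {
    "P": [33, 325],
    "L": [147, 155, 158, 280, 285, 321, 328],
    "F": [231, 275],
    "R": [272, 286],
    "V": [273, 313],
    "A": [267],
    "D": [277],
    "E": [278, 289],
    "T": [279],
    "Y": [287],
    "W": [288],
}

# Assumption: contiguous segments of 193 (NTD), 103 (DUF4685) and 135 (CTD) residues.
NTD_LEN, DUF_LEN = 193, 103
DUF_START = NTD_LEN + 1      # 194
DUF_END = NTD_LEN + DUF_LEN  # 296

positions = sorted(p for ps in CONSERVED.values() for p in ps)
in_domain = [p for p in positions if DUF_START <= p <= DUF_END]
print(f"{len(in_domain)} of {len(positions)} conserved positions fall in {DUF_START}-{DUF_END}")
```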
Common variants in C9orf50 were found with NCBI SNPGeneView. [22]
dbSNP rs# Cluster ID | Function | dbSNP Allele | Amino Acid Position |
---|---|---|---|
rs146521610 | Synonymous | V → G | 317 |
rs566893379 | Synonymous | S → T | 310 |
rs111868243 | Synonymous | S → A | 258 |
rs918165 | Missense | K → A | 248 |
rs141573674 | Missense | S → A | 201 |
rs759058008 | Frameshift | Deleted L | 189 |
rs111606531 | Synonymous | A → T | 86 |
rs146618124 | Missense | S → C | 52 |
rs372378735 | Synonymous | G → A | 45 |
rs751493011 | Nonsense | Insert T | 11 |
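The Function column follows from how a single-nucleotide change alters the codon it falls in. A minimal sketch using the standard genetic code (NCBI translation table 1): the same amino acid means synonymous, a new stop codon means nonsense, and any other change means missense. Insertions and deletions such as the frameshift entry need the surrounding reading frame and are outside this sketch; the function name is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


print(classify_snv("GAA", "GAG"), classify_snv("AAA", "GAA"), classify_snv("TGG", "TAG"))
```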