Chromosome 9 open reading frame 43

C9orf43
Identifiers
Aliases C9orf43 , chromosome 9 open reading frame 43
External IDs MGI: 3045314 HomoloGene: 51897 GeneCards: C9orf43
Orthologs
Species Human Mouse
Entrez
Ensembl
UniProt
RefSeq (mRNA)

NM_001278629
NM_001278630
NM_152786

NM_177607
NM_001356420
NM_001356421
NM_001356422
NM_001377106

RefSeq (protein)

NP_001265558
NP_001265559
NP_689999

NP_001343349
NP_001343350
NP_001343351
NP_001364035
NP_808275

Location (UCSC) Chr 9: 113.41 – 113.43 Mb Chr 4: 62.44 – 62.47 Mb
PubMed search [3] [4]

Chromosome 9 open reading frame 43 is a protein that in humans is encoded by the C9orf43 gene. [5] The gene is also known as MGC17358 and LOC257169. C9orf43 contains a domain of unknown function, DUF4647, and a polyglutamine repeat region, although the function of the protein is not well understood. [6]

Gene

Location

C9orf43 is located on the long arm of chromosome 9 at 9q32 and is expressed on the positive strand. [7] [8] The genomic sequence starts at 113,410,054 bp and ends at 113,429,684 bp. [8] The gene neighborhood of C9orf43 contains 5 other genes: HDHD3, ALAD, POLE3, RGS3, and LOC105376222. [9]
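
The length of the genomic span follows directly from the two coordinates above. The sketch below assumes the coordinates are 1-based and inclusive, a common genome-browser convention that the source does not state; the function name `span_length` is illustrative only.

```python
# Start and end coordinates quoted in the text for C9orf43 on chromosome 9.
START_BP = 113_410_054
END_BP = 113_429_684


def span_length(start: int, end: int) -> int:
    """Length in base pairs of a closed interval [start, end].

    Assumes 1-based, inclusive coordinates (an assumption, not stated
    in the source), so both endpoints are counted.
    """
    if end < start:
        raise ValueError("end coordinate must not precede start coordinate")
    return end - start + 1


length_bp = span_length(START_BP, END_BP)
print(f"C9orf43 genomic span: {length_bp:,} bp (~{length_bp / 1000:.1f} kb)")
```

Under that assumption the span is 19,631 bp; with half-open coordinates it would be one base shorter.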

Promoter

The promoter region of C9orf43 predicted by Genomatix ElDorado is 1199 base pairs long and contains a CpG island and part of the 5' UTR. [10] Predicted transcription factor binding sites include those for the zinc finger GC-box factors SP1/GC and for HOX-PBX complexes, which are associated with development.

Expression pattern

C9orf43 is expressed in humans at 0.7 times the level of the average gene. [11] C9orf43 is expressed in many different tissues during the fetal and adult developmental stages according to NCBI's EST Profile. [12] The highest expression is seen in the testes, with lower relative expression in the umbilical cord, cervix, and brain. Expression is also seen in the fetal liver, lung, liver, brain, heart, spinal cord, pancreas, thymus, salivary gland, and placenta. [13]

C9orf43 is expressed in health states including normal tissue, cervical tumors, and soft tissue/muscle tumors. [12] C9orf43 is expressed at high levels in normal sperm cells, with lower relative expression in teratozoospermia. [14] Greater relative expression of C9orf43 is seen in the hyperplastic enlarged lobular unit of the mammary gland than in the normal terminal duct lobular unit. [15] C9orf43 expression varies during megakaryocyte differentiation, as seen in peripheral blood CD34+ cells compared to CHRF-288-11 cells. [16]

mRNA

Variants

The C9orf43 gene produces 10 alternatively spliced mRNAs, 8 of which are annotated as encoding good proteins. [11] The mRNA discussed in this entry has accession number NM_001278629.1 and differs from the other mRNA variants in truncation of the 5' and 3' ends and in the inclusion or exclusion of six cassette exons. [9]

Transcriptional regulation

376 bp of C9orf43 are antisense to the spliced gene POLE3, indicating possible regulation of alternate expression. [11] A short open reading frame upstream of the main open reading frame may reduce the efficiency of translation. The 3' untranslated region of C9orf43 contains 3 predicted stem-loop structures, and 4 miRNAs are predicted to bind it. [17] [18]

The gene neighborhood of C9orf43 constructed by NCBI Gene. POLE3 antisense expression is shown.

Protein

Characteristics and Isoforms

C9orf43 has three known complete isoforms and four partial isoforms. [11] Isoform X1, with accession number NP_001265558.1, is the isoform discussed in this entry; it is 461 amino acids long and is encoded by 16 exons. [9] Isoform X1 has a predicted molecular weight of 52.2 kDa and a predicted isoelectric point of 8.037. [19] [20] The protein abundance of isoform X1 is predicted to be normal, with normal expression. [21]
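
As a rough sanity check on the predicted molecular weight, the length of the isoform can be multiplied by an average residue mass. The sketch below assumes the common rule of thumb of about 110 Da per amino acid residue; this is an approximation introduced here for illustration, not the method used by the cited prediction tools, and `approx_mass_kda` is a hypothetical helper name.

```python
# Rule-of-thumb average mass of one amino acid residue in a protein.
# This value is an assumption for illustration; real tools sum the
# exact masses of the residues in the actual sequence.
AVG_RESIDUE_MASS_DA = 110.0

ISOFORM_X1_LENGTH = 461        # amino acids, from the text
PREDICTED_MASS_KDA = 52.2      # predicted molecular weight, from the text


def approx_mass_kda(n_residues: int, avg_residue_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Estimate protein mass in kDa from its length alone."""
    if n_residues < 0:
        raise ValueError("residue count must be non-negative")
    return n_residues * avg_residue_da / 1000.0


estimate = approx_mass_kda(ISOFORM_X1_LENGTH)
difference = PREDICTED_MASS_KDA - estimate
print(f"length-based estimate: {estimate:.1f} kDa")
print(f"difference from quoted prediction: {difference:.1f} kDa")
```

The length-based estimate lands within a few percent of the quoted 52.2 kDa, which is the level of agreement such a coarse approximation can be expected to give.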

C9orf43 is seen to diverge quickly relative to cytochrome c but slowly relative to fibrinopeptides.

Structure

Composition analysis of C9orf43 showed an above-average frequency of glutamine, while all other amino acids were in the normal range for human proteins. [23] DUF4647, a domain of unknown function, spans amino acids 1-454. [24] Globular domains are predicted at amino acids 193-361 and 375-461. [25] The secondary structure of C9orf43 is predicted to be 36% alpha helix and 3% beta sheet, with 73% of the protein predicted to be disordered. [26] Secondary structure predictions are conserved in orthologous proteins.
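
A composition analysis of the kind described above amounts to counting how often each amino acid occurs in the sequence. The following is a minimal sketch of that idea and not the SAPS tool itself; the short sequence `TOY_SEQUENCE` is a made-up example and is not the C9orf43 sequence.

```python
from collections import Counter

# Hypothetical toy sequence for illustration only (NOT C9orf43).
TOY_SEQUENCE = "MSQQQQLKEQQAPQ"


def composition(seq: str) -> dict:
    """Return the fraction of each amino acid (one-letter code) in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


fractions = composition(TOY_SEQUENCE)
# List residues from most to least frequent.
for aa, frac in sorted(fractions.items(), key=lambda item: -item[1]):
    print(f"{aa}: {frac:.1%}")
```

A real analysis would compare each fraction against a reference distribution for human proteins to decide whether a residue such as glutamine is over-represented.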

Homology / Evolution

C9orf43 has no known paralogs but has orthologs ranging across mammalian orders to the reptile Crocodylus porosus. [22] [27] No orthologs have been found in invertebrate or plant species. C9orf43 divergence is moderate based on the molecular clock hypothesis. [22] C9orf43 is seen to diverge more quickly than cytochrome c but more slowly than the fast-evolving fibrinopeptides. A table of orthologous proteins constructed using BLAST contains a small subset of orthologs to exhibit the diversity of C9orf43. [27]

| Genus and Species [27] | Common Name | Taxonomy | Accession number [27] | Date of Divergence (MYA) [22] | Percent Identity [27] |
|---|---|---|---|---|---|
| Aotus nancymaae | Ma's night monkey | Primates | XP_009186498.1 | 42.6 | 79% |
| Pteropus alecto | Black flying fox | Chiroptera | XP_015448812.1 | 94 | 62% |
| Bos taurus | Cow | Cetartiodactyla | XP_01532817.9 | 94 | 57% |
| Erinaceus europaeus | European hedgehog | Soricomorpha | XP_007526825.1 | 94 | 54% |
| Phascolarctos cinereus | Koala | Diprotodontia | XP_020821154.1 | 159 | 44% |
| Ursus maritimus | Polar bear | Carnivora | XP_008706212.1 | 94 | 43% |
| Monodelphis domestica | Gray short-tailed opossum | Didelphimorphia | XP_007475046.1 | 160 | 41% |
| Sarcophilus harrisii | Tasmanian devil | Dasyuromorphia | XP_003757404.3 | 160 | 32% |
| Crocodylus porosus | Saltwater crocodile | Reptilia | XP_019396109.1 | 312 | 30% |
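
A molecular clock comparison usually plots some measure of sequence divergence against the date of divergence. The sketch below uses the dates and percent identities listed in the table above and applies one common correction for multiple substitutions, m = -ln(1 - n) x 100, where n is the fraction of differing residues. The source does not say which correction was used for its figure, so this formula, the assumption that the dates are in millions of years ago (MYA), and the name `corrected_divergence` are all illustrative.

```python
import math

# (species, date of divergence, percent identity) as listed in the table.
# Dates are assumed to be in millions of years ago (MYA).
ORTHOLOGS = [
    ("Aotus nancymaae", 42.6, 79),
    ("Pteropus alecto", 94, 62),
    ("Bos taurus", 94, 57),
    ("Erinaceus europaeus", 94, 54),
    ("Phascolarctos cinereus", 159, 44),
    ("Ursus maritimus", 94, 43),
    ("Monodelphis domestica", 160, 41),
    ("Sarcophilus harrisii", 160, 32),
    ("Crocodylus porosus", 312, 30),
]


def corrected_divergence(percent_identity: float) -> float:
    """Corrected divergence m = -ln(1 - n) * 100, with n = 1 - identity."""
    if not 0 < percent_identity <= 100:
        raise ValueError("percent identity must be in (0, 100]")
    n = 1.0 - percent_identity / 100.0   # observed fraction of differences
    return -math.log(1.0 - n) * 100.0


print(f"{'Species':<26}{'MYA':>7}{'m':>8}")
for species, mya, identity in ORTHOLOGS:
    print(f"{species:<26}{mya:>7.1f}{corrected_divergence(identity):>8.1f}")
```

Plotting m against the divergence date for C9orf43 alongside the same quantity for cytochrome c and fibrinopeptides would reproduce the kind of comparison the text describes.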

Post-Translational Modification

SUMOylation, O-linked glycosylation, N-acetylglucosamine addition, and phosphorylation are predicted post-translational modifications of C9orf43. [28]

Subcellular Localization

C9orf43 is predicted to be intracellular, with a nuclear localization signal that is conserved across orthologs. [25] The protein contains no signal peptide or mitochondrial targeting signal, indicating that it is not predicted to be secreted or targeted to the mitochondria. [29]

Interacting Proteins

C9orf43 is predicted to interact with the olfactory receptors OR7D2 and OR4X2; this interaction is considered likely because an olfactory-associated zinc finger protein is predicted to bind the C9orf43 upstream promoter. [30] A predicted interaction with CATSPER3, a sperm-associated voltage-gated calcium channel, is also considered likely because C9orf43 is highly expressed in sperm. [31]

Clinical Significance

C9orf43 has no known disease associations; however, the polyglutamine repeat region is similar to genetic precursors in trinucleotide repeat disorders. [32] An increase in the length of the polyglutamine repeat region is seen in diseases such as Huntington's disease and spinocerebellar ataxia.
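
Because repeat length is the quantity of interest in polyglutamine disorders, a simple way to inspect a protein sequence is to find its longest uninterrupted run of glutamine (Q). The sketch below is illustrative only: `longest_polyq_run` is a hypothetical helper, and the toy sequence is made up and is not the C9orf43 sequence.

```python
import re

# Hypothetical toy sequence for illustration only (NOT C9orf43).
TOY_SEQUENCE = "MSQQQQLKEQQAPQ"


def longest_polyq_run(protein_seq: str) -> int:
    """Length of the longest uninterrupted run of glutamine (Q) residues."""
    runs = re.findall(r"Q+", protein_seq.upper())
    return max((len(run) for run in runs), default=0)


print(f"longest polyQ run: {longest_polyq_run(TOY_SEQUENCE)} residues")
```

Applied to sequences from different individuals, the same function would show whether the repeat region varies in length, which is the kind of polymorphism the cited survey of CAG-encoded polyglutamine tracts examines.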

References

  1. GRCh38: Ensembl release 89: ENSG00000157653 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000058046 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "Entrez Gene: Chromosome 9 open reading frame 43".
  6. Butland SL, Devon RS, Huang Y, Mead CL, Meynert AM, Neal SJ, Lee SS, Wilkinson A, Yang GS, Yuen MM, Hayden MR, Holt RA, Leavitt BR, Ouellette BF (May 2007). "CAG-encoded polyglutamine length polymorphism in the human genome". BMC Genomics. 8: 126. doi:10.1186/1471-2164-8-126. PMC 1896166. PMID 17519034.
  7. "BLAT".
  8. Database, GeneCards Human Gene. "C9orf43 Gene - GeneCards | CI043 Protein | CI043 Antibody". www.genecards.org. Retrieved 2018-05-05.
  9. "NCBI gene".
  10. "El Dorado".
  11. "AceView".
  12. "NCBI, Unigene EST Profile".
  13. "GDS3113 / 167802". www.ncbi.nlm.nih.gov. Retrieved 2018-05-05.
  14. "GDS2697 / 1554280_a_at". www.ncbi.nlm.nih.gov. Retrieved 2018-05-05.
  15. "GDS2739 / Hs2.190121.2.S1_3p_s_at". www.ncbi.nlm.nih.gov. Retrieved 2018-05-05.
  16. "GDS2926 / 3613". www.ncbi.nlm.nih.gov. Retrieved 2018-05-05.
  17. "miRDB Search Result Details". www.mirdb.org. Retrieved 2018-05-05.
  18. "No query string". unafold.rna.albany.edu. Retrieved 2018-05-05.
  19. "Expression of C9orf43 in cancer - Summary - The Human Protein Atlas". www.proteinatlas.org. Retrieved 2018-05-05.
  20. Kozlowski, Lukasz P. "IPC - ISOELECTRIC POINT CALCULATION OF PROTEINS AND PEPTIDES". isoelectric.org. Retrieved 2018-05-06.
  21. "PaxDb".
  22. "TimeTree :: The Timescale of Life". www.timetree.org. Retrieved 2018-05-05.
  23. "SAPS".
  24. "C9orf43 - Uncharacterized protein C9orf43 - Homo sapiens (Human) - C9orf43 gene & protein". www.uniprot.org. Retrieved 2018-05-05.
  25. "ELM".
  26. "Phyre 2 Results for Undefined". www.sbg.bio.ic.ac.uk. Archived from the original on 2018-05-06. Retrieved 2018-05-05.
  27. "Protein BLAST: search protein databases using a protein query". blast.ncbi.nlm.nih.gov. Retrieved 2018-05-05.
  28. EMBL-EBI. "EBI Tools: Job not available". www.ebi.ac.uk. Retrieved 2018-05-05.
  29. "SignalP 4.1 Server". www.cbs.dtu.dk. Retrieved 2018-05-06.
  30. "Genomatix: Login Page". www.genomatix.de. Retrieved 2018-05-05.
  31. "STRING: functional protein association networks". string-db.org. Retrieved 2018-05-05.
  32. Orr HT, Zoghbi HY (2007). "Trinucleotide repeat disorders". Annual Review of Neuroscience. 30: 575–621. doi:10.1146/annurev.neuro.29.051605.113042. PMID 17417937.
