C6orf47

C6ORF47 is a gene. In humans, it is on chromosome 6.

Gene

C6orf47
Identifiers
Aliases: C6orf47, D6S53E, G4, NG34, chromosome 6 open reading frame 47
External IDs: MGI: 90673; HomoloGene: 75155; GeneCards: C6orf47; OMA: C6orf47 - orthologs
Orthologs
Species | Human | Mouse
Entrez | |
Ensembl | |
UniProt | |
RefSeq (mRNA) | NM_021184 | NM_033477
RefSeq (protein) | NP_067007 | NP_258438
Location (UCSC) | Chr 6: 31.66 – 31.66 Mb | Chr 17: 35.35 – 35.35 Mb
PubMed search | [3] | [4]

General Information

In humans, chromosome 6 open reading frame 47 (C6ORF47) is a single-exon gene that spans 2,481 nucleotides and encodes a 294-amino-acid protein. [5] [6]

Location

In humans, this gene is located on the minus strand at 6p21.33. [7]

Gene Expression

NCBI GEO shows that C6ORF47 RNA is expressed ubiquitously, varying from low expression (in most tissues) to high expression in a few areas. The salivary gland and cerebellum showed high RNA expression levels.

C6ORF47 was found to be ubiquitously expressed across human tissues, and the gene is over-expressed in the colon, urinary bladder, ovary, and pancreas. [7] NCBI GEO Profiles shows that C6ORF47 RNA expression ranges from low (in most tissues) to high in a few areas, such as the salivary gland and cerebellum. [9]

Microarray data from research by Pontus Boström et al., collected from the macrophages of 4 healthy donors, include C6ORF47 mRNA expression. The goal of the study was to investigate whether hypoxia influences the accumulation of lipids in macrophages, which would help determine whether the lipid-loaded macrophages in atherosclerotic lesions arise because of the lesions' hypoxic regions. Human macrophages exposed to hypoxia for 24 hours showed increased formation of cytosolic lipid droplets and increased triglyceride accumulation, indicating that hypoxic regions in atherosclerotic lesions could contribute to the formation of lipid-loaded macrophages and the accumulation of triglycerides. [8] C6ORF47 expression was almost 6 times greater under non-hypoxic conditions than under hypoxic conditions. Because mainly essential processes are expected to remain active under hypoxia, this decrease suggests that C6ORF47 likely contributes neither to the lipid accumulation nor to an essential process. [10]

Transcription Factors

Below is a short list of transcription factors with binding sites in the promoter region, defined here as the 5' UTR plus the 500 nucleotides upstream of it. Bioline [11] software was used to generate the double-stranded DNA sequence, and the UCSC Genome Browser [12] was used to identify the transcription factors and binding sites listed in the table below.

Transcription Factor | Generalized Function
KLF17, Krüppel-like factor 17 | Regulates gene expression, influencing cell differentiation and development.
PROX1, Prospero homeobox 1 | Regulates lymphatic development, cell differentiation, and organogenesis processes.
WT1, Wilms' tumor 1 | Regulates kidney development, cell growth, and tissue differentiation processes.
GATA1, GATA binding protein 1 | Controls red blood cell development and regulates hematopoiesis processes.
THRB, Thyroid hormone receptor beta | Regulates thyroid hormone signaling, influencing metabolism and growth regulation.
ZNF454, Zinc Finger Protein 454 | Regulates gene expression, potentially influencing cell differentiation and development.
SP9, Specificity Protein 9 | Regulates cartilage development and skeletal patterning during embryogenesis.
EGR3, Early Growth Response 3 | Regulates gene expression involved in neuronal activity and immune response.
SOX4, SRY-box transcription factor 4 | Regulates cell fate, development, and differentiation in multiple tissues.
EBF1, Early B-cell Factor 1 | Regulates B cell differentiation and immune system development.
ZNF669, Zinc Finger Protein 669 | Regulates gene expression, potentially involved in development and differentiation.
KLF1, Krüppel-like factor 1 | Regulates red blood cell development and hemoglobin expression.
STAT3, Signal Transducer and Activator of Transcription 3 | Regulates immune response, cell survival, and inflammation processes.
ZIC3, Zinc Finger of the Cerebellum 3 | Regulates brain and heart development, influencing neuronal patterning and function.
NHLH2, Nescient helix-loop-helix 2 | Regulates neural differentiation and development, influencing nervous system patterning.
ZNF454, Zinc Finger Protein 454 | Involved in transcriptional regulation, potentially affecting gene expression and development.
EBF2, Early B-cell Factor 2 | Regulates adipocyte differentiation and energy metabolism, influencing fat tissue development.
ZNF42, Zinc Finger Protein 42 | Involved in regulating gene expression and cellular differentiation processes.
ERF::FIGLA, ETS2 Repressor Factor and Factor of Germline Alpha | Transcription factor complex that regulates ovarian development and folliculogenesis.
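
The promoter definition used above (the 5' UTR plus 500 upstream nucleotides) depends on strand, and C6ORF47 lies on the minus strand, where "upstream" means higher genomic coordinates. The following is a minimal sketch of how such a window could be computed and oriented. The function names and the example coordinates are hypothetical and are not taken from the Bioline or UCSC tools.

```python
# Translation table for complementing DNA while preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement, i.e. the sequence read on the other strand."""
    return dna.translate(_COMPLEMENT)[::-1]


def promoter_window(tss: int, utr5_len: int, upstream: int = 500,
                    strand: str = "-") -> tuple[int, int]:
    """Return 1-based inclusive forward-strand coordinates (start, end)
    covering `upstream` bases before the transcription start site (TSS)
    plus the 5' UTR, which begins at the TSS."""
    if strand == "+":
        return tss - upstream, tss + utr5_len - 1
    if strand == "-":
        # On the minus strand the gene runs toward lower coordinates,
        # so upstream sequence sits at higher coordinates than the TSS.
        return tss - utr5_len + 1, tss + upstream
    raise ValueError("strand must be '+' or '-'")


# Hypothetical coordinates, for illustration only.
start, end = promoter_window(tss=1000, utr5_len=50, strand="-")
print(start, end, end - start + 1)
print(reverse_complement("ATGC"))
```

For a minus-strand gene, the forward-strand slice would then be passed through `reverse_complement` so that the promoter reads 5' to 3' in the direction of transcription.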

Single-Nucleotide Polymorphisms (SNPs)

SNP | Position | Base Change | Amino Acid Change | Mutation Type | Significance | Clinical Significance
rs963273525 | Amino acid 1 | T>C | Met>Val | Missense | In start codon (CDS) | N/A
rs1800736098 | Base pair 8 | C>A | N/A | Transversion mutation | Transcription factor binding region (NHLH2) in the 5' UTR, conserved among all orthologs tested | N/A
rs1296872402 | Base pair 2425 | T>G | N/A | Transversion mutation | PolyA signal (3' UTR), conserved in all orthologs tested | N/A

The table above lists 3 SNPs located within the CDS, 5' UTR, and 3' UTR. These SNPs were found using Variation Viewer [13] and were chosen because of their location within the C6ORF47 gene. Variation Viewer showed no pathogenic SNPs, only large deletions that span many genes.
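
The start-codon SNP can look puzzling, because a T-to-C change cannot turn ATG into a valine codon. One plausible reading, assuming the alleles are reported on the genomic forward strand while C6ORF47 is transcribed from the minus strand, is that the change is A to G on the coding strand (ATG to GTG). The sketch below illustrates that conversion. The function names and the three-entry codon table are illustrative only.

```python
# Complement map for converting forward-strand alleles to the coding strand
# of a minus-strand gene (assumption: alleles are reported on the forward strand).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Only the codons needed for this illustration (standard genetic code).
CODON_TABLE = {"ATG": "Met", "GTG": "Val", "ACG": "Thr"}


def to_coding_strand(ref: str, alt: str) -> tuple[str, str]:
    """Complement a single-base change reported on the forward strand."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


def mutate_codon(codon: str, index: int, ref: str, alt: str) -> str:
    """Apply ref->alt at a 0-based codon position, checking the reference base."""
    if codon[index] != ref:
        raise ValueError(f"expected {ref} at position {index} of {codon}")
    return codon[:index] + alt + codon[index + 1:]


ref, alt = to_coding_strand("T", "C")          # forward-strand T>C
new_codon = mutate_codon("ATG", 0, ref, alt)   # first base of the start codon
print(ref, alt, new_codon, CODON_TABLE[new_codon])  # A G GTG Val
```

Read directly on the coding strand, T>C at the second base would instead give ACG (Thr), which is why the strand assumption matters here.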

Protein

Basic Information

Family

The C6ORF47 protein is encoded within the major histocompatibility complex (MHC), a region on the short arm of chromosome 6 at 6p21.3 that spans 3.6 megabases. [16] The general function of MHC molecules is to bind peptide fragments derived from pathogens and display them on the cell surface for recognition by T cells. [17] The C6ORF47 protein is considered part of MHC class III. [18] MHC class III proteins are poorly defined structurally and functionally. The MHC class III region contains genes for cytokines and heat shock proteins, and genes encoded in the telomeric part of the class III region appear to be involved in specific and global inflammatory responses. [19]

Primary

Human C6ORF47 mRNA encodes a 294-amino-acid protein. SAPS showed that the C6ORF47 protein is enriched in leucine, proline, and glycine compared with other human proteins. [14] It also showed a significantly lower amount of isoleucine, as well as lower amounts of valine, tyrosine, threonine, phenylalanine, and asparagine, than is typical of human proteins. Motif Scan identified a basic leucine zipper with repeats of leucine residues spaced seven amino acids apart (shown in blue text in the conceptual translation), and this motif was found to be conserved in mammalian orthologs of the C6ORF47 protein. [20]
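
The heptad spacing that characterizes a leucine zipper can be checked with a simple scan. The following is a minimal sketch, not the method used by Motif Scan. The function name, the `min_repeats` threshold, and the toy sequence are assumptions made for illustration, and the toy sequence is not the C6ORF47 protein.

```python
def find_leucine_heptads(seq: str, min_repeats: int = 3) -> list[list[int]]:
    """Return runs of leucines spaced exactly 7 residues apart (1-based positions)."""
    seq = seq.upper()
    runs = []
    for start in range(len(seq)):
        if seq[start] != "L":
            continue
        # Skip positions that merely continue a run begun 7 residues earlier.
        if start >= 7 and seq[start - 7] == "L":
            continue
        positions = []
        pos = start
        while pos < len(seq) and seq[pos] == "L":
            positions.append(pos + 1)
            pos += 7
        if len(positions) >= min_repeats:
            runs.append(positions)
    return runs


# Toy sequence (not C6ORF47): leucines at positions 1, 8, 15, and 22.
toy = "LAAAAAA" * 3 + "LAAA"
print(find_leucine_heptads(toy))  # [[1, 8, 15, 22]]
```

A real zipper prediction would also consider the surrounding basic region and helical propensity, which this sketch ignores.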

Conceptual translation of the human C6ORF47 gene.

Secondary

PredictProtein [21] predicted that the secondary structure of the human C6ORF47 protein was 35.4% helix, 2.4% strand, and 62.2% loop.

Tertiary

I-TASSER predicted tertiary structure for human C6ORF47 protein.

The PSORT II prediction tool [23] predicted three transmembrane segments in the human C6ORF47 protein, at amino acids 182-198, 222-238, and 246-262.

All of the mammalian orthologs examined show similar transmembrane regions (at similar amino acid positions), with the exception of the platypus (see the ortholog table below for the mammalian orthologs used).

Because most non-mammalian C6ORF47 orthologs are much shorter than the mammalian sequences, the predicted cleavage site is usually slightly higher, and the transmembrane segments vary with the length of the protein sequence. One or two transmembrane segments were found in reptiles, in one of the two amphibians, and in one fish ortholog, but three transmembrane segments is by far the most common arrangement among the orthologs.

PSORT II [23] predicted that the C6ORF47 protein is localized in the endoplasmic reticulum (55.6%). DeepLoc [24] further supports this, giving about an 86.12% probability of localization to the endoplasmic reticulum. It also supports the earlier predictions by PSORT II and SOSUI that the human C6ORF47 protein is a transmembrane protein (93.6% probability).

Post-Translational Modifications

Experimentally determined phosphorylation sites at amino acids 34, 35, 71, and 90 of the human C6ORF47 protein are annotated in the NCBI protein record. [6] Sites 34 and 35 are predicted to be phosphorylated by casein kinase II. [20]

Endoplasmic Reticulum (ER) signals ensure the protein remains in the endoplasmic reticulum, aiding proper folding, quality control, and trafficking.

Sumoylation attaches SUMO proteins to targets, regulating nuclear transport, transcription, DNA repair, and protein stability. Sumoylation sites were predicted at amino acids 75, 114, and 147. [25]

O-linked β-N-acetylglucosamine (O-GlcNAc) dynamically modifies serine/threonine residues, regulating signaling, transcription, and protein-protein interactions. An O-GlcNAc site was predicted at amino acid 60. [26]

Human C6ORF47 protein with annotated domains, transmembrane regions, and post-translational modifications. P = experimentally proven phosphorylation sites, Pre. P = predicted phosphorylation sites with a likelihood above 0.5, Pre. S = predicted sumoylation sites, TM = transmembrane segments, and Pre-O-GlcNAc = predicted O-linked β-N-acetylglucosamine.

Interactions

FGFR3: An interaction between C6ORF47 and FGFR3 was detected by a two-hybrid assay with a medium average detection confidence. The interaction is recorded in the BioGRID interaction database and was reported in August 2022 as part of a large-scale dataset in which this interaction was scored individually and all other interactions globally. [7] [28]

Fibroblast growth factor receptor 3 (FGFR3) belongs to the fibroblast growth factor receptor family, whose members share similar structures and functions. FGFR3 spans the cell membrane, with one end remaining inside the cell and the other end projecting from the outer surface of the cell. [29] It plays an important role in cartilage development in the growth plate. FGFR3 is a tyrosine-protein kinase that acts as a cell-surface receptor for fibroblast growth factors and plays an essential role in cell proliferation, angiogenesis, differentiation, and apoptosis. [30] FGFR3 interacts with growth factors outside the cell and receives signals that regulate growth and development within the cell. [29]

Homology

Orthologs

The C6ORF47 gene is estimated to have first appeared approximately 563 million years ago (MYA), in lampreys. C6ORF47 was found in ray-finned fish (Actinopterygii), cartilaginous fish, lampreys, and lobe-finned fish (Sarcopterygii), but not in hagfish, suggesting that the gene may have been inserted in the lamprey lineage. C6ORF47 is conserved only within vertebrates, with no trace of it outside vertebrates; its most distant orthologs are in lampreys (563 MYA). The C6ORF47 gene evolved fairly rapidly: slightly more slowly than fibrinogen alpha and much faster than cytochrome c. Orthologs used for the diagram were human, house mouse, African bush elephant, koala, painted turtle, eastern brown snake, Iberian ribbed newt, West African lungfish, zebrafish, seven-gill sharpnose shark, and sea lamprey (see the time-calibrated comparative date of divergence diagram).

Time-calibrated comparative date of divergence diagram comparing the evolution of C6ORF47, cytochrome c, and fibrinogen alpha. Orthologs used for this diagram: human, house mouse, African bush elephant, koala, painted turtle, eastern brown snake, Iberian ribbed newt, West African lungfish, zebrafish, seven-gill sharpnose shark, and sea lamprey.


A global alignment of the human C6ORF47 protein with the seven-gill sharpnose shark C6ORF47 protein showed two large gaps, corresponding to amino acids 44-62 and 153-173 of the human protein. These gaps were present in all vertebrate lineages examined until rodents and rabbits. A second global alignment, between the human and Pacific pocket mouse (rodent) C6ORF47 proteins, shows that these gaps are no longer present, indicating possible insertions at these positions in mammals. Notably, the Pacific pocket mouse C6ORF47 protein was one of the least related rodent sequences in the ortholog table, yet neither of the 2 large gaps was present when it was aligned with the human C6ORF47 protein sequence. [31]






Ortholog Table for C6ORF47 Protein

C6ORF47 | Genus and Species | Common Name | Taxonomic Order | Date of Divergence (MYA) | Accession # [32] | Sequence (aa) | Identity (%) | Similarity (%) | Gaps (%)
Mammals | Homo sapiens | Humans | Primates | 0 | NP_067007 | 294 | 100 | 100 | 0
Mammals | Perognathus longimembris pacificus | Pacific Pocket Mouse | Rodentia | 87 | XP_048204128 | 293 | 79.3 | 84 | 0.3
Mammals | Mus musculus | House Mouse | Rodentia | 87 | NP_258438 | 293 | 75.9 | 81.3 | 0.3
Mammals | Loxodonta africana | African Bush Elephant | Proboscidea | 99 | XP_003422325 | 297 | 74.5 | 81.5 | 1.7
Mammals | Phascolarctos cinereus | Koala | Diprotodontia | 160 | XP_020829739 | 302 | 54 | 63.8 | 10.8
Mammals | Vombatus ursinus | Common Wombat | Diprotodontia | 160 | XP_027732497 | 300 | 55 | 66.1 | 6.5
Mammals | Ornithorhynchus anatinus | Platypus | Monotremata | 180 | XP_028911230 | 241 | 42.3 | 50.2 | 31.2
Reptiles | Chrysemys picta bellii | Painted Turtle | Testudines | 319 | XP_005289373 | 199 | 23.4 | 29.4 | 52
Reptiles | Terrapene triunguis | Three-toed Box Turtle | Testudines | 319 | XP_024079724 | 174 | 20.4 | 25.1 | 59.9
Reptiles | Anolis sagrei | Brown Anole | Squamata | 319 | XP_060615449 | 217 | 25.9 | 34.5 | 36.7
Reptiles | Pseudonaja textilis | Eastern Brown Snake | Squamata | 319 | XP_026575869 | 212 | 27.4 | 36.6 | 38.9
Amphibians | Xenopus laevis | African Clawed Frog | Anura | 352 | XP_018088740 | 224 | 21.4 | 30.3 | 39.6
Amphibians | Pleurodeles waltl | Iberian Ribbed Newt | Urodela | 352 | KAJ1134448 | 268 | 27.1 | 32.9 | 36.2
Fish | Protopterus annectens | West African Lungfish | Lepidosireniformes | 408 | XP_043939206 | 289 | 27.7 | 39.2 | 24.4
Fish | Misgurnus anguillicaudatus | Pond Loach | Cypriniformes | 429 | XP_055075080 | 302 | 25.4 | 39 | 27.7
Fish | Cirrhinus molitorella | Mud Carp | Cypriniformes | 429 | KAK2887169 | 311 | 24 | 38 | 23.1
Fish | Danio rerio | Zebrafish | Cypriniformes | 429 | NP_001410332 | 315 | 22.5 | 35.7 | 28.9
Fish | Carcharodon carcharias | Great White Shark | Lamniformes | 462 | XP_041069364 | 250 | 22.5 | 30.4 | 40.9
Fish | Heptranchias perlo | Seven-gill Sharpnose Shark | Hexanchiformes | 462 | XP_067830079 | 249 | 25.5 | 37.6 | 27.1
Fish | Lethenteron reissneri | Far Eastern Brook Lamprey | Petromyzontiformes | 563 | XP_061406601 | 217 | 22.1 | 29.5 | 36.2
Fish | Petromyzon marinus | Sea Lamprey | Petromyzontiformes | 563 | XP_032814877 | 215 | 22.7 | 30.4 | 35.3

The table above lists 20 orthologs of the C6ORF47 protein. It includes a couple of orthologs from each major class of vertebrates except Aves (Agnatha, Chondrichthyes, Osteichthyes, Amphibia, Reptilia, Mammalia), because the C6ORF47 gene is conserved in vertebrates. The identity, similarity, and gap values compare each ortholog's amino acid sequence with the human C6ORF47 protein.
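
The identity and gap columns can be reproduced from the two gapped rows of a pairwise alignment. The sketch below assumes that both percentages are taken over the total number of alignment columns, which is how EMBOSS Needle's summary lines appear to report them. The function name and the toy alignment are illustrative and are not C6ORF47 data. Similarity is omitted because it additionally requires a substitution matrix.

```python
def alignment_stats(aligned_a: str, aligned_b: str) -> dict[str, float]:
    """Percent identity and percent gaps over the full alignment length.

    Both inputs must be the gapped rows of one pairwise alignment
    (same length, '-' marking gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    length = len(aligned_a)
    columns = list(zip(aligned_a, aligned_b))
    identical = sum(a == b and a != "-" for a, b in columns)
    gapped = sum(a == "-" or b == "-" for a, b in columns)
    return {
        "identity_pct": round(100 * identical / length, 1),
        "gaps_pct": round(100 * gapped / length, 1),
    }


# Toy alignment: 10 columns, 6 identical, 2 containing a gap.
print(alignment_stats("MKT-LLPGAA", "MRTALLP-AV"))
# {'identity_pct': 60.0, 'gaps_pct': 20.0}
```

Because short orthologs force many gap columns when aligned globally against the 294-residue human protein, their identity percentages drop even where the shared regions match well.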

Time-calibrated unrooted phylogenetic tree for C6ORF47 orthologs. The tree is limited to vertebrates because the gene is conserved only within vertebrates.
Abbreviation (from youngest to oldest, MYA) | Common Name
Hsa | Humans
Mum | House Mouse
Phc | Koala
Ora | Platypus
Heb (319 MYA) | Bynoe's Gecko
Ans | Brown Anole
Pst | Eastern Brown Snake
Xel | African Clawed Frog
Plw | Iberian Ribbed Newt
Pra | West African Lungfish
Mia | Pond Loach
Cim | Mud Carp
Dar | Zebrafish
Cac | Great White Shark
Hst | Pond Loach
Ebl | Far Eastern Brook Lamprey
Sel | Sea Lamprey

Paralogs

No paralogs of the C6ORF47 gene were found in humans. [7] [33]

Conserved Regions

The promoter region was found to have many stretches of nucleotides that are conserved across mammalian orthologs, including transcription factor binding sites for SP9 (at least one site, just upstream of the 5' UTR), NHLH2 and ERF::FIGLA (just after the start of transcription), ZNF454 (about 20 nucleotides downstream of the previous sites), EBF1 and EBF2 (about 330 base pairs downstream of the transcriptional start), NR5A2, ZNF423, and STAT3 (all about 120 base pairs downstream of the EBF sites), and ZNF42 (overlapping the start of the coding sequence).

Multiple sequence alignments with C6ORF47 orthologs showed that there were many amino acids on the C-terminal side of the protein that are conserved while there is much less conservation in the N-terminal side. This is likely due to the protein containing a large disordered region on the N-terminal side.

The 3' UTR was found to contain 9 conserved sites. The table below lists all conserved sites that were found for C6ORF47.

Conserved sites in the 3' UTR
miRNA | Position in the UTR | Seed match
hsa-miR-125b-5p | 85-92 | 8mer
hsa-miR-4319 | 85-92 | 8mer
hsa-miR-125a-5p | 85-92 | 8mer
hsa-miR-138-5p | 204-210 | 7mer-m8
hsa-miR-24-3p | 438-445 | 8mer
hsa-miR-137 | 677-684 | 8mer
hsa-miR-325-3p | 679-685 | 7mer-1A
hsa-miR-140-5p | 714-720 | 7mer-1A
hsa-miR-142-3p.1 | 716-722 | 7mer-1A
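
The seed-match labels in the table follow a commonly used convention: an 8mer pairs with miRNA positions 2-8 and has an A opposite position 1, a 7mer-m8 pairs with positions 2-8 only, and a 7mer-1A pairs with positions 2-7 and has the A opposite position 1. The sketch below classifies sites under those assumed definitions. The miRNA and UTR sequences are invented toy data, not real miRNAs or the C6ORF47 3' UTR, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_seed_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Return (1-based site start in the UTR, site type) for canonical seed matches."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    match_2_8 = reverse_complement(mirna[1:8])  # pairs with miRNA positions 2-8
    match_2_7 = match_2_8[1:]                   # pairs with miRNA positions 2-7
    sites = []
    for j in range(len(utr) - 5):
        if utr[j:j + 6] != match_2_7:
            continue
        # The base 5' of the 6-mer pairs with miRNA position 8;
        # the base 3' of it lies opposite miRNA position 1.
        has_m8 = j > 0 and utr[j - 1] == match_2_8[0]
        has_a1 = utr[j + 6:j + 7] == "A"
        if has_m8 and has_a1:
            sites.append((j, "8mer"))
        elif has_m8:
            sites.append((j, "7mer-m8"))
        elif has_a1:
            sites.append((j + 1, "7mer-1A"))
    return sites


toy_mirna = "UGGCAUUCAGGUCAUAGCUAAG"  # invented sequence
toy_utr = "CCCC" + "GAAUGCCA" + "UUUU" + "GAAUGCCU" + "UUUC" + "AAUGCCA" + "CC"
print(classify_seed_sites(toy_mirna, toy_utr))
```

Under these definitions, the three miRNAs sharing the 85-92 site in the table would be expected to share the same seed sequence.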

References

  1. GRCh38: Ensembl release 89: ENSG00000203623, ENSG00000235360, ENSG00000228177, ENSG00000228435, ENSG00000204439, ENSG00000226103, ENSG00000226531. Ensembl, May 2017.
  2. GRCm38: Ensembl release 89: ENSMUSG00000043311. Ensembl, May 2017.
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "Homo sapiens chromosome 6 open reading frame 47 (C6orf47), mRNA". NCBI. 2024-04-04.
  6. "uncharacterized protein C6orf47 [Homo sapiens]". NCBI Protein. NCBI. Retrieved 26 September 2024.
  7. "C6orf47 Gene - Chromosome 6 Open Reading Frame 47". GeneCards: The Human Gene Database. Weizmann Institute of Science. Retrieved 26 September 2024.
  8. "GDS1096 / 204968_at". www.ncbi.nlm.nih.gov. Retrieved 2024-12-15.
  9. "GDS1096 / 204968_at". www.ncbi.nlm.nih.gov. Retrieved 2024-12-05.
  10. Boström, Pontus; Magnusson, Björn; Svensson, Per-Arne; Wiklund, Olov; Borén, Jan; Carlsson, Lena M. S.; Ståhlman, Marcus; Olofsson, Sven-Olof; Hultén, Lillemor Mattsson (August 2006). "Hypoxia converts human macrophages into triglyceride-loaded foam cells". Arteriosclerosis, Thrombosis, and Vascular Biology. 26 (8): 1871–1876. doi:10.1161/01.ATV.0000229665.78997.0b. ISSN 1524-4636. PMID 16741148.
  11. "Six-Frame Translation". www.bioline.com. Retrieved 2024-12-05.
  12. "UCSC Genome Browser Home". genome.ucsc.edu. Retrieved 2024-12-05.
  13. "Variation Viewer". www.ncbi.nlm.nih.gov. Retrieved 2024-12-13.
  14. "SAPS". www.ebi.ac.uk. Retrieved 2024-12-05.
  15. "PaxDb: Protein Abundance Database". pax-db.org. Retrieved 2024-12-14.
  16. Mungall, A. J.; Palmer, S. A.; Sims, S. K.; Edwards, C. A.; Ashurst, J. L.; Wilming, L.; Jones, M. C.; Horton, R.; Hunt, S. E.; Scott, C. E.; Gilbert, J. G. R.; Clamp, M. E.; Bethel, G.; Milne, S.; Ainscough, R. (October 2003). "The DNA sequence and analysis of human chromosome 6". Nature. 425 (6960): 805–811. Bibcode:2003Natur.425..805M. doi:10.1038/nature02055. ISSN 1476-4687. PMID 14574404.
  17. Charles A Janeway, Jr; Travers, Paul; Walport, Mark; Shlomchik, Mark J. (2001), "The major histocompatibility complex and its functions", Immunobiology: The Immune System in Health and Disease. 5th edition, Garland Science, retrieved 2024-10-16
  18. Lehner, Ben; Semple, Jennifer I; Brown, Stephanie E; Counsell, Damian; Campbell, R. Duncan; Sanderson, Christopher M (2004-01-01). "Analysis of a high-throughput yeast two-hybrid system and its use to predict the function of intracellular proteins encoded within the human MHC class III region". Genomics. 83 (1): 153–167. doi:10.1016/S0888-7543(03)00235-0. ISSN 0888-7543. PMID 14667819.
  19. Gruen, J R; Weissman, S M (2001-08-01). "Human MHC class III and IV genes and disease associations". Frontiers in Bioscience. 6: D960–72. doi:10.2741/gruen. ISSN 1093-9946. PMID 11487469.
  20. "Motif Scan". myhits.sib.swiss. Archived from the original on 2021-06-02. Retrieved 2024-12-05.
  21. "PredictProtein - Protein Sequence Analysis, Prediction of Structural and Functional Features". predictprotein.org. Retrieved 2024-12-05.
  22. "I-TASSER results". seq2fun.dcmb.med.umich.edu. Retrieved 2024-12-13.
  23. "PSORT II Prediction". psort.hgc.jp. Retrieved 2024-12-05.
  24. "DeepLoc 2.1 - DTU Health Tech - Bioinformatic Services". services.healthtech.dtu.dk. Retrieved 2024-12-05.
  25. "GPS-SUMO: Prediction of SUMOylation Sites & SUMO-interacting Motifs". sumo.biocuckoo.cn. Retrieved 2024-12-14.
  26. "YinOYang 1.2 - DTU Health Tech - Bioinformatic Services". services.healthtech.dtu.dk. Retrieved 2024-12-14.
  27. "IBS 2.0: Illustrator for Biological Sequences". ibs.renlab.org. Retrieved 2024-12-04.
  28. "STRING: functional protein association networks". string-db.org. Retrieved 2024-12-05.
  29. "FGFR3 gene: MedlinePlus Genetics". medlineplus.gov. Retrieved 2024-12-13.
  30. "STRING: functional protein association networks". string-db.org. Retrieved 2024-12-10.
  31. "Emboss Needle". www.ebi.ac.uk. Retrieved 2024-12-15.
  32. "uncharacterized protein C6orf47 [Homo sapiens]". NCBI Protein. NCBI. Retrieved 26 September 2024.
  33. "Protein BLAST: search protein databases using a protein query". blast.ncbi.nlm.nih.gov. Retrieved 2024-10-16.