CCDC47

Identifiers
Aliases: CCDC47, MSTP041, GK001, coiled-coil domain containing 47, THNS
External IDs: OMIM: 618260; MGI: 1914413; HomoloGene: 41351; GeneCards: CCDC47; OMA: CCDC47 - orthologs

Orthologs
Species | Human | Mouse
Ensembl | ENSG00000108588 [1] | ENSMUSG00000078622 [2]
RefSeq (mRNA) | NM_020198 | NM_026009
RefSeq (protein) | NP_064583 | NP_080285
Location (UCSC) | Chr 17: 63.75 – 63.78 Mb | Chr 11: 106.09 – 106.11 Mb
PubMed search | [3] | [4]

Coiled-coil domain containing 47 (CCDC47) is a gene located on human chromosome 17, at locus 17q23.3, that encodes the protein PAT complex subunit CCDC47. The protein contains coiled-coil domains, the SEEEED superfamily domain, a domain of unknown function (DUF1682) and a transmembrane domain. Its function is unknown, but CCDC47 has been proposed to be involved in calcium ion homeostasis and the endoplasmic reticulum overload response. [5]

Gene

The CCDC47 gene is located on the minus strand of human chromosome 17 and contains 13 exon splice sites and 14 distinct introns. After removal of the introns, the transcript is 3445 base pairs in length. No evidence of microRNAs or pseudogenes has been found. The gene has no known alternative isoforms; only transcript variant 1X exists.
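Because the gene lies on the minus strand, its transcript reads as the reverse complement of the plus-strand reference sequence. The conversion can be sketched as follows; the input fragment is made up for illustration and is not CCDC47 sequence.

```python
# Illustrative only: the fragment below is hypothetical, not real CCDC47 sequence.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(COMPLEMENT)[::-1]


plus_strand = "CATGGTACC"  # hypothetical plus-strand fragment
print(reverse_complement(plus_strand))
```

Applying the function twice returns the original sequence, which is a quick sanity check when converting coordinates or sequences between strands.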

Genomic location of CCDC47 at 17q23.3

Protein

Structure

The protein encoded by CCDC47 is 483 amino acids in length and contains both a signal peptide and a transmembrane domain. It is rich in negatively charged amino acids such as aspartic acid and glutamic acid, giving it an acidic isoelectric point of 4.56. [7] The protein is also rich in methionine. Its molecular weight is 55.9 kDa, a value that is conserved across various orthologs. CCDC47 also contains the SEEEED superfamily domain and domain of unknown function 1682 (DUF1682). The SEEEED superfamily is a short, low-complexity region composed mainly of serine. The family is routinely found on clathrin adaptor complex 3 beta-1 subunit proteins. [8] The exact function of DUF1682 is unclear, but one member of the family has been described as an adipocyte-specific protein. [9]
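The link between an excess of aspartic and glutamic acid and an acidic isoelectric point can be illustrated with a crude residue count. This is only a sketch: it ignores histidine, the termini and residue-specific pKa values, so it is not a substitute for a tool such as SAPS, and the sequence used is a made-up toy peptide rather than CCDC47.

```python
ACIDIC = set("DE")  # aspartic acid, glutamic acid (negative at neutral pH)
BASIC = set("KR")   # lysine, arginine (positive at neutral pH); histidine ignored


def charge_summary(protein: str) -> dict:
    """Count acidic and basic residues and estimate the net charge near pH 7."""
    seq = protein.upper()
    acidic = sum(aa in ACIDIC for aa in seq)
    basic = sum(aa in BASIC for aa in seq)
    return {"acidic": acidic, "basic": basic, "approx_net_charge": basic - acidic}


toy_peptide = "MSEEEEDDKLAEEDR"  # hypothetical sequence, for illustration only
print(charge_summary(toy_peptide))
```

A strongly negative approximate net charge at neutral pH indicates that the pH must be lowered well below 7 before the protein becomes neutral, which is what an acidic isoelectric point expresses.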

Figure: PHYRE predicted, with 76.1% confidence, an alpha-helical structure for the C-terminal region of CCDC47 (amino acids 396-473).

There are two predicted disulfide bonds in the structure of CCDC47, between cysteines 209 and 214 and between cysteines 215 and 283. [10] The C-terminal portion of the protein is highly charged, and its secondary structure is predicted to be alpha-helical. [11] This region also contains coiled-coil domains, structural motifs in which 2-7 alpha helices are coiled together and which are often involved in biological functions such as the regulation of gene expression. These domains typically follow the heptad pattern HxxHCxC, where H is a hydrophobic amino acid, C is a charged amino acid and x is any amino acid. [12] Many sequences following this pattern occur in the C-terminal region of CCDC47, which is also the region most highly conserved among orthologs.
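The HxxHCxC pattern can be searched for with a regular expression, as in the sketch below. The residue classes are assumptions made for illustration (A, I, L, M, F, V, W as hydrophobic; D, E, K, R as charged), and the sequence is a made-up toy peptide rather than the CCDC47 C-terminus.

```python
import re

HYDROPHOBIC = "AILMFVW"  # assumed hydrophobic residue set
CHARGED = "DEKR"         # assumed charged residue set

# A lookahead is used so that overlapping heptads are also reported.
HEPTAD = re.compile(
    rf"(?=([{HYDROPHOBIC}]..[{HYDROPHOBIC}][{CHARGED}].[{CHARGED}]))"
)


def find_heptads(protein: str) -> list:
    """Return (1-based start position, heptad) for every HxxHCxC match."""
    return [(m.start() + 1, m.group(1)) for m in HEPTAD.finditer(protein.upper())]


toy_peptide = "GSLKELEEKLAALKEKG"  # hypothetical sequence, for illustration only
for position, heptad in find_heptads(toy_peptide):
    print(position, heptad)
```

A simple pattern match of this kind only flags candidate heptads; dedicated coiled-coil predictors score runs of consecutive heptads rather than isolated hits.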

The CCDC47 protein construct, including the signal peptide, SEEEED superfamily, transmembrane domain and DUF1682.

Regulation and translation

CCDC47 is regulated by the promoter GXP43413. [13] The promoter is 819 base pairs in length and is highly conserved in mammals. Binding sites on this promoter that are conserved in mammals include those for nuclear respiratory factor 1 (NRF1), cAMP response element-binding protein (CREB), the PAR bZIP family and the Sp4 transcription factor. NRF1 encodes a protein that homodimerizes and activates expression of key metabolic genes. CREB binds to cAMP response elements, thereby increasing or decreasing the transcription of downstream genes, [14] while the PAR bZIP family is involved in the regulation of circadian rhythms. [15] With regard to the mRNA, translation begins at base pair 337 and ends at 1728. A strong stem loop located in the 5' UTR, from bases 289-318, is likely involved in regulation of the mRNA because of its close proximity to the start codon. [16]

Cellular distribution

The final protein is thought to be translated at the endoplasmic reticulum (ER), with part of the protein extending into the cytoplasm of the cell. The protein is anchored in the ER membrane by the transmembrane domain, located from amino acid 137 to 165. [17] The portion of the protein that extends into the cytosol is predicted to be highly phosphorylated, and its phosphorylation sites are conserved as far back as the bony fish orthologs. [18] Research has shown that CCDC47 is expressed in response to ER overload, which makes this close proximity to the ER important. [19]

Post-translational modification

In addition to the high levels of phosphorylation predicted for CCDC47, three sulfation sites are predicted and are conserved in mammals, reptiles and birds, but not in fish, amphibians or invertebrates. [20] Five potential sumoylation sites are also predicted and are conserved as far back as the bony fish. [21] The protein is not expected to be glycosylated, as it is not predicted to extend into the extracellular space.

Expression

Microarray tissue expression patterns from GEO show that CCDC47 is ubiquitously expressed at moderate levels in many different human tissues. [22] Although expression is ubiquitous, the highest levels are seen in neuronal tissues such as the superior cervical ganglion, the amygdala and the ciliary ganglion. Elevated expression is also seen in the thyroid and in CD34+ cells.

Homology

No paralogs of CCDC47 were found through text-based queries, BLAST or BLAT. The gene has many orthologs, extending back to invertebrates such as C. elegans, and is highly conserved in mammals, with a percent identity greater than 95%. CCDC47 has been sequenced in a wide taxonomic range of organisms, including mammals, birds, reptiles, amphibians, bony fish and invertebrates. As expected, the percent identity of human CCDC47 to a given ortholog declines with increasing time since divergence. Homologous genes of CCDC47 are also present in mosquitoes, mushrooms, Arabidopsis and Asian rice. These homologs contain the same DUF1682 that is found in CCDC47.

Orthologs of CCDC47

Genus species | Common organism name | Divergence from humans (MYA) [23] | NCBI protein accession number | Sequence identity to human [24] | Sequence length (AA)
Mus musculus | Mouse | 92.3 | NP_080285.2 | 97.90% | 483
Myotis davidii | Mouse-eared Bat | 94.2 | XP_006776781.1 | 97.50% | 483
Elephantulus edwardii | Elephant Shrew | 98.7 | XP_006886355.1 | 95.00% | 483
Alligator mississippiensis | American Alligator | 296 | XP_006271625.1 | 91.00% | 482
Falco cherrug | Saker Falcon | 296 | XP_005439470.1 | 90.10% | 482
Ophiophagus hannah | King Cobra | 296 | ETE73955 | 78.90% | 516
Xenopus laevis | African Clawed Frog | 371.2 | NP_001087058.1 | 78.70% | 489
Danio rerio | Zebrafish | 400.1 | NP_001004551.1 | 76.20% | 486
Latimeria chalumnae | Coelacanth | 414.9 | XP_00599466.3 | 83.50% | 478
Saccoglossus kowalevskii | Acorn Worm | 661.2 | XP_006822108 | 50.50% | 496
Pediculus humanus corporis | Human Body Louse | 782.7 | XP_002424359 | 46.10% | 447
Acyrthosiphon pisum | Aphid | 782.7 | NP_001162147 | 43.50% | 449
Caenorhabditis elegans | Roundworm | 937.5 | NP_497788.1 | 35.10% | 442
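The decline of sequence identity with divergence time can be checked directly from the ortholog table above. The sketch below copies the divergence times and percent identities from that table and computes a Pearson correlation coefficient with a small helper function (the helper and variable names are illustrative); a value close to -1 would indicate a strong, roughly linear decline.

```python
from math import sqrt

# (divergence from humans in MYA, percent identity to human CCDC47),
# copied from the ortholog table in this article.
ORTHOLOGS = [
    (92.3, 97.9), (94.2, 97.5), (98.7, 95.0), (296.0, 91.0),
    (296.0, 90.1), (296.0, 78.9), (371.2, 78.7), (400.1, 76.2),
    (414.9, 83.5), (661.2, 50.5), (782.7, 46.1), (782.7, 43.5),
    (937.5, 35.1),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


times = [t for t, _ in ORTHOLOGS]
identities = [p for _, p in ORTHOLOGS]
r = pearson(times, identities)
print(f"Pearson r = {r:.2f}")
```

Correlation is only a rough summary here: several species share the same divergence estimate, and percent identity is bounded, so the relationship need not be strictly linear.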


References

  1. GRCh38: Ensembl release 89: ENSG00000108588. Ensembl, May 2017.
  2. GRCm38: Ensembl release 89: ENSMUSG00000078622. Ensembl, May 2017.
  3. "Human PubMed Reference". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "AceView". NCBI. Retrieved 1 March 2014.
  6. "CCDC47 coiled-coil domain containing 47". NCBI. Retrieved 3 March 2014.
  7. "SAPS Analysis". SDSC Workbench. Retrieved 14 April 2014.
  8. "NCBI BLAST". National Center for Biotechnology Information. Retrieved 7 March 2014.
  9. "Genecards". The Human Gene Compendium. Retrieved 7 March 2014.
  10. "Sulfinator". ExPASy. Retrieved 7 April 2014.
  11. "PHYRE 2 Protein Recognition Software". Retrieved 14 April 2014.
  12. Mason JM, Arndt KM (2004). "Coiled coil domains: stability, specificity, and biological implications". ChemBioChem. 5 (2): 170–6. doi:10.1002/cbic.200300781. PMID 14760737. S2CID 39252601.
  13. "El Dorado". Genomatix. Retrieved 3 April 2014.
  14. "Protein One". Transcription Factors. Archived from the original on 5 June 2014. Retrieved 29 March 2014.
  15. "Protein Spotlight, The PAR bZIP Family". 20 August 2004. Retrieved 28 March 2014.
  16. "The mfold Web Server". Retrieved 3 April 2014.
  17. "DAS-TM Filter Server". ExPASy. Archived from the original on 5 February 2018. Retrieved 17 April 2014.
  18. "NetPhos Server 2.0". ExPASy. Retrieved 20 April 2014.
  19. Viguerie N, Picard F, Hul G, Roussel B, Barbe P, Iacovoni JS, Valle C, Langin D, Saris WH (2012). "Multiple effects of a short-term dexamethasone treatment in human skeletal muscle and adipose tissue". Physiological Genomics. 44 (2): 141–151. doi:10.1152/physiolgenomics.00032.2011. ISSN 1094-8341. PMID 22108209.
  20. "Sulfinator". ExPASy. Retrieved 20 April 2014.
  21. "SumoPLOT". ExPASy. Retrieved 20 April 2014.
  22. "GEO Profiles". NCBI. Retrieved 20 March 2014.
  23. "Time Tree: The Timescale of Life". Retrieved 13 March 2014.
  24. "BLAST". NCBI. Retrieved 13 March 2014.