CCDC47 | |
---|---|
Identifiers | |
Aliases | CCDC47, MSTP041, GK001, coiled-coil domain containing 47, THNS |
External IDs | OMIM: 618260; MGI: 1914413; HomoloGene: 41351; GeneCards: CCDC47; OMA: CCDC47 (orthologs) |
Coiled-coil domain containing 47 (CCDC47) is a gene located on human chromosome 17 at locus 17q23.3 that encodes the protein PAT complex subunit CCDC47. The protein contains coiled-coil domains, a SEEEED superfamily domain, a domain of unknown function (DUF1682) and a transmembrane domain. Its function is unknown, but CCDC47 has been proposed to be involved in calcium ion homeostasis and the endoplasmic reticulum overload response. [5]
The CCDC47 gene is located on the minus strand of human chromosome 17 and contains 13 exon splice sites and 14 distinct introns. After the introns are removed, the transcript is 3445 base pairs long. No evidence for microRNAs or pseudogenes has been found. The gene has no known alternative isoforms; only transcript variant 1X exists.
The protein encoded by CCDC47 is 483 amino acids long and contains both a signal peptide and a transmembrane domain. It is rich in negatively charged amino acids such as aspartic acid and glutamic acid, which gives it an acidic isoelectric point of 4.56. [7] The protein is also rich in methionine. Its molecular weight of 55.9 kDa is conserved across various orthologs. CCDC47 also contains a SEEEED superfamily domain and domain of unknown function 1682 (DUF1682). The SEEEED superfamily is a short, low-complexity region composed mainly of serine, and it is commonly found in clathrin adaptor complex 3 beta-1 subunit proteins. [8] The exact function of DUF1682 is unclear, but one member of the family has been described as an adipocyte-specific protein. [9]
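Why an excess of aspartic and glutamic acid produces a low isoelectric point can be illustrated with a standard Henderson-Hasselbalch calculation. The method sums the fractional charges of the ionizable groups at a given pH, then bisects for the pH at which the net charge is zero. The sketch below is a minimal Python version under stated assumptions. The pKa table is one illustrative textbook-style choice, not the one behind the 4.56 value cited above, so its output should not be expected to reproduce that figure. The function names are hypothetical.

```python
# Assumed pKa values (one illustrative set); published tools use other tables.
PKA_POSITIVE = {"N_TERM": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_TERM": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a peptide's net charge at a given pH."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["N_TERM"]))
    for aa in "KRH":
        positive += seq.count(aa) / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
    negative = 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["C_TERM"] - ph))
    for aa in "DECY":
        negative += seq.count(aa) / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH 0-14 for the point where the net charge crosses zero."""
    seq = seq.upper()
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still net positive, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


# Made-up acidic peptide, not the CCDC47 sequence.
print(round(isoelectric_point("DEEDSEEEED"), 2))
```

Bisection is valid here because net charge falls monotonically as pH rises. Adding acidic residues lowers the pH at which the negative and positive charges balance.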
Two disulfide bonds are predicted in CCDC47, one between cysteines 209 and 214 and another between cysteines 215 and 283. [10] The C-terminal portion of the protein is highly charged, and its secondary structure is predicted to be alpha-helical. [11] This region also contains coiled-coil domains, structural motifs in which 2 to 7 alpha helices are coiled together. These domains typically follow the seven-residue pattern HxxHCxC, where H is a hydrophobic amino acid, C is a charged amino acid and x is any amino acid. [12] Many sequences matching this pattern occur in the C-terminal region of CCDC47, which is also the region most highly conserved across orthologs.
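The HxxHCxC pattern can be scanned for literally with a regular expression. In this sketch the residue classes are assumptions: "hydrophobic" is taken as A, I, L, M, F, V and W, and "charged" as D, E, K and R, with histidine left out. The H and C in the pattern are class labels, not the one-letter codes for histidine and cysteine. Dedicated coiled-coil predictors score heptad positions statistically rather than requiring an exact match, so this only illustrates the pattern itself. `find_heptads` is a hypothetical helper.

```python
import re

# Assumed residue classes; other groupings are equally defensible.
HYDROPHOBIC = "AILMFVW"
CHARGED = "DEKR"

# H x x H C x C, wrapped in a lookahead so overlapping windows are all reported.
HEPTAD = re.compile(f"(?=([{HYDROPHOBIC}]..[{HYDROPHOBIC}][{CHARGED}].[{CHARGED}]))")


def find_heptads(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, 7-residue window) for each HxxHCxC match."""
    return [(m.start() + 1, m.group(1)) for m in HEPTAD.finditer(seq.upper())]


# Made-up peptide, not the CCDC47 sequence.
print(find_heptads("LAELEKELAELEKE"))  # [(1, 'LAELEKE'), (8, 'LAELEKE')]
```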
CCDC47 is regulated by the promoter GXP43413. [13] The promoter is 819 base pairs long and is highly conserved in mammals. Conserved binding sites in mammals located on this promoter include those for nuclear respiratory factor 1 (NRF1), cAMP response element-binding protein (CREB), the PAR bZIP family and the Sp4 transcription factor. NRF1 homodimerizes and activates expression of key metabolic genes. CREB binds to cAMP response elements, increasing or decreasing transcription of downstream genes, [14] while the PAR bZIP family is involved in regulating circadian rhythms. [15] In the mRNA, translation begins at base pair 337 and ends at 1728. A strong stem-loop in the 5' UTR, spanning bases 289-318, is likely involved in regulating the mRNA given its close proximity to the start codon. [16]
The protein is thought to be translated at the endoplasmic reticulum, with part of it extending into the cytoplasm. It is anchored in the ER membrane by the transmembrane domain spanning amino acids 137 to 165. [17] The portion of the protein that extends into the cytosol is predicted to be highly phosphorylated, and its phosphorylation sites are conserved back to bony fish orthologs. [18] Research has shown that CCDC47 is expressed in response to ER overload, which makes its close association with the ER significant. [19]
In addition to its high predicted phosphorylation, CCDC47 has three predicted sulfation sites that are conserved in mammals, reptiles and birds but not in fish, amphibians or invertebrates. [20] Five potential sumoylation sites are also predicted and are conserved back to bony fish. [21] No glycosylation is predicted, because the protein is not expected to extend into the extracellular space.
Analysis of microarray tissue expression data from GEO showed that CCDC47 appears to be ubiquitously expressed at moderate levels across many human tissues. [22] The highest expression levels are seen in neuronal tissues such as the superior cervical ganglion, the amygdala and the ciliary ganglion. Expression is also elevated in the thyroid and in CD34+ cells.
Text-based queries, BLAST and BLAT searches have identified no paralogs of CCDC47. The gene has many orthologs extending back to invertebrates such as C. elegans and is highly conserved in mammals, with percent identity greater than 95%. CCDC47 has been sequenced in a wide range of organisms, including mammals, birds, reptiles, amphibians, bony fish and invertebrates. As expected, the percent identity of human CCDC47 to a given ortholog declines with increasing divergence time. Homologs of CCDC47 are also present in mosquitoes, mushrooms, Arabidopsis and Asian rice, and they contain the same DUF1682 found in CCDC47.
Genus and Species | Common Name | Divergence from Humans (MYA) [23] | NCBI Protein Accession Number | Sequence Identity to Humans [24] | Sequence Length (aa) |
---|---|---|---|---|---|
Mus musculus | Mouse | 92.3 | NP_080285.2 | 97.90% | 483 |
Myotis davidii | Mouse-eared Bat | 94.2 | XP_006776781.1 | 97.50% | 483 |
Elephantulus edwardii | Elephant Shrew | 98.7 | XP_006886355.1 | 95.00% | 483 |
Alligator mississippiensis | American Alligator | 296 | XP_006271625.1 | 91.00% | 482 |
Falco cherrug | Saker Falcon | 296 | XP_005439470.1 | 90.10% | 482 |
Ophiophagus hannah | King Cobra | 296 | ETE73955 | 78.90% | 516 |
Xenopus laevis | African Clawed Frog | 371.2 | NP_001087058.1 | 78.70% | 489 |
Danio rerio | Zebrafish | 400.1 | NP_001004551.1 | 76.20% | 486 |
Latimeria chalumnae | Coelacanth | 414.9 | XP_00599466.3 | 83.50% | 478 |
Saccoglossus kowalevskii | Acorn Worm | 661.2 | XP_006822108 | 50.50% | 496 |
Pediculus humanus corporis | Human Body Louse | 782.7 | XP_002424359 | 46.10% | 447 |
Acyrthosiphon pisum | Pea Aphid | 782.7 | NP_001162147 | 43.50% | 449 |
Caenorhabditis elegans | Roundworm | 937.5 | NP_497788.1 | 35.10% | 442 |
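One way to quantify the claim that identity falls as divergence time grows is a Spearman rank correlation on the two numeric columns of the ortholog table. The minimal Python sketch below copies those values from the table. Spearman's coefficient is chosen here over Pearson's because several divergence times are tied and the relationship need not be linear. Ties receive averaged ranks. The names `ORTHOLOGS`, `average_ranks` and `spearman` are illustrative, not from any source.

```python
# (divergence from humans in MYA, percent identity to human) from the table.
ORTHOLOGS = {
    "Mouse": (92.3, 97.9),
    "Mouse-eared Bat": (94.2, 97.5),
    "Elephant Shrew": (98.7, 95.0),
    "American Alligator": (296.0, 91.0),
    "Saker Falcon": (296.0, 90.1),
    "King Cobra": (296.0, 78.9),
    "African Clawed Frog": (371.2, 78.7),
    "Zebrafish": (400.1, 76.2),
    "Coelacanth": (414.9, 83.5),
    "Acorn Worm": (661.2, 50.5),
    "Human Body Louse": (782.7, 46.1),
    "Aphid": (782.7, 43.5),
    "Roundworm": (937.5, 35.1),
}


def average_ranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Pearson correlation computed on the (tie-averaged) ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


divergence, identity = zip(*ORTHOLOGS.values())
rho = spearman(divergence, identity)
print(f"Spearman rho = {rho:.2f}")  # Spearman rho = -0.95
```

The coefficient is strongly negative but not exactly -1. For example, the coelacanth is more similar to human than the zebrafish or frog despite its earlier divergence.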