CCDC138

Coiled-coil domain-containing protein 138, also known as CCDC138, is a human protein encoded by the CCDC138 gene. The exact function of CCDC138 is unknown.

CCDC138
Identifiers
Aliases: CCDC138, coiled-coil domain containing 138
External IDs: MGI: 1923388; HomoloGene: 44912; GeneCards: CCDC138
Orthologs: human, mouse
RefSeq (mRNA): NM_001162956
RefSeq (protein): NP_001156428
Location (UCSC): human Chr 2: 108.79–108.89 Mb; mouse Chr 10: 58.5–58.58 Mb
PubMed search: [3] [4]

Gene

The CCDC138 gene is located on the plus strand of chromosome 2. [5]

Locus

The CCDC138 gene is located on the long (q) arm of chromosome 2 at band 12.3, [6] written in short as 2q12.3. It spans positions 108,786,752–108,876,591, [7] a region 89,840 bp long.
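The stated length follows from the coordinates only if both endpoints are counted (108,876,591 − 108,786,752 + 1 = 89,840), which matches the 1-based, inclusive convention used by NCBI. A minimal sketch of that arithmetic, assuming this convention (the function name is illustrative only):

```python
def inclusive_length(start: int, end: int) -> int:
    """Length in bp of a 1-based, closed interval [start, end]."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    # Both endpoints belong to the interval, hence the +1.
    return end - start + 1


# Coordinates quoted in the text for CCDC138 on chromosome 2.
start, end = 108_786_752, 108_876_591
print(f"{inclusive_length(start, end):,} bp")  # 89,840 bp
```

With a 0-based, half-open convention (as used in BED files) the same span would be written with a start one lower, and the length would simply be end − start.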

The red line shows the CCDC138 locus on chromosome 2q12.3.

Common aliases

CCDC138 has no established common alias other than its full name, coiled-coil domain containing 138.

Homology and evolution

Paralogs

No paralogs of CCDC138 have been identified.

Orthologs

CCDC138 is conserved in a wide range of animals, from mammals to placozoans, as shown in the table below.

Scientific name | Common name | Divergence from human lineage (MYA) [8] | Sequence length (bp) | Sequence identity to human RNA/protein | Sequence similarity to human RNA/protein
Mus musculus | House mouse | 92.3 | 2466 | 77% | 65.7%
Columba livia | Rock dove | 296 | 1863 | 61% | 59.4%
Xenopus laevis | African clawed frog | 371.2 | 2634 | 52% | 40.1%
Anolis carolinensis | Red-throated anole | 296 | 9588 | 78% | 46.3%
Latimeria chalumnae | West Indian Ocean coelacanth | 414.9 | 1838 | 71% | 38.1%
Strongylocentrotus purpuratus | Purple sea urchin | 742.9 | 2047 | 59% | 14.5%
Ciona intestinalis | Vase tunicate | 722.5 | 2420 | 56% | 23.8%
Aplysia californica | California sea slug | 782.7 | 2103 | 49% | 17.2%
Hydra vulgaris | Freshwater polyp | 855.3 | 1482 | 46% | 13.7%
Chrysemys picta bellii | Western painted turtle | 296 | 700 | 36% | 15.9%
Alligator mississippiensis | American alligator | 296 | 2089 | 76% | 38.4%
Melopsittacus undulatus | Budgerigar | 296 | 1764 | 73% | 40.3%
Taeniopygia guttata | Zebra finch | 296 | 1980 | 75% | 44.2%
Lepisosteus oculatus | Spotted gar | 400.1 | 1269 | 65% | 30.9%
Saccoglossus kowalevskii | Acorn worm | 661.2 | 2515 | 42% | 24.1%
Branchiostoma floridae | Lancelet | 713.2 | 1758 | 55% | 27.0%
Maylandia zebra | Zebra mbuna fish | 400.1 | 4815 | 58% | 21.4%
Trichoplax adhaerens | Trichoplax | 800 | 1605 | 56% | 10.3%
Pelodiscus sinensis | Chinese softshell turtle | 296 | 2895 | 77% | 33.9%
Falco cherrug | Saker falcon | 296 | 1866 | 73% | 48.9%
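The identity percentages in the table come from aligning each ortholog against the human sequence. As a rough illustration of what such a figure measures, the sketch below counts identical residues over aligned columns. It is an assumption-laden toy, not the method behind the table: tools differ in whether gap columns count toward the denominator, and "similarity" additionally credits conservative substitutions under a scoring scheme that is not modeled here. The sequences in the example are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two equal-length aligned sequences.

    Columns where either sequence has a gap are skipped; some tools divide
    by the full alignment length instead, which gives a lower figure.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gap column: excluded from both counts
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment (hypothetical sequences, not CCDC138 data).
print(round(percent_identity("ACGT-A", "ACTTGA"), 1))  # 80.0
```

Here 4 of the 5 ungapped columns match; dividing by all 6 columns instead would give about 66.7%, which is why identity values from different tools are not always comparable.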

Distant homologs

The most distant homolog detected or predicted is in Trichoplax adhaerens, which has a conserved CCDC138 gene; its lineage diverged from the human lineage about 800 million years ago.

Homologous domains

The orthologs listed above share several conserved homologous regions, shown in the figures below.

CCDC138 multiple sequence alignment showing conserved regions (part 1).
CCDC138 multiple sequence alignment showing conserved regions (part 2).
CCDC138 multiple sequence alignment showing conserved regions (part 3).

Green marks completely conserved residues, yellow marks identical residues, cyan marks similar residues, and white marks differing residues.

Phylogeny

The phylogeny of CCDC138 across the orthologs listed above recapitulates the evolutionary history of these species. [9]

CCDC138 rooted phylogeny tree

The figure above shows the evolutionary relationships among the CCDC138 orthologs.

Protein

The CCDC138 protein is predicted to have a molecular weight of 76.2 kDa [10] and an isoelectric point of 8.614. [11] Compositional analysis shows low usage of the AGP group (alanine, glycine, and proline) and no positive, negative, or mixed charge clusters. The protein has no C-terminal ER retention motif and no RNA-binding motif. [12] It is also predicted to be a soluble nuclear protein with a leucine zipper pattern (PROSITE PS00029) beginning at residue 205, with the sequence LQKRERFLLEREQLLFRHENAL. [12]
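PROSITE pattern PS00029 is written L-x(6)-L-x(6)-L-x(6)-L: four leucines, each seven residues after the previous one, as in a heptad repeat. The sketch below translates that pattern into a regular expression and checks the fragment quoted above, assuming (as the text states) that it begins at residue 205. The helper name is hypothetical, and a bare pattern match like this says nothing about whether the region actually folds into a zipper.

```python
import re
from typing import Iterator, Tuple

# PROSITE PS00029 (leucine zipper): L-x(6)-L-x(6)-L-x(6)-L,
# where x(6) means any six residues.
LEUCINE_ZIPPER = re.compile(r"L.{6}L.{6}L.{6}L")


def find_leucine_zippers(seq: str, offset: int = 1) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, hit) for each non-overlapping PS00029 match.

    Positions are 1-based and inclusive; `offset` is the residue number
    of seq[0], so a fragment can be reported in full-protein coordinates.
    """
    for m in LEUCINE_ZIPPER.finditer(seq.upper()):
        yield m.start() + offset, m.end() - 1 + offset, m.group()


# Fragment quoted in the text, assumed to begin at residue 205.
fragment = "LQKRERFLLEREQLLFRHENAL"
for start, end, hit in find_leucine_zippers(fragment, offset=205):
    print(start, end, hit)  # 205 226 LQKRERFLLEREQLLFRHENAL
```

The leucines fall at residues 205, 212, 219, and 226, so the 22-residue fragment matches the pattern exactly once.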

Primary sequence and variants/isoforms

There are two isoforms of the CCDC138 protein. The primary isoform is 665 amino acids long, [13] while the secondary isoform is 577 amino acids long [13] and lacks the 88 C-terminal amino acids.

Pairwise sequence alignment comparing isoforms 1 and 2 of the CCDC138 protein.

The figure shows a pairwise sequence alignment of the primary isoform (isoform 1) with the secondary isoform (isoform 2).

Domain and motifs

The protein contains a domain of unknown function, DUF2317, spanning residues 212–315; this domain has been identified in bacteria. TMHMM [14] and TMAP [15] predict no transmembrane domains, and SOSUI [16] likewise predicts that CCDC138 is a soluble protein without a transmembrane domain.

Post-translational modifications

According to the SUMOplot Analysis Program, [17] CCDC138 has seven predicted sumoylation sites, at lysine residues K7, K207, K336, K374, K383, K521, and K591. NetPhos [18] predicts 44 phosphorylation sites: 29 serine, 10 threonine, and 5 tyrosine residues. No other post-translational modifications were predicted by NetNGlyc, [19] NetOGlyc, [20] SignalP, [21] Sulfinator, [22] or Myristoylator. [23]

Secondary structure

PELE, CHOFAS, and GOR4 predict that the CCDC138 protein contains multiple alpha helices, beta sheets, and coiled coils.

CCDC138 secondary structure as predicted by PELE

Yellow marks coiled coils, blue marks alpha helices, and red marks beta sheets. Most of the sequence is predicted to form coiled coils and alpha helices.

3° and 4° structures

No tertiary or quaternary structure has been predicted for the CCDC138 protein. However, a similar structure with 29% sequence identity has been identified: [24] chain A of the crystal structure of ClpB, an ATP-dependent chaperone of the Clp/Hsp100 family. The alignment covers 144 amino acids and overlaps the domain of unknown function (DUF2317) of CCDC138.

Chain A of the ClpB crystal structure

Expression

The gene is expressed at low levels in almost all human tissues, with higher levels observed in certain cancer tissues. CCDC138 is a soluble protein predicted to localize to the cell nucleus.

Promoter

The figure below shows the promoter region of CCDC138.

Promoter region of CCDC138 with labeled transcription factor binding sites

Tissue expression

Microarray-assessed tissue expression patterns in GEO profiles show that CCDC138 is expressed at moderate levels in various tissues, including peripheral blood lymphocytes, fetal thymus, thymus, testis, ovary, fetal brain, colon, mammary gland, and bone marrow. [25]

Microarray-assessed tissue expression patterns shown in GEO profile.

Transcript variants

CCDC138 mRNA has two principal alternative transcript variants. The first has been found in lung, blood, and human embryonic stem cells; [26] the second has been found in adenocarcinoma, prostate, lung, and primary lung epithelial cells. [27]

Transcript variants of CCDC138

In the figure, the first transcript is the complete mRNA, the second is the first variant, and the third is the second variant. [28]

Function and biochemistry

The exact function of CCDC138 is not yet known.

Interacting proteins

The CCDC138 protein has been found to interact with ubiquitin C, [29] a protein involved in ubiquitination and, in turn, protein degradation.

Transcription factors that might bind to regulatory sequence

The table below lists transcription factors that Genomatix predicts to bind the regulatory sequence of the CCDC138 gene. [30]

Detailed family information | Detailed matrix information | Tissue
GC-Box factors SP1/GC | Stimulating protein 1, ubiquitous zinc finger transcription factor | Ubiquitous
Peroxisome proliferator-activated receptor | Peroxisome proliferator-activated receptor gamma, DR1 sites | Adipose Tissue, Connective Tissue, Digestive System, Liver
MYT1 C2HC zinc finger protein | MyT1 zinc finger transcription factor involved in primary neurogenesis | Central Nervous System, Nervous System, Neuroglia, Neurons
NGFI-B response elements, nur subfamily of nuclear receptors | Monomers of the nur subfamily of nuclear receptors (nur77, nurr1, nor-1) | Brain, Central Nervous System, Endocrine System, Immune System, Leydig Cells, Nervous System, Neurons, Testis, Thymus Gland, Urogenital System
Krueppel-like transcription factors | Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers (KLF6, ZF9) | Blood Cells, Bone Marrow Cells, Digestive System, Embryonic Structures, Erythrocytes, Hematopoietic System, Liver
Grainyhead-like transcription factors | Grainyhead-like 3 (sister-of-mammalian grainyhead, SOM) | Embryonic Structures, Integumentary System
CTCF and BORIS gene family, transcriptional regulators with 11 highly conserved zinc finger domains | Insulator protein CTCF (CCCTC-binding factor) | Blood Cells, Embryonic Structures, Endocrine System, Erythrocytes, Germ Cells, Testis, Urogenital System
Core promoter motif ten elements | Human motif ten element | -
Abdominal-B type homeodomain transcription factors | Homeobox C13 / Hox-3gamma | Bone Marrow Cells, Bone and Bones, Central Nervous System, Connective Tissue, Embryonic Structures, Hematopoietic System, Integumentary System, Kidney, Nervous System, Neurons, Prostate, Skeleton, Spinal Cord, Urogenital System
E2F-myc activator/cell cycle regulator | E2F transcription factor 2 | Ubiquitous
PAX-3 binding sites | Pax-3 paired domain protein, expressed in embryogenesis, mutations correlate to Waardenburg Syndrome | Embryonic Structures, Muscle, Skeletal, Muscles
ZF5 POZ domain zinc finger | ZF5 POZ domain zinc finger, zinc finger protein 161 | -
Vertebrate TATA binding protein factor | Cellular and viral TATA box elements | -
CCAAT binding factors | Avian C-type LTR CCAAT box | Ubiquitous
CCAAT/Enhancer Binding Protein | CCAAT/enhancer binding protein alpha | Adipose Tissue, Bone Marrow Cells, Connective Tissue, Digestive System, Hematopoietic System, Immune System, Liver, Myeloid Cells, Phagocytes
Activator-, mediator- and TBP-dependent core promoter element for RNA polymerase II transcription from TATA-less promoters | X gene core promoter element 1 | -

Clinical significance

CCDC138 has been identified as one of many genes in the myometrium involved in initiating term labor. [31]

References

  1. GRCh38: Ensembl release 89: ENSG00000163006 - Ensembl, May 2017.
  2. GRCm38: Ensembl release 89: ENSMUSG00000038010 - Ensembl, May 2017.
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "Homo sapiens coiled-coil domain containing 138 (CCDC138), mRNA". Retrieved 2 February 2014.
  6. "CCDC138 - GeneCards". Retrieved 2 February 2014.
  7. "CCDC138 coiled-coil domain containing 138 [Homo sapiens (human)]". Retrieved 2 February 2014.
  8. "TimeTree :: The Timescale of Life". Retrieved 5 March 2014.
  9. "CLUSTALW". Retrieved 1 March 2014. [permanent dead link]
  10. "SAPS". Retrieved 10 April 2014. [permanent dead link]
  11. "pI". Retrieved 10 April 2014. [permanent dead link]
  12. "PSORT WWW Server". Retrieved 18 April 2014.
  13. "Coiled-coil domain-containing protein 138 - CCDC138 - Homo sapiens (Human)". Retrieved 10 February 2014.
  14. "TMHMM". Retrieved 20 April 2014. [permanent dead link]
  15. "TMAP". Retrieved 20 April 2014. [permanent dead link]
  16. "Classification and Secondary Structure Prediction of Membrane Proteins". Retrieved 15 April 2014.
  17. "SUMOplot™ Analysis Program". Retrieved 15 April 2014.
  18. "NetPhos 2.0 Server". Retrieved 15 April 2014.
  19. "NetNGlyc 1.0 Server". Retrieved 15 April 2014.
  20. "NetOGlyc 4.0 Server". Retrieved 15 April 2014.
  21. "SignalP 4.1 Server". Retrieved 15 April 2014.
  22. "The Sulfinator". Retrieved 15 April 2014.
  23. "Myristoylator". Retrieved 15 April 2014.
  24. "CBLAST". Retrieved 22 April 2014.
  25. "GDS3113 / 172084 / CCDC138". Retrieved 29 March 2014.
  26. "Homo sapiens coiled-coil domain containing 138 (65.7 kD) (CCDC138) alternative variant dAug10, complete mRNA". Retrieved 30 March 2014.
  27. "Homo sapiens coiled-coil domain containing 138 (47.0 kD) (CCDC138) alternative variant fAug10, complete mRNA". Retrieved 30 March 2014.
  28. "AceView: Gene:CCDC138, a Comprehensive Annotation of Human, Mouse and Worm Genes with mRNAs or ESTs". Retrieved 30 March 2014.
  29. "CCDC138 protein (Homo sapiens) - STRING network view". Retrieved 22 April 2014.
  30. "Transcription Factors". Retrieved 27 March 2014. [permanent dead link]
  31. Weiner CP, Mason CW, Dong Y, Buhimschi IA, Swaan PW, Buhimschi CS (May 2010). "Human effector/initiator gene sets that regulate myometrial contractility during term and preterm labor". American Journal of Obstetrics and Gynecology. 202 (5): 474.e1–20. doi:10.1016/j.ajog.2010.02.034. PMC 2867841. PMID 20452493.