CCDC142

Coiled-coil domain-containing protein 142
Identifiers
Aliases: CCDC142, IPR026700, Coiled-Coil Domain Containing 142

Coiled-coil domain containing 142 (CCDC142) is a gene which in humans encodes the CCDC142 protein. The CCDC142 gene is located on chromosome 2 (at 2p13) and contains 9 exons; its longest transcript is 4339 base pairs long. The function of the encoded protein is not yet well understood. [1] [2] There are two known isoforms of CCDC142. [1] The CCDC142 proteins produced from these transcripts are 743 and 665 amino acids long and contain signals suggesting protein movement between the cytosol and nucleus. [3] Homologous CCDC142 genes are found in many animals, including vertebrates and invertebrates, but not in fungi, plants, protists, archaea, or bacteria. [1] The protein contains a coiled-coil domain and a RINT1_TIP1 motif located within the coiled-coil domain. [3] [4]

Locus

CCDC142 Gene Locus

CCDC142 is found on the minus strand of chromosome 2 (2p13.1), with the genomic sequence spanning bases 74,472,832 to 74,483,230. [1] The coding region is 8292 base pairs long and encodes two protein isoforms of 743 and 665 amino acids. [1] On the telomeric side, CCDC142 is flanked by the MOGS and MRPL53 genes. On the centromeric side, it is flanked by the C31, LBX2, LBX2-AS1, and PCGF1 genes. [1]

mRNA

In Homo sapiens, the CCDC142 gene produces two alternatively spliced mRNA isoforms, called isoform 1 and isoform 2. [3] Both isoforms have 9 exons. Isoform 1 is the longer of the two at 4339 bp, while isoform 2 is 2253 bp long. [3] The main difference between the isoforms is that isoform 2 has a shorter exon 9 and 3' UTR. [3] Isoform 1 is the longest variant of the gene and protein and is the subject of this article. [1]

Conservation

Paralogs

CCDC142 has no paralogs in Homo sapiens.

Orthologs

The table below lists a variety of CCDC142 orthologs whose protein sequences were compared with the Homo sapiens amino acid sequence. CCDC142 has at least 73% amino acid identity among the mammals listed, but is less conserved in other vertebrates and in invertebrates. [5]

Genus and species | Common name | Date of divergence from human lineage (MYA) | % identity
Homo sapiens | Human | 0 | 100
Pan troglodytes | Chimpanzee | 6.6 | 96
Gorilla gorilla gorilla | Gorilla | 8.9 | 98
Jaculus jaculus | Lesser Egyptian jerboa | 90.9 | 73
Bos mutus | Yak | 97.5 | 74
Eptesicus fuscus | Big brown bat | 97.5 | 74
Python bivittatus | Burmese python | 320.5 | 36
Gallus gallus | Chicken | 320.5 | 35
Haliaeetus leucocephalus | Bald eagle | 320.5 | 33
Anolis carolinensis | Carolina anole (lizard) | 320.5 | 33
Calidris pugnax | Ruff (bird) | 320.5 | 32
Xenopus tropicalis | Western clawed frog | 355.7 | 33
Callorhinchus milii | Australian ghostshark | 429.6 | 36
Lepisosteus oculatus | Spotted gar | 429.6 | 34
Esox lucius | Northern pike | 429.6 | 33
Danio rerio | Zebrafish | 429.6 | 33
Lingula anatina | Tailed mussel | 847 | 29
Crassostrea gigas | Pacific oyster | 847 | 29
Octopus bimaculoides | California two-spot octopus | 847 | 27
Drosophila melanogaster | Fruit fly | 847 | 23

Phylogeny

CCDC142 sequences are closely related within mammals, within mollusks, within amphibians, reptiles and birds, and within fish. [5] The CCDC142 gene goes as far back as Drosophila melanogaster, whose lineage split from the human lineage 847 million years ago. CCDC142 has mutated at a greater rate than both Cytochrome C (a highly conserved protein) and Fibrinogen A (a rapidly mutating protein). This indicates that CCDC142 is a rapidly evolving gene whose rate of mutation has increased over time.

The mutation rate of CCDC142 compared to the benchmark proteins Cytochrome C and Fibrinogen A, which mutate slowly and quickly, respectively. The mutation rate, m, which is the corrected percentage of amino acid changes between the Homo sapiens protein and its orthologs, is plotted against the logarithm of the number of millions of years since the Homo sapiens lineage diverged from the lineage of the species in which the ortholog is seen. The points on the graph are calculated according to m/100 = –ln(1 – n/100), where m is the total number of amino acid changes that have occurred in a 100-amino-acid segment of a protein and n is the observed number of amino acid changes per 100 residues compared to the Homo sapiens protein sequence.
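
The correction in the caption can be sketched in a few lines of Python. This is only an illustration: it assumes that n is taken as 100 minus the percent identity from the ortholog table, which the article does not state explicitly, and the function name corrected_changes is hypothetical.

```python
import math


def corrected_changes(percent_identity: float) -> float:
    """Return m, the corrected number of amino acid changes per 100 residues.

    Assumption (not stated in the article): the observed changes n are
    approximated as 100 minus the percent identity to the human protein.
    """
    n = 100.0 - percent_identity
    if not 0.0 <= n < 100.0:
        # n = 100 would require log(0), which is undefined.
        raise ValueError("percent identity must be in the range (0, 100]")
    # Rearranged from m/100 = -ln(1 - n/100).
    return -100.0 * math.log(1.0 - n / 100.0)


# Percent identities taken from the ortholog table.
for species, identity in [("Jaculus jaculus", 73.0), ("Drosophila melanogaster", 23.0)]:
    print(f"{species}: m = {corrected_changes(identity):.1f}")
```

Because the logarithm grows without bound as n approaches 100, the corrected value m exceeds the observed value n by a widening margin for distant orthologs, which is why the correction matters most for the invertebrate sequences.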

Protein

Domain Structure of the CCDC142 Protein. Highly conserved regions outside the RINT1_TIP1 motif are in black. The putative nuclear localization signal is in red.

Primary Structure, Variants, and Isoforms

The main isoform of the CCDC142 protein is 743 amino acids in length and the second isoform is 665 amino acids long. The difference in length is due entirely to amino acids missing from the C-terminus of isoform 2. [1]

Domains and Motifs

CCDC142 Conceptual Translation
CCDC142 Conceptual Translation Legend

The predicted coiled-coil domain of CCDC142 spans amino acids 308–719. [2] A RINT1_TIP1 motif is also present at amino acids 490–621. RINT1_TIP1 is a family that includes RINT-1 (a protein involved in radiation-induced checkpoint control) and TIP-1 (a yeast protein which is involved in Golgi transport). [4] The extra ~250 amino acids found in the distant ortholog CCDC142 proteins are not found in the Homo sapiens genome near the CCDC142 gene.

Post-Translational Modifications

CCDC142 is predicted to have 6 phosphorylation sites, 4 methylation sites, 1 palmitoylation site, 1 sumoylation site, and 1 weak nuclear localization signal. [6] [7] [8] [9] [10] These predicted sites suggest that CCDC142 is localized to the nucleus and cytosol. Refer to the Conceptual Translation for annotations of these sites in the protein.

Structure prediction

The secondary structure of CCDC142 is predicted by the Quick2D and Phyre2 programs to contain only α-helices. [11] [12] CCDC142 is predicted to contain eight conserved α-helices, six of which are located in the coiled-coil region of the protein. [11] [12] The predicted tertiary structure of CCDC142 contains a large coiled-coil domain spanning amino acids 308–719. [2] [13]

The I-TASSER predicted tertiary structure of CCDC142. This structure has a C-score of −0.75 (measured on a scale of −5 to 2, with higher values indicating higher confidence) and a cluster density of 0.375 (on a scale of 0 to 1, with higher values indicating greater protein prediction coverage). The C-score takes into account both the significance of the model's structure and the quality of the prediction coverage from other proteins.

Expression

Promoters and Regulatory Factors

The promoter region for CCDC142 was identified using the El Dorado program at Genomatix; it spans bases 74482896–74483908 on chromosome 2. [14] This 1013 bp region spans 1071–58 bp upstream of the start codon of CCDC142. [14] There is a region in the promoter which binds a large number of Krueppel-like transcription factors and BED zinc-finger proteins. [14] This region has no single-nucleotide polymorphisms (SNPs) located in it. [15] Many of the transcription factors that bind to the promoter region of CCDC142 have functions related to tumor suppression, neurogenesis, DNA damage, and photoreception. [14] This promoter region also contains a mammalian C-type LTR TATA box which overlaps with the transcription start site of the gene. [14]
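
The region lengths follow directly from the coordinates quoted in this article. The sketch below assumes the coordinates are 1-based and inclusive at both ends (a common genome browser convention, but an assumption here); the helper name interval_length is hypothetical.

```python
def interval_length(start: int, end: int) -> int:
    """Length in base pairs of a closed interval [start, end].

    Assumes 1-based coordinates that include both endpoints.
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates as quoted in the article (chromosome 2).
PROMOTER = (74_482_896, 74_483_908)
GENE = (74_472_832, 74_483_230)

promoter_len = interval_length(*PROMOTER)
gene_len = interval_length(*GENE)
print(f"promoter: {promoter_len} bp, genomic span: {gene_len} bp")
```

Under this convention the promoter interval comes out at the 1013 bp stated above, and the genomic span of the gene at 10,399 bp.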

RNA Binding Proteins

A number of RNA binding proteins are predicted to bind to both the 3’ and 5’ untranslated regions (UTRs) of the CCDC142 mRNA. The PABPC1 and RBMX protein binding sites occur at high frequency in the 3’ UTR, with 49 and 21 sites respectively. [16]

Expression

Above are the Allen Human Brain Atlas expression data on CCDC142, with red indicating lower expression and green indicating higher expression. [17] In the Homo sapiens brain, CCDC142 is expressed at low levels in the cerebellar cortex, thalamus and hypothalamus, and at high levels in the substantia nigra, pons, claustrum, and mesencephalon. [17] There is also relatively higher expression of CCDC142 in the mouth and thymus. [18]

Experimental expression data suggest several conditions under which CCDC142 expression changes. [19] Overexpression of SNAI1, a zinc finger protein, is correlated with reduced CCDC142 expression in Homo sapiens. [20] A Mus musculus knockout of MEKK 2/3, which help regulate helper T cell differentiation, also showed lowered expression of CCDC142. [21] Another Mus musculus experiment focusing on cardiomyopathy showed lower levels of CCDC142 in mice with damaged myocardial cells. [20]

Function and Biochemistry

Composition

CCDC142 has a relatively typical distribution of amino acids compared to other Homo sapiens proteins. [5] However, some variations are noted across orthologs. [5] Leucine is present in large amounts relative to other proteins (at over 15% of the protein) and asparagine is present in low amounts relative to other proteins (at less than 0.7% of the protein). [5]

The coiled-coil domain and RINT1_TIP1 motif of CCDC142 contain higher amounts of leucine relative to the rest of the protein (at over 16.6% of the region), higher amounts of glutamine (at over 8.4% of the region), and similarly low amounts of asparagine (at less than 0.7% of the region). [5]
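
Percentages like these can be obtained by counting residues in a one-letter amino acid sequence. The sketch below is a generic illustration, not the tool used for the figures above; the function name percent_composition and the short toy sequence are hypothetical and are not taken from CCDC142.

```python
from collections import Counter


def percent_composition(sequence: str) -> dict[str, float]:
    """Return the percentage of each residue in a one-letter amino acid sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {residue: 100.0 * count / len(seq) for residue, count in counts.items()}


# Hypothetical toy sequence, used only to show the calculation.
toy_sequence = "MLLQALNLLE"
composition = percent_composition(toy_sequence)
print(f"Leu: {composition['L']:.1f}%  Asn: {composition['N']:.1f}%")
```

Applying the same function to a subsequence, such as the residues of a single domain, gives region-level values that can be compared with the whole-protein composition.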

Interacting Proteins

No protein interactions have been found for CCDC142.

Clinical Significance

Pathology and Diseases

A copy number gain at the CCDC142 locus, which also included 25 other genes, was associated with developmental delay and other significant developmental or morphological phenotypes. [22] A copy number loss at the CCDC142 locus, which also included 29 other genes, was associated with short stature, abnormal face shape, delayed speech and language development, overlapping toe, intrauterine growth retardation, patent ductus arteriosus, and delayed gross motor development. [22] However, these phenotypes cannot be attributed to CCDC142 alone, since many other genomic regions were also affected.

Mutations

There are a number of SNPs located in the CCDC142 gene. Some of those in the promoter region and 5’ UTR lie within anchor sequences for transcription factors and could affect transcription factor binding.

There are many SNPs in the protein's coding sequence which change CCDC142's amino acid composition. One SNP with a high prevalence in the population (1.8%) is notable for its change in chemistry: a tyrosine-to-asparagine substitution at amino acid 548. [15]

There are also numerous SNPs located in the large 3’ UTR of the gene, many of them in regions that contain stem loop structures in the mRNA. An SNP with a 7.7% prevalence (guanine to adenine at bp 4285) is in the 3’ UTR but not located in the conserved stem loop region. [15]

These SNPs have been annotated in the Conceptual Translation located in the Protein section above.

Multiple Sequence Alignment

In the Multiple Sequence Alignment above (created using the CLUSTALW and TEXSHADE programs at SDSC Biology Workbench), organisms are labeled by the first letter of their genus and the first two letters of their species. The whole CCDC142 protein is highly conserved in mammals. [5] The regions containing the Homo sapiens coiled-coil domain and the RINT1_TIP1 motif region are highly conserved in distant homologs. [5] Twelve of the 15 amino acids that match across all organisms in this region are nonpolar. [5] Conserved Region 1 contains mostly nonpolar amino acids. [5] Conserved Region 2 contains mostly nonpolar and basic amino acids. Conserved Region 3 contains both polar and nonpolar amino acids. [5] Conserved Region 5 contains mostly nonpolar and basic amino acids. [5]

Additional Transcription Factor Information

Transcription Factor Binding Site Annotation
Transcription Factor Binding Site Legend


References

  1. "CCDC142 coiled-coil domain containing 142 [Homo sapiens (human)] – Gene – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01.
  2. "coiled-coil domain-containing protein 142 [Homo sapiens] – Protein – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01.
  3. "CCDC142 – Coiled-coil domain-containing protein 142 – Homo sapiens (Human) – CCDC142 gene & protein". www.uniprot.org. Retrieved 2016-05-01.
  4. "SSDB Motif Search Result: hsa:84865". www.kegg.jp. Retrieved 2016-05-01.
  5. "SDSC Biology Workbench".
  6. "NetPhos 2.0 Server". www.cbs.dtu.dk. Retrieved 2016-05-01.
  7. "Memo:Protein Methylation Prediction". www.bioinfo.tsinghua.edu.cn. Archived from the original on 2016-03-14. Retrieved 2016-05-01.
  8. ":::NBA-Palm – Prediction of Palmitoylation Site Implemented In Naive Bayesian Algorithm:::". www.bioinfo.tsinghua.edu.cn. Archived from the original on 2016-06-09. Retrieved 2016-05-01.
  9. "SUMOplot™ Analysis Program | Abgent". www.abgent.com. Retrieved 2016-05-01.
  10. "NLS_Mapper". nls-mapper.iab.keio.ac.jp. Archived from the original on 2021-11-22. Retrieved 2016-05-01.
  11. Kelley, Lawrence. "PHYRE2 Protein Fold Recognition Server". www.sbg.bio.ic.ac.uk. Retrieved 2016-05-01.
  12. Remmert, Michael. "Quick2D". toolkit.tuebingen.mpg.de. Retrieved 2016-05-01.
  13. "I-TASSER server for protein structure and function prediction". zhanglab.ccmb.med.umich.edu. Retrieved 2016-05-01.
  14. "Genomatix – NGS Data Analysis & Personalized Medicine". www.genomatix.de. Retrieved 2016-05-01.
  15. snpdev. "SNP linked to Gene (geneID:84865) Via Contig Annotation". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01.
  16. "RBPDB: The database of RNA-binding specificities". rbpdb.ccbr.utoronto.ca. Retrieved 2016-05-01.
  17. "Microarray Data :: Allen Brain Atlas: Human Brain". human.brain-map.org. Retrieved 2016-05-01.
  18. "EST Profile – Hs.430199". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01.
  19. geo. "Home – GEO – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01.
  20. "GDS3596 / 1451178_at". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01.
  21. "GDS4795 / ILMN_3023885". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01.
  22. ClinVar. "No items found – ClinVar – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2016-05-05.
