Tetratricopeptide repeat protein 39B is a protein that in humans is encoded by the TTC39B gene. TTC39B is also known as C9orf52 or FLJ33868. The main feature of tetratricopeptide repeat protein 39B is the domain of unknown function 3808 (DUF3808), which spans the majority of the protein.
The TTC39B gene is located on the short arm of chromosome 9, at 9p22.3. The genomic DNA is 136,517 bases long, lies on the minus strand, and consists of 39 introns and 20 exons. The mRNA is 3,276 bases long. TTC39B lies near LOC100419056, a chloride channel, voltage-sensitive 3 (CLCN3) pseudogene. [5]
TTC39B is predicted to have a molecular binding function as well as a role in lipid regulation; its phenotype and in vivo function are unknown. [6]
There are two known paralogs of TTC39B: TTC39A and TTC39C. TTC39A has two splice isoforms and TTC39C has three.
TTC39A has been tested for association with diseases such as breast neoplasms. It is predicted to have a molecular binding function and localizes to various compartments, including the extracellular space, the membrane, and the nucleus. [7]
TTC39C is predicted to localize to the cytoplasm. No phenotype has been identified, and its in vivo function is unknown. [8]
Close Orthologs (mRNA):
Genus and Species | Common Name | RNA Percent Identity | Divergence |
---|---|---|---|
Pan paniscus | Bonobo | 99% | 6.3 MYA |
Pan troglodytes | Chimpanzee | 99% | 6.3 MYA |
Gorilla gorilla gorilla | Gorilla | 99% | 8.8 MYA |
Nomascus leucogenys | Gibbon | 98% | 20.4 MYA |
Papio anubis | Baboon | 97% | 29.0 MYA |
Pongo pygmaeus | Orangutan | 97% | 15.7 MYA |
Callithrix jacchus | Marmoset | 96% | 42.6 MYA |
Saimiri boliviensis boliviensis | Squirrel monkey | 94% | 42.6 MYA |
Canis lupus familiaris | Dog | 91% | 94.2 MYA |
Otolemur garnettii | Bushbaby | 90% | 74.0 MYA |
Felis catus | Cat | 89% | 94.2 MYA |
Bos taurus | Cow | 88% | 94.2 MYA |
Cricetulus griseus | Hamster | 87% | 92.3 MYA |
Ovis aries | Sheep | 85% | 94.2 MYA |
Rattus norvegicus | Rat | 85% | 92.3 MYA |
Distant Orthologs (mRNA):
Genus and Species | Common Name | RNA Percent Identity | Divergence |
---|---|---|---|
Sarcophilus harrisii | Tasmanian devil | 78% | 162.6 MYA |
Gallus gallus | Chicken | 75% | 296.0 MYA |
Taeniopygia guttata | Zebra finch | 75% | 296.0 MYA |
Anolis carolinensis | Lizard | 75% | 296.0 MYA |
Xenopus laevis | Frog | 74% | 371.2 MYA |
TTC39B is conserved in organisms ranging from humans to flatworms (Platyhelminthes), but it is not conserved in yeast or other fungi.
The TTC39B gene has five transcript variants, each encoding a different protein isoform. This article focuses on tetratricopeptide repeat protein 39B isoform 1, the longest of the isoforms. Isoform 1 is composed of 682 amino acids and has a molecular weight of 76,955.64 Da (about 76.96 kDa). Its isoelectric point is pH 7.16. [9]
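Both the molecular weight and the isoelectric point are sequence-derived estimates. As a quick unit check, 682 residues at roughly 110 Da each come to about 75,000 Da, so a figure of 76,955.64 corresponds to daltons (about 77 kDa), not kilodaltons. The Python sketch below shows how such values are commonly computed: the weight as a sum of average residue masses plus one water, and the pI by bisecting for the pH at which the Henderson-Hasselbalch net charge is zero. The residue masses are standard average values; the pKa table is an assumption (close to the EMBOSS `iep` defaults). Tools such as ExPASy ProtParam use a different pKa scheme, so this sketch is not expected to reproduce pH 7.16 exactly, and no TTC39B sequence is included here.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # one water restores the free N- and C-termini

# Assumed pKa values (close to the EMBOSS iep defaults); other sets shift the pI.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight of a linear peptide, in daltons (Da)."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    counts = {aa: seq.count(aa) for aa in set(seq)}
    counts["Nterm"] = counts["Cterm"] = 1  # one free terminus of each kind
    pos = sum(counts.get(g, 0) / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts.get(g, 0) / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH of zero net charge (charge falls as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


peptide = "MSEQ"  # hypothetical toy peptide, not TTC39B
mw = molecular_weight(peptide)
print(f"MW = {mw:.2f} Da ({mw / 1000:.3f} kDa), pI ~ {isoelectric_point(peptide):.2f}")
```

Dividing by 1,000 is the only step needed to move from daltons to kilodaltons, which is where the unit in the original figure went wrong.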
Close Orthologs:
Genus and Species | Common Name | Protein Percent Identity | Divergence |
---|---|---|---|
Pan troglodytes | Chimpanzee | 99% | 6.3 MYA |
Pan paniscus | Bonobo | 99% | 6.3 MYA |
Nomascus leucogenys | Gibbon | 98% | 20.4 MYA |
Papio anubis | Baboon | 98% | 29.0 MYA |
Callithrix jacchus | Marmoset | 97% | 42.6 MYA |
Saimiri boliviensis boliviensis | Squirrel monkey | 96% | 42.6 MYA |
Heterocephalus glaber | Naked mole-rat | 92% | 92.3 MYA |
Canis lupus familiaris | Dog | 91% | 94.2 MYA |
Cricetulus griseus | Hamster | 90% | 92.3 MYA |
Ovis aries | Sheep | 89% | 94.2 MYA |
Cavia porcellus | Guinea pig | 86% | 92.3 MYA |
Distant Orthologs:
Genus and Species | Common Name | Protein Percent Identity | Divergence |
---|---|---|---|
Sarcophilus harrisii | Tasmanian devil | 73% | 162.6 MYA |
Taeniopygia guttata | Zebra finch | 72% | 296.0 MYA |
Pteropus alecto | Bat | 55% | 94.2 MYA |
Bos taurus | Cow | 54% | 94.2 MYA |
Rattus norvegicus | Rat | 54% | 92.3 MYA |
Gallus gallus | Chicken | 54% | 296.0 MYA |
Danio rerio | Zebrafish | 54% | 400.1 MYA |
Crassostrea gigas | Oyster | 50% | 782.7 MYA |
Camponotus floridanus | Ant | 43% | 782.7 MYA |
Nasonia vitripennis | Wasp | 42% | 782.7 MYA |
Ciona intestinalis | Sea squirt (Urochordata) | 40% | 722.5 MYA |
Clonorchis sinensis | Liver fluke | 35% | 792.4 MYA |
The domain of unknown function 3808 (DUF3808) is conserved from fungi to humans, and its function is currently unknown. It spans amino acids 142 to 568 (a length of 427 amino acids). Proteins of this family also contain a TPR_2 domain at their C-terminus, which likewise has an unknown function. [10]
Another conserved domain in the TTC39B protein is the TPR_12 tetratricopeptide repeat, which spans amino acids 600 to 658 (a length of 59 amino acids). [11] TPR domains are found in many proteins, where they mediate specific interactions with partner proteins. Three-dimensional structural data have shown that each TPR motif forms a pair of antiparallel alpha-helices. TPR motifs arranged in tandem create a right-handed helical structure with an amphipathic channel that may accommodate the complementary region of a target protein. Most TPR-containing proteins are associated with multiprotein complexes, and there is extensive evidence that TPR motifs are important to the function of chaperone, cell-cycle, transcription, and protein transport complexes. [12] Two more TPR domains are found in the TTC39B protein: TPR1, which spans amino acids 393 to 426 (34 amino acids), and TPR2, which spans amino acids 626 to 659 (also 34 amino acids). [13]
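The domain lengths above follow inclusive, 1-based counting (end - start + 1). The short Python sketch below uses only the coordinates given in this section to check those lengths and report how the features overlap. By this arithmetic, TPR1 (393 to 426) lies entirely inside DUF3808, and TPR2 (626 to 659) shares 33 residues with TPR_12 (600 to 658). The helper names are illustrative.

```python
# Feature coordinates from the text: 1-based, inclusive at both ends.
FEATURES = {
    "DUF3808": (142, 568),
    "TPR1": (393, 426),
    "TPR_12": (600, 658),
    "TPR2": (626, 659),
}


def span_length(span):
    """Number of residues in an inclusive span."""
    start, end = span
    return end - start + 1


def overlap(a, b):
    """Residues shared by two inclusive spans (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


for name, span in FEATURES.items():
    print(f"{name}: {span[0]}-{span[1]} ({span_length(span)} aa)")

# Report every pair of features that shares at least one residue.
names = list(FEATURES)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        shared = overlap(FEATURES[a], FEATURES[b])
        if shared:
            print(f"{a} and {b} share {shared} residues")
```

Only two pairs overlap: DUF3808 with TPR1, and TPR_12 with TPR2.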
TTC39B contains three transmembrane regions, all located within the DUF3808 region. [14] Because the number of transmembrane regions is odd, the N-terminus and C-terminus of the protein are expected to lie on opposite sides of the membrane.
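The topology argument is a parity rule. Assuming each of the three transmembrane regions crosses the membrane exactly once, every crossing moves the chain to the other side, so an odd count leaves the termini on opposite sides. The Python sketch below encodes that rule; the side on which the N-terminus starts is a hypothetical input, since the text does not state it.

```python
def termini_on_same_side(n_crossings: int) -> bool:
    """Each single-pass crossing flips the chain to the other side, so the
    termini share a side only when the number of crossings is even."""
    return n_crossings % 2 == 0


def c_terminal_side(n_crossings: int, n_terminal_side: str) -> str:
    """Side of the C-terminus, given a (hypothetical) side for the N-terminus."""
    flip = {"cytosolic": "non-cytosolic", "non-cytosolic": "cytosolic"}
    if termini_on_same_side(n_crossings):
        return n_terminal_side
    return flip[n_terminal_side]


print(termini_on_same_side(3))          # three TM regions, as stated for TTC39B
print(c_terminal_side(3, "cytosolic"))  # assumes a cytosolic N-terminus
```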
Predicted phosphorylation sites: [15]
Amino Acid | Positions |
---|---|
Serine (S) | 28, 32, 42, 51, 61, 62, 72, 91, 93, 94, 96, 101, 102, 107, 120, 123, 124, 125, 126, 127, 134, 148, 165, 173, 194, 215, 217, 218, 221, 224, 229, 270, 279, 305, 313, 329, 344, 347, 350, 365, 393, 421, 454, 461, 464, 477, 500, 509, 524, 526, 548, 551, 557, 573, 578, 580, 614, 634, 638, 660, 663, 680, 681 |
Threonine (T) | 89, 100, 110, 121, 128, 152, 174, 183, 202, 211, 250, 269, 356, 362, 370, 467, 487, 493, 512, 563, 628, 651 |
Tyrosine (Y) | 167, 172, 206, 210, 239, 271, 274, 295, 363, 398, 434, 451, 452, 453, 468, 523, 542, 608, 620, 623, 636, 656, 659 |
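With more than a hundred positions, the table is easier to query programmatically. This Python sketch stores the positions exactly as listed and counts sites per residue type within any inclusive span, for example the DUF3808 domain (142 to 568) or the TPR_12 repeat (600 to 658). The function name `sites_in` is illustrative, and the counts carry whatever uncertainty the cited source carries.

```python
# Positions copied exactly from the table above.
PHOSPHO_SITES = {
    "S": [28, 32, 42, 51, 61, 62, 72, 91, 93, 94, 96, 101, 102, 107, 120,
          123, 124, 125, 126, 127, 134, 148, 165, 173, 194, 215, 217, 218,
          221, 224, 229, 270, 279, 305, 313, 329, 344, 347, 350, 365, 393,
          421, 454, 461, 464, 477, 500, 509, 524, 526, 548, 551, 557, 573,
          578, 580, 614, 634, 638, 660, 663, 680, 681],
    "T": [89, 100, 110, 121, 128, 152, 174, 183, 202, 211, 250, 269, 356,
          362, 370, 467, 487, 493, 512, 563, 628, 651],
    "Y": [167, 172, 206, 210, 239, 271, 274, 295, 363, 398, 434, 451, 452,
          453, 468, 523, 542, 608, 620, 623, 636, 656, 659],
}
PROTEIN_LENGTH = 682  # isoform 1


def sites_in(start, end):
    """Count listed sites per residue type within an inclusive span."""
    return {aa: sum(start <= p <= end for p in positions)
            for aa, positions in PHOSPHO_SITES.items()}


# Sanity check: every listed position lies inside the 682-residue protein.
assert all(1 <= p <= PROTEIN_LENGTH for ps in PHOSPHO_SITES.values() for p in ps)

print({aa: len(ps) for aa, ps in PHOSPHO_SITES.items()})  # whole protein
print(sites_in(142, 568))  # DUF3808
print(sites_in(600, 658))  # TPR_12
```

Run as written, it reports 63 serine, 22 threonine, and 23 tyrosine positions (108 in total), of which 64 fall within DUF3808 and 10 within TPR_12.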
Predicted sumoylation sites and their probability scores [16] (in each group, the central four residues form the candidate motif):
No. | Position | Group | Score |
---|---|---|---|
1 | 619 | ESEKL LKYD HYLVP | 0.91 |
2 | 262 | NMINF IKGG LKIRT | 0.77 |
3 | 302 | EFEGG VKLG SGAFN | 0.76 |
4 | 133 | STKVD LKSG LEECA | 0.73 |
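The canonical SUMO acceptor consensus is ψKxE, where ψ is a large hydrophobic residue and x is any residue; an aspartate is often accepted in place of the glutamate. A regular-expression scan over the four groups in the table (spaces removed) shows that only the top-scoring group, at position 619 (LKYD), fits that pattern, while the other three are non-consensus sites. The hydrophobic set `[ILVMF]` is an assumption, and this scan is a simplification that does not reproduce the predictor's scoring.

```python
import re

# Groups copied from the table; the spaces only set off the central motif.
SUMO_GROUPS = {
    619: "ESEKL LKYD HYLVP",
    262: "NMINF IKGG LKIRT",
    302: "EFEGG VKLG SGAFN",
    133: "STKVD LKSG LEECA",
}

# Assumed consensus: psi-K-x-(D/E), with psi drawn from a large-hydrophobic set.
CONSENSUS = re.compile(r"[ILVMF]K.[DE]")


def consensus_matches(groups):
    """Map each position whose group contains the consensus to the matched motif."""
    hits = {}
    for pos, group in groups.items():
        match = CONSENSUS.search(group.replace(" ", ""))
        if match:
            hits[pos] = match.group()
    return hits


print(consensus_matches(SUMO_GROUPS))
```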
There is one possible N-glycosylation site, at amino acid 391. However, because the TTC39B protein does not contain a signal peptide, it is unlikely that this site is actually glycosylated.
According to an analysis of the secondary protein structure, TTC39B is most likely localized to the endoplasmic reticulum, mitochondria, and Golgi apparatus. [14]
The TTC39B protein is predicted to fold into an alpha-alpha superhelix. Forty percent of its structure matches d1w3ba, the superhelical domain of O-linked GlcNAc transferase (OGT). O-GlcNAc modification couples metabolic status to the regulation of a wide variety of cellular signaling pathways by acting as a nutrient sensor. [17]
The promoter for TTC39B starts at base pair 15,307,109 and ends at base pair 15,307,858. It has a length of 750 base pairs. The transcription start site for TTC39B protein isoform 1 is located from base pairs 15,307,340 to 15,307,389 and has a length of 50 bp.
TTC39B is well expressed in muscles, internal organs, secretory organs, reproductive organs, the immune system, and the nervous system. It is expressed in a multitude of tissues, including the testis, lung, islets of Langerhans, pancreas, and kidney, as well as in pooled germ cell tumors and breast carcinoma. [6]
There are five transcript variants of the TTC39B gene. Variant 1 is the longest transcript and encodes the longest isoform (isoform 1). Variant 2 uses an alternate in-frame splice site in the central coding region, compared to variant 1, which results in a shorter protein. Variants 3 and 4 have multiple differences in the central coding region but maintain the reading frame, compared to variant 1. Variant 5 differs in the 5' UTR and has multiple coding region differences, compared to variant 1. These differences cause translation to initiate at an in-frame downstream AUG, so isoform 5 has a shorter N-terminus than isoform 1. [18]
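Two ideas in this paragraph can be checked mechanically: a splice change preserves the reading frame only if it adds or removes a multiple of three nucleotides, and starting at a downstream in-frame AUG shortens the N-terminus by one residue per skipped codon. The Python sketch below illustrates both on a hypothetical coding sequence written in DNA letters (so AUG appears as ATG); it does not use the actual TTC39B transcripts, and the function names are illustrative.

```python
def keeps_frame(length_change_nt):
    """A splice change keeps the downstream frame only if it adds or
    removes a whole number of codons."""
    return length_change_nt % 3 == 0


def next_in_frame_atg(cds):
    """0-based index of the first ATG after the start codon that lies in the
    same frame, or -1 if there is none."""
    for i in range(3, len(cds) - 2, 3):
        if cds[i:i + 3] == "ATG":
            return i
    return -1


cds = "ATGCATGAAATGTAA"  # hypothetical; the ATG at index 4 is out of frame
start = next_in_frame_atg(cds)
print(start, "->", start // 3, "N-terminal residues lost")
```

The out-of-frame ATG at index 4 is skipped because the loop only inspects codon boundaries.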
Transcription Factor Binding Sites: [19]
Matrix Family | Detailed Family Information | From | To | Strand | Similarity | Sequence (CAPITALS: core sequence) |
---|---|---|---|---|---|---|
V$PLAG | Pleomorphic adenoma gene | 51 | 73 | (+) | 1.000 | taGGGGgaagtagaggagttcca |
V$TALE | TALE homeodomain class recognizing TG motifs | 157 | 173 | (+) | 1.000 | ggtggtgtGTCAgaggc |
V$ZF02 | C2H2 zinc finger transcription factors 2 | 294 | 316 | (-) | 1.000 | cagcgCCCCacctggggtccgtg |
V$MIZ1 | Myc-interacting Zn finger protein 1 | 417 | 427 | (-) | 1.000 | cacgcCCTCtg |
O$TF2B | RNA polymerase II transcription factor II B | 517 | 523 | (-) | 1.000 | ccgCGCC |
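The table rows are internally consistent: each sequence is exactly To - From + 1 nucleotides long, and the capitalized letters mark the core. The Python sketch below checks both and extracts the cores. It assumes the From/To positions are relative to the 750 bp promoter described above (15,307,109 to 15,307,858, inclusive); that reading is an assumption and is not stated in the table itself.

```python
# Rows from the table: (matrix family, from, to, strand, sequence).
# Lowercase letters are flanking sequence; CAPITALS are the core.
TFBS_ROWS = [
    ("V$PLAG", 51, 73, "+", "taGGGGgaagtagaggagttcca"),
    ("V$TALE", 157, 173, "+", "ggtggtgtGTCAgaggc"),
    ("V$ZF02", 294, 316, "-", "cagcgCCCCacctggggtccgtg"),
    ("V$MIZ1", 417, 427, "-", "cacgcCCTCtg"),
    ("O$TF2B", 517, 523, "-", "ccgCGCC"),
]
PROMOTER_LEN = 15_307_858 - 15_307_109 + 1  # inclusive coordinates


def core(seq):
    """Keep only the capitalized core of a matched sequence."""
    return "".join(c for c in seq if c.isupper())


for family, start, end, strand, seq in TFBS_ROWS:
    assert len(seq) == end - start + 1          # inclusive span length
    assert 1 <= start <= end <= PROMOTER_LEN    # assumes promoter-relative positions
    print(family, strand, core(seq))
```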
TTC39B interacts with ubiquitin C (UBC), a polyubiquitin precursor. Conjugation of ubiquitin monomers or polymers leads to different effects within a cell. Ubiquitination has been associated with protein degradation, DNA repair, cell cycle regulation, kinase modification, endocytosis, and regulation of other cell signaling pathways. [20]
At a locus on chromosome 9p22 found to be associated with high-density lipoprotein cholesterol (HDL-C), TTC39B was the only one of several genes in the locus to have an expression quantitative trait locus (eQTL) in liver, with the allele associated with decreased expression correlating with increased HDL-C. Knockdown of the mouse ortholog, Ttc39b, via a viral vector (50% knockdown) resulted in significantly higher plasma HDL-C levels at 4 and 7 days. These data suggest that TTC39B is a causal gene for lipid regulation. [21]