TMEM39B
Transmembrane protein 39B (TMEM39B) is a protein that in humans is encoded by the TMEM39B gene. [1] TMEM39B is a multi-pass membrane protein with eight transmembrane domains. [1] The protein localizes to the plasma membrane and vesicles. [1] [2] The precise function of TMEM39B is not yet well understood, but differential expression is associated with survival in B cell lymphoma, and knockdown of TMEM39B is associated with decreased autophagy in cells infected with Sindbis virus. [3] [4] Furthermore, the TMEM39B protein has been found to interact with the SARS-CoV-2 ORF9c (also known as ORF14) protein. [5] TMEM39B is expressed at moderate levels in most tissues, with higher expression in the testis, placenta, white blood cells, adrenal gland, thymus, and fetal brain. [6] [7]
The TMEM39B gene in humans is located on the plus strand at 1p35.2. [1] The gene is composed of 14 exons and covers 30.8 kb, spanning from 32,072,031 to 32,102,866. [1] It is flanked by KHDRBS1 upstream and KPNA6 downstream. [1] The TMEM39B gene region also contains the microRNA-encoding gene MIR5585. [1]
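The 30.8 kb figure follows directly from the stated coordinates, assuming both endpoints are counted as part of the gene (the usual inclusive convention for 1-based genome coordinates):

```latex
L = 32{,}102{,}866 - 32{,}072{,}031 + 1 = 30{,}836~\text{bp} \approx 30.8~\text{kb}
```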
There are four validated transcript variants for TMEM39B produced by different promoters and alternative splicing. [1] Transcript variant 1 is translated into the longest and most abundant protein isoform.
Transcript variant | RefSeq accession | Length (bp) | Description | Number of exons |
---|---|---|---|---|
Transcript variant 1 | NM_018056.4 | 1778 | Encodes isoform 1 | 9 |
Transcript variant 2 | NM_001319677.1 | 2106 | Extended 5' UTR, encodes isoform 2 | 9 |
Transcript variant 3 | NM_001319678.2 | 1542 | Lacks a portion of the 5' coding region, encodes isoform 3 | 7 |
Transcript variant 4 | NM_001319679.2 | 1539 | Lacks a portion of the 5' coding region, encodes isoform 3 | 7 |
There are three validated protein isoforms for TMEM39B. [1] Isoform 1 is the longest; the other two isoforms use downstream in-frame start codons. [1]
Protein isoform | Length (aa) | Molecular weight (kDa) | Description |
---|---|---|---|
Isoform 1 | 492 | 56 | Longest and most abundant isoform |
Isoform 2 | 365 | 42 | Shorter N-terminus; uses a downstream in-frame start codon |
Isoform 3 | 293 | 33 | Shorter N-terminus; uses a downstream in-frame start codon |
The human TMEM39B protein isoform 1 is composed of 492 amino acids and has a predicted molecular weight of 56 kDa. [1] The predicted isoelectric point (pI) of the protein is 9.51. [10] Compared to the composition of the human proteome, TMEM39B has a higher percentage of serine, histidine, and leucine and a lower percentage of the acidic residues glutamate and aspartate, consistent with its basic pI. [11] It contains two pairs of tandem repeats: “GSSG” from amino acids 21–28 and “PPSH” from amino acids 107–114. [11] A periodic motif of four leucines spaced seven residues apart occurs between amino acids 168 and 195, but it is not predicted to form a leucine zipper. There is an “F..Y” motif with three repeats from amino acids 183–202 and a motif with phenylalanine at every other residue from amino acids 409–416. [11] There are no notable charge clusters, charge runs, or unusual charge spacings, nor are there any sorting signals. [11]
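The sequence statistics above come from dedicated analysis tools. [10] [11] As an illustration of what such tools compute, the Python sketch below estimates net charge and pI from residue counts with the Henderson–Hasselbalch equation, and scans for the kinds of motifs described (periodic leucines, “F..Y”, tandem “GSSG”). The pKa values are an assumed EMBOSS-style set, the function names are hypothetical, and the example uses an invented toy peptide rather than the TMEM39B sequence, so it is not expected to reproduce the 9.51 value.

```python
import re

# Assumed, EMBOSS-style pKa values; other tools use different sets,
# so computed pI values can differ by a few tenths between programs.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free terminus at each end
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH where the (monotonically decreasing) net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


def leucine_heptads(seq: str, repeats: int = 4) -> list[int]:
    """1-based starts of `repeats` leucines spaced seven residues apart."""
    hits = []
    for i in range(len(seq) - 7 * (repeats - 1)):
        if all(seq[i + 7 * k] == "L" for k in range(repeats)):
            hits.append(i + 1)
    return hits


def motif_positions(seq: str, pattern: str) -> list[int]:
    """1-based starts of a regex motif; the lookahead allows overlapping matches."""
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
```

For example, with the toy sequence `toy = "GSSGGSSG" + "LKAEAKA" * 4 + "FAAYFAAY"`, `leucine_heptads(toy)` returns `[9]`, `motif_positions(toy, "F..Y")` returns `[37, 41]`, and `motif_positions(toy, "GSSG")` returns `[1, 5]`. Matching the heptad spacing alone does not show that a leucine zipper forms; as noted above, that requires a structural prediction.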
TMEM39B isoform 1 contains eight transmembrane regions, and the N-terminus and C-terminus are predicted to be located in the cytosol. [9]
TMEM39B has several promoter regions predicted by Genomatix ElDorado. [12] Most of these predicted promoters overlap in the same region, so use of an alternative promoter would only result in skipping of the first exon.
The promoter of TMEM39B transcript variant 1 contains numerous predicted transcription factor binding sites, listed below with their matrix similarity scores. The transcription factors SMARCA3, TLX1, and CMYB have predicted binding sites close to the transcription factor IIB (TFIIB) recognition element, making them potential regulators of TMEM39B transcription.
Transcription factor | Description | Matrix similarity |
---|---|---|
TCF11 | TCF11/LCR-F1/Nrf1 homodimers | 1 |
TFIIB | Transcription factor II B (TFIIB) recognition element | 1 |
ETV1 | Ets variant 1 | 0.996 |
ZNF300 | KRAB-containing zinc finger protein 300 | 0.994 |
CMYB | c-Myb, important in hematopoiesis, cellular equivalent of the avian myeloblastosis virus oncogene v-myb | 0.994 |
ASCL1 | Achaete-scute family bHLH transcription factor 1 | 0.99 |
OSR2 | Odd-skipped related 2 | 0.99 |
E2F1 | E2F transcription factor 1 | 0.989 |
ZNF35 | Human zinc finger protein ZNF35 | 0.986 |
GKLF | Gut-enriched Krueppel-like factor | 0.981 |
PURALPHA | Purine-rich element binding protein A | 0.974 |
SMARCA3 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 3 | 0.973 |
NFAT | Nuclear factor of activated T-cells | 0.971 |
ZF5 | Zinc finger / POZ domain transcription factor | 0.962 |
INSM1 | Zinc finger protein insulinoma-associated 1 (IA-1) | 0.958 |
ZBTB14 | Zinc finger and BTB domain containing 14 (ZFP-5, ZFP161) | 0.914 |
KLF6 | Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers (KLF6, ZF9) | 0.891 |
GABPA | GA binding protein transcription factor, alpha | 0.891 |
TLX1 | T-cell leukemia homeobox 1 | 0.873 |
ZNF704 | Zinc finger protein 704 | 0.847 |
RNA sequencing data show that TMEM39B is expressed in all tissue types, with higher levels in the testis, placenta, white blood cells, adrenal gland, thymus, and fetal brain. [6] [7] Microarray data show that TMEM39B is expressed at moderate levels in most tissues, ranking on average at the 58th percentile of genes expressed in a given tissue sample. [13] By percentile rank, TMEM39B is most highly expressed relative to other genes in BDCA4+ dendritic cells, CD19+ B cells, and CD14+ monocytes. [13]
The 3' UTR of the TMEM39B mRNA contains binding sites for the miRNAs miR-1290, miR-4450, and miR-520d-5p. [14] Binding of these miRNAs may lead to RNA silencing.
The RNA-binding proteins SFRS13A, ELAVL1, and KHDRBS3 have binding sites in the 3' UTR, and the proteins KHSRP, SFRS9, and YBX1 have binding sites in the 5' UTR. [16] [17]
The predicted secondary structures of the TMEM39B 5' and 3' UTRs contain multiple stem-loops, which may play a role in mRNA stability and protein binding. [15]
The TMEM39B protein contains numerous predicted post-translational modification sites, including sites of phosphorylation, SUMOylation, acetylation, and glycosylation. [18] [19] [20] [21] [22] [23] [24] Predicted S-palmitoylation sites at Cys13, Cys87, and Cys264 are conserved in orthologs. SUMOylation is predicted at Lys279 and Lys359. Several well-conserved sites of phosphorylation, glycation, and O-linked N-acetylglucosaminylation are predicted in cytosolic regions of the protein, based on the conceptual translation of TMEM39B transcript variant 1.
Immunohistochemistry has shown that the TMEM39B protein localizes to vesicles. [2]
The human TMEM39B gene has a paralog called TMEM39A, also referred to by the alias SUSR2 (suppressor of SQST-1 aggregates in rpl-43 mutants), which is located at 3q13.33. [25] The TMEM39A protein contains 488 amino acids and shares 51.2% identity with TMEM39B. [26] Although the function of TMEM39A is not well understood, variants are associated with greater risk of autoimmune disease. [27] TMEM39A has also been found to interact with encephalomyocarditis virus (EMCV) capsid proteins, acting as a regulator of the viral autophagy pathway. [28]
TMEM39B has orthologs in species as distant as cartilaginous fish. [26] Mammalian orthologs are highly similar to human TMEM39B, with percent identity greater than 85%. In birds, reptiles, and amphibians, the percent identity of orthologs to human TMEM39B ranges between 70% and 85%. In fish, it ranges from 40% to 75%. TMEM39B is found only in vertebrates, but the paralog TMEM39A has orthologs in species as distant as arthropods. [26] A selected list of orthologs from NCBI BLAST is displayed below. [26]
Genus and species | Common name | Taxonomic group | Divergence from humans (MYA) [29] | Accession | Sequence length (aa) | Sequence identity to human protein (%) | Sequence similarity to human protein (%) |
---|---|---|---|---|---|---|---|
Homo sapiens | Human | Mammalia | 0 | NP_060526.2 | 492 | 100 | 100 |
Mus musculus | House mouse | Mammalia | 89 | NP_955009.1 | 492 | 96.1 | 98 |
Ornithorhynchus anatinus | Platypus | Mammalia | 180 | XP_028937398.1 | 489 | 85.8 | 90.5 |
Gallus gallus | Red junglefowl | Aves | 318 | NP_001006313.2 | 489 | 85 | 91.5 |
Thamnophis elegans | Western terrestrial garter snake | Reptilia | 318 | XP_032083369.1 | 491 | 81.5 | 88.5 |
Xenopus tropicalis | Western clawed frog | Amphibia | 352 | NP_001005048.1 | 483 | 75.2 | 83.2 |
Oryzias latipes | Japanese medaka | Actinopterygii | 433 | XP_004082414.1 | 488 | 74.3 | 85.2 |
Danio rerio | Zebrafish | Actinopterygii | 433 | NP_956154.1 | 491 | 74.2 | 84.9 |
Erpetoichthys calabaricus | Reedfish | Actinopterygii | 433 | XP_028675900.1 | 489 | 71.8 | 84.6 |
Callorhinchus milii | Australian ghostshark | Chondrichthyes | 465 | XP_007902480.1 | 490 | 73 | 85.1 |
Amblyraja radiata | Thorny skate | Chondrichthyes | 465 | XP_032900681.1 | 504 | 70.5 | 83.5 |
Scyliorhinus torazame | Cloudy catshark | Chondrichthyes | 465 | GCB75241.1 | 373 | 55.5 | 65.4 |
The most distant TMEM39B orthologs are found in cartilaginous fish (Chondrichthyes), which diverged from humans approximately 465 million years ago. [29] Orthologs of the paralog TMEM39A are found in arthropods, which diverged from humans approximately 763 million years ago, suggesting that TMEM39B arose through duplication of an ancestral form of TMEM39A. [29]
TMEM39B evolves at a relatively slow rate; a 1% change in the amino acid sequence requires approximately 13.9 million years. Based on sequence similarity of orthologs, TMEM39B evolves approximately 1.5 times faster than cytochrome c and 7 times slower than fibrinogen alpha.
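A rate like this can be estimated from the ortholog table. One common approach, assumed here because the source does not state its exact method, converts each ortholog's percent identity p into a Poisson-corrected divergence m = -100 ln(p/100), fits m against divergence time with a line through the origin, and takes the reciprocal of the slope as the time needed for a 1% change. Because the result depends on the correction and on which orthologs are included, this sketch need not reproduce the 13.9-million-year figure exactly; the function names are hypothetical.

```python
import math

# (divergence from humans in MYA, percent identity to human TMEM39B),
# copied from the ortholog table above; humans (time 0) add nothing to the fit.
ORTHOLOGS = {
    "Mus musculus": (89, 96.1),
    "Ornithorhynchus anatinus": (180, 85.8),
    "Gallus gallus": (318, 85.0),
    "Thamnophis elegans": (318, 81.5),
    "Xenopus tropicalis": (352, 75.2),
    "Oryzias latipes": (433, 74.3),
    "Danio rerio": (433, 74.2),
    "Erpetoichthys calabaricus": (433, 71.8),
    "Callorhinchus milii": (465, 73.0),
    "Amblyraja radiata": (465, 70.5),
    "Scyliorhinus torazame": (465, 55.5),
}


def corrected_divergence(identity_pct: float) -> float:
    """Poisson-corrected amino acid divergence, in percent."""
    return -100.0 * math.log(identity_pct / 100.0)


def myr_per_percent_change(data: dict[str, tuple[float, float]]) -> float:
    """Least-squares slope through the origin of divergence vs. time, inverted."""
    sum_tm = sum(t * corrected_divergence(ident) for t, ident in data.values())
    sum_tt = sum(t * t for t, _ in data.values())
    slope = sum_tm / sum_tt  # percent change per million years
    return 1.0 / slope


print(f"{myr_per_percent_change(ORTHOLOGS):.1f} million years per 1% change")
```

Fitting orthologs of a reference protein such as cytochrome c or fibrinogen alpha in the same way and comparing the slopes is one way to obtain relative-rate comparisons like those above.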
Using co-immunoprecipitation, affinity capture mass spectrometry (MS), and two-hybrid screens, the TMEM39B protein has been found to interact with various membrane glycoproteins. [30] [31] [32] Many of the interacting proteins have immune functions, including IL13RA1 (interleukin-13 receptor subunit alpha-1), KLRD1 (killer cell lectin-like receptor subfamily D, member 1), and SEMA7A (semaphorin-7A). SEMA7A acts as an activator of T cells and monocytes, while KLRD1 encodes CD94, an antigen present on natural killer cells. [33] [34] IL13RA1 has been proposed to mediate JAK-STAT signaling, which regulates immune cell activation. [35]
The TMEM39B protein interacts with the SARS-CoV-2 ORF9c accessory protein, also sometimes referred to as ORF14. [5] [36] ORF9c is located within the nucleocapsid (N) gene, overlapping with ORF9b. [36] Two mutations in ORF9c resulting in premature stop codons have been observed in SARS-CoV-2 isolates, suggesting that this reading frame is dispensable for viral replication. [37] The ORF9c protein has been shown to localize to vesicles when expressed in transfected HeLa cells and is predicted to have a non-cytoplasmic domain and a transmembrane domain. [38]
Many single nucleotide polymorphisms (SNPs) have been detected in the TMEM39B gene, of which a smaller subset are nonsynonymous, causing amino acid changes. [39] Notably fewer SNPs occur at sites of post-translational modification, motifs, or highly conserved amino acids; changes at these sites may be more likely to have phenotypic effects. The table below lists selected SNPs resulting in a change at such sites.
SNP | mRNA position | Base change | Amino acid position | Amino acid change | Description |
---|---|---|---|---|---|
rs1259613993 | 180 | C > T | 11 | S > P | “GSSG” repeat |
rs1446462546 | 271 | C > T | 41 | S > F | O-GlcNAc, phosphorylation site |
rs867417059 | 282 | A > T | 45 | S > C | O-GlcNAc, phosphorylation site |
rs1009960963 | 289 | C > T | 47 | S > F | Phosphorylation site |
rs377359320 | 503 | C > A | 118 | N > K | Highly conserved |
rs748779192 | 555 | C > T | 136 | R > C | Highly conserved |
rs778604874 | 558 | C > T | 137 | R > C | Highly conserved |
rs1419668726 | 696 | T > C | 183 | F > L | [F..Y] motif |
rs759591458 | 963 | C > T | 272 | R > C | Highly conserved |
rs1180695332 | 1003 | G > C | 285 | R > P | Highly conserved |
rs200048180 | 1009 | A > G | 287 | K > R | Glycation site |
rs1445226108 | 1060 | C > T | 304 | P > L | Highly conserved |
rs771743935 | 1206 | C > A | 353 | H > N | Highly conserved |
rs376257849 | 1294 | G > A | 382 | G > D | Highly conserved |
rs1368770455 | 1302 | G > T | 385 | V > L | Highly conserved |
rs756106866 | 1336 | G > A | 396 | G > D | Highly conserved |
rs868721112 | 1356 | C > T | 403 | P > S | Highly conserved |
rs1383803294 | 1369 | C > G | 407 | S > C | Phosphorylation site |
rs917085732 | 1581 | T > G | 478 | S > A | O-GlcNAc, phosphorylation site |
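The mRNA and amino acid positions in the table are related through the reading frame. The positions in every row are consistent with a coding sequence whose start codon begins at mRNA position 150 of the reference transcript used; that offset is inferred here from the table itself, not stated in the source. A minimal sketch of the conversion, with a hypothetical function name, which also reports which base of the codon a SNP falls on:

```python
# Assumed parameter: first base (the A of ATG) of the coding sequence,
# inferred from the SNP table rather than taken from a RefSeq record.
CDS_START = 150


def mrna_to_protein_position(mrna_pos: int, cds_start: int = CDS_START) -> tuple[int, int]:
    """Map a 1-based mRNA position to (1-based residue number, codon position 1-3)."""
    offset = mrna_pos - cds_start
    if offset < 0:
        raise ValueError("position lies in the 5' UTR, upstream of the start codon")
    return offset // 3 + 1, offset % 3 + 1


# A few rows from the SNP table: (rsID, mRNA position, reported residue)
rows = [("rs1446462546", 271, 41), ("rs1419668726", 696, 183), ("rs917085732", 1581, 478)]
for rsid, pos, residue in rows:
    aa, codon_pos = mrna_to_protein_position(pos)
    print(rsid, aa, codon_pos, aa == residue)
```

Because the function also returns the codon position, it can help check whether a reported base change can produce the reported amino acid change. For example, rs1446462546 (mRNA position 271) falls on the second base of codon 41, where a C > T change turns a serine codon (TCT or TCC) into a phenylalanine codon (TTT or TTC).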
In a study using 164 tumor samples from patients with diffuse large B cell lymphoma, TMEM39B was one of 17 genes identified as part of a prognostic profile for 5-year progression-free survival. [3] In another study, a genome-wide siRNA screen in HeLa cells infected with Sindbis virus showed that knockdown of TMEM39B decreased viral capsid/autophagosome colocalization, survival of virus-infected cells, and mitophagy. [4] This suggests that TMEM39B, like its paralog TMEM39A, may play a role in viral autophagy.
Protein YIF1A is a Yip1 domain family protein that in humans is encoded by the YIF1A gene.
UPF0687 protein C20orf27 is a protein that in humans is encoded by the C20orf27 gene. It is expressed in most human tissues. One study of this protein revealed a role in regulating the cell cycle, apoptosis, and tumorigenesis by promoting activation of the NF-κB pathway.
Solute carrier family 46 member 3 (SLC46A3) is a protein that in humans is encoded by the SLC46A3 gene. Also referred to as FKSG16, the protein belongs to the major facilitator superfamily (MFS) and SLC46A family. Most commonly found in the plasma membrane and endoplasmic reticulum (ER), SLC46A3 is a multi-pass membrane protein with 11 α-helical transmembrane domains. It is mainly involved in the transport of small molecules across the membrane through the substrate translocation pores featured in the MFS domain. The protein is associated with breast and prostate cancer, hepatocellular carcinoma (HCC), papilloma, glioma, obesity, and SARS-CoV. Based on the differential expression of SLC46A3 in antibody-drug conjugate (ADC)-resistant cells and certain cancer cells, current research is focused on the potential of SLC46A3 as a prognostic biomarker and therapeutic target for cancer. While protein abundance is relatively low in humans, high expression has been detected particularly in the liver, small intestine, and kidney.
EVI5L is a protein that in humans is encoded by the EVI5L gene. EVI5L is a member of the Ras superfamily of monomeric guanine nucleotide-binding (G) proteins, and functions as a GTPase-activating protein (GAP) with a broad specificity. Measurement of in vitro Rab-GAP activity has shown that EVI5L has significant Rab2A- and Rab10-GAP activity.
TMEM143 is a protein that in humans is encoded by the TMEM143 gene. TMEM143, a dual-pass protein, is predicted to reside in the mitochondria, and high expression has been found in both human skeletal muscle and heart. Interactions with other proteins indicate that TMEM143 could potentially play a role in tumor suppression/expression and cancer regulation.
Transmembrane Protein 217 is a protein encoded by the gene TMEM217. TMEM217 has been found to have expression correlated with the lymphatic system and endothelial tissues and has been predicted to have a function linked to the cytoskeleton.
C16orf82 is a protein that, in humans, is encoded by the C16orf82 gene. C16orf82 encodes a 2285 nucleotide mRNA transcript which is translated into a 154 amino acid protein using a non-AUG (CUG) start codon. The gene has been shown to be largely expressed in the testis, tibial nerve, and the pituitary gland, although expression has been seen throughout a majority of tissue types. The function of C16orf82 is not fully understood by the scientific community.
Transmembrane protein 171 (TMEM171) is a protein that in humans is encoded by the TMEM171 gene.
C2orf16 is a protein that in humans is encoded by the C2orf16 gene. Isoform 2 of this protein is 1,984 amino acids long. The gene contains 1 exon and is located at 2p23.3. Aliases for C2orf16 include Open Reading Frame 16 on Chromosome 2 and P-S-E-R-S-H-H-S Repeats Containing Sequence.
LOC101928193 is a protein which in humans is encoded by the LOC101928193 gene. There are no known aliases for this gene or protein. Similar copies of this gene, called orthologs, are known to exist in several different species across mammals, amphibians, fish, mollusks, cnidarians, fungi, and bacteria. The human LOC101928193 gene is located on the long (q) arm of chromosome 9 at cytogenetic location 9q34.2. The molecular location of the gene is from base pair 133,189,767 to base pair 133,192,979 on chromosome 9, for an mRNA length of 3213 nucleotides. The gene and protein are not yet well understood by the scientific community, but there is data on its genetic makeup and expression. The LOC101928193 protein is targeted to the cytoplasm and has the highest level of expression in the thyroid, ovary, skin, and testes in humans.
Transmembrane protein 179 is a protein that in humans is encoded by the TMEM179 gene. The function of transmembrane protein 179 is not yet well understood, but it is believed to have a function in the nervous system.
TMEM128, also known as transmembrane protein 128, is a protein that in humans is encoded by the TMEM128 gene. TMEM128 has three variants, which differ in their 5' UTRs and start codon locations. TMEM128 contains four transmembrane domains and is localized to the endoplasmic reticulum membrane. TMEM128 is subject to regulation at the gene, transcript, and protein levels. While the function of TMEM128 is poorly understood, it interacts with several proteins associated with the cell cycle, signal transduction, and memory.
Transmembrane protein 247 is a multi-pass transmembrane protein of unknown function found in Homo sapiens and encoded by the TMEM247 gene. Notable in the protein are two transmembrane regions near the C-terminus of the translated polypeptide. Transmembrane protein 247 has been found to be expressed almost entirely in the testes.
MIF4GD, or MIF4G domain-containing protein, is a protein which in humans is encoded by the MIF4GD gene. It is also known as SLIP1, SLBP-interacting protein 1, AD023, and MIFD. MIF4GD is expressed ubiquitously in humans and has been found to be involved in activating proteins for histone mRNA translation, in alternative splicing and translation of mRNAs, and in the regulation of cell proliferation.
Serum amyloid A-like 1 is a protein in humans encoded by the SAAL1 gene.
Family with sequence similarity 98, member C (FAM98C) is a gene that encodes the protein FAM98C. It has two aliases, FLJ44669 and hypothetical protein LOC147965. FAM98C has two paralogs in humans, FAM98A and FAM98B. FAM98C is characterized as a leucine-rich protein. The function of FAM98C is not yet defined. FAM98C has orthologs in mammals, reptiles, and amphibians, with distant orthologs in Rhinatrema bivittatum and Nanorana parkeri.
Coiled-coil domain containing 190 (CCDC190), also known as C1orf110 (chromosome 1 open reading frame 110) and MGC48998, is a protein-coding gene widely expressed in vertebrates. RNA-seq expression profiles show that this gene is selectively expressed in organs of the human body such as the lung, brain, and heart. The product of C1orf110, usually called coiled-coil domain-containing protein 190, is 302 aa long. Its name likely reflects the coiled-coil domain found from position 14 to 72. At least 6 spliced mRNA variants and 3 protein isoforms can be identified, resulting from alternative splicing in humans.
Family with sequence similarity 166, member C (FAM166C) is a protein encoded by the FAM166C gene. The FAM166C protein is localized in the nucleus. It has a calculated molecular weight of 23.29 kDa. It also contains DUF2475, a domain of unknown function spanning amino acids 19–85. The FAM166C protein is nominally expressed in the testis, stomach, and thyroid.
Transmembrane Protein 269 (TMEM269) is a protein which in humans is encoded by the TMEM269 gene.
Chromosome 13 open reading frame 46 is a protein which in humans is encoded by the C13orf46 gene. In humans, C13orf46 is ubiquitously expressed at low levels in tissues, including the lungs, stomach, prostate, spleen, and thymus. This gene encodes eight alternatively spliced mRNA transcripts, which produce five different protein isoforms.