TM6SF2 | |
---|---|
Identifiers | |
Aliases | TM6SF2, transmembrane 6 superfamily member 2 |
External IDs | OMIM: 606563; MGI: 1933210; HomoloGene: 77694; GeneCards: TM6SF2; OMA: TM6SF2 - orthologs |
TM6SF2 (transmembrane 6 superfamily member 2) is a human gene that encodes a protein of the same name. The gene is also known as KIAA1926. [5] Its exact function is currently unknown.
TM6SF2 is located on chromosome 19 at locus 19p13.3-p12. It is flanked upstream by SUGP1 (a SURP and G-patch domain-containing protein thought to play a role in pre-mRNA splicing [5] ) and downstream by HAPLN4 (hyaluronan and proteoglycan link protein 4, which binds hyaluronic acid and may be involved in formation of the extracellular matrix [5] ). [6]
TM6SF2 is a moderately conserved gene. Orthologs have been identified in 82 organisms across several phyla, including invertebrates. The most distant orthologs of TM6SF2 are found in the zebrafish ( Danio rerio ) and the deer tick ( Ixodes scapularis ). [6] The table below summarizes a selection of these orthologs, obtained from the NCBI database.
Scientific name | Common name | Divergence from human (MYA) | NCBI accession number [6] | Sequence length (aa) | Percent identity | Percent similarity |
---|---|---|---|---|---|---|
Homo sapiens | Human | 0 | NP_001001524.2 | 377 | 100 | 100 |
Pan troglodytes | Chimpanzee | 6.3 | XP_001140342.2 | 377 | 99 | 99 |
Mus musculus | Mouse | 92.3 | XP_003125904 | 378 | 79 | 87 |
Ceratotherium simum simum | Southern white rhinoceros | 94.2 | XP_004422975.1 | 376 | 89 | 92 |
Capra hircus | Goat | 94.2 | XP_005682141.1 | 343 | 89 | 86 |
Myotis davidii | Mouse-eared bat | 94.2 | XP_006778388.1 | 338 | 86 | 91 |
Mustela putorius furo | Domestic ferret | 94.2 | XP_004760922.1 | 376 | 84 | 89 |
Vicugna pacos | Alpaca | 94.2 | XP_006199087.1 | 376 | 84 | 89 |
Canis lupus familiaris | Dog | 94.2 | XP_852125.1 | 376 | 83 | 89 |
Orcinus orca | Killer whale | 94.2 | XP_004277546.1 | 376 | 82 | 88 |
Bos taurus | Cow | 94.2 | XP_005208509.1 | 376 | 74 | 80 |
Loxodonta africana | African savanna elephant | 98.7 | XP_003413566.1 | 377 | 90 | 93 |
Alligator mississippiensis | American alligator | 296 | XP_006271093.1 | 346 | 67 | 79 |
Ophiophagus hannah | King cobra | 296 | ETE70999 | 292 | 25.3 | ? |
Gallus gallus | Chicken | 296 | XP_423447.3 | 374 | 62 | 74 |
Falco peregrinus | Peregrine falcon | 296 | XP_005244205.1 | 376 | 59 | 73 |
Xenopus tropicalis | Western clawed frog | 371.2 | XP_004760922.1 | 375 | 58 | 74 |
Danio rerio | Zebrafish | 400.1 | NP_001074130 | 374 | 44.3 | ? |
Latimeria chalumnae | Coelacanth | 414.9 | XP_005989673.1 | 327 | 63 | 75 |
Ixodes scapularis | Deer tick | 782.7 | XP_002406440.1 | 113 | 45.1 | ? |
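The percent identity column can be read as the share of aligned positions at which two sequences carry the same residue. The following is a minimal sketch of that calculation, assuming identity is counted over ungapped alignment columns only; alignment tools differ in the denominator they use, and the source does not state which convention produced the table. The function name and the toy fragments are hypothetical and are not real TM6SF2 sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, for illustration only).
toy_a = "MDIPPLAGKI-AALS"
toy_b = "MDLPPLAGRIQAALS"
print(round(percent_identity(toy_a, toy_b), 1))
```

Percent similarity would additionally count conservative substitutions as matches, which requires a substitution matrix and is not shown here.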
TM6SF1, about which little is known, has been identified as a paralog of TM6SF2 in humans. [6]
The domain of unknown function DUF2781 is highly conserved across homologs. DUF2781 belongs to the pfam10914 family, which comprises uncharacterized eukaryotic proteins, some of which are membrane proteins. [6]
The RNA product is 1483 bases long and is alternatively spliced to yield seven different isoforms (alternative mRNAs a - f, with form a being the most abundant) containing varying combinations of the 10 identified exons. [7] The microRNA miR-1343 is predicted by TargetScan to bind a site of the 7mer-m8 type in the 3' UTR. [8]
The 5' and 3' UTRs of the mRNA are predicted to form stem loops that contribute to stability. Most of this structure lies in the 5' UTR, which has three stem loops, compared with only one in the 3' UTR. [9]
Which of the ten exons are expressed depends on how alternative splicing proceeds. Four alternative polyadenylation sites are present. [7]
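A 7mer-m8 site is a stretch of the target mRNA that is perfectly complementary to nucleotides 2 to 8 of the mature microRNA. The sketch below shows how such a site could be located in a UTR. The sequences are invented for illustration (they are not miR-1343 or the TM6SF2 3' UTR), and the function names are hypothetical. The sketch does not separate 7mer-m8 sites from the longer 8mer class, which also has an adenine opposite miRNA position 1.

```python
# Watson-Crick pairing for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def site_7mer_m8(mirna: str) -> str:
    """Return the mRNA sequence (5'->3') that pairs with miRNA nucleotides 2-8."""
    if len(mirna) < 8:
        raise ValueError("miRNA must be at least 8 nucleotides long")
    seed_plus_m8 = mirna[1:8]  # 1-based positions 2..8
    # The target strand is antiparallel, so take the reverse complement.
    return "".join(COMPLEMENT[n] for n in reversed(seed_plus_m8))


def find_sites(utr: str, mirna: str) -> list[int]:
    """Return 0-based start offsets of every exact site match in the UTR."""
    site = site_7mer_m8(mirna)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# Invented toy sequences, written 5'->3'.
toy_mirna = "UACGGAUCCAGUUAGCAUGGCA"
toy_utr = "AAUUGAUCCGUCCAGG"
print(site_7mer_m8(toy_mirna), find_sites(toy_utr, toy_mirna))
```

Real prediction tools such as TargetScan also weigh site context and conservation, so an exact string match of this kind is only the first filter.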
The promoter for this gene lies upstream and spans bases 19383923 to 19384700 (778 bp) on the minus strand of chromosome 19. Several transcription factors are predicted to bind this promoter region, including cAMP responsive element binding protein, SMAD3, KLF3, EGR1, SOX/SRY and PAX2/PAX5. [10] Two SNP regions have also been identified. [11] The transcription factors predicted to bind the TM6SF2 promoter suggest that the protein may function in growth and tumor regulation and, to a lesser extent, in sex determination.
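The stated promoter length follows from treating the two coordinates as a closed interval, so that both end bases are counted. A minimal sketch of that arithmetic, using the coordinates given above (the function name is hypothetical):

```python
def inclusive_length(start: int, end: int) -> int:
    """Length in bases of a closed genomic interval [start, end]."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1  # +1 because both endpoints are included


PROMOTER_START = 19_383_923
PROMOTER_END = 19_384_700
print(inclusive_length(PROMOTER_START, PROMOTER_END))  # 778
```

Half-open coordinate systems, such as the one used by the BED format, omit the `+ 1`, so it is worth checking which convention a given annotation source follows.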
The TM6SF2 protein contains 377 amino acids and has a molecular weight of 42,554 Da, with an isoelectric point of about 7.7. [12]
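As a quick plausibility check, the reported mass can be divided by the residue count and compared with the commonly used rule of thumb of roughly 110 Da per amino acid residue. The sketch below only performs that division; the function name is hypothetical and the rule of thumb is an approximation, not a property of this protein.

```python
def average_residue_mass(mass_da: float, n_residues: int) -> float:
    """Average mass per residue in daltons."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return mass_da / n_residues


# Values taken from the text: 42,554 Da and 377 amino acids.
print(round(average_residue_mass(42_554, 377), 1))
```

A result slightly above 110 Da is consistent with a protein of ordinary composition.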
A domain of unknown function, DUF2781 ( pfam10914 family), spans amino acids 218 to 359 in the C-terminal region of the protein. [6] The protein has nine transmembrane regions. The first contains the signal peptide, which is cleaved after the protein is localized to the ER. A terminal KHHQ sequence serves as an endoplasmic reticulum retention signal. [13]
The mature protein is predicted to form as many as thirteen alpha helices (including transmembrane helices) and fifteen beta strands. [14]
The amino acid side chains in this protein do not necessarily interact to form tertiary and quaternary structures. The cysteines present are not predicted to form stable disulfide bonds. [15]
Two main post-translational modifications are predicted: phosphorylation at tyrosine, serine and threonine sites, and sumoylation at two low-probability sites. [16]
In humans, TM6SF2 expression has been documented in the adult stage only, specifically in the intestine and liver at moderate levels, as well as in embryonic tissue and ovary at low levels. Other sources indicate expression in brain, lung, testis, stomach, heart, colon, kidney and adipose tissue. [17]
Protein subcellular localization studies with confocal microscopy demonstrated that TM6SF2 is localized in the endoplasmic reticulum and the ER-Golgi intermediate compartment of human liver cells. [18]
No protein-protein interactions have been established thus far. [19] [20] [21]
In a study that used pre-made kits to predict cardiac allograft rejection from peripheral blood alone, graft rejection was associated with decreased expression of TM6SF2 and other genes. [22]
A TM6SF2 variant confers susceptibility to nonalcoholic fatty liver disease through impaired very low density lipoprotein (VLDL) production. [23]
TM6SF2 inhibition was associated with reduced secretion of triglyceride (TG)-rich lipoproteins (TRLs) and with increased cellular TG concentration and lipid droplet content, whereas TM6SF2 overexpression reduced liver cell steatosis. TM6SF2 is therefore a regulator of liver fat metabolism with opposing effects on the secretion of TRLs and hepatic lipid droplet content. [18]