TMEM212
Identifiers | |
---|---|
Aliases | TMEM212, transmembrane protein 212 |
External IDs | MGI: 2685410; HomoloGene: 28471; GeneCards: TMEM212 |
Transmembrane protein 212 is a protein that in humans is encoded by the TMEM212 gene. [5] [6] The protein consists of five transmembrane domains and localizes in the plasma membrane and endoplasmic reticulum. [6] TMEM212 has orthologs in vertebrates but not invertebrates. [7] [8] TMEM212 has been associated with sporadic Parkinson's disease, facial processing, and adiposity in African Americans. [9] [10] [11]
The TMEM212 gene is located on the plus strand of chromosome 3 at position 3q26.3. [6] [13] It spans positions 171,561,140 to 171,577,108. [14] Its longest isoform consists of 4 exons, an mRNA of 1881 nucleotides, and an upstream in-frame stop codon. [5] The coding sequence spans bases 36-660 of this mRNA. Other genes in the neighborhood include PLD1, RNU6-348P and FNDC3B. [15]
The TMEM212 gene has 2 isoforms. [14] The mRNA splice variants of TMEM212 differ at the last exon. In one variant, part of the last exon is spliced out, making it 537 nucleotides shorter. [14] This isoform has an mRNA sequence of 1343 nucleotides. The latter variant is not annotated in NCBI. [13]
TMEM212 has 5 transmembrane regions, making it a transmembrane protein. [5] [16] These regions consist primarily of non-polar amino acids, although 2 of them contain a polar, charged amino acid. [5] The protein is 194 amino acids long and has a calculated molecular weight of 21 kDa. [6] Its isoelectric point is approximately pH 8. [17] [6] There are no internal repeats in the amino acid sequence of TMEM212. [18] [19] In addition, all amino acids occur at normal abundance. [18] Human TMEM212 has a molecular weight and isoelectric point similar to those of its orthologs, as shown in Table 1.
Table 1: TMEM212 Protein Characteristics in Humans and Orthologs
Organism | Accession # | Molecular Weight | Isoelectric Point |
---|---|---|---|
Human | NP_001157908.1 | 21 kDa | 8.2 |
Mouse | NP_001157909.1 | 21 kDa | 7.6 |
Black Swan | XP_035413176.1 | 21 kDa | 6.1 |
Whale Shark | XP_020372667.1 | 20 kDa | 9.1 |
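The molecular weights of about 21 kDa can be sanity-checked from sequence length alone. The sketch below is a rough estimate and not the method used by the cited tools: it assumes an average residue mass of about 110 Da (a common rule of thumb, not a value from this article) and takes the sequence lengths from the ortholog table (Table 2). The function name `approx_mw_kda` is hypothetical.

```python
# Rough molecular-weight estimate from protein length.
# Assumption: an average amino acid residue weighs about 110 Da (rule of thumb).
AVG_RESIDUE_MASS_DA = 110.0


def approx_mw_kda(length_aa: int) -> float:
    """Approximate molecular weight in kDa for a protein of the given length."""
    return length_aa * AVG_RESIDUE_MASS_DA / 1000.0


# Sequence lengths (amino acids) as listed in the ortholog table.
lengths_aa = {
    "Human": 194,
    "Mouse": 194,
    "Black Swan": 196,
    "Whale Shark": 180,
}

estimates = {name: round(approx_mw_kda(n), 1) for name, n in lengths_aa.items()}

for name, mw in estimates.items():
    print(f"{name}: ~{mw} kDa")
```

Under this assumption the estimates land close to the tabulated 20 to 21 kDa values. A composition-aware calculation would be needed for anything more precise.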
Phyre2 and Ali2D predict that TMEM212 has a secondary structure rich in alpha helices, specifically in the transmembrane regions. [20] [21] The alpha helices are conserved in orthologs from mammals to reptiles. [21] Additionally, DiANNA predicts that TMEM212 contains 3 disulfide bonds between 6 cysteine residues: C46-C88, C105-C154, and C135-C16. [22] The tertiary structure has 5 regions within the membrane. [23] [16] [24]
According to Genomatix, TMEM212 has 3 possible promoters. The most likely promoter (GXP_277729) lies directly upstream of the TMEM212 gene and is 1654 base pairs long. [25]
TMEM212 RNA is expressed ubiquitously at low levels in most tissue types (GDS1096). [26] It is expressed at a slightly higher level in the ovaries, brain, lungs, heart, kidneys, and testes. [14] Within the brain, TMEM212 is expressed in specific regions including the pons and trigeminal ganglion. [26] Other tissues with moderate expression include the adrenal cortex and the appendix. All available RNA-sequencing data show TMEM212 expression in the lungs. [13]
The 5' untranslated region (UTR) is 35 base pairs long. The 3' UTR is 1221 base pairs long and extends from base 661 to base 1881. [5] The lowest-energy structure was predicted for both the 5' UTR and the 3' UTR. Because of its short length, the 5' UTR is predicted to form 1 stem-loop, whereas the 3' UTR is predicted to form 17 stem-loops. [27]
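The coordinates quoted for the transcript can be checked for internal consistency. The following is a minimal sketch that assumes 1-based, inclusive positions, that the 5' UTR occupies bases 1-35, and that the 1881-nucleotide figure refers to the full mRNA. The coding-sequence range 36-660 is the one stated earlier in the article, and the helper name `span_length` is hypothetical.

```python
# 1-based, inclusive coordinates on the longest TMEM212 mRNA, as quoted in the text.
# Assumption: the 5' UTR starts at base 1 and the transcript is 1881 nt long.
MRNA_LENGTH = 1881

regions = {
    "5' UTR": (1, 35),
    "CDS": (36, 660),
    "3' UTR": (661, 1881),
}


def span_length(start: int, end: int) -> int:
    """Length of an inclusive, 1-based interval."""
    return end - start + 1


lengths = {name: span_length(start, end) for name, (start, end) in regions.items()}

# Check that the regions tile the transcript with no gaps or overlaps.
ordered = sorted(regions.values())
contiguous = all(
    prev_end + 1 == next_start
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
)
covers_transcript = ordered[0][0] == 1 and ordered[-1][1] == MRNA_LENGTH

print(lengths)
print("regions tile the mRNA:", contiguous and covers_transcript)
```

With these inputs the three segments measure 35, 625 and 1221 bases, which sum to the 1881 bases of the transcript.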
TMEM212 is predicted to have two sulfation sites and one phosphorylation site. [28] [29] There is a potential signal peptide cleavage site between amino acids 23 and 24. [30] The phosphorylation site and the signal peptide cleavage site are common among orthologs.
The primary subcellular locations of TMEM212 are the plasma membrane and endoplasmic reticulum. [6] This subcellular location is conserved in orthologs. Immunofluorescent staining with TMEM212 antibodies shows that TMEM212 is also present in the nucleus, but the reason remains unknown. [31] TMEM212 is less abundant than most proteins in humans. [32]
TMEM212 has one known paralog: membrane-spanning 4-domains A7 (MS4A7). [6] The MS4A7 gene is located on chromosome 11 at 11q12.2. MS4A7 likely diverged from TMEM212 435-475 million years ago. This is shown on the right, where the divergence rates of different proteins are compared.
TMEM212 in Homo sapiens is highly conserved. It is found in vertebrates but not invertebrates, with orthologs in mammals, birds, reptiles, amphibians and fish. [7] [33] [8] Table 2 below lists these orthologs. As displayed in the multiple sequence alignment to the right, strict orthologs such as those in mammals and reptiles have highly conserved regions. Most of the conserved regions fall within the transmembrane regions. TMEM212 is evolving moderately quickly compared to the reference sequences cytochrome c and fibrinogen alpha. This is shown to the right in the comparison of the divergence rates of TMEM212, MS4A7, cytochrome c, and fibrinogen alpha.
Table 2: Orthologs of TMEM212
Genus and Species | Common Name | Taxonomic Group | Median Date of Divergence (MYA*) | Accession # | Sequence Length (aa) | Sequence Identity to Human Protein (%) | Sequence Similarity to Human Protein (%) |
---|---|---|---|---|---|---|---|
Homo sapiens | Humans | Primates | 0 | NP_001157908.1 | 194 | 100.0 | 100.0 |
Pan troglodytes | Chimpanzee | Primates | 6.4 | * PNI77830.1 | 194 | 99.5 | 99.5 |
Mus musculus | House Mouse | Rodentia | 89 | NP_001157909.1 | 194 | 75.8 | 82.5 |
Orcinus orca | Orca Whale | Cetacea | 94 | XP_033285415.1 | 183 | 76.8 | 84.5 |
Suricata suricatta | Meerkat | Carnivora | 94 | XP_029796633.1 | 195 | 74.6 | 81.0 |
Chelydra serpentina | Common Snapping Turtle | Testudines | 318 | KAG6939202.1 | 194 | 64.9 | 74.2 |
Sceloporus undulatus | Eastern Fence Lizard | Squamata | 318 | XP_042315631.1 | 184 | 60.3 | 72.7 |
Crocodylus porosus | Saltwater Crocodile | Crocodylia | 318 | XP_019388281.1 | 186 | 57.7 | 70.6 |
Python bivittatus | Burmese Python | Serpentes | 318 | XP_007424363.2 | 182 | 53.1 | 68.6 |
Cygnus atratus | Black Swan | Anseriformes | 318 | XP_035413176.1 | 196 | 53.1 | 64.3 |
Apteryx mantelli | North Island Brown Kiwi | Struthioniformes | 318 | XP_013800634.1 | 200 | 49.1 | 59.3 |
Tyto alba | Barn Owl | Strigiformes | 318 | KFV59051.1 | 198 | 44.9 | 56.6 |
Rhinatrema bivittatum | Two-lined Caecilian | Gymnophiona | 352 | XP_029472601.1 | 197 | 61.4 | 75.1 |
Geotrypetes seraphini | Gaboon Caecilian | Caeciliidae | 352 | XP_033815221.1 | 197 | 57.9 | 75.1 |
Xenopus laevis | African Clawed Frog | Pipidae | 352 | XP_018119180.1 | 183 | 51.5 | 70.6 |
Xenopus tropicalis | Western Clawed Frog | Pipidae | 352 | XP_002931615.2 | 187 | 50.0 | 70.1 |
Erpetoichthys calabaricus | Reedfish | Polypteriformes | 433 | XP_028648077.1 | 187 | 50.0 | 64.4 |
Polyodon spathula | American Paddlefish | Acipenseriformes | 433 | XP_041127156.1 | 191 | 50.0 | 63.9 |
Amia calva | Bowfin | Amiiformes | 433 | MBN3298530.1 | 198 | 43.7 | 57.8 |
Rhincodon typus | Whale Shark | Orectolobiformes | 465 | XP_020372667.1 | 180 | 48.5 | 61.3 |
Amblyraja radiata | Thorny Skate | Rajiformes | 465 | XP_032886885.1 | 190 | 41.7 | 60.8 |
Petromyzon marinus | Sea Lamprey | Petromyzontiformes | 599 | XP_032834400.1 | 190 | 38.3 | 51.2 |
*MYA = Millions of Years Ago
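A rough sense of how quickly TMEM212 diverges can be obtained from the table itself. The sketch below is illustrative only and is not the method behind the divergence-rate figure: it takes a subset of rows from the table, treats percent divergence as simply 100 minus percent identity (uncorrected for multiple substitutions), and fits an ordinary least-squares line against the median divergence date. The function names are hypothetical.

```python
# (common name, median divergence date in MYA, % identity to human), from the table.
orthologs = [
    ("Chimpanzee", 6.4, 99.5),
    ("House Mouse", 89, 75.8),
    ("Common Snapping Turtle", 318, 64.9),
    ("African Clawed Frog", 352, 51.5),
    ("Reedfish", 433, 50.0),
    ("Whale Shark", 465, 48.5),
    ("Sea Lamprey", 599, 38.3),
]


def percent_divergence(identity_pct: float) -> float:
    """Uncorrected divergence: the share of aligned positions that differ."""
    return 100.0 - identity_pct


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


dates = [date for _, date, _ in orthologs]
divergence = [percent_divergence(identity) for _, _, identity in orthologs]

slope = least_squares_slope(dates, divergence)
print(f"Approximate divergence rate: {slope:.3f} % per million years")
```

Running the same calculation on reference proteins such as cytochrome c or fibrinogen alpha would allow a comparison of rates, but their identity values are not given in this article and would have to be supplied separately.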
Three proteins potentially interact with TMEM212: TMEM45A, GPR137C, and HNRNPL. [34] [35] [36] [37] These proteins were identified experimentally through co-expression or affinity capture-RNA, and also through text mining. Like TMEM212, TMEM45A and GPR137C are found in the plasma membrane, which makes these interactions plausible. [38] [39]
Table 3: Proteins that Interact with TMEM212
Abbreviated Name | Full Name | Basis of Identification | Function |
---|---|---|---|
TMEM45A | Transmembrane protein 45A | co-expression, [35] text mining [34] | enables protein binding |
GPR137C | G protein-coupled receptor 137C | co-expression, [35] text mining [34] | involved in cell signaling and regulation of TORC1 |
HNRNPL | Heterogeneous nuclear ribonucleoprotein L | affinity capture-RNA [37] [36] | formation, packaging and processing of mRNA |
TMEM212 may be associated with adiposity in African Americans, facial processing, and sporadic Parkinson's disease. [9] [10] [11] SNPs at a locus near TMEM212 have been associated with increased adiposity, suggesting a link between TMEM212 levels (mRNA or protein) and high BMI in African Americans. [11] TMEM212 may also be involved in facial processing. Facial processing is genetically controlled, and a common SNP in TMEM212, rs12485367 at 3q26.31, was associated with activation of the right fusiform gyrus, a brain area important for facial processing, in response to facial expressions. [10] People with a G allele had higher activation than C homozygotes. [10] TMEM212 may also be associated with Parkinson's disease. A T-to-C change at SNP rs2270568 (chromosome 3, position 172048861) in the TMEM212 gene was associated with an increased incidence of sporadic Parkinson's disease. [9]