Transmembrane protein 151A, also known as TMEM151A, is a protein that is encoded by the TMEM151A gene. [1]
The gene encoding transmembrane protein 151A is located on the positive strand of chromosome 11 at 11q13.2 and has two exons. [1] [2] The gene consists of a total of 2538 nucleotides, 1407 of which code for the amino acids of the protein. [2]
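The relationship between the coding length and the protein length can be checked with simple arithmetic. The sketch below uses only the figures quoted in this article and assumes that the 1407 coding nucleotides include one terminal stop codon; that assumption is not stated in the cited sources.

```python
# Figures quoted in the article.
GENE_LENGTH_NT = 2538    # total nucleotides in the gene
CODING_LENGTH_NT = 1407  # nucleotides that code for the protein


def protein_length(coding_nt: int) -> int:
    """Return the number of amino acids encoded by a coding sequence.

    Assumption: the coding length includes one terminal stop codon,
    which does not contribute an amino acid.
    """
    if coding_nt % 3 != 0:
        raise ValueError("coding length is not a whole number of codons")
    codons = coding_nt // 3
    return codons - 1


residues = protein_length(CODING_LENGTH_NT)
noncoding_nt = GENE_LENGTH_NT - CODING_LENGTH_NT
print(residues, noncoding_nt)
```

Under that assumption, 1407 nucleotides correspond to 469 codons, and so to the 468 amino acids reported for the protein.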
Transmembrane protein 151A has 468 amino acids, weighs 51,278 daltons, and is an integral component of cellular membranes. [1] It has three transmembrane domains. The N-terminus of the protein is located in the cytosol and the C-terminus is located in the extracellular matrix. According to Compute pI, the protein has a molecular weight of approximately 51 kDa, [3] which matches the molecular weight given by NCBI Protein. [4] Sigma-Aldrich reports a molecular weight of approximately 55 kDa for TMEM151A, [5] which suggests that the protein may undergo post-translational modifications. The theoretical pI of the whole protein is approximately 8. The region encoded by exon 1 is smaller and more acidic than the region encoded by exon 2. Transmembrane domain 2 is acidic, although it lies in the exon 2 region, which is basic on average. TMEM151A has no known isoforms [6] and belongs to the protein family pfam15857, which has one paralogue: TMEM151B. [7]
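The gap between the calculated and the vendor-reported molecular weight can be put in numbers. The sketch below is only a rough check: the 110 Da average residue mass is a common rule of thumb and an assumption here, not a value from the cited sources, and 55 kDa is treated as exactly 55,000 Da.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average residue mass
LENGTH_AA = 468              # protein length quoted in the article
CALCULATED_MW_DA = 51278     # sequence-based molecular weight quoted in the article
REPORTED_MW_DA = 55000       # approximate vendor-reported value (about 55 kDa)


def estimate_mw(length_aa: int, avg_mass: float = AVG_RESIDUE_MASS_DA) -> float:
    """Rough molecular weight estimate from sequence length alone."""
    return length_aa * avg_mass


estimate = estimate_mw(LENGTH_AA)
excess_da = REPORTED_MW_DA - CALCULATED_MW_DA
excess_pct = 100.0 * excess_da / CALCULATED_MW_DA
print(estimate, excess_da, round(excess_pct, 1))
```

The length-based estimate lands close to the calculated 51 kDa, while the reported value is a few kilodaltons higher. That difference is compatible with post-translational modification but does not by itself demonstrate it.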
The secondary structure of the protein is predicted to consist of alpha helices and beta sheets (the exact number and location of which depend on the prediction program used) but no coiled coils. [8] [9] [10] Diagrams of the predicted tertiary structure and transmembrane domains [11] are to the right.
The most abundant amino acids in TMEM151A are alanine and leucine. The least abundant are lysine, methionine, and asparagine. There is less asparagine, isoleucine, and lysine than would be expected for an average protein. In an internal compositional analysis of the individual segments of the protein (e.g. transmembrane domain 1, intermembrane space, N-terminus), the amino acid abundances fell into expected ranges for every segment except the C-terminus. Specifically, the C-terminus has less asparagine, isoleucine, and lysine and more arginine than would be expected for an average protein. TMEM151A composition is highly conserved between mammals. [12]
TMEM151A in Homo sapiens and other chordates (grizzly bears, chickens, and lancelets) has N-myristoylation sites in the N-terminus, whereas in the other invertebrates examined (Capitella teleta and a water bear) the N-terminus has more phosphorylation sites and only one N-myristoylation site. [13] Human TMEM151A is predicted to have 3 glycation sites, 2 palmitoylation sites, many phosphorylation sites, 1 signal peptide, 1 arginine and lysine cleavage site, and multiple O-β-GlcNAc sites; it may also have a nuclear export signal. [14] [15] [16] [17] [18] [19] [20] [21] [22] The table below lists predicted post-translational modifications for TMEM151A in several species.
Number of modifications | Human | Ursus arctos horribilis | Gallus gallus | Branchiostoma floridae | Capitella teleta | Ramazzottius varieornatus |
Palmitoylation sites [23] | 2 | 2 | 0 | 1 | 2 | 2 |
Transmembrane regions [24] | 3 | 3 | 3 | 3 | 3 | 3 |
Phosphorylation sites* | 743 | 765 | 999 | 574 | 538 | 1076 |
Sumo-interaction [25] | 1 | 1 | 1 | 0 | 0 | 4 |
Net glycation ** [26] | 3 | 3 | N/A | N/A | N/A | N/A |
Nuclear export signals | 3 | 3 | 2 | 0 | 8 | 4 |
Phosphorylation sites [27] | 92 | 91 | 129 | 70 | 54 | 102 |
Another prediction of transmembrane regions [28] | 3 | 3 | 3 | 3 | 3 | 3 |
Prediction of arginine and lysine cleavage sites | 1 | 1 | 0 | 0 | 0 | 0 |
Signal peptide prediction [29] | 1 | 1 | 0 | 1 | 1 | 0 |
Sulfated tyrosines [30] | 2 | 2 | 1 | 1 | 0 | 1 |
TMHMM transmembrane regions [31] | 3 | 3 | 3 | 3 | 3 | 3 |
O-β-GlcNAc attachment sites | multiple | multiple | multiple | multiple | multiple | multiple |
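The TMEM151A protein is predicted to localize mainly to the endoplasmic reticulum in Homo sapiens, the grizzly bear, the lancelet, and the annelid Capitella teleta. In the chicken (Gallus gallus) the highest-scoring prediction is the nucleus, and in the tardigrade Ramazzottius varieornatus it is the plasma membrane, followed by the nucleus (Table 1). [32]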
Predicted location (%) | Humans | Ursus arctos horribilis | Gallus gallus | Branchiostoma floridae | Capitella teleta | Ramazzottius varieornatus |
Endoplasmic reticulum | 55.6 | 39.1 | 4.3 | 55.6 | 66.7 | 13 |
Mitochondria | 22.2 | 30.4 | 17.4 | 11.1 | 0 | 8.7 |
Secretory system vesicles | 11.1 | 0 | 4.3 | 0 | 4.3 | |
Plasma membrane | 11.1 | 13 | 8.7 | 22.2 | 0 | 34.8 |
Nuclear | 0 | 8.7 | 52.2 | 0 | 0 | 17.4 |
Cytoplasmic | 0 | 8.7 | 8.7 | 0 | 0 | 4.3 |
Extracellular | 0 | 0 | 4.3 | 0 | 11.1 | 8.7 |
Vacuolar | 0 | 0 | 0 | 11.1 | 11.1 | 0 |
Golgi | 0 | 0 | 0 | 0 | 11.1 | 0 |
Table 1: Predicted localization of TMEM151A in various species
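The highest-scoring compartment for each species in Table 1 can be read off programmatically. The sketch below transcribes the table values by hand and is not output from the prediction tool itself. The "Secretory system vesicles" row is left out because it is incomplete in the table; its listed values never exceed 11.1, so it cannot change the top-scoring compartment for any species.

```python
SPECIES = [
    "Humans",
    "Ursus arctos horribilis",
    "Gallus gallus",
    "Branchiostoma floridae",
    "Capitella teleta",
    "Ramazzottius varieornatus",
]

# Scores transcribed from Table 1, one list entry per species in SPECIES order.
# The incomplete "Secretory system vesicles" row is deliberately omitted.
SCORES = {
    "Endoplasmic reticulum": [55.6, 39.1, 4.3, 55.6, 66.7, 13.0],
    "Mitochondria": [22.2, 30.4, 17.4, 11.1, 0.0, 8.7],
    "Plasma membrane": [11.1, 13.0, 8.7, 22.2, 0.0, 34.8],
    "Nuclear": [0.0, 8.7, 52.2, 0.0, 0.0, 17.4],
    "Cytoplasmic": [0.0, 8.7, 8.7, 0.0, 0.0, 4.3],
    "Extracellular": [0.0, 0.0, 4.3, 0.0, 11.1, 8.7],
    "Vacuolar": [0.0, 0.0, 0.0, 11.1, 11.1, 0.0],
    "Golgi": [0.0, 0.0, 0.0, 0.0, 11.1, 0.0],
}


def top_location(species: str) -> str:
    """Return the compartment with the highest score for one species."""
    column = SPECIES.index(species)
    return max(SCORES, key=lambda location: SCORES[location][column])


for name in SPECIES:
    print(f"{name}: {top_location(name)}")
```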
The TMEM151A promoter region consists of 1101 base pairs and it is directly adjacent to the base pairs which code for the first amino acid of the TMEM151A protein. [33] Thousands of transcription factors were predicted to bind on this promoter region. Of those, 20 transcription factors are listed in the table below. Many transcription factors predicted to bind to the promoter region were related to the following categories:
# | Transcription factor description | Matrix similarity |
1 | cAMP-responsive binding element | 1 |
2 | KRAB containing zinc finger protein | 1 |
3 | Huntington's disease gene regulatory region binding proteins (more downstream) | 0.859 |
4 | Huntington's disease gene regulatory region binding proteins (more upstream) | 0.888 |
5 | EGR1, early growth response 1 | 0.906 |
6 | GA binding protein transcription factor, alpha (likely involved in nuclear control of mitochondrial function and cytochrome oxidase expression) | 0.935 |
7 | ZF5 POZ domain zinc finger protein | 0.836 |
8 | ZF1-myeloid zinc finger 1 protein | 0.992 |
9 | Nerve growth factor induced protein C | 0.803 |
10 | Estrogen Receptor 2 | 0.914 |
11 | GATA-GATA binding factor 1 | 0.989 |
12 | Pleomorphic adenoma gene 1 (salivary gland tumor) | 1 |
13 | Glial cells (nerve support) missing homolog 1 | 0.953 |
14 | Androgen receptor binding site | 0.935 |
15 | EGR/nerve growth factor induced protein C & related factors | 0.902 |
16 | KRAB domain zinc finger protein 57 | 0.942 |
17 | Estrogen related receptors | 0.924 |
18 | Huntington's disease gene regulatory region binding proteins (most upstream) | 0.850 |
19 | Leucine rich repeat (in FLII) interacting protein 1 | 0.865 |
20 | Krueppel-associated box-containing zinc-finger protein 57 (KRAB-ZFP 57) | 0.960 |
Five different promoters influence the expression of this gene. [2]
Immunohistochemistry indicates that the TMEM151A protein is primarily expressed in the brain (specifically the hippocampus, caudate, cerebellum, and pituitary gland), with low levels of expression in the stomach, adipose tissue, retina, gallbladder, testes, colon, heart muscle, pancreas, and salivary gland. A polyclonal rabbit TMEM151A antibody from Sigma-Aldrich was used to obtain these results, which were listed as "uncertain." [34]
UniGene microarray analysis shows that Homo sapiens TMEM151A transcripts are found at relatively higher levels in the heart and the brain (specifically the frontal and occipital cortices), with lower levels of expression in multiple other tissues. [35]
Microarray analysis indicates that Mus musculus TMEM151A transcripts are expressed at higher levels in the heart and at lower levels in multiple other tissues. [36] Results from the Allen Brain Atlas show increased in situ hybridization signal for TMEM151A in the mouse isocortex. [37] Microarray analysis of Canis lupus familiaris shows higher levels of TMEM151A transcripts in both the cerebrum and the pancreas. [38]
Transmembrane protein 151A has one paralogue, TMEM151B, which shares 50.76% identity with TMEM151A. It is hypothesized that TMEM151A first arose as a gene duplicate of TMEM151B approximately 320 million years ago in reptiles. TMEM151A is evolving at approximately the same rate as hemoglobin B. [39] [40]
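A percent-identity figure such as the one quoted for the two paralogues is computed from a pairwise alignment. The sketch below shows one common definition: identical columns divided by alignment length, with gapped columns counted as mismatches. The two short sequences are hypothetical toy strings, not TMEM151A or TMEM151B. Alignment tools differ in the denominator they use, so this is not necessarily how the 50.76% value was derived.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Columns containing a gap are counted as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy alignment for illustration only.
toy_a = "MKT-LLAG"
toy_b = "MRTALLSG"
print(percent_identity(toy_a, toy_b))
```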
It appears as though TMEM151 is conserved in most if not all vertebrates, and is conserved in many invertebrates (except for Porifera and Cnidaria). The details of this conclusion are listed below:
TMEM151A and TMEM151B are conserved in all of the major mammal orders: Primates, Rodentia, Lagomorpha, Chiroptera, Artiodactyla, Carnivora, Soricomorpha, and Diprotodontia. TMEM151A and TMEM151B were present in the following reptile orders: Squamata and Testudinata. Only TMEM151B was found in birds, in many extant orders including Columbiformes, Caprimulgiformes, Apodiformes, and Cuculiformes; TMEM151A was not present in any bird species. This suggests that TMEM151A evolved via gene duplication approximately 320 million years ago, as reptiles have both TMEM151A and TMEM151B, but birds have only TMEM151B.
TMEM151B is present in the amphibian order Anura. TMEM151B could not be found in the amphibian orders Caudata (newts and salamanders) or Apoda (caecilians) using either BLAST or BLAT. This absence in amphibians may be due to (1) a lack of records or (2) genetic divergence. TMEM151B is present in several bony fish orders, including Perciformes, Tetraodontiformes, and Siluriformes. It could not be found in the bony fish order Lophiiformes. TMEM151B is present in cartilaginous fishes, specifically sharks. TMEM151B was not found for lampreys or rays in BLAST; however, TMEM151B was found in BLAT for lampreys. This suggests that TMEM151B is likely found in many cartilaginous fishes; it is simply not recorded. TMEM151B could not be found in jawless fish in either BLAST or BLAT.
TMEM151B could not be found in tunicates, but it was found in one lancelet. As in lancelets, TMEM151B is found in Echinodermata, but it was not found in Porifera or Cnidaria. TMEM151B is also found in Annelida, Mollusca, Nematoda, Tardigrada, and Arthropoda. [41]
According to STRING, TMEM151A is predicted to interact with the following proteins: [42] [2]
Illumina analysis has indicated that TMEM151A is one of 336 genes that may be used in combination to diagnose colorectal cancer and/or the stage of that cancer. Specifically, TMEM151A is significantly upregulated in human monocytes circulating in the blood during colorectal cancer. [43] TMEM151A is in linkage disequilibrium with the gene CACNA1C; CACNA1C mutation is significantly associated with bipolar disorder (p < 0.05). [44] TMEM151A is one of 27 genes (identified from an analysis of 47,296 rare exonic variants) in which an unspecified variant has been associated with major depressive disorder in Mexican Americans. These Mexican Americans experienced hyperactivation of the hypothalamic-pituitary-adrenal axis due to significant stress. [45] Transplanted livers that came from deceased donors had 4.83-fold higher expression of TMEM151A than livers that came from live donors. [46]