Small integral membrane protein 14, also known as SMIM14 or C4orf34, is a protein encoded on chromosome 4 of the human genome by the SMIM14 gene. [2] SMIM14 has at least 298 orthologs mainly found in jawed vertebrates and no paralogs. [3] SMIM14 is classified as a type I transmembrane protein. While this protein is not well understood by the scientific community, the transmembrane domain of SMIM14 may be involved in ER retention. [4]
The SMIM14 gene is located on the minus strand at cytogenetic band 4p14 and is 92,567 base pairs in length. [5] The gene has five exons, four of which constitute the open reading frame for SMIM14. [6]
The Kozak sequence of SMIM14, which functions as the protein translation initiation site in most eukaryotic mRNA transcripts, is considered a strong motif. [7] SMIM14 has no signal peptide; instead, the encoded transmembrane domain acts as the signal sequence. One disulfide bridge is predicted in SMIM14; disulfide bridges stabilize the tertiary (and sometimes quaternary) structures of proteins. There are at least ten polyadenylation sequences in the 3′ UTR of the SMIM14 gene, which signal transcription termination.
SMIM14 is expressed at four times the level of an average gene. [8]
SMIM14 has seven predicted promoter regions. The promoter with the greatest number of transcripts and CAGE tags is 1,420 base pairs in length. It is found on the minus strand, starting at position 39,638,806 and ending at position 39,640,225. This promoter has five coding transcripts and a maximum of 105,458 CAGE tags from one of the transcripts. [9]
Promoter ID | Start Position | End Position | Length (bp) | Coding Transcripts |
---|---|---|---|---|
GXP_150112 | 39,549,547 | 39,550,812 | 1,266 | 0 |
GXP_3198013 | 39,583,919 | 39,584,958 | 1,040 | 0 |
GXP_9520406 | 39,605,105 | 39,606,144 | 1,040 | N/A |
GXP_9520407 | 39,626,490 | 39,627,529 | 1,040 | N/A |
GXP_6750876 | 39,627,082 | 39,628,121 | 1,040 | 1 |
GXP_3198015 | 39,638,191 | 39,639,230 | 1,040 | 0 |
GXP_6750877 | 39,638,806 | 39,640,225 | 1,420 | 5 |
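The lengths in the table follow from the start and end positions if the coordinates are treated as 1-based and inclusive. The following is a minimal sketch of that arithmetic; the inclusive-interval convention and the function name `promoter_length` are assumptions made for illustration, not part of the cited promoter annotation.

```python
# Promoter coordinates copied from the table above: (start, end).
# Assumption: both positions are 1-based and inclusive.
promoters = {
    "GXP_150112": (39_549_547, 39_550_812),
    "GXP_3198013": (39_583_919, 39_584_958),
    "GXP_9520406": (39_605_105, 39_606_144),
    "GXP_9520407": (39_626_490, 39_627_529),
    "GXP_6750876": (39_627_082, 39_628_121),
    "GXP_3198015": (39_638_191, 39_639_230),
    "GXP_6750877": (39_638_806, 39_640_225),
}


def promoter_length(start, end):
    """Length in base pairs of the closed interval [start, end]."""
    return end - start + 1


# Compute every length and pick the longest promoter region.
lengths = {pid: promoter_length(s, e) for pid, (s, e) in promoters.items()}
longest = max(lengths, key=lengths.get)

for pid, n in lengths.items():
    print(f"{pid}: {n:,} bp")
print("longest:", longest)
```

Under this convention every computed length matches the "Length (bp)" column, and the longest region is the promoter with five coding transcripts.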
The CpG sites associated with the SMIM14 gene lie in CpG island 76, and additional transcription factors can bind to this promoter to drive SMIM14 gene expression. [10]
Literature-curated Transcription Factors (via ORegAnno) |
---|
SMARCA4 |
STAT1 |
RBL2 |
TRIM28 |
EGR1 |
TFAP2C |
SMIM14 has three mRNA transcript variants. Transcript variant 1 is the longest variant, with 6,397 base pairs. [2]
Transcript | Length (bp) | Accession Number |
---|---|---|
Transcript variant 1 | 6,397 | NM_001317896.2 |
Transcript variant 2 | 6,252 | NM_174921 |
Transcript variant 3 | 6,263 | NM_001317897 |
SMIM14 has high expression in the liver, adrenal gland, colon, and prostate. It is under-expressed in peripheral blood lymphocytes, skeletal muscles, and the heart. [11]
SMIM14 transcript variant 1 encodes a protein of 99 amino acids. [13]
The predicted molecular weight (Mw) of the SMIM14 protein is 10,710.34 Da. Its isoelectric point (pI), the pH at which the protein carries no net electrical charge, is 5.10. [14] The abundance of every amino acid is within the normal range for humans. [14]
SMIM14 has one transmembrane domain, so it is classified as a single-pass membrane protein. [15] The transmembrane domain spans residues 51–70. [16] The domain is predicted to contain a dileucine motif, which plays a role in sorting transmembrane proteins to endosomes and lysosomes. [17] The N-terminus is positioned in the extracellular space, while the C-terminus is located inside the cell, which further classifies SMIM14 as a type I transmembrane protein.
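The residue range of the transmembrane domain, together with the 99-residue protein length, fixes the size of each topological segment. The sketch below works through that arithmetic, assuming 1-based, inclusive residue numbering; the function name `segment_lengths` is hypothetical.

```python
PROTEIN_LENGTH = 99        # amino acids encoded by transcript variant 1
TM_START, TM_END = 51, 70  # transmembrane domain, assumed 1-based inclusive


def segment_lengths(total, tm_start, tm_end):
    """Split a single-pass protein into its three topological segments."""
    return {
        "N-terminal (extracellular)": tm_start - 1,
        "transmembrane": tm_end - tm_start + 1,
        "C-terminal (intracellular)": total - tm_end,
    }


segments = segment_lengths(PROTEIN_LENGTH, TM_START, TM_END)
for name, n in segments.items():
    print(f"{name}: {n} residues")
```

With these inputs the N-terminal segment is 50 residues, the transmembrane domain 20, and the C-terminal tail 29.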
An α-helix is predicted within the transmembrane domain. [18] SMIM14 is also predicted to be randomly coiled near the C-terminus. [18] [19] A random coil is a region that lacks a defined secondary structure, so it assumes a relaxed conformation that is neither interacting nor stabilizing. Extended strands (E-strands) are also predicted throughout the protein. [18] [19] E-strands are another common secondary structure and are often characterized by their involvement in hydrogen bonding with polar side chains.
Within the N-terminus, SMIM14 is predicted to have three palmitoylation sites, [20] which facilitate the clustering of proteins, and one disulfide bridge, which stabilizes the structure of the protein. There is also a predicted glycosaminoglycan site spanning residues 45–48, proximal to the transmembrane domain. [21] The C-terminus is predicted to have two unidentified phosphorylation sites and one PKA phosphorylation site. [22]
SMIM14, a transmembrane protein, is usually expressed in the ER membrane. [4] While there is no conventional ER retention signal within the SMIM14 coding sequence, it has been suggested that the transmembrane domain mediates ER retention.
SMIM14 has no known paralogs and at least 298 orthologs.
Through BLAST, it has been established that there are no paralogs of the SMIM14 gene in Homo sapiens. [23]
SMIM14 is conserved in most vertebrates, excluding hagfish, lampreys, lobe-finned fish, and lungfish. [23] Among invertebrates, it is conserved in flatworms, roundworms, mollusks, and arthropods. It is also relatively conserved in distant relatives, such as sea anemones and corals.
Species | Common Name | Taxon | Date of Divergence (mya) | % Identity | % Similarity | Corrected % Divergence (m) | Accession Number |
---|---|---|---|---|---|---|---|
Mastomys coucha | Southern multimammate mouse | rodentia | 90 | 87.9 | 98.0 | 12.9 | XP_031198284.1 |
Phyllostomus discolor | pale spear-nosed bat | mammalia | 96 | 93.4 | 99.0 | 6.70 | XP_028361411.1 |
Manacus vitellinus | golden-collared manakin | aves | 312 | 85.1 | 91.1 | 16.1 | XP_017923893.1 |
Python bivittatus | Burmese python | reptilia | 312 | 80.2 | 89.1 | 22.1 | XP_007426519 |
Nanorana parkeri | high Himalaya frog | amphibia | 352 | 69.2 | 79.8 | 36.8 | XP_018420132.1 |
Danio rerio | zebrafish | actinopterygii | 435 | 68.0 | 82.5 | 38.6 | NP_991165.1 |
Rhincodon typus | whale shark | chondrichthyes | 473 | 71.8 | 84.5 | 33.1 | XP_020383770.1 |
Ciona intestinalis | sea vase | ascidiacea | 676 | 42.7 | 55.3 | 85.1 | XP_026690156.1 |
Strongylocentrotus purpuratus | Pacific purple sea urchin | echinodermata | 684 | 50.5 | 68.0 | 68.3 | XP_787363.2 |
Lingula anatina | lamp shell | brachiopoda | 797 | 59.0 | 74.3 | 52.8 | XP_013382479.1 |
Limulus polyphemus | Atlantic horseshoe crab | arthropoda | 797 | 49.5 | 65.0 | 70.3 | XP_013782563.1 |
Agrilus planipennis | emerald ash borer | insecta | 797 | 39.8 | 57.3 | 92.1 | XP_018319678.1 |
Octopus vulgaris | octopus | mollusca | 797 | 51.0 | 64.4 | 67.3 | XP_029637526.1 |
Strongyloides ratti | threadworm | nematoda | 797 | 33.3 | 48.1 | 110 | XP_024504825.1 |
Exaiptasia pallida | sea anemone | anthozoa | 824 | 58.2 | 65.5 | 54.1 | XP_020902189.1 |
Schistosoma haematobium | urinary blood fluke | platyhelminthes | 824 | 37.4 | 53.3 | 98.3 | XP_012793134.1 |
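The table does not state how the corrected divergence m was derived. One formula that reproduces nearly every row from the "% Identity" column is m = −100 · ln(identity / 100); the Phyllostomus discolor row differs slightly from this formula. The sketch below applies it to a few rows. The formula is an inference from the tabulated values rather than a documented method, and the function name `corrected_divergence` is hypothetical.

```python
import math


def corrected_divergence(percent_identity):
    """Assumed correction: m = -100 * ln(identity / 100)."""
    return -100.0 * math.log(percent_identity / 100.0)


# Percent identity values copied from the table above.
identity = {
    "Mastomys coucha": 87.9,
    "Python bivittatus": 80.2,
    "Danio rerio": 68.0,
    "Ciona intestinalis": 42.7,
}

for species, pct in identity.items():
    print(f"{species}: m ~ {corrected_divergence(pct):.1f}")
```

Unlike raw percent difference, this logarithmic form grows without bound as identity falls, which is how a value above 100 can arise for the most divergent sequences.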
The SMIM14 sequence is highly conserved across orthologs near the N-terminus. In stark contrast, the C-terminus is more varied across orthologs. Sequence analysis of human SMIM14 suggests that the C-terminus contains a disproportionate number of proline residues (9 out of 29; 31%) with several proline-rich sequences (PXXP). [4] Proline-rich domains are usually associated with protein-protein interactions; thus, the C-terminus has a high probability of interacting with other proteins.
SMIM14 has been predicted to interact with the FATE1 protein, which is involved in Ca2+ transfer from the ER to mitochondria, a regulatory mechanism for apoptosis. [24] [25] It has also been predicted that SMIM14 interacts with LSM4, a glycine-rich protein that plays a role in pre-mRNA splicing. [26] [27]
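The proline content and PXXP motifs described above can be checked on any peptide with a few lines of code. The following is a minimal sketch; the sequence `toy_tail` is a made-up example and not the SMIM14 C-terminus, and the helper names are hypothetical.

```python
import re


def proline_fraction(seq):
    """Fraction of residues in seq that are proline (P)."""
    return seq.count("P") / len(seq)


def find_pxxp(seq):
    """Return (1-based position, motif) for every PXXP occurrence.

    A lookahead is used so that overlapping motifs are all reported.
    """
    return [(m.start() + 1, m.group(1)) for m in re.finditer(r"(?=(P..P))", seq)]


# Reported composition of the C-terminus: 9 prolines in 29 residues.
reported_percent = round(9 / 29 * 100)

# Hypothetical peptide for illustration only; NOT the real SMIM14 sequence.
toy_tail = "APSTPAGPPLPG"
motifs = find_pxxp(toy_tail)

print(f"reported proline content: {reported_percent}%")
print(f"toy proline fraction: {proline_fraction(toy_tail):.2f}")
print("toy PXXP motifs:", motifs)
```

Running the same two helpers on the actual C-terminal residues from the protein record would give the figures quoted in the text.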