SMIM23

SMIM23 (Small Integral Membrane Protein 23) is a protein that in humans is encoded by the SMIM23 gene, also known as C5orf50. The longer mRNA isoform is 519 nucleotides long and translates to a protein of 172 amino acids. [1] Researchers have identified this gene, along with a few others, as potentially playing a role in how facial morphology arises in humans. [2]

Gene

Figure: Summary information of the SMIM23 gene. A table with accession number, chromosome location, strand location, size, and known aliases.

SMIM23 is a protein-encoding gene. Basic information about its aliases and chromosome location is given in the table. The schematic of the chromosome helps to visualize the location of the gene.

Figure: Human chromosome 5.

mRNA

Although the gene has two splice isoforms (X1 and X2), it has three exon/exon boundaries, indicating four exons (nucleotides 1-105, 106-157, 158-225, and 226-519). [3]
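The exon coordinates and the quoted protein length can be checked with simple arithmetic. The sketch below is only an illustration: it assumes that the 519-nucleotide span is a coding sequence that ends with a stop codon, which the text does not state explicitly, and the function names are hypothetical.

```python
# Exon coordinates (1-based, inclusive) as listed in the text.
EXONS = [(1, 105), (106, 157), (158, 225), (226, 519)]


def exon_lengths(exons):
    """Return the length in nucleotides of each exon."""
    return [end - start + 1 for start, end in exons]


def exons_are_contiguous(exons):
    """Check that each exon starts right after the previous one ends."""
    return all(nxt[0] == cur[1] + 1 for cur, nxt in zip(exons, exons[1:]))


def protein_length(cds_nucleotides, includes_stop=True):
    """Number of amino acids encoded by a coding sequence.

    ASSUMPTION: the sequence is a complete reading frame; by default the
    final codon is treated as a stop codon and is not counted.
    """
    if cds_nucleotides % 3 != 0:
        raise ValueError("length is not a multiple of three")
    codons = cds_nucleotides // 3
    return codons - 1 if includes_stop else codons


lengths = exon_lengths(EXONS)
total = sum(lengths)
print("exon lengths:", lengths)
print("total nucleotides:", total)
print("exon boundaries:", len(EXONS) - 1)
print("amino acids:", protein_length(total))
```

Under that assumption, the four exons add up to 519 nucleotides, or 173 codons, which matches the 172 amino acids reported for the protein once the stop codon is set aside.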

Figure: Conceptual translation of SMIM23. Labeled are the start and stop codons, exon splice sites, and polyadenylation signal, with singly conserved bases highlighted in yellow, alpha helices marked with arrows, the transmembrane domain in purple, domains of unknown function in blue, and a repeat domain underlined.

Protein

Physical features

SMIM23 notably has a transmembrane domain.

The predicted isoelectric point of the unmodified, unprocessed protein in mice is 5.779, whereas the transmembrane region alone in humans has a predicted isoelectric point of 5.928. [4]

The protein appears to be rich in leucine and glutamic acid, though not to an unusually high degree. It is also poor in all other amino acids besides alanine, serine, and glutamine. [5]

The region underlined in the conceptual translation was predicted to be an Involucrin repeat. [6]

Post-translational modifications

The transmembrane region has a predicted mass of 1674.2 Da, while the whole protein is 20008.51 Da. This is very similar to the UniProt prediction of 20.025 kDa. [7] Antibody data were examined for banding patterns and weight changes that may have occurred post-translationally. The C5orf50 Polyclonal Antibody from ThermoFisher Scientific shows a Western blot band at 40 kDa, [8] roughly twice the predicted weight. This suggests a significant amount of post-translational modification through the addition of large components.

There are many predicted phosphorylation sites along its sequence, including two protein kinase C phosphorylation sites, a cAMP- and cGMP-dependent protein kinase phosphorylation site, and a tyrosine kinase phosphorylation site. [9] There is also a confidently predicted potential C-terminal GPI-modification site. [10]
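The size of the gap between the predicted weight and the Western blot band can be made explicit with a short calculation. This is a minimal sketch using only the UniProt prediction (20.025 kDa) and the 40 kDa band quoted above; the function names are hypothetical, and the result says nothing about which modifications account for the difference.

```python
# Values quoted in the text.
PREDICTED_KDA = 20.025  # UniProt predicted molecular weight
OBSERVED_KDA = 40.0     # Western blot band for the C5orf50 antibody


def fold_difference(observed_kda, predicted_kda):
    """Ratio of the observed band weight to the predicted weight."""
    return observed_kda / predicted_kda


def excess_mass(observed_kda, predicted_kda):
    """Mass (kDa) not explained by the unmodified polypeptide."""
    return observed_kda - predicted_kda


ratio = fold_difference(OBSERVED_KDA, PREDICTED_KDA)
extra = excess_mass(OBSERVED_KDA, PREDICTED_KDA)
print(f"observed / predicted = {ratio:.2f}")
print(f"unexplained mass = {extra:.3f} kDa")
```

The observed band is close to twice the predicted weight, leaving about 20 kDa unexplained by the unmodified polypeptide.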

Figure: Schematic of the protein, marking the locations of notable features that were confirmed with some level of confidence. Red stands for a phosphorylation site and grey stands for the C-terminal GPI-modification site. The transmembrane domain is shown in relation to the rest of the protein.

Secondary structure

There are two predicted alpha-helical stretches, from amino acids 33 to 49 and from 89 to 136, based on evidence from various secondary structure prediction programs. Of the programs investigated, the most informative was PELE on Biology Workbench. [5]
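To put the two helical stretches in proportion, their lengths can be compared with the 172-residue protein. The sketch below assumes the residue ranges are 1-based and inclusive, which the text implies but does not state; the names are hypothetical.

```python
PROTEIN_LENGTH = 172             # amino acids, as stated for the longer isoform
HELICES = [(33, 49), (89, 136)]  # predicted alpha helices (assumed inclusive)


def segment_length(start, end):
    """Number of residues in an inclusive residue range."""
    return end - start + 1


def helical_fraction(helices, protein_length):
    """Fraction of the protein covered by the listed helices."""
    covered = sum(segment_length(start, end) for start, end in helices)
    return covered / protein_length


helix_lengths = [segment_length(start, end) for start, end in HELICES]
print("helix lengths:", helix_lengths)
print(f"helical fraction: {helical_fraction(HELICES, PROTEIN_LENGTH):.1%}")
```

Under these assumptions the helices span 17 and 48 residues, a little over a third of the protein.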

Figure: Predicted structure of SMIM23 by the I-TASSER program.

The predicted 3D structure of the protein is a series of helices, [11] similar to what other programs predicted.

Subcellular localization

This human integral membrane protein is predicted to localize to the endoplasmic reticulum. [12] The same analysis in other species returned conflicting results, with many programs predicting the protein to be present in the cytosol. [12] This raises the possibility that the name is inaccurate, i.e. that the protein may not be an integral membrane protein. Further information is needed before drawing this conclusion.

Expression

There is not enough consensus as to where in the body SMIM23 is expressed. Databases indicate expression mainly in the testes, [13] but this may reflect a lack of data.

Regulation of expression

The promoter region of SMIM23 is approximately 1192 nucleotides long and has various predicted transcription factors. [14]

At the level of RNA secondary structure, a stem-loop is predicted in the 5' UTR, with a few areas conserved across species. [15]

Function and clinical significance

Research has suggested that a set of genes, including SMIM23, may influence how face shape arises in individuals. [2] Although the paper refers to the gene by an alias (C5orf50), it identifies five loci that likely influence facial shape, specifically in people of European descent. These findings are supported by replication of the phenotype associations for each gene and by statistical analysis. Like other sources, the article notes that SMIM23 likely codes for an unknown transmembrane protein. Other studies have suggested that a set of genes including SMIM23 may influence human height. [16] Furthermore, a great deal of research is being done on chromosome 5 in general to understand the roles of certain genes on it, including SMIM23. [17] This could one day provide insight into this gene's specific roles.

Interacting proteins

The following proteins are predicted to interact with SMIM23.

Cilia and Flagella Associated Protein 43 (CFAP43, also known as WDR96) is the most confidently predicted functional partner and contains a tryptophan-aspartic acid (WD) repeat domain.

SFR1 (SWI5-dependent recombination repair 1) is a component of the SWI5-SFR1 complex, which is required for double-strand break repair via homologous recombination.

COL17A1 is collagen type XVII, alpha 1. This may play a role in overall protein structure.

PRDM16 binds DNA and acts as a transcriptional regulator. It functions in the differentiation between white and brown adipose tissue and can also repress transforming growth factor-beta signaling. [18]

Homology and evolution

There are no known paralogs.

More than 100 orthologs are known, ranging from primates to small ground animals. From these and from sequence similarity comparisons, [19] an ortholog space can be described. The closest relatives of humans with the SMIM23 gene are primates, so two monkey species were selected; they diverged around 29.4 million years ago and have sequence similarities in the high 70s (percent). Slightly more distant relatives with the gene come from a wide variety of animals, including horses, sea mammals, and bats, all with similarities between 62% and 69%. Lastly, some distantly related orthologs were included, such as the Tasmanian devil and various scavenger animals, with similarities between 40% and 61%.

Some portions of the sequence are still highly conserved (see the conceptual translation above). The most notable conserved motif is tryptophan 124, leucine 125, and aspartic acid 126. In addition, a BLAST search returned a protein family of unknown function, DUF4635, whose motif includes two small conserved sequences (LEQ and DLE). Although these are not completely conserved in the alignments done with SMIM23, they were labeled in the conceptual translation. [20]

Orthologs

Figure: A phylogenetic tree of the SMIM23 gene in the various animals listed in the table. Abbreviations refer to the different common names, e.g. Hu SMIM23 refers to the human gene.

The protein was not found in bacteria, archaea, protists, plants, fungi, invertebrates, reptiles, or birds. All the orthologs found were in mammals. [3] An unrooted phylogenetic tree [5] of SMIM23 was created with a few close, moderately related, and distant orthologs (listed in the table). In the tree, the larger the distance (length of line), the longer the time to the last common ancestor. Sequence identity refers to aligned positions with exactly matching amino acids, while similarity also counts positions where the amino acids are chemically similar.

Genus and Species [3] | Common Name [3] | Date of Divergence (MYA) [21] | Sequence Identity (%) [5] | Sequence Similarity (%) [3]
Cercocebus atys | Sooty mangabey | 29.44 | 73.8 | 77.8
Macaca mulatta | Rhesus monkey | 29.44 | 73.3 | 78.3
Galeopterus variegatus | Sunda flying lemur | 76 | 56.5 | 67
Tupaia chinensis | Chinese tree shrew | 82 | 54.7 | 66
Castor canadensis | American beaver | 90 | 54.1 | 65
Microtus ochrogaster | Prairie vole | 90 | 54.7 | 64.2
Mustela putorius furo | Ferret | 96 | 59.9 | 62
Equus caballus | Horse | 96 | 57 | 68.2
Odobenus rosmarus | Walrus | 96 | 59.3 | 66.4
Acinonyx jubatus | Cheetah | 96 | 58.7 | 63
Ursus maritimus | Polar bear | 96 | 58.1 | 69.3
Camelus ferus | Wild bactrian camel | 96 | 55.2 | 62.2
Dasypus novemcinctus | Nine-banded armadillo | 105 | 31.2 | 40.2
Echinops telfairi | Lesser hedgehog tenrec | 105 | 50 | 61
Sarcophilus harrisii | Tasmanian devil | 159 | 34.7 | 47.7
Monodelphis domestica | Gray short-tailed opossum | 159 | 28.5 | 44.6
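The difference between the identity and similarity columns can be illustrated with a small calculation over a pairwise alignment. Percent identity counts aligned positions with exactly the same residue; percent similarity also counts positions whose residues are chemically similar. The sketch below is not the method used to produce the table: the similarity groups are a simplified assumption rather than the substitution matrix an alignment tool would use, the function name is hypothetical, and the two aligned strings are made-up toy sequences, not SMIM23 data.

```python
# ASSUMPTION: a simplified grouping of chemically similar residues.
# Real alignment tools score similarity with a substitution matrix instead.
SIMILAR_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FYW"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("NQ"),    # amides
    set("ST"),    # small hydroxyl
    set("AG"),    # small
]


def percent_identity_similarity(aligned_a, aligned_b):
    """Return (identity %, similarity %) over the full alignment length.

    Both inputs must already be aligned (same length, '-' for gaps).
    Gap positions count toward the length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = 0
    similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        if a == b:
            identical += 1
            similar += 1  # identical residues are also counted as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    length = len(aligned_a)
    return 100 * identical / length, 100 * similar / length


# Toy aligned sequences (hypothetical, for illustration only).
identity, similarity = percent_identity_similarity("MLEQ-DLE", "MIEQADLD")
print(f"identity = {identity:.1f}%, similarity = {similarity:.1f}%")
```

Because every identical position is also counted as similar, similarity can never be lower than identity, which is the pattern seen in each row of the table.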


References

  1. GeneCards Human Gene Database. "SMIM23 Gene - GeneCards | SIM23 Protein | SIM23 Antibody". www.genecards.org. Retrieved 2017-02-18.
  2. Liu, Fan; Lijn, Fedde van der; Schurmann, Claudia; Zhu, Gu; Chakravarty, M. Mallar; Hysi, Pirro G.; Wollstein, Andreas; Lao, Oscar; Bruijne, Marleen de (2012-09-13). "A Genome-Wide Association Study Identifies Five Loci Influencing Facial Morphology in Europeans". PLOS Genetics. 8 (9): e1002932. doi:10.1371/journal.pgen.1002932. ISSN 1553-7404. PMC 3441666. PMID 23028347.
  3. "SMIM23 small integral membrane protein 23 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2017-02-26.
  4. Program by Dr. Luca Toldo, developed at http://www.embl-heidelberg.de. Changed by Bjoern Kindler to print also the lowest found net charge. "EMBL WWW Gateway to Isoelectric Point Service". Archived from the original on 2008-10-26. Retrieved 2014-05-10.
  5. NCSA Biology Workbench. "SDSC Biology Workbench". workbench.sdsc.edu. Retrieved 2017-04-24.
  6. EMBL-EBI. "RADAR - Rapid Automatic Detection and Alignment of Repeats in protein sequences < EMBL-EBI". www.ebi.ac.uk. Retrieved 2017-05-06.
  7. "SMIM23 - Small integral membrane protein 23 - Homo sapiens (Human) - SMIM23 gene & protein". www.uniprot.org. Retrieved 2017-04-24.
  8. "C5orf50 Antibody". www.thermofisher.com. Retrieved 2017-04-24.
  9. Sigrist CJ, Cerutti L, de Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, Hulo N (2010). "PROSITE, a protein domain database for functional characterization and annotation". Nucleic Acids Res. 38 (Database issue): D161-6.
  10. Eisenhaber B, Bork P, Eisenhaber F (1999). "Prediction of potential GPI-modification sites in proprotein sequences". JMB. 292 (3): 741-758.
  11. "I-TASSER server for protein structure and function prediction". zhanglab.ccmb.med.umich.edu. Retrieved 2017-05-06.
  12. "PSORT II server - GenScript". www.genscript.com. Retrieved 2017-04-25.
  13. EMBL-EBI Expression Atlas development team. "Expression summary for SMIM23 - homo sapiens < Expression Atlas < EMBL-EBI". www.ebi.ac.uk. Retrieved 2017-04-24.
  14. "Genomatix - NGS Data Analysis & Personalized Medicine". www.genomatix.de. Retrieved 2017-04-29.
  15. "The Mfold Web Server | mfold.rit.albany.edu". unafold.rna.albany.edu. Retrieved 2017-05-06.
  16. Lango Allen, Hana; Estrada, Karol; Lettre, Guillaume; Berndt, Sonja I.; Weedon, Michael N.; Rivadeneira, Fernando; Willer, Cristen J.; Jackson, Anne U.; Vedantam, Sailaja (2010-10-14). "Hundreds of variants clustered in genomic loci and biological pathways affect human height". Nature. 467 (7317): 832-838. Bibcode:2010Natur.467..832L. doi:10.1038/nature09410. ISSN 1476-4687. PMC 2955183. PMID 20881960.
  17. Schmutz, Jeremy; Martin, Joel; Terry, Astrid; Couronne, Olivier; Grimwood, Jane; Lowry, Steve; Gordon, Laurie A.; Scott, Duncan; Xie, Gary (2004-09-16). "The DNA sequence and comparative analysis of human chromosome 5". Nature. 431 (7006): 268-274. Bibcode:2004Natur.431..268S. doi:10.1038/nature02919. ISSN 0028-0836. PMID 15372022.
  18. "STRING: functional protein association networks". string-db.org. Retrieved 2017-04-24.
  19. "The European Bioinformatics Institute - EMBOSS Needle - Pairwise Sequence Alignment". Archived from the original on 2011-04-19.
  20. EMBL-EBI, InterPro. "Protein of unknown function DUF4635 (IPR027880) < InterPro < EMBL-EBI". www.ebi.ac.uk. Retrieved 2017-02-26.
  21. "TimeTree :: The Timescale of Life". www.timetree.org. Retrieved 2017-04-29.
