SPMIP10 is a protein that in Homo sapiens is encoded by the SPMIP10 gene.
SPMIP10 (or Sperm Microtubule Inner Protein 10) is also known as Testis Expressed 43, C5orf48, Tseg7, Sperm Associated Microtubule Inner Protein 10, and Testis Specific Expressed Gene 73. [1]
SPMIP10 is located on the plus strand of the long arm of chromosome 5, band 23, sub-band 2 (5q23.2, see the ideogram of the SPMIP10 gene location on chromosome 5). [1]
SPMIP10 is a 478 bp long protein-coding gene. [2] SPMIP10 contains three exons. Exon 1 spans positions 1–116, exon 2 spans positions 117–225, and exon 3 spans positions 226–478 in the SPMIP10 DNA sequence. [2]
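As a quick consistency check on the exon coordinates above, the following minimal sketch computes each exon's length and confirms that the three exons together account for the stated 478 bp. It assumes the positions are 1-based and inclusive, which the source does not state explicitly; the names `EXONS` and `exon_length` are illustrative only.

```python
# Exon coordinates as given in the text.
# Assumption: positions are 1-based and inclusive at both ends.
EXONS = {
    "exon 1": (1, 116),
    "exon 2": (117, 225),
    "exon 3": (226, 478),
}


def exon_length(start, end):
    """Length in bp of a 1-based, inclusive interval."""
    return end - start + 1


lengths = {name: exon_length(start, end) for name, (start, end) in EXONS.items()}
total = sum(lengths.values())

for name, n_bp in lengths.items():
    print(f"{name}: {n_bp} bp")
print(f"total: {total} bp")
```

Under that assumption the exons are 116, 109, and 253 bp long, which sums to 478 bp with no gaps or overlaps between adjacent exons.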
SPMIP10 has a predicted molecular weight (Mw) of 15.5 kDa and a theoretical isoelectric point (pI) of 9.3. Similar predicted molecular weights and theoretical isoelectric points are seen for various close orthologs (mammals, sequence identity >79%). Varying predicted molecular weights and theoretical isoelectric points are seen in distant orthologs (non-mammal vertebrates, sequence identity <79%). [4] [5] [6]
The SPMIP10 protein in humans, as well as in various closely related organisms, has higher than normal levels of histidine and lower than normal levels of alanine. [5]
SPMIP10 contains a domain of unknown function, DUF4513, from positions 33-452. [7]
SPMIP10 has a tertiary structure that includes both beta sheets and alpha helices. [8] [9] These structures, predicted by AlphaFold and iTasser, are shown in the below images.
SPMIP10 mRNA expression data, obtained from NCBI Gene, show that SPMIP10 is expressed in varying amounts in both fetal (highest between the 10th and 15th week of development) and adult human tissues. [10] In fetal tissues, SPMIP10 expression is seen in heart (approximately 0.049 reads per kilobase per million mapped reads, RPKM) and kidney (approximately 0.064 RPKM) at week 10 and in intestine at week 15 (approximately 0.016 RPKM). [10] RNA sequencing (RNA-seq) of total RNA from 20 human tissues showed SPMIP10 expression of approximately 0.064 RPKM in cerebellum tissue. Transcription profiling by high-throughput sequencing of 16 human tissues indicated high testis expression (approximately 6.5 RPKM) and low expression levels in lymph node and thyroid tissues. [10] RNA-seq of 95 human individuals showed the highest SPMIP10 mRNA expression in the testis, at approximately 4.6 RPKM, with minute amounts seen in colon and small intestine tissue samples. [10]
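The RPKM unit used above normalizes a raw read count by both transcript length and sequencing depth. The sketch below shows the standard calculation; the function name `rpkm` and every input number are hypothetical and chosen only for illustration, and using the 478 bp figure as the transcript length is an assumption. None of these values are taken from the NCBI Gene datasets.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript, per million mapped reads."""
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)


# Hypothetical example: 62 reads on a 478 bp transcript in a library
# of 20 million mapped reads (illustrative numbers, not real data).
example = rpkm(read_count=62, transcript_length_bp=478,
               total_mapped_reads=20_000_000)
print(f"{example:.2f} RPKM")
```

Because the read count is divided by both length and depth, values from tissues sequenced to different depths can be compared directly, which is what allows the testis and brain values above to be placed side by side.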
An experiment, from the Allen Brain Atlas site, indicated low amounts of SPMIP10 expression throughout various structures in the human brain (see SPMIP10 Microarray Expression Schematic in the Human Brain). [11] Higher amounts of expression for SPMIP10 in the human brain were found in the posterior lobe, parietal lobe, and the amygdala. Higher amounts were primarily seen concentrated in the posterior lobe. [11] Table 1 summarizes these findings.
Structure | Location | Function | z-score |
Lobule VIIIA | Posterior lobe | Vasopressin and oxytocin production | 4.3958 |
Basomedial nucleus | Amygdala | Decision-making and adaptation of instinctive behaviors in response to environmental stimuli | 3.7596 |
Superior parietal lobule | Parietal lobe | Sensory perception and integration | 3.114 |
Lobule VIIB | Posterior lobe | Vasopressin and oxytocin production | 3.0986 |
Lobule VIIIA | Posterior lobe | Vasopressin and oxytocin production | 3.0757 |
Lobule IX | Posterior lobe | Vasopressin and oxytocin production | 3.0531 |
Lobule VIIIA | Posterior lobe | Vasopressin and oxytocin production | 3.0018 |
There is no 5’ UTR for SPMIP10 because its first exon begins at the start of translation. [7]
3’ UTR
The 3' UTR sequence of SPMIP10 in humans is highly conserved in various mammals. It is predicted to contain 3 stem loops. [12] [13] [14]
Utilizing UCSC Genome Browser, a transcription initiation site (Tex43_1) for SPMIP10 was located at positions 126,631,722 - 126,631,782 on chromosome 5 along with two enhancers (E2405703 and E2405704). [15] These findings are depicted in the SPMIP10 Transcription Regulation Diagram.
The SPMIP10 protein is predicted to be localized primarily in the nucleus and cytoplasm. DeepLoc 2.0 indicates that SPMIP10 is located in the cytoplasm and contains a nuclear export signal at positions 130-134 of the protein. [16] [17]
SPMIP10 has predicted SUMOylation sites (positions 107, 13, 65, 25, 54, 29, and 41), O-glycosylation sites (positions 10 and 122), and phosphoprotein-binding domains (SH2/LCK at position 30, SH2/CISH at position 30, and PBD at position 24). The locations of these modifications are labeled in the Annotated Conserved Post-translational Modifications for SPMIP10 Diagram. [18] [19] [20] [21]
The SPMIP10 protein is only found in vertebrates. [6] Species containing the SPMIP10 protein include mammals (26.5-100% identity), reptiles (40.9-48.1% identity), birds (23.2-41.8% identity), amphibians (27.7-37.1% identity), and fish (27.9-35.5% identity). Table 2 contains twenty orthologs and their respective sequence identity in relation to SPMIP10 in humans. [3] [6] [22]
Vertebrate Group | Genus/Species | Common Name | Taxonomic Group | Est. Date of Divergence (MYA) | Accession Number | Sequence Length (aa) | Sequence Identity (%) | Sequence Similarity (%) |
Mammals | Homo sapiens | Humans | Hominidae | 0 | NP_997291.1 | 134 | 100 | 100 |
Mammals | Lemur catta | Ring-tailed lemur | Primates | 74 | XP_045421967.1 | 134 | 91.8 | 94.8 |
Mammals | Callorhinus ursinus | Northern fur seal | Carnivora | 94 | XP_025716752.1 | 134 | 86.6 | 91.8 |
Mammals | Pteropus vampyrus | Large flying fox | Chiroptera | 94 | XP_011356632.1 | 134 | 79.1 | 87.3 |
Mammals | Phascolarctos cinereus | Koala | Diprotodontia | 160 | XP_020863037.1 | 133 | 58.2 | 70.2 |
Mammals | Tachyglossus aculeatus | Australian echidna | Monotremata | 180 | XP_038625455.1 | 236 | 26.5 | 34.6 |
Reptilia | Crocodylus porosus | Australian saltwater crocodile | Crocodylia | 319 | XP_019410982.1 | 116 | 48.1 | 64.4 |
Reptilia | Gopherus evgoodei | Goode's thornscrub tortoise | Testudines | 319 | XP_030422994.1 | 116 | 46.3 | 59.0 |
Reptilia | Lacerta agilis | Sand lizard | Squamata | 319 | XP_033019986.1 | 131 | 44.6 | 61.2 |
Reptilia | Alligator mississippiensis | American alligator | Crocodylia | 319 | XP_059580882.1 | 132 | 40.9 | 52.6 |
Aves | Dromaius novaehollandiae | Emu | Casuariiformes | 319 | XP_025971178.1 | 108 | 41.8 | 56.7 |
Aves | Antrostomus carolinensis | Chuck-will's-widow | Caprimulgiformes | 319 | XP_028940340.1 | 159 | 35.7 | 48.0 |
Aves | Gavia stellata | Red-throated loon | Gaviiformes | 319 | XP_059690006.1 | 145 | 30.1 | 42.8 |
Aves | Nipponia nippon | Crested ibis | Pelecaniformes | 319 | XP_009470769.1 | 90 | 25.0 | 35.1 |
Aves | Buceros rhinoceros silvestris | Rhinoceros hornbill | Bucerotiformes | 319 | XP_010133851.1 | 138 | 23.2 | 39.3 |
Amphibian | Rana temporaria | Common frog | Anura | 352 | XP_040200566.1 | 155 | 37.1 | 54.7 |
Amphibian | Bufo bufo | Common toad | Anura | 352 | XP_040276142.1 | 169 | 33.7 | 50.3 |
Amphibian | Geotrypetes seraphini | Gaboon caecilian | Gymnophiona | 352 | XP_033815079.1 | 119 | 32.4 | 50.0 |
Amphibian | Xenopus tropicalis | Tropical clawed frog | Anura | 352 | XP_002931758.1 | 174 | 27.7 | 42.4 |
Fish | Protopterus annectens | West African lungfish | Lepidosireniformes | 408 | XP_043916719.1 | 116 | 35.5 | 48.6 |
Fish | Labrus bergylta | Ballan wrasse | Labriformes | 429 | XP_020509209.2 | 152 | 32.2 | 42.8 |
Fish | Petromyzon marinus | Sea lamprey | Petromyzontiformes | 563 | XP_032809373.1 | 115 | 30.3 | 47.9 |
Fish | Anabas testudineus | Climbing perch | Perciformes | 429 | XP_026199556.1 | 147 | 27.9 | 44.2 |
Graph 1 shows the corrected sequence divergence versus estimated date of divergence for SPMIP10 compared to Cytochrome C and Fibrinogen Alpha. SPMIP10 evolves at a pace more similar to that of Fibrinogen Alpha than to that of Cytochrome C.
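The source does not state which correction was used to produce the corrected sequence divergence. One common choice, assumed here purely for illustration, is a Poisson-style correction that accounts for multiple substitutions at the same site: m = -ln(1 - d) × 100, where d is the observed fraction of differing residues. The sketch below applies it to a few identity values from Table 2; the function name and the selection of orthologs are illustrative.

```python
import math


def corrected_divergence(identity_percent):
    """Poisson-style corrected divergence, m = -ln(1 - d) * 100.

    d is the observed fraction of differing residues, so 1 - d equals
    identity / 100 and the formula can be written as 100 * ln(100 / identity).
    Assumption: this is one common correction, not necessarily the one
    used for Graph 1.
    """
    return 100.0 * math.log(100.0 / identity_percent)


# (species, estimated divergence in MYA, sequence identity in %) from Table 2
orthologs = [
    ("Lemur catta", 74, 91.8),
    ("Phascolarctos cinereus", 160, 58.2),
    ("Crocodylus porosus", 319, 48.1),
    ("Rana temporaria", 352, 37.1),
    ("Protopterus annectens", 408, 35.5),
]

for species, mya, identity in orthologs:
    m = corrected_divergence(identity)
    print(f"{species}: {mya} MYA, corrected divergence {m:.1f}")
```

Plotting these corrected values against divergence date gives an approximately linear trend whose slope reflects how quickly the protein accumulates changes, which is the basis for comparing it with slowly and rapidly evolving reference proteins.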
On the B-tubule of the flagellum microtubule doublets, the ENKUR protein interacts with the loop region of the SPMIP10 protein, providing flagellum reinforcement in mammalian sperm. [23] SPMIP10 binds closely to ENKUR and wraps around the inter-protomer interface of CCDC105. In this regard, SPMIP10 functions as a “staple” while interacting with protofilaments A12 and A11. [23] The wrapping of SPMIP10 around CCDC105 stabilizes the protomer. [24]
A 4 bp deletion of SPMIP10 in mice, resulting in a frameshift mutation (introducing a premature stop codon 33 aa further), has been shown to slightly decrease sperm velocity and motility, but not to lower rates of fertilization. [25] Wild-type mouse sperm maintained flexibility at both the mid and end pieces of the flagellum, while the SPMIP10 knock-out mouse sperm showed reduced flexibility at the end piece of the flagellum. [25]
The duplication of SPMIP10 correlates with karyotypically balanced chromosomal rearrangements associated with decreased cognitive abilities as well as craniofacial and hand dysmorphisms. [26]
The depletion of p63 in ME180 cells (human cervical adenocarcinoma epithelial cells) correlates with a decrease of SPMIP10 expression. Wild-type ME180 cells have slightly higher amounts of SPMIP10 expression on average than those that experienced a depletion of p63. [27]
Diseased cells expressing low levels of EVI1 have higher mean expression of SPMIP10 than diseased cells expressing elevated levels. [28]