SPMAP1
Identifiers | 
---|---
Aliases | SPMAP1, chromosome 17 open reading frame 98, sperm microtubule associated protein 1
External IDs | MGI: 1919465; HomoloGene: 19140; GeneCards: SPMAP1; OMA: SPMAP1 - orthologs
Sperm microtubule associated protein 1 is a protein which in humans is encoded by the SPMAP1 gene, located on chromosome 17. [5] The SPMAP1 gene consists of 6,302 bases. It has three exons and no alternative splice sites. The protein has 154 amino acids and no unusual amino acid composition. [6] SPMAP1 contains a domain of unknown function (DUF4542) and has a molecular weight of 17.6 kDa. [7] [8] SPMAP1 does not belong to any other protein family and has no known isoforms. [9] The protein has orthologs with high percent similarity in mammals and reptiles, as well as more distantly related orthologs across the Metazoa, the most distant of which are in sponges. [10]
Like many proteins, SPMAP1 is highly expressed in the testes. [11] Its levels are also elevated in cancer. [11] The protein has been found in or near intermediate filaments and the nucleolus. [11] Additionally, the transcription factors that bind the SPMAP1 promoter are also active in hematopoietic stem cells, the immune system, and the cardiovascular system, among others. [12] The gene is overexpressed in many cancer types, including kidney renal clear cell carcinoma and lung squamous cell carcinoma. [13] Motif and transcription factor analyses point towards a role for SPMAP1 in proliferation, especially immune cell proliferation.
The SPMAP1 gene consists of 6,303 bases. It has three exons, two large introns, and no alternative splice sites. [14] The 5' UTR sequence of SPMAP1 is highly conserved in primates; no matching 5' UTR sequences could be identified outside mammals. [15] [16] SPMAP1 contains 11 Alu repeats. [17]
GeneCards lists five enhancer sequences for SPMAP1, which may provide insight into its function. Four of the five enhancers are active in the thymus. All five are active in H1 human embryonic stem cells (H1-hESC) and in iPS DF 19.11, an induced pluripotent stem cell line derived from foreskin fibroblasts. [18]
The SPMAP1 promoter has many transcription factor binding sites. [19] The corresponding transcription factors are commonly found in hematopoietic cells, connective tissue, cardiovascular tissue, and the immune system. Binding sites for Krüppel-like factors suggest a role for SPMAP1 in proliferation or apoptosis. Sites for SMAD proteins indicate an involvement in the TGF-β pathway, while sites for Myc-related transcription factors indicate a potential role in proliferation. Other transcription factors that bind the SPMAP1 promoter, such as RBPJ-kappa, are also involved in proliferation and signalling.
Numerous SNPs were found in the 5' UTR, 3' UTR, and coding region of SPMAP1. [20] Few of them fall in highly conserved regions: four are in codons for highly conserved amino acids, and one is in the start codon. In three of these five, the SNP is at the third position of the codon. According to the wobble hypothesis, third-position changes are often synonymous, so these three SNPs would likely not alter the protein.
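To illustrate the wobble reasoning, the sketch below checks whether a single-base change in a codon is synonymous under the standard genetic code (NCBI translation table 1). It is a minimal illustration, not the analysis used for SPMAP1: the helper names `is_synonymous` and `third_position_synonymous_fraction` are hypothetical, and the actual SPMAP1 codons carrying these SNPs are not given in the text. It also shows that third-position changes are not always silent; for example, ATG (Met) to ATA (Ile) changes the amino acid.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the first base varying slowest; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon: str, position: int, new_base: str) -> bool:
    """Return True if replacing the base at `position` (1, 2 or 3) keeps
    the encoded amino acid unchanged. Hypothetical helper."""
    codon = codon.upper()
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2 or 3")
    mutant = codon[: position - 1] + new_base.upper() + codon[position:]
    return CODON_TABLE[codon] == CODON_TABLE[mutant]


def third_position_synonymous_fraction() -> float:
    """Fraction of all possible third-position substitutions in sense
    codons that leave the amino acid unchanged."""
    silent = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons
        for base in BASES:
            if base != codon[2]:
                total += 1
                silent += is_synonymous(codon, 3, base)
    return silent / total


if __name__ == "__main__":
    print(is_synonymous("CTT", 3, "C"))  # True: Leu -> Leu
    print(is_synonymous("ATG", 3, "A"))  # False: Met -> Ile
    print(round(third_position_synonymous_fraction(), 3))  # 0.689
```

Counting every possible third-position substitution equally, roughly two thirds are silent, which is why such SNPs are usually, but not always, without effect on the protein.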
The SPMAP1 mRNA has no miRNA binding sites. [21] It has low abundance (0.44%). [22] The mRNA sequence contains three hexaloops, none of which is significant. [23]
SPMAP1 is a 17.6 kDa protein. [8] Distant orthologs are 5 to 6 kDa larger, partly because they carry an additional nuclear localization signal (NLS) that the human protein lacks. There are no positive or negative charge clusters and no transmembrane segments. The predicted isoelectric point (pI) is 9.80 and the molecular weight (Mw) is 17,564.67 Da. [24] SPMAP1 is hydrophobic and soluble.
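A molecular weight such as the 17.6 kDa figure is typically computed by summing average residue masses and adding one water molecule for the free termini. The sketch below assumes the commonly tabulated average residue masses (for example, those listed by ExPASy); it is an illustration of the calculation, not the tool behind the cited value, and `molecular_weight` is a hypothetical helper.

```python
# Average residue masses in daltons (amino acid minus H2O), as commonly
# tabulated, e.g. in ExPASy's amino acid residue mass table.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one H2O accounts for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified linear peptide."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    print(f"{molecular_weight('GG'):.2f} Da")  # 132.12 Da (glycylglycine)
```

Different prediction tools use slightly different mass tables, so their results can differ in the last digits.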
The secondary structure of SPMAP1 consists of both beta sheets and alpha helices (see diagram on right). The tertiary structure confirms these results, although the numbers of alpha helices and beta sheets differ slightly (see diagram on right).
There are no N-terminal signal peptides and no cleavage motifs. There are no ER membrane retention signals and no peroxisomal targeting signals: neither the C-terminal SKL signal nor the second peroxisomal targeting signal (PTS2) is present. There are no vacuolar targeting signals, RNA-binding motifs, or actinin-type actin-binding motifs, and no N-myristoylation or prenylation patterns. [25]
The Cuckoo kinase finder identified kinase phosphorylation sites in SPMAP1, including many serine/threonine and tyrosine kinase sites. [26] Among the sites above the statistical significance threshold, serine/threonine kinase sites are the most prevalent. There are no SUMOylation sites. [27] SPMAP1 has six possible O-GlcNAc sites, [28] of which those at residues 24, 32, 117, and 142 are highly conserved. O-GlcNAc modification occurs on serine and threonine residues and is notably found on the products of oncogenes and tumor suppressors and on proteins involved in growth factor signaling. [29]
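The conserved O-GlcNAc positions can be checked against the human sequence shown in this article. The sketch below uses 1-based residue numbering, as in the text, and only confirms that each listed position is a serine or threonine; it does not predict O-GlcNAc sites. The helper name `residue_at` is hypothetical.

```python
# Human SPMAP1 sequence as shown in this article (line breaks removed).
SPMAP1 = (
    "MAYLSECRLRLEKGFILDGVAVSTAARAYGRSRPKLWSAIPPYNAQQDYHARSYFQ"
    "SHVVPPLLRVVPPLLRKTDQDHGGTGRDGWIVDYIHIFGQGQRYLNRRNWAGTGHS"
    "LQQVTGHDHYNADLKPIDGFNGRFGYRRNTPALRQSTSVFGEVTHFPLF"
)

CONSERVED_OGLCNAC_SITES = [24, 32, 117, 142]  # positions listed in the text


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position."""
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} out of range")
    return sequence[position - 1]


if __name__ == "__main__":
    for pos in CONSERVED_OGLCNAC_SITES:
        aa = residue_at(SPMAP1, pos)
        status = "Ser/Thr" if aa in "ST" else "not Ser/Thr"
        print(f"{aa}{pos}: {status}")
```

All four positions fall on serine or threonine in this sequence (T24, S32, T117, T142), consistent with O-GlcNAc modification of Ser/Thr residues.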
SPMAP1 has a caspase-3/7 motif, at which either caspase-3 or caspase-7 could cleave the protein. [30] This is consistent with a role for SPMAP1 in proliferation, since proapoptotic caspases cleave many proteins that promote cell survival and proliferation. The protein also has a binding motif for peptidyl-prolyl cis-trans isomerase NIMA-interacting 1 (Pin1). [30] Pin1 upregulation is involved in cancer and immune disorders, [31] which suggests that SPMAP1 may be involved in cancer, in immune cells, and perhaps in cancers of the immune system. SPMAP1 also has an IAP-binding motif (IBM), to which inhibitors of apoptosis (IAPs) bind, [30] again consistent with a role in regulating apoptosis and possibly in cancer. Furthermore, SPMAP1 has motifs bound by the SH2 domain of GRB2, an adaptor protein in the RAS signaling pathway; when deregulated, this pathway drives uncontrolled proliferation.
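Short linear motifs such as these are usually located by pattern matching. As a simplified sketch, the code below scans the human sequence with two regular expressions: `[ST]P` for the Ser/Thr-Pro sites that Pin1 recognizes once phosphorylated, and `Y.N` for the pTyr-x-Asn pattern bound by the GRB2 SH2 domain. These patterns are deliberately simplified assumptions, not the exact definitions used by the motif resource cited above, and a match says nothing about whether the residue is actually phosphorylated. The helper `scan_motifs` is hypothetical.

```python
import re

# Human SPMAP1 sequence as shown in this article (line breaks removed).
SPMAP1 = (
    "MAYLSECRLRLEKGFILDGVAVSTAARAYGRSRPKLWSAIPPYNAQQDYHARSYFQ"
    "SHVVPPLLRVVPPLLRKTDQDHGGTGRDGWIVDYIHIFGQGQRYLNRRNWAGTGHS"
    "LQQVTGHDHYNADLKPIDGFNGRFGYRRNTPALRQSTSVFGEVTHFPLF"
)

# Simplified motif patterns (assumptions for illustration only).
MOTIFS = {
    "Pin1 (pS/pT-P)": r"[ST]P",
    "GRB2 SH2 (pY-x-N)": r"Y.N",
}


def scan_motifs(sequence: str, motifs: dict[str, str]) -> dict[str, list[int]]:
    """Return the 1-based start position of every match of each motif.
    Wrapping the pattern in a lookahead also reports overlapping matches."""
    return {
        name: [m.start() + 1 for m in re.finditer(f"(?={pattern})", sequence)]
        for name, pattern in motifs.items()
    }


if __name__ == "__main__":
    for name, positions in scan_motifs(SPMAP1, MOTIFS).items():
        print(name, positions)
```

On this sequence, the simplified patterns match a Thr-Pro pair starting at residue 142 and a Tyr-Leu-Asn stretch starting at residue 100.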
A duplication may have occurred at positions 59–71.
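A tandem duplication appears as a stretch of residues immediately followed by an identical copy. The brute-force sketch below, with a hypothetical helper `tandem_repeats` that is adequate only for short sequences, reports exact tandem repeats in the human sequence shown in this article. On this sequence it finds the unit VVPPLLR at residues 59-65, repeated at 66-72, which roughly matches the duplicated region. Real duplication analyses tolerate mismatches; this exact-match version is a simplification.

```python
# Human SPMAP1 sequence as shown in this article (line breaks removed).
SPMAP1 = (
    "MAYLSECRLRLEKGFILDGVAVSTAARAYGRSRPKLWSAIPPYNAQQDYHARSYFQ"
    "SHVVPPLLRVVPPLLRKTDQDHGGTGRDGWIVDYIHIFGQGQRYLNRRNWAGTGHS"
    "LQQVTGHDHYNADLKPIDGFNGRFGYRRNTPALRQSTSVFGEVTHFPLF"
)


def tandem_repeats(sequence: str, min_unit: int = 4) -> list[tuple[int, int, str]]:
    """Find exact tandem repeats: a unit of at least `min_unit` residues
    immediately followed by an identical copy.
    Returns (1-based start, unit length, unit) tuples."""
    hits = []
    n = len(sequence)
    for unit in range(min_unit, n // 2 + 1):
        for i in range(n - 2 * unit + 1):
            if sequence[i:i + unit] == sequence[i + unit:i + 2 * unit]:
                hits.append((i + 1, unit, sequence[i:i + unit]))
    return hits


if __name__ == "__main__":
    for start, unit, motif in tandem_repeats(SPMAP1):
        end = start + 2 * unit - 1
        print(f"{motif} x2 at residues {start}-{end}")
```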
Homo sapiens SPMAP1 protein sequence:
MAYLSECRLRLEKGFILDGVAVSTAARAYGRSRPKLWSAIPPYNAQQDYHARSYFQ
SHVVPPLLRVVPPLLRKTDQDHGGTGRDGWIVDYIHIFGQGQRYLNRRNWAGTGHS
LQQVTGHDHYNADLKPIDGFNGRFGYRRNTPALRQSTSVFGEVTHFPLF
SPMAP1 protein abundance in the whole human organism is low; no data are available for other species. [36] The Allen Brain Atlas has no data for SPMAP1. [37]
SPMAP1 protein has been found in the intermediate filaments and the nucleoli. [38] An SPMAP1 antibody is available from Sigma-Aldrich. [39] SPMAP1 also localizes to the cytoplasm. Distantly related SPMAP1 orthologs, in organisms such as Macrostomum lignano and Amphimedon queenslandica, show nuclear expression. [40] These distant orthologs carry nuclear localization signals at non-conserved sites. k-NN prediction indicates cytoplasmic localization. [41] SPMAP1 has no signal peptide. [42] The protein is soluble. [43]
Like many proteins, SPMAP1 is highly expressed in the testes. [44] The protein is expressed in adult as well as fetal tissues. It has been found at low levels in connective tissue. [45] Expression has also been seen in sperm, breast epithelial cells, and various cells of the immune system. [46]
SPMAP1 protein expression is elevated in many cancer patients; specifically, expression has been shown to be high in colorectal, breast, prostate, and lung cancers. [47] SPMAP1 is also expressed in papillary thyroid cancer. [48] Mutations in SPMAP1 have been found in endometrial, stomach, colorectal, and kidney cancers. [49] SPMAP1 expression is elevated in cancer patients with BRCA. In kidney renal clear cell carcinoma patients, SPMAP1 expression is dramatically decreased compared with the non-cancerous state. [13] In 80% of chromophobe renal cell carcinoma patients, at least one duplication of the SPMAP1 gene was present. [13]
Protein expression is lower in males with teratozoospermia than in those without. [50] Many GEO Profiles experiments have included SPMAP1, but none shows a significant change in expression. [51]
SPMAP1 is a slowly evolving protein: its rate of divergence, as estimated with molecular clock equations, resembles that of cytochrome c. [52]
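The text does not say which molecular clock equations were used. One common approach, sketched here as an assumption, corrects the observed proportion of differing sites p for multiple substitutions with the Poisson correction d = -ln(1 - p), then divides by twice the divergence time, because both lineages accumulate changes after they split. The example takes identity and divergence-time values from the ortholog table in this article; BLAST identities cover only the aligned region, so the resulting rates are rough. The helper names are hypothetical.

```python
import math


def poisson_distance(identity: float) -> float:
    """Poisson-corrected substitutions per site, d = -ln(1 - p),
    where p = 1 - identity is the fraction of differing sites."""
    if not 0 < identity <= 1:
        raise ValueError("identity must be in (0, 1]")
    return -math.log(identity)


def substitution_rate(identity: float, divergence_mya: float) -> float:
    """Substitutions per site per million years along each lineage."""
    return poisson_distance(identity) / (2 * divergence_mya)


# (common name, sequence identity to human, divergence in MYA),
# taken from the SPMAP1 ortholog table.
ORTHOLOGS = [
    ("Wild Bactrian camel", 0.83, 96),
    ("Chinese alligator", 0.63, 312),
    ("Sea anemone", 0.48, 824),
    ("Sponge", 0.32, 951.8),
]

if __name__ == "__main__":
    for name, identity, mya in ORTHOLOGS:
        d = poisson_distance(identity)
        rate = substitution_rate(identity, mya)
        print(f"{name}: d = {d:.3f}, rate = {rate:.2e} per site per MY")
```

Comparing such per-site rates with those computed the same way for a reference protein like cytochrome c is one way to make the "similar rate of divergence" comparison concrete.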
There are no known Homo sapiens paralogs for SPMAP1. [53]
SPMAP1 has distantly related orthologs across the Metazoa; its most distant relative is in sponges. There is no known ortholog in ctenophores, nematodes, bacteria, fungi, plants, or zebrafish. [10] Only two fish species are known to have the SPMAP1 gene. Model organisms such as Caenorhabditis elegans and Drosophila melanogaster do not have the gene.
SPMAP1 Orthologs [10]
Sequence # | Genus and species | Common name | Accession # | Protein length (aa) | Divergence (MYA) | Sequence identity | E-value |
---|---|---|---|---|---|---|---|
1 | Homo sapiens | Human | NP_001073934 | 154 | 0 | 100% | N/A |
2 | Camelus ferus | Wild Bactrian camel | XP_006176436 | 154 | 96 | 83% | 2.00E-94 |
3 | Pteropus alecto | Black flying fox | XP_006924784 | 154 | 96 | 81% | 1.00E-92 |
4 | Lipotes vexillifer | Yangtze river dolphin | XP_007465208 | 154 | 96 | 81% | 6.00E-89 |
5 | Condylura cristata | Star-nosed mole | XP_004684322 | 154 | 96 | 75% | 5.00E-78 |
6 | Myotis brandtii | Brandt's bat | EPQ05064 | 171 | 96 | 78% | 6.00E-78 |
7 | Marmota marmota marmota | Alpine marmot | XP_015362150.1 | 154 | 90 | 81% | 3.00E-94 |
8 | Octodon degus | Degu | XP_004633931 | 153 | 90 | 73% | 1.00E-76 |
9 | Alligator sinensis | Chinese alligator | XP_006022630 | 154 | 312 | 63% | 8.00E-68 |
10 | Anolis carolinensis | Green anole | XP_003222553 | 154 | 312 | 62% | 6.00E-67 |
11 | Xenopus laevis | African clawed frog | XP_018090228 | 244 | 352 | 51% | 4.00E-38 |
12 | Rhincodon typus | Whale shark | XP_020388051.1 | 164 | 476 | 53% | 5.00E-52 |
13 | Acanthaster planci | Starfish | XP_022086463 | 209 | 684 | 48% | 1.00E-37 |
14 | Mizuhopecten yessoensis | Scallop | XP_021340301 | 275 | 797 | 45% | 5.00E-06 |
15 | Lottia gigantea | Sea snail | XP_009063876 | 173 | 797 | 45% | 2.00E-37 |
16 | Lingula anatina | Lamp shell | XP_013388744.1 | 211 | 797 | 43% | 2.00E-35 |
17 | Biomphalaria glabrata | Freshwater snail | XP_013088317 | 198 | 797 | 41% | 6.00E-15 |
18 | Nematostella vectensis | Sea anemone | XP_001629616 | 173 | 824 | 48% | 2.00E-35 |
19 | Stylophora pistillata | Coral | XP_022795125 | 226 | 824 | 46% | 3.00E-38 |
20 | Macrostomum lignano | Flatworm | PAA73615 | 235 | 824 | 36% | 4.00E-25 |
21 | Amphimedon queenslandica | Sponge | XP_003389909 | 275 | 951.8 | 32% | 2.00E-12 |
Family with sequence similarity 63, member A is a protein that in humans is encoded by the FAM63A gene. The gene is located on the minus strand of chromosome 1 at locus 1q21.3.
C8orf48 is a protein that in humans is encoded by the C8orf48 gene. C8orf48 is a nuclear protein specifically predicted to be located in the nuclear lamina. C8orf48 has been found to interact with proteins that are involved in the regulation of various cellular responses like gene expression, protein secretion, cell proliferation, and inflammatory responses. This protein has been linked to breast cancer and papillary thyroid carcinoma.
C9orf135 is a gene that encodes a 229-amino-acid protein. It is located on human chromosome 9 at 9q12.21. The protein has a transmembrane domain spanning amino acids 124-140 and a glycosylation site at amino acid 75. C9orf135 is annotated on chromosome 9 in the GRCh37 genome assembly and belongs to the domain of unknown function superfamily DUF4572. C9orf135 is also known as LOC138255.
The coiled-coil domain containing 142 (CCDC142) gene encodes the CCDC142 protein in humans. The gene is located on chromosome 2, spans 4,339 base pairs, and contains 9 exons. There are two known isoforms of CCDC142; the proteins produced from these transcripts range in size from 665 to 743 amino acids and contain signals suggesting movement between the cytosol and the nucleus. Homologous CCDC142 genes are found in many animals, including vertebrates and invertebrates, but not in fungi, plants, protists, archaea, or bacteria. Although the function of the protein is not yet well understood, it contains a coiled-coil domain with a RINT1_TIP1 motif located within it.
PRR29 is a protein encoded by the PRR29 gene located in humans on chromosome 17 at 17q23.
Glutamate Rich Protein 2 is a protein that in humans is encoded by the gene ERICH2. It is highly expressed in male tissues, particularly the testes, where the protein is found in the nucleolar fibrillar center and in vesicles of testicular cells. The protein's multiple protein interactions indicate that it may play a role in histone modification and proper histone function.
FAM227A is a protein that in humans is encoded by the FAM227A gene. Current studies place the protein in the nucleus of the cell. FAM227A is most highly expressed in the fallopian tube, testis, and pituitary gland. FAM227A is present in mammals, birds, and reptiles, and sequence alignments have shown that FAM227A is a rapidly evolving gene.
Uncharacterized protein C16orf86 is a protein that in humans is encoded by the C16orf86 gene. It consists mostly of alpha helices and is expressed in the testes, but also in other tissues such as the kidney, colon, brain, fat, spleen, and liver. The function of C16orf86 is not well understood; based on DNA microarray data, it could be a nuclear transcription factor that regulates the G0/G1 phase of the cell cycle in tissues such as the kidney, brain, and skeletal muscle.
Transmembrane protein 125 is a protein that, in humans, is encoded by the TMEM125 gene. It has 4 transmembrane domains and is expressed in the lungs, thyroid, pancreas, intestines, spinal cord, and brain. Though its function is currently poorly understood by the scientific community, research indicates it may be involved in colorectal and lung cancer networks. Additionally, it was identified as a cell adhesion molecule in oligodendrocytes, suggesting it may play a role in neuron myelination.
Single-pass membrane and coiled-coil domain-containing protein 3 is a protein that is encoded in humans by the SMCO3 gene.
C14orf119 is a protein that in humans is encoded by the C14orf119 gene. The C14orf119 protein is predicted to be localized in the nucleus. C14orf119 expression is decreased in individuals with systemic lupus erythematosus (SLE) and increased in individuals with various types of lymphoma, compared with healthy individuals.
Transmembrane protein 221 (TMEM221) is a protein that in humans is encoded by the TMEM221 gene. The function of TMEM221 is currently not well understood.
RING Finger Protein 227, also known as RNF227 and LINC02581, is a protein which in humans is encoded by the RNF227 gene. According to DNA microarray data, it is found in at least 15 tissues.
C2orf74, also known as LOC339804, is a protein-coding gene located on the short arm of chromosome 2 near position 15 (2p15). Isoform 1 of the gene is 19,713 base pairs long. C2orf74 has orthologs in 135 species, mostly placental mammals and some marsupials.
SMIM19, also known as Small Integral Membrane Protein 19, encodes the SMIM19 protein. SMIM19 is a confirmed single-pass transmembrane protein oriented from outside to inside the cell, N-terminus to C-terminus respectively. SMIM19 is expressed at medium to high levels across many tissues and organs. Its function has not been validated, in part because its subcellular localization is uncertain. However, all proteins reported to interact with SMIM19 are associated with the endoplasmic reticulum (ER), suggesting that SMIM19 is also ER-associated.
Chromosome 9 open reading frame 85, commonly known as C9orf85, is a protein in Homo sapiens encoded by the C9orf85 gene. The gene is located at 9q21.13. Alternative splicing produces four isoforms. C9orf85 has a predicted molecular weight of 20.17 kDa and an isoelectric point of 9.54. The function of the gene has not yet been confirmed, but it shows high levels of expression in highly differentiated cells.
PANO1 is a protein which in humans is encoded by the PANO1 gene. PANO1 is an apoptosis-inducing protein that regulates the function of the tumor suppressor p14ARF. When PANO1 is highly expressed, it stabilizes p14ARF and protects it from degradation. PANO1 is predicted, with a confidence level of 5 out of 5, to be expressed in the nucleolus. PANO1 is an intronless gene. Intronless genes make up only about 3% of human genes. A functional analysis of these genes revealed that they often have tissue-specific expression in tissues such as the nervous system and testis, a pattern commonly associated with neuropathies, other diseases, and cancer. PANO1 is most highly expressed in the cerebellum and in pituitary and testis tissues.
Family with Sequence Similarity 166, member C (FAM166C), is a protein encoded by the FAM166C gene. The FAM166C protein is localized in the nucleus and has a calculated molecular weight of 23.29 kDa. It contains DUF2475, a domain of unknown function spanning amino acids 19–85. The FAM166C protein is notably expressed in the testis, stomach, and thyroid.
C11orf98 is a human protein-coding gene of unknown function on chromosome 11. It is also known as C11orf48. The gene spans the chromosomal locus 62,662,817-62,665,210, or 2,394 base pairs of DNA, and has 4 exons. It produces an mRNA that is 646 base pairs long.
GPATCH2L is a protein encoded by the GPATCH2L human gene, located at 14q24.3. In humans, the GPATCH2L mRNA (NM_017926) is 14,021 bases long, and the gene spans 62,422 bases (chr14:76,151,922-76,214,343) on the positive strand. IFT43 is the gene directly before GPATCH2L on the positive strand, and LOC105370575 is an uncharacterized gene on the negative strand, about one and a half times the size of GPATCH2L. Known aliases for GPATCH2L include C14orf118, FLJ20689, FLJ10033, and KIAA1152. GPATCH2L has 28 distinct introns and produces 17 different mRNAs, comprising 14 alternatively spliced variants and 3 unspliced forms. It has 5 probable alternative promoters, 7 validated polyadenylation sites, and 6 predicted promoters of varying lengths.
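The genomic lengths quoted for C11orf98 and GPATCH2L are both consistent with treating the coordinates as 1-based and inclusive of both ends, so that length = end - start + 1. A minimal check, using a hypothetical helper name:

```python
def inclusive_span(start: int, end: int) -> int:
    """Length of a 1-based, fully inclusive genomic interval."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


if __name__ == "__main__":
    print(inclusive_span(62_662_817, 62_665_210))  # 2394 (C11orf98 locus)
    print(inclusive_span(76_151_922, 76_214_343))  # 62422 (GPATCH2L, chr14)
```

Dropping the "+ 1" would give lengths one base short, a common off-by-one error when comparing coordinates from inclusive and half-open conventions.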