MAP11 (Microtubule-associated protein 11) is a protein that in humans is encoded by the gene MAP11. It was previously referred to by the generic name C7orf43. [1] C7orf43 has no other human alias, but in mice it can be found as BC037034. [2]
In humans, MAP11 is located on the long arm of chromosome 7 (7q22.1), on the negative (antisense) strand. [1] Genes located around C7orf43 include GAL3ST4, LAMTOR4, and GPC2. [1] In humans, C7orf43 has 9 detected common single-nucleotide polymorphisms (SNPs), all of which are located in non-coding regions and thus do not affect the amino acid sequence. [3]
MAP11 encodes 2 isoforms. The longest, C7orf43 isoform 1, has a transcript that is 2585 base pairs long, with 11 exons and 10 introns. [1] The isoform 1 transcript has only one polyadenylation site and encodes a protein that is 580 amino acids long. [1] C7orf43 isoform 2 is 2085 base pairs long and encodes a protein of 311 amino acids. Two additional isoforms, encoding proteins of 199 and 206 amino acids, have been reported on several occasions. [4]
MAP11 shows widespread, moderate expression with tissue-to-tissue variability in humans and across mammalian species. [5] [6] The mouse C7orf43 ortholog has been shown to be ubiquitously expressed in the brain, [7] as well as in the mouse embryonic central nervous system. [8]
MAP11 has one promoter region upstream of its transcription start site, as predicted by Genomatix. This promoter is 657 base pairs long and is located at positions 99756182 to 99756838 on the negative strand of chromosome 7. [9] There are several transcription factor binding sites located in this promoter, including binding sites for zinc finger and Kruppel-like transcription factors. [10] The top 20 transcription factor binding sites, as predicted by ElDorado from Genomatix, are listed in the following table.
Detailed Family Information | Detailed Matrix Information | Start Position | End Position | Anchor Position | Strand | Matrix Similarity Score | Sequence |
---|---|---|---|---|---|---|---|
Brachyury gene, mesoderm developmental factor | T-box transcription factor TBX20 | 617 | 645 | 631 | + | 1 | agcagccggAGGTgtcgggaccctctgga |
C2H2 zinc finger transcription factors 2 | KRAB-containing zinc finger protein 300 | 596 | 618 | 607 | + | 1 | ccggccgCCCCagccgggcgcag |
Fork head domain factors | Alternative splicing variant of FOXP1, activated in ESCs | 37 | 53 | 45 | - | 1 | aaaaaaaAACAaccctt |
Pleomorphic adenoma gene | Pleomorphic adenoma gene 1 | 411 | 433 | 422 | - | 1 | gaGGGGgcggggtcccgctgctc |
Pleomorphic adenoma gene | Pleomorphic adenoma gene 1 | 464 | 486 | 475 | - | 1 | gaGGGGgcgtggccgccgaggcc |
RNA polymerase II transcription factor II B | Transcription factor II B (TFIIB) recognition element | 197 | 203 | 200 | + | 1 | ccgCGCC |
TGF-beta induced apoptosis proteins | Cysteine-serine-rich nuclear protein 1 (AXUD1, AXIN1 up-regulated 1) | 73 | 79 | 76 | - | 1 | AGAGtga |
GC-Box factors SP1/GC | Stimulating protein 1, ubiquitous zinc finger transcription factor | 418 | 434 | 426 | - | 0.998 | ggaggGGGCggggtccc |
Human and murine ETS1 factors | Ets variant 3 | 486 | 506 | 496 | - | 0.996 | gagaaacaGGAAgcggaaggg |
Krueppel like transcription factors | Gut-enriched Krueppel-like factor / KLF4 | 469 | 485 | 477 | - | 0.994 | agggggcGTGGccgccg |
Two-handed zinc finger homeodomain transcription factors | AREB6 (Atp1a1 regulatory element binding factor 6) | 495 | 507 | 501 | + | 0.994 | ttcctGTTTctct |
Zinc finger transcription factor RU49, zinc finger proliferation 1 - Zipro1 | Zinc finger transcription factor RU49 (zinc finger proliferation 1 - Zipro 1). RU49 exhibits a strong preference for binding to tandem repeats of the minimal RU49 consensus binding site. | 522 | 528 | 525 | + | 0.994 | cAGTAcc |
Krueppel like transcription factors | Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers (KLF6, ZF9) | 418 | 434 | 426 | - | 0.992 | ggagGGGGcggggtccc |
C2H2 zinc finger transcription factors 7 | Zinc finger protein 263, ZKSCAN12 (zinc finger protein with KRAB and SCAN domains 12) | 425 | 439 | 432 | + | 0.99 | cgccccCTCCtccac |
C2H2 zinc finger transcription factors 6 | Zinc finger and BTB domain containing 7, Proto-oncogene FBI-1, Pokémon (secondary DNA binding preference) | 252 | 264 | 258 | - | 0.989 | caaGACCaccctg |
Krueppel like transcription factors | Kruppel-like factor 7 (ubiquitous, UKLF) | 416 | 432 | 424 | - | 0.989 | agggGGCGgggtcccgc |
GC-Box factors SP1/GC | Sp4 transcription factor | 471 | 487 | 479 | - | 0.986 | ggagggGGCGtggccgc |
Krueppel like transcription factors | Gut-enriched Krueppel-like factor | 137 | 153 | 145 | + | 0.986 | gggctcAAAGgatcctc |
Krueppel like transcription factors | Krueppel-like factor 2 (lung) (LKLF) | 641 | 657 | 649 | - | 0.986 | cgctaGGGTgggtccag |
Human and murine ETS1 factors | Ets variant 1 | 6 | 26 | 16 | - | 0.984 | ttctcccaGGAAgattctcca |
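The coordinates in the promoter description and in the table above follow inclusive-interval arithmetic, which can be checked with a short sketch. The function names below are illustrative and are not part of any Genomatix tool; the rows are copied from the table. The observation that each anchor sits at the midpoint of its start and end positions is read off the table values and is an assumption, not a documented definition.

```python
# Minimal sketch: sanity-check promoter and binding-site coordinates.
# Assumption: all coordinates are 1-based and inclusive, so
# length = end - start + 1.

PROMOTER_START = 99756182  # genomic start of the predicted promoter
PROMOTER_END = 99756838    # genomic end of the predicted promoter


def inclusive_length(start, end):
    """Length of an interval whose start and end are both included."""
    return end - start + 1


# (start, end, anchor, sequence) copied from a few rows of the table.
SITES = [
    (617, 645, 631, "agcagccggAGGTgtcgggaccctctgga"),  # TBX20
    (596, 618, 607, "ccggccgCCCCagccgggcgcag"),        # ZNF300
    (37, 53, 45, "aaaaaaaAACAaccctt"),                 # FOXP1 variant
    (197, 203, 200, "ccgCGCC"),                        # TFIIB element
    (6, 26, 16, "ttctcccaGGAAgattctcca"),              # Ets variant 1
]


def site_is_consistent(start, end, anchor, sequence):
    """True if the sequence fills the interval and the anchor is its midpoint."""
    fills_interval = inclusive_length(start, end) == len(sequence)
    anchor_is_midpoint = anchor == (start + end) // 2
    return fills_interval and anchor_is_midpoint


promoter_length = inclusive_length(PROMOTER_START, PROMOTER_END)
all_consistent = all(site_is_consistent(*site) for site in SITES)

print("promoter length (bp):", promoter_length)
print("sampled sites consistent:", all_consistent)
```

The same check can be extended to the remaining rows by appending them to `SITES`.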
The human protein MAP11 has an isoelectric point of 8.94. MAP11 also has a glycine-rich region spanning amino acids 54 through 134. [11] Analysis using the SAPS tool from the SDSC Biology Workbench showed that the specific positions of the glycine residues in this region are not conserved, but that the overall glycine content is well conserved in mammals and reptiles, although not in bony fishes. [12] [13] C7orf43 is mostly uncharged, and this neutral charge distribution is conserved in mammals and reptiles, but bony fishes have at least one negative charge cluster. [12] [13] C7orf43 is predicted to have no signal peptide in its first 70 amino acid residues. However, it is predicted to have a vacuolar targeting motif starting at residue 258 in the human protein. [14] This vacuolar targeting motif is conserved throughout mammals, reptiles, birds, amphibians, and bony fishes.
The MAP11 protein has no paralogs in humans. However, C7orf43 orthologs are highly conserved in mammals, reptiles, and several species of bony fishes. C7orf43 is also conserved in birds, although several bird species lack parts of the N-terminus. [15] No C7orf43 orthologs can be found outside the animal kingdom. [15] The following table lists representative C7orf43 orthologs across multiple animal classes.
No. | Species | Common Name | Date of Divergence (MYA) | Accession No. | E-value | Length (aa) | Identity (%) | Similarity (%) |
---|---|---|---|---|---|---|---|---|
1 | Homo sapiens | Human | - | NP_060745.3 | 0.0 | 580 | 100 | 100 |
2 | Pan troglodytes | Common Chimpanzee | 6.3 | XP_009452032 | 0.0 | 580 | 99 | 100 |
3 | Macaca mulatta | Macaque | 29.0 | XP_001102238 | 0.0 | 580 | 99 | 99 |
4 | Cavia porcellus | Guinea pig | 92.3 | XP_003470051 | 0.0 | 580 | 98 | 98 |
5 | Sus scrofa | Wild boar | 94.2 | XP_003124386 | 0.0 | 580 | 98 | 99 |
6 | Odobenus rosmarus divergens | Walrus | 94.2 | XP_004399075 | 0.0 | 580 | 98 | 98 |
7 | Tursiops truncatus | Common bottlenose dolphin | 94.2 | XP_004315199 | 0.0 | 582 | 92 | 93 |
8 | Echinops telfairi | Lesser hedgehog tenrec | 98.7 | XP_004705644 | 0.0 | 581 | 95 | 97 |
9 | Dasypus novemcinctus | Nine-banded armadillo | 104.2 | XP_004457234 | 0.0 | 580 | 97 | 98 |
10 | Monodelphis domestica | Gray short-tailed opossum | 162.6 | XP_001367097 | 0.0 | 568 | 89 | 92 |
11 | Chrysemys picta bellii | Painted turtle | 296.0 | XP_008175974 | 0.0 | 572 | 76 | 83 |
12 | Alligator mississippiensis | American alligator | 296.0 | XP_006266384 | 0.0 | 582 | 75 | 82 |
13 | Pelodiscus sinensis | Chinese softshell turtle | 296.0 | XP_006127325 | 0.0 | 569 | 73 | 81 |
14 | Xenopus tropicalis | Western clawed frog | 371.2 | NP_001121523 | 0.0 | 580 | 64 | 74 |
15 | Oncorhynchus mykiss | Rainbow trout | 400.1 | CDQ84878 | 0.0 | 581 | 64 | 75 |
16 | Danio rerio | Zebrafish | 400.1 | XP_001339329 | 0.0 | 595 | 63 | 74 |
17 | Oryzias latipes | Japanese rice fish | 400.1 | XP_004076807 | 0.0 | 609 | 62 | 70 |
18 | Takifugu rubripes | Pufferfish | 400.1 | XP_003970822 | 0.0 | 618 | 61 | 71 |
No. | Species | Common Name | Date of Divergence (MYA) | Accession No. | E-value | Length (aa) | Identity (%) | Similarity (%) |
---|---|---|---|---|---|---|---|---|
1 | Nipponia nippon | Crested ibis | 296.0 | XP_009472339 | 0.0 | 503 | 80 | 88 |
2 | Charadrius vociferus | Killdeer | 296.0 | XP_009892747 | 0.0 | 456 | 82 | 90 |
3 | Pseudopodoces humilis | Ground tit | 296.0 | XP_005533426 | 0.0 | 600 | 66 | 76 |
4 | Latimeria chalumnae | West Indian Ocean coelacanth | 414.9 | XP_006011612 | 3E-177 | 429 | 65 | 75 |
5 | Branchiostoma floridae | Florida lancelet | 713.2 | XP_002592972 | 9E-67 | 557 | 32 | 46 |
6 | Strongylocentrotus purpuratus | Purple sea urchin | 742.9 | XP_003727419 | 3E-46 | 725 | 35 | 51 |
7 | Aplysia californica | California sea slug | 782.7 | XP_005113015 | 4E-21 | 692 | 25 | 39 |
8 | Nematostella vectensis | Starlet sea anemone | 855.3 | XP_001632706 | 4E-19 | 494 | 24 | 39 |
9 | Trichoplax adhaerens | - | - | XP_002108809 | 5E-15 | 645 | 24 | 41 |
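The relationship between divergence date and sequence identity in the two tables above can be summarized numerically. The following is a minimal sketch that computes a Pearson correlation over a hand-picked subset of rows; the choice of rows and the use of a linear correlation are illustrative assumptions, not an analysis reported by the cited sources.

```python
# Minimal sketch: correlate divergence date with percent identity
# for a subset of orthologs copied from the tables above.
from math import sqrt

# (species, divergence in MYA, identity in %)
ORTHOLOGS = [
    ("Pan troglodytes", 6.3, 99),
    ("Macaca mulatta", 29.0, 99),
    ("Cavia porcellus", 92.3, 98),
    ("Monodelphis domestica", 162.6, 89),
    ("Chrysemys picta bellii", 296.0, 76),
    ("Xenopus tropicalis", 371.2, 64),
    ("Danio rerio", 400.1, 63),
    ("Branchiostoma floridae", 713.2, 32),
    ("Nematostella vectensis", 855.3, 24),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


divergence = [row[1] for row in ORTHOLOGS]
identity = [row[2] for row in ORTHOLOGS]
r = pearson(divergence, identity)

# A negative value means identity falls as divergence time grows.
print(f"Pearson r = {r:.3f}")
```

A negative coefficient is consistent with the pattern visible in the tables, where identity to the human protein drops as the divergence date increases.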
C7orf43 has three phosphorylated sites: Ser 517, Thr 541, and Ser 546. [11] All three sites are relatively well conserved throughout mammals, reptiles, birds, amphibians, and bony fishes. The protein has no predicted N-myristoylation, as it has no N-terminal glycine. [16] However, C7orf43 is predicted to have one N-acetylation on a serine residue at the N-terminus. [17]
The secondary structure of C7orf43 is yet to be determined. However, C7orf43 is predicted to have no transmembrane domain and to eventually be secreted from the cell. [18] [19] An analysis using the PELE tool from SDSC Biology Workbench predicted mostly beta sheets and random coils that are conserved throughout the strict orthologs. [13] Similarly conserved alpha helix motifs have been predicted, one near the N-terminus and one near the C-terminus.
While no studies have focused on the characterization of C7orf43, several large-scale screenings have revealed information related to C7orf43 function. A study using FLAG affinity purification mass spectrometry (AP-MS) to profile protein interactions in the Hippo signaling pathway identified C7orf43 as one of the interacting proteins. [20] C7orf43 was found to interact with angiomotin-like protein 2 (AMOTL2), also known as Leman Coiled-Coil Protein (LCCP), a regulator of Hippo signaling. [20] [21] AMOTL2 is also known to be an inhibitor of Wnt signaling, a pathway with known associations to cancer development, and to be a factor for angiogenesis, a process essential to tumour maintenance and metastasis. [21]
Other studies have also linked C7orf43 to carcinogenic events. A large-scale yeast two-hybrid experiment identified C7orf43 as interacting with transmembrane protein 50A (TMEM50A), also known as cervical cancer gene 9 or small membrane protein 1 (SMP1). [22] [23] [24] While the exact function of TMEM50A is unknown, it has been associated with cervical cancer.
C7orf43 has also been identified as a target gene of the transcription factor AP-2 gamma (TFAP2C). [25] TFAP2C has been shown to be involved in the development, differentiation, and oncogenesis of mammary tissues. Specifically, TFAP2C has a role in breast carcinoma through its regulatory effect on ESR1 and ERBB2, both of which are receptors whose aberrations have been associated with breast carcinomas. [25] [26] TFAP2C has also been shown to have an oncogenic role by promoting cell proliferation and tumour growth in neuroblastoma. [27] [28]
Through its location on the q arm of chromosome 7, C7orf43 has been linked to various diseases. Several diseases have been described as involving deletions in the q arm of chromosome 7, among them myeloid disorders such as acute myelogenous leukemia and myelodysplasia. [29]