GPATCH11 | |
---|---|
Identifiers | |
Aliases | GPATCH11, CCDC75, CENP-Y, CENPY, G-patch domain containing 11 |
External IDs | MGI: 1858435; HomoloGene: 44687; GeneCards: GPATCH11 |
G-patch domain-containing protein 11 (GPATCH11) is a protein that in humans is encoded by the GPATCH11 gene. The gene has four transcript variants, which encode two functional protein isoforms, and it is expressed in most human tissues. The protein has been reported to interact with several other proteins, including two involved in pre-mRNA splicing. GPATCH11 also has orthologs in all major eukaryotic taxa.
The GPATCH11 gene is located on human chromosome 2 at 2p22.2. [5] It is also known by several aliases, including CCDC75 and CENPY. [6] The gene is 14,484 bp long and contains 9 exons. Although the function of the protein is not yet known, it is predicted to be involved in nucleic acid binding and protein binding. [6] [7]
G-patch domain-containing protein 11 | |
---|---|
Identifiers | |
Symbol | GPATCH11 |
Alt. symbols | CCDC75, CENPY |
NCBI gene | 253635 |
HGNC | 26768 |
RefSeq | NP_777591.3 |
UniProt | Q8N954 |
Other data | |
Locus | Chr. 2 p22.2 |
GPATCH11 has four predicted transcript variants, only two of which are known to encode functional protein. The longest variant contains all 9 exons, whereas the second functional variant has 7 exons, lacking exons 3 and 4.
GPATCH11 has a molecular weight of about 33.3 kDa and is 285 amino acids long. [6] [9] A second isoform is 156 amino acids long. The protein contains a G-patch domain and the DUF 4138 domain. The G-patch domain is found only in eukaryotes, and BLAST searches of the human sequence against bacteria, archaea, and viruses are consistent with this. [6]
The following is the primary sequence of the long form of GPATCH11:
The protein is rich in glutamic acid and highly charged, and it is low in amino acids such as valine, threonine, phenylalanine, and proline. It is predicted to be soluble and to contain a nuclear export signal and a bipartite nuclear localization signal, the latter suggesting that it is localized in the nucleus.
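Statements such as "rich in glutamic acid" and "highly charged" come from counting residue frequencies. The minimal Python sketch below shows that counting step only; it assumes a plain one-letter sequence string, treats D and E as acidic and K and R as basic (ignoring histidine), and uses a hypothetical toy sequence rather than the real GPATCH11 sequence. The function names are illustrative, not taken from any particular tool.

```python
from collections import Counter

# Residues counted as charged at neutral pH (histidine ignored for simplicity).
ACIDIC = set("DE")
BASIC = set("KR")


def composition(seq: str) -> dict[str, float]:
    """Percentage of each residue in a one-letter protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


def charged_percent(seq: str) -> float:
    """Percentage of residues that are acidic (D, E) or basic (K, R)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    charged = sum(1 for aa in seq if aa in ACIDIC or aa in BASIC)
    return 100.0 * charged / len(seq)


# Hypothetical toy sequence, NOT the real GPATCH11 sequence.
toy = "MEEKRDEEAKLEE"
print(round(composition(toy)["E"], 1))   # → 46.2
print(round(charged_percent(toy), 1))    # → 76.9
```

Calling a residue "rich" or "low" then means comparing these percentages with a background composition, such as average frequencies across a protein database; that comparison step is not shown here.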
The conserved regions of the protein are predicted to form only alpha-helices and coiled coils.
The image to the right shows the tertiary structure of GPATCH11 predicted by I-TASSER. The confidence score was very low, so the model's reliability is uncertain. However, it agrees with the secondary-structure prediction that the protein is composed mainly of alpha-helices and coiled coils.
GPATCH11 expression has been detected in the endocrine and nervous systems, as well as in the eye, breast, colon, liver, ovary, and 55 other tissues. Overall, the gene is expressed at about 1.1 times the average level. Expression is highest in the brain and spinal cord, followed by the spleen. Within the brain, GPATCH11 is expressed above average in six areas: the olfactory areas, hippocampus, midbrain, pons, medulla, and cerebellum. [10] Expression levels are also higher in cancerous tissue than in normal tissue.
Various tools at ExPASy [11] predict possible post-translational modifications for GPATCH11. The proteins below have been reported as possible interaction partners of GPATCH11.
Protein | Abbreviation | Location | Function |
---|---|---|---|
Brain-specific angiogenesis inhibitor 3 | BAI3 | Not specified | Plays a role in the regulation of synaptogenesis and dendritic spine formation |
Jun proto-oncogene | JUN | Nucleus [12] | Highly similar to the viral protein of avian sarcoma virus 17; interacts directly with specific target DNA sequences to regulate gene expression |
Zinc finger (CCCH type) RNA-binding motif and serine/arginine rich 2 | ZRSR2 | Nucleus [12] | Essential splicing factor; may play a role in network interactions during spliceosome assembly |
U2 small nuclear RNA auxiliary factor 1 | U2AF1 | Nucleus [12] | Plays a critical role in both constitutive and enhancer-dependent splicing |
The interaction between GPATCH11 and BAI3 was found via PSICQUIC, [13] mentha, [13] and STRING. [12] Although mentha gives a confidence score of only 0.454, STRING reports that the interaction has been experimentally determined by a validated two-hybrid approach. The two proteins are thought to interact directly. BAI3 is a transmembrane protein, and its gene is a p53 target. BAI3 may regulate the number of excitatory synapses formed on hippocampal neurons, and it may be involved in inhibiting angiogenesis and suppressing glioblastoma. Because GPATCH11 is expressed above average in the hippocampus and the spinal cord, this interaction could be genuine.
The interaction between GPATCH11 and JUN may also be genuine, since JUN is localized in the nucleus and is associated with cancers. Because GPATCH11 tends to be more highly expressed in cancerous tissue than in normal tissue, an interaction with proteins that are highly expressed in cancers seems plausible.
Finally, the interactions of GPATCH11 with ZRSR2 and with U2AF1 appear plausible, because ZRSR2 and U2AF1 are known to interact with each other and all three proteins are localized in the nucleus.
The protein is found in all major eukaryotic taxa, including unicellular organisms. Aligning the human protein with orthologs from various taxa revealed high conservation in the G-patch domain and the DUF 4187 region. [6] Alignments with more closely related taxa, such as birds and reptiles, showed conservation over most of the sequence. Alignments with more distantly related taxa, such as fungi and plants, showed less conservation, with identities of roughly 31 to 42%, although the G-patch and DUF domains remained highly conserved. [14] Overall, the protein is composed mainly of charged amino acids, both acidic and basic, and it has no sustained non-polar regions. This suggests that it is not a transmembrane protein, because a membrane-spanning segment requires a long stretch of non-polar residues.
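The argument that a transmembrane protein needs a long non-polar stretch is commonly checked with a sliding-window hydropathy scan. The source does not say which method was used, so the sketch below is an assumption: it uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6, the values Kyte and Doolittle suggested for spotting membrane-spanning segments. The two input sequences are hypothetical toys, not real proteins.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window (raises KeyError on non-standard letters)."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(seq: str, window: int = 19, cutoff: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean hydropathy exceeds the cutoff."""
    return [i for i, h in enumerate(hydropathy_windows(seq, window)) if h > cutoff]


# Hypothetical toy sequences, NOT real proteins.
charged = "MEEKRDEEKKLEEDRKEEAKRDE"
hydrophobic_core = "KK" + "LIVLAVFLLIAGLIVLAVL" + "KK"
print(candidate_tm_segments(charged))           # → []
print(candidate_tm_segments(hydrophobic_core))  # → [0, 1, 2, 3, 4]
```

A highly charged sequence like the first toy yields no windows above the cutoff, which is the pattern the text describes for GPATCH11; the second toy shows what a single hydrophobic, membrane-spanning-like stretch looks like.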
Compared with reference proteins such as fibrinogen and cytochrome c, GPATCH11 is evolving relatively rapidly, at a rate similar to that of fibrinogen. The unrooted evolutionary tree [14] to the right includes representative species ranging from invertebrates to mammals. It shows the hypothesized relationships among GPATCH11 sequences from the different taxa, which are supported by both the divergence times of those taxa from humans and their sequence identity and similarity.
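Rate comparisons of this kind are often made by plotting a corrected sequence divergence for each ortholog against its divergence time from humans and comparing the slope with those of reference proteins such as fibrinogen and cytochrome c. The source does not state which correction was used; the Poisson correction below is one common choice and is shown as an assumption. Here p is the fraction of aligned positions that differ (one minus the fractional identity) and d estimates substitutions per site, correcting for multiple changes at the same position. The worked line uses the Western clawed frog entry in the table below (63% identity).

$$
\begin{aligned}
d &= -\ln(1 - p), \qquad p = 1 - \frac{\text{identity}}{100\%} \\
d_{\text{frog}} &= -\ln(1 - 0.37) = -\ln 0.63 \approx 0.46
\end{aligned}
$$

That corresponds to roughly 46 substitutions per 100 residues since the lineages split about 355.7 million years ago; repeating this for each row and fitting a line against divergence time gives a rate that can be compared across proteins.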
GPATCH11 is conserved throughout the eukaryotes. The table below compares the human GPATCH11 sequence with orthologs from species in many different taxa, listing each ortholog's sequence length and its identity and similarity to the human protein, along with the estimated divergence time from humans in millions of years ago (MYA).
Genus and species | Common name | Divergence (MYA) [15] | Accession number | Sequence length (amino acids) | Sequence identity (%) | Sequence similarity (%) |
---|---|---|---|---|---|---|
Homo sapiens | Human | 0 | NP_777591.3 | 285 | 100 | 100 |
Equus asinus | African ass | 97.5 | XP_014688350.1 | 285 | 94 | 97 |
Picoides pubescens | Downy woodpecker | 320.5 | XP_009910012.1 | 256 | 73 | 86 |
Merops nubicus | Northern carmine bee-eater | 320.5 | XP_008934567.1 | 258 | 73 | 87 |
Chrysemys picta bellii | Western painted turtle | 320.5 | XP_005296317.1 | 257 | 76 | 89 |
Alligator mississippiensis | American alligator | 320.5 | XP_006272937.1 | 260 | 71 | 85 |
Xenopus tropicalis | Western clawed frog | 355.7 | NP_001005035.1 | 261 | 63 | 80 |
Neolamprologus brichardi | Fairy (lyretail) cichlid | 429.6 | XP_006807714.1 | 260 | 60 | 78 |
Stegastes partitus | Bicolor damselfish | 429.6 | XP_008301855.1 | 265 | 58 | 78 |
Branchiostoma floridae | Florida lancelet | 743 | XP_002610131.1 | 264 | 45 | 65 |
Saccoglossus kowalevskii | Acorn worm | 747.8 | XP_002731571.2 | 311 | 48 | 67 |
Crassostrea gigas | Pacific oyster | 847 | XP_011417222.1 | 262 | 43 | 61 |
Bombus terrestris | Buff-tailed bumblebee | 847 | XP_012173875.1 | 246 | 40 | 63 |
Monomorium pharaonis | Pharaoh ant | 847 | XP_012521549.1 | 248 | 38 | 61 |
Halyomorpha halys | Brown marmorated stink bug | 847 | XP_014272647.1 | 258 | 41 | 61 |
Trichoplax adhaerens | Placozoan | 936 | XP_002108305.1 | 256 | 42 | 60 |
Batrachochytrium dendrobatidis | Chytrid fungus | 1302.5 | XP_006681792.1 | 277 | 31 | 55 |
Saccharomyces cerevisiae | Baker's yeast | 1302.5 | NP_013373.1 | 274 | 42 | 62 |
Musa acuminata malaccensis | Wild banana | 1513.9 | XP_009405687.1 | 248 | 33 | 51 |
Capsella rubella | Pink shepherd's-purse | 1513.9 | XP_006290276.1 | 269 | 33 | 54 |
Elaeis guineensis | African oil palm | 1513.9 | XP_010928444.1 | 253 | 34 | 52 |
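The identity column in the table is a ratio computed over a pairwise alignment. The minimal sketch below assumes the two sequences are already aligned to equal length with "-" marking gaps. It divides identical positions by all alignment columns except those where both sequences have a gap, which is one common convention; tools differ in the denominator they use. Similarity additionally requires a substitution matrix to decide which mismatches count as "similar", so it is not computed here. The alignment shown is a hypothetical toy, not real GPATCH11 orthologs.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; every other column
    counts toward the denominator (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column is identical only if both residues match and neither is a gap.
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


# Hypothetical toy alignment, NOT real GPATCH11 orthologs.
human_like = "MEEKR-DEEAK"
ortholog = "MEDKRQDE-AK"
print(round(percent_identity(human_like, ortholog), 1))  # → 72.7
```

Here 8 of the 11 columns are identical; a gap opposite a residue counts as a mismatch, which lowers identity for orthologs with insertions or deletions.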
The clinical significance of GPATCH11 is not yet known. However, GPATCH11 is present at much higher levels in cancerous tissue than in normal tissue and has possible interactions with oncogene products such as JUN, so it may be involved in cancer.