GPATCH11

Identifiers
Aliases GPATCH11 , CCDC75, CENP-Y, CENPY, G-patch domain containing 11
External IDs MGI: 1858435; HomoloGene: 44687; GeneCards: GPATCH11; OMA:GPATCH11 - orthologs
Orthologs
Species: Human, Mouse
Entrez
Ensembl
UniProt
RefSeq (mRNA)

NM_174931
NM_001278505
NM_001322249

NM_181649

RefSeq (protein)

NP_857632
NP_001390142

Location (UCSC) Chr 2: 37.08 – 37.1 Mb Chr 17: 79.14 – 79.16 Mb
PubMed search [3] [4]

GPATCH11 (G-patch domain containing protein 11) is a protein that in humans is encoded by the GPATCH11 gene. The gene has four transcript variants, two of which encode functional protein isoforms, and it is expressed in most human tissues. The protein has been found to interact with several other proteins, including two from a splicing pathway. In addition, GPATCH11 has orthologs in all taxa of the domain Eukarya.

Gene

G-patch domain containing protein 11 is encoded in humans by the GPATCH11 gene, which is located on chromosome 2 at 2p22.2. [5] The gene has several aliases, including CCDC75 and CENPY. [6] It is 14,484 bp long and contains 9 exons. Although the function of the protein is not yet known, it is predicted to take part in nucleic acid binding and protein binding. [6] [7]

G-patch containing protein 11
Identifiers
Symbol GPATCH11
Alt. symbols CCDC75, CENPY
NCBI gene 253635
HGNC 26768
RefSeq NP_777591.3
UniProt Q8N954
Other data
Locus Chr. 2 p22.2

mRNA

GPATCH11 has four predicted transcript variants, though only two are known to code for functional protein. The longest form is not alternatively spliced and contains all 9 exons, whereas the second functional variant has 7 exons, with exons 3 and 4 spliced out.

Figure: Transcript variants of GPATCH11, obtained from NCBI AceView. Transcripts a and b have known functioning proteins.

Protein

GPATCH11 has a molecular weight of about 33.3 kDa and is 285 amino acids long. [6] [9] A second isoform is 156 amino acids long. The protein contains a G-patch domain and the DUF4187 domain. The G-patch domain is a novel domain found only in eukaryotes; BLAST searches of the human sequence against bacteria, archaea, and viruses support this finding. [6]
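
As a rough plausibility check on the figures above, the reported mass can be compared with a length-based estimate. The sketch below assumes the common rule of thumb of about 110 Da per amino acid residue; that constant and the function names are illustrative assumptions, not values taken from the cited databases.

```python
# Rule-of-thumb average mass of one amino acid residue in a protein (assumption).
AVG_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(length_aa: int) -> float:
    """Estimate protein mass in kDa from its length in residues."""
    return length_aa * AVG_RESIDUE_MASS_DA / 1000.0


def mean_residue_mass_da(mass_kda: float, length_aa: int) -> float:
    """Average mass per residue (Da) implied by a reported mass and length."""
    return mass_kda * 1000.0 / length_aa


if __name__ == "__main__":
    # Values reported in the text: 285 residues, about 33.3 kDa.
    print(f"estimated mass: {estimate_mass_kda(285):.2f} kDa")
    print(f"implied mass per residue: {mean_residue_mass_da(33.3, 285):.1f} Da")
```

The reported mass implies a somewhat heavier average residue than the rule of thumb, which would be consistent with a sequence rich in glutamic acid and other charged residues, though this simple estimate cannot confirm that.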

Primary structure

The following is the primary sequence of the long form of GPATCH11:

Figure: Human GPATCH11 protein sequence. The yellow region depicts the G-patch domain, while the blue region depicts the DUF domain.

The protein is rich in glutamic acid and is very highly charged. It is low in valine, threonine, phenylalanine, and proline. It is a soluble protein and has both a nuclear export signal and a bipartite nuclear import signal, implying that it is localized in the nucleus.
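
Composition statements like these can be reproduced by counting residues in the sequence. The following is a minimal sketch; because the GPATCH11 sequence is shown only as an image, it uses a made-up toy sequence, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy sequence for illustration only; this is NOT GPATCH11.
TOY_SEQUENCE = "MEEEDKRKSEELLEKARREEGDDYMSQ"

ACIDIC = set("DE")  # aspartate, glutamate
BASIC = set("KR")   # lysine, arginine (histidine left out: mostly neutral near pH 7)


def composition(seq: str) -> dict[str, float]:
    """Return the fraction of the sequence made up by each residue type."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def charged_fraction(seq: str) -> float:
    """Return the fraction of residues that are acidic or basic."""
    seq = seq.upper()
    charged = sum(1 for aa in seq if aa in ACIDIC or aa in BASIC)
    return charged / len(seq)


if __name__ == "__main__":
    # Print residues from most to least frequent, then the charged fraction.
    for aa, frac in sorted(composition(TOY_SEQUENCE).items(), key=lambda kv: -kv[1]):
        print(f"{aa}: {frac:.1%}")
    print(f"charged residues: {charged_fraction(TOY_SEQUENCE):.1%}")
```

Running the same functions on the real sequence (for example, the RefSeq protein record) would show whether glutamic acid dominates and whether valine, threonine, phenylalanine, and proline are under-represented.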

Figure: Predicted tertiary structure of GPATCH11, obtained from I-TASSER.

Secondary structure

The conserved areas of the protein have a secondary structure composed only of alpha-helices and coiled-coil regions.

Tertiary structure

The predicted tertiary structure of GPATCH11 is based on results obtained from I-TASSER. The confidence score was very low, so the reliability of the model is uncertain. However, it agrees with the secondary structure prediction that the protein is composed primarily of alpha-helices and coiled coils.

Protein expression

Protein expression has been found in the endocrine and nervous systems, along with the eye, breast, colon, liver, ovary, and 55 other tissues. Expression of the gene is about 1.1 times that of the average gene. The highest expression is found in the brain and spinal cord, followed by the spleen. GPATCH11 is expressed above average in six areas of the brain: the olfactory areas, hippocampus, midbrain, pons, medulla, and cerebellum. [10] In addition, expression levels are higher in cancerous tissue than in normal tissue.

Predicted post-translational modification

Possible post-translational modifications for GPATCH11 were predicted using various tools at ExPASy. [11]

Protein interaction

Protein | Abbreviation | Location | Function
Brain-specific angiogenesis inhibitor 3 | BAI3 | x | Plays a role in the regulation of synaptogenesis and dendritic spine formation
Jun proto-oncogene | JUN | Nucleus [12] | Highly similar to the avian viral sarcoma protein, and which interacts directly with specific target DNA sequences to regulate gene expression
Zinc finger (CCCH type) RNA-binding motif and serine/arginine rich 2 | ZRSR2 | Nucleus [12] | Encodes an essential splicing factor, and may play a role in network interactions during spliceosome assembly.
U2 small nuclear RNA auxiliary factor 1 | U2AF1 | Nucleus [12] | Plays a critical role in both constitutive and enhancer-dependent splicing

The interaction between GPATCH11 and BAI3 was found via PSICQUIC, [13] mentha, [13] and STRING. [12] The confidence score given by mentha is only 0.454; however, according to STRING, the interaction has been experimentally determined by a validated two-hybrid approach, and the two proteins are thought to interact directly and physically. BAI3 is a transmembrane protein and a p53 target gene. It may regulate the number of excitatory synapses formed on hippocampal neurons, and may be involved in angiogenesis inhibition and suppression of glioblastoma. Because GPATCH11 has higher expression than the average gene in the hippocampus and the spinal cord, this could be a real interaction.

The interaction between GPATCH11 and JUN could be real, as JUN is both localized in the nucleus and associated with cancers. GPATCH11 tends to have higher expression in cancerous tissue than in normal tissue, so interaction with other proteins highly expressed in cancers seems plausible.

Finally, the interactions of GPATCH11 with ZRSR2 and with U2AF1 appear to be real, because ZRSR2 and U2AF1 are known to interact with each other and all three proteins are localized in the nucleus.

Evolutionary history

The protein is found in all taxa of the domain Eukarya, including unicellular organisms. Aligning the human sequence with orthologs from the various taxa revealed high conservation in the G-patch domain and the DUF4187 domain. [6] Alignments with more closely related taxa such as birds and reptiles revealed conservation over the majority of the sequence. Alignments with more distantly related taxa such as fungi and plants showed less conservation, with identities of less than 40%, though the G-patch domain and the DUF domain were still highly conserved. [14] Overall, the protein is composed mainly of charged amino acids, both acidic and basic, and has no regions of sustained non-polarity. This implies that it is not a transmembrane protein, since a membrane-spanning segment requires a long non-polar region.
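
The argument about non-polar regions can be illustrated with a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6, a commonly cited heuristic for membrane-spanning segments. The source does not say which method was actually used, so the scale, window, cutoff, and function names here are assumptions, and the test sequences are toy examples rather than GPATCH11.

```python
# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    # Unknown residue codes (e.g. X) are scored as neutral (0.0).
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()]
    if len(scores) < window:
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_candidate_tm_segment(seq: str, window: int = 19, cutoff: float = 1.6) -> bool:
    """True if any window is hydrophobic enough to suggest a membrane span."""
    return any(avg >= cutoff for avg in windowed_hydropathy(seq, window))


if __name__ == "__main__":
    hydrophobic_toy = "L" * 25   # toy stretch of leucines
    charged_toy = "EK" * 15      # toy alternating acidic/basic residues
    print(has_candidate_tm_segment(hydrophobic_toy))
    print(has_candidate_tm_segment(charged_toy))
```

A sequence dominated by charged residues, as described for GPATCH11, would be expected to stay well below the cutoff in every window, although a dedicated transmembrane predictor would give a more reliable answer.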

Figure: Unrooted phylogenetic tree of representative GPATCH11 orthologs within the domain Eukarya, obtained via SDSC Biology Workbench.

Compared with well-characterized proteins such as fibrinogen and cytochrome c, GPATCH11 is evolving quite rapidly, at a rate similar to that of fibrinogen. An unrooted evolutionary tree [14] includes representative species ranging from invertebrates to mammals. It shows the hypothetical relationship of the GPATCH11 sequence among the different taxa, and is supported by the divergence times of the taxa from humans as well as by sequence identity and similarity.

Homology

The protein is highly conserved within the domain Eukarya. The table below lists species from a range of taxa whose GPATCH11 sequences were compared with the human GPATCH11 sequence. It gives protein sequence lengths, identities, and similarities, along with divergence from humans in millions of years ago (MYA).

Genus and species | Common name | Divergence (MYA) [15] | Accession number | Sequence length (amino acids) | Sequence identity (%) | Sequence similarity (%)
Homo sapiens | Human | 0 | NP_777591.3 | 285 | 100 | 100
Equus asinus | African ass | 97.5 | XP_014688350.1 | 285 | 94 | 97
Picoides pubescens | Downy woodpecker | 320.5 | XP_009910012.1 | 256 | 73 | 86
Merops nubicus | Northern carmine bee-eater | 320.5 | XP_008934567.1 | 258 | 73 | 87
Chrysemys picta bellii | Western painted turtle | 320.5 | XP_005296317.1 | 257 | 76 | 89
Alligator mississippiensis | American alligator | 320.5 | XP_006272937.1 | 260 | 71 | 85
Xenopus tropicalis | Western clawed frog | 355.7 | NP_001005035.1 | 261 | 63 | 80
Neolamprologus brichardi | Fairy (lyretail) cichlid | 429.6 | XP_006807714.1 | 260 | 60 | 78
Stegastes partitus | Bicolor damselfish | 429.6 | XP_008301855.1 | 265 | 58 | 78
Branchiostoma floridae | Florida lancelet | 743 | XP_002610131.1 | 264 | 45 | 65
Saccoglossus kowalevskii | Acorn worm | 747.8 | XP_002731571.2 | 311 | 48 | 67
Crassostrea gigas | Pacific oyster | 847 | XP_011417222.1 | 262 | 43 | 61
Bombus terrestris | Buff-tailed bumblebee | 847 | XP_012173875.1 | 246 | 40 | 63
Monomorium pharaonis | Pharaoh ant | 847 | XP_012521549.1 | 248 | 38 | 61
Halyomorpha halys | Brown marmorated stink bug | 847 | XP_014272647.1 | 258 | 41 | 61
Trichoplax adhaerens | Placozoan | 936 | XP_002108305.1 | 256 | 42 | 60
Batrachochytrium dendrobatidis | Chytrid fungus | 1302.5 | XP_006681792.1 | 277 | 31 | 55
Saccharomyces cerevisiae | Baker's yeast | 1302.5 | NP_013373.1 | 274 | 42 | 62
Musa acuminata malaccensis | Wild banana | 1513.9 | XP_009405687.1 | 248 | 33 | 51
Capsella rubella | Pink shepherd's-purse | 1513.9 | XP_006290276.1 | 269 | 33 | 54
Elaeis guineensis | African oil palm | 1513.9 | XP_010928444.1 | 253 | 34 | 52
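
The rate comparison described earlier rests on how sequence divergence accumulates with time. The sketch below shows one way such an analysis might be set up, using the divergence times and identities from the table above. It assumes a Poisson-type correction for multiple substitutions, m = 100 × ln(100 / identity), which is the same as m = -100 × ln(1 - n/100) for n percent observed differences. The source does not state which correction was used, so this formula and the function names are assumptions.

```python
import math

# (common name, divergence from humans in MYA, percent identity to human GPATCH11),
# transcribed from the homology table.
ORTHOLOGS = [
    ("Human", 0.0, 100), ("African ass", 97.5, 94),
    ("Downy woodpecker", 320.5, 73), ("Northern carmine bee-eater", 320.5, 73),
    ("Western painted turtle", 320.5, 76), ("American alligator", 320.5, 71),
    ("Western clawed frog", 355.7, 63), ("Fairy cichlid", 429.6, 60),
    ("Bicolor damselfish", 429.6, 58), ("Florida lancelet", 743.0, 45),
    ("Acorn worm", 747.8, 48), ("Pacific oyster", 847.0, 43),
    ("Buff-tailed bumblebee", 847.0, 40), ("Pharaoh ant", 847.0, 38),
    ("Brown marmorated stink bug", 847.0, 41), ("Placozoan", 936.0, 42),
    ("Chytrid fungus", 1302.5, 31), ("Baker's yeast", 1302.5, 42),
    ("Wild banana", 1513.9, 33), ("Pink shepherd's-purse", 1513.9, 33),
    ("African oil palm", 1513.9, 34),
]


def corrected_divergence(percent_identity: float) -> float:
    """Estimated changes per 100 residues, corrected for repeated substitutions."""
    if not 0 < percent_identity <= 100:
        raise ValueError("percent identity must be in (0, 100]")
    return 100.0 * math.log(100.0 / percent_identity)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    for name, mya, identity in ORTHOLOGS:
        print(f"{name:28s} {mya:7.1f} MYA  m = {corrected_divergence(identity):6.1f}")
    times = [mya for _, mya, _ in ORTHOLOGS]
    identities = [float(identity) for _, _, identity in ORTHOLOGS]
    print(f"correlation(time, identity) = {pearson(times, identities):.2f}")
```

For the table's values the correlation between divergence time and identity comes out negative, as expected when more distant taxa share less sequence. Plotting the corrected divergence against time alongside the same quantity for fibrinogen and cytochrome c would be the natural next step for a rate comparison.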

Clinical significance

The clinical significance of GPATCH11 is not yet known. However, it is present in much higher amounts in cancerous tissue than in normal tissue and has shown possible interactions with the products of oncogenes, so it may be involved in cancer.

References

  1. GRCh38: Ensembl release 89: ENSG00000152133. Ensembl, May 2017.
  2. GRCm38: Ensembl release 89: ENSMUSG00000050668. Ensembl, May 2017.
  3. "Human PubMed Reference". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "GeneCards - Human Genes | Gene Database | Gene Search". genecards.org. Archived from the original on 2016-02-29. Retrieved 2016-02-29.
  6. "National Center for Biotechnology Information". ncbi.nlm.nih.gov. Retrieved 2016-02-29.
  7. "UniProt". uniprot.org. Retrieved 2016-02-29.
  8. "AceView: a comprehensive annotation of human and worm genes with mRNAs or ESTs". ncbi.nlm.nih.gov. Retrieved 2016-05-09.
  9. "Ensembl genome browser 83". ensembl.org. Retrieved 2016-02-29.
  10. "ISH Data :: Allen Brain Atlas: Mouse Brain". mouse.brain-map.org. Retrieved 2016-05-09.
  11. ExPASy Proteomics Server.
  12. "STRING: functional protein association networks". string-db.org. Retrieved 2016-05-09.
  13. PSICQUIC. "PSICQUIC View". ebi.ac.uk. Retrieved 2016-05-09.
  14. "SDSC Biology Workbench". workbench.sdsc.edu. Retrieved 2016-02-29.
  15. "TimeTree :: The Timescale of Life". timetree.org. Retrieved 2016-02-29.