CCDC130

YJU2B
Identifiers
Aliases: YJU2B, coiled-coil domain containing 130, CCDC130, YJU2 splicing factor homolog B
External IDs: MGI: 1914986; HomoloGene: 12183; GeneCards: YJU2B
Orthologs (human; mouse)
RefSeq (mRNA): NM_001294281 (human); NM_026350 (mouse)
RefSeq (protein): NP_001281210 (human); NP_080626 (mouse)
Location (UCSC): Chr 19: 13.73 – 13.76 Mb (human); Chr 8: 84.98 – 85 Mb (mouse)
PubMed search: [3] (human); [4] (mouse)

Coiled-coil domain containing 130 is a protein that in humans is encoded by the CCDC130 gene. It is part of the U5 portion of the U4/U5/U6 tri-snRNP. This tri-snRNP comes together with other proteins to form complex B of the spliceosome. The mature protein is approximately 45 kilodaltons (kDa) and is extremely hydrophilic owing to its unusually high number of charged and polar amino acids. [5] CCDC130 is highly conserved: orthologous genes in some yeasts and plants were found using the nucleotide and protein versions of the Basic Local Alignment Search Tool (BLAST) from the National Center for Biotechnology Information. [6] GEO profiles show that CCDC130 is ubiquitously expressed, with the highest expression levels in T lymphocytes. [6]

Function

While the specific function of CCDC130 is still unknown, several studies have identified it as a component of the U5 portion of the U4/U5/U6 tri-snRNP, which joins Complex A to form Complex B of the human spliceosome. Complex B then undergoes further modifications and conformational changes before becoming a mature spliceosome. One study, which compared the human spliceosome with that of yeast to assess the conservation of spliceosomal components, categorized CCDC130 as a known splicing factor whose yeast homolog is Yju2. [7] This yeast protein is a splicing factor that helps form the complete, active spliceosome and promotes the first step of splicing, which involves cleavage at the 5' splice site of the first exon. [7] CCDC130 therefore likely plays a similar role in the human spliceosome, although, given the greater complexity of the human spliceosome, it may perform additional functions or a different function altogether. Because of its high number of predicted phosphorylation sites, the protein may be activated and recruited to the spliceosomal complex through phosphorylation or dephosphorylation (see Post-translational modifications). The gene is ubiquitously expressed at 2.9 times the level of the average gene, which is consistent with an integral role in the proper function of the spliceosome. [6]

Gene

Aliases

Coiled-coil domain containing 130 has several aliases, including CCDC130, SB115, LOC81576, and MGC10471.

CCDC130 locus and neighbors.

Locus

CCDC130 is located on the short arm of chromosome 19 in humans, at locus 19p13.2. The gene spans positions 13858753 to 13874106 on the + strand of chromosome 19. [6] Upstream, CCDC130 is bordered by CACNA1A on the - strand and by glatobu, smagly, and socho on the + strand. Downstream, it is bordered by MGC3207, C19orf53, and ZSWIM4 on the + strand and by joypaw, smeygly, floytobu, smawgly, and wycho on the - strand. [6] Glatobu, smagly, socho, joypaw, smeygly, floytobu, smawgly, and wycho have only been verified by cDNA sequences in GenBank, and no information is available about their function. Several small genes also lie within the CCDC130 sequence: snugly, glytobu, stygly, and glartobu on the + strand, and chacho, zoycho, spogly, glotobu, glutobu, and sneygly on the - strand. [6] All of these small genes have extremely low levels of expression (under 3% of the expression of the average gene), with stygly having the highest at 2.8% of the average. [6]

Promoter

Several promoters were predicted for CCDC130 using ElDorado from Genomatix. The predicted promoter that corresponds most closely to the protein-coding sequence is 760 bases long and spans positions 13858094 to 13858853 on chromosome 19. [8]
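
The coordinates quoted for the gene and the predicted promoter can be checked with simple interval arithmetic. The sketch below assumes that both ranges are 1-based and inclusive and refer to the same genome assembly, which the text does not state explicitly; the function names are illustrative only.

```python
# Coordinates as quoted in the text (chromosome 19, + strand).
# Assumption: 1-based, inclusive positions on the same assembly.
GENE = (13858753, 13874106)
PROMOTER = (13858094, 13858853)


def span_length(interval):
    """Number of bases covered by an inclusive (start, end) interval."""
    start, end = interval
    return end - start + 1


def overlap_length(a, b):
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


gene_len = span_length(GENE)
promoter_len = span_length(PROMOTER)
shared = overlap_length(GENE, PROMOTER)
upstream = promoter_len - shared  # promoter bases lying before the gene start

print(f"gene span: {gene_len} bases")
print(f"promoter span: {promoter_len} bases")
print(f"promoter bases inside the gene: {shared}")
print(f"promoter bases upstream of the gene: {upstream}")
```

Under these assumptions the promoter length agrees with the 760 bases given above, and most of the predicted promoter lies upstream of the annotated gene start, with a short stretch extending into the gene.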

Homology and evolution

Paralogs

Only one paralog has been identified for CCDC130: CCDC94, the only other known human member of the CWC16 family of proteins. The two share about 27% identity, most of which is located in the COG5134 domain and at the C-terminus. CCDC94 has three predicted serine phosphorylation sites, at positions 213, 220, and 306, that line up with serines in CCDC130 in the multiple sequence alignment, and a threonine phosphorylation site that lines up with a phosphorylated serine in CCDC130. [5] [9]

An unrooted phylogenetic tree of the human CCDC130, close orthologs, and several distant homologs.

Orthologs and homologs

CCDC130 is a highly conserved protein, with true orthologs present in primates, other mammals, amphibians, reptiles, fish, and even invertebrates such as insects and marine invertebrates. Bird orthologs have not been found in nucleotide or protein BLAST searches. [6] Homologous genes have been documented in yeasts and other fungi, as well as in plants. It is unclear when the most distant homolog of CCDC130 arose, but it was well before the divergence of autotrophs and heterotrophs.

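| Sequence | Genus | Species | Common name | Date of divergence (mya) | Accession # | Sequence length (aa) | % Identity |
|---|---|---|---|---|---|---|---|
| 1 | Homo | sapiens | Human | N/A | NP_110445 | 396 | 100 |
| 2 | Saimiri | boliviensis | Squirrel monkey | 42.6 | XP_003941759 | 396 | 94 |
| 3 | Ailuropoda | melanoleuca | Giant panda | 94.2 | XP_002921062 | 392 | 88 |
| 4 | Canis | lupus familiaris | Dog | 94.2 | XP_542031 | 397 | 87 |
| 5 | Bos | taurus | Cow | 94.2 | NP_001069812 | 400 | 86 |
| 6 | Sus | scrofa | Wild boar | 94.2 | XP_003123393 | 398 | 86 |
| 7 | Cricetulus | griseus | Chinese hamster | 92.3 | XP_003501975 | 383 | 79 |
| 8 | Mus | musculus | Mouse | 92.3 | NP_080626 | 385 | 78 |
| 9 | Sarcophilus | harrisii | Tasmanian devil | 162.6 | XP_003760711 | 367 | 77 |
| 10 | Anolis | carolinensis | Anole lizard | 296 | XP_003216443 | 373 | 62 |
| 11 | Xenopus | laevis | African clawed frog | 371.2 | NP_001086365 | 384 | 61 |
| 12 | Danio | rerio | Zebrafish | 400.1 | NP_991158 | 390 | 56 |
| 13 | Takifugu | rubripes | Pufferfish | 400.1 | XP_003972319 | 379 | 65 |
| 14 | Amphimedon | queenslandica | Sponge | 716.5 | XP_003388671 | 299 | 46 |
| 15 | Culex | quinquefasciatus | Mosquito | 782.7 | XP_001846118 | 329 | 53 |
| 16 | Bombus | impatiens | Bumblebee | 782.7 | XP_003485202 | 314 | 55 |
| 17 | Caenorhabditis | remanei | Nematode | 937.5 | XP_003094402 | 365 | 44 |
| 18 | Schizosaccharomyces | pombe | Yeast | 1215.8 | NP_595734.2 | 294 | 27 |
| 19 | Cucumis | sativus | Cucumber | 1369 | XP_004135117 | 313 | 47 |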
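
The % Identity column reports how many aligned positions match the human protein. The article does not say which tool or denominator was used, so the following is only a minimal sketch of one common definition: identical residues divided by the number of alignment columns in which neither sequence has a gap. The two aligned strings are invented for illustration and are not CCDC130 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and
    therefore have equal length. Other definitions (for example, dividing
    by the full alignment length) give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments, made up for this example.
toy_a = "MKRL-EEK"
toy_b = "MKQLAE-K"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Because the denominator differs between tools, values computed this way are not guaranteed to reproduce the percentages in the table.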

Conserved regions

CCDC130 has two conserved domains and a coiled-coil region. The first is the COG5134 domain, which is conserved as far as the cucumber homolog and likely contributes to the function of the protein, because it is consistently the most highly conserved region in multiple sequence alignments. [5] It spans approximately the first 170 amino acids of the protein. The other is the DUF572 domain, a eukaryotic domain of unknown function that is shared by all of the orthologs and a majority of the more distant homologs. This domain does not have a defined range: different sources report different lengths, and some describe it as covering the entire protein. The coiled-coil region spans residues 182-214 in the human protein and is rich in charged amino acids. The modified residues are also very well conserved.

Protein

The most abundant variant of CCDC130 is encoded by the second longest open reading frame (ORF), corresponding to a 396 amino acid protein with a molecular weight of 44.8 kDa and an isoelectric point of 8.252. [5] The CCDC130 protein is rich in charged amino acids and deficient in uncharged, non-polar amino acids. [5] Mobyle @ Pasteur predicted CCDC130 to be extremely hydrophilic owing to its large number of charged and polar amino acids, with no site scoring above zero on the hydrophobicity graph and some sites reaching as low as -6 (F180). Within the coiled-coil region (182-214) there is a stretch in which 14 of 18 amino acids are charged. SAPS analysis predicted that this protein would be unstable. [5] Given its high hydrophilicity, the protein is not expected to contain transmembrane segments.
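
Charged-residue counts and hydrophobicity profiles of this kind can be approximated with a short script. The sketch below is an assumption-laden illustration: it uses the Kyte-Doolittle hydropathy scale and a 9-residue averaging window, neither of which is confirmed as the method behind the original analysis, it counts only D, E, K, and R as charged, and the 18-residue peptide is invented rather than taken from CCDC130.

```python
# Kyte-Doolittle hydropathy values. Assumption: the article does not say
# which scale or window size the original hydrophobicity plot used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = set("DEKR")  # histidine is left out here; this is a choice


def charged_count(seq):
    """Count residues treated as charged (D, E, K, R)."""
    return sum(1 for aa in seq if aa in CHARGED)


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical 18-residue peptide, invented for illustration only.
TOY_PEPTIDE = "EKRKEELRRKQDEAKRLE"

profile = hydropathy_profile(TOY_PEPTIDE, window=9)
print(f"charged residues: {charged_count(TOY_PEPTIDE)} of {len(TOY_PEPTIDE)}")
print(f"most hydrophobic window mean: {max(profile):.2f}")
```

A sequence whose windowed profile never rises above zero, as reported for CCDC130, would be a poor candidate for a membrane-spanning helix under this kind of analysis.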

Protein sequence for the major form of CCDC130. This peptide is 396 amino acids long and contains a coiled-coil region from 182-214, a domain of unknown function (DUF572), and the COG5134 domain in the N-terminal half.
Alternative splicing variants of the CCDC130 protein as seen on AceView from NCBI.

Variation

There are 17 different mRNAs produced from the CCDC130 gene. Thirteen of these mRNAs come from alternative splicing, and the other four are unspliced. [6] Four alternative promoters, five alternative polyadenylation sites, and four alternative last exons have been described. [6] Two instances of intron retention have also been described. Fourteen different proteins have been identified from the CCDC130 gene, all of which contain the DUF572 domain, but only five seem to show the coiled-coil stretch. The other three mRNAs were of very low quality and were not translated. It was also noted that this gene has the potential to encode several non-overlapping proteins. Forty-five SNPs have been documented for CCDC130 on NCBI: 29 missense mutations and 16 synonymous mutations, which do not change the amino acid. [6]

Conceptual translation of amino acids 269-396 and 3' UTR of CCDC130. Blue markings indicate phosphorylation sites and pink letters indicate charged amino acids.

Post-translational modifications

CCDC130 is predicted to be a heavily phosphorylated protein, with 31 different phosphorylation sites predicted by NetPhos, 26 of which are located in the C-terminal half of the protein. [9] Of the predicted sites, 17 of 22 serines, 4 of 6 threonines, and 2 of 3 tyrosines had probability scores over .800, indicating a high likelihood that they are true phosphorylation sites. [9] Six sumoylation sites were predicted, but only one of them (K177) had a probability score higher than .500, at .640. [10] The physiological function of sumoylation is still not well understood, but this modification can add a substantial amount of molecular weight to a protein (11 kDa). Thirteen glycation sites with probability scores over .500 were predicted, and 10 of the 13 glycated lysines occur in the N-terminal half of the protein. [11] NetOGlyc predicted 11 possible O-glycosylation sites with probability scores over .500, all 11 occurring in a 64 amino acid span running from T313 to T376. [12] Several of these sites were predicted as both phosphorylation sites and O-glycosylation sites. CCDC130 was not predicted to be sulfated, [13] acetylated, [14] myristoylated, [15] N-glycosylated, [16] or C-mannosylated, [17] or to undergo any GPI modification. [18]
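
The prediction counts quoted above can be cross-checked with a few lines of arithmetic. This sketch only restates numbers from the text; the variable names are illustrative, and it makes no attempt to rerun NetPhos or NetOGlyc.

```python
# Phosphorylation sites predicted by NetPhos, as quoted in the text.
PREDICTED = {"Ser": 22, "Thr": 6, "Tyr": 3}
HIGH_CONFIDENCE = {"Ser": 17, "Thr": 4, "Tyr": 2}  # score over 0.800
C_TERMINAL_SITES = 26

# O-glycosylation cluster boundaries (T313 to T376), 1-based inclusive.
O_GLYC_FIRST, O_GLYC_LAST = 313, 376

total_sites = sum(PREDICTED.values())
high_sites = sum(HIGH_CONFIDENCE.values())
high_share = high_sites / total_sites
c_terminal_share = C_TERMINAL_SITES / total_sites
o_glyc_span = O_GLYC_LAST - O_GLYC_FIRST + 1

print(f"total predicted phosphorylation sites: {total_sites}")
print(f"sites scoring over 0.800: {high_sites} ({high_share:.1%})")
print(f"sites in the C-terminal half: {c_terminal_share:.1%}")
print(f"O-glycosylation cluster span: {o_glyc_span} residues")
```

The per-residue counts sum to the 31 sites reported, and the T313 to T376 range is consistent with the stated 64 amino acid span.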

Secondary structure

YASPIN predicted a long alpha helix in CCDC130 spanning R121-A211. Other programs for secondary structure analysis, such as PELE, CHOFAS, and SABLE, also predicted alpha helices of varying lengths in this region. [5] [19] There were no consistent predictions for beta sheets in CCDC130.

Interaction information

Several proteins are listed as interacting with CCDC130, including EEF1A1, NINL, TRAF2, ZBTB16, ZNF165, and ZNF24. EEF1A1 is a eukaryotic elongation factor involved in the binding of aminoacyl-tRNA to the A-site of ribosomes during translation. [20] NINL is a ninein-like protein that is involved in microtubule organization and has calcium ion binding activity. [20] TRAF2, tumor necrosis factor (TNF) receptor associated factor 2, is part of some E3 ubiquitin ligase complexes and is involved in ubiquitinating proteins so that they can be degraded by the proteasome. [20] ZBTB16, zinc finger and BTB domain-containing protein 16, is also part of an E3 ubiquitin ligase complex and is most likely involved in substrate recognition. ZNF165 and ZNF24 are both zinc finger proteins, which bind DNA and other proteins to regulate transcription. There is also an alternate form of CCDC130 in which only 803 bases are transcribed instead of 1433, but no additional information is provided. [21]

GeneCards has assembled a table of the proteins that interact with CCDC130. [21] According to STRING, the interactions of CCDC130 with NINL, ZNF24, TRAF2, JUP, and GATA5 have been detected in a two-hybrid screen, which supports these interactions. JUP is a plaque protein. GATA5 is a transcription factor that helps activate the promoter for lactase-phlorizin hydrolase. [21] Interactions with CDA, DERA, CDC40, NAA25, DGCR14, NAA20, and PRPF19 have not been verified experimentally, but STRING reports interactions between their homologs in other species, so these interactions could potentially occur. According to MINT, the interactions with ZBTB16, EEF1A1, and ZNF165 have each been verified by at least one two-hybrid screen. NAT9 is described as a known interactant in I2D.

A study at the University of the District of Columbia characterizing CCDC130 found that it is induced through insulin signaling, is targeted by three different kinases (GSK3, CK1, and CK2), and is a mitochondrial protein.5 The study also suggests that CCDC130 could be used as a biomarker for certain types of cancer because of its differential expression in cancer cells. It specifically reports that CCDC130 is downregulated in some types of colon cancer, which allowed more cancer cells to escape the apoptosis pathway.

Expression

CCDC130 is a ubiquitously expressed protein, showing some level of expression in all tissue and cell samples analyzed. The AceView profile for CCDC130 shows expression levels 2.9 times higher than those of the average gene. [6] The level of expression varies greatly between tissues. According to NCBI GEO profiles and BioGPS data, the fetal thyroid, adrenal cortex, uterus, prostate, testes, seminiferous tubule, heart, PB-CD4+ T cells, PB-CD8+ T cells, lymph node, lung, thymus, thyroid, leukemia chronic myelogenous K562, and leukemia lymphoblastic molt4 samples all had expression levels above the 75th percentile for gene expression in at least one of two samples. Gene expression was lower than the 25th percentile in at least one of two samples for cerebellum peduncles, occipital lobe, pons, trigeminal ganglion, subthalamic nucleus, superior cervical ganglion (drastically different expression levels), dorsal root ganglion, fetal liver, uterus corpus, atrioventricular node, appendix, skeletal muscle, cardiac myocytes, tongue, and salivary gland. PB-CD8+ T cells had the highest relative CCDC130 expression and the tongue had the lowest. For more information about CCDC130 expression, see mouse brain expression data or human brain microarray data from the Allen Brain Atlas, or differential expression in GEO profiles from NCBI. [6]

Medical information

Microarray studies of cancer cells have shown CCDC130 to be differentially expressed in several cancers, including breast, colon, and pancreatic cancer. [22] It was down-regulated in colon cancers, suggesting that it could serve as a cancer biomarker. Research is ongoing to confirm its value as a cancer identifier. Many websites also state that it is involved in the cell's response to viral infection, but they provide no specific information or elaboration.

References

  1. GRCh38: Ensembl release 89: ENSG00000104957 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000004994 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "CCDC130 Analysis". Biology Workbench. San Diego Supercomputing Center - University of California San Diego. Retrieved 7 May 2013. [permanent dead link]
  6. "NCBI". National Library of Medicine. Retrieved 2 April 2013.
  7. Fabrizio P, Dannenberg J, Dube P, Kastner B, Stark H, Urlaub H, Lührmann R (November 2009). "The evolutionarily conserved core design of the catalytic activation step of the yeast spliceosome". Molecular Cell. 36 (4): 593–608. doi:10.1016/j.molcel.2009.09.040. hdl:11858/00-001M-0000-0010-9378-C. PMID 19941820.
  8. "El Dorado". Genomatix Software GmbH. Archived from the original on 2 December 2021. Retrieved 11 April 2013.
  9. Blom N, Gammeltoft S, Brunak S (1999). "Sequence and structure-based prediction of eukaryotic protein phosphorylation sites". Journal of Molecular Biology. 294 (5): 1351–62. doi:10.1006/jmbi.1999.3310. PMID 10600390.
  10. "SUMOplot Analysis Program". Abgent - a WuXi AppTec Company. Retrieved 14 May 2013.
  11. Johansen MB, Kiemer L, Brunak S (2006). "Analysis and prediction of mammalian protein glycation". Glycobiology. 16 (9): 844–53. CiteSeerX 10.1.1.128.831. doi:10.1093/glycob/cwl009. PMID 16762979.
  12. Julenius K, Mølgaard A, Gupta R, Brunak S (2005). "Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites". Glycobiology. 15 (2): 153–64. doi:10.1093/glycob/cwh151. PMID 15385431.
  13. Monigatti F, Gasteiger E, Bairoch A, Jung E (2002). "The Sulfinator: predicting tyrosine sulfation sites in protein sequences". Bioinformatics. 18 (5): 769–70. doi:10.1093/bioinformatics/18.5.769. PMID 12050077.
  14. Kiemer L, Bendtsen JD, Blom N (2005). "NetAcet: prediction of N-terminal acetylation sites". Bioinformatics. 21 (7): 1269–70. doi:10.1093/bioinformatics/bti130. PMID 15539450.
  15. Bologna G, Yvon C, Duvaud S, Veuthey AL (2004). "N-Terminal myristoylation predictions by ensembles of neural networks". Proteomics. 4 (6): 1626–32. doi:10.1002/pmic.200300783. PMID 15174132. S2CID 20289352.
  16. Gupta R, Jung E, Brunak S (2004). "Prediction of N-glycosylation sites in human proteins". NetNGlyc 1.0. Center for Biological Sequence Analysis - University of Denmark. Retrieved 13 May 2013.
  17. Julenius K (2007). "NetCGlyc 1.0: prediction of mammalian C-mannosylation sites". Glycobiology. 17 (8): 868–76. doi:10.1093/glycob/cwm050. PMID 17494086.
  18. "big-PI Predictor". GPI Lipid Anchor Project - I.M.P. Bioinformatics. Archived from the original on 21 July 2020. Retrieved 14 May 2013.
  19. "SABLE Secondary Structure Prediction". Cincinnati Children's Hospital Medical Center. Retrieved 14 May 2013.
  20. "NextProt". CCDC130 interacting proteins. Swiss Institute of Bioinformatics. Retrieved 14 May 2013.
  21. "GeneCards". Weizmann Institute of Science. Retrieved 14 May 2013.
  22. Wang Y, Sun G, Ji Z, Xing C, Liang Y (20 January 2012). "Weighted change-point method for detecting differential gene expression in breast cancer microarray data". PLOS ONE. 7 (1): e29860. Bibcode:2012PLoSO...729860W. doi:10.1371/journal.pone.0029860. PMC 3262809. PMID 22276133.
