PomBase

Content
Description: The scientific resource for Schizosaccharomyces pombe
Data types captured: Molecular Function, Biological Process, Cellular Component, Phenotype, Genotype, Allele, Protein Modification, Gene Expression, Protein expression, Nucleotide Sequence, RNA sequence, Protein sequence, Genomics, Human Orthologs, Saccharomyces cerevisiae Orthologs, Complementation, Disease Associations, Protein features, Physical Interactions, Genetic Interactions
Organisms: Schizosaccharomyces pombe
Contact
Research center: University of Cambridge and University College London
Authors: Antonia Lock, Midori A Harris, Manuel Lera-Ramírez, Pascal Carme, Kim Rutherford, Juan Mata, Jürg Bähler, Steve Oliver, Valerie Wood
Primary citation: Rutherford et al. (2024) [1]
Release date: 2011
Access
Website: pombase.org
Download URL: Downloads
Miscellaneous
License: Creative Commons Attribution 4.0 International license, GNU General Public License, MIT License
Curation policy: Professionally and community curated
Bookmarkable entities: Yes

PomBase is a model organism database that provides online access to the genome sequence and annotated features of the fission yeast Schizosaccharomyces pombe, together with a wide range of manually curated, gene-specific functional data. The PomBase website was redeveloped in 2016 to provide users with a more fully integrated, better-performing service (described in [2]).


Data Curation and Quality Control

An overview of data provided by PomBase and ways to access it.

PomBase staff manually curate a wide variety of data types using both primary literature and bioinformatics sources, and numerous mechanisms are employed to ensure both syntactical and biological content validity. [3]
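As a rough illustration of the two kinds of checks mentioned above, the following Python sketch validates the syntax of a single annotation and flags genes annotated to a pair of terms that are not expected to co-occur, loosely in the spirit of the co-annotation approach described in [3]. It is not PomBase's actual pipeline: the (gene, term, evidence) tuple layout, the subset of evidence codes, and the placeholder term pair in FORBIDDEN_PAIRS are all assumptions made for illustration.

```python
import re

# GO term identifiers are "GO:" followed by exactly seven digits.
GO_ID = re.compile(r"GO:[0-9]{7}")

# A subset of GO evidence codes, chosen for illustration only.
EVIDENCE_CODES = {
    "EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
    "ISS", "ISO", "IC", "TAS", "NAS", "ND", "IEA",
}

# Hypothetical placeholder pair of terms assumed never to be co-annotated.
FORBIDDEN_PAIRS = {frozenset({"GO:1111111", "GO:2222222"})}


def syntax_errors(annotation):
    """Return a list of syntactic problems in one (gene, term, evidence) tuple."""
    gene, term, evidence = annotation
    problems = []
    if not gene:
        problems.append("missing gene identifier")
    if not GO_ID.fullmatch(term):
        problems.append(f"malformed GO ID: {term!r}")
    if evidence not in EVIDENCE_CODES:
        problems.append(f"unknown evidence code: {evidence!r}")
    return problems


def co_annotation_conflicts(annotations):
    """Return (gene, term pair) for every gene annotated to a forbidden pair."""
    terms_by_gene = {}
    for gene, term, _evidence in annotations:
        terms_by_gene.setdefault(gene, set()).add(term)
    conflicts = []
    for gene, terms in sorted(terms_by_gene.items()):
        for pair in FORBIDDEN_PAIRS:
            if pair <= terms:  # both terms of the pair are annotated to this gene
                conflicts.append((gene, tuple(sorted(pair))))
    return conflicts


if __name__ == "__main__":
    # Placeholder annotations, not real PomBase data.
    annotations = [
        ("gene1", "GO:1111111", "IDA"),
        ("gene1", "GO:2222222", "IMP"),
        ("gene2", "GO:123", "XYZ"),
    ]
    for a in annotations:
        print(a, syntax_errors(a))
    print(co_annotation_conflicts(annotations))
```

Syntactic checks look at each record in isolation, whereas the co-annotation check needs all of a gene's annotations at once, which is why the second function groups by gene first.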

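Curated data types include Gene Ontology annotations (molecular function, biological process and cellular component), phenotypes, genotypes and alleles, protein modifications, gene and protein expression, human and Saccharomyces cerevisiae orthologs, complementation, disease associations, protein features, and physical and genetic interactions.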

Data Organization

Gene annotation can be viewed either at a gene-specific level (on the gene pages) or at a term-specific level (on the ontology term pages). This makes it possible either to see all annotations for a given gene or to find all genes annotated to a given term.
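A minimal sketch of the two views, assuming annotations are simple (gene, term) pairs with placeholder identifiers: a single pass over the data builds both a gene-to-terms index (the gene-page view) and a term-to-genes index (the term-page view). The sketch ignores the ontology hierarchy, so a gene annotated only to a more specific descendant term is not listed under its parent term.

```python
from collections import defaultdict


def build_indexes(annotations):
    """Index (gene, term) pairs both ways.

    Returns (terms_by_gene, genes_by_term): the first answers "what is known
    about this gene?", the second "which genes are annotated to this term?".
    Only direct annotations are indexed; the ontology hierarchy is ignored.
    """
    terms_by_gene = defaultdict(set)
    genes_by_term = defaultdict(set)
    for gene, term in annotations:
        terms_by_gene[gene].add(term)
        genes_by_term[term].add(gene)
    return dict(terms_by_gene), dict(genes_by_term)


if __name__ == "__main__":
    # Placeholder identifiers, not real PomBase genes or ontology terms.
    pairs = [("gene1", "termA"), ("gene1", "termB"), ("gene2", "termA")]
    by_gene, by_term = build_indexes(pairs)
    print(sorted(by_gene["gene1"]))  # ['termA', 'termB']
    print(sorted(by_term["termA"]))  # ['gene1', 'gene2']
```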

Genome-wide datasets (including protein datasets, complete annotation sets, and manually curated ortholog lists) can be accessed from the datasets page. Datasets that are suitable for display in a genome browser, and that have been loaded, can be viewed in the PomBase JBrowse instance.

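PomBase uses several biological ontologies to capture gene-specific information, including the Gene Ontology (GO), the Fission Yeast Phenotype Ontology (FYPO) [4] and the PSI-MOD protein modification ontology [5].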

Gene Characterization Status

The GO slim page provides an overview of the "biological role" of all "known" fission yeast genes, that is, genes whose products have been characterized experimentally, either in fission yeast itself or in another species with the annotation transferred to fission yeast by orthology.
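A GO slim summarizes detailed annotations by mapping each annotated term up the ontology to a small set of broad "slim" terms. The Python sketch below shows that idea under simplifying assumptions: the ontology is a hypothetical dictionary from each term to its parent terms (all relation types treated alike), and the term and gene names are placeholders. Genes whose terms reach no slim term are reported separately as unmapped.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in the ontology."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def slim_summary(annotations, parents, slim_terms):
    """Map each slim term to the genes annotated to it or to any descendant.

    Also returns the set of genes that reach no slim term at all.
    """
    slim_set = set(slim_terms)
    genes_by_slim = {slim: set() for slim in slim_terms}
    mapped = set()
    all_genes = set()
    for gene, term in annotations:
        all_genes.add(gene)
        for slim in ancestors(term, parents) & slim_set:
            genes_by_slim[slim].add(gene)
            mapped.add(gene)
    return genes_by_slim, all_genes - mapped


if __name__ == "__main__":
    # Hypothetical mini-ontology: each term maps to its parent terms.
    parents = {"t1": ["t2"], "t2": ["slimA"], "t3": ["slimB"], "t4": ["t2", "slimB"]}
    annotations = [("gene1", "t1"), ("gene2", "t3"), ("gene3", "t4"), ("gene4", "t5")]
    by_slim, unmapped = slim_summary(annotations, parents, ["slimA", "slimB"])
    print({s: sorted(g) for s, g in by_slim.items()})
    # {'slimA': ['gene1', 'gene3'], 'slimB': ['gene2', 'gene3']}
    print(sorted(unmapped))  # ['gene4']
```

Because a gene can reach several slim terms (gene3 above), slim categories overlap and their gene counts need not sum to the total number of genes.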

Remarkably, nearly 20% of eukaryotic proteomes, from yeast to human, are uncharacterized in terms of the pathways and processes in which these proteins participate, [6] and explaining their roles remains one of the great unsolved problems in biology. The biological roles of these proteins have not yet been discovered in any species. To aid research into these unknown proteins, PomBase maintains an inventory of uncharacterized fission yeast proteins. The priority unstudied genes list is the subset of uncharacterized fission yeast genes that are conserved in humans, which makes them especially high-priority research targets.

Community co-Curation

To supplement the work of the small team of professional PomBase curators, fission yeast researchers contribute annotations directly to PomBase via an innovative community curation scheme, for which an online curation tool, Canto, [7] has been developed. Community curation is reviewed by PomBase staff, resulting in highly accurate, effectively co-curated annotations. [8]

PomBase maintains an annotation stats page.

Knowledgebase Updates

Documentation

PomBase provides both documentation and an FAQ.

The use of PomBase as a research tool is described in a chapter of the book "Eukaryotic Genomic Databases: Methods and Protocols". [9] Developments and updates are described in the Nucleic Acids Research database issue papers. [10] [11] [2] For a detailed overview of using S. pombe as a model organism, see the genetics primer. [12]

Related Research Articles

Bioinformatics: computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.

Schizosaccharomyces pombe: species of yeast

Schizosaccharomyces pombe, also called "fission yeast", is a species of yeast used in traditional brewing and as a model organism in molecular and cell biology. It is a unicellular eukaryote, whose cells are rod-shaped. Cells typically measure 3 to 4 micrometres in diameter and 7 to 14 micrometres in length. Its genome, which is approximately 14.1 million base pairs, is estimated to contain 4,970 protein-coding genes and at least 450 non-coding RNAs.

Biological database

Biological databases are libraries of life-science data, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.

The Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to: 1) maintain and develop its controlled vocabulary of gene and gene product attributes; 2) annotate genes and gene products, and assimilate and disseminate annotation data; and 3) provide tools for easy access to all aspects of the data provided by the project, and to enable functional interpretation of experimental data using the GO, for example via enrichment analysis. GO is part of a larger classification effort, the Open Biomedical Ontologies, being one of the Initial Candidate Members of the OBO Foundry.

The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. They are also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.

BioGRID: biological database

The Biological General Repository for Interaction Datasets (BioGRID) is a curated biological database of protein-protein interactions, genetic interactions, chemical interactions, and post-translational modifications, created in 2003 (originally referred to simply as the General Repository for Interaction Datasets) by Mike Tyers, Bobby-Joe Breitkreutz, and Chris Stark at the Lunenfeld-Tanenbaum Research Institute at Mount Sinai Hospital. It strives to provide a comprehensive curated resource for all major model organism species while attempting to remove redundancy to create a single mapping of data. Users of The BioGRID can search for their protein, chemical or publication of interest and retrieve annotation, as well as curated data reported in the primary literature and compiled by in-house large-scale curation efforts. The BioGRID is hosted in Toronto, Ontario, Canada and Dallas, Texas, United States, and is partnered with the Saccharomyces Genome Database, FlyBase, WormBase, PomBase, and the Alliance of Genome Resources. The BioGRID is funded by the NIH and CIHR. BioGRID is an observer member of the International Molecular Exchange Consortium.

The Saccharomyces Genome Database (SGD) is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast. Further information is located at the Yeastract curated repository.

PHI-base

The Pathogen-Host Interactions database (PHI-base) is a biological database that contains manually curated information on genes experimentally proven to affect the outcome of pathogen-host interactions. The database has been maintained by researchers at Rothamsted Research and external collaborators since 2005. PHI-base has been part of the UK node of ELIXIR, the European life-science infrastructure for biological information, since 2016.

SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.

Protein function prediction methods are techniques that bioinformatics researchers use to assign biological or biochemical roles to proteins. These proteins are usually ones that are poorly studied or predicted based on genomic sequence data. These predictions are often driven by data-intensive computational procedures. Information may come from nucleic acid sequence homology, gene expression profiles, protein domain structures, text mining of publications, phylogenetic profiles, phenotypic profiles, and protein-protein interaction. Protein function is a broad term: the roles of proteins range from catalysis of biochemical reactions to transport to signal transduction, and a single protein may play a role in multiple processes or cellular pathways.

DNA annotation: the process of describing the structure and function of a genome

In molecular biology and genetics, DNA annotation or genome annotation is the process of describing the structure and function of the components of a genome, by analyzing and interpreting them in order to extract their biological significance and understand the biological processes in which they participate. Among other things, it identifies the locations of genes and all the coding regions in a genome and determines what those genes do.

dcGO is a comprehensive ontology database for protein domains. As an ontology resource, dcGO integrates Open Biomedical Ontologies from a variety of contexts, ranging from functional information like Gene Ontology to others on enzymes and pathways, from phenotype information across major model organisms to information about human diseases and drugs. As a protein domain resource, dcGO includes annotations to both the individual domains and supra-domains.

Gene set enrichment analysis: bioinformatics method

Gene set enrichment analysis (GSEA) (also called functional enrichment analysis or pathway enrichment analysis) is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with different phenotypes (e.g. different organism growth patterns or diseases). The method uses statistical approaches to identify significantly enriched or depleted groups of genes. Transcriptomics technologies and proteomics results often identify thousands of genes, which are used for the analysis.
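Over-representation of a single term in a gene list is often assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The Python sketch below computes that p-value from four counts. It shows one common approach, not the rank-based running-sum statistic of the original GSEA method, and it omits the multiple-testing correction that is needed when many terms are tested at once. The counts in the example are hypothetical.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (e.g. the genome)
    K: background genes annotated to the term
    n: genes in the study set
    k: study-set genes annotated to the term
    """
    if not 0 <= k <= n <= N or not 0 <= K <= N:
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    total = comb(N, n)
    # Sum the probability of every overlap at least as large as the observed one;
    # comb() returns 0 when asked to choose more items than exist.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 5,000-gene background, 50 genes annotated to a term,
    # 100-gene study set of which 10 carry the annotation (about 1 expected).
    print(overrepresentation_p(10, 100, 50, 5000))
```

Exact integer arithmetic via `math.comb` (Python 3.8+) avoids the underflow that multiplying many small floating-point probabilities can cause; a small p-value here means the overlap is larger than expected by chance.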

In bioinformatics, a Gene Disease Database is a systematized collection of data, typically structured to model aspects of reality, in a way to comprehend the underlying mechanisms of complex diseases, by understanding multiple composite interactions between phenotype-genotype relationships and gene-disease mechanisms. Gene Disease Databases integrate human gene-disease associations from various expert curated databases and text mining derived associations including Mendelian, complex and environmental diseases.

esyN is a bioinformatics web tool for visualizing, building and analysing molecular interaction networks. esyN is based on cytoscape.js, and its aim is to make network analysis easy for everybody. esyN is connected with a number of databases, specifically PomBase, FlyBase, most InterMine data warehouses, DrugBank, and BioGRID, from which it is possible to download the protein-protein or genetic interactions for any protein or gene in a number of different organisms.

Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well studied species. Where possible, MODs share common approaches to collect and represent biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements however mean that MODs are often required to customize capture, display, and provision of data.

Canto (gene curation tool)

Canto is a web-based tool to support the curation of gene-specific scientific data, by both professional biocurators and publication authors. Canto was developed as part of the PomBase project, and is funded by the Wellcome Trust.

Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.

References

  1. Rutherford, KM; Lera-Ramírez, M; Wood, V (2024). "PomBase: A Global Core Biodata Resource—growth, collaboration and sustainability". Genetics. doi:10.1093/genetics/iyae007. PMID 38376816.
  2. Lock, A; Rutherford, K; Harris, MA; Hayles, J; Oliver, SG; Bähler, J; Wood, V (13 October 2018). "PomBase 2018: user-driven reimplementation of the fission yeast database provides rapid and intuitive access to diverse, interconnected information". Nucleic Acids Research. 47 (D1): D821–D827. doi:10.1093/nar/gky961. PMC 6324063. PMID 30321395.
  3. Wood, V; Carbon, S; Harris, MA; Lock, A; Engel, SR; Hill, DP; Van Auken, K; Attrill, H; Feuermann, M; Gaudet, P; Lovering, RC; Poux, S; Rutherford, KM; Mungall, CJ (September 2020). "Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns". Open Biology. 10 (9): 200149. doi:10.1098/rsob.200149. PMC 7536087. PMID 32875947.
  4. Harris, MA; Lock, A; Bähler, J; Oliver, SG; Wood, V (July 2013). "FYPO: The Fission Yeast Phenotype Ontology". Bioinformatics. 29 (13): 1671–8. doi:10.1093/bioinformatics/btt266. PMC 3694669. PMID 23658422.
  5. Montecchi-Palazzi, L; Beavis, R; Binz, PA; Chalkley, RJ; Cottrell, J; Creasy, D; Shofstahl, J; Seymour, SL; Garavelli, JS (August 2008). "The PSI-MOD community standard for representation of protein modification data". Nature Biotechnology. 26 (8): 864–6. doi:10.1038/nbt0808-864. PMID 18688235. S2CID 205270043.
  6. Wood, V; Lock, A; Harris, MA; Rutherford, K; Bähler, J; Oliver, SG (28 February 2019). "Hidden in plain sight: what remains to be discovered in the eukaryotic proteome?". Open Biology. 9 (2): 180241. doi:10.1098/rsob.180241. PMC 6395881. PMID 30938578.
  7. Rutherford, KM; Harris, MA; Lock, A; Oliver, SG; Wood, V (June 2014). "Canto: an online tool for community literature curation". Bioinformatics. 30 (12): 1791–2. doi:10.1093/bioinformatics/btu103. PMC 4058955. PMID 24574118.
  8. Lock, A; Harris, MA; Rutherford, K; Hayles, J; Wood, V (1 January 2020). "Community curation in PomBase: enabling fission yeast experts to provide detailed, standardized, sharable annotation from research publications". Database: The Journal of Biological Databases and Curation. 2020. doi:10.1093/database/baaa028. PMC 7192550. PMID 32353878.
  9. Lock, A; Rutherford, K; Harris, MA; Wood, V (2018). "PomBase: The Scientific Resource for Fission Yeast". Eukaryotic Genomic Databases. Methods in Molecular Biology. Vol. 1757. pp. 49–68. doi:10.1007/978-1-4939-7737-6_4. ISBN 978-1-4939-7736-9. PMC 6440643. PMID 29761456.
  10. Wood, V; Harris, MA; McDowall, MD; Rutherford, K; Vaughan, BW; Staines, DM; Aslett, M; Lock, A; Bähler, J; Kersey, PJ; Oliver, SG (January 2012). "PomBase: a comprehensive online resource for fission yeast". Nucleic Acids Research. 40 (Database issue): D695–9. doi:10.1093/nar/gkr853. PMC 3245111. PMID 22039153.
  11. McDowall, MD; Harris, MA; Lock, A; Rutherford, K; Staines, DM; Bähler, J; Kersey, PJ; Oliver, SG; Wood, V (January 2015). "PomBase 2015: updates to the fission yeast database". Nucleic Acids Research. 43 (Database issue): D656–61. doi:10.1093/nar/gku1040. PMC 4383888. PMID 25361970.
  12. Hoffman, CS; Wood, V; Fantes, PA (October 2015). "An Ancient Yeast for Young Geneticists: A Primer on the Schizosaccharomyces pombe Model System". Genetics. 201 (2): 403–23. doi:10.1534/genetics.115.181503. PMC 4596657. PMID 26447128.