PomBase

Last updated March 21, 2024

PomBase
Content

Description	The scientific resource for Schizosaccharomyces pombe
Data types captured	Molecular Function, Biological Process, Cellular Component, Phenotype, Genotype, Allele, Protein Modification, Gene Expression, Protein expression, Nucleotide Sequence, RNA sequence, Protein sequence, Genomics, Human Orthologs, Saccharomyces cerevisiae Orthologs, Complementation, Disease Associations, Protein features, Physical Interactions, Genetic Interactions
Organisms	Schizosaccharomyces pombe
Contact
Research center	University of Cambridge and University College London
Authors	Antonia Lock, Midori A Harris, Manuel Lera-Ramírez, Pascal Carme, Kim Rutherford, Juan Mata, Jürg Bähler, Steve Oliver, Valerie Wood
Primary citation	Rutherford, et al (2024)^[1]
Release date	2011
Access
Website	pombase.org
Download URL	Downloads
Miscellaneous
License	Creative Commons Attribution 4.0 International license, GNU General Public License, MIT License
Curation policy	Professionally and community curated
Bookmarkable entities	Yes

PomBase is a model organism database that provides online access to the fission yeast Schizosaccharomyces pombe genome sequence and annotated features, together with a wide range of manually curated functional gene-specific data. The PomBase website was redeveloped in 2016 to provide users with a more fully integrated, better-performing service.^[2]

Data Curation and Quality Control

PomBase staff manually curate a wide variety of data types using both primary literature and bioinformatics sources, and numerous mechanisms are employed to ensure both syntactical and biological content validity.^[3]

Types of data curated include:

Genome sequence and features (e.g. physical location of genes in the genome)
Protein and ncRNA functions, the cellular processes they participate in and where they localize
Phenotypes associated with different alleles and genotypes
Specific protein modification sites and when they occur
Human and budding yeast orthologs of S. pombe genes (manually curated dataset)
Metadata of datasets loaded into the genome browser
Disease associations, where the human ortholog is known to cause disease
Data regarding when specific genes are expressed
Complementation data, where a fission yeast gene and a gene from another organism functionally complement each other
Subunit composition of complexes

Data Organization

Gene annotation can be viewed either at a gene-specific level (on the gene pages) or at a term-specific level (on the ontology term pages). This makes it possible to:

View all annotations created for a gene, for example pat1
View all genes annotated to a term, for example cytokinesis
View all annotations created from a specific reference, for example 26776736 Chica et al. 2016
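The three views above amount to indexing the same annotation records by different keys. The following is a minimal sketch of that idea, not PomBase's actual data model: the `Annotation` record, the `build_index` helper and the sample identifiers are all hypothetical and made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical, simplified annotation record (not PomBase's schema)."""
    gene: str
    term: str
    reference: str


# Made-up records for illustration only.
ANNOTATIONS = [
    Annotation("gene_a", "term_x", "REF:1"),
    Annotation("gene_a", "term_y", "REF:2"),
    Annotation("gene_b", "term_x", "REF:2"),
]


def build_index(annotations, key):
    """Group annotations by one of their fields: 'gene', 'term' or 'reference'."""
    index = defaultdict(list)
    for ann in annotations:
        index[getattr(ann, key)].append(ann)
    return dict(index)


# One index per view: gene page, ontology term page, reference page.
by_gene = build_index(ANNOTATIONS, "gene")
by_term = build_index(ANNOTATIONS, "term")
by_reference = build_index(ANNOTATIONS, "reference")

# All genes annotated to a given term (the term-specific view).
genes_for_term_x = sorted({a.gene for a in by_term["term_x"]})
```

Because each view is just a regrouping of the same records, an annotation added once appears consistently on the gene, term and reference pages.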

Genome-wide datasets (including protein datasets, all annotations, manually curated ortholog lists, etc.) can be accessed from the datasets page. Datasets that have been loaded for display in a genome browser can be accessed via the PomBase JBrowse instance.

PomBase uses several biological ontologies to capture gene-specific information, including:

Gene Ontology (GO) - used to describe the enzymatic functions, biological roles and cellular locations of gene products
Fission Yeast Phenotype Ontology (FYPO)^[4] - used to associate phenotypes with alleles of genes, in comparison to the phenotype of the reference strain
Sequence Ontology - used to describe DNA or protein features
PSI-MOD^[5] - used to describe protein modifications

Gene Characterization Status

The GO slim page provides an overview of the "biological role" of all "known" fission yeast genes, that is, genes whose products have been experimentally characterized either in fission yeast or in another species, with the role transferred by orthology.

Remarkably, nearly 20% of the proteins in eukaryotic proteomes, from yeast to human, are uncharacterized in terms of the pathways and processes in which they participate,^[6] and determining their roles is one of the great unsolved problems in biology. The roles of these proteins have not yet been discovered in any species. To aid research into these unknown proteins, PomBase maintains an inventory of uncharacterized fission yeast proteins. The priority unstudied genes list represents the subset of uncharacterized fission yeast genes that are conserved in humans, making them especially high-priority research targets.
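The selection described above can be read as a simple filter: keep genes that are uncharacterized and also conserved in humans. The sketch below illustrates only that logic. The `GeneRecord` fields, the function name and the sample genes are hypothetical, and the criteria PomBase actually applies may be more detailed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical summary of a gene's characterization status."""
    gene_id: str
    characterized: bool       # role known, directly or via orthology
    has_human_ortholog: bool  # assumed stand-in for "conserved in humans"


def priority_unstudied(genes):
    """Return IDs of uncharacterized genes that are conserved in humans."""
    return [
        g.gene_id
        for g in genes
        if not g.characterized and g.has_human_ortholog
    ]


# Made-up records for illustration only.
EXAMPLE_GENES = [
    GeneRecord("gene_1", characterized=True, has_human_ortholog=True),
    GeneRecord("gene_2", characterized=False, has_human_ortholog=True),
    GeneRecord("gene_3", characterized=False, has_human_ortholog=False),
]

priority = priority_unstudied(EXAMPLE_GENES)
```

Here only `gene_2` qualifies: `gene_1` already has a known role, and `gene_3` is uncharacterized but not conserved in humans.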

Community co-Curation

To supplement the work of the small team of professional PomBase curators, fission yeast researchers contribute annotations directly to PomBase via an innovative community curation scheme, for which an online curation tool, Canto,^[7] has been developed. Community curation is reviewed by PomBase staff, and this results in highly accurate, effectively co-curated, annotations.^[8]

PomBase maintains an annotation stats page.

Knowledgebase Updates

PomBase announces updates through the following channels:

News updates on the PomBase homepage
Posts to the research community mailing list
Nucleic Acids Research (NAR) database updates
Tweets (@PomBase)
Facebook group
LinkedIn group

Documentation

PomBase provides both documentation and an FAQ.

Usage of PomBase as a research tool is explored in the "Eukaryotic Genomic Databases" (Methods and Protocols) book chapter.^[9] Developments and updates are described in the NAR Database Issue papers.^[10]^[11]^[2] For a detailed overview of using S. pombe as a model organism, see the genetics primer.^[12]

References

1. Rutherford, Kim M.; Lera-Ramírez, Manuel; Wood, Valerie (2024). "PomBase: A Global Core Biodata Resource—growth, collaboration and sustainability". Genetics. doi:10.1093/genetics/iyae007. PMID 38376816.
2. Lock, A; Rutherford, K; Harris, MA; Hayles, J; Oliver, SG; Bähler, J; Wood, V (13 October 2018). "PomBase 2018: user-driven reimplementation of the fission yeast database provides rapid and intuitive access to diverse, interconnected information". Nucleic Acids Research. 47 (D1): D821–D827. doi:10.1093/nar/gky961. PMC 6324063. PMID 30321395.
3. Wood, V; Carbon, S; Harris, MA; Lock, A; Engel, SR; Hill, DP; Van Auken, K; Attrill, H; Feuermann, M; Gaudet, P; Lovering, RC; Poux, S; Rutherford, KM; Mungall, CJ (September 2020). "Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns". Open Biology. 10 (9): 200149. doi:10.1098/rsob.200149. PMC 7536087. PMID 32875947.
4. Harris MA, Lock A, Bähler J, Oliver SG, Wood V (July 2013). "FYPO: The Fission Yeast Phenotype Ontology". Bioinformatics. 29 (13): 1671–8. doi:10.1093/bioinformatics/btt266. PMC 3694669. PMID 23658422.
5. Montecchi-Palazzi, L; Beavis, R; Binz, PA; Chalkley, RJ; Cottrell, J; Creasy, D; Shofstahl, J; Seymour, SL; Garavelli, JS (August 2008). "The PSI-MOD community standard for representation of protein modification data". Nature Biotechnology. 26 (8): 864–6. doi:10.1038/nbt0808-864. PMID 18688235. S2CID 205270043.
6. Wood, V; Lock, A; Harris, MA; Rutherford, K; Bähler, J; Oliver, SG (28 February 2019). "Hidden in plain sight: what remains to be discovered in the eukaryotic proteome?". Open Biology. 9 (2): 180241. doi:10.1098/rsob.180241. PMC 6395881. PMID 30938578.
7. Rutherford KM, Harris MA, Lock A, Oliver SG, Wood V (June 2014). "Canto: an online tool for community literature curation". Bioinformatics. 30 (12): 1791–2. doi:10.1093/bioinformatics/btu103. PMC 4058955. PMID 24574118.
8. Lock, A; Harris, MA; Rutherford, K; Hayles, J; Wood, V (1 January 2020). "Community curation in PomBase: enabling fission yeast experts to provide detailed, standardized, sharable annotation from research publications". Database: The Journal of Biological Databases and Curation. 2020. doi:10.1093/database/baaa028. PMC 7192550. PMID 32353878.
9. Lock, A; Rutherford, K; Harris, MA; Wood, V (2018). "PomBase: The Scientific Resource for Fission Yeast". Eukaryotic Genomic Databases. Methods in Molecular Biology. Vol. 1757. pp. 49–68. doi:10.1007/978-1-4939-7737-6_4. ISBN 978-1-4939-7736-9. PMC 6440643. PMID 29761456.
10. Wood V, Harris MA, McDowall MD, Rutherford K, Vaughan BW, Staines DM, Aslett M, Lock A, Bähler J, Kersey PJ, Oliver SG (January 2012). "PomBase: a comprehensive online resource for fission yeast". Nucleic Acids Research. 40 (Database issue): D695–9. doi:10.1093/nar/gkr853. PMC 3245111. PMID 22039153.
11. McDowall MD, Harris MA, Lock A, Rutherford K, Staines DM, Bähler J, Kersey PJ, Oliver SG, Wood V (January 2015). "PomBase 2015: updates to the fission yeast database". Nucleic Acids Research. 43 (Database issue): D656–61. doi:10.1093/nar/gku1040. PMC 4383888. PMID 25361970.
12. Hoffman CS, Wood V, Fantes PA (October 2015). "An Ancient Yeast for Young Geneticists: A Primer on the Schizosaccharomyces pombe Model System". Genetics. 201 (2): 403–23. doi:10.1534/genetics.115.181503. PMC 4596657. PMID 26447128.

External links

PomBase


This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.