Content | |
---|---|
Description | The scientific resource for Schizosaccharomyces pombe |
Data types captured | Molecular Function, Biological Process, Cellular Component, Phenotype, Genotype, Allele, Protein Modification, Gene Expression, Protein Expression, Nucleotide Sequence, RNA Sequence, Protein Sequence, Genomics, Human Orthologs, Saccharomyces cerevisiae Orthologs, Complementation, Disease Associations, Protein Features, Physical Interactions, Genetic Interactions |
Organisms | Schizosaccharomyces pombe |
Contact | |
Research center | University of Cambridge and University College London |
Authors | Antonia Lock, Midori A Harris, Manuel Lera-Ramírez, Pascal Carme, Kim Rutherford, Juan Mata, Jürg Bähler, Steve Oliver, Valerie Wood |
Primary citation | Rutherford et al. (2024) [1] |
Release date | 2011 |
Access | |
Website | pombase.org |
Download URL | Downloads |
Miscellaneous | |
License | Creative Commons Attribution 4.0 International license, GNU General Public License, MIT License |
Curation policy | Professionally and community curated |
Bookmarkable entities | Yes |
PomBase is a model organism database that provides online access to the fission yeast Schizosaccharomyces pombe genome sequence and annotated features, together with a wide range of manually curated functional gene-specific data. The PomBase website was redeveloped in 2016 to provide users with a more fully integrated, better-performing service (described in [2]).
PomBase staff manually curate a wide variety of data types using both primary literature and bioinformatics sources, and numerous mechanisms are employed to ensure both syntactical and biological content validity. [3]
Types of data curated include:
Gene annotation can be viewed either at a gene-specific level (on the gene pages) or at a term-specific level (on the ontology term pages). This makes it possible to either:
Genome-wide datasets (including protein datasets, all annotations, manually curated ortholog lists, etc.) can be accessed from the datasets page. Datasets that are suitable for display in a genome browser and have been loaded can be accessed via the PomBase JBrowse instance.
PomBase uses several biological ontologies to capture gene-specific information, including:
The GO slim page provides an overview of the "biological role" of all "known" fission yeast genes, that is, genes whose protein products have been experimentally characterized either in fission yeast or in another species, with the annotation transferred by orthology.
Remarkably, nearly 20% of the proteins in eukaryotic proteomes, from yeast to human, are uncharacterized in terms of the pathways and processes in which they participate, [6] making their characterization one of the great unsolved problems in biology. The roles that these proteins play have not yet been discovered in any species. To aid research into these unknown proteins, PomBase maintains an inventory of uncharacterized fission yeast proteins. The priority unstudied genes list represents the subset of uncharacterized fission yeast genes that are conserved in humans, making them especially high-priority research targets.
To supplement the work of the small team of professional PomBase curators, fission yeast researchers contribute annotations directly to PomBase via an innovative community curation scheme, for which an online curation tool, Canto, [7] has been developed. Community curation is reviewed by PomBase staff, and this results in highly accurate, effectively co-curated, annotations. [8]
PomBase maintains an annotation stats page.
PomBase provides both documentation and an FAQ.
Usage of PomBase as a research tool is explored in the "Eukaryotic Genomic Databases" (Methods and Protocols) book chapter. [9] Developments and updates are described in the NAR Database Issue papers. [10] [11] [2] For a detailed overview of using S. pombe as a model organism, see the genetics primer. [12]
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.
Schizosaccharomyces pombe, also called "fission yeast", is a species of yeast used in traditional brewing and as a model organism in molecular and cell biology. It is a unicellular eukaryote, whose cells are rod-shaped. Cells typically measure 3 to 4 micrometres in diameter and 7 to 14 micrometres in length. Its genome, which is approximately 14.1 million base pairs, is estimated to contain 4,970 protein-coding genes and at least 450 non-coding RNAs.
Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.
The Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to: 1) maintain and develop its controlled vocabulary of gene and gene product attributes; 2) annotate genes and gene products, and assimilate and disseminate annotation data; and 3) provide tools for easy access to all aspects of the data provided by the project, and to enable functional interpretation of experimental data using the GO, for example via enrichment analysis. GO is part of a larger classification effort, the Open Biomedical Ontologies, being one of the Initial Candidate Members of the OBO Foundry.
The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. RGD is also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.
The Biological General Repository for Interaction Datasets (BioGRID) is a curated biological database of protein-protein interactions, genetic interactions, chemical interactions, and post-translational modifications, created in 2003 (originally referred to as simply the General Repository for Interaction Datasets) by Mike Tyers, Bobby-Joe Breitkreutz, and Chris Stark at the Lunenfeld-Tanenbaum Research Institute at Mount Sinai Hospital. It strives to provide a comprehensive curated resource for all major model organism species while attempting to remove redundancy to create a single mapping of data. Users of The BioGRID can search for their protein, chemical or publication of interest and retrieve annotation, as well as curated data as reported by the primary literature and compiled by in-house large-scale curation efforts. The BioGRID is hosted in Toronto, Ontario, Canada and Dallas, Texas, United States and is partnered with the Saccharomyces Genome Database, FlyBase, WormBase, PomBase, and the Alliance of Genome Resources. The BioGRID is funded by the NIH and CIHR. BioGRID is an observer member of the International Molecular Exchange Consortium.
The Saccharomyces Genome Database (SGD) is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast. Further information is located at the Yeastract curated repository.
The Pathogen-Host Interactions database (PHI-base) is a biological database that contains manually curated information on genes experimentally proven to affect the outcome of pathogen-host interactions. The database has been maintained by researchers at Rothamsted Research and external collaborators since 2005. PHI-base has been part of the UK node of ELIXIR, the European life-science infrastructure for biological information, since 2016.
SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.
Protein function prediction methods are techniques that bioinformatics researchers use to assign biological or biochemical roles to proteins. These proteins are usually ones that are poorly studied or predicted based on genomic sequence data. These predictions are often driven by data-intensive computational procedures. Information may come from nucleic acid sequence homology, gene expression profiles, protein domain structures, text mining of publications, phylogenetic profiles, phenotypic profiles, and protein-protein interaction. Protein function is a broad term: the roles of proteins range from catalysis of biochemical reactions to transport to signal transduction, and a single protein may play a role in multiple processes or cellular pathways.
In molecular biology and genetics, DNA annotation or genome annotation is the process of describing the structure and function of the components of a genome, by analyzing and interpreting them in order to extract their biological significance and understand the biological processes in which they participate. Among other things, it identifies the locations of genes and all the coding regions in a genome and determines what those genes do.
dcGO is a comprehensive ontology database for protein domains. As an ontology resource, dcGO integrates Open Biomedical Ontologies from a variety of contexts, ranging from functional information like Gene Ontology to others on enzymes and pathways, from phenotype information across major model organisms to information about human diseases and drugs. As a protein domain resource, dcGO includes annotations to both the individual domains and supra-domains.
Gene set enrichment analysis (GSEA) (also called functional enrichment analysis or pathway enrichment analysis) is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with different phenotypes (e.g. different organism growth patterns or diseases). The method uses statistical approaches to identify significantly enriched or depleted groups of genes. Transcriptomics technologies and proteomics results often identify thousands of genes, which are used for the analysis.
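One common statistical approach to the over-representation question described above is a one-sided hypergeometric test. The following is a minimal sketch rather than the method of any particular tool: the function name `enrichment_p_value` and its parameter names are hypothetical, and the numbers in the usage example are invented for illustration.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value for over-representation.

    population: number of genes in the background set
    annotated:  number of background genes that carry the term of interest
    sample:     number of genes in the list under study
    hits:       number of genes in the list that carry the term

    Returns P(X >= hits) when `sample` genes are drawn without
    replacement from the background.
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")
    if not (0 <= hits <= min(annotated, sample)):
        raise ValueError("hits must lie between 0 and min(annotated, sample)")

    # Number of ways to draw the whole sample from the background.
    total = comb(population, sample)
    # Sum the ways to draw k annotated genes for every k >= hits.
    # math.comb returns 0 when k exceeds n, so impossible draws add nothing.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, min(annotated, sample) + 1)
    )
    return tail / total


# Hypothetical toy numbers: 10 background genes, 3 annotated to a term,
# a study list of 4 genes of which 2 carry the term.
p = enrichment_p_value(population=10, annotated=3, sample=4, hits=2)
print(f"p = {p:.4f}")
```

In practice many terms are tested at once, so the resulting p-values would normally be corrected for multiple testing; that step is omitted here.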
In bioinformatics, a Gene Disease Database is a systematized collection of data, typically structured to model aspects of reality, in a way to comprehend the underlying mechanisms of complex diseases, by understanding multiple composite interactions between phenotype-genotype relationships and gene-disease mechanisms. Gene Disease Databases integrate human gene-disease associations from various expert curated databases and text mining derived associations including Mendelian, complex and environmental diseases.
esyN is a bioinformatics web-tool for visualizing, building and analysing molecular interaction networks. esyN is based on Cytoscape.js and its aim is to make it easy for everybody to perform network analysis. esyN is connected with a number of databases, specifically PomBase, FlyBase, most InterMine data warehouses, DrugBank, and BioGRID, from which it is possible to download the protein-protein or genetic interactions for any protein or gene in a number of different organisms.
Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well studied species. Where possible, MODs share common approaches to collect and represent biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements, however, mean that MODs are often required to customize capture, display, and provision of data.
Canto is a web-based tool to support the curation of gene-specific scientific data, by both professional biocurators and publication authors. Canto was developed as part of the PomBase project, and is funded by the Wellcome Trust.
Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.