Canto (gene curation tool)

Canto
Content
  Description: Canto is a web-based community curation tool for gene-specific data
  Data types captured: Various
  Organisms: Various
Contact
  Research center: University of Cambridge; University College London
  Primary citation: PMID 24574118
Access
  Website: curation.pombase.org
Miscellaneous
  Bookmarkable entities: yes

Canto is a web-based tool to support the curation of gene-specific scientific data by both professional biocurators and publication authors. [1] Canto was developed as part of the PomBase project [2] and is funded by the Wellcome Trust.


Canto enables experts (biocurators and publication authors) to provide detailed, standardized, sharable annotation from research publications. It was originally created for the fission yeast community, but it is a generic tool that can be readily configured for use with other organisms and other databases. It now supports the curation of pathogen-host interactions for PHI-base (Rothamsted Research), phenotypes and genetic interactions at FlyBase (University of Cambridge), and all gene-specific data types for the emerging model species Schizosaccharomyces japonicus in JaponicusDB.

Curation using ontology terms

Canto supports the use of bio-ontologies (including the Gene Ontology, the Protein Ontology, the Fission Yeast Phenotype Ontology (FYPO), and the Sequence Ontology) to describe attributes of gene products. Complex ontology structures are hidden behind an intuitive search, browse, and drill-down workflow. The Canto workflow guides the user through the curation process with prompts for required qualifiers and metadata, such as evidence (provenance), annotation extensions, and experimental conditions. Prompts are tailored to each data type and its specific domains and ranges.
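
The idea of prompting for required metadata per data type can be illustrated with a short sketch. This is not Canto's actual data model or code: the `Annotation` class, the `REQUIRED_FIELDS` table, the field names, the evidence label, and the ontology term identifier are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical per-data-type requirements (an assumption for illustration;
# Canto's real configuration is not shown in this article).
REQUIRED_FIELDS = {
    "molecular_function": ["term_id", "evidence"],
    "phenotype": ["term_id", "evidence", "conditions"],
}


@dataclass
class Annotation:
    """A gene-specific annotation being assembled during curation."""
    data_type: str
    gene: str
    term_id: str = ""          # ontology term identifier
    evidence: str = ""         # provenance of the assertion
    conditions: list = field(default_factory=list)   # experimental conditions
    extensions: list = field(default_factory=list)   # annotation extensions


def missing_fields(annotation):
    """Return the required fields that are still empty for this data type."""
    required = REQUIRED_FIELDS.get(annotation.data_type, [])
    return [name for name in required if not getattr(annotation, name)]


# "FYPO:0000001" is a placeholder in the FYPO identifier format, not a looked-up term.
draft = Annotation(
    data_type="phenotype",
    gene="cdc2",
    term_id="FYPO:0000001",
    evidence="Cell growth assay",
)
print(missing_fields(draft))  # the fields a curator would still be prompted for
```

In this sketch a phenotype annotation is incomplete until experimental conditions are supplied, whereas a molecular function annotation needs only a term and its evidence, which mirrors the way prompts differ between data types.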

Community Curation

Canto has been successful in supporting community curation, and most of the new curation in PomBase is provided by the community of researchers who use the fission yeast Schizosaccharomyces pombe as a model organism. The PomBase team has shown that co-curation by publication authors and professional curators provides higher-quality curation, maximising the value and impact of scientific research. [3]

Related Research Articles

Schizosaccharomyces pombe: Species of yeast

Schizosaccharomyces pombe, also called "fission yeast", is a species of yeast used in traditional brewing and as a model organism in molecular and cell biology. It is a unicellular eukaryote, whose cells are rod-shaped. Cells typically measure 3 to 4 micrometres in diameter and 7 to 14 micrometres in length. Its genome, which is approximately 14.1 million base pairs, is estimated to contain 4,970 protein-coding genes and at least 450 non-coding RNAs.

Biological database

Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.

The Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to: 1) maintain and develop its controlled vocabulary of gene and gene product attributes; 2) annotate genes and gene products, and assimilate and disseminate annotation data; and 3) provide tools for easy access to all aspects of the data provided by the project, and to enable functional interpretation of experimental data using the GO, for example via enrichment analysis. GO is part of a larger classification effort, the Open Biomedical Ontologies, being one of the Initial Candidate Members of the OBO Foundry.

The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. They are also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.

BioGRID: Biological database

The Biological General Repository for Interaction Datasets (BioGRID) is a curated biological database of protein-protein interactions, genetic interactions, chemical interactions, and post-translational modifications. It was created in 2003 (originally referred to as simply the General Repository for Interaction Datasets) by Mike Tyers, Bobby-Joe Breitkreutz, and Chris Stark at the Lunenfeld-Tanenbaum Research Institute at Mount Sinai Hospital. It strives to provide a comprehensive curated resource for all major model organism species while attempting to remove redundancy to create a single mapping of data. Users of the BioGRID can search for their protein, chemical or publication of interest and retrieve annotation, as well as curated data as reported by the primary literature and compiled by in-house large-scale curation efforts. The BioGRID is hosted in Toronto, Ontario, Canada and Dallas, Texas, United States and is partnered with the Saccharomyces Genome Database, FlyBase, WormBase, PomBase, and the Alliance of Genome Resources. The BioGRID is funded by the NIH and CIHR. BioGRID is an observer member of the International Molecular Exchange Consortium.

The Open Biological and Biomedical Ontologies (OBO) Foundry is a group of people dedicated to building and maintaining ontologies related to the life sciences. The OBO Foundry establishes a set of principles for ontology development, with the aim of creating a suite of interoperable reference ontologies in the biomedical domain. Currently, there are more than a hundred ontologies that follow the OBO Foundry principles.

The Saccharomyces Genome Database (SGD) is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast. Further information is located at the Yeastract curated repository.

Generic Model Organism Database

The Generic Model Organism Database (GMOD) project provides biological research communities with a toolkit of open-source software components for visualizing, annotating, managing, and storing biological data. The GMOD project is funded by the United States National Institutes of Health, National Science Foundation and the USDA Agricultural Research Service.

Robert Stevens (scientist)

Robert David Stevens is a professor of bio-health informatics and former Head of the Department of Computer Science at The University of Manchester.

The Open Regulatory Annotation Database is designed to promote community-based curation of regulatory information. Specifically, the database contains information about regulatory regions, transcription factor binding sites, regulatory variants, and haplotypes.

PHI-base

https://canto.phi-base.org/
PHI-base
Content
  Description: Pathogen-Host Interactions database
  Data types captured: phenotypes of microbial mutants
  Organisms: ~280 fungal, bacterial and protist pathogens of agronomic and medical importance tested on ~230 hosts
Contact
  Research center: Rothamsted Research
  Primary citation: PMID 31733065
  Release date: May 2005
Access
  Data format: XML, FASTA
  Website: PHI-base
Tools
  Web: PHI-base Search; PHIB-BLAST; PHI-Canto (Author curation)
Miscellaneous
  License: Creative Commons Attribution-NoDerivatives 4.0 International License
  Versioning: Yes
  Data release frequency: 6 monthly
  Version: 4.15 (May 2023)
  Curation policy: Manual Curation

Mouse Genome Informatics (MGI) is a free, online database and bioinformatics resource hosted by The Jackson Laboratory, with funding by the National Human Genome Research Institute (NHGRI), the National Cancer Institute (NCI), and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). MGI provides access to data on the genetics, genomics and biology of the laboratory mouse to facilitate the study of human health and disease. The database integrates multiple projects, with the two largest contributions coming from the Mouse Genome Database and Mouse Gene Expression Database (GXD). As of 2018, MGI contains data curated from over 230,000 publications.

The Comparative Toxicogenomics Database (CTD) is a public website and research tool launched in November 2004 that curates scientific data describing relationships between chemicals/drugs, genes/proteins, diseases, taxa, phenotypes, GO annotations, pathways, and interaction modules. The database is maintained by the Department of Biological Sciences at North Carolina State University.

The Arabidopsis Information Resource (TAIR) is a community resource and online model organism database of genetic and molecular biology data for the model plant Arabidopsis thaliana, commonly known as mouse-ear cress.

DisProt is a manually curated biological database of intrinsically disordered proteins (IDPs) and regions (IDRs). DisProt annotations cover state information on the protein but also, when available, its state transitions, interactions and functional aspects of disorder detected by specific experimental methods. DisProt is hosted and maintained in the BioComputing UP laboratory.

PomBase is a model organism database that provides online access to the fission yeast Schizosaccharomyces pombe genome sequence and annotated features, together with a wide range of manually curated functional gene-specific data. The PomBase website was redeveloped in 2016 to provide users with a more fully integrated, better-performing service.

Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well studied species. Where possible, MODs share common approaches to collect and represent biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements, however, mean that MODs are often required to customize capture, display, and provision of data.

The Monarch Initiative is a large-scale bioinformatics web resource focused on leveraging existing biomedical knowledge to connect genotypes with phenotypes in an effort to aid research that combats genetic diseases. Monarch does this by integrating multi-species genotype, phenotype, genetic variant and disease knowledge from various existing biomedical data resources into a centralized and structured database. While this integration process has traditionally been done manually by basic researchers and clinicians on a case-by-case basis, the Monarch Initiative provides an aggregated and structured collection of data and tools that make biomedical knowledge exploration more efficient and effective.

Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.

References

  1. Rutherford, Kim M.; Harris, Midori A.; Lock, Antonia; Oliver, Stephen G.; Wood, Valerie (2014). "Canto: An online tool for community literature curation". Bioinformatics. 30 (12): 1791–1792. doi:10.1093/bioinformatics/btu103. PMC 4058955. PMID 24574118.
  2. Lock, Antonia; Rutherford, Kim; Harris, Midori A.; Wood, Valerie (2018). "PomBase: The Scientific Resource for Fission Yeast". Eukaryotic Genomic Databases. Methods in Molecular Biology. Vol. 1757. pp. 49–68. doi:10.1007/978-1-4939-7737-6_4. ISBN 978-1-4939-7736-9. PMC 6440643. PMID 29761456.
  3. Wood, Valerie; Hayles, Jacqueline; Rutherford, Kim; Harris, Midori A.; Lock, Antonia (2020). "Community curation in PomBase: Enabling fission yeast experts to provide detailed, standardized, sharable annotation from research publications". Database. 2020. doi:10.1093/database/baaa028. PMC 7192550. PMID 32353878.