Saccharomyces Genome Database

Saccharomyces Genome Database (SGD)
Developer(s): J Michael Cherry, Gail Binkley, Stacia Engel, Rob Nash, Stuart Miyasato, Edith Wong, Shuai Weng
Operating system: Unix, Mac, MS-Windows
Type: Bioinformatics tool, Model Organism Database
Licence: Free
Website: http://www.yeastgenome.org

The Saccharomyces Genome Database (SGD) is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, commonly known as baker's or budding yeast. [1] Further information is available in the YEASTRACT curated repository. [2]

Saccharomyces Genome Database

SGD provides Internet access to the complete Saccharomyces cerevisiae genomic DNA sequence, its genes and their products, the phenotypes of its mutants, and the literature supporting these data. Experimental results on the function and interaction of yeast genes are extracted from the peer-reviewed literature by manual curation and integrated into the database. These data are combined with high-throughput results and posted on Locus Summary pages, which are supported by a query engine and a genome browser. Because the collected information is complex, multiple bioinformatic tools are used to integrate it and to support the discovery of new biological details. [3]

SGD is regarded as the gold standard for functional description of budding yeast, and it also provides a platform from which to investigate related genes and pathways in higher organisms. The amount of information and the number of features provided by SGD have increased greatly since the release of the S. cerevisiae genomic sequence. SGD aids researchers by providing not only basic information but also tools, such as sequence similarity searching, that lead to detailed information about features of the genome and relationships between genes. SGD presents information in dynamically created graphical displays illustrating physical, genetic and sequence feature maps. All of the data in SGD are freely accessible to researchers and educators worldwide via web pages. [3]

Information collection

Biocuration includes review of the published literature or of data sets, leading to the identification and abstraction of key results. The results are then incorporated into a database and associated with the appropriate genes or chromosomal regions using controlled vocabularies. As more data are recorded, biocuration is becoming more important for biomedical research.

SGD maintains the reference genome sequence for the budding yeast S. cerevisiae. It is the source of the genome sequence for the S. cerevisiae S288C strain background, including the catalog of genes and chromosomal features of the genome.

One important function of SGD is biocuration of the yeast literature. SGD biocurators search the scientific literature relevant to S. cerevisiae, read the papers, and capture their major findings in defined fields of the database. [3]

The biocurators at SGD aim to annotate each gene by identifying its function(s) from the primary literature and linking them to terms in the structured knowledge representation of the Gene Ontology (GO). [4] Additionally, functions identified from high-throughput experiments, as well as computationally predicted function annotations, are included from the GO Annotation project. [5]

Biochemical pathways are manually curated by SGD and provided using the Pathway Tools browser, version 15.0. The SGD biochemical pathways data set for S. cerevisiae, one of the most highly curated of all available Pathway Tools data sets, is the gold standard for budding yeast; SGD supports an ongoing effort to update and enhance these data. The Pathway Tools interface provides a complete description of each pathway, with molecular structures, EC numbers and a full reference list. The updated pathways browser provides several enhanced features, including download of the list of genes found in a pathway for further analysis with other tools available at SGD. The pathway browser is hyperlinked from the ‘Pathways’ section of the Locus Summary page. The pathway display is available from http://pathway.yeastgenome.org. [3]

Nomenclature

SGD maintains the S. cerevisiae genomic nomenclature. Its role is to promote the community-defined nomenclature standards and to ensure that the agreed-upon guidelines are followed when new genes are named or new names are assigned to previously identified genes. Community guidelines state that the first published name for a gene becomes the standard name. However, prior to publication, a gene name may be registered and displayed in SGD to notify the community of its intended use. If there are disagreements or naming conflicts, SGD communicates with the relevant researchers and negotiates an agreement whenever possible. The majority of those working on the gene in question must agree to any nomenclature change before it is implemented in SGD. In addition to maintaining genetic names, SGD ensures that the names of ORFs, ARS elements, tRNAs and other chromosomal features conform to agreed-upon formats. In the two years preceding the cited 2011 report, 154 new gene names were assigned and 21 community-initiated name changes were processed. [3]
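The text above does not spell out the agreed-upon formats. As a minimal sketch, the following assumes two widely used conventions: standard gene names of three letters followed by a number (for example CDC28), and systematic nuclear ORF names of the form Y, chromosome letter A to P, arm L or R, a three-digit number, and strand W or C (for example YAL001C), optionally followed by a suffix such as -A. The regular expressions and the function name `classify_name` are illustrative assumptions, not SGD's own validation rules, and they do not cover mitochondrial ORFs, ARS elements or tRNAs.

```python
import re

# Assumed formats (not taken from the article text):
#   standard gene name: three letters followed by a number, e.g. CDC28
#   systematic ORF name: Y + chromosome (A-P) + arm (L/R) + 3 digits + strand (W/C),
#   with an optional one-letter suffix such as -A
GENE_NAME = re.compile(r"[A-Z]{3}[0-9]+")
ORF_NAME = re.compile(r"Y[A-P][LR][0-9]{3}[WC](-[A-Z])?")


def classify_name(name: str) -> str:
    """Classify a name as a systematic ORF name, a standard gene name, or neither."""
    upper = name.upper()
    if ORF_NAME.fullmatch(upper):
        return "systematic ORF name"
    if GENE_NAME.fullmatch(upper):
        return "standard gene name"
    return "unrecognized"


print(classify_name("YAL001C"))  # systematic ORF name
print(classify_name("CDC28"))    # standard gene name
```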

Analysis methods

There are several different analysis tools provided by SGD.

SGD analysis methods

BLAST (Basic Local Alignment Search Tool) is a program designed to find similar regions between biological sequences. SGD allows users to run BLAST searches against S. cerevisiae sequence datasets.
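To make "similar regions between biological sequences" concrete, the sketch below scores the best local alignment of two sequences with the Smith-Waterman dynamic-programming algorithm. This is not how BLAST itself works (BLAST uses a faster seed-and-extend heuristic); the scoring values and the function name `local_alignment_score` are illustrative assumptions.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared region "ACGT" gives four matches at 2 points each.
print(local_alignment_score("TTACGTTT", "GGACGTGG"))  # 8
```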

Fungal BLAST allows searches against multiple fungal sequence datasets.

Gene Ontology (GO) Term Finder searches for significant shared GO terms, or their parents, used to describe the queried genes, helping users discover what the genes have in common.
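One common way to decide whether a shared GO term is "significant" is a hypergeometric test: given how many genes in the genome carry the term, how likely is it to see at least the observed number in the query list by chance? The sketch below illustrates that calculation only; the article does not state which statistic SGD's tool uses, and the function name and the numbers in the usage line are hypothetical. Real tools typically also correct for testing many terms at once, which is not shown here.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       sample: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes when `sample`
    genes are drawn without replacement from `population` genes, of which
    `annotated` carry the GO term (upper tail of the hypergeometric)."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total


# Hypothetical numbers: 6000 genes, 100 carry the term, 20 queried, 5 carry it.
p = enrichment_p_value(population=6000, annotated=100, sample=20, hits=5)
print(f"p = {p:.3g}")
```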

GO Slim Mapper maps annotations of a group of genes to more general terms and/or bins them into broad categories.
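Mapping to more general terms can be sketched as a walk up the ontology from each annotated term until terms from a chosen "slim" set are reached. The ontology fragment, the slim set, the gene names and the function names below are toy assumptions for illustration, not SGD data or SGD's implementation.

```python
def map_to_slim(term: str, parents: dict[str, set[str]],
                slim_terms: set[str]) -> set[str]:
    """Return every slim term that is `term` itself or one of its ancestors."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)
        stack.extend(parents.get(current, ()))
    return found


def bin_genes(annotations: dict[str, set[str]], parents: dict[str, set[str]],
              slim_terms: set[str]) -> dict[str, set[str]]:
    """Group genes under the slim terms that their annotations map to."""
    bins: dict[str, set[str]] = {}
    for gene, terms in annotations.items():
        for term in terms:
            for slim in map_to_slim(term, parents, slim_terms):
                bins.setdefault(slim, set()).add(gene)
    return bins


# Toy ontology: child term -> parent terms (hypothetical fragment).
PARENTS = {
    "glucose import": {"hexose transport"},
    "hexose transport": {"carbohydrate transport"},
    "carbohydrate transport": {"transport"},
}
SLIM = {"transport", "metabolic process"}

print(map_to_slim("glucose import", PARENTS, SLIM))  # {'transport'}
```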

Pattern Matching allows users to search for short nucleotide or peptide sequences of fewer than 20 residues, or for ambiguous/degenerate patterns.
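A degenerate nucleotide pattern can be searched for by translating each IUPAC ambiguity code into a regular-expression character class. The sketch below shows this idea for DNA on the given strand only; the function name `find_pattern` is an assumption, and this is not SGD's implementation.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_pattern(pattern: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all matches, including overlapping ones."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    regex = "".join(IUPAC[base] for base in pattern.upper())
    # A lookahead consumes no characters, so overlapping matches are reported.
    return [m.start() for m in re.finditer(f"(?=({regex}))", sequence.upper())]


# W stands for A or T, so TATAWA matches both TATAAA and TATATA.
print(find_pattern("TATAWA", "GGTATAAAGGTATATACC"))  # [2, 10]
```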

Restriction Analysis allows users to perform a restriction analysis by entering a sequence name or an arbitrary DNA sequence. [6]
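At its core, a restriction analysis locates each enzyme's recognition site in the sequence and reports the resulting fragments. The sketch below does this for a linear molecule using three well-known enzymes (EcoRI, BamHI and HindIII, each cutting after the first base of its six-base palindromic site on the top strand). The function names and the example sequence are assumptions, and the code is an illustration rather than SGD's tool.

```python
# Enzyme -> (recognition site, cut offset within the site on the top strand).
SITES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def cut_positions(sequence: str, enzyme: str) -> list[int]:
    """Return 0-based positions where the top strand is cut."""
    site, offset = SITES[enzyme]
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    return cuts


def fragment_lengths(sequence: str, enzyme: str) -> list[int]:
    """Return fragment lengths after digesting a linear molecule."""
    bounds = [0, *cut_positions(sequence, enzyme), len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Two EcoRI sites (at positions 2 and 12) give three fragments.
print(fragment_lengths("AAGAATTCTTTTGAATTCAA", "EcoRI"))  # [3, 10, 7]
```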

References

  1. Cherry JM; Ball C; Weng S; Juvik G; Schmidt R; Adler C; Dunn B; Dwight S; Riles L; Mortimer RK; Botstein D (May 1997). "Genetic and physical maps of Saccharomyces cerevisiae". Nature. 387 (6632 Suppl): 67–73. doi:10.1038/387s067. PMC 3057085. PMID 9169866.
  2. Teixeira MC; Monteiro P; Jain P; Tenreiro S; Fernandes AR; Mira NP; Alenquer M; Freitas AT; Oliveira AL; Sá-Correia I (Jan 2006). "The YEASTRACT database: a tool for the analysis of transcription regulatory associations in Saccharomyces cerevisiae". Nucleic Acids Res. 34 (Database issue): D446–51. doi:10.1093/nar/gkj013. PMC 1347376. PMID 16381908.
  3. Cherry, Michael; Hong, Eurie; Amundsen, Craig; Balakrishnan, Rama; Binkley, Gail; Chan, Esther; Christie, Karen; Costanzo, Maria; Dwight, Selina; Engel, Stacia; Fisk, Dianna; Hirschman, Jodi; Hitz, Benjamin; Karra, Kalpana; Krieger, Cynthia; Miyasato, Stuart; Nash, Rob; Park, Julie; Skrzypek, Marek; Simison, Matt; Weng, Shuai; Wong, Edith (2011). "Saccharomyces Genome Database: the genomics resource of budding yeast". Nucleic Acids Res. 40 (Database issue): D700–D705. doi:10.1093/nar/gkr1029. PMC 3245034. PMID 22110037.
  4. Dwight SS, Harris MA, Dolinski K, et al. (January 2002). "Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO)". Nucleic Acids Res. 30 (1): 69–72. doi:10.1093/nar/30.1.69. PMC 99086. PMID 11752257.
  5. Hong EL, Balakrishnan R, Dong Q, et al. (January 2008). "Gene Ontology annotations at SGD: new data sources and annotation methods". Nucleic Acids Res. 36 (Database issue): D577–81. doi:10.1093/nar/gkm909. PMC 2238894. PMID 17982175.
  6. "Saccharomyces Genome Database". Saccharomyces Genome Database. Stanford University. Retrieved 26 April 2018.