MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. [1] [2] MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.
The main aim of MicrobesOnline is to provide an easy-to-use resource that integrates a wealth of data from multiple sources. This integrated platform facilitates studies in comparative genomics, metabolic pathway analysis, genome composition and functional genomics, as well as analyses of protein domain and family data. It also provides tools to search or browse the database by gene, species, sequence, orthologous group, gene ontology (GO) term or pathway keyword. Another of its main features is the Gene Cart, which allows users to keep a record of their genes of interest. One of the highlights of the database is its ease of navigation and the interconnection between the tools.
The development of high-throughput methods for genome sequencing has brought about a wealth of data that requires sophisticated bioinformatics tools for its analysis and interpretation. [3] Numerous tools now exist to study genomic sequence data and extract information from different perspectives. However, the lack of unified nomenclature and standardised protocols between tools makes direct comparison of their results very difficult. [4] Additionally, the user is forced to switch constantly between websites or software packages, adjusting the format of their data to fit each tool's requirements. MicrobesOnline was developed to integrate the capabilities of different tools into a unified platform for easy comparison of analysis results, with a focus on prokaryotic species and basal eukaryotes.
MicrobesOnline hosts genomic, gene expression and fitness data for a wide range of microbial species. Genomic data is available for 1752 bacteria, 94 archaea and 119 eukaryotes, for a total of 3707 genomes, 2842 of which are marked as being complete. Gene expression data is available for 113 species, and fitness data is available for 4 organisms. [5]
MicrobesOnline provides diverse tools for searching, analysing and integrating information related to bacterial genomes for applications in four major areas: genetic information, functional genomics, comparative genomics and metabolic pathway studies. [6] The homepage of MicrobesOnline is the portal for accessing its functions and includes six main sections: the top navigation elements, a genome selector, tutorial examples based on E. coli K-12, a link to the Genome-Linked Application for Metabolic Maps (GLAMM), website highlights and the “about MicrobesOnline” list. The authors of MicrobesOnline describe it as an ongoing project and state that the tools for data analysis and the support for more data types will be expanded. [7]
Information on microbial genes stored in MicrobesOnline includes sequences (genes, transcripts and proteins), genomic loci, gene annotations and some sequence statistics. This information can be accessed through three features displayed on the homepage of MicrobesOnline: sequence search and advanced search in the top navigation section, and the genome selector. For the sequence search tool, MicrobesOnline integrates BLAT, FastHMM and FastBLAST [8] to search protein sequences, and uses MEGABLAST to search nucleotide sequences. [9] It also provides a link to BLAST as an alternative way of searching sequences. The advanced search tool, by contrast, enables a user to search genetic information by category, custom query, wild-card search and field-specific search, using fields such as the gene name, the description, the cluster of orthologous groups (COG) id, the GO term or the KEGG enzyme commission (EC) number as keywords.
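The combination of wild-card and field-specific search described above can be illustrated with a minimal sketch. It is not MicrobesOnline's implementation: the record layout, the field names and the `search_genes` helper are assumptions made for illustration, and the gene records are toy examples.

```python
from fnmatch import fnmatchcase

# Hypothetical gene records; field names and values are illustrative only
# and do not reflect the MicrobesOnline database schema.
GENES = [
    {"name": "dnaK", "description": "chaperone Hsp70", "cog": "COG0443", "ec": ""},
    {"name": "dnaJ", "description": "chaperone DnaJ", "cog": "COG0484", "ec": ""},
    {"name": "suhB", "description": "inositol monophosphatase", "cog": "COG0483", "ec": "3.1.3.25"},
]


def search_genes(genes, **field_patterns):
    """Return the genes whose named fields all match the given wild-card patterns.

    Patterns use shell-style wild cards (* and ?); matching is case-insensitive.
    """
    def matches(gene):
        return all(
            fnmatchcase(gene.get(field, "").lower(), pattern.lower())
            for field, pattern in field_patterns.items()
        )

    return [gene for gene in genes if matches(gene)]


# Wild-card search on the gene name field.
print([g["name"] for g in search_genes(GENES, name="dna*")])
# Field-specific search on the EC number, combined with a wild card.
print([g["name"] for g in search_genes(GENES, ec="3.1.3.*")])
```

Passing several keyword arguments narrows the result, since a gene must match every pattern supplied.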
The “genomes selected” box of the genome selector lists genomes added from the favourite genome list on the left or found by keyword search. On the right side of the genome selector, four actions can be applied after selecting genomes: the “find genes” interface searches for a gene name in the selected genomes and displays results in the gene list view; the “info” button lists a brief summary of the selected genomes in the Summary View; the “GO” button opens a GO Browser called VertiGo, which tabulates the number of genes under different GO terms; finally, the “pathway” button opens a pathway browser that illustrates the complete pathways of all organisms in the MicrobesOnline database.
In addition, the genome names shown in the summary view lead to a single-genome data view that presents a wealth of information about the selected genome. In the gene list view, the links “G O D H S T B...” lead the user to a locus information tool, where detailed information such as operon and regulon, domains and families, sequences and annotations is shown.
An important feature for storing a user's work is the Gene Cart. Many web pages of MicrobesOnline that display genetic information contain a link to add genes of interest to the session gene cart, which is available to all users. This is a temporary gene cart, and its contents are lost when the user closes the web browser. Genes in the session gene cart can be saved to the permanent gene cart, which is only available to registered users after logging in.
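The distinction between the session and permanent gene carts can be modelled with a small data structure. The sketch below is only an illustration of the behaviour described above; the `GeneCart` class and its method names are hypothetical and are not taken from the MicrobesOnline code base.

```python
class GeneCart:
    """Toy model of a temporary session cart plus a permanent cart."""

    def __init__(self):
        self.session = set()    # lost when the browser is closed
        self.permanent = set()  # kept for registered users

    def add(self, gene_id):
        """Add a gene of interest to the session cart (open to all users)."""
        self.session.add(gene_id)

    def save_to_permanent(self, logged_in):
        """Copy the session cart into the permanent cart for a logged-in user."""
        if not logged_in:
            raise PermissionError("the permanent gene cart requires a registered login")
        self.permanent |= self.session

    def close_browser(self):
        """Simulate the end of a browser session: the session cart is emptied."""
        self.session.clear()


cart = GeneCart()
cart.add("VIMSS15779")
cart.save_to_permanent(logged_in=True)
cart.close_browser()
print(sorted(cart.session), sorted(cart.permanent))
```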
One goal of MicrobesOnline is to store functional information on microbial genomes. Such information includes gene ontology and microarray-based gene expression profiles, which can be accessed through two interfaces called the GO Browser and the Expression Data Viewer, respectively. The GO Browser provides links to genes organised by gene ontology terms, and the Expression Data Viewer provides access to both expression profiles and information on experimental conditions.
The GO Browser, also known as VertiGo, is used by MicrobesOnline to search and visualise the GO hierarchy, which is a unified vocabulary that describes properties of gene products, including cellular component, molecular function and biological process. The Genome Selector on the MicrobesOnline homepage provides a direct way to browse the GO hierarchy of the selected genomes and to list the genes under a selected GO term, which can then be added to the session gene cart for further analysis.
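Listing or counting the genes "under" a GO term means collecting annotations from the term and from all of its descendants in the hierarchy. The following sketch shows that traversal on a toy hierarchy; the term names, the gene identifiers and the `genes_under` helper are assumptions for illustration and do not reproduce VertiGo or the real GO graph.

```python
from collections import deque

# Toy, heavily simplified hierarchy: parent term -> child terms.
CHILDREN = {
    "biological_process": ["metabolic process", "response to stimulus"],
    "response to stimulus": ["response to stress"],
    "response to stress": [],
    "metabolic process": [],
}

# Hypothetical direct annotations: term -> genes annotated with that term.
ANNOTATIONS = {
    "metabolic process": {"geneA"},
    "response to stimulus": {"geneB"},
    "response to stress": {"geneA", "geneC"},
}


def genes_under(term):
    """Collect genes annotated to a term or to any of its descendants."""
    seen = {term}
    queue = deque([term])
    genes = set()
    while queue:
        current = queue.popleft()
        genes |= ANNOTATIONS.get(current, set())
        for child in CHILDREN.get(current, []):
            if child not in seen:  # a term may be reachable by several paths
                seen.add(child)
                queue.append(child)
    return genes


# Tabulate the number of genes under each term.
for term in CHILDREN:
    print(term, len(genes_under(term)))
```

Because a gene can be annotated to several terms, the counts are taken over sets, so a gene is counted once per term even if it appears at several levels below it.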
The Expression Data Viewer is an interface for searching and inspecting microarray-based gene expression experiments and expression profiles. It consists of several components: an experiment browser for searching specific experiments in selected genomes under selected experimental conditions, an expression experiment viewer providing details of each microarray experiment, a gene expression viewer showing a heat map of the expression levels of the selected gene and genes in the same operon, and finally, a profile search tool for searching gene expression profiles. The Expression Data Viewer can be accessed in three ways: through “Browse Functional Data” in the navigation bar, through “Gene Expression Data” on the homepage and through the “Gene expression” list in the single-genome data view, where expression data are available. The single-genome data view can also show a protein-protein interaction browser that allows the inspection of interaction complexes and the download of expression data (e.g. for Escherichia coli str. K-12 substr. MG1655). Furthermore, the user can launch the MultiExperiment Viewer (MeV) from the single-genome data view for analysing and visualising expression data.
MicrobesOnline stores information on gene homology and phylogeny for comparative genomic studies, which can be accessed through two interfaces. The first is the Tree Browser, which draws a species tree or a gene tree for the selected gene and its gene neighbourhood. The second is the Orthology Browser, which is an extension of the Genome Browser and displays the selected gene within the context of its gene neighbourhood, aligned with orthologs in other selected genomes. [10] Both browsers provide options to save a gene in the session gene cart for further analysis.
The Tree Browser can be accessed by searching for a gene by its VIMSS id (e.g. VIMSS15779) with the Find Genes tool on the homepage. Once the gene context view has been accessed through the “Browse genomes by trees” option, a gene tree and a gene context diagram are displayed. In addition, the “View species tree” option opens a species tree view, which shows a species tree alongside the gene tree. The Tree Browser also enables users to choose both genes and genomes according to their similarity, and it shows horizontal gene transfers among genomes.
The Orthology Browser displays orthologs in other genomes alongside the query genome; the genomes to compare are chosen from the “Select Organism(s) to Display” box.
Locus information for a gene can be viewed through the “view genes” option, from which the gene can be added to the session gene cart or its gene expression data (including the heatmap) can be downloaded. Alternatively, a gene context view appears when browsing genomes by trees.
The Pathway Browser lets users navigate the Kyoto Encyclopedia of Genes and Genomes (KEGG) [11] pathway maps, which display the predicted presence or absence of enzymes for up to two selected genomes. The pathway browser can show the map of a particular pathway and a comparison between two microbes. The enzyme commission number (e.g. 3.1.3.25) provides a link to the gene list view, which shows information on the selected enzyme and allows the user to add genes to the session gene cart.
GLAMM is another tool for searching and visualising metabolic pathways in a unified web interface. It helps users to identify or construct novel, transgenic pathways. [12]
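The presence or absence comparison for two genomes amounts to checking each enzyme of a pathway against the set of EC numbers predicted for each genome. The sketch below illustrates that idea only; the `compare_pathway` function, the pathway definition and the genome contents are hypothetical and are not derived from KEGG or MicrobesOnline data.

```python
def compare_pathway(pathway_ecs, genome_a_ecs, genome_b_ecs):
    """Map each EC number in a pathway to (present in A, present in B)."""
    return {
        ec: (ec in genome_a_ecs, ec in genome_b_ecs)
        for ec in pathway_ecs
    }


# Hypothetical pathway and per-genome enzyme predictions (illustrative only).
pathway = ["5.5.1.4", "3.1.3.25", "1.1.1.18"]
genome_a = {"5.5.1.4", "3.1.3.25"}
genome_b = {"3.1.3.25", "1.1.1.18"}

for ec, (in_a, in_b) in compare_pathway(pathway, genome_a, genome_b).items():
    print(ec, "A:present" if in_a else "A:absent", "B:present" if in_b else "B:absent")
```

Storing each genome's predicted enzymes as a set keeps every lookup constant-time, which matters little for one map but helps when many pathways are scanned.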
MicrobesOnline has integrated numerous tools for analysing sequences, gene expression profiles and protein-protein interactions into an interface called the Bioinformatics Workbench, which is accessed via gene carts. Analyses currently supported include multiple sequence alignments, construction of phylogenetic trees, motif searches and scans, and summaries of gene expression profiles and protein-protein interactions. To save computational resources, a user is allowed to run two concurrent jobs for at most four hours, and all results are saved temporarily until the session is terminated. [13] Results can be shared with other users or groups via the resource access control tool.
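The job limits can be read as a simple admission rule: a new job is accepted only if fewer than two of the user's jobs are still running, and a job stops counting once it has reached four hours. The sketch below encodes that reading; the function name and the assumption that jobs are stopped at the four-hour mark are illustrative and are not documented behaviour of the Bioinformatics Workbench.

```python
from datetime import datetime, timedelta

MAX_CONCURRENT_JOBS = 2            # limit stated in the text
MAX_RUNTIME = timedelta(hours=4)   # limit stated in the text


def can_submit(running_job_starts, now):
    """Return True if a user may start another job.

    Assumption: a job that has reached MAX_RUNTIME is treated as stopped
    and no longer counts towards the concurrency limit.
    """
    active = [start for start in running_job_starts if now - start < MAX_RUNTIME]
    return len(active) < MAX_CONCURRENT_JOBS


now = datetime(2011, 3, 1, 12, 0)
print(can_submit([now - timedelta(hours=1)], now))
print(can_submit([now - timedelta(hours=1), now - timedelta(hours=2)], now))
```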
MicrobesOnline is built on the integration of the data of an array of databases that manage different aspects of its capabilities. A comprehensive list is as follows: [14]
MicrobesOnline was updated every 3 to 9 months from 2007 to 2011, with new features and new species data added in each release. However, there have been no new release notes since March 2011. [39]
MicrobesOnline is compatible with other similar platforms of integrated microbe data, such as IMG and RegTransBase, given that standard identifiers of genes are maintained throughout the database. [40]
There have been other efforts to create a unified platform for prokaryote analysis tools; however, most of them focus on one set of analysis types. A few examples of these focused databases include those with an emphasis on metabolic data analysis (Microme [41] ), comparative genomics (MBGD [42] and the OMA Browser [43] ), regulons and transcription factors (RegPrecise [44] ) and comparative functional genomics (Pathline [45] ), among many others. However, notable efforts have been made by other teams to create comprehensive platforms that largely overlap with the capabilities of MicrobesOnline. MicroScope [46] and the Integrated Microbial Genomes System [47] [48] (IMG) are examples of popular databases that had been recently updated as of September 2014.
metaMicrobesOnline [49] was compiled by the same developers as MicrobesOnline and extends its capabilities by focusing on the phylogenetic analysis of metagenomes. The web interface is similar to that of MicrobesOnline, and the user can switch between the two sites via the “switch to” link on the homepage.