Description For comparative metagenomic studies
Research center University of Alberta
Laboratory David S. Wishart
Primary citation [1]
Release date 2012
Data format Data input: Taxonomic profile and sample-specific metadata information. Data output: Statistical analysis results and plots/graphs.
Data release Last updated in 2014
Curation policy Manually curated

METAGENassist is a freely available web server for comparative metagenomic analysis.[1] Comparative metagenomic studies involve the large-scale comparison of genomic or taxonomic census data from bacterial samples across different environments. Historically, this has required a sound knowledge of statistics, computer programming, genetics and microbiology, so only a small number of researchers have been routinely able to perform such studies. To circumvent these limitations, METAGENassist was developed to allow non-specialists to perform metagenomic analyses easily and intuitively over the web. METAGENassist is particularly notable for its rich graphical output and its extensive database of bacterial phenotypic information.



METAGENassist is designed to support a wide range of statistical comparisons across metagenomic samples. It accepts bacterial census data or taxonomic profile data derived from 16S rRNA data, classical DNA sequencing, next-generation shotgun sequencing or even classical microbial culturing techniques. These taxonomic profiles can be supplied in standard comma-separated value (CSV) format or in program-specific formats generated by tools such as mothur[2] and QIIME.[3]

Once the data are uploaded to the website, METAGENassist offers users a large selection of data pre-processing and quality-checking tools, including: 1) taxonomic name normalization; 2) taxonomic-to-phenotypic mapping; 3) data integrity/quality checks; and 4) data normalization. It also supports an extensive collection of classical univariate and multivariate analyses, such as fold-change analysis, t-tests, one-way ANOVA, partial least-squares discriminant analysis (PLS-DA) and principal component analysis (PCA). Each of these analyses generates colorful, informative graphs and tables in PNG or PDF format, and all of the processed data and images are available for download. These analysis and visualization tools can be used to identify key features that distinguish or characterize microbial populations in different environments or under different conditions.

METAGENassist distinguishes itself from most other metagenomics data analysis tools through its extensive use of automated taxonomic-to-phenotypic mapping and its ability to support sophisticated data analyses with the resulting phenotypic data. METAGENassist’s phenotype database covers more than 11,000 microbial species annotated with 20 different phenotypic categories, including oxygen requirements, energy source(s), metabolism, and GC content. This gives users substantially more features with which to compare and analyze different samples.
The phenotype database is regularly updated with information retrieved from several resources, including BacMap,[4] GOLD,[5] and NCBI taxonomy resources.[6]
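
To make the idea of taxonomic-to-phenotypic mapping more concrete, the following is a minimal Python sketch, not METAGENassist's actual implementation. It assumes a small, hypothetical lookup table (`OXYGEN_REQUIREMENT`) in place of the real phenotype database, normalizes raw counts to relative abundances, sums them per phenotype, and computes a log2 fold change between two groups of samples. All function names, sample names and counts are illustrative assumptions.

```python
from collections import defaultdict
from math import log2
from statistics import mean

# Hypothetical taxon-to-phenotype lookup used only for illustration;
# it stands in for a real phenotype database and is not taken from one.
OXYGEN_REQUIREMENT = {
    "Escherichia coli": "facultative anaerobe",
    "Bacteroides fragilis": "anaerobe",
    "Pseudomonas aeruginosa": "aerobe",
}


def relative_abundance(counts):
    """Scale one sample's raw counts so that they sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {taxon: n / total for taxon, n in counts.items()}


def phenotype_profile(counts, lookup, unknown="unknown"):
    """Sum the relative abundances of all taxa sharing a phenotype value."""
    profile = defaultdict(float)
    for taxon, fraction in relative_abundance(counts).items():
        # Taxa missing from the lookup are pooled under `unknown`.
        profile[lookup.get(taxon, unknown)] += fraction
    return dict(profile)


def log2_fold_change(group_a, group_b, feature, pseudo=1e-6):
    """log2 of mean abundance in group_b over mean abundance in group_a.

    A small pseudocount avoids division by zero when a feature is absent.
    """
    mean_a = mean(p.get(feature, 0.0) for p in group_a)
    mean_b = mean(p.get(feature, 0.0) for p in group_b)
    return log2((mean_b + pseudo) / (mean_a + pseudo))


if __name__ == "__main__":
    # Made-up census counts for two hypothetical environments.
    soil = [
        {"Escherichia coli": 10, "Bacteroides fragilis": 10, "Pseudomonas aeruginosa": 80},
        {"Escherichia coli": 20, "Bacteroides fragilis": 20, "Pseudomonas aeruginosa": 60},
    ]
    gut = [
        {"Escherichia coli": 30, "Bacteroides fragilis": 60, "Pseudomonas aeruginosa": 10},
        {"Escherichia coli": 40, "Bacteroides fragilis": 40, "Unclassified": 20},
    ]
    soil_profiles = [phenotype_profile(s, OXYGEN_REQUIREMENT) for s in soil]
    gut_profiles = [phenotype_profile(s, OXYGEN_REQUIREMENT) for s in gut]
    for phenotype in ("aerobe", "anaerobe", "facultative anaerobe"):
        change = log2_fold_change(soil_profiles, gut_profiles, phenotype)
        print(f"{phenotype}: log2 fold change (gut vs soil) = {change:.2f}")
```

Aggregating by phenotype before comparing groups is what turns a taxonomic profile into the additional phenotypic features described above; a real analysis would add significance testing (for example a t-test) on top of the fold change.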


Related Research Articles

Metabolome: Complete set of small molecules in a biological sample

The metabolome refers to the complete set of small-molecule chemicals found within a biological sample. The biological sample can be a cell, a cellular organelle, an organ, a tissue, a tissue extract, a biofluid or an entire organism. The small molecule chemicals found in a given metabolome may include both endogenous metabolites that are naturally produced by an organism as well as exogenous chemicals that are not naturally produced by an organism.

Metagenomics: Study of genes found in the environment

Metagenomics is the study of genetic material recovered directly from environmental or clinical samples by a method called sequencing. The broad field may also be referred to as environmental genomics, ecogenomics, community genomics or microbiomics.

PHI-base: Biological database

The Pathogen-Host Interactions database (PHI-base) is a biological database that contains manually curated information on genes experimentally proven to affect the outcome of pathogen-host interactions. The database has been maintained by researchers at Rothamsted Research and external collaborators since 2005. PHI-base has been part of the UK node of ELIXIR, the European life-science infrastructure for biological information, since 2016.

Integrated Microbial Genomes System: Genome browsing and annotation platform

The Integrated Microbial Genomes system is a genome browsing and annotation platform developed by the U.S. Department of Energy (DOE)-Joint Genome Institute. IMG contains all the draft and complete microbial genomes sequenced by the DOE-JGI integrated with other publicly available genomes. IMG provides users a set of tools for comparative analysis of microbial genomes along three dimensions: genes, genomes and functions. Users can select and transfer them in the comparative analysis carts based upon a variety of criteria. IMG also includes a genome annotation pipeline that integrates information from several tools, including KEGG, Pfam, InterPro, and the Gene Ontology, among others. Users can also type or upload their own gene annotations and the IMG system will allow them to generate Genbank or EMBL format files containing these annotations.

16S ribosomal RNA: RNA component

16S ribosomal RNA is the RNA component of the 30S subunit of a prokaryotic ribosome. It binds to the Shine-Dalgarno sequence and provides most of the SSU structure.

MicrobesOnline

MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.

Human Microbiome Project: Former research initiative

The Human Microbiome Project (HMP) was a United States National Institutes of Health (NIH) research initiative to improve understanding of the microbiota involved in human health and disease. Launched in 2007, the first phase (HMP1) focused on identifying and characterizing human microbiota. The second phase, known as the Integrative Human Microbiome Project (iHMP) launched in 2014 with the aim of generating resources to characterize the microbiome and elucidating the roles of microbes in health and disease states. The program received $170 million in funding by the NIH Common Fund from 2007 to 2016.

MEGAN is a computer program that allows optimized analysis of large metagenomic datasets.

Microbiota: Community of microorganisms

Microbiota are the range of microorganisms that may be commensal, mutualistic, or pathogenic found in and on all multicellular organisms, including plants. Microbiota include bacteria, archaea, protists, fungi, and viruses, and have been found to be crucial for immunologic, hormonal, and metabolic homeostasis of their host.

MetaboAnalyst is a set of online tools for metabolomic data analysis and interpretation, created by members of the Wishart Research Group at the University of Alberta. It was first released in May 2009 and version 2.0 was released in January 2012. MetaboAnalyst provides a variety of analysis methods that have been tailored for metabolomic data. These methods include metabolomic data processing, normalization, multivariate statistical analysis, and data annotation. The current version is focused on biomarker discovery and classification.

European Nucleotide Archive: Online database from the EBI on nucleotides

The European Nucleotide Archive (ENA) is a repository providing free and unrestricted access to annotated DNA and RNA sequences. It also stores complementary information such as experimental procedures, details of sequence assembly and other metadata related to sequencing projects. The archive is composed of three main databases: the Sequence Read Archive, the Trace Archive and the EMBL Nucleotide Sequence Database. The ENA is produced and maintained by the European Bioinformatics Institute and is a member of the International Nucleotide Sequence Database Collaboration (INSDC) along with the DNA Data Bank of Japan and GenBank.

Viral metagenomics

Viral metagenomics uses metagenomic technologies to detect viral genomic material from diverse environmental and clinical samples. Viruses are the most abundant biological entity and are extremely diverse; however, only a small fraction of viruses have been sequenced and only an even smaller fraction have been isolated and cultured. Sequencing viruses can be challenging because viruses lack a universally conserved marker gene so gene-based approaches are limited. Metagenomics can be used to study and analyze unculturable viruses and has been an important tool in understanding viral diversity and abundance and in the discovery of novel viruses. For example, metagenomics methods have been used to describe viruses associated with cancerous tumors and in terrestrial ecosystems.

Metabolite Set Enrichment Analysis (MSEA) is a method designed to help metabolomics researchers identify and interpret patterns of metabolite concentration changes in a biologically meaningful way. It is conceptually similar to another widely used tool developed for transcriptomics called Gene Set Enrichment Analysis or GSEA. GSEA uses a collection of predefined gene sets to rank the lists of genes obtained from gene chip studies. By using this “prior knowledge” about gene sets researchers are able to readily identify significant and coordinated changes in gene expression data while at the same time gaining some biological context. MSEA does the same thing by using a collection of predefined metabolite pathways and disease states obtained from the Human Metabolome Database. MSEA is offered as a service both through a stand-alone web server and as part of a larger metabolomics analysis suite called MetaboAnalyst.

BASys is a freely available web server that can be used to perform automated, comprehensive annotation of bacterial genomes. With the advent of next generation DNA sequencing it is now possible to sequence the complete genome of a bacterium within a single day. This has led to an explosion in the number of fully sequenced microbes. In fact, as of 2013, there were more than 2700 fully sequenced bacterial genomes deposited with GenBank. However, a continuing challenge with microbial genomics is finding the resources or tools for annotating the large number of newly sequenced genomes. BASys was developed in 2005 in anticipation of these needs. In fact, BASys was the world’s first publicly accessible microbial genome annotation web server. Because of its widespread popularity, the BASys server was updated in 2011 through the addition of multiple server nodes to handle the large number of queries it was receiving.

BacMap is a freely available web-accessible database containing fully annotated, fully zoomable and fully searchable chromosome maps from more than 2500 prokaryotic species. BacMap was originally developed in 2005 to address the challenges of viewing and navigating through the growing numbers of bacterial genomes that were being generated through large-scale sequencing efforts. Since it was first introduced, the number of bacterial genomes in BacMap has grown more than 15-fold. Essentially, BacMap functions as an online visual atlas of microbial genomes. All of the genome annotations in BacMap were generated through the BASys genome annotation system. BASys is a widely used microbial annotation infrastructure that performs comprehensive bioinformatic analyses on raw bacterial genome sequence data. All of the genome (chromosome) maps in BacMap were constructed using the program known as CGView. CGView is a popular visualization program for generating interactive, web-compatible circular chromosome maps. Each chromosome map in BacMap is extensively hyperlinked, and each chromosome image can be interactively navigated, expanded and rotated using navigation buttons or hyperlinks. All identified genes in a BacMap chromosome map are colored according to coding direction and, when sufficiently zoomed in, gene labels are visible. Each gene label on a BacMap genome map is also hyperlinked to a 'gene card'. The gene cards provide detailed information about the corresponding DNA and protein sequences. Each genome map in BacMap is searchable via BLAST and a gene name/synonym search.

BacDive: Online database for bacteria

BacDive is a bacterial metadatabase that provides strain-linked information about bacterial and archaeal biodiversity.

Virome

Virome refers to the assemblage of viruses that is often investigated and described by metagenomic sequencing of viral nucleic acids that are found associated with a particular ecosystem, organism or holobiont. The word is frequently used to describe environmental viral shotgun metagenomes. Viruses, including bacteriophages, are found in all environments, and studies of the virome have provided insights into nutrient cycling, development of immunity, and a major source of genes through lysogenic conversion. Also, the human virome has been characterized in nine organs of 31 Finnish individuals using qPCR and NGS methodologies.

Machine learning in bioinformatics is the application of machine learning algorithms to bioinformatics, including genomics, proteomics, microarrays, systems biology, evolution, and text mining.

Clinical metagenomic next-generation sequencing (mNGS) is the comprehensive analysis of microbial and host genetic material in clinical samples from patients by next-generation sequencing. It uses the techniques of metagenomics to identify and characterize the genomes of bacteria, fungi, parasites, and viruses directly from clinical specimens, without the need for prior knowledge of a specific pathogen. The capacity to detect all the potential pathogens in a sample makes metagenomic next-generation sequencing a potent tool in the diagnosis of infectious disease, especially when other more directed assays, such as PCR, fail. Its limitations include clinical utility, laboratory validity, sense and sensitivity, cost and regulatory considerations.


  1. Arndt, D.; Xia, J.; Liu, Y.; Zhou, Y.; Guo, A.C.; Cruz, J.A.; Sinelnikov, I.; Budwill, K.; Nesbø, C.L.; Wishart, D.S. (July 2012). "METAGENassist: a comprehensive web server for comparative metagenomics". Nucleic Acids Res. 40 (Web Server issue): W88–W95. doi:10.1093/nar/gks497. PMC 3394294. PMID 22645318.
  2. Schloss, P.D.; Westcott, S.L.; Ryabin, T.; Hall, J.R.; Hartmann, M.; Hollister, E.B.; Lesniewski, R.A.; Oakley, B.B.; Parks, D.H.; Robinson, C.J.; et al. (2009). "Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities". Appl. Environ. Microbiol. 75 (23): 7537–7541. Bibcode:2009ApEnM..75.7537S. doi:10.1128/AEM.01541-09. PMC 2786419. PMID 19801464.
  3. Caporaso, J.G.; Kuczynski, J.; Stombaugh, J.; Bittinger, K.; Bushman, F.D.; Costello, E.K.; Fierer, N.; Pena, A.G.; Goodrich, J.K.; Gordon, J.I.; et al. (2010). "QIIME allows analysis of high-throughput community sequencing data". Nat. Methods. 7 (5): 335–336. doi:10.1038/nmeth.f.303. PMC 3156573. PMID 20383131.
  4. Cruz, J.; Liu, Y.; Liang, Y.; Zhou, Y.; Wilson, M.; Dennis, J.J.; Stothard, P.; Van Domselaar, G.; Wishart, D.S. (2012). "BacMap: an up-to-date electronic atlas of annotated bacterial genomes". Nucleic Acids Res. 40 (Database issue): D599–D604. doi:10.1093/nar/gkr1105. PMC 3245156. PMID 22135301.
  5. Pagani, I.; Liolios, K.; Jansson, J.; Chen, I.M.; Smirnova, T.; Nosrat, B.; Markowitz, V.M.; Kyrpides, N.C. (2012). "The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata". Nucleic Acids Res. 40 (Database issue): D571–D579. doi:10.1093/nar/gkr1100. PMC 3245063. PMID 22135293.
  6. Sayers, E.W.; Barrett, T.; Benson, D.A.; Bolton, E.; Bryant, S.H.; Canese, K.; Chetvernin, V.; Church, D.M.; Dicuccio, M.; Federhen, S.; et al. (2012). "Database resources of the National Center for Biotechnology Information". Nucleic Acids Res. 40 (Database issue): D13–D25. doi:10.1093/nar/gkr1184. PMC 3245031. PMID 22140104.