MEGAN

Developer(s): Daniel Huson et al.
Stable release: 6.13.1 / 2017
Repository: github.com/husonlab/megan-ce
Written in: Java
Operating system: Windows, Unix, Linux, macOS
Platform: Java
Type: Bioinformatics
License: Free open source "community edition"; commercial "Ultimate edition" licensed by Computomics
Website: uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/algorithms-in-bioinformatics/software/megan6/

MEGAN ("MEtaGenome ANalyzer") is a computer program that allows optimized analysis of large metagenomic datasets. [1] [2]

Metagenomics is the analysis of the genomic sequences from a usually uncultured environmental sample. A long-term goal of most metagenomic studies is to inventory and measure the extent and role of microbial biodiversity in an ecosystem, motivated by discoveries that the diversity of microbial organisms and viral agents in the environment is far greater than previously estimated. [3] Tools such as MEGAN, which allow the investigation of very large data sets obtained from environmental samples, in particular by shotgun sequencing, are designed to sample and investigate the unknown biodiversity of environmental samples where more precise techniques that rely on smaller, better-known samples cannot be used.

Fragments of DNA from a metagenomic sample, such as ocean water or soil, are compared against databases of known DNA sequences using BLAST or another sequence comparison tool to assemble the segments into discrete comparable sequences. MEGAN is then used to compare the resulting sequences with gene sequences from GenBank at NCBI. [4] The program has been used to investigate the DNA of a mammoth recovered from the Siberian permafrost [5] and the Sargasso Sea data set. [6]

Introduction

Metagenomics is the study of the genomic content of samples from the same habitat, with the aim of determining the role and extent of species diversity. Targeted or random sequencing is widely used, with the resulting reads compared against sequence databases. [1] Recent developments in sequencing technology have increased the number of metagenomic samples. MEGAN is an easy-to-use tool for analysing such metagenomic data. The first version of MEGAN was released in 2007, [1] and the most recent version is MEGAN6. [7] The first version could analyse the taxonomic content of a single dataset, while the latest version can analyse multiple datasets and adds new features such as querying different databases and a new algorithm.

MEGAN Pipeline

A MEGAN analysis starts with collecting reads from any shotgun sequencing platform. Second, the reads are compared with sequence databases using BLAST or a similar tool. Third, MEGAN assigns a taxon ID to each processed read based on the NCBI taxonomy, which creates a MEGAN file containing the information required for statistical and graphical analysis. Lastly, the lowest common ancestor (LCA) algorithm can be run to inspect assignments, to analyze the data and to create summaries of the data at different levels of the NCBI taxonomy. For a read that matches sequences from several species, the LCA algorithm assigns the read to the lowest node in the taxonomy that is an ancestor of all of those species. [1] [2]
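The assignment step can be illustrated with a minimal sketch in Python. It is not MEGAN's implementation: the toy `PARENT` taxonomy, the `hits` dictionary and the function names are hypothetical stand-ins for the NCBI taxonomy and for the results of a sequence comparison, and any score-based filtering of hits that a real analysis would apply is left out.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent), standing in for the NCBI taxonomy.
PARENT = {
    "Escherichia coli": "Escherichia",
    "Escherichia fergusonii": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacillus subtilis": "Bacillus",
    "Bacillus": "Bacteria",
    "Bacteria": "root",
}


def lineage(taxon):
    """Return the path from the root of the taxonomy down to the taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all taxa, or None."""
    lineages = [lineage(t) for t in taxa]
    lca = None
    # Walk down from the root, one level at a time, while all lineages agree.
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


def assign_reads(hits):
    """Map each read to the LCA of the taxa it matched."""
    return {read: lowest_common_ancestor(taxa) for read, taxa in hits.items()}


# Hypothetical comparison results: read ID -> taxa of the matched sequences.
hits = {
    "read1": ["Escherichia coli"],
    "read2": ["Escherichia coli", "Escherichia fergusonii"],
    "read3": ["Escherichia coli", "Salmonella enterica"],
    "read4": ["Escherichia coli", "Bacillus subtilis"],
}

assignments = assign_reads(hits)
summary = Counter(assignments.values())  # number of reads per assigned taxon
```

A read that matches a single species stays at that species, whereas a read that matches several species moves up the taxonomy until one node covers them all. Reads with ambiguous matches are therefore reported at a higher, less specific taxonomic level.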

Related Research Articles

In genetics, shotgun sequencing is a method used for sequencing random DNA strands. It is named by analogy with the rapidly expanding, quasi-random shot grouping of a shotgun.

Genomics: Discipline in genetics

Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dimensional structural configuration. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of all of an organism's genes, their interrelations and influence on the organism. Genes may direct the production of proteins with the assistance of enzymes and messenger molecules. In turn, proteins make up body structures such as organs and tissues as well as control chemical reactions and carry signals between cells. Genomics also involves the sequencing and analysis of genomes through uses of high throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes. Advances in genomics have triggered a revolution in discovery-based research and systems biology to facilitate understanding of even the most complex biological systems such as the brain.

Metagenomics: Study of genes found in the environment

Metagenomics is the study of genetic material recovered directly from environmental or clinical samples by a method called sequencing. The broad field may also be referred to as environmental genomics, ecogenomics, community genomics or microbiomics.

Global Ocean Sampling Expedition: Ocean exploration genome project to assess genetic diversity in marine microbial communities

The Global Ocean Sampling Expedition (GOS) is an ocean exploration genome project whose goal is to assess genetic diversity in marine microbial communities and to understand their role in nature's fundamental processes. It was begun as a Sargasso Sea pilot sampling project in August 2003; Craig Venter announced the full expedition on 4 March 2004. The two-year journey, which used Craig Venter's personal yacht, originated in Halifax, Canada, circumnavigated the globe and terminated in the U.S. in January 2006. The expedition sampled water from Halifax, Nova Scotia to the Eastern Tropical Pacific Ocean. During 2007, sampling continued along the west coast of North America.

Human Microbiome Project: Former research initiative

The Human Microbiome Project (HMP) was a United States National Institutes of Health (NIH) research initiative to improve understanding of the microbiota involved in human health and disease. Launched in 2007, the first phase (HMP1) focused on identifying and characterizing human microbiota. The second phase, known as the Integrative Human Microbiome Project (iHMP) launched in 2014 with the aim of generating resources to characterize the microbiome and elucidating the roles of microbes in health and disease states. The program received $170 million in funding by the NIH Common Fund from 2007 to 2016.

Earth Microbiome Project

The Earth Microbiome Project (EMP) is an initiative founded by Janet Jansson, Jack Gilbert and Rob Knight in 2010 to collect natural samples and to analyze the microbial community around the globe.

Biological dark matter is an informal term for unclassified or poorly understood genetic material. This genetic material may refer to genetic material produced by unclassified microorganisms. By extension, biological dark matter may also refer to the un-isolated microorganism whose existence can only be inferred from the genetic material that they produce. Some of the genetic material may not fall under the three existing domains of life: Bacteria, Archaea and Eukaryota; thus, it has been suggested that a possible fourth domain of life may yet be discovered, although other explanations are also probable. Alternatively, the genetic material may refer to non-coding DNA and non-coding RNA produced by known organisms.

In metagenomics, binning is the process of grouping reads or contigs and assigning them to individual genomes. Binning methods can be based on compositional features, on alignment (similarity), or on both.

Microbial phylogenetics is the study of the manner in which various groups of microorganisms are genetically related. This helps to trace their evolution. To study these relationships biologists rely on comparative genomics, as physiology and comparative anatomy are not possible methods.

AMPHORA is an open-source bioinformatics workflow. AMPHORA2 uses 31 bacterial and 104 archaeal phylogenetic marker genes for inferring phylogenetic information from metagenomic datasets. Most of the marker genes are single copy genes, therefore AMPHORA2 is suitable for inferring the accurate taxonomic composition of bacterial and archaeal communities from metagenomic shotgun sequencing data.

MG-RAST is an open-source web application server that suggests automatic phylogenetic and functional analysis of metagenomes. It is also one of the biggest repositories for metagenomic data. The name is an abbreviation of Metagenomic Rapid Annotations using Subsystems Technology. The pipeline automatically produces functional assignments to the sequences that belong to the metagenome by performing sequence comparisons to databases in both nucleotide and amino-acid levels. The applications supply phylogenetic and functional assignments of the metagenome being analysed, as well as tools for comparing different metagenomes. It also provides a RESTful API for programmatic access.

Viral metagenomics

Viral metagenomics uses metagenomic technologies to detect viral genomic material from diverse environmental and clinical samples. Viruses are the most abundant biological entity and are extremely diverse; however, only a small fraction of viruses have been sequenced and only an even smaller fraction have been isolated and cultured. Sequencing viruses can be challenging because viruses lack a universally conserved marker gene so gene-based approaches are limited. Metagenomics can be used to study and analyze unculturable viruses and has been an important tool in understanding viral diversity and abundance and in the discovery of novel viruses. For example, metagenomics methods have been used to describe viruses associated with cancerous tumors and in terrestrial ecosystems.

METAGENassist is a freely available web server for comparative metagenomic analysis. Comparative metagenomic studies involve the large-scale comparison of genomic or taxonomic census data from bacterial samples across different environments. Historically this has required a sound knowledge of statistics, computer programming, genetics and microbiology. As a result, only a small number of researchers are routinely able to perform comparative metagenomic studies. To circumvent these limitations, METAGENassist was developed to allow metagenomic analyses to be performed by non-specialists, easily and intuitively over the web. METAGENassist is particularly notable for its rich graphical output and its extensive database of bacterial phenotypic information.

Mark J. Pallen is a research leader at the Quadram Institute and Professor of Microbial Genomics at the University of East Anglia. In recent years, he has been at the forefront of efforts to apply next-generation sequencing to problems in microbiology and ancient DNA research.

Metatranscriptomics is the set of techniques used to study gene expression of microbes within natural environments, i.e., the metatranscriptome.

PICRUSt is a bioinformatics software package. The name is an abbreviation for Phylogenetic Investigation of Communities by Reconstruction of Unobserved States.

Virome

Virome refers to the assemblage of viruses that is often investigated and described by metagenomic sequencing of viral nucleic acids that are found associated with a particular ecosystem, organism or holobiont. The word is frequently used to describe environmental viral shotgun metagenomes. Viruses, including bacteriophages, are found in all environments, and studies of the virome have provided insights into nutrient cycling, development of immunity, and a major source of genes through lysogenic conversion. Also, the human virome has been characterized in nine organs of 31 Finnish individuals using qPCR and NGS methodologies.

Pharmacomicrobiomics

Pharmacomicrobiomics, proposed by Prof. Marco Candela for the ERC-2009-StG project call, and publicly coined for the first time in 2010 by Rizkallah et al., is defined as the effect of microbiome variations on drug disposition, action, and toxicity. Pharmacomicrobiomics is concerned with the interaction between xenobiotics, or foreign compounds, and the gut microbiome. It is estimated that over 100 trillion prokaryotes representing more than 1000 species reside in the gut. Within the gut, microbes help modulate developmental, immunological and nutrition host functions. The aggregate genome of microbes extends the metabolic capabilities of humans, allowing them to capture nutrients from diverse sources. Namely, through the secretion of enzymes that assist in the metabolism of chemicals foreign to the body, modification of liver and intestinal enzymes, and modulation of the expression of human metabolic genes, microbes can significantly impact the ingestion of xenobiotics.

Machine learning in bioinformatics

Machine learning in bioinformatics is the application of machine learning algorithms to bioinformatics, including genomics, proteomics, microarrays, systems biology, evolution, and text mining.

Clinical metagenomic next-generation sequencing (mNGS) is the comprehensive analysis of microbial and host genetic material in clinical samples from patients by next-generation sequencing. It uses the techniques of metagenomics to identify and characterize the genome of bacteria, fungi, parasites, and viruses without the need for a prior knowledge of a specific pathogen directly from clinical specimens. The capacity to detect all the potential pathogens in a sample makes metagenomic next generation sequencing a potent tool in the diagnosis of infectious disease especially when other more directed assays, such as PCR, fail. Its limitations include clinical utility, laboratory validity, sense and sensitivity, cost and regulatory considerations.

References

  1. Huson, Daniel H; A. Auch; Ji Qi; S. C. Schuster (2007). "MEGAN Analysis of Metagenomic Data". Genome Research. 17 (3): 377–386. doi:10.1101/gr.5969107. PMC 1800929. PMID 17255551. Retrieved April 3, 2008.
  2. Huson, Daniel H; S. Mitra; N. Weber; H. Ruscheweyh; Stephan C. Schuster (2011). "Integrative analysis of environmental sequences using MEGAN4". Genome Research. 21 (9): 1552–1560. doi:10.1101/gr.120618.111. PMC 3166839. PMID 21690186.
  3. Nee, S. (2004). "More than meets the eye". Nature. 429 (6994): 804–805. Bibcode:2004Natur.429..804N. doi:10.1038/429804a. PMID 15215837. S2CID 1699973.
  4. Frias-Lopez, Jorge; Yanmei Shi; Gene W. Tyson; Maureen L. Coleman; Stephan C. Schuster; Sallie W. Chisholm; Edward F. DeLong (March 11, 2008). "Microbial community gene expression in ocean surface waters" (PDF). PNAS. 105 (10): 3805–3810. doi:10.1073/pnas.0708897105. PMC 2268829. PMID 18316740. Retrieved April 3, 2008.
  5. Poinar, Hendrik N.; Carsten Schwarz; Ji Qi; Beth Shapiro; Ross D. E. MacPhee; Bernard Buigues; Alexei Tikhonov; Daniel Huson; Lynn P. Tomsho; Alexander Auch; Markus Rampp; Webb Miller; Stephan C. Schuster (2007). "Metagenomics to Paleogenomics: Large-Scale Sequencing of Mammoth DNA". Science. 331 (6016): 392–394. doi:10.1126/science.331.6016.392. PMID 21273464. Retrieved April 3, 2008.
  6. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO (April 2004). "Environmental Genome Shotgun Sequencing of the Sargasso Sea". Science. 304 (5667): 66–74. Bibcode:2004Sci...304...66V. CiteSeerX 10.1.1.124.1840. doi:10.1126/science.1093857. PMID 15001713. S2CID 1454587.
  7. "MEGAN6 — Algorithms in Bioinformatics". uni-tuebingen.de. Retrieved December 21, 2020. Huson, Daniel H; S. Beier; I. Flade; A. Gorska; M. El-Hadidi; H. Ruscheweyh; R. Tappu (2016). "MEGAN Community Edition - Interactive exploration and analysis of large-scale microbiome sequencing data". PLOS Computational Biology. 12 (6): e1004957. Bibcode:2016PLSCB..12E4957H. doi:10.1371/journal.pcbi.1004957. PMC 4915700. PMID 27327495.