Earth Microbiome Project

Formation: 2010
Website: https://earthmicrobiome.org/

The Earth Microbiome Project (EMP) is an initiative founded in 2010 by Janet Jansson, Jack Gilbert and Rob Knight to collect natural samples and analyze microbial communities around the globe. [1]


Microbes are highly abundant, diverse, and ecologically important. Yet as of 2010, the total global environmental DNA sequencing effort was estimated to have produced less than 1 percent of the DNA found in a single liter of seawater or gram of soil. [2] The specific interactions between microbes are also largely unknown.

The EMP aims to process up to 200,000 samples from different biomes, creating a comprehensive database of microbes on Earth that characterizes environments and ecosystems by their microbial composition and interactions. [3]

Actors

The non-governmental international project was launched in 2010. As of January 2018, it listed 161 participating institutions, all of them universities or university-affiliated institutions except IBM Research and the Atlanta Zoo. Funding has come from the John Templeton Foundation, the W. M. Keck Foundation, Argonne National Laboratory (U.S. Department of Energy), the Australian Research Council, the Tula Foundation, and the Samuel Lawrence Foundation. Companies including MO BIO Laboratories, Luca Technologies, Eppendorf, Boreal Genomics, Illumina, Roche and Integrated DNA Technologies have provided in-kind support. [4]

Goals

The primary goal of the Earth Microbiome Project (EMP) has been to survey microbial composition in many environments across the planet, over time as well as space, using a standard set of protocols. [1] Standardized protocols reduce the variation and bias in analytical pipelines that complicate comparisons of microbial community structures. [5]

Another important goal is to determine how analytical biases affect the reconstruction of microbial communities. Because sequencing technology advances rapidly, it is necessary to understand how data generated with updated protocols compare with data collected using earlier techniques. Information from the project will be archived in a database to facilitate analysis. Other planned outputs include a global atlas of protein function and a catalog of reassembled genomes classified by their taxonomic distributions. [5]

Methods

Standard protocols for sampling, DNA extraction, 16S rRNA amplification, 18S rRNA amplification, and "shotgun" metagenomics have been developed or are under development. [6]

Sample collection

Samples will be collected, using methods appropriate to each environment, from habitats including the deep ocean, freshwater lakes, desert sand, and soil. Standardized collection protocols will be used when possible so that results are comparable. Because microbes from natural samples cannot always be cultured, metagenomic methods will be used to sequence all the DNA or RNA in a sample in a culture-independent fashion.

Wet lab

In the wet lab, a series of procedures is usually needed to select and purify the microbial portion of each sample, and the purification process can differ greatly with sample type. For example, DNA may be extracted directly from soil particles, or microbes may first be concentrated by filtration. Various amplification techniques may also be used to increase DNA yield; some researchers prefer multiple displacement amplification, which is not PCR-based. To avoid bias, DNA extraction, primer selection, and PCR conditions all need to follow carefully standardized protocols. [5]

Sequencing

Researchers can sequence a metagenomic sample using two main approaches, depending on the biological question. To identify the types and abundances of organisms present, the preferred approach is to target and amplify a specific gene that is highly conserved among the species of interest, typically the 16S ribosomal RNA gene for bacteria and the 18S ribosomal RNA gene for protists. This amplicon approach, called "deep sequencing" because each sample can be sequenced to great depth, allows rare species to be identified. However, it does not allow assembly of whole genomes, nor does it provide information on how organisms may interact with each other. The second approach is shotgun metagenomics, in which all the DNA in the sample is sheared and the fragments are sequenced. In principle, this approach allows whole microbial genomes to be assembled and metabolic relationships to be inferred. However, if most microbes in a given environment are uncharacterized, de novo assembly will be computationally expensive. [7]
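Why greater depth reveals rarer taxa can be made concrete with a simple sampling model. Assuming each read is drawn independently from a community in which a taxon has relative abundance p (an idealization that ignores PCR and primer bias), the probability of seeing at least one of its reads among n reads is 1 - (1 - p)^n. The sketch below inverts that expression to estimate the depth needed for a chosen detection probability; the helper names `detection_probability` and `reads_needed` are illustrative, not part of any EMP tool.

```python
import math


def detection_probability(p: float, n: int) -> float:
    """Probability that at least one of n independently sampled reads
    comes from a taxon with relative abundance p (idealized model)."""
    return 1.0 - (1.0 - p) ** n


def reads_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that detection_probability(p, n) >= confidence.

    Solving 1 - (1 - p)**n >= confidence gives
    n >= log(1 - confidence) / log(1 - p).
    """
    if not (0.0 < p < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    # log1p(-x) computes log(1 - x) accurately for small x.
    return math.ceil(math.log1p(-confidence) / math.log1p(-p))


if __name__ == "__main__":
    for abundance in (0.01, 0.001, 0.0001):
        n = reads_needed(abundance)
        print(f"relative abundance {abundance:g}: about {n:,} reads for 95% detection")
```

Under this model a taxon at 0.1% relative abundance needs 2,995 reads for 95% detection, and the required depth grows roughly as 3/p. Real amplicon data add amplification biases, so these figures are best read as optimistic lower bounds.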

Data analysis

The EMP proposes to standardize the bioinformatic steps of sample processing. [5]

Data analysis usually includes two steps: (1) data cleanup, a preprocessing step that removes reads with low quality scores and any sequences containing "N" or other ambiguous nucleotides; and (2) taxonomic assignment, which is usually done with tools such as BLAST [8] or RDP. [9] Very often, novel sequences are discovered that cannot be mapped to existing taxonomy. In that case, taxonomy is inferred from a phylogenetic tree built from the novel sequences and a pool of closely related known sequences. [10]
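The two steps can be sketched in a few lines. This is a minimal illustration, assuming reads arrive as (sequence, quality) pairs with Phred+33 quality strings and using a hypothetical mean-quality cutoff of 20. The taxonomy step is deliberately simplified: it labels a read with the reference that shares the largest fraction of its k-mers, which is neither BLAST nor the RDP classifier, and it returns `None` for reads too dissimilar from every reference, mirroring the novel sequences that would go on to phylogenetic placement. All function names and the reference sequences in the test are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

VALID_BASES = set("ACGT")


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def clean_reads(reads: List[Tuple[str, str]], min_mean_q: float = 20.0) -> List[str]:
    """Step 1: drop reads containing anything other than A/C/G/T
    ('N' or IUPAC ambiguity codes such as 'R' and 'Y') or whose mean
    quality falls below min_mean_q."""
    kept = []
    for seq, qual in reads:
        seq = seq.upper()
        if not seq or len(seq) != len(qual):
            continue  # malformed record
        if set(seq) - VALID_BASES:
            continue  # ambiguous nucleotide present
        if mean_phred(qual) < min_mean_q:
            continue  # low overall quality
        kept.append(seq)
    return kept


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_taxon(read: str, references: Dict[str, str], k: int = 8,
                 min_shared: float = 0.5) -> Optional[str]:
    """Step 2 (simplified): return the reference taxon sharing the largest
    fraction of the read's k-mers, or None if no reference reaches
    min_shared (a candidate novel sequence)."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None  # read shorter than k
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in references.items():
        score = len(read_kmers & kmers(ref_seq, k)) / len(read_kmers)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= min_shared else None
```

Filtering on any non-ACGT symbol is stricter than needed for some pipelines, which trim rather than discard reads; the all-or-nothing rule simply matches the description above.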

Additional methods may be used depending on the sequencing technology and the underlying biological question. For example, assembly will be required if the sequenced reads are too short to yield useful information on their own. Assembly can also be used to reconstruct whole genomes, providing useful information about the species present. Furthermore, to understand the metabolic relationships within a microbial metagenome, DNA sequences must be translated into amino acid sequences, for example using gene prediction tools such as GeneMark [11] or FragGeneScan. [12]
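What "translating DNA into amino acid sequences" involves can be shown with a naive six-frame open reading frame (ORF) search using the standard genetic code. This is only an illustration under simple assumptions: it requires an ATG start and a stop codon, ignores alternative start codons, and cannot handle the partial genes and sequencing errors typical of short metagenomic reads, which is why model-based tools such as GeneMark and FragGeneScan are used in practice. The function names are hypothetical.

```python
from itertools import product
from typing import Optional

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# in TCAG order, so the i-th codon maps to the i-th amino acid letter.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; stop codons become '*', unknown codons 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orf(seq: str, min_len: int = 30) -> Optional[str]:
    """Longest protein that starts with Met and ends at a stop codon,
    searched in all six reading frames; None if shorter than min_len."""
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Every segment except the last is terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best if len(best) >= min_len else None
```

For example, `longest_orf("ATGAAATTTGGGTAA", min_len=3)` returns `"MKFG"`: ATG (Met), AAA (Lys), TTT (Phe), GGG (Gly), then the TAA stop.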

Project output

The four key outputs from the EMP have been: [13]

Challenges

The large amounts of sequence data generated by analyzing diverse microbial communities are a challenge to store, organize and analyze. The problem is exacerbated by the short reads produced by the high-throughput sequencing platform that will be the EMP's standard instrument. Improved algorithms and analysis tools, very large amounts of computer storage, and access to thousands of hours of supercomputer time will be necessary. [7]

Another challenge is the large number of expected sequencing errors, which must be distinguished from genuine diversity in the collected microbial samples. [7] Next-generation sequencing technologies provide enormous throughput but lower accuracy than older sequencing methods. When a single genome is sequenced, this lower intrinsic accuracy is more than compensated for by covering the entire genome many times, in both directions and from multiple start points, so that errors in individual reads are outvoted by correct ones. This capability provides no such improvement when a diverse mixture of genomes is sequenced, because a rare variant may be either a sequencing error or a genuinely distinct organism.
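The single-genome case can be illustrated with a per-position majority vote over overlapping reads. The sketch assumes, hypothetically, that reads are already aligned without insertions or deletions and are given as (start position, sequence) pairs. It also reports the fraction of reads disagreeing with the majority at each position; in a mixed community that same number could come from errors or from a rare strain, and voting alone cannot tell the two apart.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input: (0-based start position, sequence), gap-free alignment.
AlignedRead = Tuple[int, str]


def pileup(reads: List[AlignedRead]) -> Dict[int, Counter]:
    """Count the bases observed at each reference position."""
    columns: Dict[int, Counter] = {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            columns.setdefault(start + offset, Counter())[base] += 1
    return columns


def majority_consensus(reads: List[AlignedRead]) -> str:
    """Per-position majority vote; assumes every position is covered."""
    columns = pileup(reads)
    return "".join(columns[pos].most_common(1)[0][0] for pos in sorted(columns))


def minority_fraction(reads: List[AlignedRead]) -> Dict[int, float]:
    """Share of reads disagreeing with the majority base at each position.
    In a single genome this is mostly sequencing error; in a community it
    may also reflect a rare strain."""
    fractions = {}
    for pos, counts in pileup(reads).items():
        total = sum(counts.values())
        fractions[pos] = 1.0 - counts.most_common(1)[0][1] / total
    return fractions
```

With reads `(0, "ACGTACGT")`, `(0, "ACGAACGT")`, `(2, "GTACGT")` and `(0, "ACGT")`, the error at position 3 in the second read is outvoted and the consensus is `"ACGTACGT"`, while the minority fraction at that position is 0.25.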

Even with standard protocols, systematic biases from lab to lab are expected. The need to amplify DNA from low-biomass samples will introduce additional distortions of the data. Assembling the genomes of even the dominant organisms in a diverse sample requires gigabytes of sequence data. [7]

With advances in high-throughput sequencing, many sequences are entering public databases with no experimentally determined function, annotated instead on the basis of observed homology with a known sequence. Ideally, a sequence of known function is used to annotate an unknown one. A problem that has become prevalent in public sequence databases, and which the EMP must avoid, is that this newly annotated but still uncharacterized sequence is then used to annotate a second unknown sequence, and so on. Because sequence homology is only a moderately reliable predictor of function, errors can accumulate along such chains. [14]
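One way to guard against such chains is to record where each annotation came from and measure how many homology-based transfers separate it from experimental evidence. The sketch below assumes a hypothetical data model in which each sequence ID maps to the ID it was annotated from, or to `None` if its function was determined experimentally; the IDs and function names are illustrative only.

```python
from typing import Dict, List, Optional


def transfer_depth(seq_id: str, source_of: Dict[str, Optional[str]]) -> int:
    """Count homology-based annotation transfers separating seq_id from an
    experimentally characterized sequence (source None, depth 0)."""
    depth, seen, current = 0, set(), seq_id
    while source_of[current] is not None:
        if current in seen:
            raise ValueError(f"circular annotation chain at {current!r}")
        seen.add(current)
        current = source_of[current]
        depth += 1
    return depth


def flag_indirect(source_of: Dict[str, Optional[str]], max_depth: int = 1) -> List[str]:
    """IDs whose annotation was inherited from another unverified sequence."""
    return sorted(s for s in source_of if transfer_depth(s, source_of) > max_depth)
```

For the chain `{"exp1": None, "unk1": "exp1", "unk2": "unk1", "unk3": "unk2"}`, only `unk1` was annotated directly from an experimentally characterized sequence, so `flag_indirect` returns `["unk2", "unk3"]`.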


Notes

  1. Gilbert, J. A.; Jansson, J. K.; Knight, R. (2014). "The Earth Microbiome project: successes and aspirations". BMC Biology. 12 (1): 69. doi:10.1186/s12915-014-0069-1. PMC 4141107. PMID 25184604.
  2. Gilbert, J. A.; Meyer, F.; Antonopoulos, D.; Balaji, P.; Brown, C. T.; Brown, C. T.; Desai, N.; Eisen, J. A.; Evers, D.; Field, D.; Feng, W.; Huson, D.; Jansson, J.; Knight, R.; Knight, J.; Kolker, E.; Konstantindis, K.; Kostka, J.; Kyrpides, N.; MacKelprang, R.; McHardy, A.; Quince, C.; Raes, J.; Sczyrba, A.; Shade, A.; Stevens, R. (2010). "Meeting Report: The Terabase Metagenomics Workshop and the Vision of an Earth Microbiome Project". Standards in Genomic Sciences. 3 (3): 243–248. doi:10.4056/sigs.1433550. PMC 3035311. PMID 21304727.
  3. Gilbert, J. A.; O'Dor, R.; King, N.; Vogel, T. M. (2011). "The importance of metagenomic surveys to microbial ecology: Or why Darwin would have been a metagenomic scientist". Microbial Informatics and Experimentation. 1 (1): 5. doi:10.1186/2042-5783-1-5. PMC 3348666. PMID 22587826.
  4. "The Earth Microbiome Project is a systematic attempt to characterize global microbial taxonomic and functional diversity for the benefit of the planet and humankind". Earth Microbiome Project. 2018. Archived 2020-05-11 at the Wayback Machine. Retrieved 3 January 2018.
  5. Gilbert, J. A.; Meyer, F. (2012). "Modeling the Earth Microbiome". Microbe Magazine. 7 (2): 64–69. doi:10.1128/microbe.7.64.1.
  6. "Earth Microbiome Project / Standard Protocols". Archived from the original on 2012-03-16. Retrieved 2012-03-07.
  7. Jansson, Janet (2011). "Towards 'Tera-Terra': Terabase Sequencing of Terrestrial Metagenomes". Microbe Magazine. 6 (7): 309–315. doi:10.1128/microbe.6.309.1.
  8. "BLAST: Basic Local Alignment Search Tool". Archived from the original on 2011-08-09. Retrieved 2012-03-05.
  9. "Ribosomal Database Project". Archived from the original on 2020-08-19. Retrieved 2012-03-06.
  10. Meyer, F.; Paarmann, D.; d'Souza, M.; Olson, R.; Glass, E. M.; Kubal, M.; Paczian, T.; Rodriguez, A.; Stevens, R.; Wilke, A.; Wilkening, J.; Edwards, R. A. (2008). "The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes". BMC Bioinformatics. 9: 386. doi:10.1186/1471-2105-9-386. PMC 2563014. PMID 18803844.
  11. "GeneMark - Free gene prediction software". Archived from the original on 2013-10-30. Retrieved 2012-03-06.
  12. "FragGeneScan". Archived from the original on 2019-09-17. Retrieved 2012-03-06.
  13. "Earth Microbiome Project / Defining the Tasks". Archived from the original on 2012-03-16. Retrieved 2012-03-07.
  14. Gilbert, J. A.; Dupont, C. L. (2011). "Microbial Metagenomics: Beyond the Genome". Annual Review of Marine Science. 3: 347–371. Bibcode:2011ARMS....3..347G. doi:10.1146/annurev-marine-120709-142811. PMID 21329209.
