Earth Microbiome Project


Formation: 2010
Website: earthmicrobiome.org

The Earth Microbiome Project (EMP) was an initiative founded by Janet Jansson, Jack Gilbert, and Rob Knight in 2010 to collect natural samples and analyze microbial life from around the world. [1]

The EMP set out to process up to 200,000 samples in different biomes, creating a database of microbes on Earth to characterize environments and ecosystems by microbial composition and interaction. [2]

As of July 2024, the EMP website had not been updated in years, and the project is believed to have closed. [3]

Actors

As of January 2018, the EMP listed 161 institutions, all of which were universities or university-affiliated institutions except for IBM Research and the Atlanta Zoo. Funding has come from the John Templeton Foundation, the W. M. Keck Foundation, Argonne National Laboratory (U.S. Department of Energy), the Australian Research Council, the Tula Foundation, and the Samuel Lawrence Foundation. Companies that have provided in-kind support include MO BIO Laboratories, Luca Technologies, Eppendorf, Boreal Genomics, Illumina, Roche, and Integrated DNA Technologies. [4]

Goals

The primary goal of the Earth Microbiome Project (EMP) has been to survey microbial composition in many environments across the planet, across time as well as space, using a standard set of protocols. [1] Standardized protocols reduce the variation and bias in analytical pipelines that complicate comparisons of microbial community structures. [5] [6]

Another important goal is to determine how analytic biases affect the reconstruction of microbial communities. Sequencing technology advances rapidly, so it is necessary to understand how data generated with updated protocols compare with data collected using earlier techniques. Information from this project will be archived in a database to facilitate analysis. Other outputs will include a global atlas of protein function and a catalog of reassembled genomes classified by their taxonomic distributions. [5]

Methods

Standard protocols for sampling, DNA extraction, 16S rRNA amplification, 18S rRNA amplification, and "shotgun" metagenomics have been developed. [7]

Sample collection

Samples will be collected, using methods appropriate to each environment, from habitats including the deep ocean, freshwater lakes, desert sand, and soil. Standardized collection protocols will be used when possible so that the results are comparable. Many microbes in natural samples cannot be cultured in the laboratory, so metagenomic methods will be used to sequence all the DNA or RNA in a sample in a culture-independent fashion.

Wet lab

The wet lab is used to perform a series of procedures that select and purify the microbial portion of the samples. The purification process varies according to the type of sample: for example, DNA may be extracted directly from soil particles, or microbes may be concentrated using filtration techniques. In addition, various amplification techniques may be used to increase DNA yield; some researchers prefer multiple displacement amplification (MDA), which does not rely on PCR. To avoid bias, DNA extraction, primer selection, and PCR must all follow carefully standardized protocols. [5]

Sequencing

Researchers can sequence a metagenomic sample using two main approaches, depending on the biological question. To identify the types and abundances of organisms present, the preferred approach is to target and amplify a specific gene that is highly conserved among the species of interest, most often the 16S ribosomal RNA gene for bacteria and the 18S ribosomal RNA gene for protists. This targeted approach, known as amplicon sequencing or "deep sequencing", allows rare species to be identified in a sample. However, it does not enable assembly of whole genomes, nor does it provide information on how organisms may interact with each other. The second approach is shotgun metagenomics, in which all the DNA in the sample is sheared and the fragments are sequenced. In principle, this approach allows for the assembly of whole microbial genomes and inference of metabolic relationships. However, if most microbes in a given environment are uncharacterized, de novo assembly will be computationally expensive. [8]
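The targeted approach depends on PCR primers that bind conserved sites flanking a more variable stretch of the marker gene. As a rough illustration, the sketch below performs an in-silico "amplification": it finds a forward primer and the binding site of a reverse primer, allowing IUPAC degenerate bases (e.g. Y = C or T). The primers and template are toy sequences chosen for the example, not the EMP's actual primers, and the sketch only accepts exact degenerate matches on one strand, with no mismatch tolerance.

```python
from typing import Optional

# IUPAC nucleotide codes and the bases each one can match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence, degenerate codes included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_matches(primer: str, site: str) -> bool:
    """True if every template base is allowed by the primer base at that position."""
    return len(primer) == len(site) and all(
        base in IUPAC[p] for p, base in zip(primer, site)
    )


def find_primer(template: str, primer: str, start: int = 0) -> int:
    """Index of the first degenerate-aware exact match at or after `start`, or -1."""
    k = len(primer)
    for i in range(start, len(template) - k + 1):
        if primer_matches(primer, template[i:i + k]):
            return i
    return -1


def extract_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Region from the forward primer through the reverse primer's binding site.

    The reverse primer is written 5'->3' on the opposite strand, so the
    template is searched for its reverse complement downstream of `fwd`.
    """
    template, fwd = template.upper(), fwd.upper()
    i = find_primer(template, fwd)
    if i < 0:
        return None
    rev_site = reverse_complement(rev)
    j = find_primer(template, rev_site, i + len(fwd))
    if j < 0:
        return None
    return template[i:j + len(rev_site)]


# Toy template: filler, forward site, insert, reverse-primer binding site, filler.
template = "AAAAGTGCCACCCGGGTCCAGTTTTT"
print(extract_amplicon(template, fwd="GTGYCA", rev="ACTRGA"))  # GTGCCACCCGGGTCCAGT
```

Real primer-evaluation tools also tolerate mismatches and scan both strands; this version keeps only the core idea that a conserved primer pair selects the same region from every organism it matches.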

Data analysis

EMP proposes to standardize the bioinformatics aspects of sample processing. [5]

Data analysis usually includes the following steps: (1) data cleanup, a preprocessing step that removes reads with low quality scores as well as any sequences containing "N" or other ambiguous nucleotides; and (2) taxonomic assignment, which is usually done with tools such as BLAST [9] or RDP. [10] Novel sequences that cannot be mapped to existing taxonomy are frequently discovered. In such cases, taxonomy is inferred from a phylogenetic tree built from the novel sequences together with a pool of closely related known sequences. [11]
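A minimal sketch of the two steps, assuming FASTQ-style quality strings in Phred+33 encoding and a hypothetical mean-quality cutoff of 20. Step 2 is deliberately simplified: it labels each read with the reference sharing the largest fraction of its 8-mers. This is not how BLAST or RDP work internally; it only shows the general idea of matching reads against labeled references and leaving poor matches unassigned. Reference names and sequences are made up.

```python
# Hypothetical cutoff for this sketch; real pipelines tune it per platform.
MIN_MEAN_QUALITY = 20.0


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(c) - offset for c in qual]


def passes_cleanup(seq: str, qual: str, min_mean_q: float = MIN_MEAN_QUALITY) -> bool:
    """Step 1: reject reads with ambiguous bases (N etc.) or low mean quality."""
    if not seq or len(seq) != len(qual):
        return False
    if any(base not in "ACGT" for base in seq.upper()):
        return False
    scores = phred_scores(qual)
    return sum(scores) / len(scores) >= min_mean_q


def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_taxonomy(read: str, references: dict[str, str], k: int = 8,
                    min_shared: float = 0.5) -> str:
    """Step 2 (simplified): label a read with the reference sharing most k-mers.

    Reads whose best match shares fewer than `min_shared` of their k-mers stay
    "unassigned", standing in for novel sequences that would instead need
    phylogenetic placement.
    """
    read_kmers = kmers(read.upper(), k)
    if not read_kmers:
        return "unassigned"
    best_label, best_score = "unassigned", 0.0
    for label, ref_seq in references.items():
        score = len(read_kmers & kmers(ref_seq.upper(), k)) / len(read_kmers)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_shared else "unassigned"


references = {  # toy reference sequences with made-up labels
    "Taxon_A": "ACGTACGTTGCAAGGCTTAACCGGTA",
    "Taxon_B": "TTTTGGGGCCCCAAAATTTTGGGGCC",
}
reads = [
    ("ACGTACGTTGCAAGG", "IIIIIIIIIIIIIII"),  # clean, matches Taxon_A
    ("ACGTNCGTTGCAAGG", "IIIIIIIIIIIIIII"),  # contains N, removed in step 1
    ("GATCGATCGATCGAT", "IIIIIIIIIIIIIII"),  # clean, no close reference
]
for seq, qual in reads:
    if passes_cleanup(seq, qual):
        print(seq, assign_taxonomy(seq, references))
```

The "unassigned" outcome is the point at which, as described above, a real pipeline would fall back on building a phylogenetic tree with closely related known sequences.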

Additional methods may be employed depending on the sequencing technology and the underlying biological question. For example, assembly is required if the sequenced reads are too short to yield useful information on their own. Assembly can also be used to reconstruct whole genomes, providing useful information on the species present. Furthermore, if the metabolic relationships within a microbial metagenome are to be understood, protein-coding regions must be identified and translated into amino acid sequences, for example using gene prediction tools such as GeneMark [12] or FragGeneScan. [13]
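GeneMark and FragGeneScan rely on statistical models of coding sequence, and FragGeneScan also accounts for sequencing errors in short reads. The sketch below does not reproduce either tool. It shows only the basic step they build on: translating DNA with the standard genetic code (NCBI table 1) and reporting ATG-to-stop open reading frames in the three forward frames. The minimum ORF length is an arbitrary assumption, and the reverse strand is ignored for brevity.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks a stop, 'X' an ambiguous codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq: str, min_aa: int = 3) -> list[str]:
    """Proteins from ATG...stop open reading frames in the three forward frames."""
    orfs = []
    for frame in range(3):
        protein = translate(seq[frame:])
        start = None
        for i, aa in enumerate(protein):
            if aa == "M" and start is None:
                start = i  # first start codon opens a candidate ORF
            elif aa == "*" and start is not None:
                if i - start >= min_aa:
                    orfs.append(protein[start:i])
                start = None  # stop codon closes it
    return orfs


print(translate("ATGTAA"))               # M*
print(find_orfs("CCATGGCTAAATGGTAGCC"))  # ['MAKW']
```

On short metagenomic reads most genes are truncated at one or both ends, which is one reason dedicated fragment-aware predictors are used instead of a simple ORF scan like this one.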

Project output

The four key outputs from the EMP have been: [14]

Challenges

The large amounts of sequence data generated by analyzing diverse microbial communities are challenging to store, organize, and analyze. The problem is exacerbated by the short reads produced by the high-throughput sequencing platform that will be the standard instrument used in the EMP. Improved algorithms and analysis tools, large amounts of computer storage, and access to thousands of hours of supercomputer time will be necessary. [8]

Another challenge is the large number of sequencing errors expected, and distinguishing them from actual diversity in the collected microbial samples. [8] Next-generation sequencing technologies provide enormous throughput but lower accuracy than older sequencing methods. When a single genome is sequenced, the lower intrinsic accuracy of these methods is more than compensated for by covering the entire genome many times, in both directions and from many start points, so that errors in individual reads are outvoted. This redundancy helps far less with a diverse mixture of genomes, because the reads are divided among many organisms and rare genomes may be covered only once or not at all.
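The coverage argument can be made concrete with the Lander-Waterman estimate, where mean coverage is reads × read length / genome size, and the Poisson approximation that a fraction e^(-coverage) of bases is never sequenced. The sketch assumes each organism contributes reads in proportion to its relative abundance and that reads land uniformly at random; the read count, read length, and genome size are illustrative values only.

```python
import math


def expected_coverage(total_reads: int, read_length: int, genome_size: int,
                      abundance: float = 1.0) -> float:
    """Lander-Waterman mean coverage, scaled by the organism's share of the reads."""
    return abundance * total_reads * read_length / genome_size


def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-coverage)


# Illustrative run: 10 million 150-bp reads, 5-Mbp genomes (assumed values).
reads, length, genome = 10_000_000, 150, 5_000_000
for abundance in (1.0, 0.01, 0.001):
    c = expected_coverage(reads, length, genome, abundance)
    print(f"abundance {abundance:>6}: {c:7.1f}x coverage, "
          f"{fraction_covered(c):.1%} of genome covered")
```

With these numbers, a genome sequenced alone receives about 300x coverage, ample for a consensus to correct individual read errors, while a genome at 0.1% abundance receives about 0.3x, so roughly three-quarters of its bases are never read and the rest are mostly seen once, leaving no consensus to separate errors from true variants.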

Despite the issuance of standard protocols, systematic biases from lab to lab are expected. The need to amplify DNA from samples with low biomass will introduce additional distortions of the data. Assembly of genomes of even the dominant organisms in a diverse sample of organisms requires gigabytes of sequence data. [8]

With the advancement of high-throughput sequencing technologies, many sequences are entering public databases with no experimentally determined function; instead, they have been annotated on the basis of observed homology with a known sequence. Ideally, a sequence of known function is used to annotate an unknown sequence. However, a problem that has become prevalent in public sequence databases, and that the EMP must avoid, is that this first, homology-annotated sequence is then used to annotate a second unknown sequence, and so on, so that errors propagate along the chain. Sequence homology is only a modestly reliable predictor of function. [15]
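One way to picture the problem is as a chain of annotation transfers. The sketch below assumes a hypothetical record of which sequence each annotation was copied from (`annotated_from`) and a set of experimentally characterized sequences; it counts how many homology transfers separate each sequence from experimental evidence and flags those more than one step away. The identifiers and the one-step threshold are illustrative assumptions, not an EMP procedure.

```python
from typing import Optional


def annotation_depth(seq_id: str, annotated_from: dict[str, str],
                     experimental: set[str]) -> Optional[int]:
    """Number of homology transfers between `seq_id` and experimental evidence.

    0 = experimentally characterized, 1 = annotated directly from such a
    sequence, >1 = annotated from another inferred annotation. Returns None if
    the chain is broken or circular and never reaches experimental evidence.
    """
    depth, current, seen = 0, seq_id, set()
    while current not in experimental:
        if current in seen or current not in annotated_from:
            return None
        seen.add(current)
        current = annotated_from[current]
        depth += 1
    return depth


def flag_transitive(annotated_from: dict[str, str], experimental: set[str],
                    max_depth: int = 1) -> list[str]:
    """Sequences whose annotation is more than `max_depth` transfers from evidence."""
    flagged = []
    for seq_id in annotated_from:
        depth = annotation_depth(seq_id, annotated_from, experimental)
        if depth is None or depth > max_depth:
            flagged.append(seq_id)
    return flagged


# Hypothetical IDs: only P1 has an experimentally determined function.
experimental = {"P1"}
annotated_from = {"U1": "P1", "U2": "U1", "U3": "U2"}
print(flag_transitive(annotated_from, experimental))  # ['U2', 'U3']
```

Tracking provenance this way does not make homology a better predictor of function, but it keeps an inferred annotation from silently becoming the evidence for the next one.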


References

  1. Gilbert, J. A.; Jansson, J. K.; Knight, R. (2014). "The Earth Microbiome project: successes and aspirations". BMC Biology. 12 (1): 69. doi:10.1186/s12915-014-0069-1. PMC 4141107. PMID 25184604.
  2. Gilbert, J. A.; O'Dor, R.; King, N.; Vogel, T. M. (2011). "The importance of metagenomic surveys to microbial ecology: Or why Darwin would have been a metagenomic scientist". Microbial Informatics and Experimentation. 1 (1): 5. doi:10.1186/2042-5783-1-5. PMC 3348666. PMID 22587826.
  3. "Home". Earth Microbiome Project. 11 July 2024. Retrieved 11 July 2024.
  4. "The Earth Microbiome Project is a systematic attempt to characterize global microbial taxonomic and functional diversity for the benefit of the planet and humankind". Earth Microbiome Project. 2018. Archived 11 May 2020 at the Wayback Machine. Retrieved 3 January 2018.
  5. Gilbert, J. A.; Meyer, F. (2012). "Modeling the Earth Microbiome". Microbe Magazine. 7 (2): 64–69. doi:10.1128/microbe.7.64.1.
  6. Thompson, Luke R.; et al. (2017). "A communal catalogue reveals Earth's multiscale microbial diversity". Nature. 551 (7681): 457–463. Bibcode:2017Natur.551..457T. doi:10.1038/nature24621. PMC 6192678. PMID 29088705.
  7. "Earth Microbiome Project / Standard Protocols". Archived from the original on 16 March 2012. Retrieved 7 March 2012.
  8. Jansson, Janet (2011). "Towards 'Tera-Terra': Terabase Sequencing of Terrestrial Metagenomes". Microbe Magazine. 6 (7): 309–315. doi:10.1128/microbe.6.309.1. OSTI 1051845.
  9. "BLAST: Basic Local Alignment Search Tool". Archived from the original on 9 August 2011. Retrieved 5 March 2012.
  10. "Ribosomal Database Project". Archived from the original on 19 August 2020. Retrieved 6 March 2012.
  11. Meyer, F.; Paarmann, D.; d'Souza, M.; Olson, R.; Glass, E. M.; Kubal, M.; Paczian, T.; Rodriguez, A.; Stevens, R.; Wilke, A.; Wilkening, J.; Edwards, R. A. (2008). "The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes". BMC Bioinformatics. 9: 386. doi:10.1186/1471-2105-9-386. PMC 2563014. PMID 18803844.
  12. "GeneMark – Free gene prediction software". Archived from the original on 30 October 2013. Retrieved 6 March 2012.
  13. "FragGeneScan". Archived from the original on 17 September 2019. Retrieved 6 March 2012.
  14. "Earth Microbiome Project / Defining the Tasks". Archived from the original on 16 March 2012. Retrieved 7 March 2012.
  15. Gilbert, J. A.; Dupont, C. L. (2011). "Microbial Metagenomics: Beyond the Genome". Annual Review of Marine Science. 3: 347–371. Bibcode:2011ARMS....3..347G. doi:10.1146/annurev-marine-120709-142811. PMID 21329209.