Microbial dark matter [1] [2] (MDM) comprises the vast majority of microbial organisms (usually bacteria and archaea) that microbiologists are unable to culture in the laboratory, owing to a lack of knowledge of, or inability to supply, the required growth conditions. Microbial dark matter is analogous to the dark matter of physics and cosmology because of its elusiveness in research and its importance to our understanding of biological diversity. It is ubiquitous and abundant across multiple ecosystems, but remains difficult to study because these species are hard to detect and culture. [3] Its relative magnitude is difficult to estimate, but the accepted gross estimate is that as little as one percent of microbial species in a given ecological niche are culturable. In recent years, more effort has been directed towards deciphering microbial dark matter by recovering genomic DNA sequences from environmental samples via culture-independent methods such as single-cell genomics [4] and metagenomics. [5] These studies have enabled insights into the evolutionary history and the metabolism of the sequenced genomes, [6] [7] providing valuable knowledge required for the cultivation of microbial dark matter lineages. However, microbial dark matter research remains comparatively undeveloped. It is hypothesized to provide insight into processes radically different from known biology, a new understanding of microbial communities, and a better understanding of how life survives in extreme environments. [8]
Our contemporary understanding of microbial dark matter emerged from a field that was still constrained by the difficulty of cultivating even conventional microbes. One of the main constraints of that time was an overdependence on culturing methods, which meant that a large amount of microbial diversity remained undiscovered. In the late 20th century, however, new developments in molecular techniques led to a surge in the discovery of uncultured microbes. Despite this newfound diversity, a large majority of microbial species remain uncharacterized. [9] This was further supported by the development of advanced genomic sequencing techniques in the early 21st century, which uncovered more microbial diversity than had previously been thought to exist. [8]
Metagenomics is a technique in microbial studies that makes it possible to sequence DNA directly from samples of microbial environments. It allows the genetic material of unknown microbes to be identified without relying on culturing. Metagenomics differs from other microbial methods in that it characterizes a community broadly by analyzing bulk samples. This technique has expanded our understanding of microbial functions in ecosystems through the discovery of new genes and metabolic pathways. [10]
Single-cell genomics has shown promise in complementing metagenomic approaches by allowing the study of individual microbial cells isolated from their natural environments. It has been employed to uncover the genomic and functional diversity within microbial communities, particularly among organisms that cannot be cultured. Single-cell techniques have also identified numerous new branches on the tree of life, providing insight into gaps in the current phylogenetic understanding and into the metabolic potential of these organisms. [11]
Despite the success of culture-independent methods in dark matter research, improvements in culturing techniques remain both relevant and necessary to further the current understanding of MDM microbes. Developments such as highly specific growth media that mimic natural microbial environments and the co-culturing of synergistic microbial species have shown success in studying previously unculturable microbes. These advancements also facilitate the application of MDM research to biotechnological and physiological uses. [12]
Genomic studies produce vast amounts of data, and analyzing these data requires advanced computational tools. The scientific subdiscipline of bioinformatics uses computational technology to assemble genomes and analyze metabolic pathways. In recent years, research on artificial intelligence and machine learning has produced new ways to predict the behavior of microbial species from their genetic data. [13] These developments in computational tools have furthered our understanding of the structure and dynamics of microbial communities.
It has been suggested that certain microbial dark matter genetic material could belong to a new (i.e., fourth) domain of life, [14] [15] although other explanations (e.g., viral origin) are also possible. This question is tied to the issue of a hypothetical shadow biosphere. [16]
Genomics is an interdisciplinary field of molecular biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dimensional structural configuration. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of all of an organism's genes, their interrelations and their influence on the organism. Genes may direct the production of proteins with the assistance of enzymes and messenger molecules. In turn, proteins make up body structures such as organs and tissues, control chemical reactions, and carry signals between cells. Genomics also involves the sequencing and analysis of genomes through the use of high-throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes. Advances in genomics have triggered a revolution in discovery-based research and systems biology to facilitate understanding of even the most complex biological systems, such as the brain.
Metagenomics is the study of genetic material recovered directly from environmental or clinical samples by a method called sequencing. The broad field may also be referred to as environmental genomics, ecogenomics, community genomics or microbiomics.
The Human Microbiome Project (HMP) was a United States National Institutes of Health (NIH) research initiative to improve understanding of the microbiota involved in human health and disease. Launched in 2007, the first phase (HMP1) focused on identifying and characterizing human microbiota. The second phase, known as the Integrative Human Microbiome Project (iHMP), was launched in 2014 with the aim of generating resources to characterize the microbiome and elucidating the roles of microbes in health and disease states. The program received $170 million in funding from the NIH Common Fund from 2007 to 2016.
Microbiota are the range of microorganisms that may be commensal, mutualistic, or pathogenic found in and on all multicellular organisms, including plants. Microbiota include bacteria, archaea, protists, fungi, and viruses, and have been found to be crucial for immunologic, hormonal, and metabolic homeostasis of their host.
Nanohaloarchaea is a clade of diminutive archaea with small genomes and limited metabolic capabilities, belonging to the DPANN archaea. They are ubiquitous in hypersaline habitats, which they share with the extremely halophilic haloarchaea.
The Earth Microbiome Project (EMP) was an initiative founded by Janet Jansson, Jack Gilbert, and Rob Knight in 2010 to collect natural samples and analyze microbial life around the globe.
Biological dark matter is an informal term for unclassified or poorly understood genetic material. The term may refer to genetic material produced by unclassified microorganisms. By extension, it may also refer to the un-isolated microorganisms whose existence can only be inferred from the genetic material that they produce. Some of the genetic material may not fall under the three existing domains of life: Bacteria, Archaea and Eukaryota; thus, it has been suggested that a possible fourth domain of life may yet be discovered, although other explanations are also probable. Alternatively, the term may refer to non-coding DNA and non-coding RNA produced by known organisms.
In metagenomics, binning is the process of grouping reads or contigs and assigning them to individual genomes. Binning methods can be based on compositional features, on alignment (similarity), or on both.
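Composition-based binning can be illustrated with a minimal Python sketch. It assumes that tetranucleotide (4-mer) frequencies are used as the compositional feature and that contigs are grouped greedily by Euclidean distance between their frequency profiles. The function names, the distance threshold and the toy contigs are hypothetical and chosen only for illustration; practical binning tools use more robust statistics and typically combine composition with coverage information.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=4):
    """Return the normalized frequency of every k-mer over the ACGT alphabet."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other characters (e.g. N) are ignored.
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def euclidean(a, b):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bin_contigs(contigs, threshold=0.2, k=4):
    """Greedy composition-based binning (illustrative only).

    Each contig joins the first bin whose representative profile lies
    within `threshold`; otherwise it starts a new bin.
    """
    bins = []  # list of (representative_profile, member_names)
    for name, seq in contigs.items():
        profile = kmer_profile(seq, k)
        for representative, members in bins:
            if euclidean(representative, profile) <= threshold:
                members.append(name)
                break
        else:
            bins.append((profile, [name]))
    return [members for _, members in bins]


# Toy contigs: two AT-rich and two GC-rich synthetic sequences.
contigs = {
    "c1": "ATATATTAAT" * 20,
    "c2": "GCGCGGCCGC" * 20,
    "c3": "TATTAATATA" * 20,
    "c4": "GGCCGCGCGC" * 20,
}
print(bin_contigs(contigs))
```

In this toy input the AT-rich and GC-rich contigs have no 4-mers in common, so their profiles are far apart and they fall into separate bins. Real genomes differ far more subtly, which is why compositional signals are usually combined with alignment-based evidence.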
Microbial phylogenetics is the study of the manner in which various groups of microorganisms are genetically related, which helps to trace their evolution. To study these relationships, biologists rely on comparative genomics, as physiology and comparative anatomy are not feasible methods.
Viral metagenomics uses metagenomic technologies to detect viral genomic material from diverse environmental and clinical samples. Viruses are the most abundant biological entity and are extremely diverse; however, only a small fraction of viruses have been sequenced, and an even smaller fraction have been isolated and cultured. Sequencing viruses can be challenging because viruses lack a universally conserved marker gene, so gene-based approaches are limited. Metagenomics can be used to study and analyze unculturable viruses and has been an important tool in understanding viral diversity and abundance and in the discovery of novel viruses. For example, metagenomic methods have been used to describe viruses associated with cancerous tumors and viruses in terrestrial ecosystems.
Mark J. Pallen is a research leader at the Quadram Institute and Professor of Microbial Genomics at the University of East Anglia. In recent years, he has been at the forefront of efforts to apply next-generation sequencing to problems in microbiology and ancient DNA research.
A microbiome is the community of microorganisms that can usually be found living together in any given habitat. It was defined more precisely in 1988 by Whipps et al. as "a characteristic microbial community occupying a reasonably well-defined habitat which has distinct physio-chemical properties. The term thus not only refers to the microorganisms involved but also encompasses their theatre of activity". In 2020, an international panel of experts published the outcome of their discussions on the definition of the microbiome. They proposed a definition of the microbiome based on a revival of the "compact, clear, and comprehensive description of the term" as originally provided by Whipps et al., but supplemented with two explanatory paragraphs, the first pronouncing the dynamic character of the microbiome, and the second clearly separating the term microbiota from the term microbiome.
Metatranscriptomics is the set of techniques used to study gene expression of microbes within natural environments, i.e., the metatranscriptome.
Parvarchaeota is a phylum of archaea belonging to the DPANN archaea. They were discovered in acid mine drainage waters and later in marine sediments. The cells of these organisms are extremely small, consistent with their small genomes. The phylum was identified using metagenomic techniques, which allow genomic sequences to be obtained from non-cultured organisms.
Virome refers to the assemblage of viruses that is often investigated and described by metagenomic sequencing of viral nucleic acids found in association with a particular ecosystem, organism or holobiont. The word is frequently used to describe environmental viral shotgun metagenomes. Viruses, including bacteriophages, are found in all environments, and studies of the virome have provided insights into nutrient cycling, the development of immunity, and the role of viruses as a major source of genes through lysogenic conversion. The human virome has also been characterized in nine organs of 31 Finnish individuals using qPCR and NGS methodologies.
DPANN is a superphylum of Archaea first proposed in 2013. Many members show novel signs of horizontal gene transfer from other domains of life. They are known as nanoarchaea or ultra-small archaea due to their smaller size (nanometric) compared to other archaea.
Nikos Kyrpides is a Greek-American bioscientist who has worked on the origins of life, information processing, bioinformatics, microbiology, metagenomics and microbiome data science. He is a senior staff scientist at the Lawrence Berkeley National Laboratory and head of the Prokaryote Super Program, and leads the Microbiome Data Science program at the US Department of Energy Joint Genome Institute.
The candidate phyla radiation is a large evolutionary radiation of bacterial lineages whose members are mostly uncultivated and known only from metagenomics and single-cell sequencing. They have been described as nanobacteria or ultra-small bacteria due to their reduced size (nanometric) compared to other bacteria.
Gracilibacteria is a bacterial candidate phylum formerly known as GN02, BD1-5, or SN-2. It is part of the Candidate Phyla Radiation and the Patescibacteria group.
Culturomics is the high-throughput cell culture of bacteria that aims to comprehensively identify strains or species in samples obtained from tissues such as the human gut or from the environment. This approach was conceived as an alternative, complementary method to metagenomics, which relies on the presence of homologous sequences to identify new bacteria. Due to the limited phylogenetic information available on bacteria, metagenomic data generally contain large amounts of "microbial dark matter", sequences of unknown origin. Culturomics fills some of these gaps, with the added advantage of enabling the functional study of the generated cultures. Its main drawback is that many bacterial species remain effectively uncultivable until their growth conditions are better understood. The culturomics approach has therefore been optimized by improving culture conditions.