An operational taxonomic unit (OTU) is an operational definition used to classify groups of closely related individuals. The term was originally introduced in 1963 by Robert R. Sokal and Peter H. A. Sneath in the context of numerical taxonomy, where an "operational taxonomic unit" is simply the group of organisms currently being studied. [1] Numerical taxonomy is a method in biological systematics that involves using numerical techniques to classify taxonomic units based on the states of their characteristics. [2] In this sense, an OTU is a pragmatic definition to group individuals by similarity, equivalent to but not necessarily in line with classical Linnaean taxonomy or modern evolutionary taxonomy.
OTUs are employed in microbial community DNA sequencing research to delineate species-level distinctions among organisms and represent the most frequently utilized unit for measuring microbial diversity. [3] Nowadays, however, the term "OTU" is commonly used in a different context and refers to clusters of (uncultivated or unknown) organisms, grouped by DNA sequence similarity of a specific taxonomic marker gene (originally coined as mOTU; molecular OTU). [4] In other words, OTUs are pragmatic proxies for "species" (microbial or metazoan) at different taxonomic levels, in the absence of traditional systems of biological classification as are available for macroscopic organisms. For several years, OTUs have been the most commonly used units of diversity, especially when analysing small subunit 16S (for prokaryotes) or 18S rRNA (for eukaryotes [5] ) marker gene sequence datasets.
Sequences can be clustered according to their similarity to one another, and operational taxonomic units are defined based on the similarity threshold (usually 97% similarity; however also 100% similarity is common, also known as single variants [6] ) set by the researcher. It remains debatable how well this commonly-used method recapitulates true microbial species phylogeny or ecology. Although OTUs can be calculated differently when using different algorithms or thresholds, research by Schmidt et al. (2014) demonstrated that microbial OTUs were generally ecologically consistent across habitats and several OTU clustering approaches. [7] The number of OTUs defined may be inflated due to errors in DNA sequencing. [8]
There are three main approaches to clustering OTUs: [9]
In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. It can be performed on the entire genome, transcriptome or proteome of an organism, and can also involve only selected segments or regions, like tandem repeats and transposable elements. Methodologies used include sequence alignment, searches against biological databases, and others.
In molecular biology, a hybridization probe (HP) is a fragment of DNA or RNA, usually 15–10000 nucleotides long, which can be radioactively or fluorescently labeled. HPs can be used to detect the presence of nucleotide sequences in analyzed RNA or DNA that are complementary to the sequence in the probe. The labeled probe is first denatured into single stranded DNA (ssDNA) and then hybridized to the target ssDNA or RNA immobilized on a membrane or in situ.
Metagenomics is the study of genetic material recovered directly from environmental or clinical samples by a method called sequencing. The broad field may also be referred to as environmental genomics, ecogenomics, community genomics or microbiomics.
16S ribosomal RNA is the RNA component of the 30S subunit of a prokaryotic ribosome. It binds to the Shine-Dalgarno sequence and provides most of the SSU structure.
The Human Microbiome Project (HMP) was a United States National Institutes of Health (NIH) research initiative to improve understanding of the microbiota involved in human health and disease. Launched in 2007, the first phase (HMP1) focused on identifying and characterizing human microbiota. The second phase, known as the Integrative Human Microbiome Project (iHMP) launched in 2014 with the aim of generating resources to characterize the microbiome and elucidating the roles of microbes in health and disease states. The program received $170 million in funding by the NIH Common Fund from 2007 to 2016.
Microbiota are the range of microorganisms that may be commensal, mutualistic, or pathogenic found in and on all multicellular organisms, including plants. Microbiota include bacteria, archaea, protists, fungi, and viruses, and have been found to be crucial for immunologic, hormonal, and metabolic homeostasis of their host.
In molecular biology, and more importantly high-throughput DNA sequencing, a chimera is a single DNA sequence originating when multiple transcripts or DNA sequences get joined. Chimeras can be considered artifacts and be filtered out from the data during processing to prevent spurious inferences of biological variation. However, chimeras should not be confused with chimeric reads, who are generally used by structural variant callers to detect structural variation events and are not always an indication of the presence of a chimeric transcript or gene.
Rare biosphere refers to a large number of rare species of microbial life, i.e. bacteria, archaea and fungi, that can be found in very low concentrations in an environment.
DNA barcoding is a method of species identification using a short section of DNA from a specific gene or genes. The premise of DNA barcoding is that by comparison with a reference library of such DNA sections, an individual sequence can be used to uniquely identify an organism to species, just as a supermarket scanner uses the familiar black stripes of the UPC barcode to identify an item in its stock against its reference database. These "barcodes" are sometimes used in an effort to identify unknown species or parts of an organism, simply to catalog as many taxa as possible, or to compare with traditional taxonomy in an effort to determine species boundaries.
The Earth Microbiome Project (EMP) was an initiative founded by Janet Jansson, Jack Gilbert, and Rob Knight in 2010 to collect natural samples and analyze microbial life around the globe.
Community fingerprinting is a set of molecular biology techniques that can be used to quickly profile the diversity of a microbial community. Rather than directly identifying or counting individual cells in an environmental sample, these techniques show how many variants of a gene are present. In general, it is assumed that each different gene variant represents a different type of microbe. Community fingerprinting is used by microbiologists studying a variety of microbial systems to measure biodiversity or track changes in community structure over time. The method analyzes environmental samples by assaying genomic DNA. This approach offers an alternative to microbial culturing, which is important because most microbes cannot be cultured in the laboratory. Community fingerprinting does not result in identification of individual microbe species; instead, it presents an overall picture of a microbial community. These methods are now largely being replaced by high throughput sequencing, such as targeted microbiome analysis and metagenomics.
In metagenomics, binning is the process of grouping reads or contigs and assigning them to individual genome. Binning methods can be based on either compositional features or alignment (similarity), or both.
Microbial phylogenetics is the study of the manner in which various groups of microorganisms are genetically related. This helps to trace their evolution. To study these relationships biologists rely on comparative genomics, as physiology and comparative anatomy are not possible methods.
Metatranscriptomics is the set of techniques used to study gene expression of microbes within natural environments, i.e., the metatranscriptome.
PICRUSt is a bioinformatics software package. The name is an abbreviation for Phylogenetic Investigation of Communities by Reconstruction of Unobserved States.
Oligotyping is the process of correcting DNA sequence measured during the process of DNA sequencing based on frequency data of related sequences across related samples.
Machine learning in bioinformatics is the application of machine learning algorithms to bioinformatics, including genomics, proteomics, microarrays, systems biology, evolution, and text mining.
Microbial DNA barcoding is the use of DNA metabarcoding to characterize a mixture of microorganisms. DNA metabarcoding is a method of DNA barcoding that uses universal genetic markers to identify DNA of a mixture of organisms.
An amplicon sequence variant (ASV) is any one of the inferred single DNA sequences recovered from a high-throughput analysis of marker genes. Because these analyses, also called "amplicon reads," are created following the removal of erroneous sequences generated during PCR and sequencing, using ASVs makes it possible to distinguish sequence variation by a single nucleotide change. The uses of ASVs include classifying groups of species based on DNA sequences, finding biological and environmental variation, and determining ecological patterns.
Metabarcoding is the barcoding of DNA/RNA in a manner that allows for the simultaneous identification of many taxa within the same sample. The main difference between barcoding and metabarcoding is that metabarcoding does not focus on one specific organism, but instead aims to determine species composition within a sample.