Operational taxonomic unit

An operational taxonomic unit (OTU) is an operational definition used to classify groups of closely related individuals. The term was introduced in 1963 by Robert R. Sokal and Peter H. A. Sneath in the context of numerical taxonomy, where an "operational taxonomic unit" is simply the group of organisms currently being studied. [1] Numerical taxonomy is a method in biological systematics that uses numerical techniques to classify taxonomic units based on their character states. [2] In this sense, an OTU is a pragmatic definition that groups individuals by similarity; it is broadly comparable to, but not necessarily consistent with, classical Linnaean taxonomy or modern evolutionary taxonomy.
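As an illustration of classifying units by their character states, the sketch below computes the simple matching coefficient, one of several similarity measures used in numerical taxonomy. The OTU names and binary character vectors are hypothetical and chosen only for demonstration; the source does not prescribe a particular coefficient.

```python
def simple_matching(a, b):
    """Fraction of characters on which two OTUs share the same state."""
    if len(a) != len(b):
        raise ValueError("character vectors must have equal length")
    if not a:
        raise ValueError("character vectors must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def similarity_matrix(otus):
    """Pairwise simple matching similarity for a dict of name -> character vector."""
    names = sorted(otus)
    return {(p, q): simple_matching(otus[p], otus[q]) for p in names for q in names}


# Hypothetical binary character states (1 = present, 0 = absent).
otus = {
    "A": [1, 0, 1, 1, 0],
    "B": [1, 0, 1, 0, 0],
    "C": [0, 1, 0, 0, 1],
}

if __name__ == "__main__":
    for pair, value in sorted(similarity_matrix(otus).items()):
        print(pair, value)
```

In this toy data set, A and B agree on four of five characters and would be grouped together before either is grouped with C.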

OTUs are employed in microbial community DNA sequencing research to delineate species-level distinctions among organisms, and they are the most frequently used unit for measuring microbial diversity. [3] Today the term "OTU" is most often used in this molecular sense, which differs from the original one: it refers to clusters of (uncultivated or unknown) organisms grouped by DNA sequence similarity of a specific taxonomic marker gene (originally coined as mOTU, for molecular OTU). [4] In other words, OTUs are pragmatic proxies for "species" (microbial or metazoan) at different taxonomic levels, used where the traditional systems of biological classification available for macroscopic organisms are lacking. For several years, OTUs have been the most commonly used units of diversity, especially when analysing small subunit 16S (for prokaryotes) or 18S (for eukaryotes [5]) rRNA marker gene sequence datasets.

Sequences can be clustered according to their similarity to one another, and operational taxonomic units are defined by a similarity threshold set by the researcher. A threshold of 97% is usual, but 100% similarity is also common; the resulting units are also known as single variants. [6] It remains debatable how well this commonly used method recapitulates true microbial species phylogeny or ecology. Although different algorithms or thresholds can yield different OTUs, Schmidt et al. (2014) demonstrated that microbial OTUs were generally ecologically consistent across habitats and across several OTU clustering approaches. [7] The number of OTUs defined may be inflated by errors in DNA sequencing. [8]
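The effect of the similarity threshold can be sketched with a toy greedy clustering routine. This is a simplified illustration, not the procedure of any particular tool: it assumes pre-aligned sequences of equal length, measures similarity as the fraction of identical positions, and uses hypothetical helper names (`identity`, `greedy_cluster`, `mutate`) and synthetic sequences.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Add each sequence to the first OTU whose centroid it matches at
    >= threshold; otherwise the sequence starts a new OTU as its centroid."""
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            centroids.append(s)
            clusters.append([s])
    return clusters


def mutate(seq, positions):
    """Return a copy of seq with a substitution at each given position."""
    chars = list(seq)
    for p in positions:
        chars[p] = "C" if chars[p] == "A" else "A"
    return "".join(chars)


# Synthetic 100-nucleotide sequences (hypothetical data).
base = "ACGT" * 25
one_mismatch = mutate(base, [50])                    # 99% identical to base
five_mismatches = mutate(base, [0, 10, 20, 30, 40])  # 95% identical to base
reads = [base, one_mismatch, five_mismatches]

if __name__ == "__main__":
    print(len(greedy_cluster(reads, threshold=0.97)))  # number of OTUs at 97%
    print(len(greedy_cluster(reads, threshold=1.0)))   # number of OTUs at 100%
```

With these inputs the 97% threshold yields two OTUs, while the 100% threshold yields three, because the single-substitution read becomes its own unit. If that substitution were a sequencing error rather than real variation, the stricter threshold would count one OTU too many.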

OTU clustering approaches

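There are three main approaches to clustering OTUs: de novo clustering, which groups reads by their similarity to one another; closed-reference clustering, which assigns reads to sequences in a reference database; and open-reference clustering, which first clusters reads against a reference database and then clusters the remaining unmatched reads de novo. [9]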
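A minimal sketch of how de novo, closed-reference, and open-reference clustering differ is given below. It assumes pre-aligned, equal-length sequences and a toy reference database; the function names, the 90% threshold, and the data are hypothetical and do not reflect the interface of any real clustering tool.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def de_novo(reads, threshold):
    """Greedy clustering of reads against each other, with no reference."""
    clusters = []  # first member of each cluster acts as its centroid
    for read in reads:
        for members in clusters:
            if identity(read, members[0]) >= threshold:
                members.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def closed_reference(reads, reference, threshold):
    """Assign each read to its best-matching reference sequence.
    Reads without a match at >= threshold are returned as unassigned."""
    assigned, unassigned = {}, []
    for read in reads:
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in reference.items():
            score = identity(read, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        if best_id is not None and best_score >= threshold:
            assigned.setdefault(best_id, []).append(read)
        else:
            unassigned.append(read)
    return assigned, unassigned


def open_reference(reads, reference, threshold):
    """Closed-reference step first, then de novo clustering of the leftovers."""
    assigned, leftover = closed_reference(reads, reference, threshold)
    otus = dict(assigned)
    for i, members in enumerate(de_novo(leftover, threshold), start=1):
        otus[f"denovo{i}"] = members
    return otus


# Hypothetical reference database and reads (10 nucleotides each).
reference = {"ref1": "AAAAAAAAAA", "ref2": "CCCCCCCCCC"}
reads = ["AAAAAAAAAA", "AAAAAAAAAT", "GGGGGGGGGG", "GGGGGGGGGT", "CCCCCCCCCC"]

if __name__ == "__main__":
    print(closed_reference(reads, reference, threshold=0.9))
    print(open_reference(reads, reference, threshold=0.9))
```

In this example the closed-reference step leaves the two G-rich reads unassigned because nothing in the reference resembles them, whereas the open-reference step recovers them as an additional OTU.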

OTU clustering algorithms

See also

References

  1. Sokal & Sneath: Principles of Numerical Taxonomy, San Francisco: W.H. Freeman, 1963
  2. "Contributors", Wikipedia and Academic Libraries, Michigan Publishing, 15 September 2021, retrieved 17 January 2024
  3. Escalas, Arthur; Hale, Lauren; Voordeckers, James W.; Yang, Yunfeng; Firestone, Mary K.; Alvarez-Cohen, Lisa; Zhou, Jizhong (October 2019). "Microbial functional diversity: From concepts to applications". Ecology and Evolution. 9 (20): 12000–12016. doi:10.1002/ece3.5670. ISSN 2045-7758. PMC 6822047. PMID 31695904.
  4. Blaxter, M.; Mann, J.; Chapman, T.; Thomas, F.; Whitton, C.; Floyd, R.; Abebe, E. (October 2005). "Defining operational taxonomic units using DNA barcode data". Philos Trans R Soc Lond B Biol Sci. 360 (1462): 1935–43. doi:10.1098/rstb.2005.1725. PMC 1609233. PMID 16214751.
  5. Sommer, Stephanie A.; Woudenberg, Lauren Van; Lenz, Petra H.; Cepeda, Georgina; Goetze, Erica (2017). "Vertical gradients in species richness and community composition across the twilight zone in the North Pacific Subtropical Gyre". Molecular Ecology. 26 (21): 6136–6156. doi:10.1111/mec.14286. hdl:11336/53966. ISSN 1365-294X. PMID 28792641.
  6. Porter, Teresita M.; Hajibabaei, Mehrdad (2018). "Scaling up: A guide to high-throughput genomic approaches for biodiversity analysis". Molecular Ecology. 27 (2): 313–338. doi:10.1111/mec.14478. ISSN 1365-294X. PMID 29292539.
  7. Schmidt, Thomas S. B.; Rodrigues, João F. Matias; von Mering, Christian (24 April 2014). "Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale". PLOS Comput Biol. 10 (4): e1003594. Bibcode:2014PLSCB..10E3594S. doi:10.1371/journal.pcbi.1003594. ISSN 1553-7358. PMC 3998914. PMID 24763141.
  8. Kunin, V.; Engelbrektson, A.; Ochman, H.; Hugenholtz, P. (January 2010). "Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates". Environ Microbiol. 12 (1): 118–23. doi:10.1111/j.1462-2920.2009.02051.x. PMID 19725865.
  9. Kopylova E, Navas-Molina JA, Mercier C, Xu ZZ, Mahé F, He Y, et al. (23 February 2016). Segata N (ed.). "Open-Source Sequence Clustering Methods Improve the State Of the Art". mSystems. 1 (1): e00003–15. doi:10.1128/mSystems.00003-15. PMC 5069751. PMID 27822515.
  10. Edgar, Robert C. (1 October 2010). "Search and clustering orders of magnitude faster than BLAST". Bioinformatics. 26 (19): 2460–2461. doi:10.1093/bioinformatics/btq461. ISSN 1367-4803. PMID 20709691.
  11. Fu, Limin; Niu, Beifang; Zhu, Zhengwei; Wu, Sitao; Li, Weizhong (1 December 2012). "CD-HIT: accelerated for clustering the next-generation sequencing data". Bioinformatics. 28 (23): 3150–3152. doi:10.1093/bioinformatics/bts565. ISSN 1367-4803. PMC 3516142. PMID 23060610.
  12. Fu, Limin; Niu, Beifang; Zhu, Zhengwei; Wu, Sitao; Li, Weizhong (1 December 2012). "CD-HIT: accelerated for clustering the next-generation sequencing data". Bioinformatics. 28 (23): 3150–3152. doi:10.1093/bioinformatics/bts565. ISSN 1367-4803. PMC 3516142. PMID 23060610.
  13. Hao, X.; Jiang, R.; Chen, T. (2011). "Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering". Bioinformatics. 27 (5): 611–618. doi:10.1093/bioinformatics/btq725. PMC 3042185. PMID 21233169.

Further reading