DNA barcoding of algae is commonly used for species identification and phylogenetic studies. Algae form a phylogenetically heterogeneous group, so a single universal barcode or marker for species delimitation is not feasible; different markers are therefore applied in different algal groups.
Diatom DNA barcoding is a method for the taxonomic identification of diatoms, down to species level. DNA or RNA is extracted, specific conserved regions of the diatom genome are amplified and sequenced, and the resulting sequences are assigned to taxa.
One of the main challenges in identifying diatoms is that they are often collected as a mixture of several species. DNA metabarcoding is the process of identifying the individual species in a mixed sample of environmental DNA (eDNA), which is DNA extracted directly from environmental material such as soil or water samples.
A newly applied method is diatom DNA metabarcoding, which is used for the ecological quality assessment of rivers and streams because diatoms respond specifically to particular ecological conditions. As species identification by morphology is relatively difficult and requires a lot of time and expertise, [1] [2] high-throughput sequencing (HTS) DNA metabarcoding offers an alternative: it enables taxonomic assignment, and therefore identification, for the complete sample, within the limits of the group-specific primers chosen for the preceding DNA amplification.
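The taxonomic assignment step can be illustrated with a minimal sketch. It assumes reads and reference barcodes are already aligned and of equal length, and it uses a 97% identity threshold; the threshold, the taxon names and the sequences are all hypothetical and chosen only for illustration. Real pipelines use dedicated alignment and classification tools.

```python
from typing import Optional


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign(read: str, reference: dict[str, str], min_identity: float = 0.97) -> Optional[str]:
    """Return the best-matching reference taxon, or None if no hit reaches the threshold."""
    best_taxon, best_score = None, 0.0
    for taxon, barcode in reference.items():
        score = identity(read, barcode)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= min_identity else None


# Hypothetical toy reference library (not real barcode sequences).
reference = {
    "Taxon A": "ACGTACGTACGTACGTACGT",
    "Taxon B": "ACGTTCGTACGAACGTACCT",
}
reads = ["ACGTACGTACGTACGTACGT", "ACGTTCGTACGAACGTACCT", "TTTTTTTTTTTTTTTTTTTT"]
assignments = [assign(r, reference) for r in reads]
print(assignments)
```

A read that matches no reference barcode closely enough stays unassigned, which is how gaps in the reference library show up in practice.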
Several DNA markers have been developed so far, mainly targeting the 18S rRNA gene. [3] Using the V4 hypervariable region of the ribosomal small subunit DNA (SSU rDNA), DNA-based identification was found to be more efficient than the classical morphology-based approach. [4] Other conserved regions of the genome that are frequently used as marker genes are ribulose-1,5-bisphosphate carboxylase (rbcL), cytochrome oxidase I (cox1, COI), [5] ITS [6] and 28S. [7]
It has been shown repeatedly that the molecular data gained by diatom eDNA metabarcoding quite faithfully reflect the morphology-based biotic diatom indices and therefore provide a similar assessment of ecosystem status. [8] [9] Meanwhile, diatoms are routinely used for the assessment of ecological quality in other freshwater ecosystems. [7] Together with aquatic invertebrates, they are considered the best indicators of disturbance related to the physical, chemical or biological conditions of watercourses. Numerous studies use benthic diatoms for biomonitoring. [10] [11] [12] [13]
Because no ideal diatom DNA barcode has been found, it has been proposed that different markers be used for different purposes. The highly variable cox1, ITS and 28S genes were considered more suitable for taxonomic studies, while the more conserved 18S and rbcL genes seem more appropriate for biomonitoring.
Applying the DNA barcoding concept to diatoms promises great potential to resolve the problem of inaccurate species identification and thus facilitate analyses of the biodiversity of environmental samples. [14]
Molecular methods based on NGS technology almost always lead to a higher number of identified taxa, whose presence could subsequently be verified by light microscopy. [4] The results of that study provide evidence that eDNA barcoding of diatoms is suitable for water quality assessment and could complement or improve traditional methods. Stoeck et al. [15] also showed that eDNA barcoding provides more insight into the diversity of diatoms and other protist communities and could therefore be used for ecological projection of global diversity. Other studies showed different results. For example, inventories obtained with the molecular method were closer to those obtained with the morphology-based method when only abundant species were considered. [5]
DNA metabarcoding can also increase taxonomic resolution and comparability across geographic regions, which is often difficult using morphological characters only. Moreover, DNA-based identification extends the range of potential bioindicators to include inconspicuous taxonomic groups that could be highly sensitive or tolerant to particular stressors. Indirectly, molecular methods can also help fill gaps in the knowledge of species ecology, by increasing the number of samples processed while decreasing processing time (cost-effectiveness), and by increasing the accuracy and precision of correlations between the occurrence of species or molecular operational taxonomic units (MOTUs) and environmental factors. [16]
Currently there is no consensus on methods for DNA preservation and isolation, on the choice of DNA barcodes and PCR primers, or on the parameters of MOTU clustering and taxonomic assignment. [16] Sampling and molecular steps need to be standardized through development studies. [5] One of the major limitations is the availability of reference barcodes for diatom species. The reference database of bioindicator taxa is far from complete: despite the constant efforts of numerous national barcoding initiatives, many species still lack barcode information. Furthermore, most existing metabarcoding data are only locally available and geographically scattered, which hinders the development of globally useful tools. [16] Visco et al. [17] estimated that no more than 30% of European diatom species are currently represented in reference databases. For example, barcodes are lacking for a number of species from Fennoscandian communities (especially acidophilic diatoms, such as Eunotia incisa). It has also been shown that taxonomic identification with DNA barcoding is not accurate below species level, for example when discriminating varieties (reference missing).
Another well-known limitation of barcoding for taxonomic identification is the clustering step applied before taxonomic assignment. It often leads to a massive loss of genetic information, and the only reliable way to assess the effects of different clustering and taxonomic assignment procedures would be to compare the species lists generated by different pipelines using the same reference database. This has yet to be done for the variety of pipelines used in the molecular assessment of diatom communities in Europe. [16] Taxonomically validated databases that include accessible vouchers are also crucial for reliable taxon identification via NGS. [18]
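A comparison of species lists from two pipelines could be sketched as follows. This is only an illustration: the pipeline names and the species lists are invented, and the Jaccard index is one possible similarity measure among many, not a method prescribed by the cited studies.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared taxa divided by all taxa found by either pipeline."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical species lists produced by two pipelines from the same sample,
# both using the same reference database.
pipeline_x = {"Achnanthidium minutissimum", "Eunotia incisa", "Navicula sp."}
pipeline_y = {"Achnanthidium minutissimum", "Navicula sp.", "Nitzschia sp."}

shared = pipeline_x & pipeline_y   # taxa reported by both pipelines
only_x = pipeline_x - pipeline_y   # taxa reported only by pipeline X
only_y = pipeline_y - pipeline_x   # taxa reported only by pipeline Y
score = jaccard(pipeline_x, pipeline_y)

print(f"shared={sorted(shared)} only_x={sorted(only_x)} only_y={sorted(only_y)}")
print(f"Jaccard similarity: {score:.2f}")
```

Listing the taxa unique to each pipeline is often more informative than the single score, because it shows where clustering or assignment choices diverge.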
Additionally, primer bias is often found to be a major source of variation in barcoding, and PCR primer efficiency can differ between diatom species, i.e. some primers lead to preferential amplification of one taxon over another. [16]
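One simple way to look for potential primer bias in silico is to count mismatches between a primer and the binding site of each taxon. The sketch below assumes pre-extracted binding sites of the same length as the primer; the primer and the site sequences are hypothetical, and mismatch counts are only a rough proxy for amplification efficiency.

```python
# Standard IUPAC nucleotide codes, mapping each code to the bases it matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer: str, site: str) -> int:
    """Count positions where the template base is not covered by the (degenerate) primer base."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    return sum(base not in IUPAC[p] for p, base in zip(primer.upper(), site.upper()))


primer = "GGYAACGA"  # hypothetical degenerate primer, not a published one
sites = {            # hypothetical binding sites for three taxa
    "Taxon A": "GGCAACGA",
    "Taxon B": "GGTAACGA",
    "Taxon C": "GGAAATGA",
}
counts = {taxon: mismatches(primer, site) for taxon, site in sites.items()}
print(counts)
```

Taxa with more mismatches would be expected to amplify less efficiently and so be under-represented in the reads.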
Inferring abundance from metabarcoding data is considered one of the most difficult issues in environmental use. [19] [20] The number of sequences generated by HTS does not correspond directly to the number of specimens or to biomass, and different species can produce different numbers of reads (for example, because of differences in chloroplast size when the rbcL marker is used). Vasselon et al. [21] recently created a biovolume correction factor for use with the rbcL marker. For example, Achnanthidium minutissimum has a small biovolume and will therefore generate fewer copies of the rbcL fragment (located in the chloroplast) than larger species. This correction factor, however, requires extensive calibration against each species' own biovolume and has been tested on only a few species so far. Fluctuations in gene copy number for other markers, such as the 18S marker, do not seem to be species specific, but this has not yet been tested.
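The general idea of a correction factor can be shown with a small worked example. This is not the published formula of Vasselon et al.; it simply assumes that each species has a known factor proportional to the number of marker copies per cell, divides the read counts by that factor, and renormalises. All names and numbers are invented.

```python
def corrected_proportions(reads: dict[str, float], factor: dict[str, float]) -> dict[str, float]:
    """Divide each read count by its (assumed) per-species factor, then renormalise to sum to 1."""
    corrected = {taxon: reads[taxon] / factor[taxon] for taxon in reads}
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical sample: the large-celled taxon yields nine times more reads per cell.
reads = {"small-celled taxon": 100, "large-celled taxon": 900}
factor = {"small-celled taxon": 1.0, "large-celled taxon": 9.0}  # assumed copies per cell

raw = {taxon: n / sum(reads.values()) for taxon, n in reads.items()}
corrected = corrected_proportions(reads, factor)
print("raw read proportions:     ", raw)
print("corrected cell proportions:", corrected)
```

In this toy case the raw reads suggest a 10:90 community, while the corrected values suggest equal numbers of cells, which shows why uncorrected read counts can mislead abundance-based indices.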
Barcoding markers usually combine hypervariable regions of the genome (to allow distinction between species) with highly conserved regions (to ensure specificity to the target organism). Several DNA markers belonging to the nuclear, mitochondrial and chloroplast genomes (rbcL, COI, ITS+5.8S, SSU, 18S...) have been designed and successfully used for diatom identification with NGS. [22] [23] [6]
The 18S gene region has been widely used as a marker in other protist groups, [24] [25] and Jahn et al. [26] were the first to test it for diatom barcoding. Zimmerman et al. [7] proposed a 390–410 bp fragment of the 1800 bp 18S rRNA gene locus as a barcode marker for the analysis of environmental samples with HTS, and discussed its use and limitations for diatom identification. This fragment includes the V4 region, which is the largest and most complex of the highly variable regions within the 18S locus. [27] They highlighted that this hypervariable region of the 18S gene has great potential for studying protist diversity at large scale but limited efficiency for identification below species level or of cryptic species.
The rbcL gene is used for taxonomic studies (Trobajo et al. 2009). Its benefits include that it rarely shows intragenomic variation and that its sequences are easily aligned and compared. An open-access reference library called R-Syst::diatom includes data for two barcodes (18S and rbcL) and is freely accessible through a website. [28] Kermmarec et al. [5] also successfully used the rbcL gene for the ecological assessment of diatoms.
Moniz and Kaczmarska [23] investigated the amplification success of the SSU, COI and ITS2 markers and found that the 300–400 bp ITS-2 + 5.8S fragment provided the highest amplification success rate and good species resolution. This marker was subsequently used to separate morphologically defined species with a success rate of 99.5%. Despite this amplification success, Zimmerman et al. [7] criticised the use of ITS-2 because of its intra-individual heterogeneity. It has been suggested that the SSU [7] or rbcL (Mann et al., 2010) markers are less heterogeneous between individuals and therefore more suitable for distinguishing between species.
Diatoms are routinely used as one of a suite of biological indicators that must be monitored under the European Water Framework Directive. [29] Diatoms are used as an indicator of ecosystem health in freshwaters because they are ubiquitous, are directly affected by changes in physico-chemical parameters and show a better relationship with environmental variables than other taxa, e.g. invertebrates, giving a better overall picture of water quality. [30]
In recent years, researchers have developed and standardised tools for the metabarcoding and sequencing of diatoms to complement the traditional microscopy-based assessment, opening up a new avenue of biomonitoring for aquatic systems. [31] A next-generation sequencing approach to river biomonitoring with benthic diatoms has shown good potential. [5] Many studies have shown that metabarcoding and high-throughput sequencing (HTS) can be used to estimate quality status and diversity in freshwaters. As part of the Environment Agency, Kelly et al. [32] have developed a DNA-based metabarcoding approach to assess diatom communities in rivers in the UK. Vasselon et al. [33] compared morphological and HTS approaches for diatoms and found that HTS gave a reliable indication of quality status for most rivers in terms of the Specific Polluosensitivity Index (SPI). Vasselon et al. [34] also applied DNA metabarcoding of diatom communities to the river monitoring network on the tropical island of Mayotte (French DOM-TOM).
Rimet et al. [35] also explored the possibility of using HTS for assessing diatom diversity and showed that diversity indices from HTS and microscopic analysis were well, although not perfectly, correlated.
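A comparison of this kind can be sketched by computing a diversity index per site for each method and then correlating the two series. The example below uses the Shannon index and the Pearson correlation coefficient as assumptions; the cited study may have used other indices or statistics, and all counts are invented.

```python
import math


def shannon(counts: list[int]) -> float:
    """Shannon diversity index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical per-site counts: read counts (HTS) and valve counts (microscopy).
hts_counts = [[50, 30, 20], [90, 5, 5], [34, 33, 33]]
microscopy_counts = [[60, 25, 15], [85, 10, 5], [40, 30, 30]]

hts_diversity = [shannon(site) for site in hts_counts]
microscopy_diversity = [shannon(site) for site in microscopy_counts]
r = pearson(hts_diversity, microscopy_diversity)
print(f"Pearson r between HTS and microscopy diversity: {r:.3f}")
```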
DNA barcoding and metabarcoding can be used to establish molecular metrics and indices, which potentially provide conclusions broadly similar to those of the traditional approaches about the ecological and environmental status of aquatic ecosystems. [36]
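Many diatom indices take the form of an abundance-weighted average of per-taxon scores (the Zelinka and Marvan form). The sketch below applies that form to read counts from metabarcoding. The sensitivity and indicator values are invented for illustration and are not taken from any official index table.

```python
from dataclasses import dataclass


@dataclass
class Taxon:
    name: str
    abundance: float    # e.g. number of reads assigned to the taxon
    sensitivity: float  # assumed pollution-sensitivity score
    indicator: float    # assumed indicator (weighting) value


def weighted_index(taxa: list[Taxon]) -> float:
    """Weighted average: sum(a * s * v) / sum(a * v) over all scored taxa."""
    numerator = sum(t.abundance * t.sensitivity * t.indicator for t in taxa)
    denominator = sum(t.abundance * t.indicator for t in taxa)
    if denominator == 0:
        raise ValueError("no scored taxa in sample")
    return numerator / denominator


# Hypothetical sample with made-up scores.
sample = [
    Taxon("Taxon A", abundance=60, sensitivity=5, indicator=1),
    Taxon("Taxon B", abundance=30, sensitivity=3, indicator=2),
    Taxon("Taxon C", abundance=10, sensitivity=1, indicator=3),
]
index = weighted_index(sample)
print(f"index value: {index:.2f}")
```

Taxa missing from the reference database receive no score and drop out of both sums, which is one way incomplete reference libraries can shift a molecular index.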
Diatoms are used as a diagnostic tool for drowning in forensic practice. The diatom test is based on the principle that diatoms are inhaled with water into the lungs and then distributed and deposited around the body. DNA methods can be used to confirm whether the cause of death was indeed drowning and to locate the site of drowning. [37] Diatom DNA metabarcoding provides the opportunity to quickly analyse the diatom community present within a body, locate the site of drowning and investigate whether a body may have been moved from one place to another.
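Matching a diatom community recovered from a body to candidate water bodies could, in principle, be done by comparing community profiles. The sketch below uses Bray-Curtis dissimilarity on read counts as an assumed measure; the profiles and site names are invented, and this is not a validated forensic procedure.

```python
def bray_curtis(a: dict[str, float], b: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared taxa."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total


# Hypothetical read counts per taxon.
body_profile = {"Taxon A": 50, "Taxon B": 30, "Taxon C": 20}
candidate_sites = {
    "site 1": {"Taxon A": 45, "Taxon B": 35, "Taxon C": 20},
    "site 2": {"Taxon A": 5, "Taxon C": 10, "Taxon D": 85},
}

distances = {site: bray_curtis(body_profile, profile) for site, profile in candidate_sites.items()}
best_match = min(distances, key=distances.get)
print(distances, "closest:", best_match)
```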
Diatom metabarcoding may help delimit cryptic species that are difficult to identify using microscopy and help complete reference databases by comparing morphological assemblages to metabarcoding data. [35]
Chlorophytes are an ancient and taxonomically very diverse lineage (Fang et al. 2014), which also includes the terrestrial plants. Even though more than 14,000 species have been described based on structural and ultrastructural criteria (Hall et al. 2010), their morphological identification is often limited.
Several barcodes have been proposed for the DNA-based identification of chlorophytes in order to bypass the problems of morphological identification. Although the cytochrome oxidase I (COI, COX) coding gene is a standard barcode for animals, it proved unsatisfactory for chlorophytes because the gene contains several introns in this algal group (Turmel et al. 2002). Nuclear marker genes that have been used for chlorophytes are SSU rDNA, LSU rDNA and rDNA ITS (Leliaert et al. 2014). [38]
Macroalgae (a morphological rather than taxonomic grouping) can be very challenging to identify because of their simple morphology, phenotypic plasticity and alternating life-cycle stages. Algal systematics and identification have therefore come to rely heavily on genetic and molecular tools such as DNA barcoding. [39] [40] The SSU rDNA gene is a commonly used barcode for phylogenetic studies of macroalgae. [41] However, SSU rDNA is a highly conserved region and typically lacks the resolution needed for species identification.
Over the past two decades, standards for DNA barcoding aimed at species identification have been developed for each of the main groups of macroalgae. [42] [39] [43] [44] [45] The cytochrome c oxidase subunit I (COI) gene is commonly used as a barcode for red and brown algae, while tufA (plastid elongation factor), rbcL (rubisco large subunit) and ITS (internal transcribed spacer) are commonly used for green algae. [41] [45] These barcodes are typically 600–700 bp long.
The barcodes typically differ between the 3 main groups of macroalgae (red, green and brown) because their evolutionary heritage is very diverse. [46] Macroalgae is a polyphyletic group, meaning that within the group they do not all share a recent common ancestor, making it challenging to find a gene that is conserved among all but variable enough for species identification.
| Taxonomic group | Nuclear marker genes | Mitochondrial marker genes | Chloroplast marker genes |
|---|---|---|---|
| Chlorophytes | SSU rDNA, LSU rDNA, rDNA ITS | | tufA, rbcL |
| Rhodophytes | Phycoerythrin, elongation factor, LSU rDNA | cox1, cox2-3 spacer | rbcL, Rubisco spacer |
| Phaeophytes | rDNA ITS | cox1, cox3 | psbA, rbcL, Rubisco spacer |
| Chrysophytes and Synurophytes | SSU rDNA, rDNA ITS | cox1 | psaA, rbcL |
| Cryptophytes | SSU rDNA, LSU rDNA, rDNA ITS | cox1 | Rubisco spacer |
| Bacillariophytes | SSU rDNA, LSU rDNA, rDNA ITS | cox1 | rbcL |
| Dinophytes | LSU rDNA, rDNA ITS | cox1, cob | psbAncr, 23S rDNA |
| Haptophytes | SSU rDNA, LSU rDNA, rDNA, rDNA ITS | cox1b-atp4 | tufA |
| Raphidophytes | SSU rDNA, LSU rDNA, rDNA, rDNA ITS | cox1 | psaA, rbcL |
| Xanthophytes | rDNA ITS | | rbcL, psbA-rbcL spacer |
| Chlorarachniophytes | Nuclear rDNA ITS, nucleomorph rDNA ITS | | |
| Euglenophytes | SSU rDNA, LSU rDNA | | SSU rDNA, LSU rDNA |
Adapted from [40]