Terminal restriction fragment length polymorphism

Terminal restriction fragment length polymorphism (TRFLP or sometimes T-RFLP) is a molecular biology technique for profiling microbial communities based on the position of the restriction site closest to a labelled end of an amplified gene. The method digests a mixture of PCR-amplified variants of a single gene with one or more restriction enzymes and detects the size of each resulting terminal fragment using a DNA sequencer. The result is a graph in which the x-axis represents fragment size and the y-axis represents fluorescence intensity.

Background

TRFLP is one of several molecular methods that aim to generate a fingerprint of an unknown microbial community. Similar methods include DGGE, TGGE, ARISA, ARDRA and PLFA.
These relatively high-throughput methods were developed to reduce the cost and effort of analyzing microbial communities using a clone library. The method, first described by Avaniss-Aghajani et al. in 1994 [1] and later by Liu et al. in 1997, [2] employed amplification of the 16S rDNA target gene from the DNA of several isolated bacteria as well as from environmental samples.
Since then the method has been applied to other marker genes, such as the functional marker gene pmoA, which is used to analyze methanotrophic communities.

Method

Like most other community analysis methods, TRFLP is based on PCR amplification of a target gene. In TRFLP, the amplification is performed with one or both primers labeled at the 5’ end with a fluorescent molecule. If both primers are labeled, different fluorescent dyes are required. Several common fluorescent dyes can be used for tagging, such as 6-carboxyfluorescein (6-FAM), ROX, carboxytetramethylrhodamine (TAMRA, a rhodamine-based dye), and hexachlorofluorescein (HEX), but the most widely used dye is 6-FAM. The mixture of amplicons is then subjected to a restriction reaction, normally using a four-cutter restriction enzyme. Following the restriction reaction, the mixture of fragments is separated by either capillary or polyacrylamide electrophoresis in a DNA sequencer, and the sizes of the different terminal fragments are determined by the fluorescence detector. Because the digested mixture of amplicons is analyzed in a sequencer, only the terminal fragments (i.e. the labeled end or ends of the amplicon) are read, while all other fragments are ignored. T-RFLP thus differs from ARDRA and RFLP, in which all restriction fragments are visualized. In addition to these steps, the TRFLP protocol often includes a cleanup of the PCR products prior to the restriction, and if capillary electrophoresis is used, a desalting stage is also performed before the sample is run.
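The relationship between a restriction site and the terminal fragment that is actually read can be illustrated with a short in silico sketch. The Python example below is only an illustration under simplifying assumptions: the function name `forward_trf_length` and the example sequence are hypothetical, only a 5’ label on the forward primer is modelled, and the digestion is assumed to be complete. The recognition sequences and cut positions are those of the common four-cutters MspI (C^CGG) and HaeIII (GG^CC).

```python
# Recognition site and cut offset (bases from the start of the site to the
# cut on the labeled strand) for two common four-cutter enzymes.
ENZYMES = {
    "MspI": ("CCGG", 1),    # C^CGG
    "HaeIII": ("GGCC", 2),  # GG^CC
}


def forward_trf_length(amplicon, enzyme):
    """Length (in bases) of the 5'-labeled terminal fragment.

    Only the restriction site closest to the labeled 5' end matters;
    if the amplicon has no site, the uncut amplicon length is returned.
    """
    site, offset = ENZYMES[enzyme]
    pos = amplicon.upper().find(site)  # first site from the 5' end
    if pos == -1:
        return len(amplicon)
    return pos + offset


# Hypothetical 21-base amplicon containing two MspI sites and no HaeIII site.
amplicon = "ATGCATGCCGGTTAACCGGAT"
print(forward_trf_length(amplicon, "MspI"))    # site nearest the label decides
print(forward_trf_length(amplicon, "HaeIII"))  # uncut: full amplicon length
```

The second MspI site in the example has no effect on the result, which mirrors the statement above that only the terminal fragment is read.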

Data format and artifacts

The result of T-RFLP profiling is a graph called an electropherogram, which is an intensity plot representation of an electrophoresis experiment (gel or capillary). In an electropherogram the x-axis marks the sizes of the fragments while the y-axis marks the fluorescence intensity of each fragment. Thus, what appears on an electrophoresis gel as a band appears on the electropherogram as a peak whose integral is its total fluorescence. In a T-RFLP profile each peak is assumed to correspond to one genetic variant in the original sample, while its height or area is assumed to correspond to that variant's relative abundance in the community. Neither assumption, however, is always met. Often, several different bacteria in a population give a single peak on the electropherogram because they share a restriction site for the enzyme used at the same position. To overcome this problem and to increase the resolving power of the technique, a single sample can be digested in parallel by several enzymes (often three), resulting in three T-RFLP profiles per sample, each resolving some variants while missing others. Another modification that is sometimes used is to fluorescently label the reverse primer as well, using a different dye, again resulting in two parallel profiles per sample, each resolving a different number of variants.
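The way several variants converge into one peak, and the way additional enzymes separate them again, can be sketched as follows. The variant names and TRF lengths are hypothetical values chosen for illustration, and the helper `group_by_trf` is not taken from any published tool. It only shows which variants are distinguishable in principle for a given set of enzymes.

```python
from collections import defaultdict

# Hypothetical TRF lengths (in bases) of three variants for three enzymes.
trfs = {
    "variant_A": {"MspI": 150, "HhaI": 92, "RsaI": 310},
    "variant_B": {"MspI": 150, "HhaI": 205, "RsaI": 310},
    "variant_C": {"MspI": 489, "HhaI": 205, "RsaI": 310},
}


def group_by_trf(trfs, enzymes):
    """Group variants that give identical TRF lengths for every listed enzyme."""
    groups = defaultdict(list)
    for name, lengths in trfs.items():
        key = tuple(lengths[enzyme] for enzyme in enzymes)
        groups[key].append(name)
    return dict(groups)


# With MspI alone, variants A and B fall into the same peak.
print(group_by_trf(trfs, ["MspI"]))
# Adding the HhaI profile separates all three variants.
print(group_by_trf(trfs, ["MspI", "HhaI"]))
```

In this made-up data set RsaI alone would place all three variants in one peak, which is why the choice of enzymes matters.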

In addition to the convergence of two distinct genetic variants into a single peak, artifacts might also appear, mainly in the form of false peaks. False peaks are generally of two types: background “noise” and “pseudo” TRFs. [3] Background (noise) peaks result from the sensitivity of the detector in use. They are often low in intensity and are usually a problem when the total intensity of the profile is low (i.e. at low DNA concentrations). Because these peaks result from background noise they are normally irreproducible in replicate profiles, so the problem can be tackled by producing a consensus profile from several replicates or by eliminating peaks below a certain threshold. Several other computational techniques have also been introduced to deal with this problem. [4] Pseudo TRFs, on the other hand, are reproducible peaks whose intensity scales linearly with the amount of DNA loaded. These peaks are thought to result from single-stranded DNA annealing onto itself and creating random double-stranded restriction sites, which are then recognized by the restriction enzyme, producing a terminal fragment that does not represent any genuine genetic variant. It has been suggested that applying a single-strand-specific nuclease such as mung bean nuclease prior to the digestion stage might eliminate such artifacts.
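The two simple ways of handling noise peaks mentioned above, a threshold and a consensus of replicates, might be sketched as follows. This is a minimal illustration, not a published algorithm: the function names, the 1% relative threshold and the example peak heights are all assumptions, and the profiles are assumed to be already binned to integer fragment sizes.

```python
def filter_noise(profile, min_fraction=0.01):
    """Drop peaks contributing less than min_fraction of total fluorescence.

    profile maps fragment size (bases) to peak height.
    """
    total = sum(profile.values())
    if total == 0:
        return {}
    return {size: h for size, h in profile.items() if h / total >= min_fraction}


def consensus(replicates, min_fraction=0.01):
    """Keep only peaks present in every filtered replicate; average their heights."""
    filtered = [filter_noise(r, min_fraction) for r in replicates]
    shared = set(filtered[0])
    for rep in filtered[1:]:
        shared &= set(rep)
    return {size: sum(rep[size] for rep in filtered) / len(filtered)
            for size in sorted(shared)}


# Two hypothetical replicate profiles; the peaks at 233 and 310 are
# weak and appear in only one replicate each.
rep1 = {72: 1200, 150: 3400, 233: 40, 489: 900}
rep2 = {72: 1100, 150: 3600, 310: 35, 489: 950}
print(consensus([rep1, rep2]))
```

Neither step removes pseudo TRFs, since those are reproducible and scale with the amount of DNA loaded.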

Interpretation of data

The data resulting from the electropherogram is normally interpreted in one of the following ways.

Pattern comparison

In pattern comparison the general shapes of electropherograms of different samples are compared for changes such as presence-absence of peaks between treatments, their relative size, etc.

Complementing with a clone library

If a clone library is constructed in parallel to the T-RFLP analysis then the clones can be used to assess and interpret the T-RFLP profile. In this method the TRF of each clone is determined either directly (i.e. performing T-RFLP analysis on each single clone) or by in silico analysis of that clone’s sequence. By comparing the T-RFLP profile to a clone library it is possible to validate each of the peaks as genuine as well as to assess the relative abundance of each variant in the library.
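As a rough illustration of this comparison, the sketch below assigns clones to observed peaks when the clone's TRF length lies within a size tolerance of the peak. The function name `assign_clones_to_peaks`, the tolerance of one base and all values are hypothetical.

```python
def assign_clones_to_peaks(peak_sizes, clone_trfs, tolerance=1.0):
    """For each observed peak, list the clones whose TRF lies within tolerance."""
    assigned = {}
    for peak in peak_sizes:
        assigned[peak] = sorted(
            name for name, trf in clone_trfs.items()
            if abs(trf - peak) <= tolerance
        )
    return assigned


# Observed peak sizes (bases) and in silico TRFs of four clones.
peaks = [72.3, 150.1, 489.6]
clones = {"clone_01": 150, "clone_02": 151, "clone_03": 490, "clone_04": 300}

for peak, names in assign_clones_to_peaks(peaks, clones).items():
    print(peak, names if names else "no matching clone")
```

A peak with no matching clone may be either a variant missing from the library or an artifact, so the assignment alone does not settle which.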

Peak resolving using a database

Several computer applications attempt to relate the peaks in an electropherogram to specific bacteria in a database. Normally this type of analysis is done by simultaneously resolving several profiles of a single sample obtained with different restriction enzymes. The software then resolves the profile by attempting to maximize the matches between the peaks in the profiles and the entries in the database, so that the number of peaks left without a matching sequence is minimal. The software retrieves from the database only those sequences whose TRFs are present in all analyzed profiles.
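The retrieval rule described in the last sentence can be sketched in a few lines. This is not the algorithm of any particular program: the function name `candidate_taxa`, the database entries, the tolerance and the peak sizes are hypothetical, and the step that maximizes the number of matched peaks is left out.

```python
def candidate_taxa(database, profiles, tolerance=1.0):
    """Return database entries whose predicted TRF matches a peak in every profile.

    database: entry name -> {enzyme: predicted TRF length}
    profiles: enzyme -> list of observed peak sizes for one sample
    """
    hits = []
    for name, predicted in database.items():
        if all(
            any(abs(predicted[enzyme] - peak) <= tolerance for peak in peaks)
            for enzyme, peaks in profiles.items()
        ):
            hits.append(name)
    return hits


database = {
    "taxon_X": {"MspI": 150, "HhaI": 92},
    "taxon_Y": {"MspI": 150, "HhaI": 400},
    "taxon_Z": {"MspI": 489, "HhaI": 205},
}
profiles = {"MspI": [150.2, 489.4], "HhaI": [92.1, 205.3]}

# taxon_Y is rejected: its HhaI fragment has no matching peak.
print(candidate_taxa(database, profiles))
```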

Multivariate analysis

An increasingly common way to analyze T-RFLP profiles is to use multivariate statistical methods to interpret the data. [5] Usually the methods applied are those commonly used in ecology, especially in the study of biodiversity. Among them, ordinations and cluster analysis are the most widely used. To perform multivariate statistical analysis on T-RFLP data, the data must first be converted to a table known as a “sample by species table”, which lists the different samples (T-RFLP profiles) against the species (T-RFs), with the height or area of the peaks as values.
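A minimal sketch of the conversion and of one dissimilarity measure commonly used in ecology, the Bray-Curtis dissimilarity, is given below. The sample names and peak heights are hypothetical, peak heights are normalized to relative abundance within each sample, and the choice of Bray-Curtis is an illustrative assumption rather than something prescribed by the method.

```python
def sample_by_species(profiles):
    """Build a sample-by-species table of relative peak heights.

    profiles: sample name -> {T-RF size: peak height}
    Returns the sorted list of T-RF sizes (columns) and a dict of rows.
    """
    trf_sizes = sorted({size for p in profiles.values() for size in p})
    table = {}
    for sample, profile in profiles.items():
        total = sum(profile.values())
        table[sample] = [profile.get(size, 0) / total for size in trf_sizes]
    return trf_sizes, table


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two rows of the table."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


profiles = {
    "soil_1": {72: 100, 150: 300},
    "soil_2": {150: 200, 489: 200},
}
columns, table = sample_by_species(profiles)
print(columns)
print(table)
print(bray_curtis(table["soil_1"], table["soil_2"]))
```

A matrix of such pairwise dissimilarities is the usual starting point for the ordination and cluster analyses mentioned above.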

Advantages and disadvantages

As T-RFLP is a fingerprinting technique its advantages and drawbacks are often discussed in comparison with other similar techniques, mostly DGGE.

Advantages

The major advantage of T-RFLP is the use of an automated sequencer, which gives highly reproducible results for repeated samples. Although the genetic profiles are not completely reproducible and several minor peaks are irreproducible, the overall shape of the electropherogram and the ratios of the major peaks are considered reproducible. The use of an automated sequencer, which outputs the results in a digital numerical format, also makes it easy to store the data and to compare different samples and experiments. The numerical format of the data can be, and has been, used for relative (though not absolute) quantification and statistical analysis. Although sequence data cannot be definitively inferred directly from the T-RFLP profile, in silico assignment of the peaks to existing sequences is possible to a certain extent.

Drawbacks

Because T-RFLP relies on DNA extraction methods and PCR, the biases inherent to both will affect the results of the analysis. [6] [7] Also, because only the terminal fragments are read, any two distinct sequences that share a terminal restriction site will produce a single peak on the electropherogram and will be indistinguishable. Indeed, when T-RFLP is applied to a complex microbial community, the result is often a compression of the total diversity into typically only 20-50 distinct peaks, each representing an unknown number of distinct sequences. Although this phenomenon makes T-RFLP results easier to handle, it naturally introduces biases and oversimplifies the real diversity. Attempts to minimize (but not overcome) this problem often involve applying several restriction enzymes and/or labeling both primers with different fluorescent dyes. The inability to retrieve sequences from T-RFLP often leads to the need to construct and analyze one or more clone libraries in parallel to the T-RFLP analysis, which adds to the effort and complicates the analysis. The possible appearance of false (pseudo) T-RFs, as discussed above, is yet another drawback. To handle this, researchers often consider only peaks that can be affiliated with sequences in a clone library.

References

  1. Avaniss-Aghajani, E; Jones, K; Chapman, D; Brunk, C (1994). "A molecular technique for identification of bacteria using small subunit ribosomal RNA sequences". BioTechniques. 17 (1): 144–149. PMID 7946297.
  2. Liu, W; Marsh, T; Cheng, H; Forney, L (1997). "Characterization of microbial diversity by determining terminal restriction fragment length polymorphisms of genes encoding 16S rRNA". Appl. Environ. Microbiol. 63 (11): 4516–4522. Bibcode:1997ApEnM..63.4516L. doi:10.1128/AEM.63.11.4516-4522.1997. PMC 168770. PMID 9361437.
  3. Egert, M; Friedrich, MW (2003). "Formation of Pseudo-Terminal Restriction Fragments, a PCR-Related Bias Affecting Terminal Restriction Fragment Length Polymorphism Analysis of Microbial Community Structure". Appl. Environ. Microbiol. 69 (5): 2555–2562. Bibcode:2003ApEnM..69.2555E. doi:10.1128/aem.69.5.2555-2562.2003. PMC 154551. PMID 12732521.
  4. Dunbar, J; Ticknor, LO; Kuske, CR (2001). "Phylogenetic Specificity and Reproducibility and New Method for Analysis of Terminal Restriction Fragment Profiles of 16S rRNA Genes from Bacterial Communities". Appl. Environ. Microbiol. 67 (1): 190–197. Bibcode:2001ApEnM..67..190D. doi:10.1128/aem.67.1.190-197.2001. PMC 92545. PMID 11133445.
  5. Abdo, Zaid; et al. (2006). "Statistical Methods for Characterizing Diversity of Microbial Communities by Analysis of Terminal Restriction Fragment Length Polymorphisms of 16S rRNA Genes". Environmental Microbiology. 8 (5): 929–938. doi:10.1111/j.1462-2920.2005.00959.x. PMID 16623749.
  6. Brooks, J. P.; Edwards, David J.; Harwich, Michael D.; Rivera, Maria C.; Fettweis, Jennifer M.; Serrano, Myrna G.; Reris, Robert A.; Sheth, Nihar U.; Huang, Bernice (2015-03-21). "The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies". BMC Microbiology. 15 (1): 66. doi:10.1186/s12866-015-0351-6. ISSN 1471-2180. PMC 4433096. PMID 25880246.
  7. Sharifian, Hoda (May 2010). "Errors induced during PCR amplification" (PDF).