DNA footprinting is a method of DNA analysis that helps researchers determine where transcription factors (TFs) and associated proteins bind to DNA. [1] This technique can be used to study protein-DNA interactions both outside and within cells.
Transcription factors are regulatory proteins that assist with various levels of DNA regulation. [2] These regulatory molecules and associated proteins bind promoters, enhancers, or silencers to drive or repress transcription and are fundamental to understanding the unique regulation of individual genes within the genome. [3]
The technique was first developed in 1978 by David J. Galas, Ph.D. and Albert Schmitz, Ph.D., who modified the pre-existing Maxam-Gilbert chemical sequencing technique to study the binding specificity of the lac repressor protein. [4] Since then, researchers have extended the technique to map chromatin and have greatly reduced the technical requirements of the footprinting method. [5] [6]
The most common method of DNA footprinting is DNase-sequencing. [7] DNase-sequencing uses DNase I endonuclease to cleave DNA for analysis. The process begins with polymerase chain reaction (PCR), which amplifies the DNA to ensure that the sample contains a sufficient amount for analysis. Proteins of interest are then added and bind to the DNA at their respective binding sites. Next, an enzyme such as DNase I cleaves unbound regions of DNA while leaving protein-bound DNA intact. The resulting DNA fragments are separated by size using polyacrylamide gel electrophoresis. Protein-bound regions appear as gaps on the gel, areas with no bands, which represent specific DNA-protein interactions. [8]
In January 1978, David J. Galas, Ph.D. and Albert Schmitz, Ph.D. developed the DNA footprinting technique to study the binding specificity of the lac repressor protein. [4] Galas, the primary investigator of the DNA footprinting project, earned his Ph.D. in physics from the University of California, Davis. [9] He later led the Human Genome Project from 1990 to 1993 while serving as Director for Health and Environmental Research at the U.S. Department of Energy Office of Science. [9]
DNA footprinting was originally a modification of the Maxam-Gilbert chemical sequencing technique, adapted to detect binding of the lac repressor protein. [4] The method was submitted and published without revision in Nucleic Acids Research. Galas and Schmitz's method was subsequently cited in a 1980 article by David R. Engelke, Ph.D. and colleagues describing eukaryotic proteins and their binding sites. [10] The technique was further refined by Thomas D. Tullius, Ph.D. and colleagues in August 1986, in a paper that used more accurate DNA cleavage mechanisms to improve the rigour of footprinting research. [5]
In January 2008, Alan P. Boyle, Ph.D. and colleagues developed DNase-seq, a genome-wide DNA footprinting method that involved running pre-digested nuclei through multiple rounds of digestion and repair and that used the DNase I enzyme, as Galas and Schmitz had, to map genomic open chromatin. [6] In recent years, many laboratories have developed computational methods to statistically analyze deeply sequenced DNase-seq data, which originally required an extensive background in bioinformatics to interpret. [11] [12] [13] [14] [15] [16] [17] [18]
The most common method of DNA footprinting is DNase-sequencing (DNase-seq). [7] This technique uses a DNase I endonuclease enzyme to cleave the DNA and assess whether a specific protein binds to a target region within the DNA. [19] DNase I preferentially cuts at accessible sites not bound by proteins. DNA footprinting systematically identifies transcription factor (TF) binding sites in DNA by analyzing the location of DNase cleavage sites. [7] The DNase-seq method of footprinting involves four steps: polymerase chain reaction (PCR) of the DNA, incubation of DNA with a protein, DNA cleavage, and DNA analysis through polyacrylamide gel electrophoresis (PAGE).
Polymerase chain reaction (PCR) is the first step in DNase-seq DNA footprinting. The purpose of PCR is to amplify DNA fragments to ensure there is sufficient material for downstream analysis. [20] The ideal amplification length is between 200 and 400 base pairs. [21] The labelled template DNA is then divided into two samples. One sample is incubated with the protein of interest, and the other serves as a control in which the DNA is incubated alone. Dividing the DNA into two conditions allows researchers to assess whether a DNA-protein interaction has occurred when the samples undergo polyacrylamide gel electrophoresis. If the protein binds to DNA, the sample incubated with the protein of interest may display regions protected from DNase I cleavage. The control sample, which lacks protein binding, is cleaved along its whole length and produces a distinct fragment pattern in PAGE. [22]
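The comparison between the control lane and the protein lane can be sketched in a few lines of Python. This is a minimal illustration, not an established analysis pipeline. It assumes band intensities have already been quantified at each nucleotide position, and the function name `find_footprints`, the 0.5 ratio cutoff, the minimum run length, and the intensity values are all hypothetical.

```python
def find_footprints(control, bound, ratio_cutoff=0.5, min_length=3):
    """Return inclusive (start, end) index pairs where the protein lane is
    depleted relative to the control lane for at least min_length positions."""
    if len(control) != len(bound):
        raise ValueError("both lanes must cover the same positions")
    footprints = []
    start = None
    for i, (c, b) in enumerate(zip(control, bound)):
        # A position counts as protected when its band is much weaker
        # in the protein lane than in the control lane.
        protected = c > 0 and (b / c) < ratio_cutoff
        if protected and start is None:
            start = i
        elif not protected and start is not None:
            if i - start >= min_length:
                footprints.append((start, i - 1))
            start = None
    # Close a protected run that extends to the end of the lane.
    if start is not None and len(control) - start >= min_length:
        footprints.append((start, len(control) - 1))
    return footprints


# Made-up band intensities for ten positions (arbitrary units).
control_lane = [100, 95, 110, 105, 98, 102, 97, 101, 99, 104]
protein_lane = [98, 97, 105, 20, 15, 10, 18, 96, 100, 102]
print(find_footprints(control_lane, protein_lane))  # [(3, 6)]
```

Using a ratio against the control lane, rather than raw intensity, keeps positions that DNase I cuts poorly in both lanes from being mistaken for protection.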
The PCR process consists of three steps: denaturation, annealing, and elongation of DNA. [23] The first stage requires high temperatures between 94 and 98 °C to separate double-stranded DNA into single strands. [citation needed] The DNA mixture is then cooled to roughly 45 °C to allow primers to bind to the two single strands. Finally, the DNA is left to elongate at 76 °C. DNA polymerase, the enzyme largely responsible for elongation, builds a DNA strand complementary to the template strand. To obtain enough DNA, amplification continues for 15-18 rounds. This increases the amount of DNA by approximately 10,000 times. [citation needed] Once the DNA is amplified, it can be labelled with either a fluorescent tag or radioactive phosphorus.
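The relationship between cycle number and yield can be checked with a short calculation. The sketch below assumes the standard exponential model, in which each cycle multiplies the template by (1 + efficiency). The function names and the 0.8 efficiency value are illustrative assumptions, not figures from the cited sources. Under ideal doubling, 15-18 cycles would give well over 10,000-fold amplification, so a figure near 10,000 fits a per-cycle efficiency somewhat below 100%.

```python
import math


def fold_amplification(cycles, efficiency=1.0):
    """Fold increase after a number of cycles, where efficiency is the
    fraction of templates copied per cycle (1.0 means perfect doubling)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return (1.0 + efficiency) ** cycles


def cycles_for_fold(target_fold, efficiency=1.0):
    """Smallest whole number of cycles that reaches target_fold."""
    return math.ceil(math.log(target_fold) / math.log(1.0 + efficiency))


print(fold_amplification(15))        # 32768.0 with perfect doubling
print(fold_amplification(18))        # 262144.0 with perfect doubling
print(cycles_for_fold(10_000, 1.0))  # 14 cycles at 100% efficiency
print(cycles_for_fold(10_000, 0.8))  # 16 cycles at an assumed 80% efficiency
```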
The DNA template is labelled at the 3' or 5' end, depending on the location of the binding site(s). Two labels can be used for footprinting: radioactivity and fluorescence. [19]
Radioactivity has traditionally been used to label DNA fragments for footprinting analysis. This process was originally developed in 1977 by Maxam and Gilbert when proposing their chemical sequencing technique. Radioactive labelling is very sensitive and is optimal for visualizing small amounts of DNA. In kinase-based labelling, DNA is treated with a kinase enzyme that transfers a radioactive phosphate group (³²P) to the 5' end of the strand. Radioactive labelling is also specific and durable, allowing analysis of small DNA targets with high precision. [24]
Fluorescence is a widely used method of DNA labelling. This method is considered to be safer due to the lack of radio-chemicals. DNA fluorescent labelling is specific, versatile, and can be used to label live cells. [25] There are 2 ways to fluorescently label DNA: chemical synthesis and enzymatic synthesis. In chemical synthesis, a fluorescent dye is attached to nucleotides, which are then added directly to the growing DNA strand. Enzymatic synthesis involves the use of fluorescent nucleoside triphosphates, which are used instead of standard nucleotides. [25] [26]
Both fluorescence and radioactivity are beneficial for labelling small or fragmented sections of DNA, allowing for more specific footprinting.
There are a variety of cleavage agents used in genomic footprinting. A desirable cleavage agent is sequence-neutral, easy to use, and easy to control. No current cleavage agent meets all of the criteria for an ideal agent, but many enzymatic and chemical agents have been used successfully. There are three main cleavage agents employed in DNA footprinting: DNase I endonuclease, hydroxyl radicals and ultraviolet irradiation. [19]
DNase I is a large enzyme that functions as a double-strand endonuclease. It binds to the minor groove of DNA and cleaves the phosphodiester backbone. [7] DNase I is considered a good cleavage agent because its size makes it more likely to be blocked from cleaving the DNA strand at regions bound by a protein of interest. [27] When DNase I activity is blocked, there is a "footprint", an area with little to no DNA cleavage due to binding proteins. [7] DNase activity is dependent on two conditions: affinity between the ligand and protein, and the equilibrium between DNase and DNA. [20] In addition, the DNase I reaction is easily controlled, because adding ethylenediaminetetraacetic acid (EDTA) stops it. However, DNase I also has a number of limitations. The enzyme does not cut DNA randomly; its activity is affected by DNA structure and sequence, which results in an uneven ladder. This can limit the precision of predicting a protein's binding site on the DNA molecule. [19] [28]
The use of hydroxyl radicals in DNA footprinting is based on the Fenton reaction. [29] In this method of creating radicals, Fe²⁺ is oxidized by hydrogen peroxide (H₂O₂) to form free hydroxyl radicals. These hydroxyl radicals then react with the DNA backbone, which results in a break in the DNA. [30] Like DNase I, hydroxyl radicals can reveal protein-DNA interactions, because radical cleavage is inhibited at specific sites contacted by bound proteins. [30] Due to the small size of the radicals, the resulting DNA footprint has high resolution. Unlike DNase I, hydroxyl radicals have no sequence preference and produce an evenly distributed ladder. However, hydroxyl radical footprinting is time-consuming due to longer reaction and digestion times. [31]
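The Fenton chemistry behind this method is commonly summarized by a single reaction in which iron(II) donates an electron to hydrogen peroxide. The equation below is the textbook scheme, shown for orientation only. It omits the chelators and reducing agents that a practical footprinting protocol may include.

```latex
\[
\mathrm{Fe^{2+}} + \mathrm{H_2O_2} \longrightarrow \mathrm{Fe^{3+}} + {}^{\bullet}\mathrm{OH} + \mathrm{OH^{-}}
\]
```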
Ultraviolet (UV) irradiation can induce photoreactions in nucleic acids, leading to DNA damage such as single-strand breaks, crosslinks between DNA strands or with proteins, and interactions with solvents. [32] [33] UV light causes the formation of cyclobutane pyrimidine dimers (CPDs) and covalent links between bases. Protein interactions with DNA alter the pattern of UV damage, and this altered pattern informs footprinting analysis. [34] Once both protected and unprotected DNA have been treated, the damaged products are analyzed by primer extension. [35] [36] Primer extension terminates upon reaching a damaged base. During analysis, the protected sample will show an additional band where the DNA was crosslinked with a bound protein. This method reacts quickly and can capture interactions that are only momentary. [34] Additionally, UV light can penetrate live cell membranes and can be applied to in vivo experiments. However, the bound protein does not protect the DNA in this method; instead, it alters the photoreactions in its vicinity, which makes interpretation difficult. [37]
Gel electrophoresis is a laboratory technique used to separate nucleic acids or proteins based on their size and charge. This method involves applying an electric field to a gel matrix, typically made of agarose or polyacrylamide, through which molecules migrate at varying speeds. Smaller molecules move faster through the gel, while larger molecules migrate more slowly. [8] [38] The resulting separation pattern can be visualized using staining methods or by detecting labelled molecules. Polyacrylamide gels are favoured due to their high resolution in separating small DNA fragments, making them ideal for analyzing complex mixtures and studying DNA-protein interactions. [39]
This method is used in DNA footprinting to identify DNA-protein binding sites by separating DNA fragments that result from nuclease digestion or chemical cleavage. After DNA cleavage by a specific cleavage agent, the mixture of protected and cleaved fragments is then separated by polyacrylamide gel electrophoresis (PAGE). [38] The separation of DNA allows researchers to visualize the "footprint" as a region lacking cleaved fragments. This approach is particularly useful when studying protein-DNA interactions in native conditions, as it preserves the integrity of complex formations. Additionally, combining electrophoretic mobility shift assays (EMSA) with footprinting enhances the specificity and resolution of detecting less stable protein-DNA complexes. [40] [41]
In vivo footprinting is a technique used to analyze the protein-DNA interactions occurring within a cell at a given time point. [42] This method helps identify regions of DNA occupied by proteins, revealing insights into gene regulation within the cell.
Cleavage agents are used to degrade the unbound DNA while preserving protein-bound DNA. DNase I is commonly used as a cleavage agent when the cellular membrane has been permeabilized, making it porous enough for external substances to enter. [43] However, the most common cleavage agent is UV irradiation, because it penetrates the cell membrane without disrupting the cell and can thus capture interactions that are sensitive to cellular changes. The trade-off is lower specificity and accuracy than DNase I. DNase I cleaves unprotected DNA regions and leaves a footprint where proteins are bound, allowing precise identification of protein-DNA interactions, whereas UV radiation induces widespread damage that can make it difficult to find the exact binding sites. [44]
Once the DNA has been cleaved or damaged by UV, the cells can be lysed and the DNA purified for analysis of a region of interest. Ligation-mediated PCR is an alternative method to footprint in vivo. Following DNA cleavage and isolation, linkers are ligated to the breakpoints. A region of interest is amplified between the linker and a gene-specific primer, and when run on a polyacrylamide gel, it will show a footprint, a gap, where a protein was bound. [45]
In vivo footprinting can be combined with immunoprecipitation to assess protein specificity at particular locations throughout the genome. [46] This assay uses either chemical crosslinkers or UV light to cross-link DNA to its associated proteins. After unbound DNA is digested, the DNA-protein complexes remain. The protein of interest can then be selectively immunoprecipitated with a specific antibody. The immunoprecipitated DNA can then be purified, released from the crosslink, and analyzed using techniques such as PCR or sequencing of the region of interest. [47]
The DNA footprinting technique can be modified to assess the binding strength of a protein to specific regions of DNA. [48] This concentration-dependent approach repeats the footprinting experiment across a range of protein concentrations and tracks the intensity of banding and the appearance of the footprint. After DNase I treatment, the resulting fragments are visualized on a PAGE gel and computationally analyzed. The intensity of banding and the presence of the footprint reflect the binding affinity between the protein of interest and a specific region of DNA. At lower protein concentrations, the gaps that signify protein-DNA interactions are expected to disappear, because fewer proteins are bound to the DNA, leaving more of it exposed to cleavage. [49] [50]
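The computational step of a concentration series can be illustrated with a simple fit. The sketch below assumes a 1:1 binding model in which fractional occupancy equals [P] / (Kd + [P]), with protein in large excess over DNA so that free and total protein concentrations are treated as equal. The function names, the grid-search fit, and the intensity values are hypothetical and are not taken from the cited studies.

```python
def fractional_protection(control_intensity, bound_intensity):
    """Fraction of cleavage lost inside the footprint relative to the control."""
    return 1.0 - bound_intensity / control_intensity


def occupancy(protein_conc, kd):
    """Fraction of sites bound under a simple 1:1 binding isotherm."""
    return protein_conc / (kd + protein_conc)


def fit_kd(concs, protections, kd_min=1e-10, kd_max=1e-5, steps=500):
    """Least-squares estimate of Kd from a log-spaced grid of candidates."""
    best_kd, best_err = None, float("inf")
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        err = sum((occupancy(c, kd) - p) ** 2
                  for c, p in zip(concs, protections))
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd


# Made-up titration: protein concentrations (mol/L) and the summed band
# intensity inside the footprint region for each lane.
concs = [1e-9, 3e-9, 1e-8, 3e-8, 1e-7]
control = 1000.0
bound = [909.1, 769.2, 500.0, 250.0, 90.9]

protections = [fractional_protection(control, b) for b in bound]
kd = fit_kd(concs, protections)
print(f"Estimated Kd: {kd:.2e} M")
```

A grid search keeps the example dependency-free. In practice a nonlinear least-squares routine would usually replace it.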
To adapt the footprinting technique to updated detection methods, the labelled DNA fragments are detected by a capillary electrophoresis device instead of being run on a polyacrylamide gel. [51] If the DNA fragment to be analyzed is produced by polymerase chain reaction (PCR), it is straightforward to couple a fluorescent molecule such as carboxyfluorescein (FAM) to the primers. This way, the fragments produced by DNase I digestion will contain FAM and will be detectable by the capillary electrophoresis machine. Typically, carboxy-X-rhodamine (ROX)-labelled size standards are also added to the mixture of fragments to be analyzed. Binding sites of transcription factors have been successfully identified this way. [52]
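The role of the size standards can be illustrated with a small sizing routine. This sketch assumes that each peak in the sample channel is sized by linear interpolation between the two nearest standard peaks. Commercial instruments use their own sizing algorithms, so the function name `call_fragment_size` and all migration values here are hypothetical.

```python
from bisect import bisect_right


def call_fragment_size(migration, standards):
    """Estimate fragment length in base pairs by linear interpolation.

    standards is a list of (migration_time, size_bp) pairs for the size
    ladder, sorted by strictly increasing migration time.
    """
    if len(standards) < 2:
        raise ValueError("at least two size standards are required")
    times = [t for t, _ in standards]
    if not times[0] <= migration <= times[-1]:
        raise ValueError("peak lies outside the size-standard range")
    # Index of the first standard that migrates later than the peak,
    # clamped so a peak on the last standard still has a right neighbour.
    hi = min(bisect_right(times, migration), len(times) - 1)
    lo = hi - 1
    (t0, s0), (t1, s1) = standards[lo], standards[hi]
    return s0 + (s1 - s0) * (migration - t0) / (t1 - t0)


# Made-up size ladder: (migration time in minutes, fragment size in bp).
ladder = [(10.0, 50), (14.0, 100), (18.0, 150), (22.0, 200)]
print(call_fragment_size(16.0, ladder))  # 125.0
```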
Capillary electrophoresis can be used to detect length differences of DNA fragments of interest within a sample. [53] Additionally, this technique offers high resolution, allowing for the detection of even minor variations in fragment length. The automation and high-throughput capabilities of capillary electrophoresis make it a valuable tool for large-scale studies and applications where rapid and accurate results are required. Furthermore, the use of fluorescent labelling enhances sensitivity and allows for multiplexing, enabling the simultaneous analysis of multiple samples or target regions within a single run. [54]
Cell-free DNA (cfDNA) fragmentomics analyzes the fragmentation patterns of cfDNA. [55] DNA footprinting is applied to cfDNA to study the binding sites of DNA-binding proteins, which allows researchers to identify and analyze protein-DNA interactions non-invasively. The approach is used in particular for early cancer detection by assessing disease-associated fragmentation patterns. DNA is extracted from a body fluid sample and sequenced, and specific features such as fragment size, end motifs, and fragment distribution across different genomic regions are then analyzed to detect disease. [56] [57]
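Two of the features named above, fragment size and end motifs, are simple to tabulate once fragments have been sequenced. The sketch below is illustrative only. It assumes each fragment is available as a plain sequence string and defines the end motif as the first four bases at the 5' end. The function name and the toy sequences, which are far shorter than real cfDNA fragments, are hypothetical.

```python
from collections import Counter


def fragment_features(fragments, motif_len=4):
    """Return (length_counts, motif_frequencies) for a list of sequences."""
    length_counts = Counter(len(seq) for seq in fragments)
    motif_counts = Counter(
        seq[:motif_len].upper() for seq in fragments if len(seq) >= motif_len
    )
    total = sum(motif_counts.values())
    motif_freq = {m: n / total for m, n in motif_counts.items()} if total else {}
    return length_counts, motif_freq


# Toy fragments for illustration only.
toy_fragments = ["CCCAGGTT", "CCCATTGACA", "AAGTCCGT", "CCCAGGTT"]
lengths, motifs = fragment_features(toy_fragments)
print(dict(lengths))  # {8: 3, 10: 1}
print(motifs)         # {'CCCA': 0.75, 'AAGT': 0.25}
```

The third feature, fragment distribution across genomic regions, requires alignment coordinates and is not covered here.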
Next-generation sequencing has enabled a genome-wide approach to identify DNA footprints. Open chromatin assays such as DNase-Seq [58] and FAIRE-Seq [59] have proven to provide a robust regulatory landscape for many cell types. [60] However, these assays require some downstream bioinformatics analyses in order to provide genome-wide DNA footprints. The computational tools proposed can be categorized in two classes: segmentation-based and site-centric approaches.
Segmentation-based methods apply hidden Markov models or sliding window methods to segment the genome into open and closed chromatin regions. Examples of such methods are HINT, [61] the Boyle method, [62] and the Neph method. [63] Site-centric methods, on the other hand, find footprints given the open chromatin profile around motif-predicted binding sites, i.e., regulatory regions predicted using DNA-protein sequence information (encoded in structures such as a position weight matrix). Examples of these methods are CENTIPEDE [64] and the Cuellar-Partida method. [65]
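The sliding window idea can be shown with a toy scoring function. The sketch below is not the HINT, Boyle, or Neph algorithm. It is a simplified illustration that assumes per-base DNase I cut counts are available and scores each window by how far the central region's mean falls below the mean of its flanks. The window sizes, function names, and counts are hypothetical.

```python
def footprint_scores(cuts, center=6, flank=3):
    """Score every window as (start, flank_mean - center_mean).

    A high score marks a central region with few cuts, surrounded by
    flanks with many cuts, which is the expected shape of a footprint.
    """
    scores = []
    for start in range(flank, len(cuts) - center - flank + 1):
        left = cuts[start - flank:start]
        mid = cuts[start:start + center]
        right = cuts[start + center:start + center + flank]
        flank_mean = (sum(left) + sum(right)) / (2 * flank)
        center_mean = sum(mid) / center
        scores.append((start, flank_mean - center_mean))
    return scores


def best_footprint(cuts, center=6, flank=3):
    """Return the (start, score) of the highest-scoring window.

    Raises ValueError if the profile is shorter than one full window.
    """
    return max(footprint_scores(cuts, center, flank), key=lambda item: item[1])


# Made-up per-base cut counts with a protected stretch at positions 4-9.
cut_counts = [9, 8, 10, 9, 1, 0, 1, 0, 0, 1, 10, 9, 8, 9, 10]
print(best_footprint(cut_counts))  # (4, 8.5)
```

A site-centric method would instead restrict scoring to windows centred on motif matches, rather than scanning every position.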