DNA footprinting

DNA footprinting is a method of investigating the sequence specificity of DNA-binding proteins in vitro. This technique can be used to study protein-DNA interactions both outside and within cells.

The regulation of transcription has been studied extensively, and yet there is still much that is unknown. Transcription factors and associated proteins that bind promoters, enhancers, or silencers to drive or repress transcription are fundamental to understanding the unique regulation of individual genes within the genome. Techniques like DNA footprinting help elucidate which proteins bind to these associated regions of DNA and unravel the complexities of transcriptional control.

History

In 1978, David J. Galas and Albert Schmitz developed the DNA footprinting technique to study the binding specificity of the lac repressor protein. It was originally a modification of the Maxam-Gilbert chemical sequencing technique. [1]

Method

The simplest application of this technique is to assess whether a given protein binds to a region of interest within a DNA molecule. [2] The basic procedure is as follows:

  1. Use polymerase chain reaction (PCR) to amplify and label the region of interest that contains a potential protein-binding site. Ideally, the amplicon is between 50 and 200 base pairs in length.
  2. Add the protein of interest to a portion of the labeled template DNA; keep a second portion without protein for later comparison.
  3. Add a cleavage agent to both portions of the DNA template. The cleavage agent is a chemical or enzyme that cuts at random locations in a sequence-independent manner. The reaction should proceed just long enough to cut each DNA molecule at only one location. A protein that specifically binds a region within the DNA template will protect the DNA it is bound to from the cleavage agent.
  4. Run both samples side by side on a polyacrylamide gel. The portion of DNA template without protein will be cut at random locations and will therefore produce a ladder-like distribution of bands. The DNA template with the protein will produce a ladder with a gap in it, the "footprint", where the DNA has been protected from the cleavage agent.

Note: Maxam-Gilbert chemical DNA sequencing reactions can be run alongside the samples on the polyacrylamide gel to allow the exact location of the ligand binding site to be determined.
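The comparison of the two lanes can be sketched in code. The following is a minimal illustration, not a published analysis method: it assumes that band intensities have already been quantified per position for the control lane and the protein lane, and it reports runs of positions where the protein lane is strongly depleted. The function name `find_footprints` and the parameters `ratio_cutoff` and `min_run` are hypothetical choices made for this example.

```python
def find_footprints(control, bound, ratio_cutoff=0.5, min_run=3):
    """Return (start, end) index pairs, inclusive, of protected regions.

    control, bound: band intensities per position for the lane without
    protein and the lane with protein (same length, same ordering).
    A position counts as protected when bound/control < ratio_cutoff.
    Only runs of at least min_run consecutive protected positions are
    reported, so that a single weak band is not called a footprint.
    """
    if len(control) != len(bound):
        raise ValueError("lanes must have the same number of positions")

    footprints = []
    run_start = None
    for i, (c, b) in enumerate(zip(control, bound)):
        # Positions with no control signal carry no information.
        protected = c > 0 and (b / c) < ratio_cutoff
        if protected:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                footprints.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the end of the lane.
    if run_start is not None and len(control) - run_start >= min_run:
        footprints.append((run_start, len(control) - 1))
    return footprints


# Made-up intensities: positions 3 to 7 are protected, position 10 is
# a single weak band and is ignored.
control_lane = [100] * 12
protein_lane = [95, 102, 98, 20, 15, 10, 12, 18, 97, 101, 40, 99]
print(find_footprints(control_lane, protein_lane))
```

In practice the cutoff and minimum run length would depend on the cleavage agent and on how uneven the control ladder is.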

Labeling

The DNA template is labeled at the 3' or 5' end, depending on the location of the binding site(s). Either radioactive or fluorescent labels can be used. Radioactivity has traditionally been used to label DNA fragments for footprinting analysis, as the method was originally developed from the Maxam-Gilbert chemical sequencing technique. Radioactive labeling is very sensitive and is optimal for visualizing small amounts of DNA. Fluorescence is a desirable advancement because it avoids the hazards of using radiochemicals. However, it has been more difficult to optimize because it is not always sensitive enough to detect the low concentrations of the target DNA strands used in DNA footprinting experiments. Electrophoretic sequencing gels and capillary electrophoresis have both been used successfully to analyze footprinting of fluorescently tagged fragments. [2]

Cleavage agent

A variety of cleavage agents can be chosen. A desirable agent is one that is sequence neutral, easy to use, and easy to control. Unfortunately, no available agent meets all of these standards, so an appropriate agent must be chosen depending on the DNA sequence and ligand of interest. The following cleavage agents are described in detail:

DNase I is a large protein that functions as a double-strand endonuclease. It binds the minor groove of DNA and cleaves the phosphodiester backbone. It is a good cleavage agent for footprinting because its size makes it easily physically hindered, so its action is more likely to be blocked by a protein bound to a DNA sequence. In addition, the DNase I enzyme is easily controlled by adding EDTA to stop the reaction. There are, however, some limitations in using DNase I. The enzyme does not cut DNA randomly; its activity is affected by local DNA structure and sequence and therefore results in an uneven ladder. This can limit the precision of predicting a protein's binding site on the DNA molecule. [2] [3]

Hydroxyl radicals are created by the Fenton reaction, in which Fe2+ reduces H2O2 to form free hydroxyl radicals. These radicals react with the DNA backbone, resulting in a break. Due to their small size, the resulting DNA footprint has high resolution. Unlike DNase I, they have no sequence dependence and result in a much more evenly distributed ladder. The drawback of using hydroxyl radicals is that they are more time-consuming to use, due to a slower reaction and digestion time. [4]

Ultraviolet irradiation can be used to excite nucleic acids and create photoreactions, which result in damaged bases in the DNA strand. [5] Photoreactions can include single-strand breaks, interactions between or within DNA strands, reactions with solvents, or crosslinks with proteins. The workflow for this method has an additional step: once both the protected and unprotected DNA have been treated, the cleaved products undergo primer extension. [6] [7] The extension terminates upon reaching a damaged base, so when the extension products are run side by side on a gel, the protected sample will show an additional band where the DNA was crosslinked with a bound protein. Advantages of using UV are that it reacts very quickly and can therefore capture interactions that are only momentary. Additionally, it can be applied to in vivo experiments, because UV can penetrate cell membranes. A disadvantage is that the gel can be difficult to interpret, as the bound protein does not protect the DNA; it merely alters the photoreactions in the vicinity. [8]

Advanced applications

In vivo footprinting

In vivo footprinting is a technique used to analyze the protein-DNA interactions that are occurring in a cell at a given time point. [9] [10] DNase I can be used as a cleavage agent if the cellular membrane has been permeabilized. However, the most common cleavage agent used is UV irradiation, because it penetrates the cell membrane without disrupting cell state and can thus capture interactions that are sensitive to cellular changes. Once the DNA has been cleaved or damaged by UV, the cells can be lysed and the DNA purified for analysis of a region of interest.

Ligation-mediated PCR is an alternative method to footprint in vivo. Once a cleavage agent has been used on the genomic DNA, resulting in single-strand breaks, and the DNA has been isolated, a linker is added onto the break points. A region of interest is amplified between the linker and a gene-specific primer and, when run on a polyacrylamide gel, will show a footprint where a protein was bound. [11]

In vivo footprinting combined with immunoprecipitation can be used to assess protein specificity at many locations throughout the genome. The DNA bound to a protein of interest can be immunoprecipitated with an antibody to that protein, and binding at specific regions can then be assessed using the DNA footprinting technique. [12]

Quantitative footprinting

The DNA footprinting technique can be modified to assess the binding strength of a protein to a region of DNA. By performing the footprinting experiment with a range of protein concentrations, the appearance of the footprint can be observed as the concentration increases, and the protein's binding affinity can then be estimated. [2]
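As a rough illustration of how an affinity estimate could be obtained from such a titration, the sketch below fits a simple 1:1 binding isotherm, fraction protected = [P] / (Kd + [P]), to the fraction of protection measured at each protein concentration. This is an assumption-laden example rather than a prescribed protocol: it assumes a single non-cooperative site, that the free protein concentration is approximately equal to the total, and that protection has been quantified as 1 minus the ratio of band intensity with protein to band intensity without. The function names and the grid-search fitting are hypothetical choices for this example.

```python
import math


def binding_isotherm(protein_conc, kd):
    """Fraction of sites occupied for simple 1:1 binding."""
    return protein_conc / (kd + protein_conc)


def estimate_kd(protein_concs, fractions, kd_min=1e-12, kd_max=1e-3, steps=2000):
    """Estimate Kd (same units as protein_concs) by least squares.

    A log-spaced grid of candidate Kd values is scanned and the one with
    the smallest sum of squared errors against the measured fractions of
    protection is returned. The grid limits are arbitrary defaults.
    """
    log_min, log_max = math.log10(kd_min), math.log10(kd_max)
    best_kd, best_sse = None, float("inf")
    for k in range(steps + 1):
        kd = 10 ** (log_min + (log_max - log_min) * k / steps)
        sse = sum(
            (binding_isotherm(p, kd) - f) ** 2
            for p, f in zip(protein_concs, fractions)
        )
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic titration generated from a known Kd of 10 nM (made-up data).
concs = [1e-9, 3e-9, 1e-8, 3e-8, 1e-7]
protection = [binding_isotherm(p, 1e-8) for p in concs]
print(estimate_kd(concs, protection))
```

With real data, a nonlinear least-squares routine and a model that accounts for cooperativity or ligand depletion may be more appropriate.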

Detection by capillary electrophoresis

To adapt the footprinting technique to updated detection methods, the labelled DNA fragments are detected by a capillary electrophoresis device instead of being run on a polyacrylamide gel. If the DNA fragment to be analyzed is produced by polymerase chain reaction (PCR), it is straightforward to couple a fluorescent molecule such as carboxyfluorescein (FAM) to the primers. This way, the fragments produced by DNase I digestion will contain FAM and will be detectable by the capillary electrophoresis machine. Typically, carboxytetramethyl-rhodamine (ROX)-labelled size standards are also added to the mixture of fragments to be analyzed. Binding sites of transcription factors have been successfully identified this way. [13]
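The role of the size standards can be illustrated with a simplified sizing routine. The sketch below assumes that peak migration times have been extracted for both the size-standard channel and the sample channel, and it assigns a length to a sample peak by linear interpolation between the two bracketing standard peaks. Instrument software generally uses more elaborate sizing methods, so this is only an approximation, and the function name `call_fragment_size` is hypothetical.

```python
from bisect import bisect_right


def call_fragment_size(migration_time, standard_times, standard_sizes):
    """Interpolate a fragment size (nucleotides) from its migration time.

    standard_times: migration times of the size-standard peaks, strictly
    increasing. standard_sizes: the known sizes of those peaks, in the
    same order. Returns None when the peak lies outside the range covered
    by the standards, since extrapolation would be unreliable.
    """
    if len(standard_times) != len(standard_sizes) or len(standard_times) < 2:
        raise ValueError("need at least two matched standard peaks")
    if not standard_times[0] <= migration_time <= standard_times[-1]:
        return None

    # Index of the first standard peak that migrates later than the sample
    # peak, clamped so the last standard itself is handled correctly.
    hi = min(bisect_right(standard_times, migration_time), len(standard_times) - 1)
    lo = hi - 1
    t0, t1 = standard_times[lo], standard_times[hi]
    s0, s1 = standard_sizes[lo], standard_sizes[hi]
    return s0 + (s1 - s0) * (migration_time - t0) / (t1 - t0)


# Made-up standard peaks at 50, 100 and 200 nucleotides.
times = [10.0, 20.0, 30.0]
sizes = [50, 100, 200]
print(call_fragment_size(15.0, times, sizes))  # halfway between 50 and 100
```

Once each labelled fragment has a size, its length gives the distance of the cut site from the labelled end, which lets the protected region be placed on the sequence.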

Genome-wide assays

Next-generation sequencing has enabled a genome-wide approach to identify DNA footprints. Open chromatin assays such as DNase-Seq [14] and FAIRE-Seq [15] have proven to provide a robust regulatory landscape for many cell types. [16] However, these assays require some downstream bioinformatics analyses in order to provide genome-wide DNA footprints. The computational tools proposed can be categorized in two classes: segmentation-based and site-centric approaches.

Segmentation-based methods apply hidden Markov models or sliding-window methods to segment the genome into open and closed chromatin regions. Examples of such methods are HINT, [17] the Boyle method [18] and the Neph method. [19] Site-centric methods, on the other hand, find footprints given the open chromatin profile around motif-predicted binding sites, i.e., regulatory regions predicted using DNA-protein sequence information (encoded in structures such as position weight matrices). Examples of these methods are CENTIPEDE [20] and the Cuellar-Partida method. [21]
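To give a feel for the sliding-window idea, the sketch below scans a per-base vector of cleavage counts and flags windows whose central region is depleted in cuts relative to its two flanks. It is a toy illustration under stated assumptions and does not reproduce HINT, the Boyle method, the Neph method, or any other published tool; the function name `scan_footprints`, the window sizes, and the `max_ratio` threshold are all hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def scan_footprints(cuts, center_width=10, flank_width=5, max_ratio=0.3):
    """Flag candidate footprints in a list of per-base cleavage counts.

    For every placement of a flank/center/flank window, the mean cut
    count in the center is divided by the mean cut count in the two
    flanks. Windows with a ratio at or below max_ratio are returned as
    (center_start, center_end_exclusive, ratio) tuples.
    """
    hits = []
    window = center_width + 2 * flank_width
    for left in range(len(cuts) - window + 1):
        c0 = left + flank_width
        c1 = c0 + center_width
        flank_mean = mean(cuts[left:c0] + cuts[c1:c1 + flank_width])
        if flank_mean == 0:
            continue  # closed or unsequenced region: nothing to compare against
        ratio = mean(cuts[c0:c1]) / flank_mean
        if ratio <= max_ratio:
            hits.append((c0, c1, ratio))
    return hits


# Made-up profile: an accessible region with a 10-base protected gap.
profile = [8] * 10 + [0] * 10 + [8] * 10
hits = scan_footprints(profile)
best = min(hits, key=lambda h: h[2])
print(best[:2])
```

Overlapping windows around the same site all pass the threshold, so a real pipeline would merge or rank them, and would also need to correct for the sequence bias of the cleavage enzyme.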

References

  1. Galas, D; Schmitz, A (1978). "DNAse footprinting: a simple method for the detection of protein-DNA binding specificity". Nucleic Acids Research. 5 (9): 3157–70. doi:10.1093/nar/5.9.3157. PMC 342238. PMID 212715.
  2. Hampshire, A; Rusling, D; Broughton-Head, V; Fox, K (2007). "Footprinting: A method for determining the sequence selectivity, affinity and kinetics of DNA-binding ligands". Methods. 42 (2): 128–140. doi:10.1016/j.ymeth.2007.01.002. PMID 17472895.
  3. LeBlanc, B; Moss, T (2001). "DNase I Footprinting". Methods in Molecular Biology. 148: 31–8.
  4. Zaychikov, E; Schickor, P; Denissova, L; Heumann, H (2001). "Hydroxyl radical footprinting". Methods in Molecular Biology. 148: 49–61.
  5. Becker, M.M.; Wang, J.C. (1984). "Use of Light for Footprinting DNA in vivo". Nature. 309 (5970): 682–687. Bibcode:1984Natur.309..682B. doi:10.1038/309682a0. PMID 6728031. S2CID 31638231.
  6. Axelrod, J.D.; Majors, J (1989). "An Improved Method for Photofootprinting Yeast Genes In Vivo Using Taq Polymerase". Nucleic Acids Res. 17 (1): 171–183. doi:10.1093/NAR/17.1.171. PMC 331543. PMID 2643080.
  7. Becker, M.M.; Wang, Z.; Grossmann, G.; Becherer, K.A. (1989). "Genomic Footprinting in Mammalian Cells With Ultraviolet Light". PNAS. 86 (14): 5315–5319. Bibcode:1989PNAS...86.5315B. doi:10.1073/PNAS.86.14.5315. PMC 297612. PMID 2748587.
  8. Geiselmann, J; Boccard, F (2001). "Ultraviolet-laser footprinting". Methods in Molecular Biology. 148: 161–73.
  9. Becker, M.M.; Wang, J.C. (1984). "Use of Light for Footprinting DNA in vivo". Nature. 309 (5970): 682–687. Bibcode:1984Natur.309..682B. doi:10.1038/309682a0. PMID 6728031. S2CID 31638231.
  10. Ephrussi, A.; Church, G.M.; Tonegawa, S.; Gilbert, W. (1985). "B Lineage-Specific Interactions of an Immunoglobulin Enhancer With Cellular Factors In Vivo". Science. 227 (4683): 134–140. Bibcode:1985Sci...227..134E. doi:10.1126/science.3917574. PMID 3917574.
  11. Dai, S; Chen, H; Chang, C; Riggs, A; Flanagan, S (2000). "Ligation-mediated PCR for quantitative in vivo footprinting". Nature Biotechnology. 18: 1108–1111.
  12. Zaret, K (1997). "Editorial". Methods. 11 (2): 149–150. doi:10.1006/meth.1996.0400. PMID 8993026.
  13. Kovacs, Krisztian A.; Steinmann, Myriam; Magistretti, Pierre J.; Halfon, Olivier; Cardinaux, Jean-Rene (2006). "C/EBPβ couples dopamine signalling to substance P precursor gene expression in striatal neurones". Journal of Neurochemistry. 98 (5): 1390–1399. doi:10.1111/j.1471-4159.2006.03957.x. PMID 16771829. S2CID 36225447.
  14. Song, L; Crawford, GE (Feb 2010). "DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells". Cold Spring Harbor Protocols. 2010 (2): pdb.prot5384+. doi:10.1101/pdb.prot5384. PMC 3627383. PMID 20150147.
  15. Giresi, PG; Kim, J; McDaniell, RM; Iyer, VR; Lieb, JD (Jun 2007). "FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin". Genome Research. 17 (6): 877–85. doi:10.1101/gr.5533506. PMC 1891346. PMID 17179217.
  16. Thurman, RE; et al. (Sep 2012). "The accessible chromatin landscape of the human genome". Nature. 489 (7414): 75–82. Bibcode:2012Natur.489...75T. doi:10.1038/nature11232. PMC 3721348. PMID 22955617.
  17. Gusmao, EG; Dieterich, C; Zenke, M; Costa, IG (Aug 2014). "Detection of Active Transcription Factor Binding Sites with the Combination of DNase Hypersensitivity and Histone Modifications". Bioinformatics. 30 (22): 3143–51. doi:10.1093/bioinformatics/btu519. PMID 25086003.
  18. Boyle, AP; et al. (Mar 2011). "High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells". Genome Research. 21 (3): 456–464. doi:10.1101/gr.112656.110. PMC 3044859. PMID 21106903.
  19. Neph, S; et al. (Sep 2012). "An expansive human regulatory lexicon encoded in transcription factor footprints". Nature. 489 (7414): 83–90. Bibcode:2012Natur.489...83N. doi:10.1038/nature11212. PMC 3736582. PMID 22955618.
  20. Pique-Regi, R; et al. (Mar 2011). "Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data". Genome Research. 21 (3): 447–455. doi:10.1101/gr.112623.110. PMC 3044858. PMID 21106904.
  21. Cuellar-Partida, G; et al. (Jan 2012). "Epigenetic priors for identifying active transcription factor binding sites". Bioinformatics. 28 (1): 56–62. doi:10.1093/bioinformatics/btr614. PMC 3244768. PMID 22072382.