FAIRE-Seq (Formaldehyde-Assisted Isolation of Regulatory Elements) is a method in molecular biology used for determining the sequences of DNA regions in the genome associated with regulatory activity. [1] The technique was developed in the laboratory of Jason D. Lieb at the University of North Carolina, Chapel Hill. In contrast to DNase-Seq, the FAIRE-Seq protocol doesn't require the permeabilization of cells or isolation of nuclei, and can analyse any cell type. In a study of seven diverse human cell types, DNase-seq and FAIRE-seq produced strong cross-validation, with each cell type having 1-2% of the human genome as open chromatin.
The protocol is based on the fact that the formaldehyde cross-linking is more efficient in nucleosome-bound DNA than it is in nucleosome-depleted regions of the genome. This method then segregates the non cross-linked DNA that is usually found in open chromatin, which is then sequenced. The protocol consists of cross linking, phenol extraction and sequencing the DNA in aqueous phase.
FAIRE uses the biochemical properties of protein-bound DNA to separate nucleosome-depleted regions in the genome. Cells will be subjected to cross-linking, ensuring that the interaction between the nucleosomes and DNA are fixed. After sonication, the fragmented and fixed DNA is separated using a phenol-chloroform extraction. This method creates two phases, an organic and an aqueous phase. Due to their biochemical properties, the DNA fragments cross-linked to nucleosomes will preferentially sit in the organic phase. Nucleosome depleted or ‘open’ regions on the other hand will be found in the aqueous phase. By specifically extracting the aqueous phase, only nucleosome-depleted regions will be purified and enriched. [1]
FAIRE-extracted DNA fragments can be analyzed in a high-throughput way using next-generation sequencing techniques. In general, libraries are made by ligating specific adapters to the DNA fragments that allow them to cluster on a platform and be amplified resulting in the DNA sequences being read/determined, and this in parallel for millions of the DNA fragments.
Depending on the size of the genome FAIRE-seq is performed on, a minimum of reads is required to create an appropriate coverage of the data, ensuring a proper signal can be determined. [2] [3] In addition, a reference or input genome, which has not been cross-linked, is often sequenced alongside to determine the level of background noise.
Note that the extracted FAIRE-fragments can be quantified in an alternative method by using quantitative PCR. However, this method does not allow a genome wide / high-throughput quantification of the extracted fragments.
There are several aspects of FAIRE-seq that require attention when analysing and interpreting the data. For one, it has been stated that FAIRE-seq will have a higher coverage at enhancer regions over promoter regions. [4] This is in contrast to the alternative method of DNase-seq who is known to show a higher sensitivity towards promoter regions. In addition, FAIRE-seq has been stated to show prefers for internal introns and exons. [5] In general it is also believed that FAIRE-seq data displays a higher background level, making it a less sensitive method. [6]
In a first step FAIRE-seq data are mapped to the reference genome of the model organism used.
Next, the identification of genomic regions with open chromatin, is done by using a peak calling algorithm. Different tools offer packages to do this (e.g. ChIPOTle [7] ZINBA [8] and MACS2 [9] ). ChIPOTle uses a sliding window of 300bp to identify statistically significant signals. In contrast, MACS2 identifies the enriched signal by combining the parameter callpeak with other options like 'broad', 'broad cutoff', 'no model' or 'shift'. ZINBA is a generic algorithm for detection of enrichment in short read dataset. [10] It thus helps in the accurate detection of signal in complex datasets having low signal-to noise ratio.
BedTools [11] is used to merge the enriched regions residing close to each other to form COREs (Cluster of open regulatory elements). This helps in the identification of chromatin accessible regions and gene regulation patterns which would have been undetectable otherwise, considering the lower resolution FAIRE-seq often brings with it.
Data is typically visualized as tracks (e.g. bigWig) and can be uploaded to the UCSC genome browser. [12]
The major limitation of this method, i.e. the low signal-to-noise ratio compared to other chromatin accessibility assays, makes the computational interpretation of these data very difficult. [13]
There are several methods that can be used as an alternative to FAIRE-seq. DNase-seq uses the ability of the DNase I enzyme to cleave free/open/accessible DNA to identify and sequence open chromatin. [14] [15] The subsequently developed ATAC-seq employs the Tn5 transposase, which inserts specified fragments or transposons into accessible regions of the genome to identify and sequence open chromatin. [16]
ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations.
Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome. The field is analogous to genomics and proteomics, which are the study of the genome and proteome of a cell. Epigenetic modifications are reversible modifications on a cell's DNA or histones that affect gene expression without altering the DNA sequence. Epigenomic maintenance is a continuous process and plays an important role in stability of eukaryotic genomes by taking part in crucial biological mechanisms like DNA repair. Plant flavones are said to be inhibiting epigenomic marks that cause cancers. Two of the most characterized epigenetic modifications are DNA methylation and histone modification. Epigenetic modifications play an important role in gene expression and regulation, and are involved in numerous cellular processes such as in differentiation/development and tumorigenesis. The study of epigenetics on a global level has been made possible only recently through the adaptation of genomic high-throughput assays.
DNase-seq is a method in molecular biology used to identify the location of regulatory regions, based on the genome-wide sequencing of regions sensitive to cleavage by DNase I. FAIRE-Seq is a successor of DNase-seq for the genome-wide identification of accessible DNA regions in the genome. Both the protocols for identifying open chromatin regions have biases depending on underlying nucleosome structure. For example, FAIRE-seq provides higher tag counts at non-promoter regions. On the other hand, DNase-seq signal is higher at promoter regions, and DNase-seq has been shown to have better sensitivity than FAIRE-seq even at non-promoter regions.
ATAC-seq is a technique used in molecular biology to assess genome-wide chromatin accessibility. In 2013, the technique was first described as an alternative advanced method for MNase-seq, FAIRE-Seq and DNase-Seq. ATAC-seq is a faster analysis of the epigenome than DNase-seq or MNase-seq.
H3K27ac is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates acetylation of the lysine residue at N-terminal position 27 of the histone H3 protein.
H3K9me3 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the tri-methylation at the 9th lysine residue of the histone H3 protein and is often associated with heterochromatin.
H3K79me2 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the di-methylation at the 79th lysine residue of the histone H3 protein. H3K79me2 is detected in the transcribed regions of active genes.
H4K91ac is an epigenetic modification to the DNA packaging protein histone H4. It is a mark that indicates the acetylation at the 91st lysine residue of the histone H4 protein. No known diseases are attributed to this mark but it might be implicated in melanoma.
H3K23ac is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the acetylation at the 23rd lysine residue of the histone H3 protein.
H3K14ac is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the acetylation at the 14th lysine residue of the histone H3 protein.
H3K36ac is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the acetylation at the 36th lysine residue of the histone H3 protein.
H3K56ac is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the acetylation at the 56th lysine residue of the histone H3 protein.
MNase-seq, short for micrococcal nuclease digestion with deep sequencing, is a molecular biological technique that was first pioneered in 2006 to measure nucleosome occupancy in the C. elegans genome, and was subsequently applied to the human genome in 2008. Though, the term ‘MNase-seq’ had not been coined until a year later, in 2009. Briefly, this technique relies on the use of the non-specific endo-exonuclease micrococcal nuclease, an enzyme derived from the bacteria Staphylococcus aureus, to bind and cleave protein-unbound regions of DNA on chromatin. DNA bound to histones or other chromatin-bound proteins may remain undigested. The uncut DNA is then purified from the proteins and sequenced through one or more of the various Next-Generation sequencing methods.
H3R42me is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates the mono-methylation at the 42nd arginine residue of the histone H3 protein. In epigenetics, arginine methylation of histones H3 and H4 is associated with a more accessible chromatin structure and thus higher levels of transcription. The existence of arginine demethylases that could reverse arginine methylation is controversial.
H3R17me2 is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates the di-methylation at the 17th arginine residue of the histone H3 protein. In epigenetics, arginine methylation of histones H3 and H4 is associated with a more accessible chromatin structure and thus higher levels of transcription. The existence of arginine demethylases that could reverse arginine methylation is controversial.
H3R26me2 is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates the di-methylation at the 26th arginine residue of the histone H3 protein. In epigenetics, arginine methylation of histones H3 and H4 is associated with a more accessible chromatin structure and thus higher levels of transcription. The existence of arginine demethylases that could reverse arginine methylation is controversial.
H3R8me2 is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates the di-methylation at the 8th arginine residue of the histone H3 protein. In epigenetics, arginine methylation of histones H3 and H4 is associated with a more accessible chromatin structure and thus higher levels of transcription. The existence of arginine demethylases that could reverse arginine methylation is controversial.
H3R2me2 is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates the di-methylation at the 2nd arginine residue of the histone H3 protein. In epigenetics, arginine methylation of histones H3 and H4 is associated with a more accessible chromatin structure and thus higher levels of transcription. The existence of arginine demethylases that could reverse arginine methylation is controversial.
H4R3me2 is an epigenetic modification to the DNA packaging protein histone H4. It is a mark that indicates the di-methylation at the 3rd arginine residue of the histone H4 protein. In epigenetics, arginine methylation of histones H3 and H4 is associated with a more accessible chromatin structure and thus higher levels of transcription. The existence of arginine demethylases that could reverse arginine methylation is controversial.
Nucleosome Occupancy and Methylome Sequencing (NOMe-seq) is a genomics technique used to simultaneously detect nucleosome positioning and DNA methylation... This method is an extension of bisulfite sequencing, which is the gold standard for determining DNA methylation. NOMe-seq relies on the methyltransferase M.CviPl, which methylates cytosines in GpC dinucleotides unbound by nucleosomes or other proteins, creating a nucleosome footprint. The mammalian genome naturally contains DNA methylation, but only at CpG sites, so GpC methylation can be differentiated from genomic methylation after bisulfite sequencing. This allows simultaneous analysis of the nucleosome footprint and endogenous methylation on the same DNA molecules. In addition to nucleosome foot-printing, NOMe-seq can determine locations bound by transcription factors. Nucleosomes are bound by 147 base pairs of DNA whereas transcription factors or other proteins will only bind a region of approximately 10-80 base pairs. Following treatment with M.CviPl, nucleosome and transcription factor sites can be differentiated based on the size of the unmethylated GpC region.