Pore-C


Pore-C is an emerging genomic technique [1] [2] [3] that combines chromatin conformation capture (3C) with Oxford Nanopore Technologies' (ONT) long-read sequencing to characterize three-dimensional (3D) chromatin structure. To analyze the resulting concatemers, the originators of Pore-C developed an algorithm that identifies read alignments and assigns each to a restriction fragment; concatemers associated with more than two fragments are deemed high-order. [2] Pore-C attempts to improve on previous methods for mapping chromatin contacts, such as Hi-C and SPRITE, by not requiring DNA amplification prior to sequencing. [2] The technology was developed as a simpler and more easily scalable method of capturing higher-order chromatin structure and mapping regions of chromatin contact. In addition, because ONT long-read sequencing can detect DNA methylation, Pore-C can be used to relate chromatin contacts to epigenetic state. Applications of this technology include analysis of combinatorial chromatin interactions, the generation of de novo chromosome-scale assemblies, visualization of regions associated with multi-locus histone bodies, and detection and resolution of structural variants. [2]


Background

Although the DNA within eukaryotic cells is linear, it is also intricately folded and packaged to fit within each cell's nucleus. [4] [5] As a result, parts of the genome that are far apart in sequence may be close together in physical space. The 3D genome refers to how DNA is spatially organized within cells. [4] [5] Structures found in the 3D genome include active and inactive chromatin, chromatin loops, and topologically associating domains (TADs), all of which help regulate gene expression. In genomic and epigenomic research, chromatin structure is most often studied with 3C techniques, [5] which quantify interactions between loci to construct a 3D map. The original 3C technique quantifies interactions between pairs of genomic loci. Derived methods, such as 4C, 5C, and Hi-C, allow pairwise interactions to be quantified among many loci at once. [6] Other variations, such as ChIP-loop [6] and ChIA-PET, [7] combine 3C with immunoprecipitation to detect interactions mediated by a protein of interest. These techniques all involve an amplification step, most often using the polymerase chain reaction (PCR). A limitation of most current 3D chromatin assays is that they are poorly suited to characterizing interactions among more than two loci; Pore-C was developed to fill this gap. [2] Because it does not require PCR amplification, Pore-C also has a simpler workflow and is intended to be more easily scalable than previous techniques. Pore-C can also be used in populations of cells to characterize topology polymorphisms at specific loci. [2]

Methodology

[Figure: Pore-C workflow]

Many methods to characterize the 3D genome are variations on 3C technology. [5] Like other 3C-based methods, Pore-C seeks to characterize the architecture of the 3D genome by determining which genomic loci are in close spatial proximity (within ~200 nm). [2] It relies on the same core steps: cross-linking, restriction enzyme digestion, proximity ligation, cross-link reversal, and protein degradation. [2] [5] However, Pore-C differs from many previous methods in its subsequent use of ONT long-read sequencing, which enables the resolution of multi-way chromatin contacts and simultaneous detection of DNA methylation. [2] [3]

Cross-linking DNA to protein

First, to preserve the 3D structure of the genome through subsequent steps, DNA is cross-linked to DNA-associated proteins, such as histones. [2] Formaldehyde is used for cross-linking because it joins DNA to proteins with covalent bonds, temporarily locking the 3D genome in place. [8] Specifically, after a series of washes with phosphate-buffered saline (PBS), cells are pelleted by centrifugation and then resuspended in a solution of formaldehyde in PBS. After a short incubation, glycine is added to stop the cross-linking reaction. [8] [2] By quenching the excess formaldehyde, glycine prevents the reaction from going to completion, which keeps the cross-links reversible and maximizes the efficiency of later steps. [8]
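The text above does not state concentrations, so the following is only an illustrative sketch of the dilution arithmetic involved in preparing a formaldehyde-in-PBS cross-linking solution. It applies C1V1 = C2V2; the 37% stock, 1% final concentration, and 10 mL volume are hypothetical values chosen for the example, not figures from the Pore-C protocol, and the helper name `dilution_volume` is likewise hypothetical.

```python
def dilution_volume(stock_pct: float, final_pct: float, final_volume_ml: float) -> float:
    """Volume of stock (mL) to dilute to final_volume_ml, from C1*V1 = C2*V2.

    Both concentrations must use the same units (e.g. % w/v).
    """
    if not 0 < final_pct <= stock_pct:
        raise ValueError("final concentration must be positive and no higher than the stock")
    return final_pct * final_volume_ml / stock_pct


# Hypothetical example: 10 mL of 1% formaldehyde in PBS from a 37% stock.
stock_ml = dilution_volume(stock_pct=37.0, final_pct=1.0, final_volume_ml=10.0)
pbs_ml = 10.0 - stock_ml  # PBS makes up the remaining volume
print(f"{stock_ml:.3f} mL stock + {pbs_ml:.3f} mL PBS")  # 0.270 mL stock + 9.730 mL PBS
```

The same function works for any stock and target, provided both concentrations are expressed in the same units.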

Restriction enzyme digestion and proximity ligation

Cross-linking generates loops of DNA, with each loop arising from a separate locus. [5] To capture long-range interactions between distant loci, potentially on different chromosomes, these loops are cut and then rejoined based on proximity. Although fragments from the same loop may religate to each other, fragments from separate loops are sometimes ligated together, creating chimeric sequences. [5] The cutting and rejoining are achieved by in situ restriction enzyme digestion and proximity ligation, respectively: a restriction endonuclease cuts the DNA to create free ends, and T4 DNA ligase joins fragments together. [5] Ultimately, these steps link genomic loci that are close together in physical space onto contiguous DNA molecules referred to as concatemers. [2]
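Bioinformatic analysis later assigns read alignments to restriction fragments, so it can help to see how an in silico digest defines those fragments. The sketch below, assuming DpnII as an example enzyme (recognition site GATC, cut before the G), splits a sequence into half-open fragment intervals and looks up which fragment contains a given position. The helper names `digest` and `fragment_index` are hypothetical, and the enzyme choice is illustrative rather than taken from the Pore-C protocol.

```python
from __future__ import annotations

from bisect import bisect_right


def digest(sequence: str, site: str = "GATC", cut_offset: int = 0) -> list[tuple[int, int]]:
    """Split a sequence into restriction fragments as half-open (start, end) intervals.

    cut_offset is where the enzyme cuts within its site (0 for DpnII's ^GATC).
    """
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if 0 < cut < len(seq):  # ignore cuts at the very ends
            cuts.append(cut)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [(start, end) for start, end in zip(bounds, bounds[1:]) if end > start]


def fragment_index(fragments: list[tuple[int, int]], position: int) -> int:
    """Index of the fragment containing a 0-based position."""
    starts = [start for start, _ in fragments]
    return bisect_right(starts, position) - 1


fragments = digest("AAGATCTTTTGATCCC")
print(fragments)                     # [(0, 2), (2, 10), (10, 16)]
print(fragment_index(fragments, 5))  # 1
```

Binary search over sorted fragment starts keeps each lookup logarithmic in the number of fragments, which matters when a genome is cut into millions of pieces.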

Cross-linking reversal, protein degradation, and DNA purification

Next, to isolate DNA for sequencing, the proteins bound to the DNA must be detached and degraded. [5] First, Proteinase K, sodium dodecyl sulfate (SDS; a detergent), Tween-20, and nuclease-free water are added. [2] The reaction is then incubated at 56 °C in a thermocycler, a temperature at which Proteinase K is highly active. Proteinase K degrades proteins, and SDS acts as a denaturing agent that disrupts protein structure. [9] [10] Together, these treatments break the covalent cross-links between DNA and protein and remove potential protein contamination. [5] DNA is then isolated and purified, typically by phenol-chloroform extraction followed by ethanol precipitation. [2]

Size selection, library preparation, and long-read sequencing

Pore-C concatemers undergo size selection prior to library preparation and ONT long-read sequencing. [2] Size selection enriches for high-order interactions, defined as concatemers containing more than two DNA fragments. Specifically, Pore-C size selection enriches for DNA molecules longer than 1.5 kb, filtering out shorter concatemers that are unlikely to contain more than two fragments. [2] Many size selection methods have been developed for ONT long-read sequencing; [11] for example, Solid Phase Reversible Immobilization (SPRI) size selection has been used in the Pore-C literature. [2] [3] After size selection, library preparation for ONT long-read sequencing is performed, usually with a ligation sequencing kit provided by ONT; key steps include DNA repair and adapter ligation. [2] [3] The library is then loaded onto flow cells for sequencing, where each concatemer is fed through a nanopore by a motor protein. [2] [11] The DNA bases are read out by their characteristic disruption of an ionic current through the pore. [11]
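The reasoning behind the 1.5 kb cutoff can be made concrete with a rough calculation: if the restriction fragments in a concatemer average f base pairs, a read of length L contains roughly L / f fragments. The sketch below applies a strict length cutoff and reports this estimate. The read lengths and the 500 bp mean fragment length are hypothetical numbers for illustration, the function names are hypothetical, and because real fragment lengths vary widely with enzyme and genome, the estimate is only a heuristic.

```python
from __future__ import annotations


def size_select(read_lengths: list[int], min_length: int = 1500) -> list[int]:
    """Keep reads strictly longer than min_length (bp), mirroring a '> 1.5 kb' cutoff."""
    return [n for n in read_lengths if n > min_length]


def approx_fragment_count(read_length: int, mean_fragment_length: float) -> float:
    """Rough number of restriction fragments in a concatemer: L / f."""
    return read_length / mean_fragment_length


reads = [800, 1200, 1600, 2500, 4000, 9000]  # hypothetical read lengths (bp)
kept = size_select(reads)
print(len(kept), "of", len(reads), "reads kept")  # 4 of 6 reads kept
# Assuming a hypothetical 500 bp mean fragment length:
print([approx_fragment_count(n, 500) for n in kept])  # [3.2, 5.0, 8.0, 18.0]
```

Under that assumption, a molecule just above 1.5 kb already holds about three fragments, which is why the cutoff favors high-order concatemers.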

Bioinformatic analysis

Overall, bioinformatic approaches applied to Pore-C data allow the inference of pairwise and multi-way contacts between loci. [2] Because each concatemer contains DNA segments from different regions of the genome, aligning its read to a reference genome is challenging: different parts of a single read map to different loci. One solution is a bioinformatic pipeline that uses a greedy piece-wise algorithm. [2] Further analysis of Pore-C results depends on the study and on what other data types are available. [3]
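The general idea of greedy piece-wise alignment can be sketched as follows. Assuming an aligner has already reported several candidate alignments for different segments of one read, the code keeps the highest-scoring alignments whose read intervals do not overlap, assigns each kept alignment to a restriction fragment by its reference midpoint, and then counts distinct fragments to decide whether the concatemer is high-order (more than two fragments) and to list the pairwise contacts it implies. This is a simplified, hypothetical illustration, not the pipeline published in [2]; the `Alignment` record, the `max_overlap` tolerance, and the midpoint rule are all assumptions.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Alignment:
    """Hypothetical alignment record for one segment of a Pore-C read."""
    query_start: int  # start on the read (0-based, half-open)
    query_end: int
    chrom: str
    ref_start: int    # start on the reference
    ref_end: int
    score: float


def greedy_select(alignments: list[Alignment], max_overlap: int = 0) -> list[Alignment]:
    """Keep the best-scoring alignments whose read intervals do not overlap."""
    chosen: list[Alignment] = []
    for aln in sorted(alignments, key=lambda a: a.score, reverse=True):
        clashes = any(
            min(aln.query_end, c.query_end) - max(aln.query_start, c.query_start) > max_overlap
            for c in chosen
        )
        if not clashes:
            chosen.append(aln)
    return sorted(chosen, key=lambda a: a.query_start)  # restore read order


def assign_fragments(chosen: list[Alignment],
                     fragment_starts: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Map each alignment to (chrom, fragment index) using its reference midpoint."""
    ids = []
    for aln in chosen:
        midpoint = (aln.ref_start + aln.ref_end) // 2
        ids.append((aln.chrom, bisect_right(fragment_starts[aln.chrom], midpoint) - 1))
    return ids


def summarize(fragment_ids: list[tuple[str, int]]):
    """Return (order, is_high_order, pairwise contacts) for one concatemer."""
    distinct = sorted(set(fragment_ids))
    return len(distinct), len(distinct) > 2, list(combinations(distinct, 2))


# Hypothetical fragment start coordinates and candidate alignments for one read.
fragment_starts = {"chr1": [0, 1000, 2000, 3000], "chr2": [0, 500, 1500]}
alignments = [
    Alignment(0, 850, "chr1", 100, 1000, 50.0),
    Alignment(850, 1700, "chr1", 2100, 2950, 60.0),
    Alignment(1700, 2500, "chr2", 600, 1400, 55.0),
    Alignment(0, 1200, "chr1", 5000, 6200, 10.0),  # weak hit overlapping the first segment
]
order, high_order, pairs = summarize(assign_fragments(greedy_select(alignments), fragment_starts))
print(order, high_order, len(pairs))  # 3 True 3
```

Sorting by score before checking overlap is what makes the selection greedy: once a stretch of the read is claimed by a strong alignment, weaker alignments covering the same bases are discarded rather than revisited.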

Applications

Pore-C is a relatively new method, so its applications have not yet been fully explored. [2] A strength of Pore-C over previous methods is its ability to detect interactions between more than two genomic loci. Such high-order interactions enable the study of cellular processes, such as gene expression regulation, at a systems level. [2] [3] With statistical methods, Pore-C data can be used to identify cooperative interactions, in which high-order contacts are observed more often than would be expected from their constituent pairwise contacts. [2] In addition, because ONT long reads carry DNA methylation information, Pore-C provides an additional layer of epigenetic information to analyze. [2] In the future, Pore-C may be applied to study how the 3D genome changes during developmental processes, such as cellular differentiation. [3] Pore-C may also be applied to the study of cancer, in which the 3D genome is often structurally rearranged, potentially resulting in aberrant gene transcription through processes such as enhancer hijacking. [2]
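One way to illustrate the idea of a cooperative interaction is to compare how often three loci co-occur on the same concatemer with what their pairwise co-occurrences alone would predict. The sketch below uses the Kirkwood superposition approximation as that pairwise-only expectation. This null model is an assumption chosen for illustration, not the statistical method used in [2], and a ratio above 1 on its own is not a significance test. The locus labels, read sets, and function names are hypothetical.

```python
from __future__ import annotations


def frequency(reads: list[set[str]], *loci: str) -> float:
    """Fraction of reads (concatemers) that contain every locus in `loci`."""
    wanted = set(loci)
    return sum(1 for read in reads if wanted <= read) / len(reads)


def cooperativity_ratio(reads: list[set[str]], a: str, b: str, c: str) -> float:
    """Observed three-way frequency divided by a pairwise-only expectation.

    Expectation (Kirkwood superposition approximation):
        p(abc) ~ p(ab) * p(ac) * p(bc) / (p(a) * p(b) * p(c))
    """
    singles = frequency(reads, a) * frequency(reads, b) * frequency(reads, c)
    pairs = frequency(reads, a, b) * frequency(reads, a, c) * frequency(reads, b, c)
    if singles == 0 or pairs == 0:
        raise ValueError("a locus or pairwise contact is never observed")
    expected = pairs / singles
    return frequency(reads, a, b, c) / expected


# Hypothetical concatemers, each reduced to the set of loci it touches.
reads = (
    [{"A", "B", "C"}] * 3
    + [{"A", "B"}, {"A", "C"}, {"B", "C"}]
    + [{"A"}] * 2
    + [{"B"}, {"C"}]
)
ratio = cooperativity_ratio(reads, "A", "B", "C")
print(f"{ratio:.2f}")  # 1.18: slightly more three-way co-occurrence than pairs predict
```

In practice, a ratio like this would need to be compared against its distribution under the null (for example, by resampling) before calling an interaction cooperative.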



References

  1. Ulahannan, Netha; Pendleton, Matthew; Deshpande, Aditya; Schwenk, Stefan; Behr, Julie M. (2019-11-07). "Nanopore sequencing of DNA concatemers reveals higher-order features of chromatin structure". bioRxiv: 833590. doi:10.1101/833590. S2CID 209606104. Retrieved 2023-03-09.
  2. Deshpande, Aditya; Ulahannan, Netha; Pendleton, Matthew; Dai, Xiaoguang; Ly, Lynn (2022-05-30). "Nanopore sequencing of DNA concatemers reveals higher-order features of chromatin structure". Nature Biotechnology. 40 (10): 1488–1499. doi:10.1038/s41587-022-01289-z. PMID 35637420. S2CID 249217259.
  3. Dotson, Gabrielle; Chen, Can; Lindsly, Stephen; Cicalo, Anthony; Dilworth, Sam (2022-09-20). "Deciphering multi-way interactions in the human genome". Nature Communications. 13 (1): 5498. Bibcode:2022NatCo..13.5498D. doi:10.1038/s41467-022-32980-z. PMC 9489732. PMID 36127324.
  4. Hafner, Antonina; Boettiger, Alistair (2022-09-14). "The spatial organization of transcriptional control". Nature Reviews Genetics. 24 (1): 53–68. doi:10.1038/s41576-022-00526-0. PMID 36104547. S2CID 252282267.
  5. Patton McCord, Rachel; Kaplan, Noam; Giorgetti, Luca (2020-01-27). "Chromosome Conformation Capture and Beyond: Toward an Integrative View of Chromosome Structure and Function". Molecular Cell. 77 (4): 688–708. doi:10.1016/j.molcel.2019.12.021. PMC 7134573. PMID 32001106.
  6. de Wit, Elzo; de Laat, Wouter (2012-01-01). "A decade of 3C technologies: insights into nuclear organization". Genes Dev. 26 (1): 11–24. PMC 3258961. PMID 22215806.
  7. Li, Guoliang; Cai, Liuyang; Chang, Huidang; Hong, Ping; Zhou, Qiangwei (2014-12-19). "Chromatin Interaction Analysis with Paired-End Tag (ChIA-PET) sequencing technology and application". BMC Genomics. 15 (12): S11. doi:10.1186/1471-2164-15-S12-S11. PMC 4303937. PMID 25563301.
  8. Hoffman, Elizabeth A.; Frey, Brian L.; Smith, Lloyd M.; Auble, David T. (2015). "Formaldehyde Crosslinking: A Tool for the Study of Chromatin Complexes". Journal of Biological Chemistry. 290 (44): 26404–26411. doi:10.1074/jbc.R115.651679. PMC 4646298. PMID 26354429.
  9. Weber, Klaus; Kuter, David J. (1971). "Reversible denaturation of enzymes by sodium dodecyl sulfate". Journal of Biological Chemistry. 246 (14): 4504–4509. doi:10.1016/S0021-9258(18)62040-X. PMID 5106387.
  10. McKinley, M.P.; Bolton, D.C.; Prusiner, S.B. (1983). "A protease-resistant protein is a structural component of the scrapie prion". Cell. 35 (1): 57–62. doi:10.1016/0092-8674(83)90207-6. PMID 6414721. S2CID 34383066.
  11. Wang, Yunhao; Zhao, Yue; Bollas, Audrey; Wang, Yuru; Au, Kin Fai (2021-11-08). "Nanopore sequencing technology, bioinformatics and applications". Nature Biotechnology. 39 (11): 1348–1365. doi:10.1038/s41587-021-01108-x. PMC 8988251. PMID 34750572.
  12. Zhong, Jia-Yong; Niu, Longjian; Lin, Zhuo-Bin; Bai, Xin; Chen, Ying; Luo, Feng; Hou, Chunhui; Xiao, Chuan-Le (2023). "High-throughput Pore-C reveals the single-allele topology and cell type-specificity of 3D genome folding". Nature Communications. 14 (1): 1250. Bibcode:2023NatCo..14.1250Z. doi:10.1038/s41467-023-36899-x. PMC 9988853. PMID 36878904.