ScGET-seq

Single-cell genome and epigenome by transposases sequencing (scGET-seq) is a DNA sequencing method for profiling open and closed chromatin. In contrast to single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq), which targets only active euchromatin, [1] scGET-seq can also probe inactive heterochromatin. [2]

This is achieved using TnH, a fusion of the chromodomain (CD) of heterochromatin protein 1 alpha (HP1α) with the Tn5 transposase. Through its chromodomain, TnH targets histone H3 lysine 9 trimethylation (H3K9me3), a marker of heterochromatin. [3]

RNA velocity uses the ratio of spliced to unspliced RNA to infer the kinetics of gene-expression changes during cellular development. [4] Analogously, the ratio of TnH to Tn5 signals obtained from scGET-seq can be used to calculate chromatin velocity, which measures the dynamics of chromatin accessibility along cellular developmental pathways. [2]

History

Transcriptional regulation is tightly linked to chromatin state. Open chromatin, which is permissive to transcription, makes up only 2–3% of the genome but encompasses 94.4% of transcription factor binding sites. [5] [6] Conversely, more tightly packed DNA, or heterochromatin, plays a key role in genome organization and stability. [7] Chromatin density also changes during cellular differentiation, [8] yet high-throughput sequencing methods for directly assaying heterochromatin have been lacking.

Many diseases with a genomic basis, such as cancer, are closely linked to changes in the epigenome. Cancers in particular are characterized by cell-to-cell heterogeneity, which can drive metastasis and treatment resistance. [9] [10] The mechanisms underlying these processes remain largely unknown, although single-cell technologies, including single-cell epigenomics, have contributed substantially to elucidating them. [11]

In 2015, ATAC-seq, which uses the Tn5 transposase to fragment and tag accessible chromatin (euchromatin) for sequencing, was adapted to single-cell resolution. [12] scGET-seq builds on this technology by also capturing heterochromatin, giving a more comprehensive view of chromatin structure and dynamics within each cell. [13]

Methods

Broad overview of how scGET-seq is performed

Sample preparation

Sample preparation for scGET-seq starts with obtaining a suspension of nuclei from cells using a method appropriate for the starting material. [14]

The next step is to produce the TnH transposase. Tn5 is a transposase that cuts DNA and ligates sequencing adapters in genomic regions not bound by nucleosomes (open chromatin). [15] HP1α, a member of the HP1 protein family, recognizes and specifically binds H3K9me3. [16] [17] Its chromodomain recognizes this modification through an induced-fit mechanism. [18] Fusing the first 112 amino acids of HP1α, which contain the chromodomain, to Tn5 through a triple threonine-glycine-serine (3×TGS) linker yields the TnH transposase, which targets heterochromatin marked by H3K9me3. [2]

Libraries are prepared with a modified single-cell ATAC-seq protocol, [19] in which the nuclei suspension is incubated first with the Tn5 transposase and then with TnH. [2]

Data analysis

The goals of the data analysis are: [2]

  1. To identify and characterize distinct cell populations using clustering
  2. To profile chromatin accessibility across the genome
  3. To predict copy-number variants and single-nucleotide variants

Pre-processing

  1. After sequencing, reads are demultiplexed and mapped to the appropriate reference genome, and duplicate reads are identified and removed.
  2. "Peaks", genomic regions enriched for mapped reads, are identified. [20]
  3. Quality control is performed, and cells with low numbers of reads or few detected features are filtered out.
  4. Four count matrices (each column a cell, each row a feature) are generated: Tn5-dhs, Tn5-complement, TnH-dhs and TnH-complement. They hold the Tn5 and TnH counts falling in DNase I hypersensitive sites (DHS) or in the rest of the genome (complement), and so represent signal from accessible and compacted chromatin. [2]
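The counting and filtering steps above can be sketched as follows. This is a minimal illustration, not the published scGET-seq pipeline. The names `build_count_matrices` and `filter_cells` are hypothetical. The code assumes fragments arrive as `(barcode, enzyme, chrom, start, end)` tuples and that a fragment is counted in a region when its start coordinate falls inside it. For simplicity the low-count filter runs after counting here.

```python
import numpy as np


def build_count_matrices(fragments, cells, dhs_regions, complement_regions):
    """Count fragments per region and cell for each enzyme/region-set pair.

    fragments: iterable of (barcode, enzyme, chrom, start, end), enzyme in {"Tn5", "TnH"}.
    Regions are (chrom, start, end) half-open intervals. A fragment is counted in a
    region when its start coordinate falls inside it (a simplifying assumption).
    Returns {"Tn5-dhs", "Tn5-complement", "TnH-dhs", "TnH-complement"} -> regions x cells.
    """
    cell_index = {barcode: j for j, barcode in enumerate(cells)}
    region_sets = {"dhs": dhs_regions, "complement": complement_regions}
    matrices = {
        f"{enzyme}-{kind}": np.zeros((len(regions), len(cells)), dtype=int)
        for enzyme in ("Tn5", "TnH")
        for kind, regions in region_sets.items()
    }
    for barcode, enzyme, chrom, start, _end in fragments:
        j = cell_index.get(barcode)
        if j is None:
            continue  # barcode not in the cell list (e.g. background)
        for kind, regions in region_sets.items():
            for i, (r_chrom, r_start, r_end) in enumerate(regions):
                if chrom == r_chrom and r_start <= start < r_end:
                    matrices[f"{enzyme}-{kind}"][i, j] += 1
    return matrices


def filter_cells(matrices, cells, min_counts):
    """Drop cells whose total count across all four matrices is below min_counts."""
    totals = sum(m.sum(axis=0) for m in matrices.values())
    keep = totals >= min_counts
    kept_cells = [c for c, k in zip(cells, keep) if k]
    return {name: m[:, keep] for name, m in matrices.items()}, kept_cells
```

The nested loop over regions is deliberately naive. Real data would need an interval index or sorted-coordinate search, and the `min_counts` threshold is an arbitrary illustrative parameter.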

Analysis

Dimension reduction, visualization and clustering

Each matrix is filtered to remove shared regions, then normalized and log2-transformed. Linear dimension reduction is performed with principal component analysis (PCA). Groups of cells are identified using a k-nearest-neighbour (k-NN) graph [21] and the Leiden algorithm. [22] Finally, the four matrices are combined using matrix factorization [23] and embedded with UMAP. [24]
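The normalization, PCA and neighbour-graph steps for a single matrix can be sketched with NumPy alone. This is an assumption-laden illustration, not the scGET-seq code. Library-size scaling to 10,000 counts is a common convention that is assumed here. The sketch does not show Leiden clustering, matrix factorization across the four matrices, or UMAP, which would normally come from dedicated libraries.

```python
import numpy as np


def normalize_log2(counts, scale=1e4):
    """counts: regions x cells. Scale each cell to `scale` total counts, then log2(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=0, keepdims=True)
    lib[lib == 0] = 1.0  # avoid division by zero for empty cells
    return np.log2(1.0 + counts / lib * scale)


def pca(X, n_components):
    """X: cells x features. Principal-component scores via SVD of the column-centred matrix."""
    Xc = X - X.mean(axis=0)
    U, S, _Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def knn_indices(Y, k):
    """For each cell (row of Y), indices of its k nearest other cells by Euclidean distance."""
    d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]
```

The resulting neighbour lists define the k-NN graph that a community-detection method such as Leiden would then partition. The all-pairs distance matrix is only practical for small examples.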

Cell identity annotation

There are two approaches to cell identity annotation: annotation based on feature annotation of ATAC peaks, [25] and annotation based on integration with reference scRNA-seq data. [26]
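One way to use peak annotation for cell identity is to link each peak to a nearby gene and sum the linked counts into per-gene activity scores, which can then be compared with marker genes. The sketch below assumes nearest-TSS assignment within a hypothetical 50 kb window, and the function names are illustrative. It is not the implementation in dawe/scatACC.

```python
import numpy as np


def annotate_peaks(peaks, tss, max_distance=50_000):
    """peaks: list of (chrom, start, end); tss: dict gene -> (chrom, position).

    Assign each peak to the gene whose TSS is closest to the peak midpoint,
    or None if no TSS on the same chromosome lies within max_distance.
    """
    annotations = []
    for chrom, start, end in peaks:
        mid = (start + end) // 2
        best_gene, best_dist = None, max_distance + 1
        for gene, (g_chrom, pos) in tss.items():
            if g_chrom != chrom:
                continue
            dist = abs(pos - mid)
            if dist < best_dist:
                best_gene, best_dist = gene, dist
        annotations.append(best_gene)
    return annotations


def gene_activity(counts, annotations):
    """counts: peaks x cells. Sum the rows of peaks assigned to the same gene."""
    counts = np.asarray(counts)
    scores = {}
    for row, gene in zip(counts, annotations):
        if gene is None:
            continue  # peak not linked to any gene
        scores[gene] = scores.get(gene, 0) + row
    return scores
```

Cells could then be labelled by which marker genes have the highest activity. The window size and the nearest-TSS rule are design choices that vary between tools.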

Applications

Differences between scGET-seq and scATAC-seq

Current

From the relative Tn5 and TnH signals, quantitative values describing how quickly and in what direction chromatin remodelling is taking place can be calculated (chromatin velocity). [2] By isolating the most dynamic regions and identifying the transcription factors that bind there, chromatin velocity can be used to infer the epigenetic processes under way in a given cell and the contributions of individual transcription factors to them. [2]
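To illustrate the RNA-velocity analogy, the sketch below applies the steady-state model of RNA velocity to two generic per-cell signals, `x` and `y`. It is a hypothetical illustration and not the published chromatin velocity method. In RNA velocity, `x` would be spliced and `y` unspliced counts. Which of the Tn5 and TnH signals plays each role here is an assumption. For each region, a slope `gamma` is fitted through the origin on cells at the extremes of `x`, and velocity is the residual `y - gamma * x`.

```python
import numpy as np


def steady_state_velocity(x, y, quantile=0.05):
    """Velocity-like residuals per cell and region (steady-state model sketch).

    x, y: cells x regions arrays of normalized signal. The assignment of
    Tn5/TnH to x/y is an assumption, not taken from the scGET-seq paper.
    gamma[r] is the least-squares slope (through the origin) of y on x,
    fitted only on cells in the lower and upper `quantile` tails of x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_regions = x.shape[1]
    velocity = np.zeros_like(x)
    gamma = np.zeros(n_regions)
    for r in range(n_regions):
        lo, hi = np.quantile(x[:, r], [quantile, 1 - quantile])
        mask = (x[:, r] <= lo) | (x[:, r] >= hi)  # cells assumed near steady state
        denom = np.dot(x[mask, r], x[mask, r])
        gamma[r] = np.dot(x[mask, r], y[mask, r]) / denom if denom > 0 else 0.0
        # Positive residual: y is higher than the steady-state expectation.
        velocity[:, r] = y[:, r] - gamma[r] * x[:, r]
    return velocity, gamma
```

Fitting `gamma` on extreme cells follows the idea that those cells are closest to equilibrium. The published RNA-velocity implementations add refinements, such as an intercept and neighbourhood smoothing, that are omitted here.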

Future

Chromatin remodelling precedes changes in gene expression, so profiling it can improve understanding of the trajectories and mechanisms of cellular change. [27] [28] Platforms and tools for integrating multimodal data are therefore areas of active research. [29] [30] [31] Integrating chromatin velocity with RNA velocity, to add temporal and directional information, has been proposed as a way to reveal more about differentiation pathways. [32] [33]

Limitations

scGET-seq shares some limitations with scATAC-seq. Both methods require nuclei from samples with high cellular viability. [13] Low viability leads to high background from contaminating DNA that does not reflect authentic biological signal. Additionally, the sparse and noisy nature of scATAC-seq and scGET-seq data makes analysis challenging, and there is not yet a consensus on how best to handle such data. [34]

Another limitation is that single-nucleotide variant (SNV) calls from scGET-seq still need validation by bulk genome sequencing. Although mutations detected by bulk exome sequencing and by scGET-seq are highly correlated, scGET-seq does not capture all exome SNVs. [2]
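A simple way to quantify how many bulk exome SNVs a single-cell call set recovers is a set-overlap comparison. The sketch below is hypothetical and is not the correlation analysis from the scGET-seq study. It assumes each variant is a `(chrom, pos, ref, alt)` tuple and reports sensitivity (the fraction of bulk SNVs recovered) and precision (the fraction of single-cell calls confirmed by bulk).

```python
def snv_concordance(sc_calls, bulk_calls):
    """Compare single-cell and bulk SNV call sets.

    Each call is a (chrom, pos, ref, alt) tuple. Returns (sensitivity, precision):
    sensitivity = shared / bulk calls, precision = shared / single-cell calls.
    """
    sc, bulk = set(sc_calls), set(bulk_calls)
    shared = sc & bulk
    sensitivity = len(shared) / len(bulk) if bulk else float("nan")
    precision = len(shared) / len(sc) if sc else float("nan")
    return sensitivity, precision
```

In practice such a comparison would usually be restricted to positions covered by both assays. Otherwise, low sensitivity may simply reflect missing coverage rather than missed variants.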


References

  1. Yan F, Powell DR, Curtis DJ, Wong NC (February 2020). "From reads to insight: a hitchhiker's guide to ATAC-seq data analysis". Genome Biology. 21 (1): 22. doi:10.1186/s13059-020-1929-3. PMC 6996192. PMID 32014034.
  2. Tedesco M, Giannese F, Lazarević D, Giansanti V, Rosano D, Monzani S, et al. (February 2022). "Chromatin Velocity reveals epigenetic dynamics by single-cell profiling of heterochromatin and euchromatin". Nature Biotechnology. 40 (2): 235–244. doi:10.1038/s41587-021-01031-1. hdl:11368/3007419. PMID 34635836. S2CID 238637962.
  3. Kouzarides T (February 2007). "Chromatin modifications and their function". Cell. 128 (4): 693–705. doi:10.1016/j.cell.2007.02.005. PMID 17320507. S2CID 11691263.
  4. La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, et al. (August 2018). "RNA velocity of single cells". Nature. 560 (7719): 494–498. Bibcode:2018Natur.560..494L. doi:10.1038/s41586-018-0414-6. PMC 6130801. PMID 30089906.
  5. Klemm SL, Shipony Z, Greenleaf WJ (April 2019). "Chromatin accessibility and the regulatory epigenome". Nature Reviews. Genetics. 20 (4): 207–220. doi:10.1038/s41576-018-0089-8. PMID 30675018. S2CID 59159906.
  6. Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, et al. (September 2012). "The accessible chromatin landscape of the human genome". Nature. 489 (7414): 75–82. Bibcode:2012Natur.489...75T. doi:10.1038/nature11232. PMC 3721348. PMID 22955617. S2CID 4304439.
  7. Penagos-Puig A, Furlan-Magaril M (2020). "Heterochromatin as an Important Driver of Genome Organization". Frontiers in Cell and Developmental Biology. 8: 579137. doi:10.3389/fcell.2020.579137. PMC 7530337. PMID 33072761.
  8. Golkaram M, Jang J, Hellander S, Kosik KS, Petzold LR (October 2017). "The Role of Chromatin Density in Cell Population Heterogeneity during Stem Cell Differentiation". Scientific Reports. 7 (1): 13307. Bibcode:2017NatSR...713307G. doi:10.1038/s41598-017-13731-3. PMC 5645312. PMID 29042584.
  9. Dagogo-Jack I, Shaw AT (February 2018). "Tumour heterogeneity and resistance to cancer therapies". Nature Reviews. Clinical Oncology. 15 (2): 81–94. doi:10.1038/nrclinonc.2017.166. PMID 29115304. S2CID 2194691.
  10. Lawson DA, Kessenbrock K, Davis RT, Pervolarakis N, Werb Z (December 2018). "Tumour heterogeneity and metastasis at single-cell resolution". Nature Cell Biology. 20 (12): 1349–1360. doi:10.1038/s41556-018-0236-7. PMC 6477686. PMID 30482943.
  11. Dai Z, Gu XY, Xiang SY, Gong DD, Man CF, Fan Y (November 2020). "Research and application of single-cell sequencing in tumor heterogeneity and drug resistance of circulating tumor cells". Biomarker Research. 8 (1): 60. doi:10.1186/s40364-020-00240-1. PMC 7653877. PMID 33292625.
  12. Pott S, Lieb JD (August 2015). "Single-cell ATAC-seq: strength in numbers". Genome Biology. 16 (1): 172. doi:10.1186/s13059-015-0737-7. PMC 4546161. PMID 26294014.
  13. Tang L (December 2021). "Sketching open and closed chromatin". Nature Methods. 18 (12): 1448. doi:10.1038/s41592-021-01351-9. PMID 34862496. S2CID 244871731.
  14. "Isolation of Nuclei for Single Cell RNA Sequencing & Tissues for Single Cell RNA Sequencing - Demonstrated Protocol - Sample Prep - Single Cell Gene Expression - Official 10x Genomics Support". support.10xgenomics.com. Retrieved 2022-03-02.
  15. Hsu FM, Gohain M, Chang P, Lu JH, Chen PY (January 2018). "Chapter 4 - Bioinformatics of Epigenomic Data Generated From Next-Generation Sequencing". In Tollefsbol TO (ed.). Epigenetics in Human Disease. Translational Epigenetics. Vol. 6 (Second ed.). Academic Press. pp. 65–106. doi:10.1016/B978-0-12-812215-0.00004-2. ISBN 978-0-12-812215-0.
  16. Bannister AJ, Zegerman P, Partridge JF, Miska EA, Thomas JO, Allshire RC, Kouzarides T (March 2001). "Selective recognition of methylated lysine 9 on histone H3 by the HP1 chromo domain". Nature. 410 (6824): 120–124. Bibcode:2001Natur.410..120B. doi:10.1038/35065138. PMID 11242054. S2CID 4334447.
  17. Watanabe S, Mishima Y, Shimizu M, Suetake I, Takada S (May 2018). "Interactions of HP1 Bound to H3K9me3 Dinucleosome by Molecular Simulations and Biochemical Assays". Biophysical Journal. 114 (10): 2336–2351. Bibcode:2018BpJ...114.2336W. doi:10.1016/j.bpj.2018.03.025. PMC 6129468. PMID 29685391.
  18. Nielsen PR, Nietlispach D, Mott HR, Callaghan J, Bannister A, Kouzarides T, et al. (March 2002). "Structure of the HP1 chromodomain bound to histone H3 methylated at lysine 9". Nature. 416 (6876): 103–107. Bibcode:2002Natur.416..103N. doi:10.1038/nature722. PMID 11882902. S2CID 4423019.
  19. "Chromium Single Cell ATAC Reagent Kits User Guide (v1.1 Chemistry) - User Guide - Official 10x Genomics Support". support.10xgenomics.com. Retrieved 2022-03-02.
  20. Baek S, Lee I (January 2020). "Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation". Computational and Structural Biotechnology Journal. 18: 1429–1439. doi:10.1016/j.csbj.2020.06.012. PMC 7327298. PMID 32637041.
  21. Polański K, Young MD, Miao Z, Meyer KB, Teichmann SA, Park JE (February 2020). "BBKNN: fast batch alignment of single cell transcriptomes". Bioinformatics. 36 (3): 964–965. doi:10.1093/bioinformatics/btz625. PMC 9883685. PMID 31400197.
  22. Traag VA, Waltman L, van Eck NJ (March 2019). "From Louvain to Leiden: guaranteeing well-connected communities". Scientific Reports. 9 (1): 5233. arXiv:1810.08473. Bibcode:2019NatSR...9.5233T. doi:10.1038/s41598-019-41695-z. PMC 6435756. PMID 30914743.
  23. Žitnik M, Zupan B (January 2015). "Data Fusion by Matrix Factorization". IEEE Transactions on Pattern Analysis and Machine Intelligence. 37 (1): 41–53. arXiv:1307.0803. doi:10.1109/TPAMI.2014.2343973. PMID 26353207. S2CID 362295.
  24. "UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction — umap 0.5 documentation". umap-learn.readthedocs.io. Retrieved 2022-03-04.
  25. Cittaro D (2022-02-21). dawe/scatACC. Retrieved 2022-03-04.
  26. Abdelaal T, Michielsen L, Cats D, Hoogduin D, Mei H, Reinders MJ, Mahfouz A (September 2019). "A comparison of automatic cell identification methods for single-cell RNA sequencing data". Genome Biology. 20 (1): 194. doi:10.1186/s13059-019-1795-z. PMC 6734286. PMID 31500660.
  27. Stadhouders R, Vidal E, Serra F, Di Stefano B, Le Dily F, Quilez J, et al. (February 2018). "Transcription factors orchestrate dynamic interplay between genome topology and gene regulation during cell reprogramming". Nature Genetics. 50 (2): 238–249. doi:10.1038/s41588-017-0030-7. PMC 5810905. PMID 29335546.
  28. Ranzoni AM, Tangherloni A, Berest I, Riva SG, Myers B, Strzelecka PM, et al. (March 2021). "Integrative Single-Cell RNA-Seq and ATAC-Seq Analysis of Human Developmental Hematopoiesis". Cell Stem Cell. 28 (3): 472–487.e7. doi:10.1016/j.stem.2020.11.015. PMC 7939551. PMID 33352111.
  29. Lin Y, Wu TY, Wan S, Yang JY, Wong WH, Wang YX (January 2022). "scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning". Nature Biotechnology. 40 (5): 703–710. doi:10.1038/s41587-021-01161-6. PMC 9186323. PMID 35058621. S2CID 246150572.
  30. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, et al. (June 2019). "Comprehensive Integration of Single-Cell Data". Cell. 177 (7): 1888–1902.e21. doi:10.1016/j.cell.2019.05.031. PMC 6687398. PMID 31178118.
  31. Wang C, Sun D, Huang X, Wan C, Li Z, Han Y, et al. (August 2020). "Integrative analyses of single-cell transcriptome and regulome using MAESTRO". Genome Biology. 21 (1): 198. doi:10.1186/s13059-020-02116-x. PMC 7412809. PMID 32767996.
  32. Xu Y, Begoli E, McCord RP (2021-12-01). "sciCAN: Single-cell chromatin accessibility and gene expression data integration via Cycle-consistent Adversarial Network". bioRxiv 2021.11.30.470677. doi:10.1101/2021.11.30.470677. S2CID 244821695.
  33. Chen Z, King WC, Gerstein M, Zhang J (2022-02-23). "scDVF: Single-cell Transcriptomic Deep Velocity Field Learning with Neural Ordinary Differential Equations". bioRxiv 2022.02.15.480564. doi:10.1101/2022.02.15.480564. S2CID 247000437.
  34. Baek S, Lee I (January 2020). "Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation". Computational and Structural Biotechnology Journal. 18: 1429–1439. doi:10.1016/j.csbj.2020.06.012. PMC 7327298. PMID 32637041.