Single-cell genome and epigenome by transposases sequencing (scGET-seq) is a DNA sequencing method for profiling open and closed chromatin. In contrast to single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq), which only targets active euchromatin, [1] scGET-seq is also capable of probing inactive heterochromatin. [2]
This is achieved through the use of TnH, which is created by linking the chromodomain (CD) of heterochromatin protein-1-alpha (HP-1a) to the Tn5 transposase. TnH is then able to target histone H3 lysine 9 trimethylation (H3K9me3), a marker for heterochromatin. [3]
Akin to RNA velocity, which uses the ratio of spliced to unspliced RNA to infer the kinetics of changes in gene expression over the course of cellular development, [4] the ratio of TnH to Tn5 signals obtained from scGET-seq can be used to calculate chromatin velocity, which measures the dynamics of chromatin accessibility over the course of cellular developmental pathways. [2]
Transcriptional regulation is tightly linked to chromatin states. Chromatin that is open, or permissive to transcription, makes up only 2-3% of the genome but encompasses 94.4% of transcription factor binding sites. [5] [6] Conversely, more tightly packed DNA, or heterochromatin, is responsible for genome organization and stability. [7] Chromatin density also changes over the course of cellular differentiation processes, [8] but there is a lack of high-throughput sequencing methods for directly assaying heterochromatin.
Many genome-related diseases, such as cancer, are closely linked to changes in the epigenome. Cancers in particular are characterized by single-cell heterogeneity, which can drive metastasis and treatment resistance. [9] [10] The mechanisms that underlie these processes are still largely unknown, although the advent of single-cell technologies, including single-cell epigenomics, has contributed greatly to their elucidation. [11]
In 2015, ATAC-seq, which uses the Tn5 transposase to fragment and tag accessible chromatin, or euchromatin, for sequencing, became feasible at single-cell resolution. [12] scGET-seq builds upon this technology by also capturing information on heterochromatin, giving a more comprehensive view of chromatin structure and dynamics within each cell. [13]
Sample preparation for scGET-seq starts with obtaining a suspension of nuclei from cells using a method appropriate for the starting material. [14]
The next step is to produce the TnH transposase. Tn5 is a transposase that cuts and ligates adapters to genomic regions unbound by nucleosomes (open chromatin). [15] HP-1a is a member of the HP1 family and is able to recognize and specifically bind to H3K9me3. [16] [17] Its chromodomain uses an induced-fit mechanism for recognizing this chromatin modification. [18] Linking the first 112 amino acids of HP-1a, which contain the chromodomain, to Tn5 using a three poly-threonine-glycine-serine (TGS) linker creates the TnH transposase, which is capable of targeting heterochromatin marked by H3K9me3. [2]
Library preparation is done using a modified protocol for single-cell ATAC-seq, [19] where the nuclei suspension is sequentially incubated with the Tn5 transposase first, and then TnH. [2]
The goals of the data analysis are: [2]
Each of the matrices is filtered of shared regions and then normalized and log2 transformed. Linear dimension reduction is done using principal component analysis (PCA). Groups of cells are identified using a k-NN algorithm [21] and the Leiden algorithm. [22] Finally, the four matrices are combined using matrix factorization [23] and UMAP reduction. [24]
There are two approaches to cell identity annotation: annotation based on feature annotation of ATAC peaks, [25] and annotation based on integration with reference scRNA-seq data. [26]
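The normalization, log2 transformation and neighbor-search steps can be illustrated with a minimal Python sketch. It is not the published pipeline: the toy count matrix, the depth-scaling factor of 10,000 and the function names `normalize_log2` and `knn` are assumptions made for illustration, and the PCA, Leiden, matrix factorization and UMAP steps are omitted, so neighbors are searched directly on the normalized values.

```python
import math

def normalize_log2(counts, scale=10_000):
    """Scale each cell (row) to `scale` total counts, then apply log2(x + 1)."""
    out = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # A cell with no reads stays all-zero instead of dividing by zero.
            out.append([0.0] * len(row))
            continue
        out.append([math.log2(v / total * scale + 1) for v in row])
    return out

def knn(matrix, k):
    """For each cell, return the indices of its k nearest cells (Euclidean)."""
    neighbors = []
    for i, a in enumerate(matrix):
        dists = sorted(
            (math.dist(a, b), j) for j, b in enumerate(matrix) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

# Hypothetical cells-by-regions count matrix (4 cells, 4 regions).
counts = [
    [10, 0, 2, 0],
    [20, 0, 4, 0],
    [0, 8, 0, 3],
    [0, 16, 1, 6],
]
normalized = normalize_log2(counts)
graph = knn(normalized, k=1)
```

In this toy input the first two cells have the same relative profile at different sequencing depths, so depth normalization makes them nearest neighbors of each other; a community-detection method such as Leiden would then operate on a graph of this kind.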
By using the ratio of Tn5 to TnH signals, quantitative values describing how quickly and in what direction chromatin remodelling is taking place can be calculated (chromatin velocity). [2] By isolating regions that are most dynamic and identifying which transcription factors bind there, chromatin velocity can be used to infer the dynamic epigenetic processes happening within a given cell and the contributions of various transcription factors to those processes. [2]
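As a simplified illustration of the underlying quantity, the Python sketch below computes a per-region log2 ratio of Tn5 to TnH counts. It is a static ratio only and is not the chromatin velocity model itself, whose calculation is not detailed here; the pseudocount, the example counts and the function name `log_ratio` are assumptions.

```python
import math

def log_ratio(tn5, tnh, pseudocount=1.0):
    """Per-region log2((Tn5 + p) / (TnH + p)); positive values lean open."""
    return [
        math.log2((a + pseudocount) / (h + pseudocount))
        for a, h in zip(tn5, tnh)
    ]

# Hypothetical read counts for three regions in one cell.
tn5_counts = [7, 0, 3]  # reads tagged by Tn5 (accessible chromatin)
tnh_counts = [0, 7, 3]  # reads tagged by TnH (H3K9me3-marked chromatin)
scores = log_ratio(tn5_counts, tnh_counts)
```

With these inputs the first region scores as predominantly accessible, the second as predominantly compacted, and the third as balanced. The pseudocount keeps the ratio defined for regions with no reads from one of the two enzymes.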
Chromatin remodelling precedes changes in gene expression, so profiling it enhances the understanding of the trajectories and mechanisms of cellular changes. [27] [28] Thus, platforms and tools for the integration of multimodal data are areas of active research. [29] [30] [31] Incorporating temporal and directionality elements through integration of chromatin velocity with RNA velocity has been proposed to reveal even more information about differentiation pathways. [32] [33]
scGET-seq has some of the same limitations as scATAC-seq. Both methods require nuclei from samples with high cellular viability. [13] Low cellular viability leads to high background DNA contamination that does not represent authentic biological signal. Additionally, the sparse and noisy nature of scATAC-seq and scGET-seq data makes analysis challenging, and there is no consensus yet on how best to handle such data. [34]
Another limitation is that single-nucleotide variant (SNV) results from scGET-seq still need to be validated by bulk genome sequencing. Even though mutations detected by bulk exome sequencing and by scGET-seq are highly correlated, scGET-seq fails to capture all exome SNVs. [2]
The family of heterochromatin protein 1 (HP1) consists of highly conserved proteins, which have important functions in the cell nucleus. These functions include gene repression by heterochromatin formation, transcriptional activation, regulation of binding of cohesion complexes to centromeres, sequestration of genes to the nuclear periphery, transcriptional arrest, maintenance of heterochromatin integrity, gene repression at the single nucleosome level, gene repression by heterochromatization of euchromatin, and DNA repair. HP1 proteins are fundamental units of heterochromatin packaging that are enriched at the centromeres and telomeres of nearly all eukaryotic chromosomes, with the notable exception of budding yeast, in which a yeast-specific silencing complex of SIR proteins serves a similar function. Members of the HP1 family are characterized by an N-terminal chromodomain and a C-terminal chromoshadow domain, separated by a hinge region. HP1 is also found at some euchromatic sites, where its binding can correlate with either gene repression or gene activation. HP1 was originally discovered by Tharappel C James and Sarah Elgin in 1986 as a factor in the phenomenon known as position effect variegation in Drosophila melanogaster.
A chromodomain is a protein structural domain of about 40–50 amino acid residues commonly found in proteins associated with the remodeling and manipulation of chromatin. The domain is highly conserved among both plants and animals, and is represented in a large number of different proteins in many genomes, such as that of the mouse. Some chromodomain-containing genes have multiple alternative splicing isoforms that omit the chromodomain entirely. In mammals, chromodomain-containing proteins are responsible for aspects of gene regulation related to chromatin remodeling and formation of heterochromatin regions. Chromodomain-containing proteins also bind methylated histones and appear in the RNA-induced transcriptional silencing complex. In histone modifications, chromodomains are very conserved. They function by identifying and binding to methylated lysine residues that exist on the surface of chromatin proteins and thereby regulate gene transcription.
Chromobox protein homolog 5 is a protein that in humans is encoded by the CBX5 gene. It is a highly conserved, non-histone protein that is part of the heterochromatin protein family. The protein itself is more commonly called HP1α. Heterochromatin protein-1 (HP1) has an N-terminal domain that acts on methylated lysine residues, leading to epigenetic repression. The C-terminus of this protein has a chromo shadow domain (CSD) that is responsible for homodimerization, as well as for interactions with a variety of chromatin-associated, non-histone proteins.
ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations.
ATAC-seq is a technique used in molecular biology to assess genome-wide chromatin accessibility. In 2013, the technique was first described as an alternative advanced method for MNase-seq, FAIRE-Seq and DNase-Seq. ATAC-seq is a faster analysis of the epigenome than DNase-seq or MNase-seq.
H3K4me3 is an epigenetic modification to the DNA packaging protein Histone H3 that indicates tri-methylation at the 4th lysine residue of the histone H3 protein and is often involved in the regulation of gene expression. The name denotes the addition of three methyl groups (trimethylation) to the lysine 4 on the histone H3 protein.
Single cell epigenomics is the study of epigenomics in individual cells by single cell sequencing. Since 2013, methods have been created including whole-genome single-cell bisulfite sequencing to measure DNA methylation, whole-genome ChIP-sequencing to measure histone modifications, whole-genome ATAC-seq to measure chromatin accessibility and chromosome conformation capture.
H3K9me3 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the tri-methylation at the 9th lysine residue of the histone H3 protein and is often associated with heterochromatin.
H3K4me1 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the mono-methylation at the 4th lysine residue of the histone H3 protein and is often associated with gene enhancers.
H3K36me3 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the tri-methylation at the 36th lysine residue of the histone H3 protein and is often associated with gene bodies.
H4K20me is an epigenetic modification to the DNA packaging protein Histone H4. It is a mark that indicates the mono-methylation at the 20th lysine residue of the histone H4 protein. This mark can be di- and tri-methylated. It is critical for genome integrity including DNA damage repair, DNA replication and chromatin compaction.
H3K9ac is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the acetylation at the 9th lysine residue of the histone H3 protein.
ChIL sequencing (ChIL-seq), also known as Chromatin Integration Labeling sequencing, is a method used to analyze protein interactions with DNA. ChIL-sequencing combines antibody-targeted controlled cleavage by Tn5 transposase with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global DNA binding sites precisely for any protein of interest. Currently, ChIP-seq is the most common technique used to study protein–DNA relations; however, it suffers from a number of practical and economic limitations that ChIL-sequencing does not. ChIL-seq is a precise technique that reduces sample loss and could be applied to single cells.
H3K36me2 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the di-methylation at the 36th lysine residue of the histone H3 protein.
MNase-seq, short for micrococcal nuclease digestion with deep sequencing, is a molecular biological technique that was pioneered in 2006 to measure nucleosome occupancy in the C. elegans genome, and was subsequently applied to the human genome in 2008. The term 'MNase-seq', however, was not coined until a year later, in 2009. Briefly, this technique relies on the use of the non-specific endo-exonuclease micrococcal nuclease, an enzyme derived from the bacterium Staphylococcus aureus, to bind and cleave protein-unbound regions of DNA on chromatin. DNA bound to histones or other chromatin-bound proteins may remain undigested. The uncut DNA is then purified from the proteins and sequenced through one or more of the various next-generation sequencing methods.
H3K36me is an epigenetic modification to the DNA packaging protein Histone H3, specifically, the mono-methylation at the 36th lysine residue of the histone H3 protein.
H3R42me is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates the mono-methylation at the 42nd arginine residue of the histone H3 protein. In epigenetics, arginine methylation of histones H3 and H4 is associated with a more accessible chromatin structure and thus higher levels of transcription. The existence of arginine demethylases that could reverse arginine methylation is controversial.
H3R17me2 is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates the di-methylation at the 17th arginine residue of the histone H3 protein. In epigenetics, arginine methylation of histones H3 and H4 is associated with a more accessible chromatin structure and thus higher levels of transcription. The existence of arginine demethylases that could reverse arginine methylation is controversial.
H4R3me2 is an epigenetic modification to the DNA packaging protein histone H4. It is a mark that indicates the di-methylation at the 3rd arginine residue of the histone H4 protein. In epigenetics, arginine methylation of histones H3 and H4 is associated with a more accessible chromatin structure and thus higher levels of transcription. The existence of arginine demethylases that could reverse arginine methylation is controversial.
H3Y41P is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates the phosphorylation of the 41st tyrosine residue of the histone H3 protein.