A topologically associating domain (TAD) is a self-interacting genomic region, meaning that DNA sequences within a TAD physically interact with each other more frequently than with sequences outside the TAD. [1] The average TAD size is 1000 kb in humans, 880 kb in mouse cells, and 140 kb in fruit flies. [2] [3] The boundaries on both sides of these domains are conserved between different mammalian cell types and even across species [2] and are highly enriched for CCCTC-binding factor (CTCF) and cohesin. [1] In addition, some types of genes (such as transfer RNA genes and housekeeping genes) appear near TAD boundaries more often than would be expected by chance. [4] [5]
The functions of TADs are not fully understood and are still a matter of debate. Most studies indicate that TADs regulate gene expression by restricting enhancer-promoter interactions to within each TAD; [6] however, a recent study reported that TAD organization and gene expression can be uncoupled. [7] Disruption of TAD boundaries has been associated with a wide range of diseases, including cancer, [8] [9] [10] a variety of limb malformations such as synpolydactyly, Cooks syndrome, and F-syndrome, [11] and a number of brain disorders such as hypoplastic corpus callosum and adult-onset demyelinating leukodystrophy. [11] Furthermore, studies have revealed that interactions between promoters and enhancers spanning one or more TADs are fundamental to the precise dynamics of gene expression. [12] The genomic elements underlying these interactions have been termed distal tethering elements (DTEs), and they have been shown to be important for the precise activation of Hox genes in early embryogenesis of D. melanogaster. [12]
The mechanisms underlying TAD formation are also complex and not yet fully elucidated, though a number of protein complexes and DNA elements are associated with TAD boundaries. Two models, the handcuff model and the loop extrusion model, describe TAD formation mediated by the CTCF and cohesin proteins. [13] It has also been proposed that the stiffness of TAD boundaries could itself cause domain insulation and TAD formation. [13]
TADs are defined as regions whose DNA sequences preferentially contact each other. They were discovered in 2012 using chromosome conformation capture techniques, including Hi-C. [4] [14] [5] They have been shown to be present in the genomes of multiple species, [15] including fruit flies (Drosophila), [16] mice, [4] plants, fungi and humans. [5] In bacteria, they are referred to as Chromosomal Interacting Domains (CIDs). [15]
TAD locations are defined by applying an algorithm to Hi-C data. For example, TADs are often called using the so-called "directionality index". [5] The directionality index is calculated for individual 40 kb bins by collecting the reads that fall in a bin and determining whether their paired reads map upstream or downstream of it (read pairs are required to span no more than 2 Mb). A positive value indicates that more read pairs lie downstream than upstream, and a negative value indicates the reverse. Mathematically, the directionality index is a signed chi-square statistic.
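The signed chi-square statistic can be written out explicitly. Following the description above, let A be the number of read pairs linking a bin to its upstream window, B the number linking it to its downstream window, and E = (A + B)/2 the count expected without directional bias; then DI = sign(B - A) * [(A - E)^2/E + (B - E)^2/E]. The Python sketch below implements this formula; the function names, the list-of-lists contact matrix, and the truncation of windows at chromosome ends are assumptions made for illustration.

```python
def directionality_index(upstream: float, downstream: float) -> float:
    """Signed chi-square directionality index for a single bin.

    upstream   (A): read pairs linking the bin to its upstream window
    downstream (B): read pairs linking the bin to its downstream window
    """
    expected = (upstream + downstream) / 2.0  # E: count expected with no directional bias
    if expected == 0 or upstream == downstream:
        return 0.0  # no reads, or perfectly balanced: the statistic is zero
    sign = 1.0 if downstream > upstream else -1.0  # positive = downstream bias
    chi_square = ((upstream - expected) ** 2 + (downstream - expected) ** 2) / expected
    return sign * chi_square


def directionality_track(contacts: list[list[float]], window_bins: int) -> list[float]:
    """Directionality index for every bin of a symmetric binned contact matrix.

    window_bins: bins counted on each side, e.g. 50 for a 2 Mb window at 40 kb
    resolution. Windows are simply truncated at the chromosome ends (assumption).
    """
    n = len(contacts)
    track = []
    for i in range(n):
        lo, hi = max(0, i - window_bins), min(n, i + window_bins + 1)
        a = sum(contacts[i][j] for j in range(lo, i))      # upstream read pairs
        b = sum(contacts[i][j] for j in range(i + 1, hi))  # downstream read pairs
        track.append(directionality_index(a, b))
    return track


toy = [
    [0, 5, 1, 0],
    [5, 0, 2, 0],
    [1, 2, 0, 8],
    [0, 0, 8, 0],
]
print(directionality_track(toy, window_bins=2))
# first bin: +6.0 (downstream bias); last bin: -8.0 (upstream bias)
```

In a DI track, a TAD typically appears as a run of positive values at its upstream end switching to negative values at its downstream end, so boundaries sit where the sign flips from negative to positive.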
The development of specialized genome browsers and visualization tools [17] such as Juicebox, [18] HiGlass [19] /HiPiler, [20] The 3D Genome Browser, [21] 3DIV, [22] 3D-GNOME, [23] and TADKB [24] has made it possible to visualize the TAD organization of regions of interest in different cell types.
A number of proteins are known to be associated with TAD formation, including the protein CTCF and the protein complex cohesin. [1] It is not yet known which components are required at TAD boundaries; however, in mammalian cells, these boundary regions have been shown to have comparatively high levels of CTCF binding.
Computer simulations have shown that chromatin loop extrusion driven by cohesin motors can generate TADs. [25] [26] In the loop extrusion model, cohesin binds chromatin and progressively extrudes it to grow a loop. Chromatin on both sides of the cohesin complex is extruded until cohesin encounters a chromatin-bound CTCF protein, typically located at a TAD boundary. In this way, TAD boundaries can be brought together as the anchors of a chromatin loop. [27] Indeed, cohesin has been observed in vitro to processively extrude DNA loops in an ATP-dependent manner [28] [29] [30] and to stall at CTCF. [31] [32] However, some in vitro data suggest that the observed loops may be artifacts. [33] [34] Importantly, because cohesin can dynamically unbind from chromatin, this model implies that TADs (and associated chromatin loops) are dynamic, transient structures, [25] in agreement with in vivo observations. [35] [36] [37] [38]
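The core logic of the loop extrusion model can be illustrated with a deliberately minimal, deterministic sketch. It is not the simulation used in the cited studies: the function `extrude_loop` is hypothetical, chromatin is a 1D lattice of sites, CTCF sites are treated as absolute barriers regardless of their orientation, extruders do not collide, and unbinding is modeled only as a fixed `lifetime` in steps. Under these assumptions, cohesin loaded anywhere between two CTCF sites ends up with the same two loop anchors, while a cohesin that unbinds early leaves a smaller, transient loop.

```python
def extrude_loop(load_site: int, ctcf_sites: set[int], chrom_len: int, lifetime: int) -> tuple[int, int]:
    """Return the (left, right) loop anchors reached by one cohesin complex.

    Hypothetical toy model:
    - chromatin is a 1D lattice of sites 0..chrom_len-1;
    - each leg moves outward one site per step until it reaches a CTCF site
      (orientation ignored) or the chromosome end;
    - cohesin unbinds after `lifetime` steps; the returned anchors are the loop
      extent at that moment.
    """
    left = right = load_site
    for _ in range(lifetime):
        left_blocked = left in ctcf_sites or left == 0
        right_blocked = right in ctcf_sites or right == chrom_len - 1
        if left_blocked and right_blocked:
            break  # both legs stalled: the loop spans the region between the barriers
        if not left_blocked:
            left -= 1
        if not right_blocked:
            right += 1
    return left, right


ctcf = {20, 60}  # two hypothetical boundary CTCF sites
for site in (25, 40, 55):
    print(site, extrude_loop(site, ctcf, chrom_len=100, lifetime=1000))  # each -> (20, 60)

print(extrude_loop(35, ctcf, chrom_len=100, lifetime=10))  # (25, 45): unbound early, smaller loop
```

Because every long-lived extruder converges on the same anchors, contacts accumulate within the 20-60 interval, which is the TAD-like pattern the model predicts; repeated loading and unloading keeps any individual loop transient.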
Other mechanisms for TAD formation have been suggested. For example, some simulations suggest that transcription-generated supercoiling can relocalize cohesin to TAD boundaries [39] [40] or that passively diffusing cohesin “slip links” [41] [42] can generate TADs.
TADs have been reported to be relatively constant between different cell types (for example, stem cells and blood cells), and in specific cases even between species. [5] [43] [44] [45] Comparative TAD analysis between Drosophila melanogaster and Drosophila subobscura, which diverged approximately 49 million years ago, has revealed a conservation of 30-40%. [46]
The majority of observed interactions between promoters and enhancers do not cross TAD boundaries. Removing a TAD boundary (for example, by using CRISPR to delete the relevant region of the genome) can allow new promoter-enhancer contacts to form. This can affect the expression of nearby genes; such misregulation has been shown to cause limb malformations (e.g. polydactyly) in humans and mice. [44]
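The effect of a boundary deletion can be sketched by treating a TAD boundary as a barrier that a promoter-enhancer contact does not cross. This is a simplification, since the text says only that most such contacts stay within a TAD; the coordinates and function names in the Python sketch below are hypothetical.

```python
from bisect import bisect_right


def domain_index(position: int, boundaries: list[int]) -> int:
    """Index of the domain containing `position`.

    `boundaries` is a sorted list of boundary coordinates (bp); a position lying
    exactly on a boundary is assigned to the domain on its right.
    """
    return bisect_right(boundaries, position)


def same_domain(enhancer: int, promoter: int, boundaries: list[int]) -> bool:
    """Simplified rule: contact is permitted only if no boundary separates the two."""
    return domain_index(enhancer, boundaries) == domain_index(promoter, boundaries)


# Hypothetical coordinates: two boundaries split the region into three domains.
boundaries = [1_000_000, 2_000_000]
enhancer, promoter = 1_500_000, 2_300_000
print(same_domain(enhancer, promoter, boundaries))  # False: separated by the boundary at 2 Mb

# Simulate deleting the boundary at 2 Mb (e.g. by a CRISPR deletion).
boundary_deleted = [b for b in boundaries if b != 2_000_000]
print(same_domain(enhancer, promoter, boundary_deleted))  # True: the two domains have merged
```

Using a sorted coordinate list with `bisect` keeps each lookup logarithmic in the number of boundaries, which matters when checking many enhancer-promoter pairs genome-wide.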
Computer simulations have shown that transcription-induced supercoiling of chromatin fibres can explain how TADs are formed and how they can ensure very efficient interactions between enhancers and their cognate promoters located in the same TAD. [39]
Replication timing domains have been shown to be associated with TADs: their boundaries co-localize with the boundaries of TADs located on either side of compartments. [47] Insulated neighborhoods, DNA loops formed by CTCF/cohesin-bound regions, are proposed to functionally underlie TADs. [48]
Genome rearrangement breakpoints have been shown to be enriched at TAD boundaries in D. melanogaster. [49]
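One simple way to express such an enrichment is an observed/expected ratio: the fraction of breakpoints falling within some distance of a boundary, divided by the fraction expected if breakpoints were placed uniformly along the genome. The Python sketch below illustrates that arithmetic only; it is not the analysis used in the cited study, and the window size, function names and toy coordinates are hypothetical. It also assumes that boundary windows neither overlap each other nor run past the chromosome ends; a real analysis would typically also assess significance, for example with a permutation test.

```python
from bisect import bisect_left


def near_boundary(pos: int, boundaries: list[int], window: int) -> bool:
    """True if `pos` lies within `window` bp of any boundary in the sorted list."""
    k = bisect_left(boundaries, pos)
    nearest = boundaries[max(0, k - 1):k + 1]  # the boundary on each side of pos
    return any(abs(pos - b) <= window for b in nearest)


def boundary_enrichment(breakpoints: list[int], boundaries: list[int],
                        window: int, genome_len: int) -> float:
    """Observed / expected fraction of breakpoints within `window` bp of a boundary."""
    observed = sum(near_boundary(p, boundaries, window) for p in breakpoints) / len(breakpoints)
    # Under uniform placement, the expected fraction is the share of the genome covered
    # by the +/- window regions (assumes they do not overlap or run past the ends).
    expected = len(boundaries) * (2 * window + 1) / genome_len
    return observed / expected


# Hypothetical toy data: a 1 Mb genome with two boundaries and four breakpoints.
ratio = boundary_enrichment([201_000, 598_000, 450_000, 50_000],
                            [200_000, 600_000], window=5_000, genome_len=1_000_000)
print(round(ratio, 1))  # about 25-fold: half the breakpoints sit in ~2% of the genome
```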
Disruption of TAD boundaries can affect the expression of nearby genes, and this can cause disease. [50]
For example, genomic structural variants that disrupt TAD boundaries have been reported to cause developmental disorders such as human limb malformations. [51] [52] [53] Additionally, several studies have provided evidence that the disruption or rearrangement of TAD boundaries can provide growth advantages to certain cancers, such as T-cell acute lymphoblastic leukemia (T-ALL), [54] gliomas, [55] and lung cancer. [56]
Lamina-associated domains (LADs) are parts of the chromatin that interact extensively with the nuclear lamina, a network-like structure at the inner nuclear membrane. [57] LADs consist mostly of transcriptionally silent chromatin and are enriched for trimethylated lysine 27 on histone H3 (H3K27me3), a common post-translational histone modification of heterochromatin. [58] LADs have CTCF-binding sites at their periphery. [57]