In mammalian biology, insulated neighborhoods are chromosomal loop structures formed by the physical interaction of two DNA loci bound by the transcription factor CTCF and co-occupied by cohesin. [1] Insulated neighborhoods are thought to be structural and functional units of gene control because their integrity is important for normal gene regulation. Current evidence suggests that these structures form the mechanistic underpinnings of higher-order chromosome structures, including topologically associating domains (TADs). Insulated neighborhoods are important for understanding both gene regulation in normal cells and dysregulated gene expression in disease.
Mammalian gene transcription is generally controlled by enhancers. [2] [3] [4] [5] [6] Enhancers can regulate transcription of genes over large distances by looping to physically contact their target genes. This property of enhancers makes it difficult to identify an enhancer's target gene(s). Insulators, another type of DNA regulatory element, limit an enhancer's ability to target distal genes when the insulator is located between an enhancer and a potential target. [7] [8] [9] [10] In mammals, insulators are bound by CTCF, [11] but only a minority of CTCF-bound sites function as insulators. [12] CTCF molecules can form homodimers on DNA, which can be co-bound by cohesin; the resulting chromatin loop structure helps constrain the ability of enhancers within the loop to target genes outside the loop. Loops that are anchored at both ends by CTCF and cohesin and that restrict enhancer-gene targeting are called "insulated neighborhoods."
Insulated neighborhoods are defined as chromosome loops that are formed by CTCF homodimers, are co-bound by cohesin, and contain at least one gene. [13] [14] The CTCF/cohesin-bound regions delimiting an insulated neighborhood are called "anchors." One study in human embryonic stem cells identified ~13,000 insulated neighborhoods that, on average, each contained three genes and were about 90 kb in size. [15] Two lines of evidence argue that the boundaries of insulated neighborhoods are insulating: 1) the vast majority (~90-97%) of enhancer-gene interactions are contained within insulated neighborhoods and 2) genetic perturbation of CTCF/cohesin-bound insulated neighborhood anchors leads to local gene dysregulation due to novel interactions outside of the neighborhood.
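The first line of evidence, the share of enhancer-gene interactions that fall inside a neighborhood, can be illustrated with a small calculation. The following is a minimal sketch, not the method used in the cited studies: the `Interval` type, the function names, and all coordinates are hypothetical, and an interaction is counted as "contained" only when both the enhancer and the gene lie entirely inside the same neighborhood.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A genomic interval; all coordinates used below are made up."""
    chrom: str
    start: int
    end: int


def contains(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies entirely within `outer` on the same chromosome."""
    return (
        outer.chrom == inner.chrom
        and outer.start <= inner.start
        and inner.end <= outer.end
    )


def fraction_contained(
    neighborhoods: List[Interval],
    interactions: List[Tuple[Interval, Interval]],
) -> float:
    """Fraction of (enhancer, gene) pairs that sit inside one neighborhood."""
    if not interactions:
        return 0.0
    inside = sum(
        1
        for enhancer, gene in interactions
        if any(contains(n, enhancer) and contains(n, gene) for n in neighborhoods)
    )
    return inside / len(interactions)


# Toy data: one 90 kb neighborhood and two enhancer-gene interactions.
neighborhoods = [Interval("chr1", 1_000, 91_000)]
interactions = [
    # Both elements inside the neighborhood.
    (Interval("chr1", 5_000, 5_500), Interval("chr1", 40_000, 45_000)),
    # The gene lies beyond the right-hand anchor.
    (Interval("chr1", 80_000, 80_500), Interval("chr1", 120_000, 125_000)),
]
fraction = fraction_contained(neighborhoods, interactions)
print(fraction)
```

With real data, the neighborhoods would come from CTCF/cohesin loop calls and the interactions from a chromatin interaction assay; the toy data here only shows how the containment fraction is defined.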
The majority of insulated neighborhoods appear to be maintained during development because CTCF binding and CTCF-CTCF loop structures are very similar across human cell types. [16] [17] While the locations of many insulated neighborhood structures are maintained across different cell types, the enhancer-gene interactions occurring within them are cell-type specific, consistent with the cell type-specific activity of enhancers. [18] [19]
Topologically associating domains (TADs) are megabase-size regions of relatively high DNA interaction frequencies. [20] [21] Mechanistic studies indicate TADs are single insulated neighborhoods or collections of insulated neighborhoods. [22]
Genetic and epigenetic variation in insulated neighborhood anchors has been linked to several human diseases. One study found that a genetic variant linked to asthma disrupts CTCF binding and insulated neighborhood formation. [23] Studies of imprinted loci showed that DNA methylation controls CTCF-anchored loops that regulate gene expression. Individuals with methylation aberrations at an imprinted CTCF-binding site near IGF2/H19 form aberrant insulated neighborhoods and develop Beckwith-Wiedemann syndrome (when both alleles have the paternal type of insulated neighborhood) or Silver-Russell syndrome (when both alleles have the maternal type of insulated neighborhood). [24]
Insulated neighborhoods aid in identifying the target genes of disease-associated enhancer variants. The majority of disease-linked DNA variants identified from genome-wide association studies occur in enhancers. [25] [26] [27] [28] Identifying target genes of enhancers with disease-linked variants has been difficult because enhancers may act over long distances, but the constraint on enhancer-gene targeting by insulated neighborhoods refines the prediction of target genes. For example, a DNA variant associated with type 2 diabetes occurs within an enhancer located between the CDC123 and CAMK1D genes but only affects CAMK1D because this gene and the enhancer are within the same insulated neighborhood, while CDC123 lies outside the neighborhood. [29] [30]
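The way an insulated neighborhood narrows the list of candidate target genes can be sketched as follows. This is an illustrative simplification, not a published pipeline: the function name is hypothetical, everything is assumed to lie on one chromosome, and the coordinates are invented placeholders rather than the real positions of CDC123 and CAMK1D.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) on a single chromosome


def candidate_targets(
    enhancer: int,
    neighborhoods: List[Span],
    genes: Dict[str, Span],
) -> List[str]:
    """Return genes lying entirely inside a neighborhood that also
    contains the enhancer position."""
    hits = set()
    for start, end in neighborhoods:
        if start <= enhancer <= end:
            for name, (gene_start, gene_end) in genes.items():
                if start <= gene_start and gene_end <= end:
                    hits.add(name)
    return sorted(hits)


# Invented coordinates: the enhancer sits between the two genes,
# but only CAMK1D shares a neighborhood with it.
genes = {"CDC123": (100, 200), "CAMK1D": (300, 500)}
neighborhoods = [(230, 520)]
enhancer = 250

targets = candidate_targets(enhancer, neighborhoods, genes)
print(targets)
```

A nearest-gene heuristic could not separate the two flanking genes on the basis of neighborhood structure, whereas the neighborhood constraint excludes the gene that lies outside the loop.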
Somatic mutations that alter insulated neighborhood anchors can contribute to tumorigenesis. Chromosomal alterations such as translocations, deletions and tandem duplications that intersect insulated neighborhood anchor sites can activate oncogenes. [31] [32] [33] Epigenetic dysregulation can also contribute to tumorigenesis by altering insulated neighborhoods. IDH-mutant gliomas display altered DNA methylation patterns, so CTCF binding, which is DNA methylation-dependent, is also altered. [34] Altered CTCF binding disrupts insulated neighborhoods and can lead to oncogene misregulation.
Chromatin is a complex of DNA and protein found in eukaryotic cells. Its primary function is to package long DNA molecules into more compact, denser structures. This prevents the strands from becoming tangled and also plays important roles in reinforcing the DNA during cell division, preventing DNA damage, and regulating gene expression and DNA replication. During mitosis and meiosis, chromatin facilitates proper segregation of the chromosomes in anaphase; the characteristic shapes of chromosomes visible during this stage are the result of DNA being coiled into highly condensed chromatin.
In molecular biology and genetics, transcriptional regulation is the means by which a cell regulates the conversion of DNA to RNA (transcription), thereby orchestrating gene activity. A single gene can be regulated in a range of ways, from altering the number of copies of RNA that are transcribed, to the temporal control of when the gene is transcribed. This control allows the cell or organism to respond to a variety of intra- and extracellular signals. Examples include producing the mRNAs that encode enzymes to adapt to a change in a food source, producing the gene products involved in cell cycle-specific activities, and producing the gene products responsible for cellular differentiation in multicellular eukaryotes, as studied in evolutionary developmental biology.
An insulator is a type of cis-regulatory element known as a long-range regulatory element. Found in multicellular eukaryotes and working over distances from the promoter element of the target gene, an insulator is typically 300 bp to 2000 bp in length. Insulators contain clustered binding sites for sequence specific DNA-binding proteins and mediate intra- and inter-chromosomal interactions.
The HHV latency-associated transcript (LAT) is a length of RNA that accumulates in cells hosting long-term, or latent, human herpes virus (HHV) infections. The LAT RNA is produced by transcription from a certain region of the viral DNA. LAT regulates the viral genome and interferes with the normal activities of the infected host cell.
Cohesin is a protein complex that mediates sister chromatid cohesion, homologous recombination, and DNA looping. Cohesin is formed of SMC3, SMC1, SCC1 and SCC3. Cohesin holds sister chromatids together after DNA replication until anaphase when removal of cohesin leads to separation of sister chromatids. The complex forms a ring-like structure and it is believed that sister chromatids are held together by entrapment inside the cohesin ring. Cohesin is a member of the SMC family of protein complexes which includes Condensin, MukBEF and SMC-ScpAB.
Transcriptional repressor CTCF, also known as 11-zinc finger protein or CCCTC-binding factor, is a transcription factor that in humans is encoded by the CTCF gene. CTCF is involved in many cellular processes, including transcriptional regulation, insulator activity, V(D)J recombination and regulation of chromatin architecture.
Nipped-B-like protein (NIPBL), also known as SCC2 or delangin, is a protein that in humans is encoded by the NIPBL gene. NIPBL is required for the association of cohesin with DNA and is the major subunit of the cohesin loading complex. Heterozygous mutations in NIPBL account for an estimated 60% of cases of Cornelia de Lange syndrome.
Structural maintenance of chromosomes protein 1A (SMC1A) is a protein that in humans is encoded by the SMC1A gene. SMC1A is a subunit of the cohesin complex which mediates sister chromatid cohesion, homologous recombination and DNA looping. In somatic cells, cohesin is formed of SMC1A, SMC3, RAD21 and either SA1 or SA2 whereas in meiosis, cohesin is formed of SMC3, SMC1B, REC8 and SA3.
Tumor protein p63, typically referred to as p63 and also known as transformation-related protein 63, is a protein that in humans is encoded by the TP63 gene.
Chromosome conformation capture techniques are a set of molecular biology methods used to analyze the spatial organization of chromatin in a cell. These methods quantify the number of interactions between genomic loci that are nearby in 3-D space, but may be separated by many nucleotides in the linear genome. Such interactions may result from biological functions, such as promoter-enhancer interactions, or from random polymer looping, where undirected physical motion of chromatin causes loci to collide. Interaction frequencies may be analyzed directly, or they may be converted to distances and used to reconstruct 3-D structures.
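The conversion from interaction frequencies to distances mentioned above can be sketched with a simple inverse power law. This is an assumption introduced here for illustration, not a rule stated in the text: the function name, the exponent `alpha`, and the toy contact matrix are all hypothetical, and the resulting distances are in arbitrary units.

```python
from typing import List


def frequencies_to_distances(
    freq: List[List[float]], alpha: float = 1.0
) -> List[List[float]]:
    """Convert a symmetric contact-frequency matrix to distances using the
    assumed relation distance = frequency ** (-alpha).

    Pairs with zero observed contacts get an infinite distance, and the
    diagonal (a locus with itself) is set to zero.
    """
    n = len(freq)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # distance from a locus to itself stays 0.0
            if freq[i][j] > 0:
                dist[i][j] = freq[i][j] ** (-alpha)
            else:
                dist[i][j] = float("inf")
    return dist


# Toy contact counts between three loci (invented values).
contacts = [
    [0.0, 4.0, 1.0],
    [4.0, 0.0, 2.0],
    [1.0, 2.0, 0.0],
]
distances = frequencies_to_distances(contacts)
print(distances[0][1], distances[0][2], distances[1][2])
```

Loci that contact each other more often are placed closer together; a distance matrix of this kind is the sort of input a 3-D reconstruction method could start from.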
Histone H2A type 2-A is a protein that in humans is encoded by the HIST2H2AA3 gene.
Double-strand-break repair protein rad21 homolog is a protein that in humans is encoded by the RAD21 gene. RAD21, an essential gene, encodes a DNA double-strand break (DSB) repair protein that is evolutionarily conserved in all eukaryotes from budding yeast to humans. RAD21 protein is a structural component of the highly conserved cohesin complex consisting of RAD21, SMC1A, SMC3, and SCC3 [STAG1 (SA1) and STAG2 (SA2) in multicellular organisms] proteins, involved in sister chromatid cohesion.
Cohesin subunit SA-2 (SA2) is a protein that in humans is encoded by the STAG2 gene. SA2 is a subunit of the cohesin complex, which mediates sister chromatid cohesion, homologous recombination and DNA looping. In somatic cells, cohesin is formed of SMC3, SMC1, RAD21 and either SA1 or SA2, whereas in meiosis, cohesin is formed of SMC3, SMC1B, REC8 and SA3.
Structural maintenance of chromosomes protein 1B (SMC-1B) is a protein that in humans is encoded by the SMC1B gene. SMC proteins engage in chromosome organization and can be divided into three functional groups: cohesins, condensins, and DNA repair proteins. SMC-1B belongs to a family of proteins required for chromatid cohesion and DNA recombination during meiosis and mitosis. SMC1B protein appears to participate with other cohesins REC8, STAG3 and SMC3 in sister-chromatid cohesion throughout the whole meiotic process in human oocytes.
In genetics, a super-enhancer is a region of the mammalian genome comprising multiple enhancers that is collectively bound by an array of transcription factor proteins to drive transcription of genes involved in cell identity. Because super-enhancers are frequently identified near genes important for controlling and defining cell identity, they may thus be used to quickly identify key nodes regulating cell identity.
A topologically associating domain (TAD) is a self-interacting genomic region, meaning that DNA sequences within a TAD physically interact with each other more frequently than with sequences outside the TAD. The median size of a TAD in mouse cells is 880 kb, and they have similar sizes in non-mammalian species. Boundaries at both sides of these domains are conserved between different mammalian cell types and even across species and are highly enriched with CCCTC-binding factor (CTCF) and cohesin. In addition, some types of genes appear near TAD boundaries more often than would be expected by chance.
Richard Allen Young is an American geneticist, a Member of Whitehead Institute, and a professor of biology at the Massachusetts Institute of Technology. He is a pioneer in the systems biology of gene control who has developed genomics technologies and concepts key to understanding gene control in human health and disease. He has served as an advisor to the World Health Organization and the National Institutes of Health. He is a member of the National Academy of Sciences and the National Academy of Medicine. Scientific American has recognized him as one of the top 50 leaders in science, technology and business. Young is among the most Highly Cited Researchers in his field.
Nuclear organization refers to the spatial distribution of chromatin within a cell nucleus. There are many different levels and scales of nuclear organisation. Chromatin is a higher order structure of DNA.
DXZ4 is a variable number tandemly repeated DNA sequence. In humans it is composed of 3 kb monomers containing a highly conserved CTCF binding site. CTCF is a transcription factor protein and the main insulator responsible for partitioning of chromatin domains in the vertebrate genome.
The human epigenome is the complete set of structural modifications of chromatin and chemical modifications of histones and nucleotides. These modifications vary according to cell type and developmental status. Various studies show that the epigenome depends on exogenous factors.