In genetics, a super-enhancer is a region of the mammalian genome comprising multiple enhancers that is collectively bound by an array of transcription factor proteins to drive transcription of genes involved in cell identity. [1] [2] [3] Because super-enhancers are frequently found near genes that control and define cell identity, they can be used to rapidly identify key nodes in the networks that regulate cell identity. [3] [4]
Enhancers have several quantifiable traits, such as their size and their occupancy by transcription-regulating proteins, that span a range of values, and these traits are generally elevated at super-enhancers. Super-enhancers are bound by higher levels of transcription-regulating proteins and are associated with genes that are more highly expressed. [1] [5] [6] [7] Expression of super-enhancer-associated genes is particularly sensitive to perturbation, which may facilitate cell state transitions or explain why these genes are sensitive to small molecules that target transcription. [1] [5] [6] [8] [9]
The regulation of transcription by enhancers has been studied since the 1980s. [10] [11] [12] [13] [14] Large or multi-component transcriptional regulators with a range of mechanistic properties, including locus control regions, clustered open regulatory elements, and transcription initiation platforms, were described shortly thereafter. [15] [16] [17] [18] More recent research has suggested that these different categories of regulatory elements may represent subtypes of super-enhancers. [3] [19]
In 2013, two labs identified large enhancers near several genes that are especially important for establishing cell identity: Richard A. Young and colleagues described super-enhancers, while Francis Collins and colleagues described stretch enhancers. [1] [2] Both super-enhancers and stretch enhancers are clusters of enhancers that control cell-specific genes, and the two terms may be largely synonymous. [2] [20]
As currently defined, the term “super-enhancer” was introduced by Young’s lab to describe regions identified in mouse embryonic stem cells (ESCs). [1] These particularly large, potent enhancer regions were found to control the genes that establish embryonic stem cell identity, including Oct-4, Sox2, Nanog, Klf4, and Esrrb. Perturbation of the super-enhancers associated with these genes had a range of effects on their target genes’ expression. [20] Super-enhancers have since been identified near cell identity regulators in a range of mouse and human tissues. [2] [3] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37]
The enhancers comprising super-enhancers share the functions of enhancers, including binding transcription factor proteins, looping to target genes, and activating transcription. [1] [3] [19] [20] Three notable traits of these constituent enhancers are their clustering in genomic proximity, their exceptionally high signal for transcription-regulating proteins, and their high frequency of physical interaction with each other. Perturbing the DNA of the constituent enhancers had a range of effects on the expression of cell identity genes, suggesting a complex relationship among them. [20] In mouse embryonic stem cells, super-enhancers separated by tens of megabases cluster in three dimensions inside the nucleus. [38] [39]
High levels of many transcription factors and co-factors, such as CDK7, BRD4, and Mediator, are found at super-enhancers. [1] [3] [5] [6] [8] [9] [19] This high concentration of transcription-regulating proteins may explain why their target genes tend to be more highly expressed than other classes of genes. Housekeeping genes, however, tend to be even more highly expressed than super-enhancer-associated genes. [1]
Super-enhancers may have evolved at key cell identity genes to make the transcription of these genes responsive to an array of external cues. [20] Each enhancer within a super-enhancer can respond to a different signal, which allows the transcription of a single gene to be regulated by multiple signaling pathways. [20] Pathways that have been seen to regulate their target genes through super-enhancers include Wnt, TGF-β, LIF, BDNF, and NOTCH. [20] [40] [41] [42] [43] The constituent enhancers of a super-enhancer physically interact with each other and with their target genes over long genomic distances. [7] [22] [44] Super-enhancers have also been defined that control the expression of major cell surface receptors with crucial roles in the function of a given cell lineage. A notable case is B lymphocytes, whose survival, activation, and differentiation depend on the expression of membrane-bound immunoglobulins (Ig). The Ig heavy chain locus super-enhancer is a very large (about 25 kb) cis-regulatory region that includes multiple enhancers and controls several major modifications of the locus, notably somatic hypermutation, class-switch recombination, and locus suicide recombination.
Mutations in super-enhancers have been noted in various diseases, including cancers, type 1 diabetes, Alzheimer’s disease, lupus, rheumatoid arthritis, multiple sclerosis, systemic scleroderma, primary biliary cirrhosis, Crohn’s disease, Graves’ disease, vitiligo, and atrial fibrillation. [2] [3] [6] [25] [32] [35] [45] [46] [47] [48] [49] A similar enrichment in disease-associated sequence variation has also been observed for stretch enhancers. [2]
Super-enhancers may play important roles in the misregulation of gene expression in cancer. During tumor development, tumor cells acquire super-enhancers at key oncogenes, which drive higher levels of transcription of these genes than in healthy cells. [3] [5] [44] [45] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] Altered super-enhancer function can also result from mutations in chromatin regulators. [60] Acquired super-enhancers may thus serve as biomarkers useful for diagnosis and for guiding therapeutic intervention. [20]
Proteins enriched at super-enhancers include transcription regulators that are the targets of small molecules that have been deployed against cancers. [5] [6] [25] [61] For instance, super-enhancers rely on exceptionally high levels of CDK7, and in cancer, multiple papers report loss of expression of their target genes when cells are treated with the CDK7 inhibitor THZ1. [5] [8] [9] [62] Similarly, super-enhancers are enriched in BRD4, the target of the small molecule JQ1, so treatment with JQ1 causes exceptional losses in expression of super-enhancer-associated genes. [6]
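One simple way such differential sensitivity could be summarized is to compute each gene's log2 fold change in expression after treatment and compare the typical change for super-enhancer-associated genes with that for all other genes. The sketch below is hypothetical throughout: the gene names, the expression values, the pseudocount of 1, and the choice of medians are illustrative assumptions, and published analyses generally rely on replicate measurements and formal statistical tests rather than a bare comparison of medians.

```python
import math
from statistics import median


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Per-gene log2((treated + pseudocount) / (control + pseudocount)).

    The pseudocount (an assumption) avoids division by zero for unexpressed genes.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    }


def median_response(control, treated, se_genes):
    """Median log2 fold change for super-enhancer-associated genes and for all other genes."""
    lfc = log2_fold_changes(control, treated)
    se = [v for g, v in lfc.items() if g in se_genes]
    other = [v for g, v in lfc.items() if g not in se_genes]
    return median(se), median(other)


# Hypothetical normalized expression values before and after inhibitor treatment.
control = {"gene_a": 99, "gene_b": 99, "gene_c": 99, "gene_d": 99}
treated = {"gene_a": 24, "gene_b": 49, "gene_c": 99, "gene_d": 79}
se_median, other_median = median_response(control, treated, {"gene_a", "gene_b"})
```

With these toy values, the two hypothetical super-enhancer-associated genes fall by 4-fold and 2-fold, giving a median log2 fold change of -1.5, while the other genes change much less; a more negative value for the super-enhancer group is the pattern the text describes.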
Super-enhancers have most commonly been identified by locating genomic regions that are highly enriched in ChIP-Seq signal. ChIP-Seq experiments targeting master transcription factors and co-factors such as Mediator or BRD4 have been used, but the most frequently used target is nucleosomes marked by histone H3 lysine 27 acetylation (H3K27ac). [1] [3] [6] [63] [64] [65] The program ROSE (Rank Ordering of Super-Enhancers) is commonly used to identify super-enhancers from ChIP-Seq data. It stitches together previously identified enhancer regions that lie within a chosen distance of one another and ranks these stitched enhancers by their ChIP-Seq signal. [1] The stitching distance used to combine individual enhancers into larger domains can vary. Because some markers of enhancer activity are also enriched at promoters, regions within gene promoters can be excluded. ROSE then separates super-enhancers from typical enhancers by their exceptional enrichment in the chosen mark: when stitched enhancers are ranked by signal, super-enhancers are those lying beyond the point at which a line of slope 1 is tangent to the ranked signal curve, with both axes scaled to the same range. HOMER is another tool that can identify super-enhancers. [66]
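The stitch, exclude, rank, and cut-off steps described above can be sketched in Python as follows. This is not ROSE's own code: ROSE recomputes read density over each stitched region and subtracts a control sample, whereas this sketch simply sums per-enhancer signal values supplied as input. The `Region` class, the function names, the 12.5 kb stitching distance, and the ±2.5 kb promoter window are illustrative assumptions. The cutoff uses a line of slope (max - min) / n in raw units, which becomes slope 1 once both axes are rescaled to the same range.

```python
from dataclasses import dataclass


@dataclass
class Region:
    chrom: str
    start: int
    end: int
    signal: float = 0.0  # e.g. background-subtracted H3K27ac ChIP-Seq signal (assumed input)


def exclude_promoters(enhancers, tss_sites, window=2500):
    """Drop enhancers lying entirely within +/- window bp of any (chrom, position) TSS."""
    return [
        e for e in enhancers
        if not any(
            chrom == e.chrom and e.start >= pos - window and e.end <= pos + window
            for chrom, pos in tss_sites
        )
    ]


def stitch(enhancers, max_gap=12500):
    """Merge same-chromosome enhancers separated by <= max_gap bp, summing their signal."""
    stitched = []
    for e in sorted(enhancers, key=lambda r: (r.chrom, r.start)):
        last = stitched[-1] if stitched else None
        if last is not None and last.chrom == e.chrom and e.start - last.end <= max_gap:
            last.end = max(last.end, e.end)
            last.signal += e.signal
        else:
            # Copy so the caller's input regions are not mutated.
            stitched.append(Region(e.chrom, e.start, e.end, e.signal))
    return stitched


def super_enhancer_cutoff(signals):
    """Signal value where a line of slope (max - min) / n is tangent to the ranked curve."""
    if not signals:
        raise ValueError("no regions to rank")
    ys = sorted(max(s, 0.0) for s in signals)  # rank ascending, clip negatives to 0
    n = len(ys)
    slope = (ys[-1] - ys[0]) / n
    # The tangent point minimizes the intercept y - slope * rank, so no point lies below the line.
    best = min(range(n), key=lambda i: ys[i] - slope * (i + 1))
    return ys[best]


def call_super_enhancers(stitched):
    """Return stitched regions whose signal lies strictly above the cutoff."""
    if not stitched:
        return []
    cutoff = super_enhancer_cutoff([r.signal for r in stitched])
    return [r for r in stitched if r.signal > cutoff]


# Toy example with hypothetical coordinates and signal values.
enhancers = [
    Region("chr1", 1_000, 2_000, 5),
    Region("chr1", 10_000, 11_000, 5),
    Region("chr1", 100_000, 101_000, 3),
    Region("chr1", 200_000, 201_000, 4),
    Region("chr1", 300_000, 301_000, 2),
    Region("chr1", 400_000, 401_000, 60),
    Region("chr1", 405_000, 406_000, 40),
    Region("chr1", 500_000, 500_500, 50),  # sits on a TSS, so it is excluded
]
tss = [("chr1", 500_200)]
super_enhancers = call_super_enhancers(stitch(exclude_promoters(enhancers, tss)))
```

With these toy values, the promoter-overlapping region is removed, the remaining seven enhancers stitch into five regions, the cutoff falls at a signal of 10, and only the stitched chr1:400,000-406,000 region is called a super-enhancer. Changing `max_gap` alters which constituents merge, which is one reason reported super-enhancer sets depend on the stitching distance chosen.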