In genetics, a super-enhancer is a region of the mammalian genome comprising multiple enhancers that is collectively bound by an array of transcription factor proteins to drive transcription of genes involved in cell identity. [1] [2] [3] Because super-enhancers are frequently identified near genes important for controlling and defining cell identity, they may thus be used to quickly identify key nodes regulating cell identity. [3] [4]
Enhancers have several quantifiable traits that have a range of values, and these traits are generally elevated at super-enhancers. Super-enhancers are bound by higher levels of transcription-regulating proteins and are associated with genes that are more highly expressed. [1] [5] [6] [7] Expression of genes associated with super-enhancers is particularly sensitive to perturbations, which may facilitate cell state transitions or explain the sensitivity of super-enhancer-associated genes to small molecules that target transcription. [1] [5] [6] [8] [9]
The regulation of transcription by enhancers has been studied since the 1980s. [10] [11] [12] [13] [14] Large or multi-component transcription regulators with a range of mechanistic properties, including locus control regions, clustered open regulatory elements, and transcription initiation platforms, were observed shortly thereafter. [15] [16] [17] [18] More recent research has suggested that these different categories of regulatory elements may represent subtypes of super-enhancer. [3] [19]
In 2013, two labs identified large enhancers near several genes especially important for establishing cell identities. While Richard A. Young and colleagues identified super-enhancers, Francis Collins and colleagues identified stretch enhancers. [1] [2] Both super-enhancers and stretch enhancers are clusters of enhancers that control cell-specific genes and may be largely synonymous. [2] [20]
As currently defined, the term “super-enhancer” was introduced by Young’s lab to describe regions identified in mouse embryonic stem cells (ESCs). [1] These particularly large, potent enhancer regions were found to control the genes that establish the embryonic stem cell identity, including Oct-4, Sox2, Nanog, Klf4, and Esrrb. Perturbation of the super-enhancers associated with these genes showed a range of effects on their target genes’ expression. [20] Super-enhancers have since been identified near regulators of cell identity in a range of mouse and human tissues. [2] [3] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37]
The enhancers comprising super-enhancers share the functions of enhancers, including binding transcription factor proteins, looping to target genes, and activating transcription. [1] [3] [19] [20] Three notable traits of enhancers comprising super-enhancers are their clustering in genomic proximity, their exceptional signal of transcription-regulating proteins, and their high frequency of physical interaction with each other. Perturbing the DNA of enhancers comprising super-enhancers showed a range of effects on the expression of cell identity genes, suggesting a complex relationship between the constituent enhancers. [20] Super-enhancers separated by tens of megabases cluster in three dimensions inside the nucleus of mouse embryonic stem cells. [38] [39]
High levels of many transcription factors and co-factors are seen at super-enhancers (e.g., CDK7, BRD4, and Mediator). [1] [3] [5] [6] [8] [9] [19] This high concentration of transcription-regulating proteins may explain why their target genes tend to be more highly expressed than other classes of genes. However, housekeeping genes tend to be more highly expressed than super-enhancer-associated genes. [1]
Super-enhancers may have evolved at key cell identity genes to render the transcription of these genes responsive to an array of external cues. [20] The enhancers comprising a super-enhancer can each be responsive to different signals, which allows the transcription of a single gene to be regulated by multiple signaling pathways. [20] Pathways seen to regulate their target genes using super-enhancers include Wnt, TGF-β, LIF, BDNF, and NOTCH. [20] [40] [41] [42] [43] The constituent enhancers of super-enhancers physically interact with each other and with their target genes over long genomic distances. [7] [22] [44] Super-enhancers that control the expression of major cell surface receptors with a crucial role in the function of a given cell lineage have also been defined. This is notably the case for B-lymphocytes, whose survival, activation, and differentiation rely on the expression of membrane-form immunoglobulins (Ig). The Ig heavy chain locus super-enhancer is a very large (25 kb) cis-regulatory region that includes multiple enhancers and controls several major modifications of the locus (notably somatic hypermutation, class-switch recombination, and locus suicide recombination).
Mutations in super-enhancers have been noted in various diseases, including cancers, type 1 diabetes, Alzheimer’s disease, lupus, rheumatoid arthritis, multiple sclerosis, systemic scleroderma, primary biliary cirrhosis, Crohn’s disease, Graves’ disease, vitiligo, and atrial fibrillation. [2] [3] [6] [25] [32] [35] [45] [46] [47] [48] [49] A similar enrichment in disease-associated sequence variation has also been observed for stretch enhancers. [2]
Super-enhancers may play important roles in the misregulation of gene expression in cancer. During tumor development, tumor cells acquire super-enhancers at key oncogenes, which drive higher levels of transcription of these genes than in healthy cells. [3] [5] [44] [45] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] Altered super-enhancer function is also induced by mutations of chromatin regulators. [60] Acquired super-enhancers may thus be biomarkers that could be useful for diagnosis and therapeutic intervention. [20]
Proteins enriched at super-enhancers include the targets of small molecules that inhibit transcription-regulating proteins and have been deployed against cancers. [5] [6] [25] [61] For instance, super-enhancers rely on exceptional amounts of CDK7, and, in cancer, multiple papers report the loss of expression of their target genes when cells are treated with the CDK7 inhibitor THZ1. [5] [8] [9] [62] Similarly, super-enhancers are enriched in BRD4, the target of the small molecule JQ1, so treatment with JQ1 causes exceptional losses in expression for super-enhancer-associated genes. [6]
Super-enhancers have been most commonly identified by locating genomic regions that are highly enriched in ChIP-Seq signal. ChIP-Seq experiments targeting master transcription factors and co-factors like Mediator or BRD4 have been used, but the most frequently used target is H3K27ac-marked nucleosomes. [1] [3] [6] [63] [64] [65] The program “ROSE” (Rank Ordering of Super-Enhancers) is commonly used to identify super-enhancers from ChIP-Seq data. This program stitches together previously identified enhancer regions and ranks these stitched enhancers by their ChIP-Seq signal. [1] The stitching distance selected to combine multiple individual enhancers into larger domains can vary. Because some markers of enhancer activity are also enriched in promoters, regions within promoters of genes can be disregarded. ROSE separates super-enhancers from typical enhancers by their exceptional enrichment in a mark of enhancer activity. HOMER is another tool that can identify super-enhancers. [66]
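The stitch-and-rank procedure described above can be illustrated with a minimal Python sketch. This is not the ROSE source code: the function names `stitch` and `split_by_rank`, the 12,500 bp stitching gap, and the toy signal values are assumptions made for illustration. The cutoff rule (the point where a line of slope 1 touches the scaled rank-versus-signal curve) is one common way of formalizing "exceptional enrichment" and is likewise assumed here. Promoter exclusion is omitted.

```python
from typing import List, Tuple

# (chromosome, start, end, ChIP-Seq signal); all values below are toy data.
Region = Tuple[str, int, int, float]


def stitch(enhancers: List[Region], max_gap: int = 12_500) -> List[Region]:
    """Merge enhancers on the same chromosome separated by at most max_gap bp.

    max_gap is an assumed stitching distance; the signal of merged
    enhancers is summed.
    """
    stitched: List[Region] = []
    for chrom, start, end, signal in sorted(enhancers):
        if stitched and stitched[-1][0] == chrom and start - stitched[-1][2] <= max_gap:
            c, s, e, sig = stitched[-1]
            stitched[-1] = (c, s, max(e, end), sig + signal)
        else:
            stitched.append((chrom, start, end, signal))
    return stitched


def split_by_rank(regions: List[Region]) -> Tuple[List[Region], List[Region]]:
    """Rank regions by signal and split them into (typical, super).

    Rank and signal are both scaled to [0, 1]; the cutoff is the ranked
    point where a line of slope 1 touches the curve, i.e. the point that
    minimizes (scaled signal - scaled rank). Regions ranked strictly
    above that point are called super-enhancers.
    """
    ranked = sorted(regions, key=lambda r: r[3])
    n = len(ranked)
    top = ranked[-1][3] if ranked else 0.0
    if n < 2 or top <= 0:
        return ranked, []
    cut = min(range(n), key=lambda i: ranked[i][3] / top - i / (n - 1))
    return ranked[: cut + 1], ranked[cut + 1 :]


# Toy input: three nearby enhancers on chr1 plus five isolated ones.
enhancers = [
    ("chr1", 1_000, 2_000, 30.0),
    ("chr1", 5_000, 6_000, 40.0),
    ("chr1", 9_000, 10_000, 50.0),
    ("chr1", 100_000, 101_000, 5.0),
    ("chr1", 200_000, 201_000, 4.0),
    ("chr2", 1_000, 2_000, 3.0),
    ("chr2", 50_000, 51_000, 2.0),
    ("chr2", 100_000, 101_000, 1.0),
]

stitched = stitch(enhancers)
typical, supers = split_by_rank(stitched)
print("super-enhancers:", supers)
```

In this toy example the three clustered chr1 enhancers are stitched into one domain whose summed signal far exceeds that of the isolated enhancers, so it is the only region ranked above the cutoff. Real analyses work from aligned ChIP-Seq reads and background-subtracted signal, which this sketch does not model.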
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product, either a protein or a non-coding RNA, that ultimately affects a phenotype. These products are often proteins, but in non-protein-coding genes such as transfer RNA (tRNA) and small nuclear RNA (snRNA), the product is a functional non-coding RNA. The process of gene expression is used by all known life (eukaryotes and prokaryotes), and is utilized by viruses, to generate the macromolecular machinery for life.
Transcription is the process of copying a segment of DNA into RNA. The segments of DNA transcribed into RNA molecules that can encode proteins produce messenger RNA (mRNA). Other segments of DNA are transcribed into RNA molecules called non-coding RNAs (ncRNAs).
In genetics, an enhancer is a short region of DNA that can be bound by proteins (activators) to increase the likelihood that transcription of a particular gene will occur. These proteins are usually referred to as transcription factors. Enhancers are cis-acting. They can be located up to 1 Mbp away from the gene, upstream or downstream from the start site. There are hundreds of thousands of enhancers in the human genome. They are found in both prokaryotes and eukaryotes. Active enhancers typically get transcribed as enhancer or regulatory non-coding RNA, whose expression levels correlate with mRNA levels of target genes.
A regulatory sequence is a segment of a nucleic acid molecule which is capable of increasing or decreasing the expression of specific genes within an organism. Regulation of gene expression is an essential feature of all living organisms and viruses.
The epithelial–mesenchymal transition (EMT) is a process by which epithelial cells lose their cell polarity and cell–cell adhesion, and gain migratory and invasive properties to become mesenchymal stem cells; these are multipotent stromal cells that can differentiate into a variety of cell types. EMT is essential for numerous developmental processes including mesoderm formation and neural tube formation. EMT has also been shown to occur in wound healing, in organ fibrosis and in the initiation of metastasis in cancer progression.
An insulator is a type of cis-regulatory element known as a long-range regulatory element. Found in multicellular eukaryotes and working over distances from the promoter element of the target gene, an insulator is typically 300 bp to 2000 bp in length. Insulators contain clustered binding sites for sequence specific DNA-binding proteins and mediate intra- and inter-chromosomal interactions.
The p300-CBP coactivator family in humans is composed of two closely related transcriptional co-activating proteins, p300 and CBP.
Transcription factor II H (TFIIH) is an important protein complex, having roles in transcription of various protein-coding genes and DNA nucleotide excision repair (NER) pathways. TFIIH first came to light in 1989 when general transcription factor-δ or basic transcription factor 2 was characterized as an indispensable transcription factor in vitro. This factor was also isolated from yeast and finally named TFIIH in 1992.
GATA2 or GATA-binding factor 2 is a transcription factor, i.e. a nuclear protein which regulates the expression of genes. It regulates many genes that are critical for the embryonic development, self-renewal, maintenance, and functionality of blood-forming, lymphatic system-forming, and other tissue-forming stem cells. GATA2 is encoded by the GATA2 gene, a gene which often suffers germline and somatic mutations which lead to a wide range of familial and sporadic diseases, respectively. The gene and its product are targets for the treatment of these diseases.
Metastasis-associated protein MTA1 is a protein that in humans is encoded by the MTA1 gene. MTA1 is the founding member of the MTA family of genes. MTA1 is primarily localized in the nucleus but also found to be distributed in the extra-nuclear compartments. MTA1 is a component of several chromatin remodeling complexes including the nucleosome remodeling and deacetylation complex (NuRD). MTA1 regulates gene expression by functioning as a coregulator to integrate DNA-interacting factors to gene activity. MTA1 participates in physiological functions in the normal and cancer cells. MTA1 is one of the most upregulated proteins in human cancer and associates with cancer progression, aggressive phenotypes, and poor prognosis of cancer patients.
SRY-box 2, also known as SOX2, is a transcription factor that is essential for maintaining self-renewal, or pluripotency, of undifferentiated embryonic stem cells. Sox2 has a critical role in maintenance of embryonic and neural stem cells.
DNA damage-inducible transcript 3, also known as C/EBP homologous protein (CHOP), is a pro-apoptotic transcription factor that is encoded by the DDIT3 gene. It is a member of the CCAAT/enhancer-binding protein (C/EBP) family of DNA-binding transcription factors. The protein functions as a dominant-negative inhibitor by forming heterodimers with other C/EBP members, preventing their DNA binding activity. The protein is implicated in adipogenesis and erythropoiesis and has an important role in the cell's stress response.
Cell division protein kinase 8 is an enzyme that in humans is encoded by the CDK8 gene.
Metastasis-associated protein MTA2 is a protein that in humans is encoded by the MTA2 gene.
T-box transcription factor TBX3 is a protein that in humans is encoded by the TBX3 gene.
Neurogenins, often abbreviated as Ngn, are a family of bHLH transcription factors involved in specifying neuronal differentiation. The family, consisting of Neurogenin-1, Neurogenin-2, and Neurogenin-3, plays a fundamental role in specifying neural precursor cells and regulating the differentiation of neurons during embryonic development. It is one of many gene families related to the atonal gene in Drosophila. Other positive regulators of neuronal differentiation also expressed during early neural development include NeuroD and ASCL1.
Histone-lysine N-methyltransferase 2D (KMT2D), also known as MLL4 and sometimes MLL2 in humans and Mll4 in mice, is a major mammalian histone H3 lysine 4 (H3K4) mono-methyltransferase. It is part of a family of six Set1-like H3K4 methyltransferases that also contains KMT2A, KMT2B, KMT2C, KMT2F, and KMT2G.
Richard Allen Young is an American geneticist, a Member of Whitehead Institute, and a professor of biology at the Massachusetts Institute of Technology. He is a pioneer in the systems biology of gene control who has developed genomics technologies and concepts key to understanding gene control in human health and disease. He has served as an advisor to the World Health Organization and the National Institutes of Health. He is a member of the National Academy of Sciences and the National Academy of Medicine. Scientific American has recognized him as one of the top 50 leaders in science, technology and business. Young is among the most Highly Cited Researchers in his field.
In mammalian biology, insulated neighborhoods are chromosomal loop structures formed by the physical interaction of two DNA loci bound by the transcription factor CTCF and co-occupied by cohesin. Insulated neighborhoods are thought to be structural and functional units of gene control because their integrity is important for normal gene regulation. Current evidence suggests that these structures form the mechanistic underpinnings of higher-order chromosome structures, including topologically associating domains (TADs). Insulated neighborhoods are functionally important in understanding gene regulation in normal cells and dysregulated gene expression in disease.
In genetics, transcriptional amplification is the process in which the total amount of messenger RNA (mRNA) molecules from expressed genes is increased during disease, development, or in response to stimuli.