Cis-regulatory elements (CREs) or Cis-regulatory modules (CRMs) are regions of non-coding DNA which regulate the transcription of neighboring genes. CREs are vital components of genetic regulatory networks, which in turn control morphogenesis, the development of anatomy, and other aspects of embryonic development, studied in evolutionary developmental biology.
CREs are found in the vicinity of the genes that they regulate. CREs typically regulate gene transcription by binding to transcription factors. A single transcription factor may bind to many CREs, and hence control the expression of many genes (pleiotropy). The Latin prefix cis means "on this side", i.e. on the same molecule of DNA as the gene(s) to be transcribed.
CRMs are stretches of DNA, usually 100–1000 base pairs in length, [1] where a number of transcription factors can bind and regulate the transcription rates of nearby genes. They are labeled cis because they are typically located on the same DNA molecule as the genes they control, as opposed to trans, which refers to effects on genes that are not on the same molecule or are farther away, such as the effects of transcription factors. [1] One cis-regulatory element can regulate several genes, [2] and conversely, one gene can have several cis-regulatory modules. [3] Cis-regulatory modules carry out their function by integrating the active transcription factors and the associated co-factors at a specific time and place in the cell, where this information is read and an output is given. [4]
CREs are often, but not always, upstream of the transcription start site. CREs contrast with trans-regulatory elements (TREs), which code for transcription factors.[ citation needed ]
The genome of an organism contains anywhere from a few hundred to thousands of different genes, all encoding a singular product or more. For numerous reasons, including organizational maintenance, energy conservation, and generating phenotypic variance, it is important that genes are only expressed when they are needed. The most efficient way for an organism to regulate gene expression is at the transcriptional level. CREs function to control transcription by acting nearby or within a gene. The most well characterized types of CREs are enhancers and promoters. Both of these sequence elements are structural regions of DNA that serve as transcriptional regulators.[ citation needed ]
Cis-regulatory modules are one of several types of functional regulatory elements. Regulatory elements are binding sites for transcription factors, which are involved in gene regulation. [1] Cis-regulatory modules perform a large amount of developmental information processing. [1] Cis-regulatory modules are non-random clusters at their specified target site that contain transcription factor binding sites. [1]
The original definition presented cis-regulatory modules as enhancers of cis-acting DNA, which increased the rate of transcription from a linked promoter. [4] However, this definition has changed to define cis-regulatory modules as DNA sequences with transcription factor binding sites that are clustered into modular structures, including, but not limited to, locus control regions, promoters, enhancers, silencers, boundary control elements and other modulators. [4]
Cis-regulatory modules can be divided into three classes: enhancers, which regulate gene expression positively; [1] insulators, which work indirectly by interacting with other nearby cis-regulatory modules; [1] and silencers, which turn off expression of genes. [1]
The design of cis-regulatory modules is such that transcription factors and epigenetic modifications serve as inputs, and the output of the module is the command given to the transcription machinery, which in turn determines the rate of gene transcription or whether it is turned on or off. [1] There are two types of transcription factor inputs: those that determine when the target gene is to be expressed and those that serve as functional drivers, which come into play only during specific situations during development. [1] These inputs can come from different time points, can represent different signal ligands, or can come from different domains or lineages of cells. However, a lot still remains unknown.[ citation needed ]
Additionally, the regulation of chromatin structure and nuclear organization also plays a role in determining and controlling the function of cis-regulatory modules. [4] Thus gene-regulation functions (GRFs) provide a unique characteristic of a cis-regulatory module (CRM), relating the concentrations of transcription factors (input) to the promoter activities (output). Predicting GRFs remains an unsolved challenge. In general, gene-regulation functions do not use Boolean logic, [2] although in some cases the Boolean approximation is still very useful.[ citation needed ]
Under the assumption of Boolean logic, the design of a module determines its regulatory function. In relation to development, these modules can generate both positive and negative outputs. The output of each module is a product of the various operations performed on it. Common operations include the OR gate, in which an output is given when either input is present, [3] and the AND gate, in which two different regulatory factors are both necessary to produce a positive output. [1] A third design is the "toggle switch": when the signal ligand is absent while the transcription factor is present, the transcription factor acts as a dominant repressor. Once the signal ligand is present, however, the transcription factor's role as repressor is eliminated and transcription can occur. [1]
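The three designs described above can be written as truth functions. The following Python sketch is only an illustration: the function and parameter names are hypothetical, and the behaviour of the toggle switch when no transcription factor is present is an assumption, since the text describes only the cases in which the factor is present.

```python
from itertools import product


def or_gate(factor_a: bool, factor_b: bool) -> bool:
    """OR design: an output is given when either input is present."""
    return factor_a or factor_b


def and_gate(factor_a: bool, factor_b: bool) -> bool:
    """AND design: both regulatory factors are needed for a positive output."""
    return factor_a and factor_b


def toggle_switch(factor_present: bool, ligand_present: bool) -> bool:
    """Toggle switch: return True when transcription can occur.

    With the factor present and the ligand absent, the factor acts as a
    dominant repressor. Assumption (not stated in the text): when no factor
    is present there is no repression either.
    """
    repressed = factor_present and not ligand_present
    return not repressed


# Print the truth table of each design for all four input combinations.
for first, second in product([False, True], repeat=2):
    print(first, second,
          "OR:", or_gate(first, second),
          "AND:", and_gate(first, second),
          "toggle:", toggle_switch(first, second))
```

In the toggle column, the first input is read as the transcription factor and the second as the signal ligand.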
Other Boolean logic operations can occur as well, such as those involving sequence-specific transcriptional repressors, which lead to an output of zero when they bind to the cis-regulatory module. Besides the influence of the different logic operations, the output of a cis-regulatory module is also influenced by prior events. [1] Cis-regulatory modules must also interact with other regulatory elements. For the most part, even when there is functional overlap between the cis-regulatory modules of a gene, the modules' inputs and outputs tend not to be the same. [1]
While the assumption of Boolean logic is important for systems biology, detailed studies show that in general the logic of gene regulation is not Boolean. [2] This means, for example, that in the case of a cis-regulatory module regulated by two transcription factors, experimentally determined gene-regulation functions cannot be described by the 16 possible Boolean functions of two variables. Non-Boolean extensions of the gene-regulatory logic have been proposed to correct for this issue. [2]
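To make the count of 16 concrete, the sketch below enumerates every Boolean function of two inputs and contrasts them with a graded response. The graded function is a hypothetical saturating form chosen purely for illustration; it is not one of the non-Boolean extensions proposed in the cited work, and its parameter names are assumptions.

```python
from itertools import product

# The four possible on/off states of two transcription factors.
input_states = list(product([0, 1], repeat=2))

# A Boolean function assigns 0 or 1 to each input state, so there are
# 2 ** 4 = 16 possible truth tables.
boolean_truth_tables = list(product([0, 1], repeat=len(input_states)))


def graded_response(conc_a: float, conc_b: float,
                    half_a: float = 1.0, half_b: float = 1.0) -> float:
    """Hypothetical graded gene-regulation function (illustration only).

    Each factor contributes a saturating term c / (half + c); the output is
    their product and lies between 0 and 1.
    """
    return (conc_a / (half_a + conc_a)) * (conc_b / (half_b + conc_b))


print(len(boolean_truth_tables))   # 16
print(graded_response(1.0, 1.0))   # 0.25
```

An output of 0.25 is neither 0 nor 1, so a response of this kind cannot be reproduced exactly by any of the 16 truth tables.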
Cis-regulatory modules can be characterized by the information processing that they encode and the organization of their transcription factor binding sites. Additionally, cis-regulatory modules are characterized by the way they affect the probability, proportion, and rate of transcription. [4] Highly cooperative and coordinated cis-regulatory modules are classified as enhanceosomes. [4] The architecture and the arrangement of the transcription factor binding sites are critical because disruption of the arrangement could cancel out the function. [4] Functionally flexible cis-regulatory modules are called billboards. Their transcriptional output is the summation effect of the bound transcription factors. [4] Enhancers affect the probability of a gene being activated, but have little or no effect on rate. [4] The binary response model acts like an on/off switch for transcription. This model will increase or decrease the number of cells that transcribe a gene, but it does not affect the rate of transcription. [4] The rheostatic response model describes cis-regulatory modules as regulators of the initiation rate of transcription of their associated genes. [4]
Promoters are CREs consisting of relatively short sequences of DNA which include the site where transcription is initiated and the region approximately 35 bp upstream or downstream from the initiation site. [5] In eukaryotes, promoters usually have the following four components: the TATA box, a TFIIB recognition site, an initiator, and the downstream core promoter element. [5] It has been found that a single gene can contain multiple promoter sites. [6] In order to initiate transcription of the downstream gene, a host of DNA-binding proteins called transcription factors (TFs) must bind sequentially to this region. [5] Only once this region has been bound with the appropriate set of TFs, and in the proper order, can RNA polymerase bind and begin transcribing the gene.
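The difference between the binary and rheostatic response models can be shown with a toy population of cells. The sketch below uses arbitrary numbers and hypothetical function names, and it assumes that every active cell transcribes at the same rate. Both models can give the same total output while differing in how many cells are active and how fast each one transcribes.

```python
def binary_response(n_cells: int, fraction_on: float, rate_when_on: float) -> dict:
    """Binary model: the module changes how many cells transcribe the gene,
    while the rate in each active cell stays fixed."""
    active = round(n_cells * fraction_on)
    return {"active_cells": active,
            "rate_per_active_cell": rate_when_on,
            "total_output": active * rate_when_on}


def rheostatic_response(n_cells: int, initiation_rate: float) -> dict:
    """Rheostatic model: every cell transcribes, and the module tunes the
    initiation rate."""
    return {"active_cells": n_cells,
            "rate_per_active_cell": initiation_rate,
            "total_output": n_cells * initiation_rate}


# Illustrative numbers only: 100 cells, arbitrary rate units.
binary = binary_response(n_cells=100, fraction_on=0.5, rate_when_on=10.0)
rheostatic = rheostatic_response(n_cells=100, initiation_rate=5.0)
print(binary)
print(rheostatic)
```

A bulk measurement would report the same total for both cases, whereas a per-cell measurement would separate them.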
Enhancers are CREs that influence (enhance) the transcription of genes on the same molecule of DNA and can be found upstream, downstream, within the introns, or even relatively far away from the gene they regulate. Multiple enhancers can act in a coordinated fashion to regulate transcription of one gene. [7] A number of genome-wide sequencing projects have revealed that enhancers are often transcribed to long non-coding RNA (lncRNA) or enhancer RNA (eRNA), whose changes in levels frequently correlate with those of the target gene mRNA. [8]
Silencers are CREs that can bind transcription regulation factors (proteins) called repressors, thereby preventing transcription of a gene. The term "silencer" can also refer to a region in the 3' untranslated region of messenger RNA, that binds proteins which suppress translation of that mRNA molecule, but this usage is distinct from its use in describing a CRE.[ citation needed ]
Operators are CREs in prokaryotes and some eukaryotes that exist within operons, where they can bind proteins called repressors to affect transcription.[ citation needed ]
CREs have an important evolutionary role. The coding regions of genes are often well conserved among organisms; yet different organisms display marked phenotypic diversity. It has been found that polymorphisms occurring within non-coding sequences have a profound effect on phenotype by altering gene expression. [7] Mutations arising within a CRE can generate expression variance by changing the way TFs bind. Tighter or looser binding of regulatory proteins will lead to up- or down-regulated transcription.
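The link between binding strength and expression can be sketched with a simple equilibrium binding model. This is an assumption made for illustration, not a model taken from the cited work: a single activator site is occupied with probability c / (c + Kd), and expression is taken to scale with that occupancy. All names and values are hypothetical.

```python
def occupancy(tf_concentration: float, dissociation_constant: float) -> float:
    """Fraction of time a single binding site is occupied at equilibrium.

    A smaller dissociation constant means tighter binding.
    """
    return tf_concentration / (tf_concentration + dissociation_constant)


tf_level = 1.0  # arbitrary concentration units

# Hypothetical CRE variants that differ only in binding affinity.
variants = {"original site": 1.0,
            "tighter-binding mutant": 0.1,
            "looser-binding mutant": 10.0}

for name, kd in variants.items():
    print(f"{name}: occupancy = {occupancy(tf_level, kd):.2f}")
```

For an activator, higher occupancy would correspond to up-regulated transcription under this assumption. For a repressor the direction would be reversed.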
The function of a gene regulatory network depends on the architecture of the nodes, whose function is dependent on the multiple cis-regulatory modules. [1] The layout of cis-regulatory modules can provide enough information to generate spatial and temporal patterns of gene expression. [1] During development, each domain of gene expression, where each domain represents a different spatial region of the embryo, will be under the control of different cis-regulatory modules. [1] The design of regulatory modules helps in producing feedback, feed-forward, and cross-regulatory loops. [9]
Cis-regulatory modules can regulate their target genes over large distances. Several models have been proposed to describe the way that these modules may communicate with their target gene promoter. [4] These include the DNA scanning model, the DNA sequence looping model and the facilitated tracking model. In the DNA scanning model, the transcription factor and cofactor complex forms at the cis-regulatory module and then moves along the DNA sequence until it finds the target gene promoter. [4] In the looping model, the transcription factor binds to the cis-regulatory module, which then causes the DNA sequence to loop and allows interaction with the target gene promoter. The complex of transcription factor and cis-regulatory module slowly loops the DNA toward the target promoter and forms a stable looped configuration. [4] The facilitated tracking model combines parts of the two previous models.
Besides experimentally determining CRMs, there are various bioinformatics algorithms for predicting them. Most algorithms try to search for significant combinations of transcription factor binding sites (DNA binding sites) in promoter sequences of co-expressed genes. [10] More advanced methods combine the search for significant motifs with correlation in gene expression datasets between transcription factors and target genes. [11] Both methods have been implemented, for example, in the ModuleMaster. Other programs created for the identification and prediction of cis-regulatory modules include:
INSECT 2.0 [12] is a web server for searching for cis-regulatory modules in a genome-wide manner. The program relies on the definition of strict restrictions among the transcription factor binding sites (TFBSs) that compose the module in order to decrease the false positive rate. INSECT is designed to be user-friendly: it allows automatic retrieval of sequences and offers several visualizations and links to third-party tools to help users find the instances that are more likely to be true regulatory sites. The INSECT 2.0 algorithm and the theory behind it were published previously. [13]
Stubb uses hidden Markov models to identify statistically significant clusters of transcription factor combinations. It also uses a second related genome to improve the prediction accuracy of the model. [14]
Bayesian Networks use an algorithm that combines site predictions and tissue-specific expression data for transcription factors and target genes of interest. This model also uses regression trees to depict the relationship between the identified cis-regulatory module and the possible binding set of transcription factors. [15]
CRÈME examines clusters of target sites for transcription factors of interest. This program uses a database of confirmed transcription factor binding sites that were annotated across the human genome. A search algorithm is applied to the data set to identify possible combinations of transcription factors which have binding sites that are close to the promoter of the gene set of interest. The possible cis-regulatory modules are then statistically analyzed and the significant combinations are graphically represented. [16]
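The idea shared by these tools, looking for several different transcription factor binding sites that lie close together, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the algorithm of any program named above: it uses exact string matches instead of position weight matrices, performs no statistical testing, and runs on a made-up sequence with hypothetical motif labels.

```python
import re


def find_sites(sequence: str, motifs: dict) -> list:
    """Return sorted (position, label) pairs for exact motif matches.

    A lookahead is used so that overlapping matches are also reported.
    """
    hits = []
    for label, pattern in motifs.items():
        for match in re.finditer(f"(?={pattern})", sequence):
            hits.append((match.start(), label))
    return sorted(hits)


def find_clusters(hits: list, window: int, min_distinct: int) -> list:
    """Report windows that hold sites for at least `min_distinct` factors.

    Each result is (first_site_start, last_site_start, sorted_labels).
    """
    clusters = []
    for index, (start, _) in enumerate(hits):
        in_window = [(pos, label) for pos, label in hits[index:]
                     if pos - start < window]
        labels = {label for _, label in in_window}
        if len(labels) >= min_distinct:
            clusters.append((start, in_window[-1][0], sorted(labels)))
    return clusters


# Toy input: three motifs close together, then an isolated site far away.
toy_motifs = {"GATA": "GATA", "E-box": "CACGTG", "TATA": "TATAAA"}
toy_sequence = "TTGATAACCACGTGTTTATAAAGG" + "C" * 50 + "GATA"

site_hits = find_sites(toy_sequence, toy_motifs)
candidate_modules = find_clusters(site_hits, window=30, min_distinct=3)
print(site_hits)
print(candidate_modules)
```

Only the first three sites fall within one 30 bp window, so the isolated site at the end is not reported as part of a candidate module.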
Active cis-regulatory modules in a genomic sequence have been difficult to identify. Problems in identification arise because often scientists find themselves with a small set of known transcription factors, so it makes it harder to identify statistically significant clusters of transcription factor binding sites. [14] Additionally, high costs limit the use of large whole genome tiling arrays. [15]
An example of a cis-acting regulatory sequence is the operator in the lac operon. This DNA sequence is bound by the lac repressor, which, in turn, prevents transcription of the adjacent genes on the same DNA molecule. The lac operator is, thus, considered to "act in cis" on the regulation of the nearby genes. The operator itself does not code for any protein or RNA.
In contrast, trans-regulatory elements are diffusible factors, usually proteins, that may modify the expression of genes distant from the gene that was originally transcribed to create them. For example, a transcription factor that regulates a gene on chromosome 6 might itself have been transcribed from a gene on chromosome 11. The term trans-regulatory is constructed from the Latin root trans, which means "across from".
There are cis-regulatory and trans-regulatory elements. Cis-regulatory elements are often binding sites for one or more trans-acting factors.
To summarize, cis-regulatory elements are present on the same molecule of DNA as the gene they regulate whereas trans-regulatory elements can regulate genes distant from the gene from which they were transcribed.
Type | Abbr. | Function | Distribution | Ref. |
---|---|---|---|---|
Frameshift element | | Regulates alternative frame use with messenger RNAs | Archaea, bacteria, Eukaryota, RNA viruses | [17] [18] [19] |
Internal ribosome entry site | IRES | Initiates translation in the middle of a messenger RNA | RNA virus, Eukaryota | [20] |
Iron response element | IRE | Regulates the expression of iron associated genes | Eukaryota | [21] |
Leader peptide | | Regulates transcription of associated genes and/or operons | Bacteria | [22] |
Riboswitch | | Gene regulation | Bacteria, Eukaryota | [23] |
RNA thermometer | | Gene regulation | Bacteria | [24] |
Selenocysteine insertion sequence | SECIS | Directs the cell to translate UGA stop-codons as selenocysteines | Metazoa | [25] |
In genetics, a promoter is a sequence of DNA to which proteins bind to initiate transcription of a single RNA transcript from the DNA downstream of the promoter. The RNA transcript may encode a protein (mRNA), or can have a function in and of itself, such as tRNA or rRNA. Promoters are located near the transcription start sites of genes, upstream on the DNA. Promoters can be about 100–1000 base pairs long; their sequence is highly dependent on the gene and product of transcription, the type or class of RNA polymerase recruited to the site, and the species of organism.
In molecular biology, a transcription factor (TF) is a protein that controls the rate of transcription of genetic information from DNA to messenger RNA, by binding to a specific DNA sequence. The function of TFs is to regulate—turn on and off—genes in order to make sure that they are expressed in the desired cells at the right time and in the right amount throughout the life of the cell and the organism. Groups of TFs function in a coordinated fashion to direct cell division, cell growth, and cell death throughout life; cell migration and organization during embryonic development; and intermittently in response to signals from outside the cell, such as a hormone. There are 1500-1600 TFs in the human genome. Transcription factors are members of the proteome as well as regulome.
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product that enables it to produce end products, proteins or non-coding RNA, and ultimately affect a phenotype. These products are often proteins, but in non-protein-coding genes such as transfer RNA (tRNA) and small nuclear RNA (snRNA), the product is a functional non-coding RNA. Gene expression is summarized in the central dogma of molecular biology first formulated by Francis Crick in 1958, further developed in his 1970 article, and expanded by the subsequent discoveries of reverse transcription and RNA replication.
Transcription is the process of copying a segment of DNA into RNA. The segments of DNA transcribed into RNA molecules that can encode proteins are said to produce messenger RNA (mRNA). Other segments of DNA are copied into RNA molecules called non-coding RNAs (ncRNAs). mRNA comprises only 1–3% of total RNA samples. Less than 2% of the human genome can be transcribed into mRNA, while at least 80% of mammalian genomic DNA can be actively transcribed, with the majority of this 80% considered to be ncRNA.
In genetics, an enhancer is a short region of DNA that can be bound by proteins (activators) to increase the likelihood that transcription of a particular gene will occur. These proteins are usually referred to as transcription factors. Enhancers are cis-acting. They can be located up to 1 Mbp away from the gene, upstream or downstream from the start site. There are hundreds of thousands of enhancers in the human genome. They are found in both prokaryotes and eukaryotes.
A regulatory sequence is a segment of a nucleic acid molecule which is capable of increasing or decreasing the expression of specific genes within an organism. Regulation of gene expression is an essential feature of all living organisms and viruses.
In molecular biology and genetics, transcriptional regulation is the means by which a cell regulates the conversion of DNA to RNA (transcription), thereby orchestrating gene activity. A single gene can be regulated in a range of ways, from altering the number of copies of RNA that are transcribed, to the temporal control of when the gene is transcribed. This control allows the cell or organism to respond to a variety of intra- and extracellular signals and thus mount a response. Some examples of this include producing the mRNA that encode enzymes to adapt to a change in a food source, producing the gene products involved in cell cycle specific activities, and producing the gene products responsible for cellular differentiation in multicellular eukaryotes, as studied in evolutionary developmental biology.
In molecular biology, the TATA box is a sequence of DNA found in the core promoter region of genes in archaea and eukaryotes. The bacterial homolog of the TATA box is called the Pribnow box which has a shorter consensus sequence.
Regulation of gene expression, or gene regulation, includes a wide range of mechanisms that are used by cells to increase or decrease the production of specific gene products. Sophisticated programs of gene expression are widely observed in biology, for example to trigger developmental pathways, respond to environmental stimuli, or adapt to new food sources. Virtually any step of gene expression can be modulated, from transcriptional initiation, to RNA processing, and to the post-translational modification of a protein. Often, one gene regulator controls another, and so on, in a gene regulatory network.
A transcriptional activator is a protein that increases transcription of a gene or set of genes. Activators are considered to have positive control over gene expression, as they function to promote gene transcription and, in some cases, are required for the transcription of genes to occur. Most activators are DNA-binding proteins that bind to enhancers or promoter-proximal elements. The DNA site bound by the activator is referred to as an "activator-binding site". The part of the activator that makes protein–protein interactions with the general transcription machinery is referred to as an "activating region" or "activation domain".
In molecular genetics, a repressor is a DNA- or RNA-binding protein that inhibits the expression of one or more genes by binding to the operator or associated silencers. A DNA-binding repressor blocks the attachment of RNA polymerase to the promoter, thus preventing transcription of the genes into messenger RNA. An RNA-binding repressor binds to the mRNA and prevents translation of the mRNA into protein. This blocking or reducing of expression is called repression.
In genetics, a silencer is a DNA sequence capable of binding transcription regulation factors, called repressors. DNA contains genes and provides the template to produce messenger RNA (mRNA). That mRNA is then translated into proteins. When a repressor protein binds to the silencer region of DNA, RNA polymerase is prevented from transcribing the DNA sequence into RNA. With transcription blocked, the translation of RNA into proteins is impossible. Thus, silencers prevent genes from being expressed as proteins.
A regulator gene, regulator, or regulatory gene is a gene involved in controlling the expression of one or more other genes. Regulatory sequences, which encode regulatory genes, are often at the five prime end (5') to the start site of transcription of the gene they regulate. In addition, these sequences can also be found at the three prime end (3') to the transcription start site. In both cases, whether the regulatory sequence occurs before (5') or after (3') the gene it regulates, the sequence is often many kilobases away from the transcription start site. A regulator gene may encode a protein, or it may work at the level of RNA, as in the case of genes encoding microRNAs. An example of a regulator gene is a gene that codes for a repressor protein that inhibits the activity of an operator.
Gene structure is the organisation of specialised sequence elements within a gene. Genes contain most of the information necessary for living cells to survive and reproduce. In most organisms, genes are made of DNA, where the particular DNA sequence determines the function of the gene. A gene is transcribed (copied) from DNA into RNA, which can either be non-coding (ncRNA) with a direct function, or an intermediate messenger (mRNA) that is then translated into protein. Each of these steps is controlled by specific sequence elements, or regions, within the gene. Every gene, therefore, requires multiple sequence elements to be functional. This includes the sequence that actually encodes the functional protein or ncRNA, as well as multiple regulatory sequence regions. These regions may be as short as a few base pairs, up to many thousands of base pairs long.
A locus control region (LCR) is a long-range cis-regulatory element that enhances expression of linked genes at distal chromatin sites. It functions in a copy number-dependent manner and is tissue-specific, as seen in the selective expression of β-globin genes in erythroid cells. Expression levels of genes can be modified by the LCR and gene-proximal elements, such as promoters, enhancers, and silencers. The LCR functions by recruiting chromatin-modifying, coactivator, and transcription complexes. Its sequence is conserved in many vertebrates, and conservation of specific sites may suggest importance in function. It has been compared to a super-enhancer as both perform long-range cis regulation via recruitment of the transcription complex.
Eukaryotic transcription is the elaborate process that eukaryotic cells use to copy genetic information stored in DNA into units of transportable complementary RNA replica. Gene transcription occurs in both eukaryotic and prokaryotic cells. Unlike prokaryotic RNA polymerase, which initiates the transcription of all different types of RNA, RNA polymerase in eukaryotes comes in three variations, each transcribing a different type of gene. A eukaryotic cell has a nucleus that separates the processes of transcription and translation. Eukaryotic transcription occurs within the nucleus, where DNA is packaged into nucleosomes and higher order chromatin structures. The complexity of the eukaryotic genome necessitates a great variety and complexity of gene expression control.
Trans-regulatory elements (TRE) are DNA sequences encoding upstream regulators, which may modify or regulate the expression of distant genes. Trans-acting factors interact with cis-regulatory elements to regulate gene expression. TRE mediates expression profiles of a large number of genes via trans-acting factors. While TRE mutations affect gene expression, it is also one of the main driving factors for evolutionary divergence in gene expression.
The 5′ flanking region is a region of DNA that is adjacent to the 5′ end of the gene. The 5′ flanking region contains the promoter, and may contain enhancers or other protein binding sites. Unlike the 5′ untranslated region, with which it should not be confused, the 5′ flanking region is not transcribed into RNA or translated into a functional protein. These regions primarily function in the regulation of gene transcription. 5′ flanking regions are categorized between prokaryotes and eukaryotes.
The bacterial one-hybrid (B1H) system is a method for identifying the sequence-specific target site of a DNA-binding domain. In this system, a given transcription factor (TF) is expressed as a fusion to a subunit of RNA polymerase. In parallel, a library of randomized oligonucleotides representing potential TF target sequences is cloned into a separate vector containing the selectable genes HIS3 and URA3. If the DNA-binding domain (bait) binds a potential DNA target site (prey) in vivo, it will recruit RNA polymerase to the promoter and activate transcription of the reporter genes in that clone. The two reporter genes, HIS3 and URA3, allow for positive and negative selections, respectively. At the end of the process, positive clones are sequenced and examined with motif-finding tools in order to resolve the favoured DNA target sequence.
Promoter activity is a term that encompasses several meanings around the process of gene expression from regulatory sequences —promoters and enhancers. Gene expression has been commonly characterized as a measure of how much, how fast, when and where this process happens. Promoters and enhancers are required for controlling where and when a specific gene is transcribed.