In genetics, an operon is a functioning unit of DNA containing a cluster of genes under the control of a single promoter. [1] The genes are transcribed together into an mRNA strand and either translated together in the cytoplasm, or undergo splicing to create monocistronic mRNAs that are translated separately, i.e. several strands of mRNA that each encode a single gene product. The result of this is that the genes contained in the operon are either expressed together or not at all. Several genes must be co-transcribed to define an operon. [2]
Originally, operons were thought to exist solely in prokaryotes (which includes organelles like plastids that are derived from bacteria), but they were discovered in eukaryotes in the early 1990s, where they are considered to be rare. [3] [4] [5] [6] In general, expression of prokaryotic operons leads to the generation of polycistronic mRNAs, while eukaryotic operons lead to monocistronic mRNAs.
Operons are also found in viruses such as bacteriophages. [7] [8] For example, T7 phages have two operons. The first operon codes for various products, including a special T7 RNA polymerase which can bind to and transcribe the second operon. The second operon includes a lysis gene meant to cause the host cell to burst. [9]
The term "operon" was first proposed in a short paper in the Proceedings of the French Academy of Science in 1960. [10] From this paper, the so-called general theory of the operon was developed. This theory suggested that in all cases, genes within an operon are negatively controlled by a repressor acting at a single operator located before the first gene. Later, it was discovered that genes could be positively regulated and also regulated at steps that follow transcription initiation. Therefore, it is not possible to talk of a general regulatory mechanism, because different operons have different mechanisms. Today, the operon is simply defined as a cluster of genes transcribed into a single mRNA molecule. Nevertheless, the development of the concept is considered a landmark event in the history of molecular biology. The first operon to be described was the lac operon in E. coli. [10] The 1965 Nobel Prize in Physiology or Medicine was awarded to François Jacob, André Michel Lwoff and Jacques Monod for their discoveries concerning the operon and virus synthesis.
Operons occur primarily in prokaryotes but also rarely in some eukaryotes, including nematodes such as C. elegans and the fruit fly, Drosophila melanogaster. [3] rRNA genes often exist in operons that have been found in a range of eukaryotes including chordates. An operon is made up of several structural genes arranged under a common promoter and regulated by a common operator. It is defined as a set of adjacent structural genes, plus the adjacent regulatory signals that affect transcription of the structural genes. [12] The regulators of a given operon, including repressors, corepressors, and activators, are not necessarily coded for by that operon. The location and condition of the regulators, promoter, operator and structural DNA sequences can determine the effects of common mutations.
Operons are related to regulons, stimulons and modulons; whereas operons contain a set of genes regulated by the same operator, regulons contain a set of genes under regulation by a single regulatory protein, and stimulons contain a set of genes under regulation by a single cell stimulus. According to its authors, the term "operon" is derived from the verb "to operate". [13]
An operon contains one or more structural genes which are generally transcribed into one polycistronic mRNA (a single mRNA molecule that codes for more than one protein). However, the definition of an operon does not require the mRNA to be polycistronic, though in practice, it usually is. [6] Upstream of the structural genes lies a promoter sequence which provides a site for RNA polymerase to bind and initiate transcription. Close to the promoter lies a section of DNA called an operator.
All the structural genes of an operon are turned ON or OFF together, because a single promoter and operator lie upstream of them, but sometimes finer control over gene expression is needed. To achieve this, some bacterial genes are located close together, but each has its own promoter; this is called gene clustering. Usually these genes encode proteins that work together in the same pathway, such as a metabolic pathway. Gene clustering helps a prokaryotic cell to produce metabolic enzymes in the correct order. [14] One study has posited that in the Asgard archaea, ribosomal protein coding genes occur in clusters that are less conserved in their organization than in other archaea; the closer an Asgard archaeon is to the eukaryotes, the more dispersed is the arrangement of the ribosomal protein coding genes. [15]
An operon is made up of three basic DNA components: a promoter, an operator, and the structural genes.
Not always included within the operon, but important to its function, is a regulatory gene, a constantly expressed gene which codes for repressor proteins. The regulatory gene does not need to be in, adjacent to, or even near the operon to control it. [17]
An inducer (small molecule) can displace a repressor (protein) from the operator site (DNA), resulting in an uninhibited operon.
Alternatively, a corepressor can bind to the repressor to allow its binding to the operator site. A good example of this type of regulation is seen for the trp operon.
Control of an operon is a type of gene regulation that enables organisms to regulate the expression of various genes depending on environmental conditions. Operon regulation can be either negative or positive by induction or repression. [16]
Negative control involves the binding of a repressor to the operator to prevent transcription.
Operons can also be positively controlled. With positive control, an activator protein stimulates transcription by binding to DNA (usually at a site other than the operator).
The lac operon of the model bacterium Escherichia coli was the first operon to be discovered and provides a typical example of operon function. It consists of three adjacent structural genes, a promoter, a terminator, and an operator. The lac operon is regulated by several factors, including the availability of glucose and lactose. It is activated by allolactose, a metabolite of lactose, which binds to the repressor protein and prevents it from repressing gene transcription. The lac operon is therefore an example of a negative inducible (derepressible) operon, induced by the presence of lactose or allolactose.
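The dependence of the lac operon on both lactose and glucose can be sketched as a small truth table. The following is a deliberately simplified illustration, not a quantitative model: the function name and the three output labels are hypothetical, and the glucose effect is assumed to act through an activator that is only functional when glucose is scarce (the textbook catabolite-repression picture, which this article does not describe in detail).

```python
def lac_operon_expression(lactose_present: bool, glucose_present: bool) -> str:
    """Return a coarse expression level for a simplified lac operon model.

    Assumptions (illustrative only):
    - the repressor sits on the operator unless lactose (via allolactose)
      is present to release it;
    - an activator boosts transcription only when glucose is absent.
    """
    repressor_on_operator = not lactose_present
    activator_bound = not glucose_present

    if repressor_on_operator:
        return "off"   # negative control dominates (ignoring leaky basal expression)
    if activator_bound:
        return "high"  # repressor released and activator helping
    return "low"       # repressor released, but no activation


for lactose in (False, True):
    for glucose in (False, True):
        level = lac_operon_expression(lactose, glucose)
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {level}")
```

Only the combination of lactose present and glucose absent yields the "high" state in this sketch, which matches the description of lactose being digested when glucose is not available.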
Discovered in 1953 by Jacques Monod and colleagues, the trp operon in E. coli was the first repressible operon to be discovered. While the lac operon can be activated by a chemical (allolactose), the tryptophan (Trp) operon is inhibited by a chemical (tryptophan). This operon contains five structural genes: trp E, trp D, trp C, trp B, and trp A, which encode the enzymes of tryptophan biosynthesis, including tryptophan synthetase. It also contains a promoter, which binds RNA polymerase, and an operator, which blocks transcription when bound by the repressor protein, the product of the repressor gene (trp R). In the lac operon, allolactose binds to the repressor protein and prevents it from repressing gene transcription, while in the trp operon, tryptophan binds to the repressor protein and enables it to repress gene transcription. Also unlike the lac operon, the trp operon contains a leader peptide and an attenuator sequence which allows for graded regulation. [18] This is an example of the corepressible model.
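The contrast between the inducible and the repressible (corepressible) modes of negative control can be expressed in a few lines. This is a minimal sketch under stated assumptions: the function name and the mode labels are hypothetical, only repressor binding is modelled, and attenuation and positive control are ignored.

```python
def negatively_controlled_operon_on(mode: str, ligand_present: bool) -> bool:
    """Return True if a negatively controlled operon is transcribed.

    mode = "inducible":   the ligand is an inducer that releases the repressor
                          (as allolactose does for the lac operon).
    mode = "repressible": the ligand is a corepressor that enables the repressor
                          to bind (as tryptophan does for the trp operon).
    """
    if mode == "inducible":
        repressor_on_operator = not ligand_present
    elif mode == "repressible":
        repressor_on_operator = ligand_present
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # In negative control, a repressor on the operator prevents transcription.
    return not repressor_on_operator


print(negatively_controlled_operon_on("inducible", ligand_present=True))    # lac with allolactose
print(negatively_controlled_operon_on("repressible", ligand_present=True))  # trp with tryptophan
```

The same ligand-present input gives opposite outcomes in the two modes, which is the essential difference between the lac and trp operons described above.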
The number and organization of operons have been studied most critically in E. coli. As a result, predictions can be made based on an organism's genomic sequence.
One prediction method uses the intergenic distance between reading frames as a primary predictor of the number of operons in the genome. Within an operon, the short separation between reading frames merely changes the frame and ensures that read-through is efficient. Longer stretches, often up to 40–50 bases, exist where operons start and stop. [19]
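The intergenic-distance idea can be illustrated with a short sketch. It is not the method of the cited study: the `Gene` record, the function name, the requirement that genes share a strand, and the 50-base cutoff (chosen loosely from the 40–50 base figure above) are all assumptions made for the example, and the coordinates are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based coordinate of the first base
    end: int     # 1-based coordinate of the last base
    strand: str  # "+" or "-"


def group_by_intergenic_distance(genes, max_gap=50):
    """Group neighbouring same-strand genes whose gap is at most max_gap bases."""
    groups = []
    for gene in sorted(genes, key=lambda g: g.start):
        if groups:
            previous = groups[-1][-1]
            gap = gene.start - previous.end - 1  # bases between the two genes
            if gene.strand == previous.strand and gap <= max_gap:
                groups[-1].append(gene)
                continue
        groups.append([gene])
    return groups


# Hypothetical coordinates, for illustration only.
genes = [
    Gene("a", 100, 400, "+"),
    Gene("b", 410, 900, "+"),    # 9 bases after a: same group
    Gene("c", 1200, 1500, "+"),  # 299 bases after b: new group
    Gene("d", 1510, 1800, "-"),  # close to c but on the other strand
]
groups = group_by_intergenic_distance(genes)
# Only groups with several genes are candidate operons.
candidates = [[g.name for g in grp] for grp in groups if len(grp) > 1]
print(candidates)
```

Because several genes must be co-transcribed to define an operon, single-gene groups are filtered out before reporting candidates.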
An alternative method to predict operons is based on finding gene clusters where gene order and orientation is conserved in two or more genomes. [20]
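The conservation-based approach can be sketched as follows. This is a simplified illustration, not the published algorithm: it assumes each genome is already reduced to a list of (gene family, strand) pairs in chromosomal order, that orthologous genes share a family label, and it only compares two genomes. All names and example data are hypothetical.

```python
def transcription_ordered_pairs(genome):
    """Return adjacent same-strand gene pairs, oriented in transcription direction."""
    pairs = set()
    for (fam1, strand1), (fam2, strand2) in zip(genome, genome[1:]):
        if strand1 == strand2:
            pairs.add((fam1, fam2) if strand1 == "+" else (fam2, fam1))
    return pairs


def conserved_clusters(genome_a, genome_b):
    """Find runs in genome_a whose adjacent pairs are also adjacent in genome_b.

    Clusters are listed in genome_a's chromosomal order.
    """
    shared = transcription_ordered_pairs(genome_b)
    clusters, current = [], []
    for (fam1, strand1), (fam2, strand2) in zip(genome_a, genome_a[1:]):
        pair = (fam1, fam2) if strand1 == "+" else (fam2, fam1)
        if strand1 == strand2 and pair in shared:
            if not current:
                current = [fam1]
            current.append(fam2)
        else:
            if current:
                clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return clusters


# Hypothetical genomes: families A, B, C keep their order and relative
# orientation, although genome_b carries them on the opposite strand.
genome_a = [("A", "+"), ("B", "+"), ("C", "+"), ("X", "-"), ("Y", "+")]
genome_b = [("Y", "+"), ("Q", "+"), ("C", "-"), ("B", "-"), ("A", "-")]
print(conserved_clusters(genome_a, genome_b))
```

Orienting each pair by transcription direction lets the sketch recognise a cluster that is conserved but inverted in the second genome.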
Operon prediction is even more accurate if the functional class of the molecules is considered. Bacteria have clustered their reading frames into units, sequestered by co-involvement in protein complexes, common pathways, or shared substrates and transporters. Thus, accurate prediction would involve all of these data, a difficult task indeed.
Pascale Cossart's laboratory was the first to experimentally identify all operons of a microorganism, Listeria monocytogenes. The 517 polycistronic operons are listed in a 2009 study describing the global changes in transcription that occur in L. monocytogenes under different conditions. [21]
Enterobacteria phage λ is a bacterial virus, or bacteriophage, that infects the bacterial species Escherichia coli. It was discovered by Esther Lederberg in 1950. The wild type of this virus has a temperate life cycle that allows it to either reside within the genome of its host through lysogeny or enter into a lytic phase, during which it kills and lyses the cell to produce offspring. Lambda strains, mutated at specific sites, are unable to lysogenize cells; instead, they grow and enter the lytic cycle after superinfecting an already lysogenized cell.
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product, either a protein or a non-coding RNA, that ultimately affects a phenotype. These products are often proteins, but in non-protein-coding genes such as transfer RNA (tRNA) and small nuclear RNA (snRNA), the product is a functional non-coding RNA. The process of gene expression is used by all known life, both eukaryotes and prokaryotes, and is utilized by viruses, to generate the macromolecular machinery for life.
A regulatory sequence is a segment of a nucleic acid molecule which is capable of increasing or decreasing the expression of specific genes within an organism. Regulation of gene expression is an essential feature of all living organisms and viruses.
The lac repressor (LacI) is a DNA-binding protein that inhibits the expression of genes coding for proteins involved in the metabolism of lactose in bacteria. These genes are repressed when lactose is not available to the cell, ensuring that the bacterium only invests energy in the production of machinery necessary for uptake and utilization of lactose when lactose is present. When lactose becomes available, it is first converted into allolactose by β-galactosidase (lacZ). The DNA-binding ability of the lac repressor bound to allolactose is inhibited through allosteric regulation, so that genes coding for proteins involved in lactose uptake and utilization can be expressed.
In molecular biology and genetics, transcriptional regulation is the means by which a cell regulates the conversion of DNA to RNA (transcription), thereby orchestrating gene activity. A single gene can be regulated in a range of ways, from altering the number of copies of RNA that are transcribed, to the temporal control of when the gene is transcribed. This control allows the cell or organism to respond to a variety of intra- and extracellular signals and thus mount a response. Some examples of this include producing the mRNA that encode enzymes to adapt to a change in a food source, producing the gene products involved in cell cycle specific activities, and producing the gene products responsible for cellular differentiation in multicellular eukaryotes, as studied in evolutionary developmental biology.
The lactose operon is an operon required for the transport and metabolism of lactose in E. coli and many other enteric bacteria. Although glucose is the preferred carbon source for most enteric bacteria, the lac operon allows for the effective digestion of lactose when glucose is not available through the activity of beta-galactosidase. Gene regulation of the lac operon was the first genetic regulatory mechanism to be understood clearly, so it has become a foremost example of prokaryotic gene regulation. It is often discussed in introductory molecular and cellular biology classes for this reason. This lactose metabolism system was used by François Jacob and Jacques Monod to determine how a biological cell knows which enzyme to synthesize. Their work on the lac operon won them the Nobel Prize in Physiology or Medicine in 1965.
Regulation of gene expression, or gene regulation, includes a wide range of mechanisms that are used by cells to increase or decrease the production of specific gene products. Sophisticated programs of gene expression are widely observed in biology, for example to trigger developmental pathways, respond to environmental stimuli, or adapt to new food sources. Virtually any step of gene expression can be modulated, from transcriptional initiation, to RNA processing, and to the post-translational modification of a protein. Often, one gene regulator controls another, and so on, in a gene regulatory network.
A transcriptional activator is a protein that increases transcription of a gene or set of genes. Activators are considered to have positive control over gene expression, as they function to promote gene transcription and, in some cases, are required for the transcription of genes to occur. Most activators are DNA-binding proteins that bind to enhancers or promoter-proximal elements. The DNA site bound by the activator is referred to as an "activator-binding site". The part of the activator that makes protein–protein interactions with the general transcription machinery is referred to as an "activating region" or "activation domain".
Tryptophan repressor is a transcription factor involved in controlling amino acid metabolism. It has been best studied in Escherichia coli, where it is a dimeric protein that regulates transcription of the 5 genes in the tryptophan operon. When the amino acid tryptophan is plentiful in the cell, it binds to the protein, which causes a conformational change in the protein. The repressor complex then binds to its operator sequence in the genes it regulates, shutting off the genes.
In molecular genetics, a repressor is a DNA- or RNA-binding protein that inhibits the expression of one or more genes by binding to the operator or associated silencers. A DNA-binding repressor blocks the attachment of RNA polymerase to the promoter, thus preventing transcription of the genes into messenger RNA. An RNA-binding repressor binds to the mRNA and prevents translation of the mRNA into protein. This blocking or reducing of expression is called repression.
In genetics, a silencer is a DNA sequence capable of binding transcription regulation factors, called repressors. DNA contains genes and provides the template to produce messenger RNA (mRNA). That mRNA is then translated into proteins. When a repressor protein binds to the silencer region of DNA, RNA polymerase is prevented from transcribing the DNA sequence into RNA. With transcription blocked, the translation of RNA into proteins is impossible. Thus, silencers prevent genes from being expressed as proteins.
In genetics, a regulator gene, regulator, or regulatory gene is a gene involved in controlling the expression of one or more other genes. Regulatory sequences, which encode regulatory genes, are often at the five prime end (5') to the start site of transcription of the gene they regulate. In addition, these sequences can also be found at the three prime end (3') to the transcription start site. In both cases, whether the regulatory sequence occurs before (5') or after (3') the gene it regulates, the sequence is often many kilobases away from the transcription start site. A regulator gene may encode a protein, or it may work at the level of RNA, as in the case of genes encoding microRNAs. An example of a regulator gene is a gene that codes for a repressor protein that inhibits the activity of an operator.
In molecular biology, an inducer is a molecule that regulates gene expression. An inducer functions in two ways: by disabling repressors or by binding to activators.
Cis-regulatory elements (CREs) or cis-regulatory modules (CRMs) are regions of non-coding DNA which regulate the transcription of neighboring genes. CREs are vital components of genetic regulatory networks, which in turn control morphogenesis, the development of anatomy, and other aspects of embryonic development, studied in evolutionary developmental biology.
In genetics, attenuation is a regulatory mechanism for some bacterial operons that results in premature termination of transcription. The canonical example of attenuation used in many introductory genetics textbooks is ribosome-mediated attenuation of the trp operon. Ribosome-mediated attenuation of the trp operon relies on the fact that, in bacteria, transcription and translation proceed simultaneously. Attenuation involves a provisional stop signal (attenuator), located in the DNA segment that corresponds to the leader sequence of mRNA. During attenuation, the ribosome becomes stalled (delayed) in the attenuator region in the mRNA leader. Depending on the metabolic conditions, the attenuator either stops transcription at that point or allows read-through to the structural gene part of the mRNA and synthesis of the appropriate protein.
Gene structure is the organisation of specialised sequence elements within a gene. Genes contain most of the information necessary for living cells to survive and reproduce. In most organisms, genes are made of DNA, where the particular DNA sequence determines the function of the gene. A gene is transcribed (copied) from DNA into RNA, which can either be non-coding (ncRNA) with a direct function, or an intermediate messenger (mRNA) that is then translated into protein. Each of these steps is controlled by specific sequence elements, or regions, within the gene. Every gene, therefore, requires multiple sequence elements to be functional. This includes the sequence that actually encodes the functional protein or ncRNA, as well as multiple regulatory sequence regions. These regions may be as short as a few base pairs, up to many thousands of base pairs long.
The trp operon is a group of genes that are transcribed together, encoding the enzymes that produce the amino acid tryptophan in bacteria. The trp operon was first characterized in Escherichia coli, and it has since been discovered in many other bacteria. The operon is regulated so that, when tryptophan is present in the environment, the genes for tryptophan synthesis are repressed.
The L-arabinose operon, also called the ara or araBAD operon, is an operon required for the breakdown of the five-carbon sugar L-arabinose in Escherichia coli. The L-arabinose operon contains three structural genes: araB, araA, araD, which encode for three metabolic enzymes that are required for the metabolism of L-arabinose. AraB (ribulokinase), AraA, and AraD produced by these genes catalyse conversion of L-arabinose to an intermediate of the pentose phosphate pathway, D-xylulose-5-phosphate.
The gal operon is a prokaryotic operon, which encodes enzymes necessary for galactose metabolism. Repression of gene expression for this operon works via binding of repressor molecules to two operators. These repressors dimerize, creating a loop in the DNA. The loop as well as hindrance from the external operator prevent RNA polymerase from binding to the promoter, and thus prevent transcription. Additionally, since the metabolism of galactose in the cell is involved in both anabolic and catabolic pathways, a novel regulatory system using two promoters for differential repression has been identified and characterized within the context of the gal operon.
RegulonDB is a database of the regulatory network of gene expression in Escherichia coli K-12. RegulonDB also models the organization of the genes in transcription units, operons and regulons. A total of 120 sRNAs with 231 total interactions which all together regulate 192 genes are also included. RegulonDB was founded in 1998 and also contributes data to the EcoCyc database.