Directed evolution (DE) is a method used in protein engineering that mimics the process of natural selection to steer proteins or nucleic acids toward a user-defined goal. [1] It consists of subjecting a gene to iterative rounds of mutagenesis (creating a library of variants), selection (expressing those variants and isolating members with the desired function) and amplification (generating a template for the next round). It can be performed in vivo (in living organisms), or in vitro (in cells or free in solution). Directed evolution is used both for protein engineering, as an alternative to rationally designing modified proteins, and for experimental evolution studies of fundamental evolutionary principles in a controlled laboratory environment.
Directed evolution has its origins in the 1960s [2] with the evolution of RNA molecules in the "Spiegelman's Monster" experiment. [3] The concept was extended to protein evolution via evolution of bacteria under selection pressures that favoured the evolution of a single gene in its genome. [4]
Early phage display techniques in the 1980s allowed targeting of mutations and selection to a single protein. [5] This enabled selection of enhanced binding proteins, but was not yet compatible with selection for catalytic activity of enzymes. [6] Methods to evolve enzymes were developed in the 1990s and brought the technique to a wider scientific audience. [7] The field rapidly expanded with new methods for making libraries of gene variants and for screening their activity. [2] [8] The development of directed evolution methods was honored in 2018 with the awarding of the Nobel Prize in Chemistry to Frances Arnold for evolution of enzymes, and George Smith and Gregory Winter for phage display. [9]
Directed evolution is a mimic of the natural evolution cycle in a laboratory setting. Evolution requires three things to happen: variation between replicators, that the variation causes fitness differences upon which selection acts, and that this variation is heritable. In DE, a single gene is evolved by iterative rounds of mutagenesis, selection or screening, and amplification. [10] Rounds of these steps are typically repeated, using the best variant from one round as the template for the next to achieve stepwise improvements.
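The cycle described above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not a model of any real experiment: the sequences are short strings, the fitness function (`toy_fitness`, which counts matches to an arbitrary `TARGET`) is hypothetical, and the parameters `rounds`, `library_size` and `rate` are arbitrary. The template is carried into each library so that the best score never decreases between rounds.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def mutate(seq, rate, rng):
    """Redraw each position with probability `rate` (the new residue may equal the old one)."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
        for aa in seq
    )


def directed_evolution(start, fitness, rounds=10, library_size=200, rate=0.05, seed=0):
    """Toy DE loop: diversify, screen, and reuse the best variant as the next template."""
    rng = random.Random(seed)
    template = start
    for _ in range(rounds):
        # Mutagenesis: build a library of variants from the current template.
        library = [mutate(template, rate, rng) for _ in range(library_size)]
        # Keep the template in the pool so the best score cannot get worse.
        library.append(template)
        # Screening: score every variant and keep the single best one.
        # (Amplification is implicit: the winner is copied into the next round.)
        template = max(library, key=fitness)
    return template


TARGET = "MKTAYIAKQR"  # hypothetical optimum, used only to define the toy fitness


def toy_fitness(seq):
    """Hypothetical fitness: number of positions that match TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))


start = "AAAAAAAAAA"
best = directed_evolution(start, toy_fitness)
print(start, toy_fitness(start), "->", best, toy_fitness(best))
```

Using only the single best variant as the next template is one possible design choice; a pool of top variants could be carried forward instead.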
The likelihood of success in a directed evolution experiment is directly related to the total library size, as evaluating more mutants increases the chances of finding one with the desired properties. [11]
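The dependence on library size can be made concrete with a simple probability estimate. The sketch below assumes, purely for illustration, that each variant is an independent draw and that a fixed fraction `hit_fraction` of all variants has the desired property; the chance that a library of `library_size` variants contains at least one hit is then 1 - (1 - hit_fraction)^library_size. The function name and the example values are hypothetical.

```python
import math


def p_at_least_one(hit_fraction, library_size):
    """Probability that a library holds at least one hit, assuming independent sampling.

    Requires 0 <= hit_fraction < 1. log1p and expm1 keep the result accurate
    when hit_fraction is very small.
    """
    return -math.expm1(library_size * math.log1p(-hit_fraction))


# Illustrative values only: one variant in a million is assumed to be a hit.
for n in (10**3, 10**6, 10**9):
    print(n, p_at_least_one(1e-6, n))
```

Under these assumptions the probability saturates once the library size greatly exceeds the reciprocal of the hit fraction, so enlarging the library further gives diminishing returns.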
The first step in performing a cycle of directed evolution is the generation of a library of variant genes. The sequence space of random sequences is vast (about 10^130 possible sequences for a 100 amino acid protein) and extremely sparsely populated by functional proteins. Neither experimental [12] nor natural [13] evolution can ever get close to sampling so many sequences. Natural evolution instead samples variant sequences close to functional protein sequences, and this is imitated in DE by mutagenising an already functional gene. Some calculations suggest it is entirely feasible that for all practical (i.e. functional and structural) purposes, protein sequence space has been fully explored during the course of evolution of life on Earth. [13]
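The size of sequence space quoted above follows from simple counting: with 20 possible amino acids at each of 100 positions there are 20^100 sequences, which is roughly 10^130. A short check, where the function name `log10_sequence_space` is only illustrative:

```python
import math


def log10_sequence_space(length, alphabet_size=20):
    """Base-10 logarithm of the number of sequences of a given length."""
    return length * math.log10(alphabet_size)


exponent = log10_sequence_space(100)
print(f"20^100 is about 10^{exponent:.1f}")

# Python integers are arbitrary precision, so the exact count is also available.
print(len(str(20**100)), "decimal digits")
```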
The starting gene can be mutagenised by random point mutations (by chemical mutagens or error-prone PCR) [14] [15] and by insertions and deletions (by transposons). [16] Gene recombination can be mimicked by DNA shuffling [17] [18] of several sequences (usually of more than 70% sequence identity) to jump into regions of sequence space between the shuffled parent genes. Finally, specific regions of a gene can be systematically randomised [19] for a more focused approach based on structure and function knowledge. Depending on the method, the library generated will vary in the proportion of functional variants it contains. Even if an organism is used to express the gene of interest, by mutagenising only that gene the rest of the organism's genome remains the same and can be ignored for the evolution experiment (to the extent of providing a constant genetic environment).
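Recombination between parent genes can be illustrated in silico. The sketch below is a cartoon of the outcome of shuffling, not of the laboratory protocol: it assumes the parents are already aligned to equal length and builds a chimera by switching between parents at randomly chosen crossover points. The function name `shuffle_parents` and its parameters are hypothetical.

```python
import random


def shuffle_parents(parents, n_crossovers, rng):
    """Build one chimera from equal-length parents by switching parent at random crossover points."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to equal length")
    # Choose distinct crossover positions strictly inside the sequence.
    points = sorted(rng.sample(range(1, length), n_crossovers))
    segments = []
    start = 0
    for end in points + [length]:
        # Copy this segment from a randomly chosen parent.
        segments.append(rng.choice(parents)[start:end])
        start = end
    return "".join(segments)


rng = random.Random(1)
parents = ["AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"]
print(shuffle_parents(parents, 3, rng))
```

Every position of the chimera comes from one of the parents, so the result lies "between" the parents in sequence space rather than introducing residues that none of them carries.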
The majority of mutations are deleterious and so libraries of mutants tend to mostly have variants with reduced activity. [20] Therefore, a high-throughput assay is vital for measuring activity to find the rare variants with beneficial mutations that improve the desired properties. Two main categories of method exist for isolating functional variants. Selection systems directly couple protein function to survival of the gene, whereas screening systems individually assay each variant and allow a quantitative threshold to be set for sorting a variant or population of variants of a desired activity. Both selection and screening can be performed in living cells (in vivo evolution) or performed directly on the protein or RNA without any cells (in vitro evolution). [21] [22]
During in vivo evolution, each cell (usually bacteria or yeast) is transformed with a plasmid containing a different member of the variant library. In this way, only the gene of interest differs between the cells, with all other genes being kept the same. The cells express the protein either in their cytoplasm or on their surface, where its function can be tested. This format has the advantage of selecting for properties in a cellular environment, which is useful when the evolved protein or RNA is to be used in living organisms. When performed without cells, DE involves using in vitro transcription-translation to produce proteins or RNA free in solution or compartmentalised in artificial microdroplets. This method is more versatile in its selection conditions (e.g. temperature, solvent) and can express proteins that would be toxic to cells. Furthermore, in vitro evolution experiments can generate far larger libraries (up to 10^15 variants) because the library DNA need not be inserted into cells (often a limiting step).
Selection for binding activity is conceptually simple. The target molecule is immobilised on a solid support, a library of variant proteins is flowed over it, poor binders are washed away, and the remaining bound variants are recovered to isolate their genes. [23] Binding of an enzyme to an immobilised covalent inhibitor has also been used in attempts to isolate active catalysts. This approach, however, only selects for a single catalytic turnover and is not a good model of substrate binding or true substrate reactivity. If an enzyme activity can be made necessary for cell survival, either by synthesising a vital metabolite or by destroying a toxin, then cell survival is a function of enzyme activity. [24] [25] Such systems are generally limited in throughput only by the transformation efficiency of cells. They are also less expensive and labour-intensive than screening; however, they are typically difficult to engineer, prone to artefacts and give no information on the range of activities present in the library.
An alternative to selection is a screening system. Each variant gene is individually expressed and assayed to quantitatively measure the activity (most often by a chromogenic or fluorogenic product). The variants are then ranked and the experimenter decides which variants to use as templates for the next round of DE. Even the highest-throughput assays usually have lower coverage than selection methods, but they give the advantage of producing detailed information on each of the screened variants. This disaggregated data can also be used to characterise the distribution of activities in libraries, which is not possible in simple selection systems. Screening systems, therefore, have advantages when it comes to experimentally characterising adaptive evolution and fitness landscapes.
When functional proteins have been isolated, their genes must be recovered too, so a genotype–phenotype link is required. [24] This link can be covalent, as in mRNA display, where the mRNA is linked to the protein at the end of translation by puromycin. [12] Alternatively, the protein and its gene can be co-localised by compartmentalisation in living cells [26] or emulsion droplets. [27] The isolated gene sequences are then amplified by PCR or in transformed host bacteria. Either the single best sequence or a pool of sequences can be used as the template for the next round of mutagenesis. The repeated cycles of diversification, selection and amplification generate protein variants adapted to the applied selection pressures.
Rational design of a protein relies on an in-depth knowledge of the protein structure, as well as its catalytic mechanism. [28] [29] Specific changes are then made by site-directed mutagenesis in an attempt to change the function of the protein. A drawback of this is that even when the structure and mechanism of action of the protein are well known, the change due to mutation is still difficult to predict. Therefore, an advantage of DE is that there is no need to understand the mechanism of the desired activity or how mutations would affect it. [30]
A restriction of directed evolution is that a high-throughput assay is required in order to measure the effects of a large number of different random mutations. This can require extensive research and development before it can be used for directed evolution. Additionally, such assays are often highly specific to monitoring a particular activity and so are not transferable to new DE experiments. [31]
Additionally, selecting for improvement in the assayed function simply generates improvements in the assayed function. To understand how these improvements are achieved, the properties of the evolving enzyme have to be measured. Improvement of the assayed activity can be due to improvements in enzyme catalytic activity or enzyme concentration. There is also no guarantee that improvement on one substrate will improve activity on another. This is particularly important when the desired activity cannot be directly screened or selected for and so a ‘proxy’ substrate is used. DE can lead to evolutionary specialisation to the proxy without improving the desired activity. Consequently, choosing appropriate screening or selection conditions is vital for successful DE. [32]
The speed of evolution in an experiment also poses a limitation on the utility of directed evolution. For instance, evolution of a particular phenotype, while theoretically feasible, may occur on time-scales that are not practically feasible. [33] Recent theoretical approaches have aimed to overcome the limitation of speed through an application of counter-diabatic driving techniques from statistical physics, though this has yet to be implemented in a directed evolution experiment. [34]
Combined, 'semi-rational' approaches are being investigated to address the limitations of both rational design and directed evolution. [1] [35] Beneficial mutations are rare, so large numbers of random mutants have to be screened to find improved variants. 'Focused libraries' concentrate on randomising regions thought to be richer in beneficial mutations for the mutagenesis step of DE. A focused library contains fewer variants than a traditional random mutagenesis library and so does not require such high-throughput screening.
Creating a focused library requires some knowledge of which residues in the structure to mutate. For example, knowledge of the active site of an enzyme may allow just the residues known to interact with the substrate to be randomised. [36] [37] Alternatively, knowledge of which protein regions are variable in nature can guide mutagenesis in just those regions. [38] [39]
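The saving from a focused library can be estimated with simple counting. The sketch below assumes that a chosen number of positions is fully randomised over the 20 canonical amino acids and that clones are sampled uniformly and independently from the library. Under those assumptions it reports the number of distinct protein variants and the number of clones that would need to be screened for any given variant to be seen with a chosen probability. The function names and the 95% figure are illustrative, not taken from the cited work.

```python
import math


def focused_library_size(n_positions, alphabet_size=20):
    """Number of distinct protein variants when n_positions are fully randomised."""
    return alphabet_size ** n_positions


def clones_for_coverage(n_variants, coverage=0.95):
    """Clones to screen so that any given variant is sampled with probability `coverage`.

    Assumes uniform, independent sampling; requires n_variants > 1 and 0 < coverage < 1.
    """
    return math.ceil(math.log(1.0 - coverage) / math.log1p(-1.0 / n_variants))


for k in (1, 2, 3, 4):
    size = focused_library_size(k)
    print(k, size, clones_for_coverage(size))
```

Under these assumptions roughly three times as many clones as variants must be screened for 95% coverage, which is why restricting randomisation to a handful of positions keeps screening tractable.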
Directed evolution is frequently used for protein engineering as an alternative to rational design, [40] but can also be used to investigate fundamental questions of enzyme evolution. [41]
As a protein engineering tool, DE has been most successful in three areas:
The study of natural evolution is traditionally based on extant organisms and their genes. However, research is fundamentally limited by the lack of fossils (and particularly the lack of ancient DNA sequences) [50] [51] and incomplete knowledge of ancient environmental conditions. Directed evolution investigates evolution in a controlled system of genes for individual enzymes, [52] [53] [35] ribozymes [54] and replicators [55] [3] (similar to experimental evolution of eukaryotes, [56] [57] prokaryotes [58] and viruses [59] ).
DE allows control of selection pressure, mutation rate and environment (both the abiotic environment, such as temperature, and the biotic environment, such as other genes in the organism). Additionally, there is a complete record of all evolutionary intermediate genes. This allows for detailed measurements of evolutionary processes, for example epistasis, evolvability, adaptive constraint, [60] [61] fitness landscapes, [62] and neutral networks. [63]
The natural amino acid composition of proteomes can be changed by global substitution of canonical amino acids with suitable noncanonical counterparts under experimentally imposed selective pressure. For example, global proteome-wide substitutions of natural amino acids with fluorinated analogs have been attempted in Escherichia coli [64] and Bacillus subtilis. [65] A complete tryptophan substitution with thienopyrrole-alanine in response to 20899 UGG codons in Escherichia coli was reported in 2015 by Budisa and Söll. [66] The experimental evolution of microbial strains with a clear-cut accommodation of an additional amino acid is expected to be instrumental for widening the genetic code experimentally. [67] Directed evolution typically targets a particular gene for mutagenesis and then screens the resulting variants for a phenotype of interest, often independent of fitness effects, whereas adaptive laboratory evolution selects many genome-wide mutations that contribute to the fitness of actively growing cultures. [68]