Phage-assisted continuous evolution

Phage-assisted continuous evolution (PACE) is a phage-based technique for the automated directed evolution of proteins. It works by linking the desired activity of a target protein to the fitness of an infectious bacteriophage that carries the protein's corresponding gene. Proteins with greater desired activity hence confer greater infectivity on their carrier phage. More infectious phage propagate more effectively, selecting for advantageous mutations. Genetic variation is generated by error-prone polymerases acting on the phage vectors, and over time the protein accumulates beneficial mutations. The technique is notable for performing hundreds of rounds of selection with minimal human intervention.

Principle

The central component of PACE is a fixed-volume vessel known as the “lagoon”. The lagoon contains M13 bacteriophage vectors carrying the gene of interest (known as the selection plasmid, or SP), as well as host E. coli cells that allow the phage to replicate. The lagoon is constantly diluted via the addition and draining of liquid media containing E. coli cells. The liquid flow rate is set such that the dilution rate is faster than the rate of E. coli reproduction but slower than the rate of phage reproduction. Hence, a fresh supply of E. coli cells is constantly present in the lagoon, but phage can only be retained via sufficiently fast replication. [1]
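The washout condition described above can be illustrated with a minimal Python sketch. It assumes simple exponential dynamics, dN/dt = (r - D) * N, where r is a replication rate and D is the lagoon dilution rate. The model, the function names, and all numeric rates are illustrative assumptions rather than values from the cited work, and a real lagoon is additionally limited by the supply of host cells.

```python
import math

def titer_after(n0: float, replication_rate: float, dilution_rate: float, hours: float) -> float:
    """Population size after `hours`, assuming dN/dt = (r - D) * N."""
    return n0 * math.exp((replication_rate - dilution_rate) * hours)

def is_retained(replication_rate: float, dilution_rate: float) -> bool:
    """A population persists in the lagoon only if it replicates faster than it is diluted."""
    return replication_rate > dilution_rate

# Hypothetical rates, all in units of 1/hour (assumptions for illustration only).
DILUTION_RATE = 2.0   # lagoon volumes exchanged per hour
HOST_GROWTH = 1.0     # E. coli: slower than dilution, so cells are continually replaced
SLOW_PHAGE = 1.0      # phage whose protein has weak activity
FAST_PHAGE = 2.5      # phage whose protein has strong activity

START_TITER = 1e8
slow_phage = titer_after(START_TITER, SLOW_PHAGE, DILUTION_RATE, hours=10.0)
fast_phage = titer_after(START_TITER, FAST_PHAGE, DILUTION_RATE, hours=10.0)

print(f"host cells retained by growth alone: {is_retained(HOST_GROWTH, DILUTION_RATE)}")
print(f"slow phage titer after 10 h: {slow_phage:.3g}")
print(f"fast phage titer after 10 h: {fast_phage:.3g}")
```

Under these assumed rates, the slow phage falls below its starting titer while the fast phage rises above it, which is the retention behavior the lagoon is designed to enforce.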

Phage replication requires infection of E. coli, which for M13 phage depends on protein III (pIII). [2] In PACE, the phage vectors lack the gene encoding pIII. Instead, pIII production is tied to the activity of the protein of interest through a mechanism that varies by use case and often involves an additional plasmid, the accessory plasmid (AP), which carries the pIII-encoding gene III (gIII). Notably, production of infectious phage scales with the production of pIII. [3] Hence, the higher the activity of the protein, the higher the rate of pIII production, and the more infectious phage are generated carrying that particular gene.

Using error-prone polymerases (encoded on the mutagenesis plasmid, or MP), genetic variation is introduced into the protein gene portion of the phage vectors. Due to the selective pressures applied by the constant draining of the lagoon, only phages that can replicate fast enough can be retained in the lagoon, so over time beneficial mutations accumulate in phage replicating in the lagoon. In this manner, rounds of evolution are continuously performed, allowing hundreds of rounds to elapse with little human intervention. [1]
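The interplay of mutation and selection described above can be sketched as a toy simulation. This is only an illustration under stated assumptions: each phage is reduced to a single "activity" number, the chance of leaving offspring is taken to be proportional to that activity (standing in for pIII production), mutation is modeled as Gaussian noise, and dilution is modeled by resampling to a fixed lagoon size. The function name and all parameter values are hypothetical.

```python
import random

def run_toy_pace(generations: int = 50,
                 lagoon_size: int = 200,
                 start_activity: float = 0.1,
                 mutation_sd: float = 0.05,
                 seed: int = 0) -> list[float]:
    """Return the mean activity of the phage population after each generation."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    population = [start_activity] * lagoon_size
    mean_history = [start_activity]
    for _ in range(generations):
        # Selection plus dilution: the lagoon holds a fixed number of phage,
        # and parents are drawn in proportion to their activity.
        parents = rng.choices(population, weights=population, k=lagoon_size)
        # Mutagenesis: each offspring's activity is perturbed and kept in [0.01, 1.0].
        population = [min(1.0, max(0.01, a + rng.gauss(0.0, mutation_sd)))
                      for a in parents]
        mean_history.append(sum(population) / lagoon_size)
    return mean_history

history = run_toy_pace()
print(f"mean activity at start: {history[0]:.2f}")
print(f"mean activity at end:   {history[-1]:.2f}")
```

Because higher-activity variants are sampled more often each generation, the mean activity drifts upward without any manual screening step, which mirrors the continuous nature of the method.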

A general PACE scheme. MP stands for mutagenesis plasmid, and encodes the necessary proteins for introducing mutations into SP. SP stands for selection plasmid, and encodes the gene of the M13 bacteriophage minus gIII, as well as the gene of interest. AP stands for accessory plasmid, and contains gIII, as well as a method for inducing transcription of gIII.

Applications

Polymerase promoter specificity

In the initial paper pioneering this technique, T7 RNA polymerase was evolved to recognize different promoters, such as the T3 or SP6 promoters. [4] This was done by making the target promoter the sole promoter for gIII. [5] Hence, mutant polymerases with greater specificity for the desired promoter caused greater pIII production. This resulted in polymerases with roughly 3 to 4 orders of magnitude greater activity for the target promoter than for the original T7 promoter. [4] While this original PACE system performed only positive selection, a variant was later developed that allows negative selection as well. This is done by linking undesired activity to the production of non-functional pIII, which decreases the amount of infectious phage made. [6]

A scheme for evolving polymerase promoter specificity using PACE. T7 RNAP containing mutations unfavorable to T3 promoter binding results in no gIII transcription. However, mutations that allow for the binding to a T3 promoter lead to increased gIII transcription.

Protease substrate specificity

Proteases have been evolved to cut different peptides using PACE. In these systems, the desired protease cut site is placed in a peptide linker joining a T7 RNA polymerase to a T7 lysozyme. While linked, the T7 lysozyme prevents the T7 polymerase from transcribing gIII. When the peptide linker is cleaved, the T7 polymerase is released and becomes active, allowing transcription of gIII. This method was used to create a TEV protease with a significantly different peptide substrate. [6] [7]

A scheme of using PACE to evolve protease activity. When the T7 RNAP and the T7 lysozyme are linked, transcription of gIII is blocked. When the protease is active for the cleavage site, the T7 polymerase is liberated, allowing for transcription of gIII.

Orthogonal aminoacyl-tRNA synthetases

PACE has also been used to evolve aminoacyl-tRNA synthetases (aaRSs) that accept noncanonical amino acids. Activity of an aaRS is linked to pIII production by placing a TAG stop codon in the middle of gIII. Synthetases that aminoacylate the suppressor tRNA for the TAG codon prevent termination at that codon, allowing production of functional pIII. Using this system, aaRSs were evolved that use the noncanonical amino acids p-nitrophenylalanine, iodophenylalanine, and Boc-lysine. [8]
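The stop-codon readthrough logic can be made concrete with a small sketch. The sequence below is a made-up toy open reading frame, not part of gIII, and the codon table is deliberately limited to the codons it uses. The letter "X" is a placeholder assumed here to stand for the noncanonical amino acid delivered by a charged suppressor tRNA.

```python
# Minimal codon table covering only the codons in the toy sequence below.
CODON_TABLE = {
    "ATG": "M", "AAA": "K", "GAA": "E", "CTG": "L",
    "TAA": "*", "TGA": "*", "TAG": "*",  # "*" marks a stop codon
}

def translate(dna: str, amber_suppression: bool = False) -> str:
    """Translate `dna` codon by codon, stopping at the first effective stop codon.

    With `amber_suppression=True`, TAG is read through as "X", modeling a
    suppressor tRNA that has been charged by an active synthetase.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon == "TAG" and amber_suppression:
            protein.append("X")
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# Hypothetical reading frame with an in-frame TAG codon: ATG AAA TAG GAA CTG TAA
TOY_GENE = "ATGAAATAGGAACTGTAA"

truncated = translate(TOY_GENE)                            # no active synthetase
full_length = translate(TOY_GENE, amber_suppression=True)  # active synthetase
print(truncated, full_length)
```

Without suppression the product ends at the TAG codon, while with suppression translation continues to the natural stop, which is the difference the selection exploits.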

A scheme of using PACE to evolve orthogonal aminoacyl-tRNA synthetases. Without synthetase activity, T7 RNAP cannot be fully translated due to the presence of an amber stop codon. Upon introduction of a functioning synthetase for the amber suppressor tRNA, the amber codon instead encodes a non-canonical amino acid, allowing translation of the full-length T7 RNAP and hence gIII transcription.

Protein-protein interactions

Protein-protein interactions have been evolved using PACE as well. Under this scheme, the target protein is fused to a DNA-binding protein, which binds to a target sequence placed upstream of the gIII promoter. The protein undergoing evolution is fused to an RNA polymerase. The stronger the protein-protein interaction, the more gIII is transcribed, allowing the interaction to be evolved under PACE conditions. [6] This method was used to evolve Bacillus thuringiensis endotoxin variants that can overcome insect toxin resistance. [6] [9]

A scheme for using PACE to evolve protein-protein interactions. The target protein, in purple, is fused to a DNA binding protein, in blue. The evolving protein, in red, is bound to a functioning RNAP, in green. By having the DNA binding protein bind upstream of the gIII promoter, more favorable binding between the target and evolving protein leads to higher transcription of gIII.

Base editors

PACE was used to evolve APOBEC1 for greater soluble expression. APOBEC1 is a cytidine deaminase that has found use in base editors to catalyze the single-nucleotide edit C→T. [10] In E. coli, APOBEC1 usually falls out of solution into the insoluble fraction. [11] To evolve APOBEC1 for better soluble expression, the N-terminal portion of a T7 polymerase was fused to APOBEC1, with the remaining portion of the polymerase expressed separately. The T7 polymerase can function only when the N-terminal portion binds to the rest of the polymerase. Since APOBEC1 must be properly folded for the N-terminal portion to be exposed, T7 polymerase activity is correlated with APOBEC1 folding. Consequently, gIII transcription and pIII production are linked to APOBEC1 soluble expression via the T7 polymerase. Using this approach, the soluble expression of APOBEC1 was increased 4-fold with no change in function. [7] [9]

A scheme for the engineering of higher soluble expression. POI = protein of interest. When the POI is properly folded, the T7n portion is exposed, allowing for the binding to the T7c portion to form a fully functional T7 RNAP. This T7 RNAP can then transcribe gIII. If the POI is misfolded, then the T7 RNAP doesn't form, leading to no gIII transcription.

PACE was also used to create a more catalytically active deoxyadenosine deaminase. Deoxyadenosine deaminase is used in base editors to perform the single-nucleotide edit A→G. This was done by placing adenosine-containing stop codons in the gene for T7 polymerase. If the base editor is able to correct the stop codons, functional T7 polymerase is produced, allowing production of pIII. Using this system, a deoxyadenosine deaminase was evolved with 590-fold higher activity than the wild type. [12]

Evolution of higher-activity deoxyadenosine deaminases using PACE. Here, the T7 RNAP gene contains two stop codons, both containing adenosine nucleotides. Without deoxyadenosine deaminase activity, the resulting T7 RNAP is truncated and hence nonfunctional. However, with a functional deoxyadenosine deaminase, the stop codons are converted into amino acid-encoding codons, which allows for the production of functional T7 RNAP, which can go on to transcribe gIII.

References

  1. Esvelt, K.; Carlson, J.; Liu, D.R. (2011). "A system for the continuous directed evolution of biomolecules". Nature. 472 (7344): 499–503. Bibcode:2011Natur.472..499E. doi:10.1038/nature09929. PMC 3084352. PMID 21478873.
  2. Riechmann, L.; Holliger, P. (1997). "The C-terminal domain of TolA is the coreceptor for filamentous phage infection of E. coli". Cell. 90 (2): 351–360. doi:10.1016/s0092-8674(00)80342-6. PMID 9244308.
  3. Rakonjac, J.; Model, P. (1998). "Roles of pIII in filamentous phage assembly". J. Mol. Biol. 282 (1): 25–41. doi:10.1006/jmbi.1998.2006. PMID 9733639.
  4. Lane, M.D.; Seelig, B. (2014). "Advances in the directed evolution of proteins". Curr. Opin. Chem. Biol. 22: 129–136. doi:10.1016/j.cbpa.2014.09.013. PMC 4253873. PMID 25309990.
  5. Lemire, S.; Yehl, K.M.; Lu, T.K. (2018). "Phage-Based Applications in Synthetic Biology". Annu. Rev. Virol. 5 (1): 453–476. doi:10.1146/annurev-virology-092917-043544. PMC 6953747. PMID 30001182.
  6. Brödel, A.K.; Isalan, M.; Jaramillo, A. (2018). "Engineering of biomolecules by bacteriophage directed evolution". Curr. Opin. Biotech. 51: 32–38. doi:10.1016/j.copbio.2017.11.004. hdl:10261/184372. PMID 29175708.
  7. Kim, J.Y.; Yoo, H.W.; Lee, P.G.; Lee, S.G.; Seo, J.H.; Kim, B.G. (2019). "In vivo Protein Evolution, Next Generation Protein Engineering Strategy: from Random Approach to Target-specific Approach". Biotechnol. Bioproc. E. 24: 85–94. doi:10.1007/s12257-018-0394-2. S2CID 91687131.
  8. Vargas-Rodriguez, O.; Sevostyanova, A.; Söll, D.; Crnković, A. (2018). "Upgrading aminoacyl-tRNA synthetases for genetic code expansion". Curr. Opin. Chem. Biol. 46C: 115–122. doi:10.1016/j.cbpa.2018.07.011. PMC 7083171. PMID 30056281.
  9. Simon, A.J.; d'Oelsnitz, S.; Ellington, A.D. (2019). "Synthetic Evolution". Nat. Biotechnol. 37 (7): 730–743. doi:10.1038/s41587-019-0157-4. PMID 31209374. S2CID 189927244.
  10. Gaudelli, N.M.; Komor, A.C.; Rees, H.A.; Packer, M.S.; Badran, A.H.; Bryson, D.I.; Liu, D.R. (2017). "Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage". Nature. 551 (7681): 464–471. doi:10.1038/nature24644. PMC 5726555. PMID 29160308.
  11. Wang, T.; Badran, A.H.; Huang, T.P.; Liu, D.R. (2018). "Continuous directed evolution of proteins with improved soluble expression". Nat. Chem. Biol. 14 (10): 972–980. doi:10.1038/s41589-018-0121-5. PMC 6143403. PMID 30127387.
  12. Richter, M.F.; Zhao, K.T.; Eton, E.; Lapinaite, A.; Newby, G.A.; Thuronyi, B.W.; Wilson, C.; Koblan, L.W.; Zeng, J.; Bauer, D.E.; Doudna, J.A.; Liu, D.R. (2020). "Phage-Assisted Evolution of an Adenine Base Editor with Enhanced Cas Domain Compatibility and Activity". Nat. Biotechnol. 38 (7): 883–891. doi:10.1038/s41587-020-0453-z. PMC 7357821. PMID 32433547.