Transposon sequencing


Transposon insertion sequencing (Tn-seq) combines transposon insertional mutagenesis with massively parallel sequencing (MPS) of the transposon insertion sites to identify genes contributing to a function of interest in bacteria. The method was originally established by concurrent work in four laboratories under the acronyms HITS, [1] INSeq, [2] TraDIS, [3] and Tn-Seq. [4] Numerous variations have been subsequently developed and applied to diverse biological systems. Collectively, the methods are often termed Tn-Seq as they all involve monitoring the fitness of transposon insertion mutants via DNA sequencing approaches. [5]


Transposons are highly regulated, discrete DNA segments that can relocate within the genome. They are ubiquitous, occurring in Eubacteria, Archaea, and Eukarya, including humans. Transposons have a large influence on gene expression and can be used to determine gene function: when a transposon inserts into a gene, it typically disrupts that gene's function. [6] Because of this property, transposons have been adapted for use in insertional mutagenesis. [7] The development of microbial genome sequencing was a major advance for transposon mutagenesis. [8] [9] The function affected by a transposon insertion can be linked to the disrupted gene by sequencing the genome to locate the insertion site. Massively parallel sequencing allows insertion sites to be sequenced simultaneously in large mixtures of different mutants, so genome-wide analysis is feasible if a mutant collection carries transposons positioned throughout the genome. [5]

Transposon sequencing requires the creation of a transposon insertion library, a group of mutants that collectively carry transposon insertions in all non-essential genes. The library is grown under an experimental condition of interest. Mutants with transposons inserted in genes required for growth under the test condition decrease in frequency in the population. To identify the mutants being lost, genomic sequences adjacent to the transposon ends are amplified by PCR and sequenced by MPS to determine the location and abundance of each insertion mutation. The importance of each gene for growth under the test condition is determined by comparing the abundance of each mutant before and after growth under that condition. Tn-seq is useful for studying both the fitness contribution of single genes and gene interactions. [10]
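The before-and-after comparison described above can be illustrated with a minimal Python sketch. It is not the fitness calculation used in any of the cited studies: it simply sums read counts per gene, converts them to relative abundances, and reports a log2 fold change. The function name `gene_log2_fold_change`, the pseudocount, the toy insertion positions, and the gene labels are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def gene_log2_fold_change(before, after, insertion_gene, pseudocount=1.0):
    """Return log2(relative abundance after / before) for each gene.

    before, after   : dict mapping insertion position -> read count
    insertion_gene  : dict mapping insertion position -> gene name
    pseudocount     : hypothetical smoothing term that avoids log(0)
    """
    total_before = sum(before.values())
    total_after = sum(after.values())

    # Pool the counts of all insertions that fall in the same gene.
    gene_before = defaultdict(float)
    gene_after = defaultdict(float)
    for site, gene in insertion_gene.items():
        gene_before[gene] += before.get(site, 0)
        gene_after[gene] += after.get(site, 0)

    scores = {}
    for gene in gene_before:
        frac_before = (gene_before[gene] + pseudocount) / (total_before + pseudocount)
        frac_after = (gene_after[gene] + pseudocount) / (total_after + pseudocount)
        scores[gene] = math.log2(frac_after / frac_before)
    return scores


# Toy data: two insertions in "geneA" are depleted after selection,
# while one insertion in "geneB" becomes relatively more abundant.
before = {101: 50, 250: 50, 900: 100}
after = {101: 5, 250: 5, 900: 190}
insertion_gene = {101: "geneA", 250: "geneA", 900: "geneB"}

scores = gene_log2_fold_change(before, after, insertion_gene)
```

A negative score marks a gene whose disruption reduced the mutants' share of the population under the test condition; a positive score marks the opposite. Real analyses additionally account for population expansion, replicate variation, and statistical significance.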

Signature-tagged mutagenesis (STM) is an older technique that also involves pooling transposon insertion mutants to determine the importance of the disrupted genes under selective growth conditions. [11] High-throughput versions of STM use genomic microarrays, which are less accurate and have a lower dynamic range than massively parallel sequencing. [5] With the invention of next-generation sequencing, genomic data became increasingly available. However, knowledge of gene function remains the limiting factor in understanding the roles that genes play. [12] [13] A high-throughput approach for studying genotype–phenotype relationships, such as Tn-seq, was therefore needed.

Methodology

Transposon sequencing begins by transducing bacterial populations with transposable elements using bacteriophages. Tn-seq uses the Himar1 mariner transposon, a common and stable transposon. After transduction, the DNA is cleaved and the inserted sequence is amplified by PCR. A recognition site for MmeI, a type IIS restriction endonuclease, can be introduced by a single nucleotide change in each terminal repeat of the mariner transposon. [14] The MmeI site is located 4 base pairs before the end of the terminal repeat.

MmeI makes a 2 base pair staggered cut 20 bases downstream of the recognition site. [15]

When MmeI digests DNA from a library of transposon insertion mutants, it produces fragments that contain the left or right transposon end together with 16 base pairs of flanking genomic DNA. These 16 base pairs are enough to determine the location of the transposon insertion in the bacterial genome. The 2-base overhang facilitates ligation of an adaptor. A primer specific to the adaptor and a primer specific to the transposon are then used to amplify the sequence by PCR. The 120 base pair product is isolated by agarose gel or polyacrylamide gel electrophoresis (PAGE) purification, and massively parallel sequencing is used to determine the sequences of the flanking 16 base pairs. [10]
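The 16 base pair figure follows from the geometry given above: if the MmeI site ends 4 bases before the end of the terminal repeat and the enzyme cuts 20 bases downstream of the site, then 20 − 4 = 16 bases of genomic DNA remain attached to the transposon end. The Python sketch below illustrates this on a made-up sequence. The recognition sequence TCCRAC (R = A or G) is the known MmeI site; the reading of "4 base pairs before the end" as a 4-base gap, the toy repeat and genome sequences, and the function name `genomic_flank` are assumptions for illustration only.

```python
import re

# MmeI recognition sequence TCCRAC, where R is A or G.
MMEI_SITE = re.compile("TCC[AG]AC")

CUT_OFFSET = 20          # cut position, in bases downstream of the site
SITE_TO_REPEAT_END = 4   # assumed gap between the site and the end of the repeat


def genomic_flank(fragment):
    """Return the genomic bases between the repeat end and the MmeI cut.

    Returns None if no MmeI site is found or the fragment is too short.
    """
    match = MMEI_SITE.search(fragment)
    if match is None:
        return None
    start = match.end() + SITE_TO_REPEAT_END   # first genomic base
    stop = match.end() + CUT_OFFSET            # position of the cut
    if stop > len(fragment):
        return None
    return fragment[start:stop]


# Toy sequences (not real transposon or genome sequence).
toy_repeat = "GGTTGACAG" + "TCCAAC" + "CTGT"   # ...site + 4 bases to repeat end
toy_genome = "ATGCGTACCGTTAGCATTGACCGTA"

flank = genomic_flank(toy_repeat + toy_genome)
```

Here `flank` is the first 16 bases of `toy_genome`, which is the portion that would be sequenced and mapped back to the genome to locate the insertion.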

Gene function is then inferred from the effect of each insertion on mutant fitness under the conditions tested.

Advantages and disadvantages

Unlike high-throughput insertion tracking by deep sequencing (HITS) and transposon-directed insertion site sequencing (TraDIS), Tn-seq is specific to the Himar1 mariner transposon and cannot be applied to other transposons or insertional elements. [10] However, the Tn-seq protocol is less time-intensive. [citation needed] HITS and TraDIS use a DNA shearing technique that produces a range of PCR product sizes, which can cause shorter DNA templates to be preferentially amplified over longer ones. Tn-seq produces a product of uniform size, which reduces the possibility of PCR bias. [10]

Tn-seq can be used both to determine the fitness of single genes and to map gene interactions in microorganisms. Existing methods for these types of study depend on preexisting genomic microarrays or gene knockout arrays, whereas Tn-seq does not. Its use of massively parallel sequencing makes the technique easily reproducible, sensitive, and robust. [10]

Applications

Tn-seq has proven to be a useful technique for identifying new gene functions. Its high sensitivity [citation needed] can reveal phenotype–genotype relationships that less sensitive methods might have deemed insignificant. For example, Tn-seq identified essential genes and pathways that are important for the utilization of cholesterol in Mycobacterium tuberculosis. [16]

Tn-seq has been used to study higher-order genome organization through gene interactions. [citation needed] Genes function as a highly linked network, [citation needed] so studying a gene's impact on phenotype requires that gene interactions also be considered. [citation needed] These networks can be studied by screening for synthetic lethality and for other gene interactions in which a double mutant shows an unexpected fitness value compared with each individual mutant. [citation needed] Tn-seq was used to determine genetic interactions between five query genes and the rest of the genome in Streptococcus pneumoniae, which revealed both aggravating and alleviating genetic interactions. [4] [10]
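One common way to make "unexpected fitness" concrete is a multiplicative model, in which the expected fitness of a double mutant is the product of the two single-mutant fitness values. The Python sketch below assumes that model; it is an illustration, not necessarily the calculation used in the cited study. The function names and the 0.1 threshold are hypothetical.

```python
def interaction_score(w_a, w_b, w_ab):
    """Observed double-mutant fitness minus the multiplicative expectation."""
    return w_ab - (w_a * w_b)


def classify_interaction(w_a, w_b, w_ab, threshold=0.1):
    """Label an interaction; the threshold is an arbitrary illustrative cutoff."""
    score = interaction_score(w_a, w_b, w_ab)
    if score < -threshold:
        return "aggravating"   # double mutant is less fit than expected
    if score > threshold:
        return "alleviating"   # double mutant is fitter than expected
    return "neutral"


# Toy fitness values relative to wild type (1.0); not measured data.
sick_pair = classify_interaction(0.9, 0.8, 0.3)
rescued_pair = classify_interaction(0.9, 0.8, 0.9)
independent_pair = classify_interaction(0.9, 0.8, 0.72)
```

Under this model, synthetic lethality is the extreme aggravating case, in which the double mutant has a fitness near zero although each single mutant is viable.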

Tn-seq used in combination with RNA-seq can be utilized to examine the role of non-coding DNA regions. [17]


References

  1. Gawronski JD, Wong SM, Giannoukos G, Ward DV, Akerley BJ (2009). "Tracking insertion mutants within libraries by deep sequencing and a genome-wide screen for Haemophilus genes required in the lung". Proceedings of the National Academy of Sciences of the United States of America. 106: 16422–7. doi:10.1073/pnas.0906627106.
  2. Goodman AL, McNulty NP, Zhao Y, Leip D, Mitra RD, Lozupone CA, et al. (2009). "Identifying genetic determinants needed to establish a human gut symbiont in its habitat". Cell Host & Microbe. 6: 279–89. doi:10.1016/j.chom.2009.08.003.
  3. Langridge GC, Phan MD, Turner DJ, Perkins TT, Parts L, Haase J, et al. (2009). "Simultaneous assay of every Salmonella Typhi gene using one million transposon mutants". Genome Research. 19: 2308–16. doi:10.1101/gr.097097.109.
  4. van Opijnen T, Bodi KL, Camilli A (2009). "Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms". Nature Methods. 6: 767–72. doi:10.1038/nmeth.1377.
  5. Barquist L, Boinett CJ, Cain AK (July 2013). "Approaches to querying bacterial genomes with transposon-insertion sequencing". RNA Biology. 10 (7): 1161–9. doi:10.4161/rna.24765. PMC 3849164. PMID 23635712.
  6. Hayes F (2003). "Transposon-based strategies for microbial functional genomics and proteomics". Annual Review of Genetics. 37 (1): 3–29. doi:10.1146/annurev.genet.37.110801.142807. PMID 14616054.
  7. Kleckner N, Chan RK, Tye BK, Botstein D (October 1975). "Mutagenesis by insertion of a drug-resistance element carrying an inverted repetition". Journal of Molecular Biology. 97 (4): 561–75. doi:10.1016/s0022-2836(75)80059-3. PMID 1102715.
  8. Smith V, Chou KN, Lashkari D, Botstein D, Brown PO (December 1996). "Functional analysis of the genes of yeast chromosome V by genetic footprinting". Science. 274 (5295): 2069–74. Bibcode:1996Sci...274.2069S. doi:10.1126/science.274.5295.2069. PMID 8953036.
  9. Akerley BJ, Rubin EJ, Camilli A, Lampe DJ, Robertson HM, Mekalanos JJ (July 1998). "Systematic identification of essential genes by in vitro mariner mutagenesis". Proceedings of the National Academy of Sciences of the United States of America. 95 (15): 8927–32. Bibcode:1998PNAS...95.8927A. doi:10.1073/pnas.95.15.8927. PMC 21179. PMID 9671781.
  10. van Opijnen T, Bodi KL, Camilli A (October 2009). "Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms". Nature Methods. 6 (10): 767–72. doi:10.1038/nmeth.1377. PMC 2957483. PMID 19767758.
  11. Mazurkiewicz P, Tang CM, Boone C, Holden DW (December 2006). "Signature-tagged mutagenesis: barcoding mutants for genome-wide screens". Nature Reviews Genetics. 7 (12): 929–39. doi:10.1038/nrg1984. PMID 17139324. S2CID 27956117.
  12. Bork P (April 2000). "Powers and pitfalls in sequence analysis: the 70% hurdle". Genome Research. 10 (4): 398–400. doi:10.1101/gr.10.4.398. PMID 10779480.
  13. Kasif S, Steffen M (January 2010). "Biochemical networks: the evolution of gene annotation". Nature Chemical Biology. 6 (1): 4–5. doi:10.1038/nchembio.288. PMC 2907659. PMID 20016491.
  14. Goodman AL, McNulty NP, Zhao Y, Leip D, Mitra RD, Lozupone CA, Knight R, Gordon JI (September 2009). "Identifying genetic determinants needed to establish a human gut symbiont in its habitat". Cell Host & Microbe. 6 (3): 279–89. doi:10.1016/j.chom.2009.08.003. PMC 2895552. PMID 19748469.
  15. Morgan RD, Dwinell EA, Bhatia TK, Lang EM, Luyten YA (August 2009). "The MmeI family: type II restriction-modification enzymes that employ single-strand modification for host protection". Nucleic Acids Research. 37 (15): 5208–21. doi:10.1093/nar/gkp534. PMC 2731913. PMID 19578066.
  16. Griffin JE, Gawronski JD, Dejesus MA, Ioerger TR, Akerley BJ, Sassetti CM (September 2011). "High-resolution phenotypic profiling defines genes essential for mycobacterial growth and cholesterol catabolism". PLOS Pathogens. 7 (9): e1002251. doi:10.1371/journal.ppat.1002251. PMC 3182942. PMID 21980284.
  17. Mann B, van Opijnen T, Wang J, Obert C, Wang YD, Carter R, McGoldrick DJ, Ridout G, Camilli A, Tuomanen EI, Rosch JW (2012). "Control of virulence by small RNAs in Streptococcus pneumoniae". PLOS Pathogens. 8 (7): e1002788. doi:10.1371/journal.ppat.1002788. PMC 3395615. PMID 22807675.