Time-resolved RNA sequencing

Time-resolved RNA sequencing methods are applications of RNA-seq that allow RNA abundances to be observed over time in one or more biological samples. Second-generation DNA sequencing has enabled cost-effective, high-throughput, and unbiased analysis of the transcriptome. [1] Normally, RNA-seq captures only a snapshot of the transcriptome at the time of sample collection. [1] Studying changes over time therefore requires sampling at multiple time points, which increases both the monetary and time costs of experiments. Methodological and technological innovations have allowed the transcriptome to be analyzed over time without requiring a separate sampling at each time point.

Background

While DNA encodes all of the functional elements of life, the encoded information must be converted into a functional form. Following the central dogma of molecular biology, messenger RNA carries the genetic information for producing proteins, which, alongside functional RNA, carry out the majority of cellular processes required for life. [2] Changes in RNA abundance may be used to measure changes in cellular behavior in response to events such as heat stress, viral infection, or oncogenesis. [3] Knowing how the transcriptome changes during cellular processes allows for a greater understanding of the mechanisms underlying these processes.

Originally, transcriptome-wide RNA abundance could only be assessed using methods such as DNA microarrays or serial analysis of gene expression (SAGE). [4] [5] Each method has its own limitations: microarrays, while cheap, can provide inconsistent results, [6] and SAGE is based on Sanger sequencing, which provides limited throughput. With second-generation sequencing, rather than measuring relative hybridization of sequences to probes (as with microarrays) or sequencing short segments (as with SAGE), a researcher can sequence the bulk RNA within a sample and measure the relative abundance of each type of RNA by counting the number of times it was sequenced in that sample.
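
The counting idea described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the gene names and read assignments are hypothetical toy data, and counts per million (CPM) is used here as one common way to express relative abundance, assuming each read has already been assigned to a gene.

```python
from collections import Counter


def counts_per_million(read_assignments):
    """Convert per-read gene assignments into counts per million (CPM)."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Scale each gene's read count by the library size.
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical toy data: each entry is the gene that one sequenced read mapped to.
reads = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"] * 1
cpm = counts_per_million(reads)

for gene, value in sorted(cpm.items()):
    print(gene, value)
```

Scaling by the total number of reads makes samples of different sequencing depth comparable, which is why relative rather than raw counts are reported.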

In a traditional RNA-seq, microarray, or SAGE experiment, RNA is extracted from a biological sample such as cultured cells and analyzed using the chosen method. The data obtained from such an experiment correspond to the abundance of RNA under the given experimental conditions at the time of harvest. For many applications, such as comparing the abundance of mRNA molecules between cells exposed to a drug and cells not exposed to it, this experimental approach is sufficient. However, many cellular processes of scientific and medical interest occur over time, such as cellular differentiation or phagocytosis. [7] [8] Studying such processes requires analysis of RNA abundance across a series of time points.

Methods

Comparison of time-resolved RNA-seq methods. Time series sampling requires samples from before and after every time point, as well as separate sequencing of every biological sample. Affinity purification reduces the number of biological samples required, but increases the number of sequencing runs required. Nucleotide conversion requires the fewest biological samples and sequencing runs overall. h = hours.

Time series samples

Sample preparation and data processing

The simplest approach to assessing RNA abundance over time is to use multiple samples that are treated in exactly the same way, except for the duration of treatment. For example, to investigate a biological process that is estimated to take an hour, a researcher might design an experiment in which the process is triggered for five minutes, 15 minutes, 30 minutes, 45 minutes, one hour, and two hours in separate cell culture samples before the cells are harvested for RNA-seq analysis. The researcher would then have measurements of the transcriptome at each of these time points, and comparing these samples would indicate which cellular processes are activated and deactivated over time.
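
As a rough illustration of how such samples might be compared, the sketch below expresses one gene's abundance at each time point as a log2 fold change relative to the earliest time point. The abundance values are hypothetical, the time points mirror the example above, and the pseudocount of 1 is an assumption made here to avoid dividing by zero; real analyses typically use dedicated differential expression tools and biological replicates.

```python
import math


def log2_fold_changes(abundance_by_time, pseudocount=1.0):
    """Return log2 fold change at each time point relative to the earliest one."""
    times = sorted(abundance_by_time)
    baseline = abundance_by_time[times[0]] + pseudocount
    return {
        t: math.log2((abundance_by_time[t] + pseudocount) / baseline)
        for t in times
    }


# Hypothetical normalized abundances for a single gene, keyed by minutes of treatment.
series = {5: 15.0, 15: 31.0, 30: 63.0, 45: 63.0, 60: 31.0, 120: 15.0}
lfc = log2_fold_changes(series)

for minutes, change in lfc.items():
    print(f"{minutes:>3} min: {change:+.1f}")
```

In this toy series the gene rises to a fourfold increase by 30 minutes and returns to its starting level by two hours, the kind of transient response that a single harvest would miss.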

Strengths

This method is the most common for measurement of RNA over time in cell culture models, mainly due to its simplicity. Each biological sample need only be processed in exactly the same way, and the factor of time is easily adjusted in most experimental protocols. Furthermore, since each time point is its own sample, more RNA can be harvested and sequenced for a study.

Weaknesses

The requirement of multiple samples for time-resolved data collection increases the cost of the experiment and introduces a greater potential for technical errors. While the price of massively parallel sequencing has decreased greatly since its introduction, large-scale RNA-seq studies are still prohibitively expensive for many laboratories. This issue is compounded by the fact that the number of samples scales with the number of time points; using two time points rather than one doubles the number of samples required in an experiment. Consequently, many studies that use time series RNA-seq are limited in their sample size, which reduces statistical power, [9] in the number of time points, which reduces time resolution, or in both. Finally, requiring a greater number of biological samples increases the risk that human error will affect the results, which may lead to spurious conclusions. [10] [11]
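
The scaling described above is simple arithmetic, sketched here under the assumption of a fully crossed design in which every condition is sampled at every time point with the same number of replicates. The function name and the example numbers are illustrative only.

```python
def samples_required(conditions, time_points, replicates):
    """Number of separately processed samples in a fully crossed time series design."""
    return conditions * time_points * replicates


# Hypothetical design: treated vs. control, three biological replicates each.
one_point = samples_required(conditions=2, time_points=1, replicates=3)
two_points = samples_required(conditions=2, time_points=2, replicates=3)
six_points = samples_required(conditions=2, time_points=6, replicates=3)

print(one_point, two_points, six_points)
```

With a fixed budget for samples, adding time points forces a reduction in replicates, which is the trade-off between statistical power and time resolution noted above.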

Affinity Purification

Comparison of metabolic labelling workflows.

Sample preparation and data processing

In this approach, cells are cultured with tagged nucleotides that allow for selective purification of newly synthesized RNA molecules. One popular approach is pulse labeling with 4-thiouridine (4-sU), a uridine analogue that is incorporated into newly synthesized RNA molecules. [12] In this type of experiment, a researcher supplements cells with 4-sU at the start of the experimental treatment or shortly beforehand, so that RNA synthesized while the treatment is affecting RNA expression is labeled with 4-sU. The incorporated 4-sU carries a reactive thiol group, which makes it possible to link useful molecules to the labeled RNA. [13] Biotin is a popular molecule for this type of assay, as it is inexpensive and binds very strongly and selectively to streptavidin. Incubating biotinylated RNA with streptavidin-coated beads allows for the selective purification of newly synthesized RNA. Newly synthesized and total RNA are then sequenced separately and compared for differences.

Strengths

Affinity purification makes use of the widely used biotin-streptavidin system for fractionation of biological materials. Binding of biotin to streptavidin is extremely strong (Kd < 10⁻¹⁴ mol/L). [14] It is also highly specific, which results in minimal background signal from non-specific binding events. Furthermore, time resolution is obtained in a single biological sample, resulting in reduced biological variability compared to using separate samples for each time point.

Weaknesses

The weaknesses of this method mainly concern efficiency. One major difficulty is the uptake of 4-sU into cultured cells. If 4-sU is given too early, it will be incorporated into RNA that was synthesized before the cell began responding to the experimental conditions. If it is given too late, the early stages of the cellular response are not captured by the experiment. The rate of uptake of 4-sU can be measured, but this requires additional experiments to determine the optimal dosage and timing. Furthermore, these parameters need to be measured in the specific cell lines of interest, as some cell lines may take up 4-sU more slowly than others.

RNA is known to be prone to degradation in vitro. Experimental protocols involving RNA commonly include a number of steps to reduce the chance of ribonuclease contamination or spontaneous degradation of samples, as RNA quality affects RNA-seq results. [15] Metabolic labeling involves a number of additional steps that must be performed in the laboratory on RNA in solution. Since the RNA must be kept unfrozen in liquid solution during these steps, some spontaneous degradation is unavoidable, although usually not to an extent that affects results. A greater risk is ribonuclease contamination, which would render a sample useless and waste time and resources. Because of these risks, researchers working with RNA in any capacity should minimize unnecessary handling of RNA.

An additional drawback of this method is that, for an equivalent sample size, more sequencing runs are required than in a time-series experiment. This is because multiple RNA samples corresponding to the initial time point must be sequenced.

Research suggests that 4-sU labeling may result in transcriptional changes on its own, which would affect any results obtained using this method. [16]

Nucleotide Conversion

Sample preparation and data processing

Nucleotide conversion works by chemically converting some nucleotides in newly synthesized RNA into others, so that the change can be detected through sequencing. TimeLapse-seq is an example of such an approach. [17] As in affinity purification, cells are incubated with 4-sU. After RNA is extracted from the samples, it is treated with 2,2,2-trifluoroethylamine and sodium periodate, which converts 4-sU into trifluoroethyl cytosine, a cytosine analogue that is sequenced as a cytosine nucleotide instead of uracil. During sequence alignment and data processing, the resulting U-to-C conversions are used to quantify the number of transcripts that are newly synthesized relative to bulk RNA.
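
The in silico step can be sketched as follows. This is a simplified illustration and not the published TimeLapse-seq analysis: it assumes each read is already aligned without gaps to a reference segment of the same length and orientation, treats U as T because sequencing reads are reported as DNA bases, and calls a read "new" when it contains at least a chosen number of T-to-C mismatches. The sequences, function names, and threshold are hypothetical, and the sketch ignores sequencing errors and genetic variants, which real pipelines must account for.

```python
def count_t_to_c(reference, read):
    """Count positions where the reference has T and the aligned read has C."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to the same length")
    return sum(
        1
        for ref_base, read_base in zip(reference.upper(), read.upper())
        if ref_base == "T" and read_base == "C"
    )


def fraction_new(aligned_pairs, min_conversions=1):
    """Fraction of reads with at least `min_conversions` T-to-C mismatches."""
    if not aligned_pairs:
        return 0.0
    labeled = sum(
        1 for ref, read in aligned_pairs if count_t_to_c(ref, read) >= min_conversions
    )
    return labeled / len(aligned_pairs)


# Hypothetical reference segment and four aligned reads.
reference = "ATTGCTTAGT"
pairs = [
    (reference, "ATTGCTTAGT"),  # identical to the reference: no conversions
    (reference, "ACTGCTTAGT"),  # one T-to-C conversion
    (reference, "ATCGCTCAGT"),  # two T-to-C conversions
    (reference, "ATTGCTTAGA"),  # a T-to-A mismatch, which is not counted
]

print(fraction_new(pairs))
```

Requiring more than one conversion per read (a higher `min_conversions`) trades sensitivity for fewer false positives from ordinary sequencing errors.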

Strengths

This method shares many strengths with affinity purification, notably that multiple samples are not required for a time series. It also eliminates the need for multiple sequencing runs for multiple time points, as all RNA is run together on the sequencing instrument and labeled RNA is separated from unlabeled RNA in silico. This reduces sequencing costs significantly, as time resolution can be obtained without additional samples or additional sequencing runs. Furthermore, sequencing multiple time points together reduces the technical variability introduced by sample processing, in addition to the reduced biological variability provided by the 4-sU experimental strategy.

Weaknesses

As with its strengths, this method shares many weaknesses with affinity purification methods, notably the dependence on 4-sU uptake and the increased sample handling. In addition, since TimeLapse-seq relies on chemical reactions to convert nucleotides, incomplete reactions result in an underestimation of the abundance of newly synthesized RNA and may introduce variability between samples.
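
The effect of incomplete conversion can be illustrated with a simple probability model. The model is an assumption made for this sketch and is not taken from the cited work: each uridine position in a read from a new transcript is taken to carry 4-sU independently with some incorporation rate, and each 4-sU is taken to be converted independently with some conversion efficiency. A new read is detected only if at least one conversion is present. All parameter values are hypothetical.

```python
def detection_probability(n_uridines, incorporation_rate, conversion_efficiency):
    """Probability that a read from a new transcript shows at least one conversion."""
    per_site = incorporation_rate * conversion_efficiency
    return 1.0 - (1.0 - per_site) ** n_uridines


def apparent_fraction_new(true_fraction_new, n_uridines,
                          incorporation_rate, conversion_efficiency):
    """Fraction of reads called new when undetected new reads are counted as old."""
    return true_fraction_new * detection_probability(
        n_uridines, incorporation_rate, conversion_efficiency
    )


# Hypothetical values: 30 uridine positions per read, 5% 4-sU incorporation,
# and a true new-RNA fraction of 40%.
complete = apparent_fraction_new(0.4, 30, 0.05, conversion_efficiency=1.0)
partial = apparent_fraction_new(0.4, 30, 0.05, conversion_efficiency=0.5)

print(round(complete, 3), round(partial, 3))
```

Under these assumptions the apparent fraction of new RNA is always at or below the true fraction, and it falls further as conversion efficiency drops, so differences in reaction efficiency between samples show up as apparent differences in RNA synthesis.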

Nascent Transcript Sequencing

Sample preparation and data processing

Unlike metabolic labeling, native elongating transcript sequencing (NET-seq) directly sequences nascent transcripts that are still being transcribed by RNA polymerase II. [18] This method allows for the study of the dynamics of transcription elongation, which is not possible with metabolic labeling techniques. In a NET-seq experiment, cells are treated as in a standard RNA-seq experiment until they are lysed. Lysis is performed such that RNA-protein complexes remain intact, and RNA polymerase II is immunoprecipitated from the lysate. RNA that was being transcribed from DNA remains attached to RNA polymerase II and is subsequently eluted from the polymerase and sequenced.

Strengths

Since NET-seq extracts transcripts that have not completed transcription, it is possible to obtain single-nucleotide resolution on the most recently synthesized nucleotide of transcripts. This is valuable in the study of phenomena such as transcriptional kinetics. Furthermore, it allows for the study of unstable transcripts which are degraded shortly after transcription. The general approach of immunoprecipitating RNA-binding proteins has great utility in understanding other areas of RNA biology, such as splicing.

Weaknesses

This method relies upon immunoprecipitation of RNA polymerase II. There are a number of issues with immunoprecipitation, including non-specific binding interactions which may result in the immunoprecipitation of off-target RNA molecules. The temporal resolution of NET-seq is limited to transcription elongation. While comparing relative abundances between transcripts using NET-seq is possible, it is not the intention of the method.

Future Directions

Aside from time-series sampling, there are currently no methods for comparing more than two time points. Metabolic labeling experiments are only capable of comparing RNA abundances before and after pulse-labeling. It is of interest to be able to observe modifications to the transcriptome over a series of time points in a single sample, as this would provide increased time resolution in studies. Existing methods of metabolic labeling are of interest for this; if multiple different metabolic labels were used at differing time points this may allow for intermediate time points to be investigated. However, such approaches must be developed with care, as biases in labeling methods and sample processing steps could contribute to misleading results if data from different methods are compared to one another.

Metabolic labeling with 4-sU has been reported to affect cellular phenotype. [16] In current practice, this is unavoidable and is tolerated because the data obtained still fit current biological models and because 4-sU samples are compared with other 4-sU samples in most cases. However, it has the potential to produce spurious conclusions, especially if there is any interaction between the effect of 4-sU and the chosen experimental condition. In that case, it is not possible to tell whether differences in RNA levels are due to the experimental conditions being studied or to the 4-sU treatment. Identifying labeling chemicals that do not affect cellular phenotype would eliminate these issues altogether.

References

  1. Wang, Z; Gerstein, M; Snyder, M (January 2009). "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews Genetics. 10 (1): 57–63. doi:10.1038/nrg2484. PMC 2949280. PMID 19015660.
  2. Crick, F.H.C. (1958). "On Protein Synthesis". In F.K. Sanders (ed.). Symposia of the Society for Experimental Biology, Number XII: The Biological Replication of Macromolecules. Cambridge University Press. pp. 138–163.
  3. Ozsolak, F; Milos, PM (February 2011). "RNA sequencing: advances, challenges and opportunities". Nature Reviews Genetics. 12 (2): 87–98. doi:10.1038/nrg2934. PMC 3031867. PMID 21191423.
  4. Schena, M; Shalon, D; Davis, RW; Brown, PO (20 October 1995). "Quantitative monitoring of gene expression patterns with a complementary DNA microarray". Science. 270 (5235): 467–70. Bibcode:1995Sci...270..467S. doi:10.1126/science.270.5235.467. PMID 7569999.
  5. Velculescu, VE; Zhang, L; Vogelstein, B; Kinzler, KW (20 October 1995). "Serial analysis of gene expression". Science. 270 (5235): 484–7. Bibcode:1995Sci...270..484V. doi:10.1126/science.270.5235.484. PMID 7570003.
  6. Okoniewski, MJ; Miller, CJ (2 June 2006). "Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations". BMC Bioinformatics. 7: 276. doi:10.1186/1471-2105-7-276. PMC 1513401. PMID 16749918.
  7. Spies, D; Ciaudo, C (2015). "Dynamics in Transcriptomics: Advancements in RNA-seq Time Course and Downstream Analysis". Computational and Structural Biotechnology Journal. 13: 469–77. doi:10.1016/j.csbj.2015.08.004. PMC 4564389. PMID 26430493.
  8. Hejblum, BP; Skinner, J; Thiébaut, R (June 2015). "Time-Course Gene Set Analysis for Longitudinal Gene Expression Data". PLoS Computational Biology. 11 (6): e1004310. Bibcode:2015PLSCB..11E4310H. doi:10.1371/journal.pcbi.1004310. PMC 4482329. PMID 26111374.
  9. Schurch, NJ; Schofield, P; Gierliński, M; Cole, C; Sherstnev, A; Singh, V; Wrobel, N; Gharbi, K; Simpson, GG; Owen-Hughes, T; Blaxter, M; Barton, GJ (June 2016). "How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?". RNA. 22 (6): 839–51. doi:10.1261/rna.053959.115. PMC 4878611. PMID 27022035.
  10. McIntyre, LM; Lopiano, KK; Morse, AM; Amin, V; Oberg, AL; Young, LJ; Nuzhdin, SV (6 June 2011). "RNA-seq: technical variability and sampling". BMC Genomics. 12: 293. doi:10.1186/1471-2164-12-293. PMC 3141664. PMID 21645359.
  11. Liu, Y; Zhou, J; White, KP (1 February 2014). "RNA-seq differential expression studies: more sequence or more replication?". Bioinformatics. 30 (3): 301–4. doi:10.1093/bioinformatics/btt688. PMC 3904521. PMID 24319002.
  12. Dölken, L; Ruzsics, Z; Rädle, B; Friedel, CC; Zimmer, R; Mages, J; Hoffmann, R; Dickinson, P; Forster, T; Ghazal, P; Koszinowski, UH (September 2008). "High-resolution gene expression profiling for simultaneous kinetic parameter analysis of RNA synthesis and decay". RNA. 14 (9): 1959–72. doi:10.1261/rna.1136108. PMC 2525961. PMID 18658122.
  13. Schwanhäusser, B; Busse, D; Li, N; Dittmar, G; Schuchhardt, J; Wolf, J; Chen, W; Selbach, M (19 May 2011). "Global quantification of mammalian gene expression control". Nature. 473 (7347): 337–42. Bibcode:2011Natur.473..337S. doi:10.1038/nature10098. PMID 21593866.
  14. Green, NM (1975). "Avidin". Advances in Protein Chemistry. 29: 85–133. doi:10.1016/S0065-3233(08)60411-8. ISBN 9780120342297. PMID 237414.
  15. Gallego Romero, I; Pai, AA; Tung, J; Gilad, Y (30 May 2014). "RNA-seq: impact of RNA degradation on transcript quantification". BMC Biology. 12: 42. doi:10.1186/1741-7007-12-42. PMC 4071332. PMID 24885439.
  16. Burger, K; Mühl, B; Kellner, M; Rohrmoser, M; Gruber-Eber, A; Windhager, L; Friedel, CC; Dölken, L; Eick, D (October 2013). "4-thiouridine inhibits rRNA synthesis and causes a nucleolar stress response". RNA Biology. 10 (10): 1623–30. doi:10.4161/rna.26214. PMC 3866244. PMID 24025460.
  17. Schofield, JA; Duffy, EE; Kiefer, L; Sullivan, MC; Simon, MD (March 2018). "TimeLapse-seq: adding a temporal dimension to RNA sequencing through nucleoside recoding". Nature Methods. 15 (3): 221–225. doi:10.1038/nmeth.4582. PMC 5831505. PMID 29355846.
  18. Churchman, LS; Weissman, JS (20 January 2011). "Nascent transcript sequencing visualizes transcription at nucleotide resolution". Nature. 469 (7330): 368–73. Bibcode:2011Natur.469..368C. doi:10.1038/nature09652. PMC 3880149. PMID 21248844.