3' mRNA-seq is a quantitative, genome-wide transcriptomic technique based on the barcoding of the 3' untranslated region (UTR) of mRNA molecules. In standard bulk RNA-seq, short sequencing reads are generated along the entire length of mRNA transcripts. In 3' mRNA-seq, only the 3' end of polyadenylated RNAs is sequenced. As a result, fewer reads are needed to quantify the expression of each gene, which reduces the sequencing depth required per sample. The method still provides robust, transcriptome-wide read-outs of gene expression levels comparable to those of full-length RNA-seq methods. [1] [2]
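The read-count argument can be illustrated with an idealized toy model. This sketch makes two simplifying assumptions: full-length coverage is uniform, and each captured molecule yields exactly one 3' tag. Two hypothetical genes have the same number of mRNA molecules but a tenfold difference in transcript length. The names `Gene`, `expected_full_length_reads` and `expected_three_prime_reads` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    molecules: int   # mRNA copies in the sample (hypothetical)
    length_nt: int   # transcript length in nucleotides (hypothetical)


def expected_full_length_reads(genes, total_reads):
    """Full-length RNA-seq (idealized): reads are sampled along whole transcripts,
    so each gene's share of reads scales with molecules * length."""
    weights = {g.name: g.molecules * g.length_nt for g in genes}
    total = sum(weights.values())
    return {name: total_reads * w / total for name, w in weights.items()}


def expected_three_prime_reads(genes, total_reads):
    """3' mRNA-seq (idealized): one tag per molecule at the 3' end,
    so each gene's share of reads scales with molecule count only."""
    total = sum(g.molecules for g in genes)
    return {g.name: total_reads * g.molecules / total for g in genes}


# Same number of molecules, tenfold difference in length
genes = [Gene("short_gene", 100, 1_000), Gene("long_gene", 100, 10_000)]
print(expected_full_length_reads(genes, 1_100))
print(expected_three_prime_reads(genes, 1_100))
```

In this model, the full-length library spends most of its reads on the longer transcript. The 3' library splits reads in proportion to molecule counts, so fewer total reads are needed to reach a given count for every gene.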
Sample barcoding and the reduced per-sample sequencing depth also allow higher levels of sample multiplexing per experiment and lower the cost of transcriptome sequencing compared with full-length RNA-seq methods. These advantages are particularly important for large-scale, ultra-high-throughput gene expression studies and for studies assessing differential gene expression between experimental conditions or cell types. [2]
Some 3' mRNA-seq technologies further streamline library preparation. One example is Bulk RNA Barcoding and Sequencing (BRB-seq), commercialized by Alithea Genomics, which pools up to 384 samples very early in the workflow. This gives a cost per sample comparable to profiling four individual genes by conventional RT-qPCR, in a workflow requiring less than two and a half hours of hands-on time. [2] An increasing number of 3' mRNA-seq techniques also add unique molecular identifiers (UMIs) alongside the sample barcodes. UMIs label each mRNA molecule uniquely, making it possible to distinguish original mRNA transcripts from duplicates produced by PCR amplification.
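UMI-based duplicate removal can be sketched as counting distinct UMIs per sample and gene. This minimal sketch assumes that reads have already been demultiplexed and assigned to genes, and it treats UMIs as exact strings. Production pipelines typically also tolerate sequencing errors in the UMI. The function `count_molecules` and the barcode, UMI and gene values are hypothetical.

```python
from collections import defaultdict


def count_molecules(reads):
    """reads: iterable of (sample_barcode, umi, gene) tuples.
    Returns {(sample_barcode, gene): (read_count, distinct_umi_count)}.
    Reads sharing sample, gene and UMI are treated as PCR duplicates of one molecule."""
    read_counts = defaultdict(int)
    umis = defaultdict(set)
    for sample, umi, gene in reads:
        read_counts[(sample, gene)] += 1
        umis[(sample, gene)].add(umi)
    return {key: (read_counts[key], len(umis[key])) for key in read_counts}


reads = [
    ("S01", "ACGTACGTAC", "GAPDH"),
    ("S01", "ACGTACGTAC", "GAPDH"),  # PCR duplicate: same sample, gene and UMI
    ("S01", "TTGCAAGCTA", "GAPDH"),  # different UMI: a second molecule
    ("S02", "ACGTACGTAC", "GAPDH"),  # same UMI in another sample: distinct molecule
]
print(count_molecules(reads))
# {('S01', 'GAPDH'): (3, 2), ('S02', 'GAPDH'): (1, 1)}
```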
The sample barcoding approach used in 3' mRNA-seq was first established in single-cell transcriptomics, where sample and mRNA barcoding allowed hundreds to thousands of single cells to be multiplexed in one experiment. [3] Single-cell RNA profiling technologies such as CEL-seq2, SCRB-seq, and STRT-seq likewise allowed large sets of samples to be pooled into a single sequencing library at an early stage of the protocol. This is possible because sample barcodes are attached to each mRNA molecule during reverse transcription. [4] [5] [6]
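Because every molecule carries its sample barcode, a pooled library can later be split back into individual samples computationally. The following is a minimal sketch of that demultiplexing step. It assumes fixed-length barcodes from a known whitelist and tolerates at most one mismatch. The barcodes, sample names and the `assign_sample` helper are hypothetical, and real tools may use different error models.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed: str, whitelist: dict[str, str], max_mismatches: int = 1):
    """Return the sample whose barcode is the unique closest match to `observed`,
    or None if no barcode is close enough or the best match is ambiguous."""
    hits = sorted((hamming(observed, bc), name) for bc, name in whitelist.items())
    best_dist, best_name = hits[0]
    if best_dist > max_mismatches:
        return None
    if len(hits) > 1 and hits[1][0] == best_dist:
        return None  # two barcodes equally close: refuse to guess
    return best_name


# Hypothetical 6-nt sample barcodes
whitelist = {"AAGGCC": "sample_1", "CCTTAA": "sample_2", "GGAATT": "sample_3"}
print(assign_sample("AAGGCA", whitelist))  # -> sample_1 (one mismatch from AAGGCC)
print(assign_sample("TTTTTT", whitelist))  # -> None (too far from every barcode)
```

Rejecting ambiguous matches, instead of picking one arbitrarily, limits cross-sample misassignment. A one-mismatch tolerance is only safe if the barcode set is designed so that barcodes differ at several positions.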
However, although early iterations of 3' mRNA-seq methods used oligo-dT priming to enrich for the 3' poly(A) regions of mRNA molecules, they often did not allow samples to be multiplexed early in the workflow or include UMIs to correct for amplification bias (Moll et al., 2014). Later refinements of the method often combine UMIs and sample barcodes in workflows optimized specifically for early multiplexing and suited to ultra-high-throughput sequencing experiments. [2]
Numerous 3' mRNA-seq methods exist, such as BRB-seq, QuantSeq, 3’Pool-seq, TagSeq, and QIAseq. [2] [7] [8] [9]
Each method relies on an initial reverse transcription step in which mRNAs are labeled with sample barcodes. Reverse transcription can be performed with oligo-dT primers, barcoded oligo-dT primers, or template-switching oligos. [2] [6] [7] In contrast, bulk RNA-seq library preparation methods such as the Illumina TruSeq Stranded mRNA kit use random priming of pre-fragmented RNA for reverse transcription, so that reads are generated along the entire length of mRNA transcripts. [10]
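With barcoded oligo-dT primers, the sample barcode and UMI sit immediately upstream of the oligo-dT stretch, so they can be read directly from one end of each fragment. The sketch below parses such a read under an assumed layout: a 6-nt barcode, then a 10-nt UMI, then oligo-dT. These lengths, the `parse_read1` helper and the `PrimerTag` type are hypothetical and do not describe any particular commercial kit.

```python
from typing import NamedTuple, Optional

# Hypothetical primer layout (assumed, not taken from any specific kit):
# [sample barcode][UMI][oligo-dT ...]
BARCODE_LEN = 6
UMI_LEN = 10
MIN_POLY_T = 8  # require this many T's right after the UMI


class PrimerTag(NamedTuple):
    barcode: str
    umi: str


def parse_read1(seq: str) -> Optional[PrimerTag]:
    """Split a read into sample barcode and UMI.
    Returns None if the expected oligo-dT stretch is missing or the read is too short."""
    tag_end = BARCODE_LEN + UMI_LEN
    poly_t = seq[tag_end:tag_end + MIN_POLY_T]
    if len(poly_t) < MIN_POLY_T or poly_t != "T" * MIN_POLY_T:
        return None
    return PrimerTag(barcode=seq[:BARCODE_LEN], umi=seq[BARCODE_LEN:tag_end])


read = "AAGGCC" + "ACGTACGTAC" + "TTTTTTTTTTTT"
print(parse_read1(read))  # -> PrimerTag(barcode='AAGGCC', umi='ACGTACGTAC')
```

Checking for the oligo-dT stretch is a cheap sanity filter. Reads without it are unlikely to come from a correctly primed molecule, so their barcode and UMI positions cannot be trusted.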
Second-strand synthesis is then performed, depending on the method, by DNA polymerase I nick translation or by PCR, resulting in double-stranded complementary DNA (cDNA). This is followed by tagmentation, in which Tn5 transposase simultaneously fragments the double-stranded cDNA and inserts the adaptor sequences used for library amplification. Some methods use random primers at this stage instead. [2] [7]
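A simple simulation shows why tagmentation still yields 3'-end libraries. The sketch assumes that PCR amplifies only fragments carrying both the barcoded oligo-dT adaptor at the 3' end and a Tn5 adaptor. This is an assumption about primer layout, not a description of a specific kit. Tn5 insertion sites are placed at uniformly random positions, running from 0 at the 5' end to `cdna_length` at the poly(A) end. All names are hypothetical.

```python
import random


def tagment(cdna_length: int, mean_fragment_len: int, rng: random.Random) -> list[int]:
    """Choose Tn5 insertion sites at uniformly random, distinct positions.
    The number of sites is chosen so fragments are roughly mean_fragment_len long."""
    n_sites = max(1, round(cdna_length / mean_fragment_len))
    return sorted(rng.sample(range(1, cdna_length), n_sites))


def amplifiable_fragment(cdna_length: int, insertion_sites: list[int]) -> tuple[int, int]:
    """Only the fragment between the last (most 3') insertion site and the
    barcoded 3' end carries both adaptors, so only it is amplified."""
    return insertion_sites[-1], cdna_length


rng = random.Random(42)
for length in (1_500, 6_000):
    sites = tagment(length, mean_fragment_len=300, rng=rng)
    start, end = amplifiable_fragment(length, sites)
    print(f"{length}-nt cDNA -> amplified fragment {start}-{end} ({end - start} nt)")
```

Whatever the transcript length, the amplified fragment always ends at the 3' end. Each molecule therefore contributes a single 3' tag rather than reads proportional to its length.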
Library indexing and PCR amplification then take place, resulting in libraries enriched for the 3' untranslated region of mRNAs and suitable for short-read sequencing on Illumina or MGI sequencing instruments. [2] [7] [6]
3' mRNA-seq methods are generally cheaper per sample than standard bulk RNA-seq methods. [2] [7] [8] [9] Because only the 3' end of each mRNA molecule is sequenced, rather than the entire length of the transcript, less sequencing depth is required. Commercial 3' mRNA-seq protocols recommend read depths of one to five million reads per sample, which is sufficient to detect the majority of highly expressed genes. [11] [12] This also allows more samples to be sequenced in the same sequencing run. Sample throughput for 3' mRNA-seq library preparation differs between methods, but up to 384 samples can be processed in plates, with options for automation. [2] [12] Methods in which samples are pooled early in the workflow reduce consumable use and cost further. For instance, BRB-seq is up to 25 times cheaper than Illumina TruSeq Stranded mRNA library preparation, at a cost equivalent to assessing four genes by RT-qPCR. [2]
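The link between per-sample depth and multiplexing is simple division. The sketch below estimates how many samples fit in one sequencing run. The run output of 400 million reads and the 90% usable-read fraction are hypothetical placeholders, not specifications of any instrument. The 1 and 5 million read depths are the recommended range quoted in the text.

```python
def samples_per_run(run_reads: int, reads_per_sample: int, usable_percent: int = 90) -> int:
    """Maximum number of samples that each reach reads_per_sample, assuming only
    usable_percent of the run's reads pass filtering and demultiplex cleanly."""
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    # Integer arithmetic avoids floating-point rounding at exact boundaries.
    return (run_reads * usable_percent) // (100 * reads_per_sample)


RUN_READS = 400_000_000  # hypothetical run output
for depth in (1_000_000, 5_000_000):
    print(f"{depth:>9,} reads/sample -> {samples_per_run(RUN_READS, depth)} samples")
```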
These methods are largely insensitive to RNA degradation because only the 3' region of each mRNA transcript is prepared for sequencing, regardless of how fragmented the rest of the molecule is. This makes 3' mRNA-seq suitable for both high-quality and degraded RNA, including samples with an RNA integrity number (RIN) below 6, and yields data of a quality similar to that of full-length RNA-seq methods. [2] However, because only the 3' region of mRNA molecules is sequenced, 3' mRNA-seq methods are not suitable for analyzing full-length transcripts, splice variants, fusion genes, or RNA editing.
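The robustness to degradation can be made concrete with an idealized model in which every bond breaks independently with the same probability. In this model, the number of bonds in a region is approximated by its length in nucleotides. A 3' method only needs a short stretch next to the poly(A) tail to survive, whereas full-length coverage needs the whole transcript intact. The break probability, the 300-nt 3' window and the 3,000-nt transcript below are hypothetical values chosen for illustration.

```python
def prob_intact(n_bonds: int, break_prob: float) -> float:
    """Probability that none of n_bonds independent bonds is broken."""
    return (1.0 - break_prob) ** n_bonds


BREAK_PROB = 0.001        # hypothetical per-bond break probability
TRANSCRIPT_LEN = 3_000    # hypothetical transcript length (nt)
THREE_PRIME_WINDOW = 300  # hypothetical stretch needed next to the poly(A) tail (nt)

three_prime = prob_intact(THREE_PRIME_WINDOW, BREAK_PROB)
full_length = prob_intact(TRANSCRIPT_LEN, BREAK_PROB)
print(f"3' window intact:   {three_prime:.3f}")
print(f"full length intact: {full_length:.3f}")
```

Because the survival probability decays exponentially with length, a short 3' window survives far more often than a full-length molecule at the same level of degradation.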
In genetics, complementary DNA (cDNA) is DNA that was reverse transcribed from an RNA. cDNA exists in both single-stranded and double-stranded forms and in both natural and engineered forms.
A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer is used to determine the order of the four bases: G (guanine), C (cytosine), A (adenine) and T (thymine). This is then reported as a text string, called a read. Some DNA sequencers can also be considered optical instruments, as they analyze light signals originating from fluorochromes attached to nucleotides.
Rapid amplification of cDNA ends (RACE) is a technique used in molecular biology to obtain the full-length sequence of an RNA transcript found within a cell. RACE produces a cDNA copy of the RNA sequence of interest through reverse transcription, followed by PCR amplification of the cDNA copies. The amplified cDNA copies are then sequenced and, if long enough, should map to a unique genomic region. RACE is commonly followed by cloning, so that products derived from individual RNA molecules can be sequenced separately. A higher-throughput alternative, useful for identifying novel transcript structures, is to sequence the RACE products with next-generation sequencing technologies.
RNA-Seq is a technique that uses next-generation sequencing to reveal the presence and quantity of RNA molecules in a biological sample, providing a snapshot of gene expression in the sample, whose complete set of RNA transcripts is known as the transcriptome.
Cap analysis of gene expression (CAGE) is a gene expression technique used in molecular biology to produce a snapshot of the 5′ end of the messenger RNA population in a biological sample. The small fragments from the very beginnings of mRNAs are extracted, reverse-transcribed to cDNA, PCR amplified and sequenced. CAGE was first published by Hayashizaki, Carninci and co-workers in 2003. CAGE has been extensively used within the FANTOM research projects.
Illumina dye sequencing is a technique used to determine the series of base pairs in DNA, also known as DNA sequencing. The reversible terminator chemistry concept was invented by Bruno Canard and Simon Sarfati at the Pasteur Institute in Paris. It was developed by Shankar Balasubramanian and David Klenerman of Cambridge University, who subsequently founded Solexa, a company later acquired by Illumina. This sequencing method is based on reversible dye-terminators that enable the identification of single nucleotides as they are washed over DNA strands. It can also be used for whole-genome and region sequencing, transcriptome analysis, metagenomics, small RNA discovery, methylation profiling, and genome-wide protein-nucleic acid interaction analysis.
Fluorescent in situ sequencing (FISSEQ) is a method of sequencing a cell's RNA while it remains in tissue or culture using next-generation sequencing.
Single-cell sequencing examines the nucleic acid sequence information from individual cells with optimized next-generation sequencing technologies, providing a higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment. For example, in cancer, sequencing the DNA of individual cells can give information about mutations carried by small populations of cells. In development, sequencing the RNAs expressed by individual cells can give insight into the existence and behavior of different cell types. In microbial systems, a population of the same species can appear genetically clonal. Still, single-cell sequencing of RNA or epigenetic modifications can reveal cell-to-cell variability that may help populations rapidly adapt to survive in changing environments.
G&T-seq is a single-cell sequencing technique that simultaneously obtains both transcriptomic and genomic data from the same cell, allowing gene expression data to be compared directly with the corresponding genomic data in that cell...
Duplex sequencing is a library preparation and analysis method for next-generation sequencing (NGS) platforms that employs random tagging of double-stranded DNA to detect mutations with higher accuracy and lower error rates.
Unique molecular identifiers (UMIs), or molecular barcodes (MBC), are short sequences or molecular "tags" added to DNA fragments in some next-generation sequencing library preparation protocols to identify the input DNA molecule. These tags are added before PCR amplification and can be used to reduce errors and quantitative bias introduced by the amplification.
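Because UMIs are themselves sequenced, errors can create spurious UMIs that inflate molecule counts. A common remedy is to merge a rare UMI into a much more abundant UMI that differs from it by a single base. The sketch below is a simplified, single-hop version of the count-threshold rule (n_parent >= 2 * n_child - 1) associated with the "directional" method of the UMI-tools package. It is not that tool's implementation. It assumes all UMIs have the same length, and `collapse_umis` and the example counts are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Mismatches between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umi_counts: Counter) -> dict:
    """Map each observed UMI to a representative UMI.
    UMIs are visited from most to least abundant. A UMI is merged into an existing
    representative if they differ at one base and the representative is at least
    2 * count - 1 times as frequent; otherwise it becomes a new representative."""
    representatives = []
    assignment = {}
    for umi, n in sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in representatives:
            if hamming(umi, rep) == 1 and umi_counts[rep] >= 2 * n - 1:
                assignment[umi] = rep
                break
        else:
            representatives.append(umi)
            assignment[umi] = umi
    return assignment


counts = Counter({"ACGTAC": 50, "ACGTAT": 2, "TTGCAA": 30, "TTGCAG": 25})
mapping = collapse_umis(counts)
print(len(set(mapping.values())))  # -> 3 molecules after error correction
```

The rare ACGTAT is treated as a sequencing error of ACGTAC. TTGCAA and TTGCAG also differ by one base, but their counts are too similar to justify merging, so they are kept as separate molecules.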
Perturb-seq refers to a high-throughput method of performing single-cell RNA sequencing (scRNA-seq) on pooled genetic perturbation screens. Perturb-seq combines multiplexed CRISPR-mediated gene inactivations with single-cell RNA sequencing to assess comprehensive gene expression phenotypes for each perturbation. Inferring a gene’s function by applying genetic perturbations to knock down or knock out a gene and studying the resulting phenotype is known as reverse genetics. Perturb-seq is a reverse genetics approach that allows for the investigation of phenotypes at the level of the transcriptome, to elucidate gene functions in many cells, in a massively parallel fashion.
Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.
Spatial transcriptomics is a method for assigning cell types to their locations in histological sections. Recent work has demonstrated that the subcellular localization of mRNA molecules, for example in the nucleus, can also be studied.
Small RNA sequencing is a type of RNA sequencing, based on next-generation sequencing (NGS) technologies, that makes it possible to isolate and characterize noncoding RNA molecules in order to evaluate and discover new forms of small RNA and to predict their possible functions. Using this technique, small RNAs can be distinguished from the larger RNA family to better understand their functions in the cell and in gene expression. Small RNA-Seq can analyze thousands of small RNA molecules with high throughput and specificity. The greatest advantage of RNA-seq is that libraries of RNA fragments can be generated from the whole RNA content of a cell.
CITE-Seq is a method that performs RNA sequencing while also gaining quantitative and qualitative information, at the single-cell level, on surface proteins for which antibodies are available. So far, the method has been demonstrated to work with only a few proteins per cell. As such, it provides an additional layer of information for the same cell by combining proteomics and transcriptomics data. For phenotyping, this method has been shown to be as accurate as flow cytometry by the groups that developed it. It is currently one of the main methods, along with REAP-Seq, for evaluating gene expression and protein levels simultaneously in different species.
BLESS, also known as breaks labeling, enrichment on streptavidin and next-generation sequencing, is a method used to detect genome-wide double-strand DNA damage. In contrast to chromatin immunoprecipitation (ChIP)-based methods of identifying DNA double-strand breaks (DSBs) by labeling DNA repair proteins, BLESS utilizes biotinylated DNA linkers to directly label genomic DNA in situ which allows for high-specificity enrichment of samples on streptavidin beads and the subsequent sequencing-based DSB mapping to nucleotide resolution.
TCR-Seq is a method used to identify and track specific T cells and their clones... TCR-Seq uses the unique nature of a T-cell receptor (TCR) as a ready-made molecular barcode. This technology can be applied to both single-cell sequencing technologies and high-throughput screens.
Bulk RNA barcoding and sequencing (BRB-seq) is an ultra-high-throughput bulk 3' mRNA-seq technology that uses early-stage sample barcoding and unique molecular identifiers (UMIs) to allow the pooling of up to 384 samples in one tube early in the sequencing library preparation workflow. The transcriptomic technology is compatible with both Illumina and MGI short-read sequencing instruments.