Bulk RNA barcoding and sequencing (BRB-seq) is an ultra-high-throughput bulk 3' mRNA-seq technology that uses early-stage sample barcoding and unique molecular identifiers (UMIs) to allow the pooling of up to 384 samples in one tube early in the sequencing library preparation workflow. The transcriptomic technology is compatible with both Illumina and MGI short-read sequencing instruments. [1]
In standard RNA-seq, a sequencing library must be prepared for each RNA sample individually. [2] In contrast, in BRB-seq, all samples are pooled early in the workflow for simultaneous processing, which reduces the cost and hands-on time of library preparation. [1]
As BRB-seq is a 3' mRNA-seq technique, short reads are generated only for the 3' region of polyadenylated mRNA molecules rather than along the full length of transcripts, as in standard RNA-seq. BRB-seq therefore requires a far lower sequencing depth per sample to generate genome-wide transcriptomic data. It detects similar numbers of expressed genes and differentially expressed genes as the standard Illumina TruSeq approach, at a cost up to 25 times lower, comparable to that of profiling four genes by RT-qPCR. [1] Because only the 3' region is required for library preparation, BRB-seq also tolerates lower-quality RNA (RIN < 6) in which transcripts are degraded. [1]
The BRB-seq technique was first published in April 2019 in the peer-reviewed journal Genome Research in a manuscript entitled 'BRB-seq: ultra-affordable high-throughput transcriptomics enabled by bulk RNA barcoding and sequencing'. [1] By the end of 2019, the article was among the top 10 most-read papers in the journal, and as of April 2024 it had been cited over 150 times. [3]
The technique was developed at the École Polytechnique Fédérale de Lausanne in Switzerland in the labs of Professor Bart Deplancke and collaborators. In May 2020, a company called Alithea Genomics was established to provide BRB-seq as kits for researchers or as a full service. [4] BRB-seq builds upon technological advances in single-cell transcriptomics, where sample barcoding made the early multiplexing of hundreds to thousands of single cells possible. Sample multiplexing allowed researchers to create single sequencing libraries containing multiple distinct samples, reducing overall experimental costs and hands-on time while dramatically boosting throughput. [5]
BRB-seq applies these advancements in sample and mRNA barcoding to mRNAs derived from bulk cell populations to enable ultra-high-throughput studies crucial for drug discovery, population studies, or fundamental research. [1] [6]
The fundamental aspect of BRB-seq is the optimized sample barcode primers. Each barcoded primer sequence includes an adaptor for primer annealing, a 14-nt barcode that assigns a unique identifier to each individual RNA sample, and a random 14-nt UMI that tags each mRNA molecule with a unique sequence, making it possible to distinguish original mRNA transcripts from duplicates introduced by PCR amplification. BRB-seq allows up to 384 individually barcoded RNA samples to be pooled into one tube early in the workflow to streamline subsequent steps in cDNA library preparation and sequencing. [1] [7] [8]
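The role of the two tags can be illustrated with a minimal Python sketch that splits a barcode read into its sample barcode and UMI. The 14-nt lengths come from the text above; the layout (barcode first, UMI immediately after), the barcode sequences, the well names, and the function name `parse_barcode_read` are assumptions made only for illustration and may not match a real kit.

```python
from __future__ import annotations

from typing import NamedTuple, Optional

BARCODE_LEN = 14  # sample barcode length given in the text
UMI_LEN = 14      # UMI length given in the text


class TaggedRead(NamedTuple):
    sample: str
    umi: str


def parse_barcode_read(seq: str, barcode_to_sample: dict[str, str]) -> Optional[TaggedRead]:
    """Split a barcode read into (sample, UMI).

    Assumption: the barcode occupies the first 14 nt and the UMI the
    next 14 nt. Returns None for reads that are too short, carry an
    unknown barcode, or contain an ambiguous base (N) in the UMI.
    """
    if len(seq) < BARCODE_LEN + UMI_LEN:
        return None
    barcode = seq[:BARCODE_LEN]
    umi = seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    sample = barcode_to_sample.get(barcode)
    if sample is None or "N" in umi:
        return None
    return TaggedRead(sample, umi)


# Hypothetical barcodes for two wells of a plate.
barcodes = {"ACGTACGTACGTAC": "well_A1", "TTGCATTGCATTGC": "well_A2"}
read = "ACGTACGTACGTAC" + "GGATCCGGATCCGG" + "TTTTTTTT"
tagged = parse_barcode_read(read, barcodes)
```

Because the sample identity travels with every read, reads from all pooled samples can be assigned back to their wells after sequencing.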
Input RNA requirements
Isolated total RNA samples require RIN ≥ 6 and an A260/230 ratio > 1.5 when quantified by NanoDrop. Between 10 ng and 1 μg of purified RNA per sample is recommended for standard BRB-seq. To ensure library uniformity and an even distribution of reads for each sample after sequencing, the RNA concentration, RIN, and A260/230 values of the input samples must be as uniform as possible. [1]
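As a rough illustration, the thresholds above can be expressed as a simple check run over a sample sheet before plating. This is a sketch only: the `RnaSample` record, its field names, and the `input_issues` function are hypothetical, and the cut-offs are just the values quoted in this section.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RnaSample:
    name: str
    rin: float        # RNA integrity number
    a260_230: float   # absorbance ratio A260/230
    amount_ng: float  # purified RNA available for the reaction, in ng


def input_issues(sample: RnaSample) -> list[str]:
    """Return the input requirements that a sample fails (empty list if none)."""
    issues = []
    if sample.rin < 6:
        issues.append("RIN below 6")
    if sample.a260_230 <= 1.5:
        issues.append("A260/230 not above 1.5")
    if not 10 <= sample.amount_ng <= 1000:
        issues.append("amount outside 10 ng to 1 ug")
    return issues


good = RnaSample("s1", rin=8.2, a260_230=2.0, amount_ng=100)
poor = RnaSample("s2", rin=5.1, a260_230=1.2, amount_ng=5)
```

Here `input_issues(good)` returns an empty list, while `input_issues(poor)` reports all three problems. The check does not address uniformity across samples, which would need a comparison over the whole plate.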
Workflow
The BRB-seq workflow begins by adding isolated RNA samples to individual wells of a 96- or 384-well plate. Each sample then undergoes independent barcoded reverse transcription after the addition of unique optimized barcoded oligo(dT) primers. These primers uniquely tag the 3’ poly(A) tail of mRNA molecules during the first-strand synthesis of cDNA. Strand information is preserved. As each RNA sample has an individual barcode, all samples from the 96- or 384-well plate can be pooled into one tube for simultaneous processing after this first step.
Following sample pooling into a single tube, free primers are digested. A second-strand synthesis reaction then results in double-stranded cDNA (DS cDNA).
Next, these full-length cDNA molecules undergo a process called tagmentation, facilitated by Tn5 transposase preloaded with the adaptors necessary for library amplification. The transposase first fragments the cDNA molecules and then ligates the preloaded adaptors to the resulting fragments. Using around 20 ng of cDNA per sample for tagmentation yields higher library complexity, meaning fewer PCR amplification cycles are required.
For compatibility with Illumina sequencers, the resulting cDNA library is then indexed and amplified using a unique dual indexing (UDI) strategy with indexes P5 and P7. These indexes minimize the risk of barcode misassignment after next-generation sequencing.
The average fragment size of the library is then needed to calculate its molarity and prepare the appropriate dilution for sequencing. A successful library contains fragments in the range of 300–1000 bp, with a peak at 400–700 bp.
Standard bulk RNA-seq methods require around 30 million reads per sample for robust gene expression information. For BRB-seq, a sequencing depth of one to five million reads per sample is sufficient to detect the majority of expressed genes in a sample. Lowly expressed genes can be detected by sequencing at higher depths.
BRB-seq sequencing data can be analyzed with standard open-source transcriptomic analysis tools such as STARsolo, which is designed to align multiplexed data and generate gene and UMI count matrices from raw FASTQ files for downstream RNA-seq analysis.
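The link between fragment size and molarity can be sketched with the conventional approximation for double-stranded DNA, which assumes an average mass of about 660 g/mol per base pair. The function name and the example concentration below are illustrative assumptions, not values from the BRB-seq protocol.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def library_molarity_nM(conc_ng_per_ul: float, avg_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/uL) to molarity (nM) for dsDNA.

    nM = (ng/uL) * 1e6 / (660 g/mol/bp * average fragment length in bp)
    """
    if conc_ng_per_ul < 0 or avg_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment size > 0")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * avg_fragment_bp)


# Hypothetical library: 2 ng/uL with an average fragment size of 550 bp.
molarity = library_molarity_nM(2.0, 550)
print(f"{molarity:.2f} nM")
```

At the same mass concentration, a library with longer fragments contains fewer molecules, which is why the average fragment size has to be measured before diluting.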
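The effect on a sequencing run can be worked through with simple arithmetic. The sketch below uses the per-sample depths quoted above and a full 384-sample pool; the function and variable names are illustrative only.

```python
def total_reads(n_samples: int, reads_per_sample: int) -> int:
    """Total reads needed to sequence every sample at the given depth."""
    return n_samples * reads_per_sample


N_SAMPLES = 384  # maximum pool size mentioned for BRB-seq

brb_low = total_reads(N_SAMPLES, 1_000_000)    # 1 million reads per sample
brb_high = total_reads(N_SAMPLES, 5_000_000)   # 5 million reads per sample
standard = total_reads(N_SAMPLES, 30_000_000)  # about 30 million reads per sample

print(f"BRB-seq:  {brb_low:,} to {brb_high:,} reads")
print(f"Standard: {standard:,} reads")
```

For 384 samples this gives roughly 0.38 to 1.92 billion reads at BRB-seq depths, against about 11.5 billion at 30 million reads per sample, a 6- to 30-fold difference in sequencing output.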
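The idea behind a UMI count matrix can be sketched in a few lines of Python. The example assumes that each aligned read has already been reduced to a (sample, gene, UMI) triple; it collapses only exact UMI matches and is a simplified illustration, not a description of how STARsolo or any other tool is implemented. The gene and well names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def umi_count_matrix(records: Iterable[tuple[str, str, str]]) -> dict[tuple[str, str], int]:
    """Count distinct UMIs per (sample, gene).

    Reads sharing the same sample, gene, and UMI are treated as PCR
    duplicates of one original mRNA molecule and counted once.
    """
    seen: dict[tuple[str, str], set[str]] = defaultdict(set)
    for sample, gene, umi in records:
        seen[(sample, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical aligned reads: (sample, gene, UMI).
reads = [
    ("well_A1", "GAPDH", "AAAACCCCGGGGTT"),
    ("well_A1", "GAPDH", "AAAACCCCGGGGTT"),  # PCR duplicate of the read above
    ("well_A1", "GAPDH", "TTTTGGGGCCCCAA"),
    ("well_A2", "GAPDH", "AAAACCCCGGGGTT"),  # same UMI, different sample
]
counts = umi_count_matrix(reads)
```

In this example, `counts` holds 2 molecules for GAPDH in `well_A1` (three reads, one of them a duplicate) and 1 in `well_A2`.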
BRB-seq is suitable for any study requiring genome-wide transcriptomic data. It is especially suited to studies with hundreds or thousands of samples thanks to its scalable, straightforward, and quick workflow, which is suitable for automation.
Artificial intelligence requires vast amounts of training data to reach robust and reliable conclusions about a drug's on- or off-target biological effects and their toxicogenomic profiles. BRB-seq is a cost-effective and time-efficient sequencing technology that allows pharmaceutical companies to extract more transcriptomic data at a lower cost to investigate the pharmacological effects of thousands of molecules on cells of interest simultaneously and at scale. [9]
BRB-seq has been used to discover a new type of cell that inhibits the formation of fat in humans, with the potential to improve treatments for obesity and type 2 diabetes; [10] to determine the expression of immune genes activated by SARS-CoV-2 at different temperatures in human airway cells; [11] and to discover genes that are turned on or off at different times of the day in the fruit fly. [12]
Researchers have also used Plant BRB-seq in agritranscriptomics to investigate the transcriptomic response of maize to nitrogen fertilizers. They found that a subset of stress-responsive genes was differentially expressed in response to varying levels of fertilizer. [13]