Small RNA sequencing

Small RNA sequencing (small RNA-Seq) is a type of RNA sequencing based on next-generation sequencing (NGS) technologies. It isolates and characterizes small noncoding RNA molecules in order to evaluate known small RNAs, discover new ones, and predict their possible functions. The technique separates small RNAs from the larger RNA population so that their roles in the cell and in gene expression can be better understood. Small RNA-Seq can analyze thousands of small RNA molecules with high throughput and specificity. A major advantage of RNA-Seq is that libraries of RNA fragments can be generated starting from the whole RNA content of a cell.

Introduction

Small RNAs are noncoding RNA molecules between 20 and 200 nucleotides in length. The term "small RNA" is rather arbitrary: it is loosely defined by length relative to regular RNAs such as messenger RNA (mRNA). Bacterial short regulatory RNAs have previously also been referred to as small RNAs, but they are not related to eukaryotic small RNAs. [1]

Descriptive scheme of RNA molecules

Small RNAs include several classes of noncoding RNAs that differ in size and function: snRNA, snoRNA, scRNA, piRNA, miRNA, YRNA, tsRNA, rsRNA, and siRNA. Their functions range from RNA interference (RNAi; specific to endogenously expressed miRNA and exogenously derived siRNA) to RNA processing and modification, gene silencing (e.g. X chromosome inactivation by Xist RNA), epigenetic modifications, and protein stability and transport.

Small RNA sequencing

Purification

This step is critical for any molecular technique, since it ensures that the small RNA fragments in the samples to be analyzed are of adequate purity and quality. Different purification methods can be used, depending on the purpose of the experiment:

Once small RNAs have been isolated, it is important to quantify them and to evaluate the quality of the purification. There are two different methods to do this:

Library preparation and amplification

Many NGS protocols rely on the production of a library containing thousands of fragments of the target nucleic acids, which are then sequenced on the chosen platform. Libraries are built differently depending on the sequencing method: with Ion Torrent technology, RNA fragments are attached directly to a magnetic bead through an adapter, whereas for Illumina sequencing the RNA fragments are first ligated to the adapters and then attached to the surface of a plate. In general, universal adapters A and B are ligated to the 5′ and 3′ ends of the RNA fragments by truncated T4 RNA ligase 2. The adapters contain known sequences, including unique molecular identifiers (UMIs), which are used to quantify small RNAs in a sample, and sample indexes, which distinguish RNA molecules derived from different samples.

After the adapters have been ligated to both ends of the small RNAs, reverse transcription produces complementary DNA molecules (cDNAs), which are then amplified by a technique that depends on the sequencing protocol (emulsion PCR for Ion Torrent, bridge PCR for Illumina) in order to obtain up to billions of amplicons to be sequenced. [4] In addition to the regular PCR mix, masking oligonucleotides targeting 5.8S rRNA are added to increase sensitivity to small RNA targets and to improve the amplification results. Care is needed because RNA samples are prone to degradation, and further improvement of the technique should aim at eliminating adapter dimers. [4]

Some RNA modifications, such as 5′-hydroxyl (5′-OH), 3′-phosphate (3′-P) and 2′,3′-cyclic phosphate (2′3′-cP), can block adapter ligation, while others, such as m1A, m3C, m1G and m2,2G, can interfere with reverse transcription. Small RNAs bearing one or more of these modifications are often inefficiently and incompletely converted into cDNA, which hampers their detection and quantitation by deep sequencing. This can be overcome by enzymatic pre-treatment, for example with PNK and AlkB. [5]
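The read structure produced by adapter ligation can be illustrated with a minimal Python sketch. It assumes a hypothetical layout in which each read starts with a fixed-length UMI, followed by the small RNA insert and then the 3′ adapter; the UMI length, the adapter sequence and the minimum insert length are illustrative assumptions, not values prescribed by any particular kit. Matching is exact, whereas dedicated trimming tools usually tolerate mismatches.

```python
from typing import Optional, Tuple

# Illustrative assumptions: replace with the values of the kit actually used.
UMI_LENGTH = 8
ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"
MIN_INSERT_LENGTH = 15


def split_read(
    read: str,
    umi_length: int = UMI_LENGTH,
    adapter: str = ADAPTER_3P,
    min_insert: int = MIN_INSERT_LENGTH,
) -> Optional[Tuple[str, str]]:
    """Split a read into (umi, insert) under the assumed layout.

    Returns None when the 3' adapter is not found (exact match only)
    or when the insert is shorter than min_insert, which is typical
    of adapter dimers and heavily degraded fragments.
    """
    # Search for the adapter only after the UMI.
    adapter_start = read.find(adapter, umi_length)
    if adapter_start == -1:
        return None
    umi = read[:umi_length]
    insert = read[umi_length:adapter_start]
    if len(insert) < min_insert:
        return None
    return umi, insert


if __name__ == "__main__":
    example = "ACGTACGT" + "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER_3P + "AAAA"
    print(split_read(example))
    # An adapter dimer has no insert between the UMI and the adapter.
    print(split_read("ACGTACGT" + ADAPTER_3P))
```

Discarding reads whose insert is too short is one simple way to keep adapter dimers out of downstream counts; the threshold is a judgment call that depends on the small RNA classes of interest.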

Sequencing

Depending on the purpose of the analysis, RNA-seq can be performed using different approaches:

Data analysis and storage

The final step is data analysis and storage. After the sequencing reads are obtained, UMI and index sequences are automatically removed from the reads, and read quality is assessed using Phred quality scores, which estimate the reliability of each base call. Reads can then be mapped (aligned) to a reference genome in order to extract information about their similarity. Reads with the same length, sequence and UMI are treated as copies of a single original molecule and are collapsed, so that the number of different UMIs observed for a given small RNA sequence reflects its copy number. Finally, small RNAs are quantified by assigning molecules to transcript annotations from different databases (miRBase, GtRNAdb and GENCODE). [4]
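The quality check and UMI-based counting can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline of any specific tool: it assumes that reads have already been split into UMI, insert sequence and quality string, that qualities use the common Phred+33 ASCII encoding, and that a mean quality of 20 is an acceptable cutoff. The function names are hypothetical. In practice, duplicates are usually collapsed after alignment, whereas here reads are grouped directly by sequence.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def count_molecules(
    reads: Iterable[Tuple[str, str, str]],
    min_mean_quality: float = 20.0,
) -> Dict[str, int]:
    """Count distinct UMIs per small RNA sequence.

    Each read is a (umi, sequence, quality) tuple. Reads below the
    quality cutoff are skipped; reads sharing both sequence and UMI
    are counted once, as copies of the same original molecule.
    """
    umis_per_sequence: Dict[str, Set[str]] = defaultdict(set)
    for umi, sequence, quality in reads:
        if mean_phred(quality) < min_mean_quality:
            continue
        umis_per_sequence[sequence].add(umi)
    return {seq: len(umis) for seq, umis in umis_per_sequence.items()}


if __name__ == "__main__":
    reads = [
        ("AAAA", "TGAGG", "IIIII"),  # original molecule
        ("AAAA", "TGAGG", "IIIII"),  # PCR duplicate: same sequence and UMI
        ("CCCC", "TGAGG", "IIIII"),  # second molecule of the same RNA
        ("GGGG", "TGAGG", "#####"),  # low quality, skipped
        ("AAAA", "ACGTA", "IIIII"),  # different RNA
    ]
    print(count_molecules(reads))
```

Using a set of UMIs per sequence makes amplification duplicates collapse automatically, which is why the resulting count approximates the number of molecules present before PCR rather than the number of reads.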

Applications

Small RNA sequencing can be useful for:

Related Research Articles

The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The term transcriptome is a portmanteau of the words transcript and genome; it is associated with the process of transcript production during the biological process of transcription.

DNA sequencing Process of determining the order of nucleotides in DNA molecules

DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.

Cross-linking immunoprecipitation (CLIP) is a method used in molecular biology that combines UV cross-linking with immunoprecipitation in order to analyse protein interactions with RNA or to precisely locate RNA modifications. CLIP-based techniques can be used to map RNA binding protein binding sites or RNA modification sites of interest on a genome-wide scale, thereby increasing the understanding of post-transcriptional regulatory networks.

ABI Solid Sequencing

SOLiD (Sequencing by Oligonucleotide Ligation and Detection) is a next-generation DNA sequencing technology developed by Life Technologies and has been commercially available since 2006. This next generation technology generates 10^8 to 10^9 small sequence reads at one time. It uses 2 base encoding to decode the raw data generated by the sequencing platform into sequence data.

ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations.

RNA-Seq Lab technique in cellular biology

RNA-Seq is a sequencing technique which uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment, analyzing the continuously changing cellular transcriptome.

Illumina dye sequencing

Illumina dye sequencing is a technique used to determine the series of base pairs in DNA, also known as DNA sequencing. The reversible terminated chemistry concept was invented by Bruno Canard and Simon Sarfati at the Pasteur Institute in Paris. It was developed by Shankar Balasubramanian and David Klenerman of Cambridge University, who subsequently founded Solexa, a company later acquired by Illumina. This sequencing method is based on reversible dye-terminators that enable the identification of single nucleotides as they are washed over DNA strands. It can also be used for whole-genome and region sequencing, transcriptome analysis, metagenomics, small RNA discovery, methylation profiling, and genome-wide protein-nucleic acid interaction analysis.

Single cell sequencing examines the sequence information from individual cells with optimized next-generation sequencing (NGS) technologies, providing a higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment. For example, in cancer, sequencing the DNA of individual cells can give information about mutations carried by small populations of cells. In development, sequencing the RNAs expressed by individual cells can give insight into the existence and behavior of different cell types. In microbial systems, a population of the same species can appear to be genetically clonal, but single-cell sequencing of RNA or epigenetic modifications can reveal cell-to-cell variability that may help populations rapidly adapt to survive in changing environments.

G&T-seq is a novel form of single cell sequencing technique allowing one to simultaneously obtain both transcriptomic and genomic data from single cells, allowing for direct comparison of gene expression data to its corresponding genomic data in the same cell...

Translation complex profile sequencing (TCP-seq) is a molecular biology method for obtaining snapshots of momentary distribution of protein synthesis complexes along messenger RNA (mRNA) chains.

Coverage in DNA sequencing is the number of unique reads that include a given nucleotide in the reconstructed sequence. Deep sequencing refers to the general concept of aiming for high number of unique reads of each region of a sequence.

Epitranscriptomic sequencing

In epitranscriptomic sequencing, most methods focus on either (1) enrichment and purification of the modified RNA molecules before running on the RNA sequencer, or (2) improving or modifying bioinformatics analysis pipelines to call the modification peaks. Most methods have been adapted and optimized for mRNA molecules, except for modified bisulfite sequencing for profiling 5-methylcytidine which was optimized for tRNAs and rRNAs.

Third-generation sequencing is a class of DNA sequencing methods currently under active development.

Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.

Time-resolved RNA sequencing methods are applications of RNA-seq that allow for observations of RNA abundances over time in a biological sample or samples. Second-Generation DNA sequencing has enabled cost effective, high throughput and unbiased analysis of the transcriptome. Normally, RNA-seq is only capable of capturing a snapshot of the transcriptome at the time of sample collection. This necessitates multiple samplings at multiple time points, which increases both monetary and time costs for experiments. Methodological and technological innovations have allowed for the analysis of the RNA transcriptome over time without requiring multiple samplings at various time points.

Spatial transcriptomics Range of methods designed for assigning cell types

Spatial transcriptomics is an overarching term for a range of methods designed for assigning cell types to their locations in histological sections. This method can also be used to determine the subcellular localization of mRNA molecules. The term is a variation of Spatial Genomics, first described by Doyle et al. in 2000 and then expanded upon by Ståhl et al. in a technique developed in 2016, which has since undergone a variety of improvements and modifications.

CITE-Seq is a method for performing RNA sequencing along with gaining quantitative and qualitative information on surface proteins with available antibodies on a single cell level. So far, the method has been demonstrated to work with only a few proteins per cell. As such, it provides an additional layer of information for the same cell by combining both proteomics and transcriptomics data. For phenotyping, this method has been shown to be as accurate as flow cytometry by the groups that developed it. It is currently one of the main methods, along with REAP-Seq, to evaluate both gene expression and protein levels simultaneously in different species.

BLESS, also known as breaks labeling, enrichment on streptavidin and next-generation sequencing, is a method used to detect genome-wide double-strand DNA damage. In contrast to chromatin immunoprecipitation (ChIP)-based methods of identifying DNA double-strand breaks (DSBs) by labeling DNA repair proteins, BLESS utilizes biotinylated DNA linkers to directly label genomic DNA in situ which allows for high-specificity enrichment of samples on streptavidin beads and the subsequent sequencing-based DSB mapping to nucleotide resolution.

Translatomics

Translatomics is the study of all open reading frames (ORFs) that are being actively translated in a cell or organism. This collection of ORFs is called the translatome. Characterizing a cell's translatome can give insight into the array of biological pathways that are active in the cell. According to the central dogma of molecular biology, the DNA in a cell is transcribed to produce RNA, which is then translated to produce a protein. Thousands of proteins are encoded in an organism's genome, and the proteins present in a cell cooperatively carry out many functions to support the life of the cell. Under various conditions, such as during stress or specific timepoints in development, the cell may require different biological pathways to be active, and therefore require a different collection of proteins. Depending on intrinsic and environmental conditions, the collection of proteins being made at one time varies. Translatomic techniques can be used to take a "snapshot" of this collection of actively translating ORFs, which can give information about which biological pathways the cell is activating under the present conditions.

References

  1. Kim, V. Narry; Han, Jinju; Siomi, Mikiko C. (February 2009). "Biogenesis of small RNAs in animals". Nature Reviews. Molecular Cell Biology. 10 (2): 126–139. doi:10.1038/nrm2632. ISSN 1471-0080. PMID 19165215. S2CID 8360619.
  2. Citartan M, Tan SC, Tang TH (January 2012). "A rapid and cost effective method in purifying small RNA". World Journal of Microbiology and Biotechnology. 28 (1): 105–111. doi:10.1007/s11274-011-0797-0. PMID 22806785.
  3. Donald C. Rio, Manuel Ares Jr, Gregory J. Hannon, and Timothy W. Nilsen (2011). "RNA: A Laboratory Manual". CSHL Press.
  4. Hagemann-Jensen M, Abdullayev I, Sandberg R, Faridani OR (October 2018). "Small-seq for single-cell small-RNA sequencing". Nature Protocols. 13 (10): 2407–2424. doi:10.1038/s41596-018-0049-y. PMID 30250291.
  5. Shi, Junchao; Zhang, Yunfang; Tan, Dongmei; Zhang, Xudong; Yan, Menghong; Zhang, Ying; Franklin, Reuben; et al. (April 2021). "PANDORA-seq expands the repertoire of regulatory small RNAs by overcoming RNA modifications". Nature Cell Biology. 23 (4): 424–436. doi:10.1038/s41556-021-00652-7. ISSN 1476-4679. PMC 8236090. PMID 33820973.
  6. Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, Leamon JH, Johnson K, Milgrew MJ, Edwards M, Hoon J, Simons JF, Marran D, Myers JW, Davidson JF, Branting A, Nobile JR, Puc BP, Light D, Clark TA, Huber M, Branciforte JT, Stoner IB, Cawley SE, Lyons M, Fu Y, Homer N, Sedova M, Miao X, Reed B, Sabina J, Feierstein E, Schorn M, Alanjary M, Dimalanta E, Dressman D, Kasinskas R, Sokolsky T, Fidanza JA, Namsaraev E, McKernan KJ, Williams A, Roth GT, Bustillo J (July 2011). "An integrated semiconductor device enabling non-optical genome sequencing". Nature. 475 (7356): 348–352. doi:10.1038/nature10242. PMID 21776081.
  7. "Small RNA Sequencing | Small RNA and miRNA profiling and discovery". www.illumina.com. Retrieved 2018-11-28.
  8. Quail MA, Smith M, Coupland P, Otto TD, Harris SR, Connor TR, Bertoni A, Swerdlow HP, Gu Y (July 2012). "A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers". BMC Genomics. 13: 341. doi:10.1186/1471-2164-13-341. PMID 22827831.
