MicroRNA sequencing (miRNA-seq), a type of RNA-Seq, is the use of next-generation sequencing or massively parallel high-throughput DNA sequencing to sequence microRNAs, also called miRNAs. miRNA-seq differs from other forms of RNA-seq in that input material is often enriched for small RNAs. miRNA-seq allows researchers to examine tissue-specific expression patterns, disease associations, and isoforms of miRNAs, and to discover previously uncharacterized miRNAs. Evidence that dysregulated miRNAs play a role in diseases such as cancer [1] has positioned miRNA-seq as a potentially important future tool for diagnostics and prognostics as costs continue to decrease. [2] Like other miRNA profiling technologies, miRNA-seq has both advantages (sequence-independence, coverage) and disadvantages (high cost, infrastructure requirements, run length, and potential artifacts). [3]
MicroRNAs (miRNAs) are a family of small ribonucleic acids, 21-25 nucleotides in length, that modulate protein expression through transcript degradation, inhibition of translation, or sequestering transcripts. [4] [5] [6] The first miRNA to be discovered, lin-4, was found in a genetic mutagenesis screen to identify molecular elements controlling post-embryonic development of the nematode Caenorhabditis elegans. [7] The lin-4 gene encoded a 22-nucleotide RNA with conserved complementary binding sites in the 3’-untranslated region of the lin-14 mRNA transcript [8] and downregulated LIN-14 protein expression. [9] miRNAs are now thought to be involved in the regulation of many developmental and biological processes, including haematopoiesis (miR-181 in Mus musculus [10]), lipid metabolism (miR-14 in Drosophila melanogaster [11]) and neuronal development (lsy-6 in Caenorhabditis elegans [12]). [6] These discoveries necessitated development of techniques able to identify and characterize miRNAs, such as miRNA-seq.
MicroRNA sequencing (miRNA-seq) was developed to take advantage of next-generation sequencing or massively parallel high-throughput sequencing technologies in order to find novel miRNAs and their expression profiles in a given sample. Sequencing miRNAs is not in itself a new idea; initial methods relied on Sanger sequencing. Sequencing preparation involved creating libraries by cloning DNA reverse transcribed from endogenous small RNAs of 21–25 nucleotides, size-selected by column and gel electrophoresis. [13] However, this method is time- and resource-intensive, as each clone has to be individually amplified and prepared for sequencing. It also inadvertently favors miRNAs that are highly expressed. [6] Next-generation sequencing eliminates the need for the sequence-specific hybridization probes required in DNA microarray analysis as well as the laborious cloning required in the Sanger sequencing method. Additionally, next-generation sequencing platforms used in miRNA-seq allow large pools of small RNAs to be sequenced in a single sequencing run. [14]
miRNA-seq can be performed using a variety of sequencing platforms. The first analysis of small RNAs using miRNA-seq methods examined approximately 1.4 million small RNAs from the model plant Arabidopsis thaliana using Lynx Therapeutics' Massively Parallel Signature Sequencing (MPSS) platform. This study demonstrated the potential of novel, high-throughput sequencing technologies for the study of small RNAs, and it showed that genomes generate large numbers of small RNAs, with plants being particularly rich sources. [15] Later studies used other sequencing technologies, such as a study in C. elegans which identified 18 novel miRNA genes as well as a new class of nematode small RNAs termed 21U-RNAs. [16] Another study, comparing small RNA profiles of human cervical tumours and normal tissue, used the Illumina Genome Analyzer to identify 64 novel human miRNA genes as well as 67 differentially expressed miRNAs. [17] The Applied Biosystems SOLiD sequencing platform has also been used to examine the prognostic value of miRNAs in detecting human breast cancer. [18]
Sequence library construction can be performed using a variety of different kits depending on the high-throughput sequencing platform being employed. However, there are several common steps for small RNA sequencing preparation. [19] [20]
Total RNA Isolation
All the RNA in a given sample is extracted and isolated using a guanidinium isothiocyanate/phenol/chloroform (GITC/phenol) method or a commercial product such as Trizol (Invitrogen) reagent. A starting quantity of 50-100 μg total RNA is usually required for gel purification and size selection (1 g of tissue typically yields 1 mg of total RNA). [20] The quality of the RNA is also assessed, for example by running an RNA chip on a Caliper LabChipGX (Caliper Life Sciences).
Size Fractionation of small RNAs by Gel Electrophoresis
Isolated RNA is run on a denaturing polyacrylamide gel. A visualization method, such as radioactive 5’-32P-labeled oligonucleotides run alongside a size ladder, is used to identify the section of the gel containing RNA of the appropriate size, which reduces the amount of material ultimately sequenced. This step does not necessarily have to be carried out before the ligation and reverse transcription steps outlined below. [19] [20]
Ligation
The ligation step adds DNA adaptors to both ends of the small RNAs, which act as primer binding sites during reverse transcription and PCR amplification. An adenylated single-stranded DNA 3’ adaptor, followed by a 5’ adaptor, is ligated to the small RNAs using a ligating enzyme such as T4 RNA ligase 2. The adaptors are also designed to capture small RNAs with a 5’ phosphate group, which is characteristic of microRNAs, rather than RNA degradation products with a 5’ hydroxyl group. [19] [20]
Reverse Transcription and PCR Amplification
This step converts the adaptor-ligated small RNAs into cDNA clones used in the sequencing reaction. Many commercial kits are available that carry out this step using some form of reverse transcriptase. PCR is then carried out to amplify the pool of cDNA sequences. Primers designed with unique nucleotide tags can also be used in this step to create ID tags for multiplex sequencing of pooled libraries. [19] [20]
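To illustrate how such ID tags might be used after sequencing, the following is a minimal Python sketch that assigns pooled reads back to their samples. It assumes, purely for illustration, that the tag occupies the first bases of each read; the tag sequences, sample names, and the `demultiplex` function are hypothetical, and real platforms often read the index separately.

```python
from collections import defaultdict

# Hypothetical tag-to-sample table; real tags depend on the kit used.
SAMPLE_TAGS = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, tags=SAMPLE_TAGS):
    """Group reads by their leading ID tag and strip the tag.

    Assumes all tags have the same length and sit at the start of the read.
    Reads whose tag is not in the table are collected under "undetermined".
    """
    tag_len = len(next(iter(tags)))
    by_sample = defaultdict(list)
    for read in reads:
        sample = tags.get(read[:tag_len], "undetermined")
        by_sample[sample].append(read[tag_len:])
    return dict(by_sample)


if __name__ == "__main__":
    pooled = ["ACGTAAAA", "TGCACCCC", "GGGGTTTT"]
    for sample, sample_reads in demultiplex(pooled).items():
        print(sample, sample_reads)
```

Exact tag matching is used here for brevity; a production pipeline would typically tolerate sequencing errors in the tag.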
The actual RNA sequencing varies significantly depending on the platform used. Three common next-generation sequencing [21] platforms are pyrosequencing on the 454 Life Sciences platform, [22] polymerase-based sequencing-by-synthesis on the Illumina platform, [23] and sequencing by ligation on the ABI SOLiD platform. [24]
Central to miRNA-seq data analysis is the ability to 1) obtain miRNA abundance levels from sequence reads, 2) discover novel miRNAs, 3) determine which miRNAs are differentially expressed, and 4) identify their associated mRNA gene targets.
miRNAs may be preferentially expressed in certain cell types, tissues, stages of development, or in particular disease states such as cancer. [1] Since deep sequencing (miRNA-seq) generates millions of reads from a given sample, it allows miRNAs to be profiled, whether by quantifying their absolute abundance or by discovering their variants (known as isomiRs [25]). Note that because sequence reads are on average longer than a miRNA (17-25 nt), the 3’ and 5’ ends of the miRNA should be found on the same read. There are several miRNA abundance quantification algorithms. [21] [26] Their general steps are as follows: [27]
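As a rough illustration of the idea (not the procedure of any particular published algorithm), the Python sketch below trims a 3’ adaptor from each read, keeps inserts in the 17-25 nt range, and counts exact matches to a table of known mature miRNA sequences. The adaptor sequence, the reference entries, and the function names are placeholders chosen for this example; real tools align reads with mismatches and handle isomiRs.

```python
from collections import Counter

# Placeholder 3' adaptor and reference table, for illustration only.
ADAPTER_3P = "TGGAATTCTCGG"
KNOWN_MIRNAS = {
    "miR-example-1": "TGAGGTAGTAGGTTGTATAGTT",
    "miR-example-2": "TAGCTTATCAGACTGATGTTGA",
}


def trim_3p_adapter(read, adapter=ADAPTER_3P, min_len=17, max_len=25):
    """Return the sequence before the first adaptor occurrence.

    Returns None if the adaptor is absent or the remaining insert
    falls outside the expected miRNA size range.
    """
    idx = read.find(adapter)
    if idx == -1:
        return None
    insert = read[:idx]
    if min_len <= len(insert) <= max_len:
        return insert
    return None


def count_known_mirnas(reads, reference=KNOWN_MIRNAS):
    """Count reads whose trimmed insert exactly matches a known miRNA."""
    seq_to_name = {seq: name for name, seq in reference.items()}
    counts = Counter()
    for read in reads:
        insert = trim_3p_adapter(read)
        if insert is None:
            continue
        name = seq_to_name.get(insert)
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    reads = [
        KNOWN_MIRNAS["miR-example-1"] + ADAPTER_3P + "AAAA",
        KNOWN_MIRNAS["miR-example-2"] + ADAPTER_3P,
    ]
    print(count_known_mirnas(reads))
```

Because the whole miRNA fits within one read, finding the adaptor marks the 3’ end of the insert, which is why a simple substring search is enough for this sketch.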
Another advantage of miRNA-seq is that it allows the discovery of novel miRNAs that may have eluded traditional screening and profiling methods. [27] There are several novel miRNA discovery algorithms. Their general steps are as follows:
After the abundances of miRNAs are quantified for each sample, their expression levels can be compared between samples. One can then identify miRNAs that are preferentially expressed at particular time points, or in particular tissues or disease states. After normalizing for the number of mapped reads between samples, one can use a host of statistical tests (like those used in gene expression profiling) to determine differential expression.
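The normalization step can be sketched in Python as follows. This example scales raw counts to reads per million mapped reads and then computes a log2 fold change between two samples; it is one simple scheme among many, it does not perform a statistical test, and the function names and pseudocount are assumptions made for this illustration.

```python
import math


def reads_per_million(counts):
    """Scale raw miRNA counts by the total number of mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {name: c * 1_000_000 / total for name, c in counts.items()}


def log2_fold_change(rpm_a, rpm_b, pseudocount=1.0):
    """Return log2(B / A) per miRNA.

    A pseudocount is added to both values so that miRNAs absent
    from one sample do not cause a division by zero.
    """
    names = set(rpm_a) | set(rpm_b)
    return {
        n: math.log2(
            (rpm_b.get(n, 0.0) + pseudocount) / (rpm_a.get(n, 0.0) + pseudocount)
        )
        for n in names
    }


if __name__ == "__main__":
    # Hypothetical raw counts for two samples.
    control = reads_per_million({"miR-a": 750, "miR-b": 250})
    treated = reads_per_million({"miR-a": 500, "miR-b": 500})
    print(log2_fold_change(control, treated))
```

A fold change alone does not establish significance; with replicates, the normalized values would be passed to a statistical test of the kind used in gene expression profiling.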
Identifying a miRNA's mRNA targets provides an understanding of the genes or networks of genes whose expression it regulates. [31] Public databases provide predictions of miRNA targets. To better distinguish true positive predictions from false positives, however, miRNA-seq data can be integrated with mRNA-seq data to look for functional miRNA:mRNA pairs. RNA22, [32] TargetScan, [33] [34] [35] [36] [37] [38] miRanda, [39] and PicTar [40] are software designed for this purpose. The general steps are:
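One simple way to integrate the two data types, sketched below in Python, is to keep only those predicted miRNA:mRNA pairs whose expression levels are inversely correlated across matched samples, on the assumption that a miRNA that degrades its target should rise as the target falls. This is an illustrative heuristic rather than the method of any of the tools named above; the function names, the threshold, and the example data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def anticorrelated_pairs(predicted_pairs, mirna_expr, mrna_expr, threshold=-0.5):
    """Keep predicted (miRNA, mRNA) pairs with correlation <= threshold."""
    kept = []
    for mirna, mrna in predicted_pairs:
        r = pearson(mirna_expr[mirna], mrna_expr[mrna])
        if r <= threshold:
            kept.append((mirna, mrna, r))
    return kept


if __name__ == "__main__":
    # Hypothetical expression values across four matched samples.
    mirna_expr = {"miR-x": [1, 2, 3, 4]}
    mrna_expr = {"geneA": [8, 6, 4, 2], "geneB": [1, 2, 3, 4]}
    pairs = [("miR-x", "geneA"), ("miR-x", "geneB")]
    print(anticorrelated_pairs(pairs, mirna_expr, mrna_expr))
```

Targets regulated mainly through translational inhibition may show little change at the mRNA level, so a filter of this kind can miss genuine pairs.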
Many miRNAs function to direct cleavage of their mRNA targets; this is particularly true in plants. High-throughput sequencing methods have therefore been developed to take advantage of this property by sequencing the uncapped 5' ends of the 3' fragments of cleaved or degraded mRNAs. These methods are known as degradome sequencing or PARE. [41] [42] Validation of target cleavage in specific mRNAs is typically performed using a modified version of 5' Rapid Amplification of cDNA Ends with a gene-specific primer.
miRNA-seq has revealed novel miRNAs that previously eluded traditional miRNA profiling methods. Examples of such findings come from embryonic stem cells, [25] chicken embryos, [43] acute lymphoblastic leukaemia, [44] diffuse large B-cell lymphoma and B cells, [45] acute myeloid leukemia, [46] and lung cancer. [47]
MicroRNAs are important regulators of almost all cellular processes, such as survival, proliferation, and differentiation. Consequently, it is not unexpected that miRNAs are involved in various aspects of cancer through the regulation of oncogene and tumor suppressor gene expression. In combination with the development of high-throughput profiling methods, miRNAs have been identified as biomarkers for cancer classification, response to therapy, and prognosis. [48] Additionally, because miRNAs regulate gene expression, they can also reveal perturbations in important regulatory networks that may be driving a particular disorder. [48] Several applications of miRNAs as biomarkers and predictors of disease are given below.
Cancer type | miRNAs α | Ref. |
---|---|---|
Breast | ||
ER Status | miR-26a/b, miR-30 family, miR-29b, miR-155, miR-342, miR-206, miR-191 | [49] [50] [51] [52] |
PR status | let-7c, miR-29b, miR-26a, miR-30 family, miR-520g | [52] [53] |
HER2/neu status | miR-520d, miR-181c, miR-302c, miR-376b, miR-30e | [49] [53] |
Lung | ||
Squamous vs non-squamous cell | miR-205 | [54] |
Small cell vs non-small cell | miR-17-5p, miR-22, miR-24, miR-31 | [48] |
Gastric | ||
Diffuse vs intestinal | miR-29b/c, miR-30 family, miR-135a/b | [55] |
Endometrial | ||
Endometrioid vs uterine papillary | miR-19a/b, miR-30e-5p, miR-101, miR-452, miR-382, miR-15a, miR-29c | [56] |
Renal | ||
Clear cell vs papillary | miR-424, miR-203, miR-31, miR-126 | [57] |
Oncocytoma vs chromophobe | miR-200c, miR-139-5p | [57] |
Myeloma | ||
with t(14;16) | miR-1, miR-133a | [58] |
with t(4;14) | miR-203, miR-155, miR-375 | [58] |
with t(11;14) | miR-125a, miR-650, miR-184 | [58] |
Acute myeloid leukemia | ||
with t(15;17) | miR-382, miR-134, miR-376a, miR-127, miR-299-5p, miR-323 | [59] |
with t(8;21) or inv(16) | let-7b/c, miR-127 | [59] |
with NPM1 mutations | miR-10a/b, let-7, miR-29, miR-204, miR-128a, miR-196a/b | [59] [60] |
with FLT3 ITD | miR-155 | [59] [60] [61] |
Chronic lymphocytic leukemia | ||
ZAP-70 levels and IgVH status | miR-15a, miR-195, miR-221, miR-155, miR-23b | [62] |
Melanoma | ||
with BRAF V600E | miR-193a, miR-338, miR-565 | [63] |
Lymphoma | ||
Diffuse large B-cell lymphoma | hsa-miR-128, hsa-miR-129-3p, hsa-miR-152, hsa-miR-155, hsa-miR-185, hsa-miR-193a-5p, hsa-miR-196b, hsa-miR-199b-3p, hsa-miR-20b, hsa-miR-23a, hsa-miR-27a, hsa-miR-28-5p, hsa-miR-301a, hsa-miR-331-3p, hsa-miR-365, hsa-miR-625, hsa-miR-9 | [45] |
α This is not a comprehensive list of miRNAs involved with these malignancies.
The disadvantages of using miRNA-seq over other methods of miRNA profiling are that it is more expensive, generally requires a larger amount of total RNA, involves extensive amplification, and is more time-consuming than microarray and qPCR methods. [3] In addition, miRNA-seq library preparation methods appear to represent some members of the miRNA complement preferentially in a systematic way, which prevents accurate determination of miRNA abundance. [64] At the same time, the approach is hybridization-independent and therefore does not require a priori sequence information. Because of this, one can obtain sequences of novel miRNAs and miRNA isoforms (isomiRs), distinguish miRNAs with similar sequences, and identify point mutations. [65]
Feature | qPCR | Microarray | Sequencing |
---|---|---|---|
Throughput time | ~6 hours | ~2 days | 1–2 weeks |
Total RNA required | 500 ng | 100-1,000 ng | 500-5,000 ng |
Dynamic range detected | Six orders of magnitude | Four orders of magnitude | Five or more orders of magnitude |
Infrastructure and technical requirements | Few | Moderate | Substantial |
Cost per sample (USD) | $400 | $250–$350 | $500–$700 |