Rapid amplification of cDNA ends

Rapid amplification of cDNA ends (RACE) is a technique used in molecular biology to obtain the full-length sequence of an RNA transcript found within a cell. RACE produces a cDNA copy of the RNA sequence of interest by reverse transcription, followed by PCR amplification of the cDNA copies (see RT-PCR). The amplified cDNA copies are then sequenced and, if long enough, should map to a unique genomic region. RACE products are commonly cloned before sequencing, so that each sequenced clone traces back to what was originally an individual RNA molecule. A higher-throughput alternative, useful for identifying novel transcript structures, is to sequence the RACE products with next-generation sequencing technologies.
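
The point that a sufficiently long sequence should map to a unique genomic region can be illustrated with a minimal Python sketch. It assumes exact matching on both strands of a made-up toy genome; real read mapping uses aligners that tolerate mismatches, and the function names here are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def count_genomic_hits(genome: str, query: str) -> int:
    """Count exact, possibly overlapping matches of query on both genome strands.

    A palindromic query (equal to its own reverse complement) is searched once,
    because the set below collapses the two identical patterns.
    """
    hits = 0
    for pattern in {query, reverse_complement(query)}:
        start = 0
        while (idx := genome.find(pattern, start)) != -1:
            hits += 1
            start = idx + 1  # step by one base so overlapping matches count
    return hits


genome = "ACGTACGTACGTTTGACCATG"  # toy "genome", hypothetical
print(count_genomic_hits(genome, "ACG"))         # 6: a short query hits many places
print(count_genomic_hits(genome, "ACGTTTGACC"))  # 1: a longer query is unique
```

In a real genome the same trend holds statistically: the expected number of chance matches falls roughly fourfold with each added base, which is why longer RACE products are easier to place.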

Process

RACE can provide the sequence of an RNA transcript from a small known sequence within the transcript to the 5' end (5' RACE-PCR) or 3' end (3' RACE-PCR) of the RNA. This technique is sometimes called one-sided PCR or anchored PCR.

The first step in RACE is to use reverse transcription to produce a cDNA copy of a region of the RNA transcript. In this process, an unknown end portion of a transcript is copied using a known sequence from the center of the transcript as the priming site. The copied region is therefore bounded on one side by the known sequence and extends to either the 5' or the 3' end of the transcript.
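
A minimal sketch of this priming step, assuming a perfectly complementary primer, a single binding site and full-length extension by reverse transcriptase. The helper names and the toy sequences are hypothetical and only illustrate the strand logic.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def first_strand_cdna(mrna: str, gsp: str) -> str:
    """Idealized reverse transcription primed by an antisense gene-specific primer.

    Both sequences are written 5'->3'; U in the mRNA is converted to T.
    The primer anneals where the mRNA contains its reverse complement, and
    reverse transcriptase extends the primer's 3' end towards the mRNA 5' end.
    """
    template = mrna.upper().replace("U", "T")
    site_seq = reverse_complement(gsp)  # mRNA-sense sequence the primer pairs with
    site = template.find(site_seq)      # first perfect match only (simplification)
    if site == -1:
        raise ValueError("GSP has no perfectly complementary site in the mRNA")
    # The cDNA runs from the primer's 5' end to the copy of the mRNA 5' end,
    # i.e. it is the reverse complement of everything up to the end of the site.
    return reverse_complement(template[: site + len(site_seq)])


mrna = "ACUGAUGGCCAUUGCAGGUACC"  # toy transcript (hypothetical)
gsp = "CTGCAATG"                 # pairs with the known internal stretch CAUUGCAG
cdna = first_strand_cdna(mrna, gsp)
print(cdna)  # CTGCAATGGCCATCAGT
```

The product begins with the primer itself and ends with the complement of the transcript's first base, so everything between the known site and the 5' end is captured in a single strand.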

The protocols for 5' and 3' RACE differ slightly. 5' RACE-PCR begins with a first round of cDNA synthesis (reverse transcription) that uses mRNA as the template and an antisense (reverse) oligonucleotide primer recognizing a known sequence in the middle of the gene of interest; this primer is called a gene-specific primer (GSP). The primer binds to the mRNA, and the enzyme reverse transcriptase adds nucleotides to the 3' end of the primer to generate a specific single-stranded cDNA product, which is the reverse complement of the mRNA. Following cDNA synthesis, the enzyme terminal deoxynucleotidyl transferase (TdT) is used to add a string of identical nucleotides, known as a homopolymeric tail, to the 3' end of the cDNA. (Other ways of adding a 3'-terminal sequence to the first-strand cDNA can be much more efficient than homopolymeric tailing, but the principle of the method remains the same.) PCR is then carried out with a second antisense gene-specific primer (GSP2) that binds to the known sequence and a sense (forward) universal primer (UP) that binds to the homopolymeric tail added to the 3' ends of the cDNAs, amplifying a cDNA product that extends to the 5' end of the transcript.
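
The 5' RACE steps can be traced in silico with the minimal sketch below. It assumes a poly(C) tail added by TdT and a poly(G) universal primer (one possible pairing; the text above only requires a homopolymer), omits the anchor sequence that practical universal primers usually carry, and uses unrealistically short toy primers. All names and sequences are hypothetical.

```python
from typing import Optional

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def tdt_tail(cdna: str, base: str = "C", length: int = 10) -> str:
    """Idealized TdT step: append a homopolymer to the 3' end of the cDNA."""
    return cdna + base * length


def in_silico_pcr(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the product bounded by a sense primer and an antisense primer.

    template is a sense strand written 5'->3'; fwd must match it exactly and
    rev must match its reverse complement somewhere downstream of fwd.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = template.find(reverse_complement(rev), start + len(fwd))
    if rev_site == -1:
        return None
    return template[start : rev_site + len(rev)]


# First-strand cDNA: reverse complement of the mRNA 5' region (toy values).
mrna_5prime = "ACTGATGGCCATTGCAG"   # mRNA 5' end up to the GSP site, as DNA
cdna = reverse_complement(mrna_5prime)
tailed = tdt_tail(cdna, "C", 10)    # poly(C) tail on the cDNA 3' end
sense = reverse_complement(tailed)  # strand primed by the UP: poly(G) + mRNA 5' end
product = in_silico_pcr(sense, fwd="GGGGGG", rev="GCCATC")  # UP, GSP2
print(product)  # GGGGGGGGGGACTGATGGC
```

After the homopolymer, the product reads straight into the transcript's 5' end and stops at the GSP2 site, which is the "one-sided" amplification described above: only GSP2 needs prior sequence knowledge.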

3' RACE-PCR uses the natural poly(A) tail present at the 3' end of most eukaryotic mRNAs to prime reverse transcription, so this method does not require the addition of nucleotides by TdT. cDNAs are generated using an oligo-dT-adaptor primer (a primer with a short stretch of deoxythymidine nucleotides) that is complementary to the poly(A) stretch and adds a special adaptor sequence to the 5' end of each cDNA. PCR is then used to amplify the 3' end of the cDNA from a known region, using a sense GSP and an antisense adaptor primer that anneals to the adaptor region.
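
A comparable sketch for 3' RACE, assuming the oligo-dT-adaptor primer anneals exactly where the poly(A) tail begins and that the adaptor primer used in PCR has the same sequence as the adaptor. The adaptor sequence, primer sequences and function names are hypothetical.

```python
from typing import Optional

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str) -> Optional[str]:
    """Product bounded by a sense primer (fwd) and an antisense primer (rev)."""
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = template.find(reverse_complement(rev), start + len(fwd))
    if rev_site == -1:
        return None
    return template[start : rev_site + len(rev)]


def first_strand_3prime(mrna: str, adaptor: str, dt_len: int = 8) -> str:
    """Idealized oligo-dT-adaptor priming at the body/poly(A) junction.

    The primer is adaptor + dT(dt_len); reverse transcriptase extends it
    across the whole transcript body. In this sketch an A at the very end of
    the body cannot be told apart from the tail.
    """
    polya_start = len(mrna.rstrip("A"))
    primed = mrna[: polya_start + dt_len]  # body plus the A's the dT stretch covers
    return adaptor + reverse_complement(primed)


body = "ACTGATGGCCATTGCAGGTACC"  # toy transcript body, as DNA
mrna = body + "A" * 12           # with its poly(A) tail
adaptor = "GACTCGAG"             # hypothetical adaptor sequence
cdna = first_strand_3prime(mrna, adaptor)
sense = reverse_complement(cdna)  # second strand: body + A's + adaptor complement
product = in_silico_pcr(sense, fwd="CATTGCAG", rev=adaptor)  # sense GSP, adaptor primer
print(product)  # transcript from the GSP site, 8 A's, then the adaptor complement
```

No tailing step appears here: the poly(A) tail already supplies the common priming site that TdT has to create in 5' RACE.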

RACE-sequencing

The cDNA molecules generated by RACE can be sequenced using high-throughput sequencing technologies, an approach also called RACE-seq. Compared with the traditional characterization of RACE fragments by molecular cloning followed by Sanger sequencing of a few clones, high-throughput characterization saves time, is more sensitive and less costly, and is technically feasible.
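
One possible downstream step for 5' RACE-seq data is to trim the homopolymer anchor from each read and tally where the remaining sequence starts on a reference, which points to candidate transcription start sites. The sketch below assumes sense-orientation reads that begin with a poly(G) copy of the TdT tail, exact matching and a hypothetical toy reference; real pipelines use dedicated trimming and alignment tools.

```python
from collections import Counter
from typing import Iterable


def tally_5prime_ends(reads: Iterable[str], reference: str,
                      tail_base: str = "G", min_len: int = 8) -> Counter:
    """Count 0-based start positions of trimmed 5' RACE reads on a reference.

    Reads whose insert is shorter than min_len, absent from the reference,
    or present more than once (ambiguous) are skipped.
    """
    starts: Counter = Counter()
    for read in reads:
        # Strip the homopolymer anchor. A transcript whose first base equals
        # tail_base would lose that base too; this sketch cannot tell them apart.
        insert = read.lstrip(tail_base)
        if len(insert) < min_len:
            continue
        pos = reference.find(insert)
        if pos == -1 or reference.find(insert, pos + 1) != -1:
            continue  # unplaceable or not unique
        starts[pos] += 1
    return starts


reference = "CCTTACTGATGGCCATTGCAG"  # toy genomic region (hypothetical)
reads = [
    "GGGGGGGGGGACTGATGGC",   # starts at position 4
    "GGGGGGACTGATGGC",       # same start, shorter tail
    "GGGGGGGGTACTGATGGC",    # starts one base upstream, at position 3
    "GGGGGGGGAC",            # insert too short to place
    "GGGGGGCCCCAAAATTTT",    # not in the reference
]
counts = tally_5prime_ends(reads, reference)
print(counts.most_common())  # [(4, 2), (3, 1)]
```

Counting many reads per position, rather than a few clones, is what makes the high-throughput readout more sensitive: minor start sites show up as smaller peaks instead of being missed.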

History and applications

RACE can be used to amplify the unknown 5' (5' RACE) or 3' (3' RACE) parts of RNA molecules for which part of the sequence is known and can be targeted by a gene-specific primer. Combined with high-throughput sequencing to characterize the amplified RACE products, the approach can be applied to any type of coding or non-coding RNA molecule.

The idea of combining RACE with high-throughput sequencing was first introduced in 2009 as Deep-RACE, which was used to map the transcription start sites (TSS) of 17 genes in a single cell line. [1] The approach was first named RACE-seq in a 2014 study that used it to accurately map the cleavage sites of target RNA directed by synthetic siRNAs. [2] The methodology was further used to characterize full-length unknown parts of novel transcripts and fusion transcripts in colorectal cancer. [3] In another study, aimed at characterizing unknown transcript structures of lncRNAs, RACE was combined with semi-long 454 sequencing. [4]

References

  1. Olivarius, S; Plessy, C; Carninci, P (February 2009). "High-throughput verification of transcriptional starting sites by Deep-RACE". BioTechniques. 46 (2): 130–2. doi:10.2144/000113066. PMID 19317658.
  2. Denise, H; Moschos, SA; Sidders, B; Burden, F; Perkins, H; Carter, N; Stroud, T; Kennedy, M; Fancy, SA; Lapthorn, C; Lavender, H; Kinloch, R; Suhy, D; Corbau, R (4 February 2014). "Deep Sequencing Insights in Therapeutic shRNA Processing and siRNA Target Cleavage Precision". Molecular Therapy: Nucleic Acids. 3: e145. doi:10.1038/mtna.2013.73. PMC 3951910. PMID 24496437.
  3. Hoff, AM; Johannessen, B; Alagaratnam, S; Zhao, S; Nome, T; Løvf, M; Bakken, AC; Hektoen, M; Sveen, A; Lothe, RA; Skotheim, RI (3 November 2015). "Novel RNA variants in colorectal cancers". Oncotarget. 6 (34): 36587–602. doi:10.18632/oncotarget.5500. PMC 4742197. PMID 26474385.
  4. Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Santoyo-Lopez, Javier; Gonzalez, Jose Manuel; Tapanari, Electra; Mudge, Jonathan M.; Steward, Charles A.; Wilming, Laurens; Tanzer, Andrea; Howald, Cédric; Chrast, Jacqueline; Vela-Boza, Alicia; Rueda, Antonio; Lopez-Domingo, Francisco J.; Dopazo, Joaquin; Reymond, Alexandre; Guigó, Roderic; Harrow, Jennifer (17 August 2016). "Extension of human lncRNA transcripts by RACE coupled with long-read high-throughput sequencing (RACE-Seq)". Nature Communications. 7: 12339. Bibcode:2016NatCo...712339L. doi:10.1038/ncomms12339. PMC 4992054. PMID 27531712.