Complementary DNA

Output from a cDNA microarray used in testing

In genetics, complementary DNA (cDNA) is DNA that was reverse transcribed (via reverse transcriptase) from an RNA (e.g., messenger RNA or microRNA). cDNA exists in both single-stranded and double-stranded forms and in both natural and engineered forms.

In engineered forms, it is often a copy of a protein-coding sequence from a particular organism's genome: the organism's own mRNA was naturally transcribed from its DNA, and the cDNA is reverse transcribed from the mRNA, yielding a DNA copy of the transcribed sequence (in eukaryotes, without the introns that splicing removed from the mature mRNA). Engineered cDNA is often used to express a specific protein in a cell that does not normally express that protein (i.e., heterologous expression), or to sequence or quantify mRNA molecules using DNA-based methods (qPCR, RNA-seq). cDNA that codes for a specific protein can be transferred to a recipient cell for expression as part of recombinant DNA, often in bacterial or yeast expression systems. [1] cDNA is also generated to analyze transcriptomic profiles in bulk tissue, single cells, or single nuclei in assays such as microarrays, qPCR, and RNA-seq.

In natural forms, cDNA is produced by retroviruses (such as HIV-1, HIV-2, and simian immunodeficiency virus) and then integrated into the host's genome, where it forms a provirus. [2]

The term cDNA is also used, typically in a bioinformatics context, to refer to an mRNA transcript's sequence, expressed as DNA bases (deoxy-GCAT) rather than RNA bases (GCAU).
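The bioinformatics usage described above amounts to a change of alphabet. The following is a minimal sketch, assuming a plain uppercase or lowercase RNA string; the function name and the example sequence are made up for illustration and do not come from any particular library.

```python
def mrna_to_cdna_notation(mrna: str) -> str:
    """Write an mRNA sequence in DNA letters (U -> T), keeping 5'->3' order."""
    mrna = mrna.upper()
    unexpected = set(mrna) - set("GCAU")
    if unexpected:
        # Reject anything that is not one of the four RNA bases.
        raise ValueError(f"non-RNA characters: {sorted(unexpected)}")
    return mrna.replace("U", "T")


example_mrna = "AUGGCCUUUAAA"  # made-up sequence, for illustration only
print(mrna_to_cdna_notation(example_mrna))  # ATGGCCTTTAAA
```

Note that this convention keeps the sense and orientation of the transcript; it is not the reverse complement that a reverse transcriptase physically synthesizes.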

The patentability of cDNA was the subject of a 2013 US Supreme Court decision in Association for Molecular Pathology v. Myriad Genetics, Inc. As a compromise, the Court declared that exons-only cDNA is patent-eligible, whereas isolated sequences of naturally occurring DNA comprising introns are not.

Synthesis

RNA serves as a template for cDNA synthesis. [3] In cellular life, cDNA is generated by viruses and retrotransposons for integration of RNA into target genomic DNA. In molecular biology, RNA is purified from source material after genomic DNA, proteins and other cellular components are removed. cDNA is then synthesized through in vitro reverse transcription. [4]

RNA purification

RNA is transcribed from genomic DNA in host cells and is extracted by first lysing the cells and then purifying the RNA with widely used approaches such as phenol-chloroform, silica-column, and bead-based extraction. [5] Extraction methods vary depending on the source material. For example, extracting RNA from plant tissue requires additional reagents, such as polyvinylpyrrolidone (PVP), to remove phenolic compounds, carbohydrates, and other compounds that would otherwise render the RNA unusable. [6] To remove DNA and proteins, enzymes such as DNase and Proteinase K are used for degradation. [7] Importantly, RNA integrity is maintained by inactivating RNases with denaturing agents such as the chaotrope guanidinium isothiocyanate, sodium dodecyl sulphate (SDS), phenol, or chloroform. Total RNA is then separated from other cellular components and precipitated with alcohol. Various commercial kits exist for simple and rapid RNA extraction for specific applications. [8] Additional bead-based methods can be used to isolate specific subtypes of RNA (e.g., mRNA and microRNA) based on size or unique RNA regions. [9] [10]

Reverse transcription

First-strand synthesis

Using a reverse transcriptase enzyme and purified RNA templates, one strand of cDNA is produced (first-strand cDNA synthesis). The M-MLV reverse transcriptase from the Moloney murine leukemia virus is commonly used because its reduced RNase H activity suits reverse transcription of longer RNAs. [11] The AMV reverse transcriptase from the avian myeloblastosis virus may also be used for RNA templates with strong secondary structures (i.e., high melting temperature). [12] cDNA is commonly generated from mRNA for gene expression analyses such as RT-qPCR and RNA-seq. [13] mRNA is selectively reverse transcribed using oligo-dT primers that are the reverse complement of the poly-adenylated tail on the 3' end of most eukaryotic mRNA. The oligo-dT primer anneals to the poly-adenylated tail of the mRNA and provides the primed site from which the reverse transcriptase begins reverse transcription. An optimized mixture of oligo-dT and random hexamer primers increases the chance of obtaining full-length cDNA while reducing 5' or 3' bias. [14] Ribosomal RNA may also be depleted to enrich both mRNA and non-poly-adenylated transcripts such as some non-coding RNA. [15]
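Oligo-dT priming can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a description of enzyme kinetics: the primer is assumed to anneal at the very 3' end of the tail, the `min_tail` threshold of 10 adenines is an arbitrary illustrative value, and the function names are hypothetical.

```python
# Base pairing used when copying RNA into DNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def poly_a_tail_length(mrna: str) -> int:
    """Count the consecutive adenines at the 3' end of the transcript."""
    return len(mrna) - len(mrna.rstrip("A"))


def first_strand_cdna(mrna: str, min_tail: int = 10):
    """Return the first-strand cDNA (5'->3'), or None if oligo-dT cannot prime.

    The first strand is the reverse complement of the RNA template, so it
    starts with the run of T that paired with the poly(A) tail.
    """
    mrna = mrna.upper()
    if poly_a_tail_length(mrna) < min_tail:
        return None  # no usable poly(A) tail: not selected by oligo-dT
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


polyadenylated = "AUGGCUUAA" + "A" * 12  # made-up transcript
print(first_strand_cdna(polyadenylated))  # 14 T's followed by AAGCCAT
print(first_strand_cdna("AUGGCUUAA"))     # None
```

In this model a transcript without a tail is simply skipped, which mirrors why random hexamers or rRNA depletion are used when non-poly-adenylated RNA is of interest.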

Second-strand synthesis

The products of first-strand synthesis, RNA-DNA hybrids, can be processed through one of several second-strand synthesis methods or used directly in downstream assays. [16] [17] An early method known as hairpin-primed synthesis relied on hairpin formation at the 3' end of the first-strand cDNA to prime second-strand synthesis. However, priming is random and hairpin hydrolysis leads to loss of information. The Gubler and Hoffman procedure uses E. coli RNase H to nick the mRNA, which is then replaced with DNA by E. coli DNA polymerase I and sealed with E. coli DNA ligase. An optimization of this procedure relies on the low RNase H activity of M-MLV reverse transcriptase to nick the mRNA, with the remaining RNA removed by adding RNase H after DNA polymerase has synthesized the second-strand cDNA. This prevents the loss of sequence information at the 5' end of the mRNA.
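At the sequence level, the outcome of second-strand synthesis is simply the reverse complement of the first strand, whichever enzymatic route produced it. The sketch below assumes an idealized, full-length first strand with no lost 5' information; the names and the example sequence are illustrative only.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5'->3'."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def second_strand(first_strand: str) -> str:
    """Idealized second-strand cDNA templated on the first strand."""
    return reverse_complement(first_strand)


first = "TTTTTTAAGCCAT"        # made-up first strand, begins with oligo-dT
second = second_strand(first)
print(second)                  # ATGGCTTAAAAAA
```

The second strand carries the same sequence as the original mRNA written in DNA letters, which is why double-stranded cDNA can be cloned and expressed.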

Applications

Complementary DNA is often used in gene cloning, as gene probes, or in the creation of a cDNA library. When scientists transfer a gene from one cell into another cell in order to express the new genetic material as a protein in the recipient cell, the cDNA is added to the recipient (rather than the entire gene), because the DNA for an entire gene may include sequence that does not code for the protein or that interrupts the coding sequence of the protein (e.g., introns). Partial sequences of cDNAs are often obtained as expressed sequence tags.
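The difference between a whole gene and its cDNA can be shown with a toy gene model. This is a sketch assuming exon coordinates are already known and given as 0-based, half-open intervals; the gene sequence and coordinates are invented for the example.

```python
def spliced_cdna(gene: str, exons) -> str:
    """Join the exon intervals of a gene, dropping everything between them.

    `exons` is a list of (start, end) pairs, 0-based and half-open,
    in gene coordinates.
    """
    return "".join(gene[start:end] for start, end in sorted(exons))


# Made-up gene: exon 1, one intron, exon 2.
gene = "ATGGCC" + "GTAAGTTTTCAG" + "TTTTAA"
exons = [(0, 6), (18, 24)]

print(len(gene))                  # 24
print(spliced_cdna(gene, exons))  # ATGGCCTTTTAA
```

The 12-base cDNA reads as one uninterrupted coding sequence, whereas the 24-base gene would require the recipient cell to splice out the intron first.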

With amplification of DNA sequences via polymerase chain reaction (PCR) now commonplace, reverse transcription is typically conducted as an initial step, followed by PCR to obtain an exact cDNA sequence for intracellular expression. This is achieved by designing sequence-specific DNA primers that hybridize to the 5' and 3' ends of a cDNA region coding for a protein. Once amplified, the sequence can be cut at each end with nucleases (typically restriction enzymes) and inserted into one of many small circular DNA sequences known as expression vectors. Such vectors allow for self-replication inside the cells and, potentially, integration into the host DNA. They typically also contain a strong promoter to drive transcription of the target cDNA into mRNA, which is then translated into protein.
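The primer design step can be sketched as follows. This is a deliberately naive illustration: primers are taken as fixed-length exact matches to the ends of the coding region, with no check of melting temperature, specificity, or added restriction sites, and all names and sequences are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def design_primers(coding_region: str, length: int = 18):
    """Return (forward, reverse) primers, both written 5'->3'."""
    if len(coding_region) < 2 * length:
        raise ValueError("coding region too short for non-overlapping primers")
    forward = coding_region[:length]                       # matches the 5' end
    reverse = reverse_complement(coding_region[-length:])  # anneals at the 3' end
    return forward, reverse


def amplicon(template: str, forward: str, reverse: str):
    """Return the region of `template` bounded by the two primers, or None."""
    start = template.find(forward)
    end = template.find(reverse_complement(reverse))
    if start == -1 or end == -1 or end < start:
        return None
    return template[start:end + len(reverse)]


# Made-up coding region embedded in flanking untranslated sequence.
cds = "ATGGCCATTGTAATGGGCCGCTGCAAGGGTGCCCGATAG"
cdna = "GGGAAA" + cds + "TTTCCC"

fwd, rev = design_primers(cds, length=12)
print(fwd, rev)
print(amplicon(cdna, fwd, rev) == cds)  # True
```

Only the region between the primers is recovered, so the flanking sequence in the full cDNA is excluded from the product that goes into the vector.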

cDNA is also used to study gene expression via methods such as RNA-seq or RT-qPCR. [18] [19] [20] For sequencing, RNA must be fragmented because sequencing platforms limit the size of the molecules they can read. Additionally, second-strand synthesized cDNA must be ligated with adapters that allow cDNA fragments to be PCR amplified and to bind to sequencing flow cells. Gene-specific analyses commonly use microarrays or RT-qPCR to quantify cDNA levels by fluorometric and other detection methods.
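The fragmentation and adapter steps can be pictured with a small string model. This is a sketch only: real fragmentation is random rather than fixed-size, and the adapter sequences below are placeholders invented for the example, not those of any sequencing kit.

```python
LEFT_ADAPTER = "AAAACCCC"   # placeholder sequence, not a real adapter
RIGHT_ADAPTER = "GGGGTTTT"  # placeholder sequence, not a real adapter


def fragment(cdna: str, size: int) -> list:
    """Cut a cDNA into consecutive pieces of at most `size` bases."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [cdna[i:i + size] for i in range(0, len(cdna), size)]


def ligate_adapters(fragments, left=LEFT_ADAPTER, right=RIGHT_ADAPTER) -> list:
    """Flank every fragment with the same pair of adapter sequences."""
    return [left + frag + right for frag in fragments]


library = ligate_adapters(fragment("ATGGCCATTGTAATGGGC", 6))
for molecule in library:
    print(molecule)
```

Because every molecule in the library shares the same ends, a single primer pair matching the adapters can amplify all fragments regardless of their internal sequence.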

On 13 June 2013, the United States Supreme Court ruled in the case of Association for Molecular Pathology v. Myriad Genetics that while naturally occurring genes cannot be patented, cDNA is patent-eligible because it does not occur naturally. [21]

Viruses and retrotransposons

Some viruses also use cDNA to turn their viral RNA into mRNA (viral RNA → cDNA → mRNA). The mRNA is used to make viral proteins to take over the host cell.

An example of this first step from viral RNA to cDNA can be seen in the HIV cycle of infection. Here, the virus's lipid envelope fuses with the host cell membrane, which allows the viral capsid, carrying two copies of the viral genomic RNA, to enter the host cell. The cDNA copy is then made through reverse transcription of the viral RNA, a process facilitated by the chaperone CypA and a viral capsid-associated reverse transcriptase. [22]

cDNA is also generated by retrotransposons in eukaryotic genomes. Retrotransposons are mobile genetic elements that move themselves within, and sometimes between, genomes via RNA intermediates. This mechanism is shared with viruses with the exclusion of the generation of infectious particles. [23] [24]


References

Mark D. Adams et al. "Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project." Science (American Association for the Advancement of Science) 252.5013 (1991): 1651–1656. Web.

Philip M. Murphy, and H. Lee Tiffany. "Cloning of Complementary DNA Encoding a Functional Human Interleukin-8 Receptor." Science (American Association for the Advancement of Science) 253.5025 (1991): 1280–1283. Web.

  1. Hastings, P. J. (1 January 2001), "Complementary DNA (cDNA)", in Brenner, Sydney; Miller, Jefferey H. (eds.), Encyclopedia of Genetics, New York: Academic Press, p. 433, ISBN 978-0-12-227080-2, retrieved 29 November 2022
  2. Croy, Ron. "Molecular Genetics II - Genetic Engineering Course (Supplementary notes)". Durham University durham.ac.uk; 20 April 1998. Archived from the original on 24 August 2002. Retrieved 4 February 2015.
  3. Ying, Shao-Yao (1 July 2004). "Complementary DNA libraries". Molecular Biotechnology. 27 (3): 245–252. doi:10.1385/MB:27:3:245. ISSN 1559-0305. PMID 15247497. S2CID 25600775.
  4. "5 Steps to Optimal cDNA Synthesis - US". www.thermofisher.com. Retrieved 12 May 2020.
  5. Tavares, Lucélia; Alves, Paula M.; Ferreira, Ricardo B.; Santos, Claudia N. (6 January 2011). "Comparison of different methods for DNA-free RNA isolation from SK-N-MC neuroblastoma". BMC Research Notes. 4 (1): 3. doi:10.1186/1756-0500-4-3. ISSN 1756-0500. PMC 3050700. PMID 21211020.
  6. Kansal, R.; Kuhar, K.; Verma, I.; Gupta, R. N.; Gupta, V. K.; Koundal, K. R. (December 2008). "Improved and Convenient Method of RNA Isolation From Polyphenols and Polysaccharide Rich Plant Tissues". Indian Journal of Experimental Biology. 46 (12): 842–5. PMID 19245182.
  7. Vomelová, I.; Vanícková, Z.; Sedo, A. (2009). "Methods of RNA Purification. All Ways (Should) Lead to Rome". Folia Biologica. 55 (6): 243–51. PMID 20163774.
  8. Sellin Jeffries, Marlo K.; Kiss, Andor J.; Smith, Austin W.; Oris, James T. (14 November 2014). "A comparison of commercially-available automated and manual extraction kits for the isolation of total RNA from small tissue samples". BMC Biotechnology. 14 (1): 94. doi:10.1186/s12896-014-0094-8. ISSN 1472-6750. PMC 4239376. PMID 25394494.
  9. "mRNA Isolation with Dynabeads in 15 minutes - US". www.thermofisher.com. Retrieved 20 May 2020.
  10. Gaarz, Andrea; Debey-Pascher, Svenja; Classen, Sabine; Eggle, Daniela; Gathof, Birgit; Chen, Jing; Fan, Jian-Bing; Voss, Thorsten; Schultze, Joachim L.; Staratschek-Jox, Andrea (May 2010). "Bead Array–Based microRNA Expression Profiling of Peripheral Blood and the Impact of Different RNA Isolation Approaches". The Journal of Molecular Diagnostics. 12 (3): 335–344. doi:10.2353/jmoldx.2010.090116. ISSN 1525-1578. PMC 2860470. PMID 20228267.
  11. Haddad, Fadia; Baldwin, Kenneth M. (2010), King, Nicola (ed.), "Reverse Transcription of the Ribonucleic Acid: The First Step in RT-PCR Assay", RT-PCR Protocols: Second Edition, Methods in Molecular Biology, vol. 630, Humana Press, pp. 261–270, doi:10.1007/978-1-60761-629-0_17, ISBN 978-1-60761-629-0, PMID 20301003
  12. Martin, Karen. "Reverse Transcriptase & cDNA Overview & Applications". Gold Biotechnology. Retrieved 20 May 2020.
  13. "qPCR, Microarrays or RNA Sequencing - What to Choose?". BioSistemika. 10 August 2017. Retrieved 20 May 2020.
  14. "cDNA Synthesis | Bio-Rad". www.bio-rad.com. Retrieved 28 May 2020.
  15. Herbert, Zachary T.; Kershner, Jamie P.; Butty, Vincent L.; Thimmapuram, Jyothi; Choudhari, Sulbha; Alekseyev, Yuriy O.; Fan, Jun; Podnar, Jessica W.; Wilcox, Edward; Gipson, Jenny; Gillaspy, Allison (15 March 2018). "Cross-site comparison of ribosomal depletion kits for Illumina RNAseq library construction". BMC Genomics. 19 (1): 199. doi:10.1186/s12864-018-4585-1. ISSN 1471-2164. PMC 6389247. PMID 29703133.
  16. Invitrogen. "cDNA Synthesis System" (PDF). Thermofisher. Archived (PDF) from the original on 22 December 2018. Retrieved 27 May 2020.
  17. Agarwal, Saurabh; Macfarlan, Todd S.; Sartor, Maureen A.; Iwase, Shigeki (21 January 2015). "Sequencing of first-strand cDNA library reveals full-length transcriptomes". Nature Communications. 6 (1): 6002. Bibcode:2015NatCo...6.6002A. doi:10.1038/ncomms7002. ISSN 2041-1723. PMC 5054741. PMID 25607527.
  18. Derisi, J.; Penland, L.; Brown, P. O.; Bittner, M. L.; Meltzer, P. S.; Ray, M.; Chen, Y.; Su, Y. A.; Trent, J. M. (December 1996). "Use of a cDNA microarray to analyse gene expression patterns in human cancer". Nature Genetics. 14 (4): 457–460. doi:10.1038/ng1296-457. ISSN 1546-1718. PMID 8944026. S2CID 23091561.
  19. White, Adam K.; VanInsberghe, Michael; Petriv, Oleh I.; Hamidi, Mani; Sikorski, Darek; Marra, Marco A.; Piret, James; Aparicio, Samuel; Hansen, Carl L. (23 August 2011). "High-throughput microfluidic single-cell RT-qPCR". Proceedings of the National Academy of Sciences. 108 (34): 13999–14004. Bibcode:2011PNAS..10813999W. doi:10.1073/pnas.1019446108. ISSN 0027-8424. PMC 3161570. PMID 21808033.
  20. Hrdlickova, Radmila; Toloue, Masoud; Tian, Bin (January 2017). "RNA-Seq methods for transcriptome analysis". Wiley Interdisciplinary Reviews. RNA. 8 (1): e1364. doi:10.1002/wrna.1364. ISSN 1757-7004. PMC 5717752. PMID 27198714.
  21. Liptak, Adam (13 June 2013). "Supreme Court Rules Human Genes May Not Be Patented". The New York Times. Archived from the original on 1 January 2022. Retrieved 14 June 2013.
  22. Altfeld, Marcus; Gale, Michael Jr. (1 June 2015). "Innate immunity against HIV-1 infection". Nature Immunology. 16 (6): 554–562. doi:10.1038/ni.3157. ISSN 1529-2908. PMID 25988887. S2CID 1577651.
  23. Havecker, Ericka R.; Gao, Xiang; Voytas, Daniel F. (18 May 2004). "The diversity of LTR retrotransposons". Genome Biology. 5 (6): 225. doi:10.1186/gb-2004-5-6-225. ISSN 1474-760X. PMC 463057. PMID 15186483.
  24. Cordaux, Richard; Batzer, Mark A. (October 2009). "The impact of retrotransposons on human genome evolution". Nature Reviews Genetics. 10 (10): 691–703. doi:10.1038/nrg2640. ISSN 1471-0064. PMC 2884099. PMID 19763152.