In genetics, complementary DNA (cDNA) is DNA that was reverse transcribed (via reverse transcriptase) from an RNA (e.g., messenger RNA or microRNA). cDNA exists in both single-stranded and double-stranded forms and in both natural and engineered forms.
In engineered forms, it is often a copy of the naturally occurring DNA from a particular organism's genome: the organism's own mRNA was naturally transcribed from its DNA, and the cDNA is reverse transcribed from that mRNA, yielding a copy of the gene's expressed sequence that lacks the introns removed during splicing. Engineered cDNA is often used to express a specific protein in a cell that does not normally express that protein (i.e., heterologous expression), or to sequence or quantify mRNA molecules using DNA-based methods (qPCR, RNA-seq). cDNA that codes for a specific protein can be transferred to a recipient cell for expression as part of recombinant DNA, often in bacterial or yeast expression systems. [1] cDNA is also generated to analyze transcriptomic profiles in bulk tissue, single cells, or single nuclei in assays such as microarrays, qPCR, and RNA-seq.
In natural forms, cDNA is produced by retroviruses (such as HIV-1, HIV-2, simian immunodeficiency virus, etc.) and then integrated into the host's genome, where it creates a provirus. [2]
The term cDNA is also used, typically in a bioinformatics context, to refer to an mRNA transcript's sequence expressed as DNA bases (G, C, A, T) rather than RNA bases (G, C, A, U).
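In this bioinformatics sense, writing a transcript as its cDNA sequence is only a change of alphabet. The minimal sketch below assumes the input contains only the unambiguous bases A, C, G and U; the function name `mrna_to_cdna_notation` is hypothetical.

```python
def mrna_to_cdna_notation(mrna: str) -> str:
    """Rewrite an mRNA sequence using DNA letters (U -> T).

    Assumes only the unambiguous bases A, C, G and U are present.
    """
    seq = mrna.upper()
    unexpected = set(seq) - set("ACGU")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Uracil is the only base that differs between the RNA and DNA alphabets.
    return seq.replace("U", "T")


print(mrna_to_cdna_notation("AUGGCUUAA"))  # ATGGCTTAA
```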
Patentability of cDNA was the subject of a 2013 US Supreme Court decision in Association for Molecular Pathology v. Myriad Genetics, Inc. As a compromise, the Court declared that exons-only cDNA is patent-eligible, whereas isolated sequences of naturally occurring DNA comprising introns are not.
RNA serves as a template for cDNA synthesis. [3] In cellular life, cDNA is generated by viruses and retrotransposons for integration of RNA into target genomic DNA. In molecular biology, RNA is purified from source material after genomic DNA, proteins and other cellular components are removed. cDNA is then synthesized through in vitro reverse transcription. [4]
RNA is transcribed from genomic DNA in host cells and is extracted by first lysing cells then purifying RNA utilizing widely used methods such as phenol-chloroform, silica column, and bead-based RNA extraction methods. [5] Extraction methods vary depending on the source material. For example, extracting RNA from plant tissue requires additional reagents, such as polyvinylpyrrolidone (PVP), to remove phenolic compounds, carbohydrates, and other compounds that will otherwise render RNA unusable. [6] To remove DNA and proteins, enzymes such as DNase and Proteinase K are used for degradation. [7] Importantly, RNA integrity is maintained by inactivating RNases with chaotropic agents such as guanidinium isothiocyanate, sodium dodecyl sulphate (SDS), phenol or chloroform. Total RNA is then separated from other cellular components and precipitated with alcohol. Various commercial kits exist for simple and rapid RNA extractions for specific applications. [8] Additional bead-based methods can be used to isolate specific sub-types of RNA (e.g. mRNA and microRNA) based on size or unique RNA regions. [9] [10]
Using a reverse transcriptase enzyme and purified RNA templates, one strand of cDNA is produced (first-strand cDNA synthesis). The M-MLV reverse transcriptase from the Moloney murine leukemia virus is commonly used because its reduced RNase H activity suits transcription of longer RNAs. [11] The AMV reverse transcriptase from the avian myeloblastosis virus may also be used for RNA templates with strong secondary structures (i.e. high melting temperature). [12] cDNA is commonly generated from mRNA for gene expression analyses such as RT-qPCR and RNA-seq. [13] mRNA is selectively reverse transcribed using oligo-dT primers, which are the reverse complement of the poly(A) tail at the 3' end of most eukaryotic mRNAs. The oligo-dT primer anneals to the poly(A) tail of the mRNA and serves as the starting point from which the reverse transcriptase begins reverse transcription. An optimized mixture of oligo-dT and random hexamer primers increases the chance of obtaining full-length cDNA while reducing 5' or 3' bias. [14] Ribosomal RNA may also be depleted to enrich both mRNA and non-poly-adenylated transcripts such as some non-coding RNA. [15]
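The sequence outcome of oligo-dT-primed first-strand synthesis can be sketched as follows. This is an idealized model, assuming full-length, error-free reverse transcription; the minimum poly(A) length of 12 bases needed for the primer to anneal is an illustrative assumption, and `first_strand_cdna` is a hypothetical name.

```python
# Watson-Crick partner of each RNA base, written in DNA letters.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str, min_polya: int = 12) -> str:
    """Idealized oligo(dT)-primed first-strand cDNA synthesis.

    The oligo(dT) primer can only anneal if the mRNA ends in a poly(A)
    run of at least `min_polya` bases (an assumed cutoff). Reverse
    transcriptase then extends the primer toward the mRNA's 5' end, so
    the product, read 5'->3', is the reverse complement of the template,
    beginning with the poly(T) stretch contributed by the primer.
    """
    seq = mrna.upper()
    polya_len = len(seq) - len(seq.rstrip("A"))
    if polya_len < min_polya:
        raise ValueError("no poly(A) tail long enough for oligo(dT) priming")
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(seq))


mrna = "AUGGCCUAA" + "A" * 15
print(first_strand_cdna(mrna))
```

A transcript without a sufficiently long poly(A) tail raises an error here, which mirrors why oligo-dT priming selects for polyadenylated mRNA.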
The products of first-strand synthesis, RNA-DNA hybrids, can be processed through multiple second-strand synthesis methods or processed directly in downstream assays. [16] [17] An early method known as hairpin-primed synthesis relied on hairpin formation at the 3' end of the first-strand cDNA to prime second-strand synthesis. However, priming is random and hydrolysis of the hairpin leads to loss of information. The Gubler and Hoffman procedure uses E. coli RNase H to nick the mRNA, which is then replaced by E. coli DNA polymerase I, with the nicks sealed by E. coli DNA ligase. An optimization of this procedure relies on the low RNase H activity of M-MLV to nick the mRNA, with the remaining RNA removed later by adding RNase H after DNA polymerase synthesis of the second-strand cDNA. This prevents the loss of sequence information at the 5' end of the mRNA.
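Whatever second-strand method is used, a complete and error-free second strand is the reverse complement of the first strand, so it carries the mRNA sequence itself in DNA letters. The sketch below models only that sequence relationship, not the enzymology (RNase H nicking, polymerase I, ligase); all function names are hypothetical.

```python
from typing import Tuple

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def double_stranded_cdna(mrna: str) -> Tuple[str, str]:
    """Idealized full-length first- and second-strand cDNA from an mRNA.

    The first strand is the reverse complement of the mRNA; the second
    strand is the reverse complement of the first, i.e. the mRNA
    sequence written in DNA letters (U -> T).
    """
    mrna_as_dna = mrna.upper().replace("U", "T")
    first = reverse_complement(mrna_as_dna)
    second = reverse_complement(first)
    return first, second


def is_perfect_duplex(strand_a: str, strand_b: str) -> bool:
    """True if two antiparallel strands pair base-for-base."""
    return len(strand_a) == len(strand_b) and strand_a == reverse_complement(strand_b)


first, second = double_stranded_cdna("AUGGCCUAA")
print(first, second, is_perfect_duplex(first, second))  # TTAGGCCAT ATGGCCTAA True
```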
Complementary DNA is often used in gene cloning or as gene probes or in the creation of a cDNA library. When scientists transfer a gene from one cell into another cell in order to express the new genetic material as a protein in the recipient cell, the cDNA will be added to the recipient (rather than the entire gene), because the DNA for an entire gene may include DNA that does not code for the protein or that interrupts the coding sequence of the protein (e.g., introns). Partial sequences of cDNAs are often obtained as expressed sequence tags.
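The reason cDNA is preferred for expression can be illustrated by joining a gene's exons and discarding its introns, which yields the sequence that cDNA made from the spliced mRNA would carry. This sketch assumes the exon coordinates are already known (for example, from a genome annotation); the function `exons_only` and the toy gene are hypothetical.

```python
from typing import List, Tuple


def exons_only(genomic: str, exons: List[Tuple[int, int]]) -> str:
    """Join exon intervals of a genomic sequence into an intron-free sequence.

    Exons are 0-based, half-open [start, end) coordinates on the same
    strand as the gene. UTRs and the poly(A) tail are ignored.
    """
    ordered = sorted(exons)
    for start, end in ordered:
        if not 0 <= start < end <= len(genomic):
            raise ValueError(f"exon ({start}, {end}) is outside the sequence")
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("exons overlap")
    return "".join(genomic[start:end] for start, end in ordered)


# Toy gene: exon 1, an intron beginning with GT and ending with AG, then exon 2.
gene = "ATGAAA" + "GTAAGTCCCCAG" + "GGGTAA"
print(exons_only(gene, [(0, 6), (18, 24)]))  # ATGAAAGGGTAA
```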
With amplification of DNA sequences via polymerase chain reaction (PCR) now commonplace, one will typically conduct reverse transcription as an initial step, followed by PCR to obtain the exact cDNA sequence needed for intracellular expression. This is achieved by designing sequence-specific DNA primers that hybridize to the 5' and 3' ends of a cDNA region coding for a protein. Once amplified, the sequence can be cut at each end with restriction enzymes and inserted into one of many small circular DNA molecules known as expression vectors. Such vectors allow for self-replication inside the cells and potentially integration into the host DNA. They typically also contain a strong promoter to drive transcription of the target cDNA into mRNA, which is then translated into protein.
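The primer logic can be sketched in a few lines: the forward primer matches the start of the coding strand, and the reverse primer is the reverse complement of its end. This is a simplified model under stated assumptions. The 20-nucleotide primer length is illustrative, and real primer design also weighs melting temperature, GC content, and uniqueness, and for cloning often adds restriction sites to the primers' 5' ends. None of this is modeled here, and the function names are hypothetical.

```python
from typing import Optional, Tuple

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(DNA_COMPLEMENT)[::-1]


def design_end_primers(cds: str, length: int = 20) -> Tuple[str, str]:
    """Forward and reverse primers (both 5'->3') flanking a coding sequence."""
    cds = cds.upper()
    if len(cds) < 2 * length:
        raise ValueError("sequence shorter than two primers")
    # Reverse primer anneals to the non-coding strand at the 3' end of the CDS.
    return cds[:length], reverse_complement(cds[-length:])


def predicted_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Product of an idealized PCR on a double-stranded template given as
    its top strand, or None if either primer lacks an exact binding site."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # On the top strand, the reverse primer's site reads as its reverse complement.
    site = reverse_complement(reverse.upper())
    site_start = template.find(site, start)
    if site_start == -1:
        return None
    return template[start:site_start + len(site)]


coding = "ATG" + "GCT" * 15 + "TAA"   # toy coding sequence
cdna = "CCCC" + coding + "GGGG"       # toy cDNA with flanking sequence
fwd, rev = design_end_primers(coding)
print(predicted_amplicon(cdna, fwd, rev) == coding)  # True
```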
cDNA is also used to study gene expression via methods such as RNA-seq or RT-qPCR. [18] [19] [20] For sequencing, RNA must be fragmented due to sequencing platform size limitations. Additionally, second-strand synthesized cDNA must be ligated with adapters that allow cDNA fragments to be PCR amplified and bind to sequencing flow cells. Gene-specific analysis methods commonly use microarrays and RT-qPCR to quantify cDNA levels via fluorometric and other methods.
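For RT-qPCR, one common way to turn threshold-cycle (Ct) values into relative expression is the comparative Ct (2^-ΔΔCt) method. The sketch below assumes a stably expressed reference gene and equal amplification efficiency for the target and reference assays. The function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_control: float, ct_ref_control: float,
                     efficiency: float = 1.0) -> float:
    """Relative expression by the comparative Ct (2^-ddCt) method.

    efficiency = 1.0 means perfect doubling per cycle; the same
    efficiency is assumed for the target and reference assays.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return (1.0 + efficiency) ** (-delta_delta)


# Relative to the reference, the target crosses threshold 3 cycles earlier
# in the sample than in the control: ddCt = -3, so 2**3 = 8-fold higher.
print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))  # 8.0
```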
On 13 June 2013, the United States Supreme Court ruled in the case of Association for Molecular Pathology v. Myriad Genetics that while naturally occurring genes cannot be patented, cDNA is patent-eligible because it does not occur naturally. [21]
Some viruses also use cDNA to turn their viral RNA into mRNA (viral RNA → cDNA → mRNA). The mRNA is used to make viral proteins to take over the host cell.
An example of this first step from viral RNA to cDNA can be seen in the HIV cycle of infection. Here, the virus's lipid envelope fuses with the host cell membrane, which allows the viral capsid, carrying two copies of the viral RNA genome, to enter the host. The cDNA copy is then made through reverse transcription of the viral RNA, a process facilitated by the chaperone CypA and a capsid-associated viral reverse transcriptase. [22]
cDNA is also generated by retrotransposons in eukaryotic genomes. Retrotransposons are mobile genetic elements that move themselves within, and sometimes between, genomes via RNA intermediates. This mechanism is shared with viruses, with the exception of the generation of infectious particles. [23] [24]
A retrovirus is a type of virus that inserts a DNA copy of its RNA genome into the DNA of a host cell that it invades, thus changing the genome of that cell. After invading a host cell's cytoplasm, the virus uses its own reverse transcriptase enzyme to produce DNA from its RNA genome, the reverse of the usual pattern, thus retro (backward). The new DNA is then incorporated into the host cell genome by an integrase enzyme, at which point the retroviral DNA is referred to as a provirus. The host cell then treats the viral DNA as part of its own genome, transcribing and translating the viral genes along with the cell's own genes, producing the proteins required to assemble new copies of the virus. Many retroviruses cause serious diseases in humans, other mammals, and birds.
A reverse transcriptase (RT) is an enzyme used to convert an RNA genome to DNA, a process termed reverse transcription. Reverse transcriptases are used by viruses such as HIV and hepatitis B to replicate their genomes, by retrotransposon mobile genetic elements to proliferate within the host genome, and by eukaryotic cells to extend the telomeres at the ends of their linear chromosomes. Contrary to a widely held belief, the process does not violate the flow of genetic information as described by the classical central dogma, because transfers of information from RNA to DNA are explicitly held to be possible.
Gene expression is the process by which information from a gene is used to synthesize a functional gene product, such as a protein or non-coding RNA, that ultimately affects a phenotype. These products are often proteins, but for non-protein-coding genes such as transfer RNA (tRNA) and small nuclear RNA (snRNA), the product is a functional non-coding RNA. Gene expression is used by all known life (eukaryotes and prokaryotes) and is exploited by viruses to generate the macromolecular machinery for life.
Transcription is the process of copying a segment of DNA into RNA. The segments of DNA transcribed into RNA molecules that can encode proteins produce messenger RNA (mRNA). Other segments of DNA are transcribed into RNA molecules called non-coding RNAs (ncRNAs).
Reverse transcription polymerase chain reaction (RT-PCR) is a laboratory technique combining reverse transcription of RNA into DNA and amplification of specific DNA targets using polymerase chain reaction (PCR). It is primarily used to measure the amount of a specific RNA. This is achieved by monitoring the amplification reaction using fluorescence, a technique called real-time PCR or quantitative PCR (qPCR). Confusion can arise because some authors use the acronym RT-PCR to denote real-time PCR. In this article, RT-PCR will denote Reverse Transcription PCR. Combined RT-PCR and qPCR are routinely used for analysis of gene expression and quantification of viral RNA in research and clinical settings.
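The link between starting amount and threshold cycle in real-time monitoring can be illustrated with an idealized exponential model, N(c) = N0 · (1 + E)^c, solved for the cycle at which N reaches a detection threshold. This is a simplification: the plateau phase is ignored, and expressing the threshold in copies rather than fluorescence units is an assumption. The function name is hypothetical.

```python
import math


def threshold_cycle(n0: float, threshold: float, efficiency: float = 1.0) -> float:
    """Fractional cycle at which idealized exponential PCR reaches `threshold`.

    Models copies after c cycles as N(c) = n0 * (1 + efficiency) ** c and
    solves N(c) = threshold for c; plateau effects are ignored.
    """
    if n0 <= 0 or threshold <= n0:
        raise ValueError("need 0 < n0 < threshold")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return math.log(threshold / n0) / math.log(1.0 + efficiency)


# Ten times more starting template crosses the threshold about
# log2(10) = 3.32 cycles earlier at 100% efficiency.
shift = threshold_cycle(1e3, 1e10) - threshold_cycle(1e4, 1e10)
print(round(shift, 2))  # 3.32
```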
Hepadnaviridae is a family of viruses. Humans, apes, and birds serve as natural hosts. There are currently 18 species in this family, divided among 5 genera. Its best-known member is hepatitis B virus. Diseases associated with this family include liver infections such as hepatitis, as well as hepatocellular carcinoma and cirrhosis. It is the sole accepted family in the order Blubervirales.
DNA synthesis is the natural or artificial creation of deoxyribonucleic acid (DNA) molecules. DNA is a macromolecule made up of nucleotide units, which are linked by covalent bonds and hydrogen bonds, in a repeating structure. DNA synthesis occurs when these nucleotide units are joined to form DNA; this can occur artificially or naturally. Nucleotide units are made up of a nitrogenous base, pentose sugar (deoxyribose) and phosphate group. Each unit is joined when a covalent bond forms between its phosphate group and the pentose sugar of the next nucleotide, forming a sugar-phosphate backbone. DNA is a complementary, double stranded structure as specific base pairing occurs naturally when hydrogen bonds form between the nucleotide bases.
Ribonuclease H is a family of non-sequence-specific endonuclease enzymes that catalyze the cleavage of RNA in an RNA/DNA substrate via a hydrolytic mechanism. Members of the RNase H family can be found in nearly all organisms, from bacteria to archaea to eukaryotes.
A cDNA library is a combination of cloned cDNA fragments inserted into a collection of host cells, which together constitute some portion of the transcriptome of the organism and are stored as a "library". cDNA is produced from fully processed (mature) mRNA and therefore contains only the expressed genes of an organism. Similarly, tissue-specific cDNA libraries can be produced. In eukaryotic cells the mature mRNA is already spliced, hence the cDNA produced lacks introns and can be readily expressed in a bacterial cell. Although cDNA libraries are a powerful and useful tool because gene products are easily identified, they lack information about enhancers, introns, and other regulatory elements found in a genomic DNA library.
Cauliflower mosaic virus (CaMV) is a member of the genus Caulimovirus, one of the six genera in the family Caulimoviridae, which are pararetroviruses that infect plants. Pararetroviruses replicate through reverse transcription just like retroviruses, but the viral particles contain DNA instead of RNA.
Retrotransposons are mobile elements that move in the host genome by converting their transcribed RNA into DNA through reverse transcription. Thus, they differ from Class II transposable elements, or DNA transposons, in utilizing an RNA intermediate for the transposition and leaving the transposition donor site unchanged.
Lentivirus is a genus of retroviruses that cause chronic and deadly diseases, characterized by long incubation periods, in humans and other mammalian species. The genus includes the human immunodeficiency virus (HIV), which causes AIDS. Lentiviruses are distributed worldwide, and are known to be hosted in apes, cows, goats, horses, cats, and sheep as well as several other mammals.
Baltimore classification is a system used to classify viruses based on their manner of messenger RNA (mRNA) synthesis. By organizing viruses based on their manner of mRNA production, it is possible to study viruses that behave similarly as a distinct group. Seven Baltimore groups are described that take into consideration whether the viral genome is made of deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), whether the genome is single- or double-stranded, and whether the sense of a single-stranded RNA genome is positive or negative.
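The grouping can be expressed as a small decision function. Besides the criteria named above, the seven groups also separate viruses that replicate through reverse transcription: group VI (single-stranded RNA viruses such as retroviruses, including HIV) and group VII (double-stranded DNA viruses such as hepatitis B virus and cauliflower mosaic virus). The simplified input encoding and the function name below are hypothetical.

```python
def baltimore_group(genome: str, strands: str, sense: str = "",
                    reverse_transcribes: bool = False) -> str:
    """Return the Baltimore group (I-VII) for a simplified genome description.

    genome: "DNA" or "RNA"; strands: "ss" or "ds";
    sense: "+" or "-" (consulted only for ssRNA genomes without RT);
    reverse_transcribes: True if replication goes through reverse transcription.
    """
    genome, strands = genome.upper(), strands.lower()
    if genome == "DNA" and strands == "ds":
        return "VII" if reverse_transcribes else "I"
    if genome == "DNA" and strands == "ss" and not reverse_transcribes:
        return "II"
    if genome == "RNA" and strands == "ds" and not reverse_transcribes:
        return "III"
    if genome == "RNA" and strands == "ss":
        if reverse_transcribes:
            return "VI"  # e.g. retroviruses
        if sense == "+":
            return "IV"
        if sense == "-":
            return "V"
    raise ValueError("combination not covered by this simplified scheme")


print(baltimore_group("RNA", "ss", "+", reverse_transcribes=True))  # VI
print(baltimore_group("DNA", "ds", reverse_transcribes=True))       # VII
```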
In molecular biology and genetics, the sense of a nucleic acid molecule, particularly of a strand of DNA or RNA, refers to the nature of the roles of the strand and its complement in specifying a sequence of amino acids. Depending on the context, sense may have slightly different meanings. For example, the negative-sense strand of DNA is equivalent to the template strand, whereas the positive-sense strand is the non-template strand whose nucleotide sequence is equivalent to the sequence of the mRNA transcript.
An RNA spike-in is an RNA transcript of known sequence and quantity used to calibrate measurements in RNA hybridization assays, such as DNA microarray experiments, RT-qPCR, and RNA-Seq.
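One simple way to use spike-ins is to fit a standard curve of measured signal against known input on log-log axes, then invert it to estimate the input behind an unknown signal. The sketch below uses an ordinary least-squares line. The choice of a log-log linear model, the units, and the function names are assumptions for illustration, not a prescribed protocol.

```python
import math
from typing import Sequence, Tuple


def fit_spike_in_curve(known_amounts: Sequence[float],
                       measured: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (log10 amount, log10 signal) for spike-ins."""
    if len(known_amounts) != len(measured) or len(known_amounts) < 2:
        raise ValueError("need at least two paired spike-in measurements")
    xs = [math.log10(a) for a in known_amounts]
    ys = [math.log10(m) for m in measured]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("spike-in amounts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_amount(signal: float, slope: float, intercept: float) -> float:
    """Invert the fitted curve to estimate the input amount behind a signal."""
    if slope == 0:
        raise ValueError("flat calibration curve cannot be inverted")
    return 10 ** ((math.log10(signal) - intercept) / slope)


# Hypothetical spike-ins: 1-1000 units in, signal proportional (5x) out.
amounts = [1.0, 10.0, 100.0, 1000.0]
signals = [5.0 * a for a in amounts]
slope, intercept = fit_spike_in_curve(amounts, signals)
print(round(estimate_amount(250.0, slope, intercept), 6))  # 50.0
```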
Hepatitis B virus DNA polymerase is a hepatitis B viral protein. It is a DNA polymerase that can use either DNA or RNA templates and a ribonuclease H that cuts RNA in the duplex. Both functions are supplied by the reverse transcriptase (RT) domain.
The retroviral ribonuclease H is a catalytic domain of the retroviral reverse transcriptase (RT) enzyme. The RT enzyme is used to generate complementary DNA (cDNA) from the retroviral RNA genome. This process is called reverse transcription. To complete this complex process, the retroviral RT enzymes need to adopt a multifunctional nature. They therefore possess three biochemical activities: RNA-dependent DNA polymerase, ribonuclease H, and DNA-dependent DNA polymerase activities. Like all RNase H enzymes, the retroviral RNase H domain cleaves DNA/RNA duplexes and will not degrade DNA or unhybridized RNA.
G&T-seq is a single-cell sequencing technique that simultaneously obtains both transcriptomic and genomic data from single cells, allowing direct comparison of gene expression data to its corresponding genomic data in the same cell...
Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.
This glossary of cellular and molecular biology is a list of definitions of terms and concepts commonly used in the study of cell biology, molecular biology, and related disciplines, including molecular genetics, biochemistry, and microbiology.