An R-loop is a three-stranded nucleic acid structure, composed of a DNA:RNA hybrid and the associated non-template single-stranded DNA. R-loops can form under a variety of circumstances and may be tolerated or cleared by cellular components. The term "R-loop" was chosen to reflect the similarity of these structures to D-loops; the "R" in this case denotes the involvement of an RNA moiety.
In the laboratory, R-loops may also be created by the hybridization of mature mRNA with double-stranded DNA under conditions favoring the formation of a DNA-RNA hybrid; in this case, the intron regions (which have been spliced out of the mRNA) form single-stranded DNA loops, as they cannot hybridize with complementary sequence in the mRNA. [1]
R-looping was first described in 1976. [2] Independent R-looping studies from the laboratories of Richard J. Roberts and Phillip A. Sharp showed that protein-coding adenovirus genes contained DNA sequences that were not present in the mature mRNA. [3] [4] Roberts and Sharp were awarded the 1993 Nobel Prize in Physiology or Medicine for independently discovering introns. After their discovery in adenovirus, introns were found in a number of other genes, such as the ovalbumin gene (first by the O'Malley laboratory, then confirmed by other groups), [5] [6] hexon DNA, [3] and the extrachromosomal rRNA genes of Tetrahymena thermophila. [7]
In the mid-1980s, the development of an antibody that binds specifically to the DNA:RNA hybrid of the R-loop enabled immunofluorescence studies and, later, genome-wide characterization of R-loop formation by DRIP-seq. [8]
R-loop mapping is a laboratory technique used to distinguish introns from exons in double-stranded DNA. [9] The resulting R-loops are visualized by electron microscopy, in which intron regions of the DNA appear as loops that remain unhybridized to the mRNA. [10]
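The loop pattern can be illustrated with a toy, string-level model. The sketch below is not how R-loop mapping is performed (that relies on hybridization and electron microscopy); it only asks which stretches of a gene must be left unpaired for a mature mRNA to be laid along it. It compares the mRNA with the gene's non-template strand sequence, which is equivalent to asking where the mRNA could pair with the template strand. It assumes exact base matching, exons in gene order, and that the preferred explanation uses the fewest loops; the function name `find_intron_loops` and the example sequences are hypothetical.

```python
from functools import lru_cache
from typing import List, Optional, Tuple


def find_intron_loops(gene: str, mrna: str) -> Optional[List[Tuple[int, int]]]:
    """Toy model: return gene intervals [start, end) left unpaired (the loops)
    when the mRNA is laid along the gene, using as few loops as possible.
    Returns None if the mRNA cannot be placed on the gene at all."""
    g = gene.upper()
    m = mrna.upper().replace("U", "T")  # compare RNA letters with DNA letters
    n, k = len(g), len(m)

    @lru_cache(maxsize=None)
    def best(i: int, j: int):
        """Fewest loops needed to place m[j:] starting at gene position i."""
        if j == k:
            return (0, ())  # whole mRNA placed; the rest of the gene is a 3' flank
        if i >= n:
            return None
        candidates = []
        if g[i] == m[j]:  # extend the current exon by one paired base
            r = best(i + 1, j + 1)
            if r is not None:
                candidates.append(r)
        if j > 0:  # inside the mRNA span: leave g[i:end] unpaired as one loop
            for end in range(i + 1, n):
                if g[end] == m[j]:
                    r = best(end + 1, j + 1)
                    if r is not None:
                        candidates.append((r[0] + 1, ((i, end),) + r[1]))
        return min(candidates) if candidates else None

    # Any gene sequence before the first exon is a free 5' flank, not a loop.
    placements = [best(s, 0) for s in range(n)]
    placements = [p for p in placements if p is not None]
    return list(min(placements)[1]) if placements else None


gene = "CC" + "ATGGC" + "GTAAGTTTAG" + "CAGTT" + "AA"  # flank, exon, intron, exon, flank
mrna = "AUGGCCAGUU"                                  # the two exons joined
print(find_intron_loops(gene, mrna))  # [(7, 17)]
```

Because the search tries every possible loop end at every position, it is only suitable for short illustrative sequences.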
The potential for R-loops to serve as replication primers was shown in 1980. [11] In 1994, R-loops were demonstrated to be present in vivo through analysis of plasmids isolated from E. coli mutants carrying mutations in topoisomerase. [12] This discovery of endogenous R-loops, in conjunction with rapid advances in genetic sequencing technologies, inspired a blossoming of R-loop research in the early 2000s that continues to this day. [13]
RNase H enzymes are the primary proteins responsible for the dissolution of R-loops: they degrade the RNA moiety, allowing the two complementary DNA strands to reanneal. [14] Research over the past decade has identified more than 50 proteins that appear to influence R-loop accumulation. Many of them are believed to act by sequestering or processing newly transcribed RNA to prevent it from re-annealing to the template, but for many of these proteins the mechanism of R-loop interaction remains to be determined. [15]
R-loop formation is a key step in immunoglobulin class switching, a process that allows activated B cells to modulate antibody production. [16] R-loops also appear to play a role in protecting some active promoters from methylation. [17] Their presence can also inhibit transcription. [18] Additionally, R-loop formation appears to be associated with “open” chromatin, characteristic of actively transcribed regions. [19] [20]
When unscheduled R-loops form, they can cause damage through several different mechanisms. [21] The exposed single-stranded DNA can be attacked by endogenous mutagens, including DNA-modifying enzymes such as activation-induced cytidine deaminase, and R-loops can block replication forks, inducing fork collapse and subsequent double-strand breaks. [22] In addition, R-loops may induce unscheduled replication by acting as primers. [11] [20]
R-loop accumulation has been associated with a number of diseases, including amyotrophic lateral sclerosis type 4 (ALS4), ataxia with oculomotor apraxia type 2 (AOA2), Aicardi–Goutières syndrome, Angelman syndrome, Prader–Willi syndrome, and cancer. [13]
Introns are non-coding regions within genes that are transcribed along with the coding regions but are subsequently removed from the primary RNA transcript by splicing. Actively transcribed regions of DNA often form R-loops that are vulnerable to DNA damage. Introns reduce R-loop formation and DNA damage in highly expressed yeast genes. [23] Genome-wide analysis showed that, in both yeast and humans, intron-containing genes display lower R-loop levels and less DNA damage than intron-less genes of similar expression. [23] Inserting an intron into an R-loop-prone gene can also suppress R-loop formation and recombination. Bonnet et al. (2017) [23] speculated that this role of introns in maintaining genetic stability may explain their evolutionary maintenance at certain locations, particularly in highly expressed genes.
An exon is any part of a gene that will form a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term exon refers to both the DNA sequence within a gene and the corresponding sequence in RNA transcripts. In RNA splicing, introns are removed and exons are covalently joined to one another as part of generating the mature RNA. Just as the entire set of genes for a species constitutes the genome, the entire set of exons constitutes the exome.
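At the level of sequence, splicing amounts to deleting the intron intervals from the primary transcript and concatenating what remains. The following is a minimal sketch that assumes intron positions are already known as zero-based, half-open [start, end) coordinates; the function name `splice` and the example transcript are hypothetical, and the code models only the result of splicing, not the spliceosome's chemistry.

```python
from typing import List, Tuple


def splice(pre_mrna: str, introns: List[Tuple[int, int]]) -> str:
    """Remove intron intervals [start, end) and join the remaining exons in order."""
    exons = []
    cursor = 0  # start of the exon currently being collected
    for start, end in sorted(introns):
        if start < cursor or end <= start or end > len(pre_mrna):
            raise ValueError(f"invalid or overlapping intron ({start}, {end})")
        exons.append(pre_mrna[cursor:start])  # exon before this intron
        cursor = end                          # skip over the intron
    exons.append(pre_mrna[cursor:])           # final exon
    return "".join(exons)


# Exon "AUGGC", intron "GUAAGUUUAG" at [5, 15), exon "CAGUU".
print(splice("AUGGCGUAAGUUUAGCAGUU", [(5, 15)]))  # AUGGCCAGUU
```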
In biology, histones are highly basic proteins, rich in lysine and arginine residues, that are found in eukaryotic cell nuclei. They act as spools around which DNA winds to create structural units called nucleosomes. Nucleosomes in turn are wrapped into 30-nanometer fibers that form tightly packed chromatin. Histones prevent DNA from becoming tangled and protect it from DNA damage. In addition, histones play important roles in gene regulation and DNA replication. Without histones, unwound DNA in chromosomes would be very long. For example, each human cell has about 1.8 meters of DNA if completely stretched out; however, when wound around histones, this length is reduced to about 90 micrometers (0.09 mm) of 30 nm diameter chromatin fibers.
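Taking the two lengths quoted above at face value, the implied linear compaction factor works out as follows:

$$\frac{1.8\ \text{m}}{90\ \mu\text{m}} = \frac{1.8\ \text{m}}{9 \times 10^{-5}\ \text{m}} = 2 \times 10^{4}$$

That is, winding around histones shortens the DNA roughly 20,000-fold by these figures.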
An intron is any nucleotide sequence within a gene that is not expressed or operative in the final RNA product. The word intron is derived from the term intragenic region, i.e., a region inside a gene. The term intron refers to both the DNA sequence within a gene and the corresponding RNA sequence in RNA transcripts. The non-intron sequences that are joined by RNA splicing to form the mature RNA are called exons.
Enterobacteria phage λ is a bacterial virus, or bacteriophage, that infects the bacterial species Escherichia coli. It was discovered by Esther Lederberg in 1950. The wild type of this virus has a temperate life cycle that allows it to either reside within the genome of its host through lysogeny or enter into a lytic phase, during which it kills and lyses the cell to produce offspring. Lambda strains, mutated at specific sites, are unable to lysogenize cells; instead, they grow and enter the lytic cycle after superinfecting an already lysogenized cell.
Ribonucleic acid (RNA) is a polymeric molecule that is essential for most biological functions, either by performing the function itself or by forming a template for production of proteins. RNA and deoxyribonucleic acid (DNA) are nucleic acids. The nucleic acids constitute one of the four major macromolecules essential for all known forms of life. RNA is assembled as a chain of nucleotides. Cellular organisms use messenger RNA (mRNA) to convey genetic information that directs synthesis of specific proteins. Many viruses encode their genetic information using an RNA genome.
An inverted repeat is a single-stranded sequence of nucleotides followed downstream by its reverse complement. The intervening sequence of nucleotides between the initial sequence and the reverse complement can be any length, including zero. For example, 5'---TTACGnnnnnnCGTAA---3' is an inverted repeat sequence. When the intervening length is zero, the composite sequence is a palindromic sequence.
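The definition translates directly into a string check. This minimal sketch assumes DNA written 5'→3' with the letters A, C, G and T, and that the caller supplies the arm and spacer lengths; the function names are hypothetical.

```python
# Map each DNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse so the result also reads 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(seq: str, arm: int, spacer: int) -> bool:
    """True if seq = left arm + spacer (any bases) + reverse complement of the left arm."""
    seq = seq.upper()
    if arm < 1 or spacer < 0 or len(seq) != 2 * arm + spacer:
        return False
    left = seq[:arm]
    right = seq[arm + spacer:]
    return right == reverse_complement(left)


def is_palindromic(seq: str) -> bool:
    """The zero-spacer case: the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


# The example from the text, with the six-base spacer filled in arbitrarily.
print(is_inverted_repeat("TTACGAAAAAACGTAA", arm=5, spacer=6))  # True
print(is_palindromic("GAATTC"))  # True
```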
Transcription is the process of copying a segment of DNA into RNA. The segments of DNA transcribed into RNA molecules that can encode proteins are said to produce messenger RNA (mRNA). Other segments of DNA are copied into RNA molecules called non-coding RNAs (ncRNAs). mRNA comprises only 1–3% of total RNA samples. Less than 2% of the human genome can be transcribed into mRNA, while at least 80% of mammalian genomic DNA can be actively transcribed, the majority of which is considered to be ncRNA.
A spliceosome is a large ribonucleoprotein (RNP) complex found primarily within the nucleus of eukaryotic cells. The spliceosome is assembled from small nuclear RNAs (snRNAs) and numerous proteins: snRNA molecules bind to specific proteins to form small nuclear ribonucleoprotein complexes (snRNPs), which in turn combine with other snRNPs to form the large ribonucleoprotein complex called a spliceosome. The spliceosome removes introns from a transcribed pre-mRNA, a type of primary transcript; this process is generally referred to as splicing. An analogy is a film editor, who selectively cuts out irrelevant or incorrect material from the initial film and sends the cleaned-up version to the director for the final cut.
The nucleoid is an irregularly shaped region within the prokaryotic cell that contains all or most of the genetic material. The chromosome of a typical prokaryote is circular, and its length is very large compared to the cell dimensions, so it needs to be compacted in order to fit. In contrast to the nucleus of a eukaryotic cell, it is not surrounded by a nuclear membrane. Instead, the nucleoid forms by condensation and functional arrangement with the help of chromosomal architectural proteins and RNA molecules as well as DNA supercoiling. The length of a genome widely varies and a cell may contain multiple copies of it.
A primary transcript is the single-stranded ribonucleic acid (RNA) product synthesized by transcription of DNA, and processed to yield various mature RNA products such as mRNAs, tRNAs, and rRNAs. The primary transcripts designated to be mRNAs are modified in preparation for translation. For example, a precursor mRNA (pre-mRNA) is a type of primary transcript that becomes a messenger RNA (mRNA) after processing.
Adeno-associated viruses (AAV) are small viruses that infect humans and some other primate species. They belong to the genus Dependoparvovirus, which in turn belongs to the family Parvoviridae. They are small, replication-defective, nonenveloped viruses with a linear single-stranded DNA (ssDNA) genome of approximately 4.8 kilobases (kb).
Triple-stranded DNA is a DNA structure in which three strands wind around each other to form a triple helix. In triple-stranded DNA, the third strand binds to a B-form DNA double helix by forming Hoogsteen or reverse Hoogsteen base pairs.
In molecular biology and genetics, the sense of a nucleic acid molecule, particularly of a strand of DNA or RNA, refers to the roles of the strand and its complement in specifying a sequence of amino acids. Depending on the context, sense may have slightly different meanings. For example, the negative-sense strand of DNA is equivalent to the template strand, whereas the positive-sense strand is the non-template strand, whose nucleotide sequence is equivalent to that of the mRNA transcript (with thymine in place of uracil).
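The relationship between the two strands and the transcript can be checked with plain strings. In this minimal sketch, both DNA strands are written 5'→3' using A, C, G and T, and the helper names are hypothetical. The template strand is the reverse complement of the coding strand, and transcribing the template yields an RNA whose sequence matches the coding strand with U in place of T.

```python
# Watson-Crick complement for DNA bases.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def transcribe(template: str) -> str:
    """Sequence of the RNA made from a template strand given 5'->3'.

    The RNA is complementary and antiparallel to the template, so its sequence is
    the template's reverse complement, with uracil (U) in place of thymine (T).
    """
    return reverse_complement(template).replace("T", "U")


coding = "ATGGCCTAA"                   # positive-sense (non-template) strand, 5'->3'
template = reverse_complement(coding)  # negative-sense (template) strand, 5'->3'
mrna = transcribe(template)

print(template)  # TTAGGCCAT
print(mrna)      # AUGGCCUAA
```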
Eukaryotic transcription is the elaborate process that eukaryotic cells use to copy genetic information stored in DNA into units of transportable complementary RNA. Gene transcription occurs in both eukaryotic and prokaryotic cells. Unlike the prokaryotic RNA polymerase, which initiates the transcription of all types of RNA, RNA polymerase in eukaryotes comes in three variations, each transcribing a different type of gene. A eukaryotic cell has a nucleus that separates the processes of transcription and translation. Eukaryotic transcription occurs within the nucleus, where DNA is packaged into nucleosomes and higher-order chromatin structures. The complexity of the eukaryotic genome necessitates a great variety and complexity of gene expression control.
Intrinsic, or rho-independent, termination is a process in prokaryotes that signals the end of transcription and releases the newly constructed RNA molecule. In prokaryotes such as E. coli, transcription is terminated by either a rho-dependent or a rho-independent process. In the rho-dependent process, the Rho protein locates and binds a signal sequence in the nascent RNA and then dissociates the transcript from the elongation complex. By contrast, intrinsic termination does not require a special protein to signal termination and is controlled by specific sequences in the RNA. When the termination process begins, the transcribed RNA forms a stable secondary structure, a hairpin loop also known as a stem-loop. This RNA hairpin is followed by multiple uracil nucleotides. The bonds between uracil and adenine are relatively weak. A protein bound to RNA polymerase (NusA) binds to the stem-loop structure tightly enough to cause the polymerase to stall temporarily. This pausing of the polymerase coincides with transcription of the poly-uracil sequence. The weak adenine-uracil bonds lower the energy needed to destabilize the RNA-DNA duplex, allowing it to unwind and dissociate from the RNA polymerase. Overall, it is the structure of the RNA itself that terminates transcription.
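The two sequence features described above (a hairpin immediately followed by a run of uracils) suggest a simple scan. This sketch is a deliberately crude model built on stated assumptions: it requires a perfectly Watson-Crick-paired stem, ignores G-U wobble pairs and hairpin stability, and its parameter defaults (stem of 4 to 10 bp, loop of 3 to 8 nt, at least 4 U's) are illustrative choices rather than published thresholds. The name `find_intrinsic_terminators` and the example sequences are hypothetical.

```python
from typing import List, Tuple

# Watson-Crick pairs only; G-U wobble pairs are deliberately ignored in this toy model.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_pairs(left: str, right: str) -> bool:
    """True if `right` can fold back onto `left` to form a perfectly paired stem."""
    return len(left) == len(right) and all(
        (a, b) in WATSON_CRICK for a, b in zip(left, reversed(right))
    )


def find_intrinsic_terminators(rna: str, min_stem: int = 4, max_stem: int = 10,
                               min_loop: int = 3, max_loop: int = 8,
                               min_u: int = 4) -> List[Tuple[int, int, int]]:
    """Return (start, stem_length, loop_length) for each hairpin followed by >= min_u U's."""
    rna = rna.upper()
    hits = []
    for start in range(len(rna)):
        for stem in range(min_stem, max_stem + 1):
            for loop in range(min_loop, max_loop + 1):
                right_start = start + stem + loop  # first base of the 3' stem arm
                tail_start = right_start + stem    # first base after the hairpin
                if tail_start + min_u > len(rna):
                    continue
                left = rna[start:start + stem]
                right = rna[right_start:tail_start]
                tail = rna[tail_start:tail_start + min_u]
                if stem_pairs(left, right) and tail == "U" * min_u:
                    hits.append((start, stem, loop))
    return hits


# 5' flank, stem arm GGCGC, loop GAAA, stem arm GCGCC, U-tract, 3' flank.
rna = "AAA" + "GGCGC" + "GAAA" + "GCGCC" + "UUUUUU" + "AA"
print((3, 5, 4) in find_intrinsic_terminators(rna))  # True
```

Real terminator predictors score how stable a candidate hairpin is rather than demanding perfect pairing, so this scan will miss many genuine terminators.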
In molecular biology, a displacement loop or D-loop is a DNA structure where the two strands of a double-stranded DNA molecule are separated for a stretch and held apart by a third strand of DNA. An R-loop is similar to a D-loop, but in this case the third strand is RNA rather than DNA. The third strand has a base sequence which is complementary to one of the main strands and pairs with it, thus displacing the other complementary main strand in the region. Within that region the structure is thus a form of triple-stranded DNA. A diagram in the paper introducing the term illustrated the D-loop with a shape resembling a capital "D", where the displaced strand formed the loop of the "D".
Genome instability refers to a high frequency of mutations within the genome of a cellular lineage. These mutations can include changes in nucleic acid sequences, chromosomal rearrangements or aneuploidy. Genome instability also occurs in bacteria. In multicellular organisms genome instability is central to carcinogenesis, and in humans it is also a factor in some neurodegenerative diseases such as amyotrophic lateral sclerosis or the neuromuscular disease myotonic dystrophy.
DRIP-seq (DRIP-sequencing) is a technology for genome-wide profiling of a type of DNA-RNA hybrid called an "R-loop". DRIP-seq utilizes a sequence-independent but structure-specific antibody for DNA-RNA immunoprecipitation (DRIP) to capture R-loops for massively parallel DNA sequencing.
Nuclear organization refers to the spatial distribution of chromatin within a cell nucleus. There are many different levels and scales of nuclear organization. Chromatin is a higher-order structure of DNA.
Transcription-translation coupling is a mechanism of gene expression regulation in which synthesis of an mRNA (transcription) is affected by its concurrent decoding (translation). In prokaryotes, mRNAs are translated while they are transcribed. This allows communication between RNA polymerase, the multisubunit enzyme that catalyzes transcription, and the ribosome, which catalyzes translation. Coupling involves both direct physical interactions between RNA polymerase and the ribosome, as well as ribosome-induced changes to the structure and accessibility of the intervening mRNA that affect transcription.