Experimental approaches for determining the structure of nucleic acids, such as RNA and DNA, can be broadly classified into biophysical and biochemical methods. Biophysical methods, including X-ray crystallography, NMR and cryo-EM, use the fundamental physical properties of molecules for structure determination. Biochemical methods exploit the chemical properties of nucleic acids, using specific reagents and conditions to assay their structure. [1] Such methods may involve chemical probing with specific reagents, or rely on native or analogue chemistry. Each experimental approach has its own merits and suits different experimental purposes.
X-ray crystallography is not common for nucleic acids alone, since neither DNA nor RNA readily forms crystals. This is due to the greater degree of intrinsic disorder and dynamism in nucleic acid structures and to the negatively charged (deoxy)ribose-phosphate backbones, which repel each other in close proximity. Crystallized nucleic acids therefore tend to be complexed with a protein of interest, which provides structural order and neutralizes the negative charge. [citation needed]
Nucleic acid NMR is the use of NMR spectroscopy to obtain information about the structure and dynamics of nucleic acid molecules, such as DNA or RNA. As of 2003, nearly half of all known RNA structures had been determined by NMR spectroscopy. [2]
Nucleic acid NMR uses techniques similar to those of protein NMR, but with several differences. Nucleic acids have a smaller percentage of hydrogen atoms, which are the atoms usually observed in NMR, and because nucleic acid double helices are stiff and roughly linear, they do not fold back on themselves to give "long-range" correlations. [3] The types of NMR usually done with nucleic acids are 1H or proton NMR, 13C NMR, 15N NMR, and 31P NMR. Two-dimensional NMR methods are almost always used, such as correlation spectroscopy (COSY) and total correlation spectroscopy (TOCSY) to detect through-bond nuclear couplings, and nuclear Overhauser effect spectroscopy (NOESY) to detect couplings between nuclei that are close to each other in space. [4]
Parameters taken from the spectrum, mainly NOESY cross-peaks and coupling constants, can be used to determine local structural features such as glycosidic bond angles, dihedral angles (using the Karplus equation), and sugar pucker conformations. For large-scale structure, these local parameters must be supplemented with other structural assumptions or models, because errors add up as the double helix is traversed, and unlike with proteins, the double helix does not have a compact interior and does not fold back upon itself. NMR is also useful for investigating nonstandard geometries such as bent helices, non-Watson–Crick basepairing, and coaxial stacking. It has been especially useful in probing the structure of natural RNA oligonucleotides, which tend to adopt complex conformations such as stem-loops and pseudoknots. NMR is also useful for probing the binding of nucleic acid molecules to other molecules, such as proteins or drugs, by seeing which resonances are shifted upon binding of the other molecule. [4]
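As a rough illustration of how a measured coupling constant constrains a dihedral angle, the Karplus relation has the general form J(θ) = A cos²θ + B cos θ + C. The Python sketch below evaluates this form and lists the angles compatible with an observed coupling. The coefficients A, B and C used here are placeholder values chosen for illustration, not a parameterisation taken from the cited sources; real analyses use coefficients calibrated for the specific pair of coupled nuclei.

```python
import math

def karplus_j(theta_deg, a=7.0, b=-1.0, c=5.0):
    """Three-bond coupling constant (Hz) predicted for a dihedral angle.

    a, b and c are illustrative placeholder coefficients (an assumption),
    not values calibrated for any particular nucleic acid coupling.
    """
    theta = math.radians(theta_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c

def candidate_dihedrals(j_obs, tol=0.25, step=1, **coeffs):
    """Angles (degrees, -180..180) whose predicted J lies within tol Hz of j_obs.

    The relation is not one-to-one, so several angles usually match and
    other data (e.g. NOESY cross-peaks) are needed to choose among them.
    """
    return [ang for ang in range(-180, 181, step)
            if abs(karplus_j(ang, **coeffs) - j_obs) <= tol]

if __name__ == "__main__":
    print(candidate_dihedrals(6.0))
```

Because the cosine is an even function, every matching angle θ comes with a matching −θ, which is one reason a single coupling constant rarely fixes a conformation on its own.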
Cryogenic electron microscopy (cryo-EM) is a technique that uses an electron beam to image samples that have been cryogenically preserved in an aqueous solution. Liquid samples are pipetted onto small metallic grids and plunged into a liquid ethane/propane solution that is kept extremely cold by a liquid nitrogen bath. During this freezing process, water molecules in the sample do not have enough time to form the hexagonal lattices found in ice, so the sample is preserved in a glassy, water-like state (also referred to as vitrified ice), which makes it easier to image with the electron beam. An advantage of cryo-EM over X-ray crystallography is that the samples are preserved in their aqueous solution state and are not perturbed by crystallization. One disadvantage is that it is difficult to resolve nucleic acid or protein structures smaller than ~75 kilodaltons, partly because there is too little contrast to locate the particles in the vitrified aqueous solution. Another disadvantage is that attaining atomic-level structural information requires taking many images (often referred to as electron micrographs) and averaging over them in a process called single-particle reconstruction, which is computationally intensive.
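The benefit of averaging many micrographs can be shown with a toy calculation. The sketch below is not a single-particle reconstruction pipeline: it assumes, purely for illustration, a one-dimensional "particle", images that are already perfectly aligned, and independent Gaussian noise. Real workflows also need particle picking, alignment, classification and three-dimensional reconstruction, which is where most of the computational cost arises.

```python
import math
import random

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))

def noisy_copy(truth, sigma, rng):
    """One simulated low-contrast image: the true signal plus Gaussian noise."""
    return [t + rng.gauss(0.0, sigma) for t in truth]

def average_images(images):
    """Pixel-wise mean of pre-aligned images."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]

rng = random.Random(0)                            # fixed seed for repeatability
truth = [math.sin(i / 5.0) for i in range(200)]   # toy 1-D "particle" (assumption)
images = [noisy_copy(truth, sigma=2.0, rng=rng) for _ in range(400)]

single_err = rms_error(images[0], truth)
averaged_err = rms_error(average_images(images), truth)

if __name__ == "__main__":
    print(f"single image error:   {single_err:.3f}")
    print(f"400-image mean error: {averaged_err:.3f}")
```

For independent noise the error of the mean falls roughly as the square root of the number of images, so hundreds to thousands of particle images are needed before fine detail emerges.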
Cryo-EM is a newer, less perturbative version of transmission electron microscopy (TEM). It is less perturbative because the sample is not dried onto a surface, as is often done in negative-stain TEM, and because cryo-EM does not require contrast agents such as heavy metal salts (e.g. uranyl acetate or phosphotungstic acid), which may also affect the structure of the biomolecule. Transmission electron microscopy relies on the fact that samples interact with a beam of electrons: only the electrons that are not scattered by the sample are transmitted to the electron detection system. TEM in general has been a useful technique for determining nucleic acid structure since the 1960s. [5] [6] Although double-stranded DNA (dsDNA) is not traditionally regarded as having structure in the usual sense of alternating single- and double-stranded regions, it is not a perfectly ordered double helix along its whole length, because of thermal fluctuations in the DNA and alternative structures that can form, such as G-quadruplexes. Cryo-EM of nucleic acids has been performed on ribosomes, [7] viral RNA, [8] and single-stranded RNA structures within viruses. [9] [10] These studies have resolved structural features at resolutions ranging from the nucleobase level (2-3 angstroms) up to tertiary structure motifs (greater than a nanometer).
RNA chemical probing uses chemicals that react with RNAs. Importantly, their reactivity depends on local RNA structure, e.g. base-pairing or accessibility. Differences in reactivity can therefore serve as a footprint of structure along the sequence. Different reagents react at different positions on the RNA structure and have different spectra of reactivity. [1] Recent advances allow the simultaneous study of the structure of many RNAs (transcriptome-wide probing) [11] and the direct assay of RNA molecules in their cellular environment (in-cell probing). [12]
Structured RNA is first reacted with the probing reagents for a given incubation time. These reagents form a covalent adduct on the RNA at the site of reaction. When the RNA is reverse transcribed into a DNA copy by a reverse transcriptase, the DNA generated is truncated at the positions of reaction because the enzyme is blocked by the adducts. The collection of DNA molecules of various truncated lengths therefore reports the frequency of reaction at every base position, which reflects the structure profile along the RNA. This is traditionally assayed by running the DNA on a gel, where the intensity of each band reports the frequency of truncation at that position. Recent approaches use high-throughput sequencing to achieve the same purpose with greater throughput and sensitivity.
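The conversion from truncated cDNA lengths to a per-position reactivity profile can be sketched as follows. This is a minimal illustration under stated assumptions, not a published pipeline: the primer is assumed to anneal at the very 3' end of the RNA, cDNA lengths are assumed to include the primer, the reverse transcriptase is assumed to halt one nucleotide 3' of the adduct, and background is removed by subtracting an untreated control. The function names are hypothetical.

```python
from collections import Counter

def modified_position(cdna_length, rna_length):
    """1-based RNA position of the adduct implied by one truncated cDNA.

    Assumes the cDNA starts at the RNA 3' end and stops one nucleotide
    3' of the modified base. A full-length cDNA maps to position 0,
    meaning no stop was observed.
    """
    return rna_length - cdna_length

def stop_frequencies(cdna_lengths, rna_length):
    """Fraction of cDNAs that stopped at each RNA position (1..rna_length)."""
    counts = Counter(modified_position(n, rna_length) for n in cdna_lengths)
    total = max(len(cdna_lengths), 1)
    return [counts[pos] / total for pos in range(1, rna_length + 1)]

def reactivity_profile(treated_lengths, control_lengths, rna_length):
    """Background-subtracted stop frequency per position, floored at zero."""
    treated = stop_frequencies(treated_lengths, rna_length)
    control = stop_frequencies(control_lengths, rna_length)
    return [max(t - c, 0.0) for t, c in zip(treated, control)]

if __name__ == "__main__":
    profile = reactivity_profile([10, 10, 7, 7, 7, 4], [10, 10, 10, 7], rna_length=10)
    for pos, value in enumerate(profile, start=1):
        print(pos, round(value, 3))
```

With gel data the same idea applies, with band intensities taking the place of read counts.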
The reactivity profile can be used to study the degree of structure at particular positions for specific hypotheses, or used in conjunction with computational algorithms to produce a complete experimentally supported structure model. [13]
Some reagents, e.g. hydroxyl radicals, cleave the RNA molecule instead of forming an adduct. The result in the truncated DNA is the same. Other reagents, e.g. DMS, sometimes do not block the reverse transcriptase but instead cause it to introduce a mistake at that site in the DNA copy. These mistakes can be detected with high-throughput sequencing, an approach known as mutational profiling (MaP) that is sometimes employed to improve probing results. [14] [15]
Positions on the RNA can be protected from the reagents not only by local structure but also by a binding protein over that position. This has led some work to use chemical probing to also assay protein-binding. [16]
As hydroxyl radicals are short-lived in solution, they need to be generated upon experiment. This can be done using H2O2, ascorbic acid, and Fe(II)-EDTA complex. These reagents form a system that generates hydroxyl radicals through Fenton chemistry. The hydroxyl radicals can then react with the nucleic acid molecules. [17] Hydroxyl radicals attack the ribose/deoxyribose ring and this results in breaking of the sugar-phosphate backbone. Sites under protection from binding proteins or RNA tertiary structure would be cleaved by hydroxyl radical at a lower rate. [17] These positions would therefore show up as absence of bands on the gel, or low signal through sequencing. [17] [18]
Dimethyl sulfate, known as DMS, is a chemical that can be used to modify nucleic acids in order to determine secondary structure. Reaction with DMS adds a methyl adduct at the site, known as methylation. In particular, DMS methylates N1 of adenine (A) and N3 of cytosine (C), [19] both located at the site of natural hydrogen bonds upon base-pairing. Therefore, modification can only occur at A and C nucleobases that are single-stranded, base paired at the end of a helix, or in a base pair at or next to a GU wobble pair, the latter two being positions where the base-pairing can occasionally open up. Moreover, since modified sites cannot be base-paired, modification sites can be detected by RT-PCR, where the reverse transcriptase falls off at methylated bases and produces different truncated cDNAs. These truncated cDNAs can be identified through gel electrophoresis or high-throughput sequencing.
Improving upon truncation-based methods, DMS mutational profiling with sequencing (DMS-MaPseq) can detect multiple DMS modifications in a single RNA molecule, which enables one to obtain more information per read (for a read of 150 nt, typically two to three mutation sites, rather than zero to one truncation sites), determine structures of low-abundance RNAs, and identify subpopulations of RNAs with alternative secondary structures. [20] DMS-MaPseq uses a thermostable group II intron reverse transcriptase (TGIRT) that creates a mutation (rather than a truncation) in the cDNA when it encounters a base methylated by DMS, but otherwise it reverse transcribes with high fidelity. Sequencing the resulting cDNA identifies which bases were mutated during reverse transcription; these bases cannot have been base-paired in the original RNA.
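A mutational-profiling readout can be sketched as a per-position mismatch rate. The example below is a simplified illustration, not the DMS-MaPseq software: it assumes reads that are already aligned and padded to the reference length, uses "N" to mark positions a read does not cover, writes the reference in DNA letters, and subtracts the mismatch rate of an untreated control as background. Only A and C positions are scored, since those are the bases DMS methylates at the Watson-Crick face.

```python
def mutation_rates(reads, reference):
    """Fraction of covering reads that mismatch the reference at each position."""
    rates = []
    for i, ref_base in enumerate(reference):
        covered = [read[i] for read in reads if read[i] != "N"]
        mismatches = sum(1 for base in covered if base != ref_base)
        rates.append(mismatches / len(covered) if covered else 0.0)
    return rates

def dms_map_profile(treated_reads, untreated_reads, reference):
    """Background-subtracted mutation rate at A/C positions; None elsewhere."""
    treated = mutation_rates(treated_reads, reference)
    untreated = mutation_rates(untreated_reads, reference)
    profile = []
    for base, t, u in zip(reference, treated, untreated):
        if base in "AC":
            profile.append(max(t - u, 0.0))
        else:
            profile.append(None)  # DMS signal is not interpreted at G or U/T
    return profile

if __name__ == "__main__":
    reference = "GACA"
    treated = ["GTCA", "GACA", "GACT", "GACT"]
    untreated = ["GACA", "GACA", "GACA", "GTCA"]
    print(dms_map_profile(treated, untreated, reference))
```

Because each read can carry several mismatches, the same aligned reads can also be clustered by their co-occurring mutations to look for subpopulations with alternative structures, which this sketch does not attempt.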
DMS modification can also be used for DNA, for example in footprinting DNA-protein interactions. [21]
Selective 2′-hydroxyl acylation analyzed by primer extension, or SHAPE, takes advantage of reagents that preferentially modify the backbone of RNA in structurally flexible regions.
Reagents such as N-methylisatoic anhydride (NMIA) and 1-methyl-7-nitroisatoic anhydride (1M7) [22] react with the 2'-hydroxyl group to form adducts on the RNA backbone. Compared to the chemicals used in other RNA probing techniques, these reagents have the advantage of being largely unbiased to base identity, while remaining very sensitive to conformational dynamics. Nucleotides that are constrained (usually by base-pairing) show less adduct formation than nucleotides that are unpaired. Adduct formation is quantified for each nucleotide in a given RNA by extension of a complementary DNA primer with reverse transcriptase and comparison of the resulting fragments with those from an unmodified control. [23] SHAPE therefore reports on RNA structure at the individual nucleotide level. These data can be used as input to generate highly accurate secondary structure models. [24] SHAPE has been used to analyze diverse RNA structures, including that of an entire HIV-1 genome. [25] Combining several chemical probing reagents and sources of experimental data is considered the best approach. [26] In SHAPE-Seq, SHAPE is extended by bar-code-based multiplexing combined with RNA-Seq, and can be performed in a high-throughput fashion. [27]
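One common way to feed SHAPE data into secondary structure prediction is to convert each reactivity into a pseudo-free-energy term of the form ΔG = m ln(reactivity + 1) + b, which is added to the folding energy for paired nucleotides. The sketch below shows only this conversion. The slope m = 2.6 and intercept b = −0.8 kcal/mol are commonly quoted values and are treated here as assumptions; the appropriate parameters depend on the folding software and the data normalisation used.

```python
import math

def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo-free-energy change (kcal/mol) applied to a paired nucleotide.

    m and b are assumed defaults, not values taken from this text.
    Missing or negative reactivities contribute no energy term.
    """
    if reactivity is None or reactivity < 0:
        return 0.0
    return m * math.log(reactivity + 1.0) + b

def pseudo_energies(reactivities, **params):
    """Convert a whole reactivity profile into per-nucleotide energy terms."""
    return [shape_pseudo_energy(r, **params) for r in reactivities]

if __name__ == "__main__":
    for r in (0.0, 0.3, 1.0, 2.0, None):
        print(r, round(shape_pseudo_energy(r), 3))
```

Low reactivities give a negative term that rewards pairing, while high reactivities give a positive term that penalises it, so the prediction is steered toward structures consistent with the probing data.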
Carbodiimides can also form covalent adducts at exposed nucleobases, mainly uracil and to a lesser extent guanine, upon nucleophilic attack by a deprotonated nitrogen. They react primarily with N3 of uracil and N1 of guanine, modifying two sites responsible for hydrogen bonding on the bases. [19]
1-cyclohexyl-(2-morpholinoethyl)carbodiimide metho-p-toluene sulfonate, also known as CMCT or CMC, is the most commonly used carbodiimide for RNA structure probing. [29] [30] Similar to DMS, it can be detected by reverse transcription followed by gel electrophoresis or high-throughput sequencing. As it is reactive towards G and U, it can be used to complement the data from DMS probing experiments, which inform A and C. [31]
1-ethyl-3-(3-dimethylaminopropyl)carbodiimide, also known as EDC, is a water-soluble carbodiimide with reactivity similar to that of CMC, and is also used for the chemical probing of RNA structure. EDC is able to permeate into cells and is thus used for direct in-cell probing of RNAs in their native environments. [32] [28]
Some 1,2-dicarbonyl compounds are able to react with single-stranded guanine (G) at N1 and N2, forming a five-membered ring adduct at the Watson-Crick face.
1,1-Dihydroxy-3-ethoxy-2-butanone, also known as kethoxal, has a structure related to 1,2-dicarbonyls, and was the first in this category used extensively for the chemical probing of RNA. Kethoxal causes the modification of guanine, specifically altering the N1 and the exocyclic amino group (N2) simultaneously by covalent interaction. [35]
Glyoxal, methylglyoxal, and phenylglyoxal, which all carry the key 1,2-dicarbonyl moiety, react with free guanines in a manner similar to kethoxal, and can be used to probe unpaired guanine bases in structured RNA. Due to their chemical properties, these reagents can permeate readily into cells and can therefore be used to assay RNAs in their native cellular environments. [34]
Light-Activated Structural Examination of RNA (LASER) probing uses UV light to activate nicotinoyl azide (NAz), generating a highly reactive nitrenium cation in water, which reacts with solvent-accessible guanosine and adenosine residues of RNA at the C-8 position through a barrierless Friedel-Crafts reaction. LASER probing targets both single-stranded and double-stranded residues as long as they are solvent accessible. Because hydroxyl radical probing requires synchrotron radiation to measure the solvent accessibility of RNA in vivo, it is difficult for many laboratories to apply it to footprinting RNA in cells. In contrast, LASER probing uses a hand-held UV lamp (20 W) for excitation, which makes it much easier to apply to in vivo studies of RNA solvent accessibility. The method is light-controllable and probes the solvent accessibility of nucleobases, and it has been shown to footprint RNA-binding proteins inside cells. [36]
In-line probing does not involve treatment with any type of chemical or reagent to modify RNA structures. This type of probing assay uses the structure dependent cleavage of RNA; single stranded regions are more flexible and unstable and will degrade over time. [38] The process of in-line probing is often used to determine changes in structure due to ligand binding. Binding of a ligand can result in different cleavage patterns. The process of in-line probing involves incubation of structural or functional RNAs over a long period of time. This period can be several days, but varies in each experiment. The incubated products are then run on a gel to visualize the bands. This experiment is often done using two different conditions: 1) with ligand and 2) in the absence of ligand. [37] Cleavage results in shorter band lengths and is indicative of areas that are not basepaired, as basepaired regions tend to be less sensitive to spontaneous cleavage. [38] In-line probing is a functional assay that can be used to determine structural changes in RNA in response to ligand binding. It can directly show the change in flexibility and binding of regions of RNA in response to a ligand, as well as compare that response to analogous ligands. This assay is commonly used in dynamic studies, specifically when examining riboswitches. [38]
Nucleotide analog interference mapping (NAIM) uses nucleotide analogs, molecules that are similar in some ways to nucleotides but lack function, to determine the importance of a functional group at each location of an RNA molecule. [39] [40] In NAIM, a single nucleotide analog is inserted at a unique site. This can be done by transcribing a short RNA using T7 RNA polymerase, synthesizing a short oligonucleotide containing the analog at a specific position, and then ligating the two together on a DNA template using a ligase. [39] The nucleotide analogs are tagged with a phosphorothioate. The active members of the RNA population are then distinguished from the inactive members, the inactive members have the phosphorothioate tag removed, and the analog sites are identified using gel electrophoresis and autoradiography. [39] Because cleavage of the phosphorothioate by iodine cuts the RNA at the site of the nucleotide analog insert, this identifies functionally important nucleotides. By running these truncated RNA molecules on a gel, the nucleotide of interest can be identified against a sequencing experiment. [40] With site-directed incorporation, positions of importance are read from the gel: functional RNAs that tolerate the analog at a given position produce a band for that position, whereas if the analog abolishes function, the functional RNA population shows no band there. [41] The process can be used to evaluate an entire region by placing analogs at site-specific locations that differ by a single nucleotide. When the functional RNAs are isolated and run on a gel, positions that produce bands indicate non-essential nucleotides, whereas positions where bands are absent indicate that inserting a nucleotide analog there made the RNA non-functional. [39]
A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both DNA and RNA. Dictated by specific hydrogen bonding patterns, "Watson–Crick" base pairs allow the DNA helix to maintain a regular helical structure that is subtly dependent on its nucleotide sequence. The complementary nature of this based-paired structure provides a redundant copy of the genetic information encoded within each strand of DNA. The regular structure and data redundancy provided by the DNA double helix make DNA well suited to the storage of genetic information, while base-pairing between DNA and incoming nucleotides provides the mechanism through which DNA polymerase replicates DNA and RNA polymerase transcribes DNA into RNA. Many DNA-binding proteins can recognize specific base-pairing patterns that identify particular regulatory regions of genes.
Deoxyribonucleic acid is a polymer composed of two polynucleotide chains that coil around each other to form a double helix. The polymer carries genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids. Alongside proteins, lipids and complex carbohydrates (polysaccharides), nucleic acids are one of the four major types of macromolecules that are essential for all known forms of life.
Nucleic acids are biopolymers, macromolecules, essential to all known forms of life. They are composed of nucleotides, which are the monomers made of three components: a 5-carbon sugar, a phosphate group and a nitrogenous base. The two main classes of nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). If the sugar is ribose, the polymer is RNA; if the sugar is the ribose derivative deoxyribose, the polymer is DNA.
Nucleotides are organic molecules consisting of a nucleoside and a phosphate. They serve as monomeric units of the nucleic acid polymers – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both of which are essential biomolecules within all life-forms on Earth. Nucleotides are obtained in the diet and are also synthesized from common nutrients by the liver.
Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes. RNA and deoxyribonucleic acid (DNA) are nucleic acids. Along with lipids, proteins, and carbohydrates, nucleic acids constitute one of the four major macromolecules essential for all known forms of life. Like DNA, RNA is assembled as a chain of nucleotides, but unlike DNA, RNA is found in nature as a single strand folded onto itself, rather than a paired double strand. Cellular organisms use messenger RNA (mRNA) to convey genetic information that directs synthesis of specific proteins. Many viruses encode their genetic information using an RNA genome.
The RNA world is a hypothetical stage in the evolutionary history of life on Earth, in which self-replicating RNA molecules proliferated before the evolution of DNA and proteins. The term also refers to the hypothesis that posits the existence of this stage.
Nucleobases, also known as nitrogenous bases or often simply bases, are nitrogen-containing biological compounds that form nucleosides, which, in turn, are components of nucleotides, with all of these monomers constituting the basic building blocks of nucleic acids. The ability of nucleobases to form base pairs and to stack one upon another leads directly to long-chain helical structures such as ribonucleic acid (RNA) and deoxyribonucleic acid (DNA). Five nucleobases—adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U)—are called primary or canonical. They function as the fundamental units of the genetic code, with the bases A, G, C, and T being found in DNA while A, G, C, and U are found in RNA. Thymine and uracil are distinguished by merely the presence or absence of a methyl group on the fifth carbon (C5) of these heterocyclic six-membered rings. In addition, some viruses have aminoadenine (Z) instead of adenine. It differs in having an extra amine group, creating a more stable bond to thymine.
A nucleic acid sequence is a succession of bases signified by a series of a set of five different letters that indicate the order of nucleotides forming alleles within a DNA or RNA (GACU) molecule. By convention, sequences are usually presented from the 5' end to the 3' end. For DNA, the sense strand is used. Because nucleic acids are normally linear (unbranched) polymers, specifying the sequence is equivalent to defining the covalent structure of the entire molecule. For this reason, the nucleic acid sequence is also termed the primary structure.
In biochemistry, a ribonucleotide is a nucleotide containing ribose as its pentose component. It is considered a molecular precursor of nucleic acids. Nucleotides are the basic building blocks of DNA and RNA. Ribonucleotides themselves are basic monomeric building blocks for RNA. Deoxyribonucleotides, formed by reducing ribonucleotides with the enzyme ribonucleotide reductase (RNR), are essential building blocks for DNA. There are several differences between DNA deoxyribonucleotides and RNA ribonucleotides. Successive nucleotides are linked together via phosphodiester bonds.
The history of molecular biology begins in the 1930s with the convergence of various, previously distinct biological and physical disciplines: biochemistry, genetics, microbiology, virology and physics. With the hope of understanding life at its most fundamental level, numerous physicists and chemists also took an interest in what would become molecular biology.
In molecular biology, G-quadruplex secondary structures (G4) are formed in nucleic acids by sequences that are rich in guanine. They are helical in shape and contain guanine tetrads that can form from one, two or four strands. The unimolecular forms often occur naturally near the ends of the chromosomes, better known as the telomeric regions, and in transcriptional regulatory regions of multiple genes, both in microbes and across vertebrates including oncogenes in humans. Four guanine bases can associate through Hoogsteen hydrogen bonding to form a square planar structure called a guanine tetrad, and two or more guanine tetrads can stack on top of each other to form a G-quadruplex.
Biomolecular structure is the intricate folded, three-dimensional shape that is formed by a molecule of protein, DNA, or RNA, and that is important to its function. The structure of these molecules may be considered at any of several length scales ranging from the level of individual atoms to the relationships among entire protein subunits. This useful distinction among scales is often expressed as a decomposition of molecular structure into four levels: primary, secondary, tertiary, and quaternary. The scaffold for this multiscale organization of the molecule arises at the secondary level, where the fundamental structural elements are the molecule's various hydrogen bonds. This leads to several recognizable domains of protein structure and nucleic acid structure, including such secondary-structure features as alpha helixes and beta sheets for proteins, and hairpin loops, bulges, and internal loops for nucleic acids. The terms primary, secondary, tertiary, and quaternary structure were introduced by Kaj Ulrik Linderstrøm-Lang in his 1951 Lane Medical Lectures at Stanford University.
Nucleic acid thermodynamics is the study of how temperature affects the nucleic acid structure of double-stranded DNA (dsDNA). The melting temperature (Tm) is defined as the temperature at which half of the DNA strands are in the random coil or single-stranded (ssDNA) state. Tm depends on the length of the DNA molecule and its specific nucleotide sequence. DNA, when in a state where its two strands are dissociated, is referred to as having been denatured by the high temperature.
Nucleic acid analogues are compounds which are analogous to naturally occurring RNA and DNA, used in medicine and in molecular biology research. Nucleic acids are chains of nucleotides, which are composed of three parts: a phosphate backbone, a pentose sugar, either ribose or deoxyribose, and one of four nucleobases. An analogue may have any of these altered. Typically the analogue nucleobases confer, among other things, different base pairing and base stacking properties. Examples include universal bases, which can pair with all four canonical bases, and phosphate-sugar backbone analogues such as PNA, which affect the properties of the chain. Nucleic acid analogues are also called Xeno Nucleic Acid and represent one of the main pillars of xenobiology, the design of new-to-nature forms of life based on alternative biochemistries.
1-Ethyl-3-(3-dimethylaminopropyl)carbodiimide is a water-soluble carbodiimide usually handled as the hydrochloride. It is typically employed in the 4.0-6.0 pH range. It is generally used as a carboxyl activating agent for the coupling of primary amines to yield amide bonds. While other carbodiimides like dicyclohexylcarbodiimide (DCC) or diisopropylcarbodiimide (DIC) are also employed for this purpose, EDC has the advantage that the urea byproduct formed can be washed away from the amide product using dilute acid. Additionally, EDC can also be used to activate phosphate groups in order to form phosphomonoesters and phosphodiesters. Common uses for this carbodiimide include peptide synthesis, protein crosslinking to nucleic acids, but also in the preparation of immunoconjugates. EDC is often used in combination with N-hydroxysuccinimide (NHS) for the immobilisation of large biomolecules. Recent work has also used EDC to assess the structure state of uracil nucleobases in RNA.
Nucleic acid secondary structure is the basepairing interactions within a single nucleic acid polymer or between two polymers. It can be represented as a list of bases which are paired in a nucleic acid molecule. The secondary structures of biological DNAs and RNAs tend to be different: biological DNA mostly exists as fully base paired double helices, while biological RNA is single stranded and often forms complex and intricate base-pairing interactions due to its increased ability to form hydrogen bonds stemming from the extra hydroxyl group in the ribose sugar.
Nucleic acid NMR is the use of nuclear magnetic resonance spectroscopy to obtain information about the structure and dynamics of nucleic acid molecules, such as DNA or RNA. It is useful for molecules of up to 100 nucleotides, and as of 2003, nearly half of all known RNA structures had been determined by NMR spectroscopy.
Xeno nucleic acids (XNA) are synthetic nucleic acid analogues that have a different sugar backbone than the natural nucleic acids DNA and RNA. As of 2011, at least six types of synthetic sugars have been shown to form nucleic acid backbones that can store and retrieve genetic information. Research is now being done to create synthetic polymerases to transform XNA. The study of its production and application has created a field known as xenobiology.
DNA base flipping, or nucleotide flipping, is a mechanism in which a single nucleotide base, or nucleobase, is rotated outside the nucleic acid double helix. This occurs when a nucleic acid-processing enzyme needs access to the base to perform work on it, such as its excision for replacement with another base during DNA repair. It was first observed in 1994 using X-ray crystallography in a methyltransferase enzyme catalyzing methylation of a cytosine base in DNA. Since then, it has been shown to be used by different enzymes in many biological processes such as DNA methylation, various DNA repair mechanisms, and DNA replication. It can also occur in RNA double helices or in the DNA:RNA intermediates formed during RNA transcription.
Ribose is a simple sugar and carbohydrate with molecular formula C5H10O5 and the linear-form composition H−(C=O)−(CHOH)4−H. The naturally-occurring form, d-ribose, is a component of the ribonucleotides from which RNA is built, and so this compound is necessary for coding, decoding, regulation and expression of genes. It has a structural analog, deoxyribose, which is a similarly essential component of DNA. l-ribose is an unnatural sugar that was first prepared by Emil Fischer and Oscar Piloty in 1891. It was not until 1909 that Phoebus Levene and Walter Jacobs recognised that d-ribose was a natural product, the enantiomer of Fischer and Piloty's product, and an essential component of nucleic acids. Fischer chose the name "ribose" as it is a partial rearrangement of the name of another sugar, arabinose, of which ribose is an epimer at the 2' carbon; both names also relate to gum arabic, from which arabinose was first isolated and from which they prepared l-ribose.