Cruciform DNA is a form of non-B DNA, or an alternative DNA structure. The formation of cruciform DNA requires the presence of palindromes called inverted repeat sequences. [1] These inverted repeats contain a sequence of DNA in one strand that is repeated in the opposite direction on the other strand. As a result, inverted repeats are self-complementary and can give rise to structures such as hairpins and cruciforms. Cruciform DNA structures require at least a six nucleotide sequence of inverted repeats to form a structure consisting of a stem, branch point and loop in the shape of a cruciform, stabilized by negative DNA supercoiling. [1] [2]
Two classes of cruciform DNA have been described: folded and unfolded. Folded cruciform structures are characterized by the formation of acute angles between adjacent arms and main strand DNA. Unfolded cruciform structures have square planar geometry and 4-fold symmetry in which the two arms of the cruciform are perpendicular to each other. [2] Two mechanisms for the formation of cruciform DNA have been described: C-type and S-type. [3] The formation of cruciform structures in linear DNA is thermodynamically unfavorable due to the possibility of base unstacking at junction points and open regions at loops. [2]
Cruciform DNA is found in both prokaryotes and eukaryotes and has a role in DNA transcription, DNA replication, double-strand repair, DNA translocation, and recombination. Cruciform structures also function in epigenetic regulation, and their biological implications include affecting DNA supercoiling, causing double-strand breaks, and serving as targets for cruciform-binding proteins. [4] [5] [6] Cruciform structures can increase genomic instability and are involved in the development of various diseases, such as cancer and Werner's syndrome. [7] [8] [9]
Cruciform-forming DNA structures were first described theoretically in the early 1960s. [10] Alfred Gierer was one of the first scientists to propose an interaction between proteins and the grooves of specific double-stranded DNA nucleotide sequences. [11] If inverted repeat sequences were present, double-stranded DNA was speculated to form branches and loops. [11] Proteins were hypothesized to bind to these branched DNA structures and regulate gene expression. [11] The association between proteins and branch-forming DNA was suggested by analogy with the structure and function of tRNA. [11] Because tRNA folds on itself through pairing of complementary bases, it forms branches and loops, both of which are key components in its interactions with proteins. Starting in the early 1980s, DNA recognition sites that form hairpin structures were characterized for a range of cellular proteins. [10]
Cruciform extrusion occurs through the opening of double-stranded DNA to allow intrastrand base pairing. [12] The mechanism of this opening is classified into two types: C-type and S-type. C-type cruciform formation is marked by a large initial opening in the double-stranded DNA. This opening has several adenine and thymine nucleotides distal to the inverted repeat. [3] As the unwound section gets larger, both sides of the inverted repeat unwind and intrastrand base pairing occurs. This leads to the formation of a cruciform structure. C-type cruciform formation is temperature dependent because it has a higher entropy and enthalpy of activation than S-type. [3] Unlike C-type, S-type cruciform formation requires salt for extrusion. [3] It begins with a smaller unwound region of approximately ten base pairs at the center of the inverted repeat. [12] As intrastrand base pairing occurs, a protocruciform is formed. In a protocruciform, the stems of the structure are partially formed and not completely extruded. A protocruciform is therefore seen as an intermediate step before the final cruciform conformation is produced. [3] As the unwound region becomes larger, the stems elongate through a process called branch migration. [13] This eventually forms a fully extruded cruciform.
Cruciform formation depends on several factors, including temperature, sodium and magnesium ion concentrations, and the presence of negatively supercoiled DNA. As noted above, the C-type mechanism of cruciform extrusion is temperature dependent; however, 37 °C has been observed to be optimal for cruciform formation. [14] Additionally, the presence or absence of sodium and magnesium ions can affect the conformation the cruciform adopts. [14] At high sodium ion concentration and in the absence of magnesium ions, a compact, folded cruciform structure is formed. Here, the stems form acute angles with the main DNA strand instead of lying at 90° to each other. [13] At lower sodium ion concentration and in the absence of magnesium ions, the cruciform adopts a symmetrical, square planar conformation with fully extended stems. [13] In the presence of magnesium ions and no sodium ions, a compact, folded conformation is adopted, similar to that formed at high sodium concentrations. Unlike the folded conformation formed at high sodium ion concentrations, however, this conformation is symmetrical. [13] Lastly, the formation of cruciform DNA is kinetically unfavorable. When DNA is under significant stress, it adopts a negatively supercoiled conformation, which has fewer helical turns than relaxed DNA. The negatively supercoiled DNA helix becomes flexible when a cruciform structure forms and intrastrand base pairing occurs. As a result, formation of the cruciform structure becomes thermodynamically favorable when a negatively supercoiled DNA domain is present. [13] [15]
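The ion dependence described above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name `cruciform_conformation` and the categorical sodium levels ("high", "low", "none") are assumptions made for this example, since the text gives no numeric concentration thresholds. Combinations the text does not describe return `None` rather than a guess.

```python
from typing import Optional


def cruciform_conformation(sodium: str, magnesium_present: bool) -> Optional[str]:
    """Map qualitative ion conditions to the conformation described in the text.

    sodium: one of "high", "low", or "none" (hypothetical categories; the
            article does not quantify these levels).
    magnesium_present: whether magnesium ions are present.
    Returns a short description, or None if the text does not cover the case.
    """
    if sodium not in {"high", "low", "none"}:
        raise ValueError("sodium must be 'high', 'low', or 'none'")

    if not magnesium_present and sodium == "high":
        # Compact and folded; stems form acute angles with the main strand.
        return "folded, compact, not symmetrical"
    if not magnesium_present and sodium == "low":
        # Square planar with fully extended stems.
        return "unfolded, square planar, symmetrical"
    if magnesium_present and sodium == "none":
        # Folded like the high-sodium form, but symmetrical.
        return "folded, compact, symmetrical"
    return None  # condition not described in the text


print(cruciform_conformation("low", False))
```

Returning `None` for undescribed conditions keeps the sketch from implying more than the cited observations support.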
Cruciform structures play a role in epigenetic regulation and have other important biological implications. These range from affecting the supercoiling of DNA and causing double-strand breaks in chromosomal DNA to serving as targets for protein binding. [10] [5] [6] Many cruciform-binding proteins interact with cruciform DNA structures, which act as recognition signals, and perform functions associated with transcription factors, DNA replication, and endonuclease activity. [10] [16] These cruciform-binding proteins bind to the base of the stem-loop structure, near the four-way junction that cruciform DNA adopts. [17]
The 14-3-3 protein family is known to interact with inverted repeat sequences that may form cruciform DNA while regulating DNA replication in eukaryotic cells. [10] [18] B-DNA can form transient cruciform structures that act as recognition signals near origins of replication in these cells. [10] The association between 14-3-3 proteins and inverted repeat sequences occurs at the beginning of S phase of the cell cycle. [10] The interaction between 14-3-3 proteins and cruciform DNA plays a role in origin firing, which in turn activates DNA helicase to begin DNA replication. [10] [19] The 14-3-3 proteins dissociate after they assist in the initiation step of DNA replication. [10] [20]
Inverted repeat sequences that can form cruciform structures have been found to act as target sites for endonuclease cleavage. [21] [22] Mus81-Mms4, an endonuclease from Saccharomyces cerevisiae, has been found to interact with Crp1, a protein that recognizes putative cruciform structures. [22] Crp1 was separately identified as a cruciform-binding protein in S. cerevisiae because of its high affinity for synthetic inverted repeat sequences. [21] Moreover, in the presence of Crp1, the endonuclease activity of Mus81-Mms4 increases. [22] This suggests that inverted repeat sequences may enhance the activity of endonucleases like Mus81-Mms4 when bound to Crp1. [22]
Specific endonucleases like Endonuclease T7 and S1 have been found to recognize and cleave inverted repeat sequences within plasmids pVH51 and pBR322. [23] The inverted repeat sequences in these plasmids displayed nicks on the DNA strand which led to linearization of the plasmid. [23] Inverted repeat sequences were also observed in pLAT75 in vivo. [16] pLAT75 is derived from pBR322 (found in Escherichia coli) after it is transfected with colE1, an inverted repeat sequence. [16] In the presence of Endonuclease T7, pLAT75 adopted a linear structure after cleavage at the colE1 sequence site. [16]
Cruciform DNA structures are stabilized by supercoiling, and their formation relieves stress generated by DNA supercoiling. Cruciform structures regulate transcription initiation; [4] for example, they suppress pX transcription by blocking recognition of the tet promoter in pX by RNA polymerase. Cruciform structures can also disrupt a step in the kinetic pathway, as shown when gyrase is inhibited by novobiocin. DNA replication can be inhibited by cruciform-containing tertiary structures of DNA formed during recombination, [24] which can be studied to help treat malignancy. Recombination is also observed in Holliday junctions, a type of cruciform structure.
In bacterial plasmids, RuvA and RuvB repair DNA damage and are involved in the recombination process at Holliday junctions. [24] These proteins also regulate branch migration. During branch migration, the RuvAB complex helps to initiate recombination in two ways: it binds and unzips the Holliday junction, much as a DNA helicase does, and the RuvAB/Holliday junction complex is cleaved once RuvC binds to it.
Another example of the significance of cruciform structures is the interaction between p53, a tumor suppressor, and cruciform-forming sequences. p53 binding correlates with inverted repeat sequences, such as those that form cruciform DNA structures. Under negative superhelical stress, p53 binds preferentially to cruciform-forming targets because of the A/T-rich environment, which features the necessary inverted repeat sequences. [25]
Non-B DNA with high cruciform-forming capacity is correlated with significantly higher rates of mutation compared to B-DNA. [26] These mutations include single base substitutions and insertions, but more often cruciform structures lead to deletion of genetic material. In the human genome, cruciform DNA structures are present in higher density within and surrounding chromosomal fragile sites, which are segments of DNA that experience replication stress and are more prone to breaking. Cruciform structures contribute to the instability, translocations, and deletions common in fragile sites by promoting double-stranded breaks. [7] [27] This occurs because inappropriate cruciform DNA is a potential target for endonuclease double-stranded cleavage, most often at loop ends. [28] Double-stranded breaks in DNA can trigger incorrect DNA repair, chromosomal translocations, and, in severe cases, DNA degradation, which is lethal to the cell. Often, entire cruciform-forming sequences are mistakenly cut out by DNA repair enzymes and degraded, which may disrupt cell functioning if the cruciform-forming sequence was within a gene.
Additionally, cruciform DNA formation stalls replication and transcription when the strands are separated, which may trigger DNA repair enzymes to mistakenly add or delete base pairs. [27] [28] Replication and transcription stalling most often leads to deletions of the cruciform DNA sequence by repair enzymes, similar to the mechanism seen in chromosomal fragile sites. There is an increased risk for replication and transcription collision due to cruciform stalling, which further contributes to genomic instability. [28]
The high genomic instability of cruciform-forming DNA sequences makes them prone to mutations and deletions, some of which contribute to the development of cancer. Inappropriate cruciform structures are found more often in highly proliferative tissue and rapidly dividing cells, and thus play a role in the uncontrolled cell proliferation of tumorigenesis. [7] There are several cellular mechanisms in place to prevent genomic discrepancies caused by cruciform structures, but disruption of these processes can lead to malignancies. Architectural human oncoproteins, such as DEK, preferentially bind to cruciform structures during replication and transcription to prevent double-stranded breaks or erroneous DNA repair. [29] Malfunction of architectural oncoproteins, as observed in lung, breast, and other cancers as well as autoimmune disorders, leads to uncontrolled formation of cruciform DNA structures and promotion of double-stranded breaks. The BRCA1 protein, a tumor suppressor that functions in DNA repair, binds preferentially to cruciform structures. [30] Mutations in the BRCA1 gene or the absence of functional BRCA1 protein contribute to breast, ovarian, and prostate cancer development. Inactivation of p53, a tumor suppressor protein that preferentially binds to cruciform structures, is responsible for over 50% of human tumor development. [31] The IFI16 protein modulates p53 functioning and inhibits cell proliferation in the RAS/RAF signaling pathway. IFI16 has a high binding affinity for cruciform structures, and mutations in the IFI16 gene have been linked to Kaposi sarcoma. [32]
While cruciform DNA structures are implicated in cancer development, the unique structure allows reliable transport of chemotherapy drugs. Cruciform DNA is currently being researched as a potential mechanism for cancer treatment, and targeted delivery of anticancer agents to tumorigenic cells by specially constructed cruciform DNA segments has shown efficacy in reducing tumor size in malignant lung, breast, and colon cancers. [33] [34]
Werner's syndrome is a genetic disorder that causes premature aging. Patients with Werner's syndrome lack a functional WRN protein, which is a part of the RecQ family of DNA helicases. Specifically, the WRN protein unwinds Holliday junctions, which are a subset of cruciform DNA structures, to prevent DNA replication stalling. [35] [8]
Chromatin is a complex of DNA and protein found in eukaryotic cells. The primary function is to package long DNA molecules into more compact, denser structures. This prevents the strands from becoming tangled and also plays important roles in reinforcing the DNA during cell division, preventing DNA damage, and regulating gene expression and DNA replication. During mitosis and meiosis, chromatin facilitates proper segregation of the chromosomes in anaphase; the characteristic shapes of chromosomes visible during this stage are the result of DNA being coiled into highly condensed chromatin.
An inverted repeat is a single stranded sequence of nucleotides followed downstream by its reverse complement. The intervening sequence of nucleotides between the initial sequence and the reverse complement can be any length including zero. For example, 5'---TTACGnnnnnnCGTAA---3' is an inverted repeat sequence. When the intervening length is zero, the composite sequence is a palindromic sequence.
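The definition above translates directly into a search procedure: take a short arm, compute its reverse complement, and look for that reverse complement a bounded distance downstream. The following Python sketch illustrates the idea under stated assumptions. The function names, the default arm length of 5, and the maximum spacer of 10 are choices made for this example and are not taken from the text; arms containing characters other than A, C, G, or T (such as the placeholder `n`) are skipped.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, arm_len: int = 5, max_spacer: int = 10):
    """Find arms followed downstream by their reverse complement.

    Returns a list of (start, arm, spacer_len) tuples, where start is the
    0-based index of the left arm and spacer_len is the number of
    intervening nucleotides (0 means the composite is a palindrome).
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * arm_len + 1):
        arm = seq[start:start + arm_len]
        if set(arm) - set("ACGT"):
            continue  # skip arms containing ambiguous or placeholder bases
        target = reverse_complement(arm)
        for spacer in range(max_spacer + 1):
            right = start + arm_len + spacer
            if right + arm_len > len(seq):
                break  # the right arm would run past the end of the sequence
            if seq[right:right + arm_len] == target:
                hits.append((start, arm, spacer))
    return hits


# The example from the text: TTACG, six intervening bases, then CGTAA.
print(find_inverted_repeats("TTACGnnnnnnCGTAA"))  # [(0, 'TTACG', 6)]
# With no intervening bases the composite sequence is palindromic.
print(find_inverted_repeats("TTACGCGTAA"))        # [(0, 'TTACG', 0)]
```

This brute-force scan is quadratic in the worst case and only finds perfect repeats of one fixed arm length; it is meant to make the definition concrete, not to replace dedicated tools for detecting imperfect or variable-length repeats.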
DNA topoisomerases are enzymes that catalyze changes in the topological state of DNA, interconverting relaxed and supercoiled forms, linked (catenated) and unlinked species, and knotted and unknotted DNA. Topological issues in DNA arise due to the intertwined nature of its double-helical structure, which, for example, can lead to overwinding of the DNA duplex during DNA replication and transcription. If left unchanged, this torsion would eventually stop the DNA or RNA polymerases involved in these processes from continuing along the DNA helix. A second topological challenge results from the linking or tangling of DNA during replication. Left unresolved, links between replicated DNA will impede cell division. The DNA topoisomerases prevent and correct these types of topological problems. They do this by binding to DNA and cutting the sugar-phosphate backbone of either one or both of the DNA strands. This transient break allows the DNA to be untangled or unwound, and, at the end of these processes, the DNA backbone is resealed. Since the overall chemical composition and connectivity of the DNA do not change, the DNA substrate and product are chemical isomers, differing only in their topology.
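The difference between relaxed and supercoiled forms is commonly quantified by the superhelical density, σ = (Lk − Lk0) / Lk0, where Lk is the linking number of the closed DNA and Lk0 is the linking number of the same molecule when relaxed. The Python sketch below applies this standard textbook definition, which is not taken from the passage above. The value of about 10.5 base pairs per helical turn for relaxed B-DNA is a common approximation, and the function names and the example plasmid size are hypothetical.

```python
def relaxed_linking_number(n_bp: int, bp_per_turn: float = 10.5) -> float:
    """Approximate linking number Lk0 of relaxed closed-circular B-DNA.

    bp_per_turn = 10.5 is a textbook approximation, assumed here.
    """
    return n_bp / bp_per_turn


def superhelical_density(lk: float, n_bp: int, bp_per_turn: float = 10.5) -> float:
    """Return sigma = (Lk - Lk0) / Lk0.

    Negative values indicate negative supercoiling (underwound DNA);
    positive values indicate overwinding.
    """
    lk0 = relaxed_linking_number(n_bp, bp_per_turn)
    return (lk - lk0) / lk0


# Hypothetical 4200 bp plasmid: Lk0 = 400, so Lk = 376 gives sigma of about -0.06.
sigma = superhelical_density(lk=376, n_bp=4200)
print(round(sigma, 3))
```

A negative σ corresponds to the underwound, negatively supercoiled state, and a topoisomerase that changes Lk moves σ toward or away from zero.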
In a chain-like biological molecule, such as a protein or nucleic acid, a structural motif is a common three-dimensional structure that appears in a variety of different, evolutionarily unrelated molecules. A structural motif does not have to be associated with a sequence motif; it can be represented by different and completely unrelated sequences in different proteins or RNA.
DnaA is a protein that activates initiation of DNA replication in bacteria. Based on the Replicon Model, a positively acting initiator molecule contacts a particular site on a circular chromosome, called the replicator, to start DNA replication. DnaA is a replication initiation factor that promotes the unwinding of DNA at oriC. The DnaA proteins found in all bacteria engage with the DnaA boxes to start chromosomal replication. Initiation is regulated through the DnaA protein itself, its concentration, its binding to DnaA boxes, and its binding of ATP or ADP, as well as through regulation of the dnaA gene, including the unique characteristics of its expression, its promoter strength, and its translation efficiency. The onset of the initiation phase of DNA replication is determined by the concentration of DnaA. DnaA accumulates during growth and then triggers the initiation of replication. Replication begins with active DnaA binding to 9-mer (9-bp) repeats upstream of oriC. Binding of DnaA leads to strand separation at the 13-mer repeats. This binding causes the DNA to loop in preparation for melting open by the helicase DnaB.
The nucleoid is an irregularly shaped region within the prokaryotic cell that contains all or most of the genetic material. The chromosome of a typical prokaryote is circular, and its length is very large compared to the cell dimensions, so it needs to be compacted in order to fit. In contrast to the nucleus of a eukaryotic cell, it is not surrounded by a nuclear membrane. Instead, the nucleoid forms by condensation and functional arrangement with the help of chromosomal architectural proteins and RNA molecules as well as DNA supercoiling. The length of a genome widely varies and a cell may contain multiple copies of it.
Z-DNA is one of the many possible double helical structures of DNA. It is a left-handed double helical structure in which the helix winds to the left in a zigzag pattern, instead of to the right, like the more common B-DNA form. Z-DNA is thought to be one of three biologically active double-helical structures along with A-DNA and B-DNA.
Triple-stranded DNA is a DNA structure in which three oligonucleotides wind around each other and form a triple helix. In triple-stranded DNA, the third strand binds to a B-form DNA double helix by forming Hoogsteen base pairs or reversed Hoogsteen hydrogen bonds.
A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence or have a general affinity to DNA. Some DNA-binding domains may also include nucleic acids in their folded structure.
Cohesin is a protein complex that mediates sister chromatid cohesion, homologous recombination, and DNA looping. Cohesin is formed of SMC3, SMC1, SCC1 and SCC3. Cohesin holds sister chromatids together after DNA replication until anaphase when removal of cohesin leads to separation of sister chromatids. The complex forms a ring-like structure and it is believed that sister chromatids are held together by entrapment inside the cohesin ring. Cohesin is a member of the SMC family of protein complexes which includes Condensin, MukBEF and SMC-ScpAB.
A Holliday junction is a branched nucleic acid structure that contains four double-stranded arms joined together. These arms may adopt one of several conformations depending on buffer salt concentrations and the sequence of nucleobases closest to the junction. The structure is named after Robin Holliday, the molecular biologist who proposed its existence in 1964.
A DNA unwinding element is the initiation site for the opening of the DNA double helix at the origin of replication for DNA synthesis. It is A-T rich and denatures easily due to its low helical stability, which allows the single-stranded region to be recognized by the origin recognition complex.
The Tn3 transposon is a 4957 base pair mobile genetic element, found in prokaryotes. It encodes three proteins:
Replication protein A (RPA) is the major protein that binds to single-stranded DNA (ssDNA) in eukaryotic cells. In vitro, RPA shows a much higher affinity for ssDNA than RNA or double-stranded DNA. RPA is required in replication, recombination and repair processes such as nucleotide excision repair and homologous recombination. It also plays roles in responding to damaged DNA.
Nucleic acid structure refers to the structure of nucleic acids such as DNA and RNA. Chemically speaking, DNA and RNA are very similar. Nucleic acid structure is often divided into four different levels: primary, secondary, tertiary, and quaternary.
MutS is a mismatch DNA repair protein, originally described in Escherichia coli.
In molecular biology, bacterial DNA binding proteins are a family of small, usually basic proteins of about 90 residues that bind DNA and are known as histone-like proteins. Because bacterial DNA binding proteins have a diversity of functions, it has been difficult to assign a common function to all of them. They are commonly referred to as histone-like and share many traits with the eukaryotic histone proteins. Eukaryotic histones package DNA to help it fit in the nucleus, and they are known to be among the most conserved proteins in nature. Examples include the HU protein, which in Escherichia coli is a dimer of closely related alpha and beta chains and in other bacteria can be a dimer of identical chains. HU-type proteins have been found in a variety of bacteria and archaea, and are also encoded in the chloroplast genome of some algae. Other examples are the integration host factor (IHF), a dimer of closely related chains found in Enterobacteria that is suggested to function in genetic recombination as well as in translational and transcriptional control, and viral proteins including the African swine fever virus protein A104R.
Cas9 is a 160 kilodalton protein which plays a vital role in the immunological defense of certain bacteria against DNA viruses and plasmids, and is heavily utilized in genetic engineering applications. Its main function is to cut DNA and thereby alter a cell's genome. The CRISPR-Cas9 genome editing technique was a significant contributor to the Nobel Prize in Chemistry in 2020 being awarded to Emmanuelle Charpentier and Jennifer Doudna.
Rolling hairpin replication (RHR) is a unidirectional, strand displacement form of DNA replication used by parvoviruses, a group of viruses that constitute the family Parvoviridae. Parvoviruses have linear, single-stranded DNA (ssDNA) genomes in which the coding portion of the genome is flanked by telomeres at each end that form hairpin loops. During RHR, these hairpin loops repeatedly unfold and refold to change the direction of DNA replication so that replication progresses in a continuous manner back and forth across the genome. RHR is initiated and terminated by an endonuclease encoded by parvoviruses that is variously called NS1 or Rep, and RHR is similar to rolling circle replication, which is used by ssDNA viruses that have circular genomes.
Non-B DNA refers to DNA conformations that differ from the canonical B-DNA conformation, the most common form of DNA found in nature at neutral pH and physiological salt concentrations. Non-B DNA structures can arise due to various factors, including DNA sequence, length, supercoiling, and environmental conditions. Non-B DNA structures can have important biological roles, but they can also cause problems, such as genomic instability and disease.