DNA-binding domain

A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence (a recognition sequence) or have a general affinity for DNA. [1] Some DNA-binding domains may also include nucleic acids in their folded structure.

Function

Example of a DNA-binding domain in the context of a protein. The N-terminal DNA-binding domain (labeled) of Lac repressor is regulated by a C-terminal regulatory domain (labeled). The regulatory domain binds an allosteric effector molecule (green). The allosteric response of the protein is communicated from the regulatory domain to the DNA binding domain through the linker region.

One or more DNA-binding domains are often part of a larger protein that contains further domains with differing functions. The extra domains often regulate the activity of the DNA-binding domain. The function of DNA binding is either structural or involves transcription regulation, with the two roles sometimes overlapping.[ citation needed ]

DNA-binding domains with functions involving DNA structure have biological roles in DNA replication, repair, storage, and modification, such as methylation.[ citation needed ]

Many proteins involved in the regulation of gene expression contain DNA-binding domains. For example, proteins that regulate transcription by binding DNA are called transcription factors. The final output of most cellular signaling cascades is gene regulation.[ citation needed ]

The DBD interacts with the nucleotides of DNA in a sequence-specific or non-sequence-specific manner, but even non-sequence-specific recognition involves some molecular complementarity between protein and DNA. DNA recognition by the DBD can occur at the major or minor groove of DNA, or at the sugar-phosphate backbone (see the structure of DNA). Each type of DNA recognition is tailored to the protein's function. For example, the DNA-cutting enzyme DNase I cuts DNA almost randomly and so must bind DNA in a non-sequence-specific manner. Even so, DNase I recognizes a certain three-dimensional DNA structure, yielding a somewhat specific cleavage pattern that is useful for studying DNA recognition by a technique called DNA footprinting.[ citation needed ]

Many DNA-binding domains must recognize specific DNA sequences, such as DBDs of transcription factors that activate specific genes, or those of enzymes that modify DNA at specific sites, like restriction enzymes and telomerase. The hydrogen bonding pattern in the DNA major groove is less degenerate than that of the DNA minor groove, providing a more attractive site for sequence-specific DNA recognition.[ citation needed ]
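As a rough illustration of sequence-specific recognition, the sketch below scans a DNA string for a recognition sequence on both strands. It is a deliberate simplification: real DBDs tolerate some mismatches and are influenced by DNA shape, whereas this code uses exact string matching. The function names and the test sequences are hypothetical and chosen only for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_sites(dna: str, site: str) -> list[tuple[int, str]]:
    """Return (0-based position, strand) for every exact match of `site`.

    A "-" hit means the site is present on the complementary strand,
    i.e. its reverse complement appears in `dna` at that position.
    """
    dna = dna.upper()
    site = site.upper()
    rc_site = reverse_complement(site)
    hits = []
    for i in range(len(dna) - len(site) + 1):
        window = dna[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        # Palindromic sites read the same on both strands; report them once.
        if window == rc_site and rc_site != site:
            hits.append((i, "-"))
    return hits


# Palindromic site (the EcoRI recognition sequence GAATTC)
print(find_sites("TTGAATTCAAGGATCC", "GAATTC"))   # [(2, '+')]
# Non-palindromic site found once on each strand
print(find_sites("AAGGATGTTCATCCAA", "GGATG"))    # [(2, '+'), (9, '-')]
```

Exact matching keeps the example short; a position weight matrix would be a more realistic model of the partial degeneracy that many transcription-factor binding sites show.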

The specificity of DNA-binding proteins can be studied using many biochemical and biophysical techniques, such as gel electrophoresis, analytical ultracentrifugation, calorimetry, DNA mutation, protein structure mutation or modification, nuclear magnetic resonance, X-ray crystallography, surface plasmon resonance, electron paramagnetic resonance, cross-linking, and microscale thermophoresis (MST).

DNA-binding proteins in genomes

A large fraction of the genes in each genome encodes DNA-binding proteins (see Table). However, these proteins belong to a rather small number of protein families. For instance, more than 2000 of the ~20,000 human proteins are annotated as "DNA-binding", including about 750 zinc-finger proteins. [3]

Species | DNA-binding proteins [4] | DNA-binding families [4]
Arabidopsis thaliana (thale cress) | 4471 | 300
Saccharomyces cerevisiae (yeast) | 720 | 243
Caenorhabditis elegans (worm) | 2028 | 271
Drosophila melanogaster (fruit fly) | 2620 | 283
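To make the point about few families and many proteins concrete, the following sketch computes the average number of DNA-binding proteins per family from the table values. The dictionary layout and variable names are assumptions made for this example; only the counts come from the table.

```python
# (DNA-binding proteins, DNA-binding families) per species, from the table
table = {
    "Arabidopsis thaliana": (4471, 300),
    "Saccharomyces cerevisiae": (720, 243),
    "Caenorhabditis elegans": (2028, 271),
    "Drosophila melanogaster": (2620, 283),
}

# Average number of DNA-binding proteins per family in each genome
ratios = {
    species: proteins / families
    for species, (proteins, families) in table.items()
}

for species, ratio in ratios.items():
    print(f"{species}: {ratio:.1f} proteins per family")
```

This is only a mean; it says nothing about how unevenly proteins are distributed across families within a genome.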

Types

DNA contacts of different types of DNA-binding domains

Helix-turn-helix

Originally discovered in bacteria, the helix-turn-helix motif is commonly found in repressor proteins and is about 20 amino acids long. In eukaryotes, the helix-turn-helix motif of the homeodomain comprises two helices, one of which (the recognition helix) recognizes the DNA. Homeodomains are common in proteins that regulate developmental processes. [5]

Zinc finger

Crystallographic structure (PDB: 1R4O) of a dimer of the zinc-finger-containing DBD of the glucocorticoid receptor (top) bound to DNA (bottom). Zinc atoms are represented by grey spheres and the coordinating cysteine sidechains are depicted as sticks.

The zinc finger domain is mostly found in eukaryotes, but some examples have been found in bacteria. [6] The zinc finger domain is generally between 23 and 28 amino acids long and is stabilized by the coordination of zinc ions by regularly spaced residues (either histidines or cysteines). The most common class of zinc finger (Cys2His2) coordinates a single zinc ion and consists of a recognition helix and a 2-strand beta-sheet. [7] In transcription factors these domains are often found in arrays (usually separated by short linker sequences), and adjacent fingers are spaced at 3-base-pair intervals when bound to DNA.
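The 3-base-pair spacing of adjacent fingers can be sketched as a simple partition of a target site into consecutive triplets, one per finger. This is a minimal illustration under the assumption of contiguous, non-overlapping subsites; the function name and the 9-bp example sequence are hypothetical, and the sketch does not model which finger in the array binds which subsite.

```python
def finger_subsites(target: str, bp_per_finger: int = 3) -> list[str]:
    """Split a target site into consecutive subsites, one per zinc finger."""
    target = target.upper()
    if len(target) % bp_per_finger != 0:
        raise ValueError(
            f"target length {len(target)} is not a multiple of {bp_per_finger}"
        )
    return [
        target[i:i + bp_per_finger]
        for i in range(0, len(target), bp_per_finger)
    ]


# A three-finger array would cover a 9-bp site as three triplets.
subsites = finger_subsites("GCGTGGGCG")
print(subsites)  # ['GCG', 'TGG', 'GCG']
```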

Leucine zipper

The basic leucine zipper (bZIP) domain is found mainly in eukaryotes and to a limited extent in bacteria. The bZIP domain contains an alpha helix with a leucine at every 7th amino acid. If two such helices find one another, the leucines can interact like the teeth of a zipper, allowing dimerization of two proteins. When binding to DNA, basic amino acid residues bind to the sugar-phosphate backbone while the helices sit in the major grooves. bZIP proteins regulate gene expression.
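The "leucine at every 7th amino acid" pattern lends itself to a simple sequence scan. The sketch below reports positions where leucine recurs at a spacing of seven residues for a chosen number of consecutive heptads. The function name, the default of four heptads, and the synthetic test sequence are assumptions for illustration; a hit only indicates the periodicity and does not by itself show that a segment forms a zipper.

```python
def leucine_heptad_runs(seq: str, min_repeats: int = 4) -> list[int]:
    """Return 0-based start positions where 'L' occurs every 7th residue
    for at least `min_repeats` consecutive heptads."""
    seq = seq.upper()
    starts = []
    for start in range(len(seq)):
        # Positions start, start+7, start+14, ... (min_repeats of them)
        positions = range(start, start + 7 * min_repeats, 7)
        if positions[-1] < len(seq) and all(seq[p] == "L" for p in positions):
            starts.append(start)
    return starts


# Synthetic sequence: a leucine followed by six alanines, repeated four times
print(leucine_heptad_runs("LAAAAAA" * 4))  # [0]
```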

Winged helix

Consisting of about 110 amino acids, the winged helix (WH) domain has four helices and a two-strand beta-sheet.

Winged helix-turn-helix

The winged helix-turn-helix (wHTH) domain (SCOP 46785) is typically 85-90 amino acids long. It is formed by a 3-helical bundle and a 4-strand beta-sheet (wing).

Helix-loop-helix

The basic helix-loop-helix (bHLH) domain is found in some transcription factors and is characterized by two alpha helices connected by a loop. One helix is typically smaller and, owing to the flexibility of the loop, allows dimerization by folding and packing against another helix. The larger helix typically contains the DNA-binding regions.

HMG-box

HMG-box domains are found in high mobility group proteins, which are involved in a variety of DNA-dependent processes such as replication and transcription. They also alter the flexibility of the DNA by inducing bends. [8] [9] The domain consists of three alpha helices separated by loops.

Wor3 domain

Wor3 domains, named after the White–Opaque Regulator 3 (Wor3) in Candida albicans, arose more recently in evolutionary time than most previously described DNA-binding domains and are restricted to a small number of fungi. [10]

OB-fold domain

The OB-fold is a small structural motif originally named for its oligonucleotide/oligosaccharide binding properties. OB-fold domains range between 70 and 150 amino acids in length. [11] OB-folds bind single-stranded DNA, and hence are single-stranded DNA-binding proteins. [11]

OB-fold proteins have been identified as critical for DNA replication, DNA recombination, DNA repair, transcription, translation, cold shock response, and telomere maintenance. [12]

Unusual

Immunoglobulin fold

The immunoglobulin domain (InterPro: IPR013783) consists of a beta-sheet structure with large connecting loops, which serve to recognize either DNA major grooves or antigens. Usually found in immunoglobulin proteins, these domains are also present in STAT proteins of the cytokine pathway. This is likely because the cytokine pathway evolved relatively recently and has made use of systems that were already functional, rather than creating its own.

B3 domain

The B3 DBD (InterPro: IPR003340, SCOP 117343) is found exclusively in transcription factors from higher plants and in the restriction endonucleases EcoRII and BfiI, and typically consists of 100-120 residues. It includes seven beta strands and two alpha helices, which form a DNA-binding pseudobarrel protein fold.

TAL effector

TAL effectors are found in bacterial plant pathogens of the genus Xanthomonas and are involved in regulating the genes of the host plant in order to facilitate bacterial virulence, proliferation, and dissemination. [13] They contain a central region of tandem 33-35 residue repeats, and each repeat specifies a single DNA base in the TALE's binding site. [14] [15] Within the repeat, residue 13 alone directly contacts the DNA base, determining sequence specificity, while other positions make contacts with the DNA backbone, stabilising the DNA-binding interaction. [16] Each repeat within the array takes the form of paired alpha helices, while the whole repeat array forms a right-handed superhelix that wraps around the DNA double helix. TAL effector repeat arrays have been shown to contract upon DNA binding, and a two-state search mechanism has been proposed whereby the elongated TALE begins to contract around the DNA, beginning with a successful thymine recognition by a unique repeat unit N-terminal of the core TAL-effector repeat array. [17] Related proteins are found in the bacterial plant pathogen Ralstonia solanacearum, [18] the fungal endosymbiont Burkholderia rhizoxinica [19] and two as-yet unidentified marine microorganisms. [20] The DNA-binding code and the structure of the repeat array are conserved between these groups, referred to collectively as the TALE-likes.
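The one-repeat-per-base relationship can be sketched as a lookup from each repeat to a predicted base. The lookup table below is an assumption that is not given in the text above: it uses the commonly reported pairings of the two-residue "repeat variable diresidue" (positions 12 and 13 of each repeat) from the cipher described in [14] and [15], simplified to a single preferred base per entry. The function name and the `leading_t` option, which prepends the thymine recognized N-terminal of the core array, are hypothetical.

```python
# Simplified RVD-to-base table (assumption: one preferred base per RVD;
# real specificities are less strict, e.g. NN can also accept A).
RVD_TO_BASE = {
    "NI": "A",
    "HD": "C",
    "NG": "T",
    "NN": "G",
}


def predict_target(rvds: list[str], leading_t: bool = False) -> str:
    """Predict a target sequence from an ordered list of RVDs.

    Unknown RVDs are reported as 'N' (any base).
    """
    bases = "".join(RVD_TO_BASE.get(rvd.upper(), "N") for rvd in rvds)
    return ("T" + bases) if leading_t else bases


print(predict_target(["NI", "HD", "NG", "NN"]))                   # ACTG
print(predict_target(["NI", "HD", "NG", "NN"], leading_t=True))   # TACTG
```

Because each repeat is read independently, retargeting in this model amounts to reordering or swapping entries in the list, which mirrors the modularity described above.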

References

  1. Lilley DM (1995). DNA-protein: structural interactions. Oxford: IRL Press at Oxford University Press. ISBN 0-19-963453-X.
  2. Swint-Kruse L, Matthews KS (April 2009). "Allostery in the LacI/GalR family: variations on a theme". Current Opinion in Microbiology. 12 (2): 129–37. doi:10.1016/j.mib.2009.01.009. PMC 2688824. PMID 19269243.
  3. "reviewed:yes AND organism:"Homo sapiens (Human) [9606]" AND proteome:up000005640 in UniProtKB". www.uniprot.org. Retrieved 2017-10-25.
  4. Malhotra S, Sowdhamini R (August 2013). "Genome-wide survey of DNA-binding proteins in Arabidopsis thaliana: analysis of distribution and functions". Nucleic Acids Research. 41 (15): 7212–9. doi:10.1093/nar/gkt505. PMC 3753632. PMID 23775796.
  5. "HTH search at PROSITE". Expasy. Retrieved 2024-06-17.
  6. Malgieri G, Palmieri M, Russo L, Fattorusso R, Pedone PV, Isernia C (December 2015). "The prokaryotic zinc-finger: structure, function and comparison with the eukaryotic counterpart". The FEBS Journal. 282 (23): 4480–96. doi:10.1111/febs.13503. PMID 26365095.
  7. Pabo CO, Peisach E, Grant RA (2001). "Design and selection of novel Cys2His2 zinc finger proteins". Annual Review of Biochemistry. 70: 313–40. doi:10.1146/annurev.biochem.70.1.313. PMID 11395410.
  8. Murugesapillai D, et al. (2014). "DNA bridging and looping by HMO1 provides a mechanism for stabilizing nucleosome-free chromatin". Nucleic Acids Res. 42 (14): 8996–9004. doi:10.1093/nar/gku635. PMC 4132745. PMID 25063301.
  9. Murugesapillai D, McCauley MJ, Maher LJ 3rd, Williams MC (2017). "Single-molecule studies of high-mobility group B architectural DNA bending proteins". Biophys Rev. 9 (1): 17–40. doi:10.1007/s12551-016-0236-4. PMC 5331113. PMID 28303166.
  10. Lohse MB, Hernday AD, Fordyce PM, Noiman L, Sorrells TR, Hanson-Smith V, Nobile CJ, DeRisi JL, Johnson AD (May 2013). "Identification and characterization of a previously undescribed family of sequence-specific DNA-binding domains". Proceedings of the National Academy of Sciences of the United States of America. 110 (19): 7660–5. Bibcode:2013PNAS..110.7660L. doi:10.1073/pnas.1221734110. PMC 3651432. PMID 23610392.
  11. Flynn RL, Zou L (August 2010). "Oligonucleotide/oligosaccharide-binding fold proteins: a growing family of genome guardians". Critical Reviews in Biochemistry and Molecular Biology. 45 (4): 266–75. doi:10.3109/10409238.2010.488216. PMC 2906097. PMID 20515430.
  12. Theobald DL, Mitton-Fry RM, Wuttke DS (2003). "Nucleic acid recognition by OB-fold proteins". Annual Review of Biophysics and Biomolecular Structure. 32: 115–33. doi:10.1146/annurev.biophys.32.110601.142506. PMC 1564333. PMID 12598368.
  13. Boch J, Bonas U (2010). "Xanthomonas AvrBs3 family-type III effectors: discovery and function". Annual Review of Phytopathology. 48: 419–36. doi:10.1146/annurev-phyto-080508-081936. PMID 19400638.
  14. Moscou MJ, Bogdanove AJ (December 2009). "A simple cipher governs DNA recognition by TAL effectors". Science. 326 (5959): 1501. Bibcode:2009Sci...326.1501M. doi:10.1126/science.1178817. PMID 19933106. S2CID 6648530.
  15. Boch J, Scholze H, Schornack S, Landgraf A, Hahn S, Kay S, Lahaye T, Nickstadt A, Bonas U (December 2009). "Breaking the code of DNA binding specificity of TAL-type III effectors". Science. 326 (5959): 1509–12. Bibcode:2009Sci...326.1509B. doi:10.1126/science.1178811. PMID 19933107. S2CID 206522347.
  16. Mak AN, Bradley P, Cernadas RA, Bogdanove AJ, Stoddard BL (February 2012). "The crystal structure of TAL effector PthXo1 bound to its DNA target". Science. 335 (6069): 716–9. Bibcode:2012Sci...335..716M. doi:10.1126/science.1216211. PMC 3427646. PMID 22223736.
  17. Cuculis L, Abil Z, Zhao H, Schroeder CM (June 2015). "Direct observation of TALE protein dynamics reveals a two-state search mechanism". Nature Communications. 6: 7277. Bibcode:2015NatCo...6.7277C. doi:10.1038/ncomms8277. PMC 4458887. PMID 26027871.
  18. de Lange O, Schreiber T, Schandry N, Radeck J, Braun KH, Koszinowski J, Heuer H, Strauß A, Lahaye T (August 2013). "Breaking the DNA-binding code of Ralstonia solanacearum TAL effectors provides new possibilities to generate plant resistance genes against bacterial wilt disease". The New Phytologist. 199 (3): 773–86. doi:10.1111/nph.12324. PMID 23692030.
  19. Juillerat A, Bertonati C, Dubois G, Guyot V, Thomas S, Valton J, Beurdeley M, Silva GH, Daboussi F, Duchateau P (January 2014). "BurrH: a new modular DNA binding protein for genome engineering". Scientific Reports. 4: 3831. Bibcode:2014NatSR...4E3831J. doi:10.1038/srep03831. PMC 5379180. PMID 24452192.
  20. de Lange O, Wolf C, Thiel P, Krüger J, Kleusch C, Kohlbacher O, Lahaye T (November 2015). "DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats". Nucleic Acids Research. 43 (20): 10065–80. doi:10.1093/nar/gkv1053. PMC 4787788. PMID 26481363.
  21. Blanc-Mathieu R, Dumas R, Turchi L, Lucas J, Parcy F (July 2023). "Plant-TFClass: a structural classification for plant transcription factors". Trends in Plant Science. doi:10.1016/j.tplants.2023.06.023.