WRKY protein domain

WRKY
Solution structure of the C-terminal WRKY domain of AtWRKY4 (PDB 1wj2)
Identifiers
Symbol WRKY
Pfam PF03106
Pfam clan CL0274
InterPro IPR003657

The WRKY domain is the DNA-binding domain that defines the WRKY family of transcription factors. [1] It is found almost exclusively in plants, although WRKY genes also appear to be present in some diplomonads, social amoebae and other amoebozoa, and fungi incertae sedis; they appear to be absent from other non-plant species. WRKY transcription factors have been a significant area of plant research for the past 20 years. [2] The WRKY DNA-binding domain recognizes the W-box cis-regulatory element, (T)TGAC(C/T), and variants of this sequence.
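As an illustration of what recognizing the W-box means at the sequence level, the following minimal sketch scans both strands of a DNA string for the consensus (T)TGAC(C/T) given above. It assumes the parenthesised leading T is optional; the function names `find_w_boxes` and `reverse_complement` are hypothetical helpers written for this example, not part of any published tool.

```python
import re

# Core of the W-box as written in the text: TGAC followed by C or T.
# A lookahead is used so that overlapping sites are all reported.
W_BOX_CORE = re.compile(r"(?=(TGAC[CT]))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_w_boxes(seq):
    """Return (strand, start, site) tuples for every W-box-like site.

    `start` is 0-based on the strand that was scanned. Assumption: the
    leading T of (T)TGAC(C/T) is optional and is included in the reported
    site when it is present.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in W_BOX_CORE.finditer(s):
            start = m.start()
            end = start + len(m.group(1))
            if start > 0 and s[start - 1] == "T":
                start -= 1  # extend the site to include the optional T
            hits.append((strand, start, s[start:end]))
    return hits
```

For example, `find_w_boxes("AATTGACCGG")` reports a single plus-strand site, `TTGACC`, starting at index 2. A pattern match of this kind only flags candidate sites; it says nothing about whether a given WRKY protein binds them in vivo.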


Structure

WRKY transcription factors contain either one or two WRKY protein domains. The WRKY protein domain is a DNA-binding domain 60 to 70 amino acids long. It is characterized by a highly conserved core WRKYGQK motif and a zinc finger region. The cysteine and histidine zinc finger occurs as either a C-X4–5-C-X22–23-H-X-H or a C-X7-C-X23-H-X-C type, where X can be any amino acid. [3] The zinc finger binds a Zn2+ ion, which is required for protein function. [4] Although the WRKYGQK motif is highly conserved in most WRKY domains, variation in the core sequence has been documented. [2] [5] A frequently occurring variant of the core sequence is WRKYGKK, which is present in most plant species. [2] [3] [5] [6] [7]
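The sequence features described above can be written as regular expressions over a one-letter protein sequence. The sketch below is illustrative only: it looks for the WRKYGQK core (or the WRKYGKK variant) and for the two zinc finger spacings, each searched independently because the text does not state the distance between the core motif and the first cysteine. The labels `C2H2` and `C2HC` and the function name `describe_wrky_features` are naming choices made for this example.

```python
import re

# Core motif: WRKYGQK, or the frequently observed WRKYGKK variant.
CORE_MOTIF = re.compile(r"WRKYG[QK]K")

# Zinc finger spacings from the text, with "." standing for any residue X.
ZINC_FINGERS = {
    "C2H2": re.compile(r"C.{4,5}C.{22,23}H.H"),  # C-X4-5-C-X22-23-H-X-H
    "C2HC": re.compile(r"C.{7}C.{23}H.C"),       # C-X7-C-X23-H-X-C
}


def describe_wrky_features(protein):
    """Return the core motifs and zinc finger patterns found in `protein`.

    Positions are 0-based. The two feature types are reported separately;
    no check is made that they belong to the same domain.
    """
    protein = protein.upper()
    core = [(m.start(), m.group()) for m in CORE_MOTIF.finditer(protein)]
    fingers = [
        (name, m.start())
        for name, pattern in ZINC_FINGERS.items()
        for m in pattern.finditer(protein)
    ]
    return {"core_motifs": core, "zinc_fingers": fingers}
```

A regular expression match is a weak criterion compared with a profile-based search (for example against the Pfam family PF03106 listed for this domain), so a hit should be read as a candidate rather than a confirmed WRKY domain.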

The structure of the WRKY protein domain was first determined in 2005 using nuclear magnetic resonance (NMR) and later by crystallography. [4] [8] The WRKY protein domain is globular and composed of five anti-parallel β-strands. The core WRKYGQK motif lies on the second β-strand. [8] Eighteen amino acids are highly conserved in the WRKY protein domain, including the core motif, the zinc-binding cysteines and histidines, and a triad forming a DWK salt bridge. [8] The triad consists of the conserved tryptophan (W) of the core motif, an aspartic acid (D) four amino acids upstream of it, and a lysine (K) 29 amino acids downstream of it; together they stabilize the entire domain. [8] Five amino acids on the third β-strand (PRSYY) are also well conserved in the WRKY domain. [8] Importantly, WRKY genes contain a conserved intron within the region encoding the WRKY domain, located at the position encoding the PR of the PRSYY sequence, [3] which explains the conservation of this motif.
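The positional description of the DWK triad can be turned into a simple check on a sequence. The sketch below is a rough illustration under one stated assumption: both offsets are counted from the tryptophan of the core motif, so the aspartic acid is expected at index W − 4 and the lysine at index W + 29. Numbering conventions differ between papers, so the offsets are exposed as parameters; `has_dwk_triad` is a hypothetical helper name.

```python
def has_dwk_triad(protein, motif="WRKYGQK", d_offset=-4, k_offset=29):
    """Check for D and K at fixed offsets from the W of the core motif.

    Assumption (not confirmed by the text): offsets are counted from the
    tryptophan itself, which is the first residue of `motif`. Only the
    first occurrence of `motif` is examined.
    """
    protein = protein.upper()
    w_index = protein.find(motif)
    if w_index == -1:
        return False  # no core motif, so no triad to test
    d_index = w_index + d_offset
    k_index = w_index + k_offset
    if d_index < 0 or k_index >= len(protein):
        return False  # sequence too short to contain both partners
    return protein[d_index] == "D" and protein[k_index] == "K"
```

The check only confirms that the expected residues sit at the expected sequence positions; whether they actually form a salt bridge depends on the three-dimensional structure.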

WRKY-DNA Interaction

The WRKY domain forms a unique wedge-shaped structure that inserts perpendicularly into the major groove of the DNA. [9] WRKY protein domains interact with the (T/A)TGAC(T/A) cis-element, also called the W-box. [1] [10] [11] Recent evidence suggests that the GAC core of the W-box is the primary target of the WRKY domain and that flanking sequences help determine which specific WRKY proteins bind. [12] The RKYGQK residues of the core motif, together with additional arginine and lysine residues of the WRKY domain, interact with the phosphate backbone of seven consecutive DNA base pairs, including the GAC core. [9] [12] Changing the tryptophan, tyrosine, or either lysine of the WRKYGQK motif to alanine completely abolishes DNA binding, [8] [13] indicating that these amino acids are essential for recognizing the W-box element. Changing the arginine, glycine or glutamine of the motif to alanine reduces, but does not abolish, DNA binding to the W-box. [8] [13] Overall, these complex interactions between the WRKY protein domain and DNA result in the gene activation necessary for numerous aspects of plant development and defense.
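The alanine substitution results described above amount to a per-residue lookup over the WRKYGQK motif. The sketch below encodes only what the text reports for single substitutions (W, Y and both K residues: binding abolished; R, G and Q: binding reduced). The function name `alanine_effect` and the returned labels are choices made for this example, and no prediction is attempted for multiple or non-alanine substitutions.

```python
WILD_TYPE = "WRKYGQK"

# Reported effect on W-box binding of replacing each residue with alanine,
# indexed by 0-based position in the WRKYGQK motif.
ALANINE_SCAN = {
    0: "abolished",  # W
    1: "reduced",    # R
    2: "abolished",  # K (first lysine)
    3: "abolished",  # Y
    4: "reduced",    # G
    5: "reduced",    # Q
    6: "abolished",  # K (second lysine)
}


def alanine_effect(mutant):
    """Classify a core motif carrying at most one alanine substitution."""
    mutant = mutant.upper()
    if len(mutant) != len(WILD_TYPE):
        raise ValueError("motif must be 7 residues long")
    changed = [i for i, (wt, mt) in enumerate(zip(WILD_TYPE, mutant)) if wt != mt]
    if not changed:
        return "wild type"
    if len(changed) > 1 or mutant[changed[0]] != "A":
        raise ValueError("only single alanine substitutions are covered")
    return ALANINE_SCAN[changed[0]]
```

For instance, `alanine_effect("WAKYGQK")` returns `"reduced"`, while `alanine_effect("ARKYGQK")` returns `"abolished"`.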

References

  1. Rushton PJ, Torres JT, Parniske M, Wernert P, Hahlbrock K, Somssich IE (October 1996). "Interaction of elicitor-induced DNA-binding proteins with elicitor response elements in the promoters of parsley PR1 genes". The EMBO Journal. 15 (20): 5690–700. doi:10.1002/j.1460-2075.1996.tb00953.x. PMC 452313. PMID 8896462.
  2. Schluttenhofer C, Yuan L (February 2015). "Regulation of specialized metabolism by WRKY transcription factors". Plant Physiology. 167 (2): 295–306. doi:10.1104/pp.114.251769. PMC 4326757. PMID 25501946.
  3. Eulgem T, Rushton PJ, Robatzek S, Somssich IE (May 2000). "The WRKY superfamily of plant transcription factors". Trends in Plant Science. 5 (5): 199–206. doi:10.1016/s1360-1385(00)01600-9. PMID 10785665.
  4. Yamasaki K, Kigawa T, Inoue M, Tateno M, Yamasaki T, Yabuki T, et al. (March 2005). "Solution structure of an Arabidopsis WRKY DNA binding domain". The Plant Cell. 17 (3): 944–56. doi:10.1105/tpc.104.026435. PMC 1069710. PMID 15705956.
  5. Zhang Y, Wang L (January 2005). "The WRKY transcription factor superfamily: its origin in eukaryotes and expansion in plants". BMC Evolutionary Biology. 5: 1. doi:10.1186/1471-2148-5-1. PMC 544883. PMID 15629062.
  6. Song H, Wang P, Nan Z, Wang X (2014). "The WRKY Transcription Factor Genes in Lotus japonicus". International Journal of Genomics. 2014: 420128. doi:10.1155/2014/420128. PMC 3976811. PMID 24745006.
  7. Xiong W, Xu X, Zhang L, Wu P, Chen Y, Li M, Jiang H, Wu G (July 2013). "Genome-wide analysis of the WRKY gene family in physic nut (Jatropha curcas L.)". Gene. 524 (2): 124–32. doi:10.1016/j.gene.2013.04.047. PMID 23644253.
  8. Duan MR, Nan J, Liang YH, Mao P, Lu L, Li L, Wei C, Lai L, Li Y, Su XD (2007). "DNA binding mechanism revealed by high resolution crystal structure of Arabidopsis thaliana WRKY1 protein". Nucleic Acids Research. 35 (4): 1145–54. doi:10.1093/nar/gkm001. PMC 1851648. PMID 17264121.
  9. Yamasaki K, Kigawa T, Watanabe S, Inoue M, Yamasaki T, Seki M, Shinozaki K, Yokoyama S (March 2012). "Structural basis for sequence-specific DNA recognition by an Arabidopsis WRKY transcription factor". The Journal of Biological Chemistry. 287 (10): 7683–91. doi:10.1074/jbc.M111.279844. PMC 3293589. PMID 22219184.
  10. Eulgem T, Rushton PJ, Schmelzer E, Hahlbrock K, Somssich IE (September 1999). "Early nuclear events in plant defence signalling: rapid gene activation by WRKY transcription factors". The EMBO Journal. 18 (17): 4689–99. doi:10.1093/emboj/18.17.4689. PMC 1171542. PMID 10469648.
  11. de Pater S, Greco V, Pham K, Memelink J, Kijne J (December 1996). "Characterization of a zinc-dependent transcriptional activator from Arabidopsis". Nucleic Acids Research. 24 (23): 4624–31. doi:10.1093/nar/24.23.4624. PMC 146317. PMID 8972846.
  12. Brand LH, Fischer NM, Harter K, Kohlbacher O, Wanke D (November 2013). "Elucidating the evolutionary conserved DNA-binding specificities of WRKY transcription factors by molecular dynamics and in vitro binding assays". Nucleic Acids Research. 41 (21): 9764–78. doi:10.1093/nar/gkt732. PMC 3834811. PMID 23975197.
  13. Maeo K, Hayashi S, Kojima-Suzuki H, Morikami A, Nakamura K (November 2001). "Role of conserved residues of the WRKY domain in the DNA-binding of tobacco WRKY family proteins". Bioscience, Biotechnology, and Biochemistry. 65 (11): 2428–36. doi:10.1271/bbb.65.2428. PMID 11791715. S2CID 22671192.