WRKY identifiers | Value
---|---
Symbol | WRKY
Pfam | PF03106
Pfam clan | CL0274
InterPro | IPR003657
The WRKY domain is the DNA-binding domain that defines the WRKY family of transcription factors. [1] It is found almost exclusively in plants, although WRKY genes also appear to be present in some diplomonads, social amoebae and other amoebozoa, and fungi incertae sedis; they appear to be absent from other non-plant species. WRKY transcription factors have been a significant area of plant research for the past 20 years. [2] The WRKY DNA-binding domain recognizes the W-box cis-regulatory element, (T)TGAC(C/T), and variants of this sequence.
WRKY transcription factors contain either one or two WRKY protein domains. The WRKY protein domain is a DNA-binding domain 60 to 70 amino acids long. It is characterized by a highly conserved core WRKYGQK motif and a zinc finger region. The cysteine and histidine zinc finger occurs as either a CX4-5CX22-23HXH or a CX7CX23HXC type, where X can be any amino acid. [3] The zinc finger binds a Zn2+ ion, which is required for protein function. [4] While the WRKYGQK motif is highly conserved in most WRKY domains, variation in the core sequence has been documented. [2] [5] A frequently occurring variant of the core sequence is WRKYGKK, which is present in most plant species. [2] [3] [5] [6] [7]
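The motif descriptions above translate fairly directly into regular expressions. The following is a minimal sketch, not a validated domain caller: the function name `classify_wrky_candidate` and the test sequences are hypothetical, X is read as "any single residue", and only the WRKYGQK and WRKYGKK core sequences named in the text are accepted.

```python
import re

# Core heptapeptide from the text; [QK] also admits the WRKYGKK variant.
CORE_MOTIF = re.compile(r"WRKYG[QK]K")

# Zinc finger spacings quoted in the text, with X written as "." (any residue).
ZINC_FINGER_PATTERNS = {
    "CX4-5CX22-23HXH": re.compile(r"C.{4,5}C.{22,23}H.H"),
    "CX7CX23HXC": re.compile(r"C.{7}C.{23}H.C"),
}


def classify_wrky_candidate(protein: str) -> dict:
    """Report which core motif and zinc finger pattern, if any, a sequence contains.

    This is an illustrative pattern match only; it does not check that the
    two features lie within a single 60 to 70 residue domain.
    """
    protein = protein.upper()
    core = CORE_MOTIF.search(protein)
    zinc_finger = None
    for name, pattern in ZINC_FINGER_PATTERNS.items():
        if pattern.search(protein):
            zinc_finger = name
            break
    return {
        "core": core.group() if core else None,
        "zinc_finger": zinc_finger,
    }


# Synthetic sequences built only to exercise the patterns (not real proteins).
synthetic_c2h2 = "MS" + "WRKYGQK" + "AAAA" + "C" + "A" * 4 + "C" + "A" * 22 + "HAH"
synthetic_c2hc = "WRKYGKK" + "C" + "A" * 7 + "C" + "A" * 23 + "HAC"

result_c2h2 = classify_wrky_candidate(synthetic_c2h2)
result_c2hc = classify_wrky_candidate(synthetic_c2hc)
```

A real annotation pipeline would more likely rely on a profile model such as the Pfam entry listed above; the regular expressions here only mirror the consensus notation used in the paragraph.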
The structure of the WRKY protein domain was first determined in 2005 using nuclear magnetic resonance (NMR) and later by crystallography. [4] [8] The WRKY protein domain has a globular shape composed of five anti-parallel β-strands. The core WRKYGQK motif is found on the second β-strand. [8] Eighteen amino acids are highly conserved in the WRKY protein domain, including the core motif, the zinc-binding cysteines and histidines, and a triad forming a DWK salt bridge. [8] The triad consists of the conserved tryptophan (W) of the core motif, an aspartic acid (D) four amino acids upstream of it, and a lysine (K) 29 amino acids downstream of it; together they stabilize the entire domain. [8] Five amino acids on the third β-strand (PRSYY) are also well conserved in the WRKY domain. [8] Importantly, WRKY genes contain a conserved intron within the region encoding the WRKY domain, located at the position encoding the PR of the PRSYY amino acid sequence, [3] thus explaining the conservation of this motif.
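The positional description of the DWK triad can be expressed as a simple index check. This is a sketch under stated assumptions: the function name `has_dwk_triad` is hypothetical, "four amino acids upstream" and "29 amino acids downstream" are read as offsets of -4 and +29 from the tryptophan, and the example sequence is synthetic rather than a real WRKY domain.

```python
def has_dwk_triad(domain: str) -> bool:
    """Check for D at W-4 and K at W+29 relative to the core-motif tryptophan.

    The offsets are an interpretation of the prose description, not values
    taken from a structure file.
    """
    domain = domain.upper()
    w_index = domain.find("WRKYG")  # tryptophan of the core motif
    if w_index == -1:
        return False
    d_index = w_index - 4
    k_index = w_index + 29
    if d_index < 0 or k_index >= len(domain):
        return False
    return domain[d_index] == "D" and domain[k_index] == "K"


# Synthetic sequence: D at index 0, W at index 4, K at index 33 (4 + 29).
synthetic_domain = "DGYN" + "WRKYGQK" + "A" * 22 + "K"
triad_present = has_dwk_triad(synthetic_domain)

# Replacing the aspartic acid breaks the triad in this toy check.
triad_after_mutation = has_dwk_triad("A" + synthetic_domain[1:])
```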
The WRKY domain forms a unique wedge-shaped structure that inserts perpendicularly into the major groove of the DNA. [9] WRKY protein domains interact with the (T/A)TGAC(T/A) cis-element, also called the W-box. [1] [10] [11] Recent evidence suggests that the GAC core of the W-box is the primary target of the WRKY domain and that flanking sequences help determine which specific WRKY proteins bind. [12] The RKYGQK residues of the core motif, together with additional arginine and lysine residues of the WRKY domain, interact with the phosphate backbone of seven consecutive DNA base pairs, including the GAC core. [9] [12] Changing the tryptophan, tyrosine, or either lysine of the WRKYGQK motif to alanine completely abolishes DNA binding, [8] [13] indicating that these amino acids are essential for recognizing the W-box element. Changing the arginine, glycine, or glutamine of the WRKYGQK motif to alanine reduces, but does not abolish, DNA binding to the W-box. [8] [13] Overall, these complex interactions between the WRKY protein domain and DNA result in the gene activation necessary for numerous aspects of plant development and defense.
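As an illustration of the (T/A)TGAC(T/A) element described above, the sketch below scans a DNA sequence for candidate W-boxes on both strands. The names `find_w_boxes` and `reverse_complement` are hypothetical helpers, the input is assumed to contain only A, C, G and T, and a sequence match says nothing about whether a given WRKY protein actually binds the site.

```python
import re

# (T/A)TGAC(T/A) from the text; the lookahead lets overlapping sites be reported.
W_BOX = re.compile(r"(?=([TA]TGAC[TA]))")
W_BOX_LENGTH = 6

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_w_boxes(dna: str) -> list:
    """Return (start, strand, site) tuples for candidate W-boxes.

    `start` is the 0-based position of the site's leftmost base on the
    forward strand, regardless of which strand carries the match.
    """
    dna = dna.upper()
    hits = []
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for match in W_BOX.finditer(sequence):
            start = match.start()
            if strand == "-":
                # Map the position back to forward-strand coordinates.
                start = len(dna) - (match.start() + W_BOX_LENGTH)
            hits.append((start, strand, match.group(1)))
    return sorted(hits)


forward_hits = find_w_boxes("CCTTGACTGG")
reverse_hits = find_w_boxes("AGTCAA")
```

Because the GAC core is described as the primary target, a looser scan for `GAC` alone followed by inspection of the flanking bases would be an equally reasonable design; the stricter pattern is used here only because it matches the consensus as written.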