A zinc finger is a small protein structural motif characterized by the coordination of one or more zinc ions (Zn2+), which stabilizes the fold. The term was originally coined to describe the finger-like appearance of a hypothesized structure from the African clawed frog (Xenopus laevis) transcription factor IIIA (TFIIIA). However, it has since been found to encompass a wide variety of differing protein structures in eukaryotic cells. [1] Xenopus laevis TFIIIA was shown in 1983 to contain zinc and to require the metal for function, the first such zinc requirement reported for a gene regulatory protein, [2] [3] followed soon thereafter by the Krüppel factor in Drosophila. [4] The zinc finger often appears as a metal-binding domain in multi-domain proteins. [3]
Proteins that contain zinc fingers (zinc finger proteins) are classified into several different structural families. Unlike many other clearly defined supersecondary structures such as Greek keys or β hairpins, there are a number of types of zinc fingers, each with a unique three-dimensional architecture. A particular zinc finger protein's class is determined by its three-dimensional structure, but it can also be recognized based on the primary structure of the protein or the identity of the ligands coordinating the zinc ion. In spite of the large variety of these proteins, however, the vast majority typically function as interaction modules that bind DNA, RNA, proteins, or other small, useful molecules, and variations in structure serve primarily to alter the binding specificity of a particular protein.
Since their original discovery and the elucidation of their structure, these interaction modules have proven ubiquitous in the biological world and may be found in 3% of the genes of the human genome. [5] In addition, zinc fingers have become extremely useful in various therapeutic and research capacities. Engineering zinc fingers to have an affinity for a specific sequence is an area of active research, and zinc finger nucleases and zinc finger transcription factors are two of the most important applications of this to be realized to date.
Zinc fingers were first identified in a study of transcription in the African clawed frog, Xenopus laevis, in the laboratory of Aaron Klug. A study of the transcription of a particular RNA sequence revealed that the binding strength of a small transcription factor (transcription factor IIIA; TFIIIA) was due to the presence of zinc-coordinating finger-like structures. [6] Amino acid sequencing of TFIIIA revealed nine tandem sequences of 30 amino acids, including two invariant pairs of cysteine and histidine residues. Extended X-ray absorption fine structure confirmed the identity of the zinc ligands: two cysteines and two histidines. [5] The DNA-binding loops formed by the coordination of these ligands by zinc were thought to resemble fingers, hence the name. [1] This was followed soon thereafter by the discovery of the Krüppel factor in Drosophila by the Schuh team in 1986. [4] More recent work on the characterization of proteins in various organisms has revealed the importance of zinc ions in polypeptide stabilization. [7] [8]
The crystal structures of zinc finger-DNA complexes solved in 1991 and 1993 revealed the canonical pattern of interactions of zinc fingers with DNA. [9] [10] The binding of zinc fingers is distinct from that of many other DNA-binding proteins, which bind DNA through the 2-fold symmetry of the double helix; zinc fingers are instead linked linearly in tandem to bind nucleic acid sequences of varying lengths. [5] Zinc fingers often bind to a sequence of DNA known as the GC box. [11] The modular nature of the zinc finger motif allows a large number of combinations of DNA and RNA sequences to be bound with a high degree of affinity and specificity, and the motif is therefore ideally suited for engineering proteins that can be targeted to and bind specific DNA sequences. In 1994, it was shown that an artificially constructed three-finger protein can block the expression of an oncogene in a mouse cell line. Zinc fingers fused to various other effector domains, some with therapeutic significance, have since been constructed. [5]
Such was its importance that "the zinc-finger motif" was cited in the Scientific Background to the 2024 Nobel Prize in Chemistry (awarded to David Baker, Demis Hassabis, and John M. Jumper for computational protein design and protein structure prediction). [12]
Zinc finger (Znf) domains are relatively small protein motifs that contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein, and/or lipid substrates. [13] [14] [15] [16] [17] Their binding properties depend on the amino acid sequence of the finger domains and on the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. Znf motifs occur in several unrelated protein superfamilies, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g., some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organization, epithelial development, cell adhesion, protein folding, chromatin remodeling, and zinc sensing, to name but a few. [18] Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.
Initially, the term zinc finger was used solely to describe the DNA-binding motif found in Xenopus laevis; however, it is now used to refer to any number of structures related by their coordination of a zinc ion. In general, zinc fingers coordinate zinc ions with a combination of cysteine and histidine residues. Originally, the number and order of these residues were used to classify different types of zinc fingers (e.g., Cys2His2, Cys4, and Cys6). More recently, a more systematic method has been adopted, which classifies zinc finger proteins into "fold groups" based on the overall shape of the protein backbone in the folded domain. The most common fold groups of zinc fingers are the Cys2His2-like (the "classic zinc finger"), treble clef, and zinc ribbon. [19]
The following table [19] shows the different structures and their key features:
Fold group | Ligand placement |
---|---|
Cys2His2 | Two ligands from a knuckle and two more from the C-terminus of a helix. |
Gag knuckle | Two ligands from a knuckle and two more from a short helix or loop. |
Treble clef | Two ligands from a knuckle and two more from the N-terminus of a helix. |
Zinc ribbon | Two ligands each from two knuckles. |
Zn2/Cys6 | Two ligands from the N-terminus of a helix and two more from a loop. |
TAZ2 domain-like | Two ligands from the termini of two helices. |
Zinc finger, C2H2 type | Identifier |
---|---|
Symbol | zf-C2H2 |
Pfam | PF00096 |
Pfam clan | CL0361 |
ECOD | 386.1.1 |
InterPro | IPR007087 |
PROSITE | PS00028 |
The Cys2His2-like fold group (C2H2) is by far the best-characterized class of zinc fingers, and is common in mammalian transcription factors. Such domains adopt a simple ββα fold and have the amino acid sequence motif X2-Cys-X2,4-Cys-X12-His-X3,4,5-His, where X is any amino acid and the numbers give the allowed spacer lengths. [20]
This class of zinc fingers can have a variety of functions such as binding RNA and mediating protein-protein interactions, but is best known for its role in sequence-specific DNA-binding proteins such as Zif268 (Egr1). In such proteins, individual zinc finger domains typically occur as tandem repeats with two, three, or more fingers comprising the DNA-binding domain of the protein. These tandem arrays can bind in the major groove of DNA and are typically spaced at 3-bp intervals. The α-helix of each domain (often called the "recognition helix") can make sequence-specific contacts to DNA bases; residues from a single recognition helix can contact four or more bases to yield an overlapping pattern of contacts with adjacent zinc fingers.
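As a rough illustration of how a classic finger can be recognized from primary structure, the sketch below scans a protein sequence with a regular expression for the spacing pattern Cys-X(2-4)-Cys-X(12)-His-X(3-5)-His. The pattern is the commonly quoted C2H2 consensus and is assumed here; the function name and the test sequence are hypothetical (a synthetic two-finger toy sequence, not a real protein). Domain databases such as Pfam rely on profile models rather than a fixed regular expression, so this is only a first-pass filter.

```python
import re

# Assumed consensus spacing for a classic finger: C-X(2,4)-C-X(12)-H-X(3,5)-H
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_candidates(protein_seq):
    """Return (start_index, matched_segment) for each non-overlapping candidate."""
    return [(m.start(), m.group())
            for m in C2H2_PATTERN.finditer(protein_seq.upper())]


# Synthetic toy sequence (hypothetical): two identical finger-like segments
# joined by a short linker.
finger = "YKCPECGKSFSQKSNLQKHQRTH"
toy_protein = "MK" + finger + "TGEKP" + finger

for start, segment in find_c2h2_candidates(toy_protein):
    print(start, segment)
```

A match only indicates that the ligand spacing is compatible with a C2H2 finger; it says nothing about whether the segment actually folds or binds zinc.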
Zinc knuckle | Identifier |
---|---|
Symbol | zf-CCHC |
Pfam | PF00098 |
InterPro | IPR001878 |
SMART | SM00343 |
PROSITE | PS50158 |
This fold group is defined by two short β-strands connected by a turn (zinc knuckle) followed by a short helix or loop and resembles the classical Cys2His2 motif with a large portion of the helix and β-hairpin truncated.
The nucleocapsid (NC) proteins of HIV and other related retroviruses are examples of proteins possessing these motifs. The gag-knuckle zinc finger in the HIV NC protein is the target of a class of drugs known as zinc finger inhibitors.
The treble-clef motif consists of a β-hairpin at the N-terminus and an α-helix at the C-terminus that each contribute two ligands for zinc binding, although a loop and a second β-hairpin of varying length and conformation can be present between the N-terminal β-hairpin and the C-terminal α-helix. These fingers are present in a diverse group of proteins that frequently do not share sequence or functional similarity with each other. The best-characterized proteins containing treble-clef zinc fingers are the nuclear hormone receptors.
TFIIB zinc-binding | Identifier |
---|---|
Symbol | TF_Zn_Ribbon |
Pfam | PF08271 |
Pfam clan | CL0167 |
ECOD | 375.1.1 |
InterPro | IPR013137 |
PROSITE | PS51134 |
The zinc ribbon fold is characterised by two beta-hairpins forming two structurally similar zinc-binding sub-sites.
Fungal Zn(2)-Cys(6) binuclear cluster domain | Identifier |
---|---|
Symbol | Zn_clus |
Pfam | PF00172 |
InterPro | IPR001138 |
SMART | GAL4 |
PROSITE | PS00463 |
CDD | cd00067 |
The canonical members of this class contain a binuclear zinc cluster in which two zinc ions are bound by six cysteine residues. These zinc fingers can be found in several transcription factors including the yeast Gal4 protein.
zf-C2HC | Identifier |
---|---|
Symbol | zf-C2HC |
Pfam | PF01530 |
InterPro | IPR002515 |
zf-C2HC5 | Identifier |
---|---|
Symbol | zf-C2HC5 |
Pfam | PF06221 |
InterPro | IPR009349 |
The zinc finger antiviral protein (ZAP) binds to the CpG site. It is used in mammals for antiviral defense. [21] [22]
Various protein engineering techniques can be used to alter the DNA-binding specificity of zinc fingers [20] and tandem repeats of such engineered zinc fingers can be used to target desired genomic DNA sequences. [23] Fusing a second protein domain such as a transcriptional activator or repressor to an array of engineered zinc fingers that bind near the promoter of a given gene can be used to alter the transcription of that gene. [23] Fusions between engineered zinc finger arrays and protein domains that cleave or otherwise modify DNA can also be used to target those activities to desired genomic loci. [23] The most common applications for engineered zinc finger arrays include zinc finger transcription factors and zinc finger nucleases, but other applications have also been described. Typical engineered zinc finger arrays have between 3 and 6 individual zinc finger motifs and bind target sites ranging from 9 basepairs to 18 basepairs in length. Arrays with 6 zinc finger motifs are particularly attractive because they bind a target site that is long enough to have a good chance of being unique in a mammalian genome. [24]
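The arithmetic behind the preference for six-finger arrays can be sketched as follows. The example assumes 3 bp recognized per finger (as stated above), a genome of roughly 3 billion base pairs, and a simplistic model in which bases are independent and equally frequent; real genomes are repetitive and compositionally biased, so the numbers are only indicative. The function names are hypothetical.

```python
def target_site_length(num_fingers, bp_per_finger=3):
    """Length in bp of the site bound by an array of num_fingers fingers."""
    return num_fingers * bp_per_finger


def expected_random_occurrences(site_length, genome_size_bp=3_000_000_000):
    """Expected number of chance matches to one specific site.

    Assumes independent, equiprobable bases and counts both strands.
    The default genome size is an assumed round figure, not a measured value.
    """
    return 2 * genome_size_bp / 4 ** site_length


for fingers in (3, 6):
    length = target_site_length(fingers)
    print(fingers, "fingers:", length, "bp,",
          f"{expected_random_occurrences(length):.3g} expected chance matches")
```

Under these assumptions a 9-bp site is expected to occur by chance many thousands of times, whereas an 18-bp site is expected to occur less than once, which is the sense in which it has "a good chance of being unique".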
Engineered zinc finger arrays are often fused to a DNA cleavage domain (usually the cleavage domain of FokI) to generate zinc finger nucleases. Such zinc finger-FokI fusions have become useful reagents for manipulating the genomes of many higher organisms, including Drosophila melanogaster, Caenorhabditis elegans, tobacco, corn, [25] zebrafish, [26] various types of mammalian cells, [27] and rats. [28] Targeting a double-strand break to a desired genomic locus can be used to introduce frame-shift mutations into the coding sequence of a gene, owing to the error-prone nature of the non-homologous end joining (NHEJ) repair pathway. If a homologous DNA "donor sequence" is also supplied, the genomic locus can be converted to a defined sequence via the homology-directed repair pathway. An ongoing clinical trial is evaluating zinc finger nucleases that disrupt the CCR5 gene in CD4+ human T-cells as a potential treatment for HIV/AIDS. [29]
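Why error-prone repair tends to disrupt a gene can be shown with a toy example: an insertion or deletion whose length is not a multiple of three shifts the reading frame of everything downstream. The sketch below is illustrative only; the helper names and the short coding sequence are hypothetical, and real repair outcomes are far more varied.

```python
def is_frameshift(indel_length_bp):
    """True if an indel of this size (insertions > 0, deletions < 0) shifts the frame."""
    return indel_length_bp % 3 != 0


def delete_bases(seq, position, length):
    """Return seq with `length` bases removed starting at 0-based `position`."""
    return seq[:position] + seq[position + length:]


def codons(seq):
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


toy_orf = "ATGGCTGAAACC"  # hypothetical 4-codon fragment
print(codons(toy_orf))                        # original reading frame
print(codons(delete_bases(toy_orf, 3, 3)))    # in-frame 3-bp deletion: one codon lost
print(codons(delete_bases(toy_orf, 3, 1)))    # 1-bp deletion: downstream codons change
```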
The majority of engineered zinc finger arrays are based on the zinc finger domain of the murine transcription factor Zif268, although some groups have used zinc finger arrays based on the human transcription factor SP1. Zif268 has three individual zinc finger motifs that collectively bind a 9 bp sequence with high affinity. [30] The structure of this protein bound to DNA was solved in 1991 [9] and stimulated a great deal of research into engineered zinc finger arrays. In 1994 and 1995, a number of groups used phage display to alter the specificity of a single zinc finger of Zif268. [31] [32] [33] [34] Two main methods are currently used to generate engineered zinc finger arrays: modular assembly and a bacterial selection system. There is some debate about which method is best suited for most applications. [35] [36]
The most straightforward method to generate new zinc finger arrays is to combine smaller zinc finger "modules" of known specificity. The structure of the zinc finger protein Zif268 bound to DNA, described by Pavletich and Pabo in their 1991 publication, has been key to much of this work; that publication describes the concept of obtaining fingers for each of the 64 possible base pair triplets and then mixing and matching these fingers to design proteins with any desired sequence specificity. [9] The most common modular assembly process involves combining separate zinc fingers that can each recognize a 3-basepair DNA sequence to generate 3-, 4-, 5-, or 6-finger arrays that recognize target sites ranging from 9 basepairs to 18 basepairs in length. Another method uses 2-finger modules to generate zinc finger arrays with up to six individual zinc fingers. [25] The Barbas Laboratory of The Scripps Research Institute used phage display to develop and characterize zinc finger domains that recognize most DNA triplet sequences, [37] [38] [39] while another group isolated and characterized individual fingers from the human genome. [40] A potential drawback of modular assembly in general is that the specificities of individual zinc fingers can overlap and can depend on the context of the surrounding zinc fingers and DNA. A recent study demonstrated that a high proportion of 3-finger zinc finger arrays generated by modular assembly fail to bind their intended target with sufficient affinity in a bacterial two-hybrid assay and fail to function as zinc finger nucleases, but the success rate was somewhat higher when sites of the form GNNGNNGNN were targeted. [41]
A subsequent study used modular assembly to generate zinc finger nucleases with both 3-finger arrays and 4-finger arrays and observed a much higher success rate with 4-finger arrays. [42] A variant of modular assembly that takes the context of neighboring fingers into account has also been reported and this method tends to yield proteins with improved performance relative to standard modular assembly. [43]
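The bookkeeping behind modular assembly can be sketched in a few lines: split the target into 3-bp triplets, look up one pre-characterized module per triplet, and flag sites of the form GNNGNNGNN. Everything named here is hypothetical, including the module library, its labels, and the target sequence. The sketch also deliberately ignores the orientation in which successive fingers contact the DNA and the context dependence between neighbouring fingers discussed above, which is exactly why a successful lookup does not guarantee a working protein.

```python
from itertools import product

# All 4**3 = 64 possible DNA triplets.
ALL_TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def split_into_triplets(target):
    """Split a DNA target site into consecutive 3-bp subsites."""
    target = target.upper()
    if len(target) % 3 != 0 or not set(target) <= set("ACGT"):
        raise ValueError("target must be A/C/G/T with a length that is a multiple of 3")
    return [target[i:i + 3] for i in range(0, len(target), 3)]


def is_gnn_site(target):
    """True if every triplet starts with G (e.g. GNNGNNGNN)."""
    return all(t[0] == "G" for t in split_into_triplets(target))


def assemble_array(target, module_library):
    """Pick one module per triplet; raises KeyError if a triplet has no module."""
    return [module_library[t] for t in split_into_triplets(target)]


# Hypothetical library: triplet -> label of a finger module selected for it.
library = {"GAA": "module_GAA", "GCT": "module_GCT", "GGG": "module_GGG"}
site = "GAAGCTGGG"  # made-up 9-bp target
print(len(ALL_TRIPLETS), is_gnn_site(site), assemble_array(site, library))
```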
Numerous selection methods have been used to generate zinc finger arrays capable of targeting desired sequences. Initial selection efforts utilized phage display to select proteins that bound a given DNA target from a large pool of partially randomized zinc finger arrays. This technique is difficult to use on more than a single zinc finger at a time, so a multi-step process that generated a completely optimized 3-finger array by adding and optimizing a single zinc finger at a time was developed. [44] More recent efforts have utilized yeast one-hybrid systems, bacterial one-hybrid and two-hybrid systems, and mammalian cells. A promising new method to select novel 3-finger zinc finger arrays utilizes a bacterial two-hybrid system and has been dubbed "OPEN" by its creators. [45] This system combines pre-selected pools of individual zinc fingers that were each selected to bind a given triplet and then utilizes a second round of selection to obtain 3-finger arrays capable of binding a desired 9-bp sequence. This system was developed by the Zinc Finger Consortium as an alternative to commercial sources of engineered zinc finger arrays. It is somewhat difficult to directly compare the binding properties of proteins generated with this method to proteins generated by modular assembly as the specificity profiles of proteins generated by the OPEN method have never been reported.
This entry represents the CysCysHisCys (C2HC) type zinc finger domain found in eukaryotes. Proteins containing these domains include: