Artificial transcription factors (ATFs) are engineered single- or multi-molecule transcription factors that either activate or repress gene transcription. [1]
ATFs often contain two main components linked together: a DNA-binding domain and a regulatory domain, also known as an effector domain or modulatory domain. [1] The DNA-binding domain targets a specific DNA sequence with high affinity, and the regulatory domain is responsible for activating or repressing the bound gene. [1] The ATF can regulate gene expression directly, recruit proteins and other transcription factors to initiate transcription, or recruit proteins and other transcription factors that compact the DNA, which inhibits RNA polymerase from binding and transcribing it; an example of transcription factors up-regulating gene expression is displayed in figure 1 on the left. [1] [2] Because the DNA-binding domain and the regulatory domain are separable components, they are interchangeable, permitting the design of new ATFs from existing natural transcription factors. [1]
Applications of ATFs include reprogramming cell state, cancer treatment, and a potential treatment for Angelman syndrome. [2] [3] [4]
The DNA-binding domain routes the ATF to a specific gene sequence. Natural DNA-binding proteins are commonly used because of their high affinity for their DNA target sequence; however, no algorithm currently exists that matches a protein's amino-acid sequence to its complementary DNA-binding sequence, which limits the rational design of new DNA-binding proteins. [1] Non-peptide, oligonucleotide, and polyamide DNA-binding domains, which permit rational design, have recently been explored. [1] The type of DNA-binding domain chosen depends on the desired application of the ATF; common DNA-binding domains are presented in the Types of ATF DNA-Binding Domains section below. [1] [2]
The regulatory domain is responsible for activating or repressing the bound gene, and it does so either by regulating gene expression directly or by recruiting other proteins and transcription factors that change transcription levels. [1] [2] One route to upregulating a gene is for the ATF to recruit proteins that loosen the DNA wrapped around histones, allowing RNA polymerase to bind and transcribe the gene; conversely, compacting the DNA downregulates gene expression by inhibiting RNA polymerase from binding. [1] Regulatory domains that promote gene transcription are usually acidic activators, composed of acidic and hydrophobic amino acids, whereas regulatory domains that repress gene transcription usually contain more basic amino acids. [1] Factors influencing the effect of the ATF on transcription include the distance between the regulatory domain and the transcription site, the cell type, and the number of activating or repressing sequences present in the regulatory domain. [1] Activating domains, the regulatory domains that promote gene transcription, are often capable of upregulating transcription 5- to 40-fold, and RNA regulatory domains have been shown to increase transcription levels 100-fold. [1] An alternative strategy for repressing genes is for the ATF to out-compete natural transcription factors and physically block transcription by RNA polymerase; however, creating ATFs with higher affinity for the DNA sequence than the natural transcription factors remains a challenge. [1]
Linkers connect the DNA-binding domain and the regulatory domain, either covalently or non-covalently. [1] Peptide linkers are used most frequently, but polyethylene glycol and small-molecule linkers also exist. [1] Linkers make the DNA-binding domains and regulatory domains interchangeable, allowing new ATFs to be designed from natural transcription factor components. [1] Although linkers are less studied, linker length is important because it alters how strongly the regulatory domain affects gene expression. [1]
Most ATFs have been constructed by exchanging existing DNA-binding domains and regulatory domains to generate ATFs with new targeting sites and new effects on transcription. [1] Designed DNA-binding domains with new targeting capabilities, such as CRISPR-Cas, are being explored to engineer higher specificity and control potential side effects. [2] ATFs that can respond to physiological cues, change transcription levels only in a specific cell type, and be delivered easily without electroporation are of great interest for future development. [1]
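The domain-exchange strategy described above can be pictured as swapping fields in a simple record. The following is a minimal illustrative sketch, not a modelling tool: the `ATF` class and all domain names in it are hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ATF:
    """Toy model of an ATF as three separable parts (illustrative only)."""
    dna_binding_domain: str  # decides which DNA sequence is targeted
    linker: str              # joins the two domains
    regulatory_domain: str   # decides activation or repression


# Hypothetical starting construct; the names are placeholders.
activator = ATF(
    dna_binding_domain="zinc-finger array A",
    linker="peptide linker",
    regulatory_domain="acidic activator",
)

# Exchanging only the regulatory domain keeps the target but flips the effect.
repressor = replace(activator, regulatory_domain="KRAB repressor")

# Exchanging only the DNA-binding domain keeps the effect but changes the target.
retargeted = replace(activator, dna_binding_domain="TALE array B")

print(repressor)
print(retargeted)
```

Using an immutable record with `dataclasses.replace` mirrors the idea that each new ATF is a recombination of existing parts rather than a modification of the original construct.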
The clustered regularly interspaced short palindromic repeats-Cas (CRISPR-Cas) system has been extensively studied as a way to target a specific DNA sequence using a single guide RNA (sgRNA). [5] For ATF applications, the CRISPR-Cas system is modified by inactivating the Cas enzyme's natural DNA-cutting function and linking a regulatory domain to the Cas enzyme. [2] The CRISPR-Cas system benefits from high specificity between the sgRNA and the target DNA sequence and from the simplicity of designing new sgRNAs; however, it requires a protospacer adjacent motif (PAM) immediately adjacent to the target DNA site, and the large size of the Cas protein hinders delivery into the cell. [2]
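The PAM requirement can be illustrated with a short search for candidate target sites. This is a minimal sketch under stated assumptions that are not in the text above: it uses the widely used Streptococcus pyogenes Cas9 convention, in which a 20-nucleotide target is immediately followed by an NGG PAM on the same strand, and it scans only the strand it is given. The function name `find_target_sites` and the example sequence are made up for illustration.

```python
import re


def find_target_sites(dna: str, protospacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, pam) for every site followed by an NGG PAM.

    Assumes the SpCas9 convention (PAM = any base followed by GG, located
    immediately after the protospacer). Only the given strand is scanned.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Made-up example sequence: a 20-nt stretch followed by the PAM "AGG".
example = "ATGC" * 5 + "AGGTT"
sites = find_target_sites(example)
print(sites)  # [(0, 'ATGCATGCATGCATGCATGC', 'AGG')]
```

A practical design tool would also scan the reverse complement and score possible off-target matches; both are left out here to keep the sketch short.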
Transcription activator-like effectors (TALEs) are peptide structures composed of repeating segments, each 34 amino acids long, that form a peptide with a total length of 340 to 510 amino acids. [2] Each repeating segment folds into two alpha helices, and the amino acids at residue positions 12 and 13 of the repeating segment determine the DNA-binding sequence. [2] TALE peptides bind their target DNA with high specificity, which prevents secondary side effects; however, this high specificity also prevents the ATF from binding to multiple sites, so a different ATF is required for each desired effect. [2]
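The figures above imply 10 to 15 repeats per TALE (340 / 34 and 510 / 34), with each repeat contributing one position to the recognized DNA sequence. The sketch below works through that arithmetic and the readout of residues 12 and 13. The residue-pair-to-base table is an assumption brought in from the commonly cited TALE code (NI for A, HD for C, NG for T, NN for G) and is not stated in the text; the repeat string is a placeholder, not a real TALE sequence.

```python
REPEAT_LENGTH = 34  # amino acids per repeating segment, as stated above

# Assumed mapping from the residue pair at positions 12 and 13 to a DNA base.
# This simplified table is not from the text; NN, for example, is also
# reported to bind A.
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}


def repeats_in(total_length: int) -> int:
    """Number of whole 34-residue repeats in a peptide of the given length."""
    return total_length // REPEAT_LENGTH


def rvd_of(repeat: str) -> str:
    """Residues 12 and 13 of a repeat (1-based), i.e. string indices 11 and 12."""
    return repeat[11:13]


def target_sequence(repeats: list[str]) -> str:
    """DNA sequence read off a list of repeats, one base per repeat."""
    return "".join(RVD_TO_BASE[rvd_of(r)] for r in repeats)


# Placeholder repeat: 'X' stands for any residue outside positions 12 and 13.
def placeholder_repeat(rvd: str) -> str:
    return "X" * 11 + rvd + "X" * 21


array = [placeholder_repeat(rvd) for rvd in ("NI", "HD", "NG", "NN")]
print(repeats_in(340), repeats_in(510))  # 10 15
print(target_sequence(array))            # ACTG
```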
Zinc fingers are naturally abundant, are involved in multiple regulatory processes, and are common eukaryotic transcription factors. [6] Cys2/His2 zinc fingers have been extensively studied; they are composed of 30 amino acids, can bind to non-palindromic sequences, and contain 3 to 4 critical amino acids, at positions -1, 3, and 6 of the alpha helix (and, in some cases, position 2), which determine the complementary binding sequence. [4] [7] [8] Because zinc fingers are only 30 amino acids long, they are easier to deliver, and multiple zinc fingers can be linked together to target larger DNA sequences with one ATF; however, connecting more than three zinc fingers together reduces each zinc finger's specificity and increases off-site targeting. [2]
Directing cell differentiation and reprogramming cell fate have traditionally been achieved with a mixture of transcription factors. [9] The field gained significant interest once four transcription factors (Oct4, Sox2, c-Myc, and Klf4) were found to reprogram cells from a differentiated state into an induced pluripotent stem cell state similar to that of embryonic stem cells. [10] Multiple ATFs, each composed of three zinc finger proteins linked together, can activate genes that eventually lead to the production of the Oct4 transcription factor in the cell, causing the cell to reprogram to an induced pluripotent state without the addition of external Oct4 transcription factors. [2] This change in cell state demonstrates that ATFs can replace traditional transcription factors in cell reprogramming. [2]
Angelman syndrome is a neurodevelopmental disorder caused by the deactivation of the maternal UBE3A gene. [3] Two potential treatment strategies using ATFs are to upregulate the expression of the maternal UBE3A gene or to downregulate the expression of the UBE3A-AS gene, which causes repression of the paternal UBE3A gene. [3] The zinc finger ATF TAT-S1 acts as a strong repressor of the UBE3A-AS gene and, when administered to mice, resulted in increased Ube3a in the brain. [3]
Abnormal gene expression is regularly associated with cancer and uncontrolled tumor growth, making ATFs a promising therapeutic for cancer treatment. [4] An ATF built from 6 linked zinc fingers binds only to an 18 base pair sequence made up of the smaller subsequences complementary to each of its zinc fingers, so it is more specific than a single zinc finger, which targets only a 3 to 4 base pair sequence. [4] ATFs linked to the KRAB repressor regulatory domain decrease cancer cells' resistance to chemotherapy drugs, and ATFs linked to activator domains can upregulate Bax gene expression, causing cell apoptosis; however, these treatments remain in the early stages because of inadequate delivery methods. [4]
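The specificity gain from linking six zinc fingers can be made concrete with a rough calculation. The sketch below assumes 3 base pairs per finger (consistent with the 18 base pair figure above), a random genome with equal base frequencies, and a genome size of about 3 billion base pairs; the last two are simplifying assumptions introduced only for this illustration, and only one strand is counted.

```python
BASES_PER_FINGER = 3            # consistent with 6 fingers covering 18 bp
GENOME_SIZE = 3_000_000_000     # assumed approximate genome size in base pairs


def target_length(n_fingers: int) -> int:
    """Length in base pairs of the site recognized by n linked fingers."""
    return n_fingers * BASES_PER_FINGER


def expected_random_matches(n_fingers: int, genome_size: int = GENOME_SIZE) -> float:
    """Expected number of chance occurrences of the target site.

    Assumes each base is equally likely and positions are independent,
    which real genomes only approximate.
    """
    return genome_size / 4 ** target_length(n_fingers)


for n in (1, 3, 6):
    print(n, target_length(n), expected_random_matches(n))
```

Under these assumptions a single finger's 3 base pair site is expected to occur tens of millions of times by chance, whereas an 18 base pair site is expected fewer than once, which is the intuition behind the statement that the six-finger ATF is more specific.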