Genome editing of synthetic target arrays for lineage tracing (GESTALT) is a method used to determine the developmental lineages of cells in multicellular systems. [1] GESTALT involves introducing a small DNA barcode that contains regularly spaced CRISPR/Cas9 target sites into the genomes of progenitor cells. Alongside the barcode, Cas9 and sgRNA are introduced into the cells. Mutations in the barcode accumulate during the course of cell divisions and the unique combination of mutations in a cell's barcode can be determined by DNA or RNA sequencing to link it to a developmental lineage.
Fate mapping is the process of identifying the embryonic origins of adult tissues. Lineage tracing is more specific, encompassing methods that examine the progeny arising from a single cell or a few cells. [2] One of the first lineage tracing methods developed involved the injection of dyes into specific cells of an early embryo, thereby labeling them and their progeny at each cell division. [3] Later methods used retroviral labeling, employing retroviruses to introduce a marker gene such as a fluorescent protein or beta-galactosidase into the genomes of the cells of interest, resulting in constitutive expression of the marker in those cells and their progeny. [4] These methods have the drawbacks of being invasive and of making it relatively difficult to target specific cells for labeling. [2] Currently, the most widely used approach involves cell labeling via genetic recombination systems. These methods use recombinases, the two main ones being the Cre-loxP and Flp-FRT systems, which can delete segments of DNA flanked by loxP and FRT sites, respectively. [5] [6] In this method, a transgenic model is created that can express Cre recombinase and has a reporter gene with an upstream STOP cassette flanked by loxP sites. Cre recombination deletes the STOP cassette upstream of the reporter gene, allowing for expression of the reporter. [7] Spatial control over the labeled cells is achieved by using specific Cre alleles under the control elements of a chosen marker gene, and temporal control can be obtained if inducible Cre alleles are used. [7] For example, CreERT is active for recombination only after administration of tamoxifen. [8] Although powerful, this approach requires significant optimization to facilitate single-cell lineage tracing and is low throughput. [9] Sequencing-based methods of lineage tracing have begun to emerge as they provide significantly higher resolution and high-throughput tracing of cell fate. [9] [10] Early approaches leveraged naturally occurring somatic mutations to identify cell lineage relationships. [11]
GESTALT takes advantage of the CRISPR-Cas9 system, which allows double-strand breaks in DNA to be targeted to highly specific sites adjacent to PAM motifs based on the sequence of the sgRNA. [12] These breaks are then repaired by one of the endogenous cellular DNA repair mechanisms: non-homologous end joining or homology-directed repair. [12] Non-homologous end joining is the more active of the two repair pathways, resulting in indels occurring at the targeted site. [13] The GESTALT system uses an array of ten CRISPR/Cas9 targets, with the first site having perfect specificity to the designed sgRNA, and the other nine having less Cas9 activity due to mismatches with the sgRNA. [1] Introducing the CRISPR-Cas9 reagents to cells carrying this array causes indels to accumulate at potentially every target of the array, marking the cell with a unique barcode sequence that can be used to identify it and its progeny via DNA or RNA sequencing. [1]
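The accumulation of edits across a ten-site array can be illustrated with a toy simulation. This is a minimal sketch and not the published method: the per-site editing probabilities in `ACTIVITIES`, the function names, and the rule that each edit affects exactly one site and is never overwritten are all assumptions made for illustration (real edits can, for example, delete several neighbouring sites at once).

```python
import random

# Hypothetical per-division editing probabilities (assumption): the first
# site matches the sgRNA perfectly, the other nine carry mismatches.
ACTIVITIES = [0.9] + [0.3] * 9

def edit_barcode(barcode, activities, rng):
    """Return a copy of `barcode` after one editing window.

    `barcode` is a tuple with one entry per target site: None means the
    site is still unedited, any other value is a label for an indel.
    Edited sites are treated as permanent (simplifying assumption).
    """
    edited = list(barcode)
    for i, (state, p) in enumerate(zip(barcode, activities)):
        if state is None and rng.random() < p:
            edited[i] = f"indel{rng.randrange(10_000)}"
    return tuple(edited)

def simulate(generations, activities=ACTIVITIES, seed=0):
    """Grow a lineage from one unedited cell; every cell divides in two
    each generation and each daughter may gain new edits."""
    rng = random.Random(seed)
    cells = [(None,) * len(activities)]
    for _ in range(generations):
        cells = [edit_barcode(cell, activities, rng)
                 for cell in cells for _ in range(2)]
    return cells
```

Calling `simulate(3)` returns eight barcodes. Daughters inherit every edit already present in their parent, which is the property that lets shared indels serve as evidence of shared ancestry.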
The target sequences are 23 bp long, including a protospacer and PAM sequence. The target sequences are placed in a contiguous array, separated by 3 to 5 bp linker sequences. Each target sequence must be screened against the genome of the host organism to ensure the specificity of the target sequences. Cas9 activity at each target site can be assessed using the GUIDE-seq assay. [14]
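The layout described above can be sketched as a small helper that checks each target and joins the targets with linkers. This is an illustrative sketch only: the function names are hypothetical, and the split of each 23 bp target into a 20 bp protospacer plus a 3 bp NGG PAM is an assumption based on the commonly used Streptococcus pyogenes Cas9, not something specified in the text above. Genome-wide specificity screening is not covered.

```python
import re

# Assumption: 20 bp protospacer followed by a 3 bp NGG PAM (SpCas9-style).
TARGET_PATTERN = re.compile(r"[ACGT]{20}[ACGT]GG")

def is_valid_target(site):
    """True if `site` is a 23 bp target: protospacer plus NGG PAM."""
    return TARGET_PATTERN.fullmatch(site) is not None

def build_array(targets, linkers):
    """Join targets into one contiguous array, with one 3 to 5 bp linker
    between each pair of neighbouring targets."""
    if len(linkers) != len(targets) - 1:
        raise ValueError("need exactly one linker between neighbouring targets")
    for target in targets:
        if not is_valid_target(target):
            raise ValueError(f"not a 23 bp protospacer+PAM target: {target}")
    for linker in linkers:
        if not 3 <= len(linker) <= 5:
            raise ValueError(f"linker must be 3 to 5 bp: {linker}")
    parts = [targets[0]]
    for linker, target in zip(linkers, targets[1:]):
        parts.extend([linker, target])
    return "".join(parts)
```

For two targets and one 3 bp linker, the resulting array is 23 + 3 + 23 = 49 bp long; a ten-target array would be built the same way with nine linkers.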
Two separate methods of introducing barcode arrays into the genomes of cells are used. The first method transduces progenitor cells with a lentivirus construct containing the barcode array inserted into the 3'-UTR of EGFP. This results in the incorporation of the barcode array into the genome and marks barcoded cells through stable expression of EGFP. A second method involves creating transgenic animal lines; the transgenic model has previously been generated using a Tol2 transgenesis vector which contains a barcode array cloned into the 3' UTR of DsRed under control of the ubiquitin promoter. [15]
Barcode editing and cell labeling are initiated by introducing the Cas9 protein and sgRNAs into progenitor cells. The CRISPR-Cas9 complex randomly produces double-strand breaks in the barcode regions and subsequent NHEJ repair introduces random indels, resulting in a unique DNA sequence at the barcode region in each cell at the time of labeling. There are multiple methods of delivering the CRISPR-Cas9 reagents into cells, and delivery remains an active field of research. [16] CRISPR-Cas9 reagents can be introduced into cells via transfection using lipid nanoparticles. [16] Alternatively, microinjection of the CRISPR-Cas9 reagents can be performed on 1-cell embryos. [17] The CRISPR-Cas9 reagents can be delivered at different developmental times to change which populations are labeled. Barcode editing may persist for several hours after delivery. [1]
Following delivery of the CRISPR-Cas9 reagents, time is allowed for barcode editing and further development to occur, resulting in the expansion of the labeled populations and the unique marking of their progeny. Genomic DNA or RNA can then be extracted from the progeny cells or tissues of interest and the barcodes can be PCR-amplified. Unique molecular identifiers (UMIs) are used to correct for PCR bias, and each UMI-barcode combination is therefore from a single cell. All barcode alleles can then be sequenced via next-generation sequencing (NGS), and the entire set of identified alleles can be subjected to phylogenetic analysis, identifying cell lineage based on barcode similarity. To control for sequencing error, only indels are considered, as most sequencing errors inherent to next-generation sequencing are base substitutions. [18] [1]
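The read-processing steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published analysis pipeline: each barcode allele is represented as a set of already-called indel events (hypothetical labels such as "site1:del5"), reads sharing a UMI are collapsed by majority vote, and a Jaccard index stands in for a full phylogenetic analysis. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def collapse_by_umi(reads):
    """Collapse PCR duplicates.

    `reads` is an iterable of (umi, allele) pairs, where `allele` is a
    frozenset of indel labels. Reads sharing a UMI are reduced to the
    allele seen most often for that UMI.
    """
    by_umi = defaultdict(Counter)
    for umi, allele in reads:
        by_umi[umi][allele] += 1
    return {umi: counts.most_common(1)[0][0] for umi, counts in by_umi.items()}

def jaccard(allele_a, allele_b):
    """Share of indel events common to both alleles (1.0 if both are unedited)."""
    union = allele_a | allele_b
    if not union:
        return 1.0
    return len(allele_a & allele_b) / len(union)

# Toy input: three reads for one UMI (one of them a miscalled allele) and one
# read for a second UMI.
reads = [
    ("UMI1", frozenset({"site1:del5"})),
    ("UMI1", frozenset({"site1:del5"})),
    ("UMI1", frozenset({"site1:del4"})),
    ("UMI2", frozenset({"site1:del5", "site3:ins2"})),
]
alleles = collapse_by_umi(reads)
similarity = jaccard(alleles["UMI1"], alleles["UMI2"])
```

Because only indel events enter the allele sets, base substitutions never affect the comparison. A pairwise similarity matrix built this way could then be passed to a tree-building method of choice.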
Single-cell GESTALT (scGESTALT) builds on the GESTALT system by integrating simultaneous capture of barcode and transcriptome information using scRNA-seq. [19] In scGESTALT, the barcode is cloned into progenitor cells of interest downstream of an inducible promoter. When the developmental period is complete, expression of the barcode is induced and the barcode mRNA is sequenced alongside the rest of the transcriptome using scRNA-seq. [19] The transcriptomic data can be used to track cell type differentiation, while the barcodes can be used to establish developmental relationships with other cells. An additional improvement is the ability to induce labeling at two different time points. This is enabled by cloning the Cas9/sgRNA under a heat shock promoter; the first labeling event is induced via microinjection as in traditional GESTALT, while a second labeling period is initiated by heat shock-induced expression of Cas9 and sgRNAs. [19] This enables lineage tracing during later stages of development, beyond what is possible with GESTALT.
GESTALT was initially developed to examine the contributions of embryonic progenitors to the adult organ systems of zebrafish. [1] By sequencing the barcodes from bulk extractions of organ systems, each organ was found to possess only a small number of the barcode alleles, indicating that organs arise from the clonal expansion of a small number of early progenitors. [1] The lineage information of thousands of differentiated cells was captured in the experiment and demonstrated the high-throughput lineage tracing capabilities of GESTALT. [1]
scGESTALT has been used to refine the lineage tree of the zebrafish brain. [19] The existence of multipotent progenitors which give rise to cells that migrate across the brain was discovered following a scGESTALT experiment where some barcode sequences were captured in cell populations in the forebrain, midbrain, and the hindbrain. [19] Pseudotime trajectories generated using the scRNA-seq data for oligodendrocyte progenitors to oligodendrocytes as well as atoh1c+ progenitors to pax6b+ neurons were found to be consistent with the barcode distribution across those cell types. [19]
Gene knockouts are a widely used genetic engineering technique that involves the targeted removal or inactivation of a specific gene within an organism's genome. This can be done through a variety of methods, including homologous recombination, CRISPR-Cas9, and TALENs.
Guide RNA (gRNA) or single guide RNA (sgRNA) is a short sequence of RNA that functions as a guide for the Cas9 endonuclease or other Cas proteins that cut double-stranded DNA and can thereby be used for gene editing. In bacteria and archaea, gRNAs are a part of the CRISPR-Cas system that serves as an adaptive immune defense that protects the organism from viruses. Here the short gRNAs serve as detectors of foreign DNA and direct the Cas enzymes that degrade the foreign nucleic acid.
In molecular biology, an insert is a piece of DNA that is inserted into a larger DNA vector by a recombinant DNA technique, such as ligation or recombination. This allows it to be multiplied, selected, further manipulated or expressed in a host organism.
Genome editing, or genome engineering, or gene editing, is a type of genetic engineering in which DNA is inserted, deleted, modified or replaced in the genome of a living organism. Unlike early genetic engineering techniques that randomly insert genetic material into a host genome, genome editing targets the insertions to site-specific locations. The basic mechanism involved in genetic manipulations through programmable nucleases is the recognition of target genomic loci and binding of effector DNA-binding domain (DBD), double-strand breaks (DSBs) in target DNA by the restriction endonucleases, and the repair of DSBs through homology-directed recombination (HDR) or non-homologous end joining (NHEJ).
Cas9 is a 160 kilodalton protein which plays a vital role in the immunological defense of certain bacteria against DNA viruses and plasmids, and is heavily utilized in genetic engineering applications. Its main function is to cut DNA and thereby alter a cell's genome. The CRISPR-Cas9 genome editing technique was a significant contributor to the Nobel Prize in Chemistry in 2020 being awarded to Emmanuelle Charpentier and Jennifer Doudna.
CRISPR interference (CRISPRi) is a genetic perturbation technique that allows for sequence-specific repression of gene expression in prokaryotic and eukaryotic cells. It was first developed by Stanley Qi and colleagues in the laboratories of Wendell Lim, Adam Arkin, Jonathan Weissman, and Jennifer Doudna. Sequence-specific activation of gene expression refers to CRISPR activation (CRISPRa).
Single-cell sequencing examines the nucleic acid sequence information from individual cells with optimized next-generation sequencing technologies, providing a higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment. For example, in cancer, sequencing the DNA of individual cells can give information about mutations carried by small populations of cells. In development, sequencing the RNAs expressed by individual cells can give insight into the existence and behavior of different cell types. In microbial systems, a population of the same species can appear genetically clonal. Still, single-cell sequencing of RNA or epigenetic modifications can reveal cell-to-cell variability that may help populations rapidly adapt to survive in changing environments.
Epigenome editing or epigenome engineering is a type of genetic engineering in which the epigenome is modified at specific sites using engineered molecules targeted to those sites. Whereas gene editing involves changing the actual DNA sequence itself, epigenetic editing involves modifying and presenting DNA sequences to proteins and other DNA binding factors that influence DNA function. By "editing" epigenomic features in this manner, researchers can determine the exact biological role of an epigenetic modification at the site in question.
Cell lineage denotes the developmental history of a tissue or organ from the fertilized egg. It is based on tracking an organism's cellular ancestry through cell divisions and relocation over time, starting with the originator cells and ending with a mature cell that can no longer divide.
A protospacer adjacent motif (PAM) is a 2–6-base pair DNA sequence immediately following the DNA sequence targeted by the Cas9 nuclease in the CRISPR bacterial adaptive immune system. The PAM is a component of the invading virus or plasmid, but is not found in the bacterial host genome and hence is not a component of the bacterial CRISPR locus. Cas9 will not successfully bind to or cleave the target DNA sequence if it is not followed by the PAM sequence. PAM is an essential targeting component which distinguishes bacterial self from non-self DNA, thereby preventing the CRISPR locus from being targeted and destroyed by the CRISPR-associated nuclease.
Cas12a is a subtype of Cas12 proteins and an RNA-guided endonuclease that forms part of the CRISPR system in some bacteria and archaea. It originates as part of a bacterial immune mechanism, where it serves to destroy the genetic material of viruses and thus protect the cell and colony from viral infection. Cas12a and other CRISPR-associated endonucleases use an RNA to target nucleic acid in a specific and programmable manner. In the organisms from which it originates, this guide RNA is a copy of a piece of foreign nucleic acid that previously infected the cell.
No-SCAR genome editing is an editing method that is able to manipulate the Escherichia coli genome. The system relies on recombineering whereby DNA sequences are combined and manipulated through homologous recombination. No-SCAR is able to manipulate the E. coli genome without the use of the chromosomal markers detailed in previous recombineering methods. Instead, the λ-Red recombination system facilitates donor DNA integration while Cas9 cleaves double-stranded DNA to counter-select against wild-type cells. Although λ-Red and Cas9 genome editing are widely used technologies, the no-SCAR method is novel in combining the two functions; this technique is able to establish point mutations, gene deletions, and short sequence insertions in several genomic loci with increased efficiency and time sensitivity.
Perturb-seq refers to a high-throughput method of performing single cell RNA sequencing (scRNA-seq) on pooled genetic perturbation screens. Perturb-seq combines multiplexed CRISPR mediated gene inactivations with single cell RNA sequencing to assess comprehensive gene expression phenotypes for each perturbation. Inferring a gene’s function by applying genetic perturbations to knock down or knock out a gene and studying the resulting phenotype is known as reverse genetics. Perturb-seq is a reverse genetics approach that allows for the investigation of phenotypes at the level of the transcriptome, to elucidate gene functions in many cells, in a massively parallel fashion.
CRISPR-Display (CRISP-Disp) is a modification of the CRISPR/Cas9 system for genome editing. The CRISPR/Cas9 system uses a short guide RNA (sgRNA) sequence to direct a Streptococcus pyogenes Cas9 nuclease, acting as a programmable DNA binding protein, to cleave DNA at a site of interest.
CRISPR activation (CRISPRa) is a type of CRISPR tool that uses modified versions of CRISPR effectors without endonuclease activity, with added transcriptional activators on dCas9 or the guide RNAs (gRNAs).
Off-target genome editing refers to nonspecific and unintended genetic modifications that can arise through the use of engineered nuclease technologies such as clustered, regularly interspaced, short palindromic repeats (CRISPR)-Cas9, transcription activator-like effector nucleases (TALEN), meganucleases, and zinc finger nucleases (ZFN). These tools use different mechanisms to bind a predetermined sequence of DNA ("target"), which they cleave, creating a double-stranded chromosomal break (DSB) that summons the cell's DNA repair mechanisms and leads to site-specific modifications. If these complexes do not bind at the target, often a result of homologous sequences and/or mismatch tolerance, they will create off-target DSBs and cause non-specific genetic modifications. Specifically, off-target effects consist of unintended point mutations, deletions, insertions, inversions, and translocations.
CRISPR gene editing is a genetic engineering technique in molecular biology by which the genomes of living organisms may be modified. It is based on a simplified version of the bacterial CRISPR-Cas9 antiviral defense system. By delivering the Cas9 nuclease complexed with a synthetic guide RNA (gRNA) into a cell, the cell's genome can be cut at a desired location, allowing existing genes to be removed and/or new ones added in vivo.
CITE-Seq is a method for performing RNA sequencing along with gaining quantitative and qualitative information on surface proteins with available antibodies on a single cell level. So far, the method has been demonstrated to work with only a few proteins per cell. As such, it provides an additional layer of information for the same cell by combining both proteomics and transcriptomics data. For phenotyping, this method has been shown to be as accurate as flow cytometry by the groups that developed it. It is currently one of the main methods, along with REAP-Seq, to evaluate both gene expression and protein levels simultaneously in different species.
Prime editing is a 'search-and-replace' genome editing technology in molecular biology by which the genome of living organisms may be modified. The technology directly writes new genetic information into a targeted DNA site. It uses a fusion protein, consisting of a catalytically impaired Cas9 endonuclease fused to an engineered reverse transcriptase enzyme, and a prime editing guide RNA (pegRNA), capable of identifying the target site and providing the new genetic information to replace the target DNA nucleotides. It mediates targeted insertions, deletions, and base-to-base conversions without the need for double strand breaks (DSBs) or donor DNA templates.
Genome-wide CRISPR-Cas9 knockout screens aim to elucidate the relationship between genotype and phenotype by ablating gene expression on a genome-wide scale and studying the resulting phenotypic alterations. The approach utilises the CRISPR-Cas9 gene editing system, coupled with libraries of single guide RNAs (sgRNAs), which are designed to target every gene in the genome. Over recent years, the genome-wide CRISPR screen has emerged as a powerful tool for performing large-scale loss-of-function screens, with low noise, high knockout efficiency and minimal off-target effects.