CITE-Seq (Cellular Indexing of Transcriptomes and Epitopes by Sequencing) is a method for performing RNA sequencing while also obtaining quantitative and qualitative information on surface proteins, for which antibodies are available, at the single-cell level. [1] So far, the method has been demonstrated to work with only a few proteins per cell. By combining proteomics and transcriptomics data, it provides an additional layer of information for the same cell. For phenotyping, the groups that developed the method have shown it to be as accurate as flow cytometry (a gold standard). [2] It is currently one of the main methods, along with REAP-Seq, for evaluating gene expression and protein levels simultaneously in different species.
The method was established by the New York Genome Center in collaboration with the Satija lab, [2] while a similar approach was shown earlier by AbVitro Inc.
Concurrent measurement of both protein and transcript levels opens up opportunities to use CITE-Seq in various biological areas, some of which were touched upon by the developers. For instance, it may be used to characterize tumor heterogeneity in different cancers, a major research field. [3] As a high-throughput single-cell method, it also permits the identification of rare subpopulations of cells and thus the detection of information otherwise lost with bulk methods. [3] It may also aid in tumor classification, for example in the identification of novel subtypes. [3] All of the above are possible because protein and transcript data are obtained from the same single cell at the same time, which also yields novel information on protein-RNA correlation.
It also has potential in immunology. For example, it can be used for immune cell characterization: recent research has investigated the ability of T cells to maintain an effector state. [4] Another study by one of the CITE-Seq coauthors suggested CITE-Seq as a method for examining the mechanisms of host-pathogen interactions. [5]
CITE-seq, like any other sequencing technique, has a wet lab portion, in which the antibodies are prepared, the cells are stained, cDNA is synthesized and the libraries are prepared and sequenced, and a dry lab portion, in which the resulting sequencing data are analyzed. The most crucial part of the wet lab experiments is designing the antibody-oligonucleotide conjugates and titrating the amount of each conjugate that needs to be present in the pool to achieve the desired read-out and quantification.
The first step involves preparation of the antibody-oligo conjugates, also known as Antibody-Derived Tags (ADTs). ADT preparation involves labeling an antibody directed against a cell surface protein of interest with oligonucleotides that barcode the antibody.
Once the ADTs are available, the next step is to stain the cells with the desired ADT pool. The scRNA-seq libraries can be prepared using Drop-seq, 10X Genomics or ddSeq methods. In brief, ADT-labelled cells are encapsulated within a droplet as single cells with DNA-barcoded microbeads. [6]
Within a droplet, the cells are then lysed to release both the bound ADTs and the cellular mRNA, and both are converted to cDNA. The DNA sequences on each microbead carry a barcode unique to that bead, so the cDNA derived from both ADTs and mRNAs is indexed with a cell barcode.
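The indexing described above can be illustrated with a minimal sketch of how ADT-derived reads might be tallied per cell once they have been sequenced. It is not the developers' implementation: the function name `count_adt_umis`, the `(cell barcode, UMI, antibody)` tuple layout and the example values are hypothetical, and the use of unique molecular identifiers (UMIs) to collapse PCR duplicates is an assumption that is not stated in the text.

```python
from collections import defaultdict


def count_adt_umis(reads):
    """Count distinct UMIs per (cell barcode, antibody tag).

    `reads` is an iterable of (cell_barcode, umi, antibody) tuples.
    The tuple layout and the UMI collapsing are assumptions made
    for illustration only.
    """
    seen = defaultdict(set)
    for cell_barcode, umi, antibody in reads:
        # Reads sharing cell barcode, antibody and UMI are treated
        # as PCR duplicates of one original molecule.
        seen[(cell_barcode, antibody)].add(umi)

    counts = defaultdict(dict)
    for (cell_barcode, antibody), umis in seen.items():
        counts[cell_barcode][antibody] = len(umis)
    return dict(counts)


# Hypothetical toy input: two cells, two antibodies.
example_reads = [
    ("AAAC", "UMI1", "CD3"),
    ("AAAC", "UMI1", "CD3"),   # duplicate of the read above
    ("AAAC", "UMI2", "CD3"),
    ("AAAC", "UMI3", "CD19"),
    ("TTTG", "UMI1", "CD19"),
]
adt_counts = count_adt_umis(example_reads)
```

Grouping by cell barcode first is what allows protein and transcript measurements to be attributed to the same cell later in the analysis.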
In the next step, based on the developer's guidelines, cDNA is PCR-amplified and ADT cDNA and mRNA cDNA are separated based on size (generally, ADT-derived cDNAs are < 180bp and mRNA-derived cDNAs are > 300bp). [7] Each of the separated cDNA molecules is independently amplified and purified to prepare sequencing libraries. Finally, the independent libraries are pooled together and sequenced. Thus, proteomics and transcriptomics data can be obtained from a single sequencing run.
Analysis of single-cell sequencing presents many challenges, such as determining the best way to normalize the data. [8] Due to a new level of complications that arise from sequencing of both proteins and transcripts at a single-cell level, the developers of CITE-Seq and their collaborators are maintaining several tools to help with data analysis.
scRNA-Seq data analysis (based on the developer's guidelines): [2] [9] The initial analysis steps are the same as in a standard scRNA-Seq experiment. First, reads are aligned to a reference genome of the species of interest, and cells with a very low number of transcripts mapped to the reference are removed. Finally, a normalized count matrix with gene expression values is obtained.
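As a rough illustration of the filtering and normalization steps, the sketch below drops cells with few mapped transcripts and applies a library-size log normalization. The choice of normalization, the function name `filter_and_normalize`, the threshold `min_total` and the scale factor are all assumptions made for this example; the text does not specify which normalization the guidelines use.

```python
import math


def filter_and_normalize(counts, min_total=200, scale=10_000):
    """Remove low-count cells, then log-normalize by library size.

    `counts` maps cell -> {gene: raw count}. `min_total` and `scale`
    are illustrative defaults, not recommended values.
    """
    normalized = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        if total < min_total:
            continue  # too few mapped transcripts: discard the cell
        normalized[cell] = {
            gene: math.log1p(count / total * scale)
            for gene, count in genes.items()
        }
    return normalized


# Hypothetical toy matrix: cell_b falls below the threshold.
raw_counts = {
    "cell_a": {"GeneX": 300, "GeneY": 700},
    "cell_b": {"GeneX": 5},
}
normalized_counts = filter_and_normalize(raw_counts)
```

In practice the threshold would be chosen from the distribution of counts per cell in the data set at hand rather than fixed in advance.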
ADT data analysis (based on the developer's guidelines): [2] [7] [10] [11] CITE-seq-Count is a Python package from the CITE-Seq developers that can be used to obtain raw counts. The Seurat package from the Satija lab further allows the protein and RNA counts to be combined, clustering to be performed on both measurements, and differential expression analysis to be carried out between cell clusters of interest. ADT quantification needs to take into account the differences between the antibodies. Additionally, filtering may be required to reduce noise, as in scRNA-Seq analysis. In contrast to RNA data, however, there is less dropout because proteins are present in higher amounts in a cell.
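One transform often applied to ADT counts to make antibodies with very different signal ranges more comparable is the centered log-ratio (CLR). The sketch below is a textbook CLR computed across the antibodies of one cell; it is an assumption for illustration, and the exact formula and direction of normalization used by the developers' tools may differ. The function name `clr_normalize` and the pseudocount are hypothetical.

```python
import math


def clr_normalize(adt_counts, pseudocount=1.0):
    """Centered log-ratio transform of one cell's ADT counts.

    Each value becomes log(count + pseudocount) minus the mean of
    those logs over all antibodies in the cell. The pseudocount is
    an illustrative choice that avoids log(0).
    """
    logs = {ab: math.log(c + pseudocount) for ab, c in adt_counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {ab: value - mean_log for ab, value in logs.items()}


# Hypothetical counts for a single cell.
cell_adt = {"CD3": 99, "CD19": 0, "CD8": 9}
cell_clr = clr_normalize(cell_adt)
```

After the transform the values for a cell sum to zero, so each antibody is expressed relative to the cell's overall ADT signal rather than as an absolute count.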
The analyses may result in identification of novel cell clusters through such methods as PCA or tSNE, crucial genes responsible for a specific cell function and other new knowledge specific to a question of interest. In general, the results obtained with ADT counts substantially increase the amount of information obtained through single cell transcriptomics.
Antibody-oligonucleotide conjugates have found applications beyond CITE-seq and can be adapted for sample multiplexing as well as CRISPR screens.
Cell Hashing: The New York Genome Center further adapted the use of their antibody-oligonucleotide conjugates to enable sample multiplexing for scRNA-seq. This technique, called Cell Hashing, [12] uses oligonucleotide-labelled antibodies against ubiquitously expressed cell surface proteins from a particular tissue sample. In this case, the oligonucleotide sequence contains a unique barcode that is specific to the cells from a given sample. This sample-specific cell tagging allows the sequencing libraries prepared from different samples to be pooled on a sequencing platform. Sequencing the antibody tags along with the cellular transcriptome identifies the sample of origin for each analyzed cell. The barcode sequence used on a cell hashing antibody can be designed to differ from the antibody barcodes present on the ADTs used in CITE-seq. This makes it possible to couple cell hashing with CITE-seq in a single sequencing run. [12] Cell hashing allows super-loading of the scRNA-seq platform, resulting in a lower cost of sequencing. It also enables detection of artifactual signals from multiplets, a major challenge in scRNA-seq. The cell hashing method has further been used by Gaublomme et al. to multiplex single-nucleus RNA-seq (snRNA-seq) by performing nucleus hashing. [13]
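The logic of assigning a sample of origin and flagging multiplets from hashtag counts can be sketched as follows. This is a deliberately simplified threshold rule written for illustration, not the published demultiplexing procedure; the function name `assign_sample`, the hashtag labels and both thresholds are hypothetical.

```python
def assign_sample(hashtag_counts, min_count=10, doublet_ratio=0.5):
    """Assign a cell to a sample from its hashtag counts.

    Returns the dominant hashtag, "multiplet" when a second hashtag
    is nearly as abundant as the first, or "negative" when no
    hashtag is clearly detected. Thresholds are illustrative only.
    """
    ranked = sorted(hashtag_counts.items(), key=lambda kv: kv[1], reverse=True)
    top_tag, top_count = ranked[0]
    second_count = ranked[1][1] if len(ranked) > 1 else 0

    if top_count < min_count:
        return "negative"    # no hashtag signal above background
    if second_count >= doublet_ratio * top_count:
        return "multiplet"   # two samples' hashtags in one droplet
    return top_tag


# Hypothetical cells from a pool of two hashed samples.
singlet = assign_sample({"HTO_A": 120, "HTO_B": 3})
multiplet = assign_sample({"HTO_A": 100, "HTO_B": 80})
negative = assign_sample({"HTO_A": 2, "HTO_B": 1})
```

Only multiplets that combine cells from different samples carry two hashtags, so a rule of this kind cannot detect multiplets formed by cells of the same sample.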
ECCITE-seq: Expanded CRISPR-compatible Cellular Indexing of Transcriptomes and Epitopes by sequencing, or ECCITE-seq, was developed to extend CITE-seq to the characterization of multiple modalities from a single cell. By adapting the basic CITE-seq protocol to a 5' tag-based scRNA-seq assay, it can detect the transcriptome, immune receptor clonotypes, surface markers, sample identity and single guide RNAs (sgRNAs) from each single cell. [14] The ability of ECCITE-seq to detect sgRNA molecules and measure their effect on gene expression levels opens up the prospect of applying this technique in CRISPR screens.
Advantages: CITE-seq enables simultaneous analysis of the transcriptome and the proteome of single cells. Previous efforts to couple index-sorting measurements from single-cell sorts with scRNA-seq were limited to small sample sizes and were not compatible with multiplexing or massively parallel high-throughput sequencing. CITE-seq has been shown to be compatible with high-throughput microfluidic platforms like 10X Genomics and Drop-seq. It is also adaptable to micro/nano-well platforms. Coupling it with cell hashing enables the application of CITE-seq on bulk samples and sample multiplexing. These techniques reduce the overall cost of high-throughput sequencing on multiple samples. Lastly, CITE-seq can be adapted to detect small molecules, RNA interference, CRISPR, and other gene editing techniques.
Limitations: One of the limitations of CITE-Seq is the loss of location information. Because of the way the cells are treated, the spatial distribution of cells within a sample, as well as of proteins within a cell, is not known. [15] [9] In addition, this method shares the challenges of scRNA-Seq, such as a high amount of noise and possible difficulty in detecting lowly expressed genes. [9] In terms of phenotyping, optimization of the assay and antibodies also presents a potential problem if proteins of interest are not included in the currently available panels. [16] Moreover, CITE-Seq is currently not able to detect intracellular proteins. [16] With the current protocol, many challenges would arise during the permeabilization step, which limits the technique to surface markers.
Functional genomics is a field of molecular biology that attempts to describe gene functions and interactions. Functional genomics makes use of the vast data generated by genomic and transcriptomic projects. Functional genomics focuses on the dynamic aspects such as gene transcription, translation, regulation of gene expression and protein–protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional "candidate-gene" approach.
The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The term transcriptome is a portmanteau of the words transcript and genome; it is associated with the process of transcript production during the biological process of transcription.
In situ hybridization (ISH) is a type of hybridization that uses a labeled complementary DNA, RNA or modified nucleic acids strand to localize a specific DNA or RNA sequence in a portion or section of tissue or if the tissue is small enough, in the entire tissue, in cells, and in circulating tumor cells (CTCs). This is distinct from immunohistochemistry, which usually localizes proteins in tissue sections.
Cross-linking and immunoprecipitation is a method used in molecular biology that combines UV crosslinking with immunoprecipitation in order to identify RNA binding sites of proteins on a transcriptome-wide scale, thereby increasing our understanding of post-transcriptional regulatory networks. CLIP can be used either with antibodies against endogenous proteins, or with common peptide tags or affinity purification, which enables the possibility of profiling model organisms or RBPs otherwise lacking suitable antibodies.
ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations.
RNA-Seq is a technique that uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA molecules in a biological sample, providing a snapshot of gene expression in the sample, also known as transcriptome.
Illumina dye sequencing is a technique used to determine the series of base pairs in DNA, also known as DNA sequencing. The reversible terminated chemistry concept was invented by Bruno Canard and Simon Sarfati at the Pasteur Institute in Paris. It was developed by Shankar Balasubramanian and David Klenerman of Cambridge University, who subsequently founded Solexa, a company later acquired by Illumina. This sequencing method is based on reversible dye-terminators that enable the identification of single nucleotides as they are washed over DNA strands. It can also be used for whole-genome and region sequencing, transcriptome analysis, metagenomics, small RNA discovery, methylation profiling, and genome-wide protein-nucleic acid interaction analysis.
In the field of cellular biology, single-cell analysis and subcellular analysis is the study of genomics, transcriptomics, proteomics, metabolomics and cell–cell interactions at the single cell level. The concept of single-cell analysis originated in the 1970s. Before the discovery of heterogeneity, single-cell analysis mainly referred to the analysis or manipulation of an individual cell in a bulk population of cells at a particular condition using an optical or electron microscope. To date, due to the heterogeneity seen in both eukaryotic and prokaryotic cell populations, analyzing a single cell makes it possible to discover mechanisms not seen when studying a bulk population of cells. Technologies such as fluorescence-activated cell sorting (FACS) allow the precise isolation of selected single cells from complex samples, while high-throughput single-cell partitioning technologies enable the simultaneous molecular analysis of hundreds or thousands of single unsorted cells; this is particularly useful for the analysis of transcriptome variation in genotypically identical cells, allowing the definition of otherwise undetectable cell subtypes. The development of new technologies is increasing our ability to analyze the genome and transcriptome of single cells, as well as to quantify their proteome and metabolome. Mass spectrometry techniques have become important analytical tools for proteomic and metabolomic analysis of single cells. Recent advances have enabled the quantification of thousands of proteins across hundreds of single cells, and thus make possible new types of analysis. In situ sequencing and fluorescence in situ hybridization (FISH) do not require that cells be isolated and are increasingly being used for analysis of tissues.
Single-cell sequencing examines the nucleic acid sequence information from individual cells with optimized next-generation sequencing technologies, providing a higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment. For example, in cancer, sequencing the DNA of individual cells can give information about mutations carried by small populations of cells. In development, sequencing the RNAs expressed by individual cells can give insight into the existence and behavior of different cell types. In microbial systems, a population of the same species can appear genetically clonal. Still, single-cell sequencing of RNA or epigenetic modifications can reveal cell-to-cell variability that may help populations rapidly adapt to survive in changing environments.
A transcriptome in vivo analysis tag is a multifunctional, photoactivatable mRNA-capture molecule designed for isolating mRNA from a single cell in complex tissues.
In epitranscriptomic sequencing, most methods focus on either (1) enrichment and purification of the modified RNA molecules before running on the RNA sequencer, or (2) improving or modifying bioinformatics analysis pipelines to call the modification peaks. Most methods have been adapted and optimized for mRNA molecules, except for modified bisulfite sequencing for profiling 5-methylcytidine which was optimized for tRNAs and rRNAs.
Perturb-seq refers to a high-throughput method of performing single cell RNA sequencing (scRNA-seq) on pooled genetic perturbation screens. Perturb-seq combines multiplexed CRISPR mediated gene inactivations with single cell RNA sequencing to assess comprehensive gene expression phenotypes for each perturbation. Inferring a gene’s function by applying genetic perturbations to knock down or knock out a gene and studying the resulting phenotype is known as reverse genetics. Perturb-seq is a reverse genetics approach that allows for the investigation of phenotypes at the level of the transcriptome, to elucidate gene functions in many cells, in a massively parallel fashion.
Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.
Spatial transcriptomics is a method for assigning cell types to their locations in the histological sections and can also be used to determine subcellular localization of mRNA molecules. First described in 2016 by Ståhl et al., it has since undergone a variety of improvements and modifications.
CUT&Tag-sequencing, also known as cleavage under targets and tagmentation, is a method used to analyze protein interactions with DNA. CUT&Tag-sequencing combines antibody-targeted controlled cleavage by a protein A-Tn5 fusion with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global DNA binding sites precisely for any protein of interest. Currently, ChIP-Seq is the most common technique utilized to study protein–DNA relations; however, it suffers from a number of practical and economical limitations that CUT&RUN and CUT&Tag sequencing do not. CUT&Tag sequencing is an improvement over CUT&RUN because it does not require cells to be lysed or chromatin to be fractionated. CUT&RUN is not suitable for single-cell platforms, so CUT&Tag is advantageous for these.
ChIL sequencing (ChIL-seq), also known as Chromatin Integration Labeling sequencing, is a method used to analyze protein interactions with DNA. ChIL-sequencing combines antibody-targeted controlled cleavage by Tn5 transposase with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global DNA binding sites precisely for any protein of interest. Currently, ChIP-Seq is the most common technique utilized to study protein–DNA relations; however, it suffers from a number of practical and economical limitations that ChIL-Sequencing does not. ChIL-Seq is a precise technique that reduces sample loss and could be applied to single cells.
snRNA-seq, also known as single nucleus RNA sequencing, single nuclei RNA sequencing or sNuc-seq, is an RNA sequencing method for profiling gene expression in cells which are difficult to isolate, such as those from tissues that are archived or which are hard to dissociate. It is an alternative to single cell RNA seq (scRNA-seq), as it analyzes nuclei instead of intact cells.
Deterministic Barcoding in Tissue for Spatial Omics Sequencing (DBiT-seq) was developed at Yale University by Rong Fan and colleagues in 2020 to create a multi-omics approach for studying spatial gene expression heterogeneity within a tissue sample. This method can be used for co-mapping mRNA and protein levels at a near single-cell resolution in fresh or frozen formaldehyde-fixed tissue samples. DBiT-seq utilizes next generation sequencing (NGS) and microfluidics. This method allows for simultaneous spatial transcriptomic and proteomic analysis of a tissue sample. DBiT-seq improves upon previous spatial transcriptomics applications such as High-Definition Spatial Transcriptomics (HDST) and Slide-seq by increasing the number of detectable genes per pixel, improving cellular resolution, and easing implementation.