CITE-Seq

CITE-Seq (Cellular Indexing of Transcriptomes and Epitopes by Sequencing) is a method for performing RNA sequencing while also obtaining quantitative and qualitative information on surface proteins, using available antibodies, at the single-cell level. [1] So far, the method has been demonstrated to work with only a few proteins per cell. It provides an additional layer of information for the same cell by combining proteomics and transcriptomics data. For phenotyping, the groups that developed the method have shown it to be as accurate as flow cytometry (a gold standard). [2] Along with REAP-Seq, it is currently one of the main methods for evaluating gene expression and protein levels simultaneously in different species.

The method was established by the New York Genome Center in collaboration with the Satija lab, [2] while a similar approach was shown earlier by AbVitro Inc.

Applications

Concurrent measurement of protein and transcript levels opens up opportunities to use CITE-Seq in various biological areas, some of which were touched upon by the developers. For instance, it may be used to characterize tumor heterogeneity in different cancers, a major research field. [3] As a high-throughput single-cell method, it also permits the identification of rare subpopulations of cells and thus the detection of information that is otherwise lost with bulk methods. [3] It may also aid in tumor classification, for example in the identification of novel subtypes. [3] All of the above are possible because protein and transcript data are obtained from each single cell at the same time, which also yields novel information on protein-RNA correlation.

It also has potential in immunology. For example, it can be used for immune cell characterization: recent research has investigated the ability of T cells to maintain an effector state. [4] Another study, by one of the CITE-Seq coauthors, suggested CITE-Seq as a method for examining the mechanisms of host-pathogen interactions. [5]

Workflow

CITE-seq, like any other sequencing technique, has a wet lab portion and a dry lab portion. In the wet lab, the antibodies are prepared, the cells are stained, cDNA is synthesized, and the libraries are prepared and sequenced. In the dry lab, the resulting sequencing data are analyzed. The most crucial part of the wet lab experiments is designing the antibody-oligonucleotide conjugates and titrating the amount of each conjugate in the pool to achieve the desired read-out and quantification.

Schematic of the wet lab workflow for CITE-Seq

Wet lab workflow

The first step is the preparation of the antibody-oligo conjugates, also known as Antibody-Derived Tags (ADTs). To prepare an ADT, an antibody directed against a cell surface protein of interest is labeled with oligonucleotides that barcode the antibody.

Once the ADTs are available, the next step is to stain the cells with the desired ADT pool. The scRNA-seq libraries can be prepared using the Drop-seq, 10X Genomics or ddSeq methods. In brief, ADT-labelled cells are encapsulated as single cells within droplets, together with DNA-barcoded microbeads. [6]

Within a droplet, the cell is then lysed to release both the bound ADTs and the mRNA, and both are converted to cDNA. The DNA sequences on each microbead carry a barcode unique to that bead, so the cDNA prepared from the ADTs and from the cellular mRNAs is indexed with the same cell barcode.
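
The indexing described above can be illustrated with a small sketch. It assumes that each sequenced read has already been reduced to a (cell barcode, molecule identifier, feature) triple; the molecule identifier (often called a UMI), the function name and the example barcodes and feature names are all hypothetical and are not taken from the CITE-seq protocol text above.

```python
from collections import defaultdict


def count_by_cell(reads):
    """Collapse (cell_barcode, umi, feature) reads into per-cell feature counts.

    Repeated (cell, umi, feature) triples are counted once, on the assumption
    that they are PCR copies of the same original molecule.
    """
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for cell_barcode, umi, feature in reads:
        key = (cell_barcode, umi, feature)
        if key in seen:
            continue  # skip PCR duplicates of an already counted molecule
        seen.add(key)
        counts[cell_barcode][feature] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {cell: dict(features) for cell, features in counts.items()}


# Hypothetical reads: ADT-derived and mRNA-derived cDNA share the cell barcode.
reads = [
    ("AAAC", "u1", "CD3_ADT"),
    ("AAAC", "u1", "CD3_ADT"),  # duplicate of the read above
    ("AAAC", "u2", "CD3_ADT"),
    ("AAAC", "u3", "GAPDH"),
    ("TTTG", "u1", "CD19_ADT"),
]
cell_counts = count_by_cell(reads)
```

Because the ADT and mRNA reads from one droplet carry the same cell barcode, protein and transcript counts end up in the same per-cell record.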

In the next step, following the developers' guidelines, the cDNA is PCR-amplified, and the ADT cDNA and mRNA cDNA are separated by size (generally, ADT-derived cDNAs are < 180 bp and mRNA-derived cDNAs are > 300 bp). [7] Each of the separated cDNA fractions is independently amplified and purified to prepare sequencing libraries. Finally, the independent libraries are pooled and sequenced. Thus, proteomics and transcriptomics data can be obtained from a single sequencing run.

Schematic of the dry lab workflow for CITE-Seq

Dry lab workflow

Analysis of single-cell sequencing data presents many challenges, such as determining the best way to normalize the data. [8] Because sequencing both proteins and transcripts at the single-cell level adds a further level of complication, the developers of CITE-Seq and their collaborators maintain several tools to help with data analysis.

scRNA-Seq data analysis (based on the developers' guidelines): [2] [9] The initial analysis steps are the same as in a standard scRNA-Seq experiment. First, reads are aligned to a reference genome of the species of interest, and cells with a very low number of transcripts mapped to the reference are removed. Finally, a normalized count matrix of gene expression values is obtained.
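
As a rough illustration of the filtering and normalization steps, the sketch below drops cells with few mapped transcripts and log-normalizes the rest. The threshold of 500 transcripts, the scale factor of 10,000 and the log-normalization scheme are assumptions chosen for the example, not values taken from the developers' guidelines.

```python
import math


def filter_and_normalize(counts, min_transcripts=500, scale=10_000):
    """Remove low-count cells, then log-normalize each remaining cell.

    counts maps cell -> {gene: raw count}. Each count is divided by the
    cell's total, multiplied by `scale`, and transformed with log(1 + x).
    """
    normalized = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        if total < min_transcripts:
            continue  # too few mapped transcripts: discard the cell
        normalized[cell] = {
            gene: math.log1p(count / total * scale)
            for gene, count in genes.items()
        }
    return normalized


# Hypothetical raw counts for two cells; cell_2 falls below the threshold.
rna_counts = {
    "cell_1": {"GAPDH": 600, "CD3E": 400},
    "cell_2": {"GAPDH": 3, "CD3E": 1},
}
normalized = filter_and_normalize(rna_counts)
```

In practice the cutoff is usually chosen by inspecting the distribution of transcripts per cell rather than fixed in advance.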

ADT data analysis (based on the developers' guidelines): [2] [7] [10] [11] CITE-seq-Count is a Python package from the CITE-Seq developers that can be used to obtain raw counts. The Seurat package from the Satija lab then allows the protein and RNA counts to be combined, clustering to be performed on both measurements, and differential expression analysis to be carried out between cell clusters of interest. ADT quantification needs to take into account the differences between the antibodies. Additionally, filtering may be required to reduce noise, as in scRNA-Seq analysis. In contrast to RNA data, however, there is less dropout, because proteins are present in a cell in higher amounts.
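
One way to put ADT counts on a comparable scale is a centered log-ratio (CLR) transform. The text above does not name a specific normalization, so the choice of CLR, the pseudocount of 1 and the decision to center across the tags within one cell are assumptions made for this sketch; the antibody names and counts are invented.

```python
import math


def clr_normalize(adt_counts, pseudocount=1.0):
    """Centered log-ratio transform of one cell's ADT counts.

    Each count is log-transformed (after adding a pseudocount) and the mean
    of the log values across all tags in the cell is subtracted.
    """
    logs = {
        tag: math.log(count + pseudocount)
        for tag, count in adt_counts.items()
    }
    mean_log = sum(logs.values()) / len(logs)
    return {tag: value - mean_log for tag, value in logs.items()}


# Hypothetical ADT counts for a single cell.
cell_adt = {"CD3": 120, "CD19": 2, "CD14": 5}
clr = clr_normalize(cell_adt)
```

After the transform the values for a cell sum to zero, so a positive value marks a tag that is enriched relative to the other tags measured on that cell.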

Through methods such as PCA or tSNE, the analyses may identify novel cell clusters, crucial genes responsible for a specific cell function, and other new knowledge specific to the question of interest. In general, the results obtained with ADT counts substantially increase the amount of information obtained through single-cell transcriptomics.

Adaptations of the technique

Schematic of Cell Hashing

The applications of antibody-oligonucleotide conjugates have expanded beyond CITE-seq: the conjugates can be adapted for sample multiplexing as well as for CRISPR screens.

Cell Hashing: The New York Genome Center further adapted its antibody-oligonucleotide conjugates to enable sample multiplexing for scRNA-seq. This technique, called Cell Hashing, [12] uses oligonucleotide-labelled antibodies against ubiquitously expressed cell surface proteins from a particular tissue sample. Here, the oligonucleotide sequence contains a barcode that is specific to the cells of one sample. This sample-specific cell tagging allows sequencing libraries prepared from different samples to be pooled on a sequencing platform. Sequencing the antibody tags along with the cellular transcriptome identifies the sample of origin of each analyzed cell. The barcode sequence on a cell hashing antibody can be designed to differ from the antibody barcodes on the ADTs used in CITE-seq, which makes it possible to couple cell hashing with CITE-seq in a single sequencing run. [12] Cell hashing allows super-loading of the scRNA-seq platform, which lowers the cost of sequencing. It also enables the detection of artifactual signals from multiplets, a major challenge in scRNA-seq. The cell hashing method has further been used by Gaublomme et al. to multiplex single-nucleus RNA-seq (snRNA-seq) by performing nucleus hashing. [13]
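
The idea of assigning each cell to a sample, and of flagging multiplets, can be sketched with a simple threshold rule. This is a deliberate simplification for illustration and not the published demultiplexing procedure; the function name, the hashtag labels and the two thresholds are hypothetical.

```python
def assign_sample(hashtag_counts, min_count=10, doublet_ratio=0.5):
    """Assign one cell to a sample from its hashtag counts.

    Returns the name of the dominant hashtag, "negative" if no hashtag
    reaches `min_count`, or "multiplet" if a second hashtag is also present
    at a count comparable to the dominant one.
    """
    ranked = sorted(hashtag_counts.items(), key=lambda item: item[1], reverse=True)
    top_tag, top_count = ranked[0]
    if top_count < min_count:
        return "negative"  # no hashtag signal above background
    if len(ranked) > 1:
        second_count = ranked[1][1]
        if second_count >= min_count and second_count >= doublet_ratio * top_count:
            return "multiplet"  # two samples detected under one cell barcode
    return top_tag


# Hypothetical hashtag counts for three cell barcodes.
singlet = assign_sample({"HTO_A": 200, "HTO_B": 3})
multiplet = assign_sample({"HTO_A": 150, "HTO_B": 140})
negative = assign_sample({"HTO_A": 2, "HTO_B": 1})
```

A rule of this kind can only flag multiplets formed by cells from different samples; two cells from the same sample in one droplet carry the same hashtag and are not distinguishable this way.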

ECCITE-seq: Expanded CRISPR-compatible Cellular Indexing of Transcriptomes and Epitopes by sequencing, or ECCITE-seq, was developed to extend CITE-seq to the characterization of multiple modalities from a single cell. By adapting the basic CITE-seq protocol to a 5' tag-based scRNA-seq assay, it can detect the transcriptome, immune receptor clonotypes, surface markers, sample identity and single guide RNAs (sgRNAs) from each single cell. [14] The ability of ECCITE-seq to detect sgRNA molecules and measure their effect on gene expression levels opens up the prospect of applying this technique in CRISPR screens.

Advantages and Limitations of CITE-seq

Advantages: CITE-seq enables simultaneous analysis of the transcriptome and the proteome of single cells. Previous efforts to couple index-sorting measurements from single-cell sorts with scRNA-seq were limited to small sample sizes and were not compatible with multiplexing or with massively parallel high-throughput sequencing. CITE-seq has been shown to be compatible with high-throughput microfluidic platforms such as 10X Genomics and Drop-seq. It is also adaptable to micro/nano-well platforms. Coupling it with cell hashing enables the application of CITE-seq to bulk samples and sample multiplexing. These techniques reduce the overall cost of high-throughput sequencing of multiple samples. Lastly, CITE-seq can be adapted to detect small molecules, RNA interference, CRISPR, and other gene editing techniques.

Limitations: One of the limitations of CITE-Seq is the loss of location information. Because of the way the cells are treated, neither the spatial distribution of cells within a sample nor that of proteins within a cell is known. [15] [9] In addition, this method shares the challenges of scRNA-Seq, such as a high amount of noise and possible difficulties in detecting lowly expressed genes. [9] In terms of phenotyping, optimization of the assay and antibodies also presents a potential problem if proteins of interest are not included in the currently available panels. [16] Moreover, CITE-Seq is currently not able to detect intracellular proteins. [16] With the current protocol, many challenges would arise during the permeabilization step, which limits the technique to surface markers.

Alternative methods

References

  1. Mercatelli, Daniele; Balboni, Nicola; De Giorgio, Francesca; Aleo, Emanuela; Garone, Caterina; Giorgi, Federico M. (2021-05-06). "The Transcriptome of SH-SY5Y at Single-Cell Resolution: A CITE-Seq Data Analysis Workflow". Methods and Protocols. 4 (2): 28. doi:10.3390/mps4020028. ISSN 2409-9279. PMC 8163004. PMID 34066513.
  2. Stoeckius, Marlon; Hafemeister, Christoph; Stephenson, William; Houck-Loomis, Brian; Chattopadhyay, Pratip K; Swerdlow, Harold; Satija, Rahul; Smibert, Peter (2017-07-31). "Simultaneous epitope and transcriptome measurement in single cells". Nature Methods. 14 (9): 865–868. doi:10.1038/nmeth.4380. ISSN 1548-7091. PMC 5669064. PMID 28759029.
  3. Tirosh, Itay; Suvà, Mario L. (2018-11-16). "Deciphering Human Tumor Biology by Single-Cell Expression Profiling". Annual Review of Cancer Biology. 3 (1): 151–166. doi:10.1146/annurev-cancerbio-030518-055609. ISSN 2472-3428. S2CID 53969464.
  4. Gutierrez-Arcelus, Maria; Teslovich, Nikola; Mola, Alex R.; Polidoro, Rafael B.; Nathan, Aparna; Kim, Hyun; Hannes, Susan; Slowikowski, Kamil; Watts, Gerald F. M. (2019-02-08). "Lymphocyte innateness defined by transcriptional states reflects a balance between proliferation and effector functions". Nature Communications. 10 (1): 687. Bibcode:2019NatCo..10..687G. doi:10.1038/s41467-019-08604-4. ISSN 2041-1723. PMC 6368609. PMID 30737409.
  5. Chattopadhyay, Pratip K.; Roederer, Mario; Bolton, Diane L. (2018-11-06). "A deadly dance: the choreography of host–pathogen interactions, as revealed by single-cell technologies". Nature Communications. 9 (1): 4638. Bibcode:2018NatCo...9.4638C. doi:10.1038/s41467-018-06214-0. ISSN 2041-1723. PMC 6219517. PMID 30401874.
  6. Macosko, Evan Z.; Basu, Anindita; Satija, Rahul; Nemesh, James; Shekhar, Karthik; Goldman, Melissa; Tirosh, Itay; Bialas, Allison R.; Kamitaki, Nolan (May 2015). "Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets". Cell. 161 (5): 1202–1214. doi:10.1016/j.cell.2015.05.002. ISSN 0092-8674. PMC 4481139. PMID 26000488.
  7. "CITE-seq". CITE-seq. Retrieved 2019-02-27.
  8. Gao, Shan (2018). "Data Analysis in Single-Cell Transcriptome Sequencing". Computational Systems Biology. Methods in Molecular Biology. Vol. 1754. Springer New York. pp. 311–326. doi:10.1007/978-1-4939-7717-8_18. ISBN 9781493977161. PMID 29536451.
  9. Liu, Serena; Trapnell, Cole (2016-02-17). "Single-cell transcriptome sequencing: recent advances and remaining challenges". F1000Research. 5: 182. doi:10.12688/f1000research.7223.1. ISSN 2046-1402. PMC 4758375. PMID 26949524.
  10. Roelli, Patrick (2019-02-23). Small script that allows to count TAGS from a CITE-seq experiment: Hoohm/CITE-seq-Count. Retrieved 2019-02-27.
  11. "Seurat". satijalab.org. Retrieved 2019-02-27.
  12. Stoeckius, Marlon; Zheng, Shiwei; Houck-Loomis, Brian; Hao, Stephanie; Yeung, Bertrand; Smibert, Peter; Satija, Rahul (2017-12-21). "Cell "hashing" with barcoded antibodies enables multiplexing and doublet detection for single cell genomics". Genome Biology. 19 (1): 224. bioRxiv 10.1101/237693. doi:10.1186/s13059-018-1603-1. PMC 6300015. PMID 30567574.
  13. Gaublomme, Jellert T.; Li, Bo; McCabe, Cristin; Knecht, Abigail; Drokhlyansky, Eugene; Van Wittenberghe, Nicholas; Waldman, Julia; Dionne, Danielle; Nguyen, Lan (2018-11-23). "Nuclei multiplexing with barcoded antibodies for single-nucleus genomics". bioRxiv. doi:10.1101/476036. hdl:1721.1/125028.
  14. Mimitou, Eleni; Cheng, Anthony; Montalbano, Antonino; Hao, Stephanie; Stoeckius, Marlon; Legut, Mateusz; Roush, Timothy; Herrera, Alberto; Papalexi, Efthymia (2018-11-08). "Expanding the CITE-seq tool-kit: Detection of proteins, transcriptomes, clonotypes and CRISPR perturbations with multiplexing, in a single assay". bioRxiv. doi:10.1101/466466.
  15. An, Xingyue; Varadarajan, Navin (March 2018). "Single-cell technologies for profiling T cells to enable monitoring of immunotherapies". Current Opinion in Chemical Engineering. 19: 142–152. doi:10.1016/j.coche.2018.01.003. ISSN 2211-3398. PMC 6530921. PMID 31131208.
  16. Baron, Maayan; Yanai, Itai (2017-08-24). "New skin for the old RNA-Seq ceremony: the age of single-cell multi-omics". Genome Biology. 18 (1): 159. doi:10.1186/s13059-017-1300-5. ISSN 1474-760X. PMC 5571565. PMID 28837001.
  17. Peterson, Vanessa M; Zhang, Kelvin Xi; Kumar, Namit; Wong, Jerelyn; Li, Lixia; Wilson, Douglas C; Moore, Renee; McClanahan, Terrill K; Sadekova, Svetlana (2017-08-30). "Multiplexed quantification of proteins and transcripts in single cells". Nature Biotechnology. 35 (10): 936–939. doi:10.1038/nbt.3973. ISSN 1087-0156. PMID 28854175. S2CID 205285357.
  18. Frei, Andreas P; Bava, Felice-Alessio; Zunder, Eli R; Hsieh, Elena W Y; Chen, Shih-Yu; Nolan, Garry P; Gherardini, Pier Federico (2016-01-25). "Highly multiplexed simultaneous detection of RNAs and proteins in single cells". Nature Methods. 13 (3): 269–275. doi:10.1038/nmeth.3742. ISSN 1548-7091. PMC 4767631. PMID 26808670.