Single-cell multi-omics integration

Schematic of the different single-cell multi-omic integration strategies. Adapted from Adossa et al., 2021

Single-cell multi-omics integration describes a suite of computational methods used to harmonize information from multiple "omes" to jointly analyze biological phenomena. [2] [3] [4] [5] This approach allows researchers to discover relationships between different chemical-physical modalities by drawing associations across several molecular layers simultaneously. Multi-omics integration approaches can be grouped into three broad categories: early integration, intermediate integration, and late integration. [6] Multi-omics integration can enhance experimental robustness by providing independent sources of evidence to address hypotheses, by leveraging the strengths of one modality to compensate for the weaknesses of another through imputation, and by offering cell-type clustering and visualizations that better reflect the underlying biology. [2] [3]

Background

The emergence of single-cell sequencing technologies has revolutionized our understanding of cellular heterogeneity, uncovering a nuanced landscape of cell types and their associations with biological processes. Single-cell omics technologies have extended beyond the transcriptome to profile diverse physical-chemical properties at single-cell resolution, including whole genomes/exomes, DNA methylation, chromatin accessibility, histone modifications, epitranscriptome (e.g., mRNAs, microRNAs, tRNAs, lncRNAs), proteome, phosphoproteome, metabolome, and more. [4] [7] [8] There is also an expanding repository of publicly available datasets, exemplified by growing resources such as the Human Cell Atlas (HCA) project, The Cancer Genome Atlas (TCGA), and the ENCODE project. [9] [10] [11] [12] [13] With the increasing diversity in both available datasets and data types, multi-omics data integration and multimodal data analysis represent pivotal trajectories for the future of systems biology.

Single-cell multi-omics integration can reveal underappreciated relationships between chemical-physical modalities, broaden our definition of cell states beyond single modality feature profiles, and provide independent evidence during analysis to support testing of biological hypotheses. However, the high dimensionality (features > observations), high degree of stochastic technical and biological variability, and sparsity of single-cell data (low molecule recovery efficiency) make computational integration a challenging problem. [14] [15] [16] [17] Furthermore, different solutions for multi-omics integration are available depending on factors such as whether the data is matched (simultaneous measurements derived from the same cell) or unmatched (measurements derived from different cells), whether cell-type annotations are available, or whether modality feature conversion is available, with different implementations tailored to suit the specific use case. [2] As such, there are multiple approaches to single-cell data integration, each with a distinct use case, and each with its own set of advantages and disadvantages. [2] [6] [18]

Approaches to multi-omics integration

Early integration

Early integration concatenates two or more omics datasets (by binding rows or columns) into a single data matrix. [19] [20] Its advantages are that the approach is simple, highly interpretable, and capable of capturing relationships between features from different modalities. Early integration is primarily employed to merge datasets of the same data type (e.g., integrating two distinct scRNA-seq datasets). This is because integrating datasets from different modalities may lead to a combined feature set in which value ranges differ widely between features. For instance, expression data often span a wider range than accessibility data, which typically take values between 0 and 2.
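
As a minimal sketch of this idea (not the procedure of any particular tool), the example below scales each modality separately and then binds the feature columns of matched cells into one matrix. The toy matrices, the function names `zscore_features` and `early_integrate`, and the choice of z-scoring as the scaling step are illustrative assumptions.

```python
import numpy as np

def zscore_features(x):
    """Center each feature (column) and scale it to unit variance."""
    mu = x.mean(axis=0)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    return (x - mu) / sd

def early_integrate(modalities):
    """Concatenate cells-by-features matrices column-wise after per-modality scaling.

    All matrices must describe the same cells in the same row order.
    """
    n_cells = modalities[0].shape[0]
    if any(m.shape[0] != n_cells for m in modalities):
        raise ValueError("all modalities must have the same number of cells")
    return np.hstack([zscore_features(m.astype(float)) for m in modalities])

rng = np.random.default_rng(0)
rna = rng.poisson(5.0, size=(6, 4))      # toy expression counts (wide range)
atac = rng.integers(0, 3, size=(6, 3))   # toy accessibility values in {0, 1, 2}
combined = early_integrate([rna, atac])
print(combined.shape)  # (6, 7)
```

Scaling before concatenation is one simple way to keep the modality with the larger numeric range from dominating the combined matrix.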

Early integration approaches produce data matrices with higher dimensionality than any of the original matrices. As such, dimensionality reduction methods such as feature selection and feature extraction are often necessary steps before downstream analysis. Feature selection retains only the important variables from the original omic layers, while feature extraction transforms the original input features into new features that are combinations of the originals. Projecting high-dimensional data into a lower-dimensional space reduces noise and simplifies the dataset, making the data easier to handle.
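
The difference between the two strategies can be sketched as follows. Variance-based selection and principal component analysis (PCA) are used here only as familiar stand-ins for feature selection and feature extraction; the function names and toy data are assumptions made for illustration.

```python
import numpy as np

def select_top_variance(x, k):
    """Feature selection: keep the k columns with the highest variance."""
    keep = np.sort(np.argsort(x.var(axis=0))[-k:])
    return x[:, keep], keep

def pca_extract(x, n_components):
    """Feature extraction: project centered data onto its top principal axes."""
    centered = x - x.mean(axis=0)
    # Rows of vt are principal axes (linear combinations of original features).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 20)) * np.linspace(0.1, 3.0, 20)  # features with growing spread
selected, kept = select_top_variance(x, k=5)
embedding = pca_extract(x, n_components=3)
print(selected.shape, embedding.shape)
```

Selection returns a subset of the original columns, so each retained feature keeps its biological meaning, whereas each extracted component mixes many original features.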

Intermediate integration

Intermediate integration describes a class of approaches that analyze multiple omic datasets simultaneously without the need for prior data transformation (as this occurs during data integration). [19] [20] Examples of intermediate integration include similarity-based integration, joint dimension reduction, and statistical modeling.

Similarity-based integration

Similarity-based integration aims to identify patterns across multi-omic datasets, typically through spectral clustering (e.g., Spectrum [21] and PC-MSC [22]). Spectral clustering groups cells using either similarity matrices derived from a multi-omic dataset or graph fusion algorithms (e.g., Seurat4), which construct a graph from each individual omics layer and merge them into a single graph. [23]
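
A minimal sketch of the general idea is given below: one similarity matrix is computed per modality, the matrices are fused by simple averaging, and the cells are split into two groups with a spectral method. This is not the algorithm used by Spectrum, PC-MSC, or Seurat4; the Gaussian kernel, the equal-weight averaging, the two-group split, and all names and toy data are assumptions.

```python
import numpy as np

def gaussian_affinity(x, sigma=1.0):
    """Cell-by-cell similarity matrix from a Gaussian kernel on Euclidean distance."""
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def spectral_bipartition(affinity):
    """Split cells into two groups using the sign of the Fiedler vector."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)        # eigenvalues are returned in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt    # second-smallest eigenvector, degree-corrected
    return (fiedler > 0).astype(int)

rng = np.random.default_rng(2)
truth = np.repeat([0, 1], 10)                           # two toy cell groups
rna = rng.normal(size=(20, 5)) + 4.0 * truth[:, None]   # toy modality 1
adt = rng.normal(size=(20, 3)) + 4.0 * truth[:, None]   # toy modality 2 (same cells)
fused = 0.5 * (gaussian_affinity(rna, sigma=3.0) + gaussian_affinity(adt, sigma=3.0))
labels = spectral_bipartition(fused)
print(labels)
```

Published methods typically weight modalities adaptively and handle more than two clusters, for example by running k-means on several eigenvectors.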

Joint dimension reduction

Joint dimension reduction aims to reduce the complexity of multi-omics data by projecting observations onto a lower-dimensional latent space such that the different omics layers can be analyzed together. [24] Canonical correlation analysis (CCA), non-negative matrix factorization (NMF), and manifold alignment are popular approaches for joint dimensionality reduction. Tools that use CCA or its derivative sparse CCA, such as Seurat3 [25] and bindSC [26], identify linear relationships between datasets by finding linear combinations of variables that maximize feature correlation. Tools that use NMF (e.g., LIGER [27] and coupledNMF [28]) extract low-dimensional representations of high-dimensional data such that both shared and dataset-specific factors across the multiple omics datasets can be identified. Manifold alignment (e.g., MATCHER [29] and MAGAN [30]) refers to an approach in which low-dimensional representations of the individual omic datasets are computed separately and then aligned in a common latent space.
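
To illustrate the CCA step in isolation, the sketch below computes classical CCA with a QR decomposition followed by a singular value decomposition. It is a textbook formulation for two matched matrices with more cells than features, not the sparse or diagonalized variants that tools such as Seurat3 or bindSC may use; the function name `cca` and the simulated data are assumptions.

```python
import numpy as np

def cca(x, y, n_components):
    """Canonical correlation analysis via QR decomposition and SVD.

    Returns the canonical correlations and the canonical variables of x and y.
    Assumes more cells than features and full column rank in both matrices.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(xc)   # orthonormal basis for the column space of xc
    qy, _ = np.linalg.qr(yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    u, s, vt = np.linalg.svd(qx.T @ qy, full_matrices=False)
    x_scores = qx @ u[:, :n_components]
    y_scores = qy @ vt[:n_components].T
    return s[:n_components], x_scores, y_scores

rng = np.random.default_rng(3)
z = rng.normal(size=(100, 2))                                        # shared latent factors
rna = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(100, 6))  # toy modality 1
atac = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(100, 4)) # toy modality 2
corrs, rna_scores, atac_scores = cca(rna, atac, n_components=2)
print(np.round(corrs, 3))
```

The paired columns of `rna_scores` and `atac_scores` are the maximally correlated linear combinations of each modality and can serve as a shared low-dimensional embedding.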

Statistical modeling

Various statistical approaches, including probabilistic Bayesian modeling (which allows prior knowledge and uncertainty to be incorporated into the analysis), can be used to integrate multi-omic datasets. For instance, BREM-SC [31] employs a Bayesian clustering framework to jointly cluster multi-omic datasets, while clonealign [38] uses Bayesian methods to integrate gene expression and copy number profiles for studying cancer clones.

Late integration

Late integration preprocesses and models each omics modality separately, and then combines the resulting models at the end. [19] [20] The advantage of late integration is that tools tailored to each omics modality can be applied to that modality. While late integration approaches are commonly used in bulk multi-omics studies (e.g., Cluster-of-clusters analysis [32] and Kernel Learning Integrative Clustering [33]), late integration of single-cell data is still a relatively new field. For example, ensemble learning techniques such as ensemble clustering (e.g., SAME-clustering, [34] Sc-GPE, [35] and EC-PGMGR [36]) have demonstrated potential in aggregating clustering results from different sources. These methods combine the clustering results from different omics datasets into a consensus clustering, which models the relationships between the individual clustering results to find an improved global clustering solution across the different modalities.
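
One simple way to build such a consensus, sketched below, is a co-association matrix: for every pair of cells, count the fraction of per-modality clusterings that place them in the same cluster, and then group cells that co-cluster in a majority of them. This is a generic illustration and not the algorithm of SAME-clustering, Sc-GPE, or EC-PGMGR; the threshold, function names, and label vectors are hypothetical.

```python
import numpy as np

def co_association(labelings):
    """Fraction of clusterings in which each pair of cells shares a cluster."""
    labelings = np.asarray(labelings)
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)

def consensus_clusters(labelings, threshold=0.5):
    """Group cells that co-cluster in more than `threshold` of the clusterings.

    Consensus clusters are the connected components of the thresholded
    co-association graph.
    """
    linked = co_association(labelings) > threshold
    n = linked.shape[0]
    consensus = np.full(n, -1)
    next_label = 0
    for start in range(n):
        if consensus[start] != -1:
            continue
        stack = [start]
        consensus[start] = next_label
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if consensus[j] == -1:
                    consensus[j] = next_label
                    stack.append(j)
        next_label += 1
    return consensus

# Hypothetical per-modality cluster labels for six cells (label values are arbitrary).
rna_labels = [0, 0, 0, 1, 1, 1]
atac_labels = [1, 1, 0, 0, 0, 0]
protein_labels = [2, 2, 2, 5, 5, 5]
result = consensus_clusters([rna_labels, atac_labels, protein_labels])
print(result)  # [0 0 0 1 1 1]
```

Because only co-membership is compared, the approach does not require cluster labels to be aligned across modalities.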

As late integration analyzes each omics layer separately before integrating the results into a consensus, it may fail to capture interactions and relationships across different omics modalities. As such, some groups argue that late integration amounts to multiple parallel single-omics analyses conducted on multiple data types, rather than fulfilling the "true goal" of multi-omics integration, which is to discover inter-omics relationships present in multi-omics data. [20]

Single-Cell Multi-Omic Integration Tools
| Tool | Benchmarked Modalities Supported | Integration Strategy |
| --- | --- | --- |
| bindSC [26] | Transcriptome and chromatin accessibility | Intermediate |
| BREM-SC [31] | Transcriptome and proteome | Early or Intermediate |
| CiteFuse [37] | Transcriptome and proteome | Late |
| clonealign [38] | Transcriptome and genome | Intermediate |
| coupledNMF [28] | Transcriptome and chromatin accessibility | Intermediate |
| LIGER [27] | Transcriptome, spatial gene expression, methylome, and chromatin accessibility | Intermediate |
| MAGAN [30] | Multiplexed immunohistochemistry and transcriptome | Intermediate |
| MMD-MA [39] | Transcriptome and methylome | Intermediate |
| MOFA+ [40] | Transcriptome and chromatin accessibility | Early or Intermediate |
| SCHEMA [41] | Transcriptome, chromatin accessibility, and spatial gene expression | Intermediate |
| scMVAE [42] | Transcriptome and chromatin accessibility | Intermediate |
| Seurat3 [25] | Transcriptome and chromatin accessibility | Intermediate or Late |
| Seurat4 [43] | Transcriptome, chromatin accessibility, and proteome | Intermediate or Late |
| Seurat5 [44] | Transcriptome, proteome, methylome, and hashtag oligos | Intermediate or Late |
| Spectrum [21] | Transcriptome, miRNA, and proteome | Intermediate |
| totalVI [45] | Transcriptome and proteome | Intermediate |
| UnionCom [46] | Transcriptome and methylome | Intermediate |

Considerations for multi-omics data integration

Noise

As single-cell data is prone to noise from both biological and technical sources, developing robust de-noising methods to mitigate noise may be necessary. [47] In the context of single-cell experiments, biological variation arising from factors such as transcriptional bursts, differences in cell cycle, and cell microenvironment can introduce noise to the dataset. Additionally, technical variability resulting from factors like poor sequence quality, uneven sequence coverage, and sample contamination must also be addressed.

Dataset compatibility

Integrating different omic modalities can be challenging due to differences in the structure of the datasets. [48] For example, scRNA-seq features are expressed on a continuous scale, whereas chromatin accessibility data (i.e., scATAC-seq) take values between 0 and 2 (two copies of each region per cell). As such, integration of different modalities may require additional steps to transform the datasets into a common latent space. Even then, strategies such as early integration may still be prone to bias if the resulting matrix is dominated by features from one specific modality.

Dimensionality

Analyzing large-scale single-cell multi-omics datasets can be computationally intensive because of the high dimensionality of the datasets. [1] [2] Hence, the tools employed for integrating datasets must be computationally efficient, or dimensionality reduction should first be applied to the datasets (see the discussion of feature selection and feature extraction under Early integration).

Interpretability and validation

Many integration methods focus on statistical associations rather than detailed causal modeling. As such, interpreting and validating the results can be challenging, especially when neural networks are used, as these models are often difficult to interpret ("black boxes"). [20] The utility of integration methods should therefore be assessed on practical applications, such as whether they accurately identify biologically relevant multi-omic relationships.

Matched and unmatched data

The integration of single-cell multi-omic data presents different challenges depending on whether the datasets are matched or unmatched. [48] Matched datasets comprise multiple omic layers measured from the same individual cells, whereas unmatched datasets are measured from different sets of cells. While matched datasets enable direct comparisons between the different omics layers within the same cell, they may not be as readily available as unmatched datasets. On the other hand, while unmatched datasets allow for the integration of different sources and conditions, they require consideration of potential biases and confounding factors (e.g., differences in cell populations, experimental conditions, or sample preparation methods between datasets). Approaches to multi-omics integration for unmatched data include matching by cell group (which requires cell-type annotations), matching by shared features, and statistical approaches such as NMF. [2]
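
As a rough illustration of matching by shared features, the sketch below pairs cells from two unmatched datasets when each is the other's nearest neighbor in the shared feature space. The mutual-nearest-neighbor rule, the Euclidean distance, and the toy values are assumptions chosen for brevity and are not taken from a specific integration tool.

```python
import numpy as np

def mutual_nearest_neighbors(a, b):
    """Return (i, j) pairs where a[i] and b[j] are each other's nearest neighbor.

    `a` and `b` are cells-by-features matrices over the same shared features.
    """
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    nearest_in_b = dist.argmin(axis=1)  # nearest cell in b for each cell in a
    nearest_in_a = dist.argmin(axis=0)  # nearest cell in a for each cell in b
    return [(i, int(j)) for i, j in enumerate(nearest_in_b) if nearest_in_a[j] == i]

# Toy unmatched datasets measured on the same three shared features.
dataset_a = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [10.0, 0.0, 0.0]])
dataset_b = np.array([[5.2, 4.9, 5.1], [0.1, -0.2, 0.0], [9.0, 9.0, 9.0]])
pairs = mutual_nearest_neighbors(dataset_a, dataset_b)
print(pairs)  # [(0, 1), (1, 0)]
```

Cells without a mutual partner (the third cell in each toy dataset) are left unpaired, which reflects populations present in only one dataset.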

Applications and uses

While single-modality datasets have proven to be a mainstay in systems biology, combining biological information across multiple modalities has the potential to address biological questions that cannot be inferred by a single data type alone.

Modelling biological networks

For example, the integration of transcriptome and DNA accessibility has enabled the development of bioinformatic tools to infer cell-type-specific gene regulatory networks. [49] [50] [51] This is achieved by leveraging transcription factor and target gene expression along with cis-regulatory information to impute relevant transcription factors and their regulatory partners.

Expanding definitions of cell state

Another application of multi-omics integration is expanding definitions of cell state to incorporate features observed across multiple modalities. For instance, integrating protein marker detection with transcriptome profiling using a multi-omics sequencing technology such as CITE-seq can resolve cell state signatures based on joint gene regulatory and surface marker expression. [52] This enables more robust inferences regarding cellular phenotypes, which are akin to and directly comparable with results from classical flow cytometry. Moreover, defining cell states by clustering within an integrated latent space may offer more stable estimates of cellular phenotypes than analysis within a single-modality latent space. [2]

Imputation

Furthermore, multi-omics integration can overcome modality-specific limitations through imputation. For example, most spatial transcriptomic sequencing technologies suffer from limited spatial resolution (pixels comprising a mixture of local cells) and low feature complexity. [53] Integration of spatial transcriptomics with scRNA-seq can help overcome these limitations by supporting the spatial deconvolution of low-resolution readouts and estimating the frequency of each cell type. [54] [55]
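
The deconvolution idea can be sketched as a non-negative regression: a spot's expression profile is modeled as a mixture of cell-type reference profiles, and the mixture weights are read as cell-type frequencies. The example below is a simplified stand-in for the probabilistic methods cited above; the signature matrix, the noiseless spot, and the function name `deconvolve_spot` are hypothetical.

```python
import numpy as np

def deconvolve_spot(signatures, spot, n_iter=500):
    """Estimate non-negative cell-type proportions for one spatial spot.

    Solves min_w ||signatures @ w - spot||^2 subject to w >= 0 by projected
    gradient descent, then rescales w to sum to one.
    """
    step = 1.0 / np.linalg.norm(signatures, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.full(signatures.shape[1], 1.0 / signatures.shape[1])
    for _ in range(n_iter):
        grad = signatures.T @ (signatures @ w - spot)
        w = np.maximum(w - step * grad, 0.0)  # project back onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w

# Hypothetical mean expression of six genes (rows) in three cell types (columns),
# e.g. averaged from an annotated scRNA-seq reference.
signatures = np.array([
    [10.0, 1.0, 0.0],
    [8.0, 0.0, 1.0],
    [1.0, 9.0, 0.0],
    [0.0, 10.0, 2.0],
    [1.0, 0.0, 9.0],
    [0.0, 2.0, 10.0],
])
true_mix = np.array([0.5, 0.3, 0.2])
spot = signatures @ true_mix  # noiseless toy spot
estimate = deconvolve_spot(signatures, spot)
print(np.round(estimate, 3))
```

Real spots are noisy count data, so published tools typically replace the squared-error objective with a count-based likelihood and may add spatial priors.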

References

  1. Adossa, Nigatu; Khan, Sofia; Rytkönen, Kalle T.; Elo, Laura L. (2021). "Computational strategies for single-cell multi-omics integration". Computational and Structural Biotechnology Journal. 19: 2588–2596. doi:10.1016/j.csbj.2021.04.060. ISSN 2001-0370. PMC 8114078. PMID 34025945.
  2. Miao, Zhen; Humphreys, Benjamin D; McMahon, Andrew P; Kim, Junhyong (2021). "Multi-omics integration in the age of million single-cell data". Nat Rev Nephrol. 17 (11): 710–724. doi:10.1038/s41581-021-00463-x. PMC 9191639. PMID 34417589.
  3. Subramanian, Indhupriya (2020). "Multi-omics Data Integration, Interpretation, and Its Application". Bioinform Biol Insights. 14. doi:10.1177/1177932219899051. PMC 7003173. PMID 32076369.
  4. Stuart, Tim; Satija, Rahul (2019). "Integrative single-cell analysis". Nat Rev Genet. 20 (5): 257–272. doi:10.1038/s41576-019-0093-7. PMID 30696980. S2CID 59409752.
  5. Li, Yunjin; Ma, Lu; Wu, Duojiao; Chen, Geng (2021). "Advances in bulk and single-cell multi-omics approaches for systems biology and precision medicine". Brief Bioinform. 22 (5). doi:10.1093/bib/bbab024. PMID 33778867.
  6. Adossa, Nigatu; Khan, Sofia; Rytkönen, Kalle T; Elo, Laura L (2021). "Computational strategies for single-cell multi-omics integration". Comput Struct Biotechnol J. 19: 2588–2596. doi:10.1016/j.csbj.2021.04.060. PMC 8114078. PMID 34025945.
  7. Baysoy, Alev; Bai, Zhiliang; Satija, Rahul; Fan, Rong (2023). "The technological landscape and applications of single-cell multi-omics". Nat Rev Mol Cell Biol. 24 (10): 695–713. doi:10.1038/s41580-023-00615-w. PMC 10242609. PMID 37280296.
  8. Macaulay, Iain C; Ponting, Chris P; Voet, Thierry (2017). "Single-Cell Multiomics: Multiple Measurements from Single Cells". Trends Genet. 33 (2): 155–168. doi:10.1016/j.tig.2016.12.003. PMC 5303816. PMID 28089370.
  9. Regev, Aviv; Teichmann, Sarah A; Lander, Eric S; Amit, Ido; Benoist, Christophe; Birney, Ewan; Bodenmiller, Bernd; Campbell, Peter; Carninci, Piero; Clatworthy, Menna; Clevers, Hans; Deplancke, Bart; Dunham, Ian; Eberwine, James; Eils, Roland; Enard, Wolfgang; Farmer, Andrew; Fugger, Lars; Göttgens, Berthold; Hacohen, Nir; Haniffa, Muzlifah; Hemberg, Martin; Kim, Seung; Klenerman, Paul; Kriegstein, Arnold; Lein, Ed; Linnarsson, Sten; Lundberg, Emma; Lundeberg, Joakim; Majumder, Partha; Marioni, John C; Merad, Miriam; Mhlanga, Musa; Nawijn, Martijn; Netea, Mihai; Nolan, Garry; Pe'er, Dana; Phillipakis, Anthony; Ponting, Chris P; Quake, Stephen; Reik, Wolf; Rozenblatt-Rosen, Orit; Sanes, Joshua; Satija, Rahul; Schumacher, Ton N; Shalek, Alex; Shapiro, Ehud; Sharma, Padmanee; Shin, Jay W; Stegle, Oliver; Stratton, Michael; Stubbington, Michael J T; Theis, Fabian J; Uhlen, Matthias; Van Oudenaarden, Alexander; Wagner, Allon; Watt, Fiona; Weissman, Jonathan; Wold, Barbara; Xavier, Ramnik; Yosef, Nir (2017). "The Human Cell Atlas". eLife. 6. doi:10.7554/eLife.27041. PMC 5762154. PMID 29206104.
  10. Lindeboom, Rik G.H; Regev, Aviv; Teichmann, Sarah A (2021). "Towards a Human Cell Atlas: Taking Notes from the Past". Trends Genet. 37 (7): 625–630. doi:10.1016/j.tig.2021.03.007. hdl:1721.1/134116. PMID 33879355.
  11. Weinstein, John N; Collisson, Eric A; Mills, Gordon B; Shaw, Kenna R Mills; Ozenberger, Brad A; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M (2013). "The Cancer Genome Atlas Pan-Cancer analysis project". Nat Genet. 45 (10): 1113–1120. doi:10.1038/ng.2764. PMC 3919969. PMID 24071849.
  12. The ENCODE Project Consortium (2012). "An integrated encyclopedia of DNA elements in the human genome". Nature. 489 (7414): 57–74. Bibcode:2012Natur.489...57T. doi:10.1038/nature11247. PMC 3439153. PMID 22955616.
  13. The ENCODE Project Consortium (2020). "Expanded encyclopaedias of DNA elements in the human and mouse genomes". Nature. 583 (7818): 699–710. Bibcode:2020Natur.583..699E. doi:10.1038/s41586-020-2493-4. PMC 7410828. PMID 32728249.
  14. Lähnemann, David; Köster, Johannes; Szczurek, Ewa; McCarthy, Davis J; Hicks, Stephanie C; Robinson, Mark D; Vallejos, Catalina A; Campbell, Kieran R; Beerenwinkel, Niko; Mahfouz, Ahmed; Pinello, Luca; Skums, Pavel; Stamatakis, Alexandros; Attolini, Camille Stephan-Otto; Aparicio, Samuel; Baaijens, Jasmijn; Balvert, Marleen; Barbanson, Buys De; Cappuccio, Antonio; Corleone, Giacomo; Dutilh, Bas E; Florescu, Maria; Guryev, Victor; Holmer, Rens; Jahn, Katharina; Lobo, Thamar Jessurun; Keizer, Emma M; Khatri, Indu; Kielbasa, Szymon M; Korbel, Jan O; Kozlov, Alexey M; Kuo, Tzu-Hao; Lelieveldt, Boudewijn P.F; Mandoiu, Ion I; Marioni, John C; Marschall, Tobias; Mölder, Felix; Niknejad, Amir; Rączkowska, Alicja; Reinders, Marcel; Ridder, Jeroen De; Saliba, Antoine-Emmanuel; Somarakis, Antonios; Stegle, Oliver; Theis, Fabian J; Yang, Huan; Zelikovsky, Alex; McHardy, Alice C; Raphael, Benjamin J; Shah, Sohrab P; Schönhuth, Alexander (2020). "Eleven grand challenges in single-cell data science". Genome Biol. 21 (1): 31. doi:10.1186/s13059-020-1926-6. PMC 7007675. PMID 32033589.
  15. Santiago-Rodriguez, Tasha M; Hollister, Emily B (2021). "Multi 'omic data integration: A review of concepts, considerations, and approaches". Semin Perinatol. 45 (6). doi:10.1016/j.semperi.2021.151456. PMID 34256961. S2CID 235822759.
  16. Yuan, Guo-Cheng; Cai, Long; Elowitz, Michael; Enver, Tariq; Fan, Guoping; Guo, Guoji; Irizarry, Rafael; Kharchenko, Peter; Kim, Junhyong; Orkin, Stuart; Quackenbush, John; Saadatpour, Assieh; Schroeder, Timm; Shivdasani, Ramesh; Tirosh, Itay (2017). "Challenges and emerging directions in single-cell analysis". Genome Biol. 18 (1): 84. doi:10.1186/s13059-017-1218-y. PMC 5421338. PMID 28482897.
  17. Argelaguet, Ricard; Cuomo, Anna S. E; Stegle, Oliver; Marioni, John C (2021). "Computational principles and challenges in single-cell data integration". Nat Biotechnol. 39 (10): 1202–1215. doi:10.1038/s41587-021-00895-7. PMID 33941931. S2CID 233722751.
  18. Wu, Yan; Zhang, Kun (2020). "Tools for the analysis of high-dimensional single-cell RNA sequencing data". Nat Rev Nephrol. 16 (7): 408–421. doi:10.1038/s41581-020-0262-0. PMID 32221477. S2CID 214672522.
  19. Adossa, Nigatu; Khan, Sofia; Rytkönen, Kalle T.; Elo, Laura L. (2021). "Computational strategies for single-cell multi-omics integration". Computational and Structural Biotechnology Journal. 19: 2588–2596. doi:10.1016/j.csbj.2021.04.060. PMC 8114078. PMID 34025945.
  20. Picard, Milan; Scott-Boyer, Marie-Pier; Bodein, Antoine; Périn, Olivier; Droit, Arnaud (2021). "Integration strategies of multi-omics data for machine learning analysis". Computational and Structural Biotechnology Journal. 19: 3735–3746. doi:10.1016/j.csbj.2021.06.030. ISSN 2001-0370. PMC 8258788. PMID 34285775.
  21. John, Christopher R; Watson, David; Barnes, Michael R; Pitzalis, Costantino; Lewis, Myles J (2019-09-10). "Spectrum: fast density-aware spectral clustering for single and multi-omic data". Bioinformatics. 36 (4): 1159–1166. doi:10.1093/bioinformatics/btz704. ISSN 1367-4803. PMC 7703791. PMID 31501851.
  22. Kumar, Abhishek; Rai, Piyush; Daume, Hal (2011). "Co-regularized Multi-view Spectral Clustering". Advances in Neural Information Processing Systems. 24. Curran Associates, Inc.
  23. Wang, Bo; Mezlini, Aziz M; Demir, Feyyaz; Fiume, Marc; Tu, Zhuowen; Brudno, Michael; Haibe-Kains, Benjamin; Goldenberg, Anna (March 2014). "Similarity network fusion for aggregating data types on a genomic scale". Nature Methods. 11 (3): 333–337. doi:10.1038/nmeth.2810. ISSN 1548-7091. PMID 24464287. S2CID 9033318.
  24. Cantini, Laura; Zakeri, Pooya; Hernandez, Celine; Naldi, Aurelien; Thieffry, Denis; Remy, Elisabeth; Baudot, Anaïs (2021-01-05). "Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer". Nature Communications. 12 (1): 124. Bibcode:2021NatCo..12..124C. doi:10.1038/s41467-020-20430-7. ISSN 2041-1723. PMC 7785750. PMID 33402734.
  25. Stuart, Tim; Butler, Andrew; Hoffman, Paul; Hafemeister, Christoph; Papalexi, Efthymia; Mauck, William M.; Hao, Yuhan; Stoeckius, Marlon; Smibert, Peter; Satija, Rahul (June 2019). "Comprehensive Integration of Single-Cell Data". Cell. 177 (7): 1888–1902.e21. doi:10.1016/j.cell.2019.05.031. ISSN 0092-8674. PMC 6687398. PMID 31178118.
  26. Dou, Jinzhuang; Liang, Shaoheng; Mohanty, Vakul; Miao, Qi; Huang, Yuefan; Liang, Qingnan; Cheng, Xuesen; Kim, Sangbae; Choi, Jongsu; Li, Yumei; Li, Li; Daher, May; Basar, Rafet; Rezvani, Katayoun; Chen, Rui (2022-05-09). "Bi-order multimodal integration of single-cell data". Genome Biology. 23 (1): 112. doi:10.1186/s13059-022-02679-x. ISSN 1474-760X. PMC 9082907. PMID 35534898.
  27. Welch, Joshua D.; Kozareva, Velina; Ferreira, Ashley; Vanderburg, Charles; Martin, Carly; Macosko, Evan Z. (June 2019). "Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity". Cell. 177 (7): 1873–1887.e17. doi:10.1016/j.cell.2019.05.006. ISSN 0092-8674. PMC 6716797. PMID 31178122.
  28. Duren, Zhana; Chen, Xi; Zamanighomi, Mahdi; Zeng, Wanwen; Satpathy, Ansuman T.; Chang, Howard Y.; Wang, Yong; Wong, Wing Hung (2018-07-24). "Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations". Proceedings of the National Academy of Sciences. 115 (30): 7723–7728. Bibcode:2018PNAS..115.7723D. doi:10.1073/pnas.1805681115. ISSN 0027-8424. PMC 6065048. PMID 29987051.
  29. Welch, Joshua D.; Hartemink, Alexander J.; Prins, Jan F. (2017-07-24). "MATCHER: manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics". Genome Biology. 18 (1): 138. doi:10.1186/s13059-017-1269-0. ISSN 1474-760X. PMC 5525279. PMID 28738873.
  30. Amodio, Matthew; Krishnaswamy, Smita (2018-02-09). "MAGAN: Aligning Biological Manifolds". arXiv:1803.00385.
  31. Wang, Xinjun; Sun, Zhe; Zhang, Yanfu; Xu, Zhongli; Xin, Hongyi; Huang, Heng; Duerr, Richard H; Chen, Kong; Ding, Ying; Chen, Wei (2020-05-07). "BREM-SC: a bayesian random effects mixture model for joint clustering single cell multi-omics data". Nucleic Acids Research. 48 (11): 5814–5824. doi:10.1093/nar/gkaa314. ISSN 0305-1048. PMC 7293045. PMID 32379315.
  32. OSBREAC; Aure, Miriam Ragle; Vitelli, Valeria; Jernström, Sandra; Kumar, Surendra; Krohn, Marit; Due, Eldri U.; Haukaas, Tonje Husby; Leivonen, Suvi-Katri; Vollan, Hans Kristian Moen; Lüders, Torben; Rødland, Einar; Vaske, Charles J.; Zhao, Wei; Møller, Elen K. (December 2017). "Integrative clustering reveals a novel split in the luminal A subtype of breast cancer with impact on outcome". Breast Cancer Research. 19 (1): 44. doi:10.1186/s13058-017-0812-y. ISSN 1465-542X. PMC 5372339. PMID 28356166.
  33. Cabassi, Alessandra; Kirk, Paul D (27 June 2020). "Multiple kernel learning for integrative consensus clustering of omic datasets". Bioinformatics. 36 (18): 4789–4796. doi:10.1093/bioinformatics/btaa593. PMC 7750932. PMID 32592464.
  34. Huh, Ruth; Yang, Yuchen; Jiang, Yuchao; Shen, Yin; Li, Yun (2020). "SAME-clustering: Single-cell Aggregated Clustering via Mixture Model Ensemble". Nucleic Acids Research. 48 (1): 86–95. doi:10.1093/nar/gkz959. PMC 6943136. PMID 31777938.
  35. Zhu, Xiaoshu; Li, Jian; Li, Hong-Dong; Xie, Miao; Wang, Jianxin (2020-12-15). "Sc-GPE: A Graph Partitioning-Based Cluster Ensemble Method for Single-Cell". Frontiers in Genetics. 11: 604790. doi:10.3389/fgene.2020.604790. ISSN 1664-8021. PMC 7770236. PMID 33384718.
  36. Zhu, Yuan; Zhang, De-Xin; Zhang, Xiao-Fei; Yi, Ming; Ou-Yang, Le; Wu, Mengyun (2020). "EC-PGMGR: Ensemble Clustering Based on Probability Graphical Model With Graph Regularization for Single-Cell RNA-seq Data". Frontiers in Genetics. 11: 572242. doi:10.3389/fgene.2020.572242. ISSN 1664-8021. PMC 7673820. PMID 33329710.
  37. Kim, Hani Jieun; Lin, Yingxin; Geddes, Thomas A.; Yang, Jean Yee Hwa; Yang, Pengyi (2020). "CiteFuse enables multi-modal analysis of CITE-seq data". Bioinformatics. 36 (14): 4137–4143. doi:10.1093/bioinformatics/btaa282. PMID 32353146.
  38. Campbell, Kieran R.; Steif, Adi; Laks, Emma; Zahn, Hans; Lai, Daniel; McPherson, Andrew; Farahani, Hossein; Kabeer, Farhia; O’Flanagan, Ciara; Biele, Justina; Brimhall, Jazmine; Wang, Beixi; Walters, Pascale; Consortium, IMAXT; Bouchard-Côté, Alexandre (2019-03-12). "clonealign: statistical integration of independent single-cell RNA and DNA sequencing data from human cancers". Genome Biology. 20 (1): 54. doi:10.1186/s13059-019-1645-z. ISSN 1474-760X. PMC 6417140. PMID 30866997.
  39. Liu, Jie; Huang, Yuanhao; Singh, Ritambhara; Vert, Jean-Philippe; Noble, William Stafford (2019). "Jointly Embedding Multiple Single-Cell Omics Measurements". Leibniz International Proceedings in Informatics (LIPIcs). Vol. 143. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPICS.WABI.2019.10. ISBN 978-3-95977-123-8. ISSN 1868-8969. PMC 8496402. PMID 34632462.
  40. Argelaguet, Ricard; Arnol, Damien; Bredikhin, Danila; Deloro, Yonatan; Velten, Britta; Marioni, John C.; Stegle, Oliver (2020-05-11). "MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data". Genome Biology. 21 (1): 111. doi:10.1186/s13059-020-02015-1. ISSN 1474-760X. PMC 7212577. PMID 32393329.
  41. Singh, Rohit; Hie, Brian L.; Narayan, Ashwin; Berger, Bonnie (2021-05-03). "Schema: metric learning enables interpretable synthesis of heterogeneous single-cell modalities". Genome Biology. 22 (1): 131. doi:10.1186/s13059-021-02313-2. ISSN 1474-760X. PMC 8091541. PMID 33941239.
  42. Zuo, Chunman; Chen, Luonan (2020-11-17). "Deep-joint-learning analysis model of single cell transcriptome and open chromatin accessibility data". Briefings in Bioinformatics. 22 (4): bbaa287. doi:10.1093/bib/bbaa287. ISSN 1467-5463. PMC 8293818. PMID 33200787.
  43. Hao, Yuhan; Hao, Stephanie; Andersen-Nissen, Erica; Mauck, William M.; Zheng, Shiwei; Butler, Andrew; Lee, Maddie J.; Wilk, Aaron J.; Darby, Charlotte; Zager, Michael; Hoffman, Paul; Stoeckius, Marlon; Papalexi, Efthymia; Mimitou, Eleni P.; Jain, Jaison (June 2021). "Integrated analysis of multimodal single-cell data". Cell. 184 (13): 3573–3587.e29. doi:10.1016/j.cell.2021.04.048. ISSN 0092-8674. PMC 8238499. PMID 34062119.
  44. Hao, Yuhan; Stuart, Tim; Kowalski, Madeline H.; Choudhary, Saket; Hoffman, Paul; Hartman, Austin; Srivastava, Avi; Molla, Gesmira; Madad, Shaista; Fernandez-Granda, Carlos; Satija, Rahul (February 2024). "Dictionary learning for integrative, multimodal and scalable single-cell analysis". Nature Biotechnology. 42 (2): 293–304. doi:10.1038/s41587-023-01767-y. ISSN 1546-1696. PMC 10928517. PMID 37231261.
  45. Gayoso, Adam; Steier, Zoë; Lopez, Romain; Regier, Jeffrey; Nazor, Kristopher L.; Streets, Aaron; Yosef, Nir (March 2021). "Joint probabilistic modeling of single-cell multi-omic data with totalVI". Nature Methods. 18 (3): 272–282. doi:10.1038/s41592-020-01050-x. ISSN 1548-7105. PMC 7954949. PMID 33589839.
  46. Cao, Kai; Bai, Xiangqi; Hong, Yiguang; Wan, Lin (2020-07-01). "Unsupervised topological alignment for single-cell multi-omics integration". Bioinformatics. 36 (Supplement_1): i48–i56. doi:10.1093/bioinformatics/btaa443. ISSN 1367-4803. PMC 7355262. PMID 32657382.
  47. Janssen, Philipp; Kliesmete, Zane; Vieth, Beate; Adiconis, Xian; Simmons, Sean; Marshall, Jamie; McCabe, Cristin; Heyn, Holger; Levin, Joshua Z.; Enard, Wolfgang; Hellmann, Ines (2023-06-19). "The effect of background noise and its removal on the analysis of single-cell expression data". Genome Biology. 24 (1): 140. doi:10.1186/s13059-023-02978-x. ISSN 1474-760X. PMC 10278251. PMID 37337297.
  48. Argelaguet, Ricard; Cuomo, Anna S. E.; Stegle, Oliver; Marioni, John C. (October 2021). "Computational principles and challenges in single-cell data integration". Nature Biotechnology. 39 (10): 1202–1215. doi:10.1038/s41587-021-00895-7. ISSN 1546-1696. PMID 33941931. S2CID 233722751.
  49. Kim, Daniel; Tran, Andy; Kim, Hani Jieun; Lin, Yingxin; Yang, Jean Yee Hwa; Yang, Pengyi (2023). "Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data". npj Syst Biol Appl. 9 (1): 51. doi:10.1038/s41540-023-00312-6. PMC 10587078. PMID 37857632.
  50. Bravo González-Blas, Carmen; De Winter, Seppe; Hulselmans, Gert; Hecker, Nikolai; Matetovici, Irina; Christiaens, Valerie; Poovathingal, Suresh; Wouters, Jasper; Aibar, Sara; Aerts, Stein (2023). "SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks". Nat Methods. 20 (9): 1355–1367. doi:10.1038/s41592-023-01938-4. PMC 10482700. PMID 37443338.
  51. Fleck, Jonas Simon; Jansen, Sophie Martina Johanna; Whollny, Damian; Zenk, Fides; Seimiya, Makiko; Jain, Akanksha; Okamoto, Ryoko; Santel, Malgorzata; He, Zhisong; Camp, J. Gray; Treutlein, Barbara (2023). "Inferring and perturbing cell fate regulomes in human brain organoids". Nature. 621 (7978): 365–372. Bibcode:2023Natur.621..365F. doi:10.1038/s41586-022-05279-8. PMC 10499607. PMID 36198796.
  52. Stoeckius, Marlon; Hafemeister, Christoph; Stephenson, William; Houck-Loomis, Brian; Chattopadhyay, Pratip K; Swerdlow, Harold; Satija, Rahul; Smibert, Peter (2017). "Simultaneous epitope and transcriptome measurement in single cells". Nat Methods. 14 (9): 865–868. doi:10.1038/nmeth.4380. PMC 5669064. PMID 28759029.
  53. Atta, Lyla; Fan, Jean (2021). "Computational challenges and opportunities in spatially resolved transcriptomic data analysis". Nat Commun. 12 (1): 5283. Bibcode:2021NatCo..12.5283A. doi:10.1038/s41467-021-25557-9. PMC 8421472. PMID 34489425.
  54. Andersson, Alma; Bergenstråhle, Joseph; Asp, Michaela; Jurek, Aleksandra; Fernández Navarro, José; Lundeberg, Joakim (2020). "Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography". Commun Biol. 3 (1): 565. doi:10.1038/s42003-020-01247-y. PMC 7547664. PMID 33037292.
  55. Ma, Ying; Zhou, Xiang (2022). "Spatially informed cell-type deconvolution for spatial transcriptomics". Nat Biotechnol. 40 (9): 1349–1359. doi:10.1038/s41587-022-01273-7. PMC 9464662. PMID 35501392.