Multiomics, multi-omics, integrative omics, "panomics" or "pan-omics" is a biological analysis approach in which the data sets are multiple "omes", such as the genome, proteome, transcriptome, epigenome, metabolome, and microbiome (i.e., a meta-genome and/or meta-transcriptome, depending upon how it is sequenced); [1] [2] [3] in other words, the use of multiple omics technologies to study life in a concerted way. By combining these "omes", scientists can analyze complex biological big data to find novel associations between biological entities, pinpoint relevant biomarkers and build elaborate markers of disease and physiology. In doing so, multiomics integrates diverse omics data to find a coherently matching geno-pheno-envirotype relationship or association. [4] The OmicTools service lists more than 99 software tools related to multiomic data analysis, as well as more than 99 databases on the topic.
Systems biology approaches are often based upon the use of panomic analysis data. [5] [6] The American Society of Clinical Oncology (ASCO) defines panomics as referring to "the interaction of all biological functions within a cell and with other body functions, combining data collected by targeted tests ... and global assays (such as genome sequencing) with other patient-specific information." [7]
A branch of the field of multiomics is the analysis of multilevel single-cell data, called single-cell multiomics. [8] [9] This approach provides unprecedented resolution for studying multilevel transitions in health and disease at the single-cell level. An advantage over bulk analysis is that it mitigates confounding factors derived from cell-to-cell variation, allowing heterogeneous tissue architectures to be uncovered. [8]
Methods for parallel single-cell genomic and transcriptomic analysis can be based on simultaneous amplification [10] or physical separation of RNA and genomic DNA. [11] They allow insights that cannot be gathered solely from transcriptomic analysis, as RNA data do not cover non-coding genomic regions or carry information on copy-number variation, for example. An extension of this methodology is the integration of single-cell transcriptomes with single-cell methylomes, combining single-cell bisulfite sequencing [12] [13] with single-cell RNA-Seq. [14] Other techniques to query the epigenome, such as single-cell ATAC-Seq [15] and single-cell Hi-C, [16] also exist.
A different, but related, challenge is the integration of proteomic and transcriptomic data. [17] [18] One approach to performing such measurements is to physically split single-cell lysates in two, processing one half for RNA and the other half for proteins. [17] The protein content of lysates can be measured by proximity extension assays (PEA), for example, which use DNA-barcoded antibodies. [19] A different approach uses a combination of heavy-metal RNA probes and protein antibodies to adapt mass cytometry for multiomic analysis. [18]
Related to single-cell multiomics is the field of spatial omics, which assays tissues through omics readouts that preserve the relative spatial orientation of the cells in the tissue. The number of published spatial omics methods still lags behind the number of published single-cell multiomics methods, but it is catching up.
In parallel to the advances in high-throughput biology, machine learning applications to biomedical data analysis are flourishing. The integration of multi-omics data analysis and machine learning has led to the discovery of new biomarkers. [20] [21] [22] For example, one of the methods of the mixOmics project is based on sparse Partial Least Squares regression for the selection of features (putative biomarkers). [23] A unified and flexible statistical framework for heterogeneous data integration called "Regularized Generalized Canonical Correlation Analysis" (RGCCA [24] [25] [26] [27] ) enables identifying such putative biomarkers. This framework is implemented and made freely available within the RGCCA R package.
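The feature-selection idea behind sparse Partial Least Squares can be illustrated with a minimal single-component sketch. It is not the mixOmics implementation: it assumes a single numeric response, soft-thresholds the covariance between each standardized feature and the response, and treats features with non-zero weights as putative biomarkers. The function name `sparse_pls_weights`, the `frac` threshold parameter, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def sparse_pls_weights(X, y, frac=0.5):
    """First sparse PLS-style weight vector for a single response.

    X: samples x features matrix, y: one response value per sample.
    frac: threshold as a fraction of the largest absolute covariance
    (an illustrative choice, not a parameter of any real package).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Standardize features; constant features keep a scale of 1.
    X_centered = X - X.mean(axis=0)
    scale = X_centered.std(axis=0)
    scale[scale == 0] = 1.0
    X_std = X_centered / scale
    y_centered = y - y.mean()

    # Covariance-like score of every feature with the response.
    cov = X_std.T @ y_centered

    # Soft-thresholding shrinks weak features exactly to zero.
    threshold = frac * np.max(np.abs(cov))
    weights = np.sign(cov) * np.maximum(np.abs(cov) - threshold, 0.0)

    norm = np.linalg.norm(weights)
    if norm == 0:
        raise ValueError("no feature covaries with the response")
    return weights / norm


# Toy data: only the first feature tracks the response.
X_toy = np.array([
    [1.0, 1.0, 2.0],
    [2.0, -1.0, 1.0],
    [3.0, 1.0, 2.0],
    [4.0, -1.0, 1.0],
    [5.0, 1.0, 2.0],
    [6.0, -1.0, 1.0],
])
y_toy = 2.0 * X_toy[:, 0]

weights = sparse_pls_weights(X_toy, y_toy, frac=0.5)
selected = np.flatnonzero(weights).tolist()
print("selected feature indices:", selected)
```

A larger `frac` yields a sparser weight vector; real analyses typically tune the sparsity level by cross-validation and extract several components.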
Multiomics currently holds promise to fill gaps in the understanding of human health and disease, and many researchers are working on ways to generate and analyze disease-related data. [28] The applications range from understanding host-pathogen interactions and infectious diseases, [29] [30] cancer, [31] to better understanding chronic and complex non-communicable diseases [32] and improving personalized medicine. [33]
The second phase of the $170 million Human Microbiome Project focused on integrating patient data with different omic datasets, considering host genetics, clinical information and microbiome composition. [34] [35] Phase one focused on the characterization of communities in different body sites. Phase two focused on integrating multiomic data from host and microbiome in the context of human diseases. Specifically, the project used multiomics to improve the understanding of the interplay of gut and nasal microbiomes with type 2 diabetes, [36] gut microbiomes and inflammatory bowel disease [37] and vaginal microbiomes and pre-term birth. [38]
The complexity of interactions in the human immune system has prompted the generation of a wealth of immunology-related multi-scale omic data. [39] Multi-omic data analysis has been employed to gather novel insights about the immune response to infectious diseases, such as pediatric chikungunya, [40] as well as noncommunicable autoimmune diseases. [41] Integrative omics has also been used extensively to understand the effectiveness and side effects of vaccines, a field called systems vaccinology. [42] For example, multiomics was essential to uncover the association of changes in plasma metabolites and the immune system transcriptome with the response to vaccination against herpes zoster. [43]
The Bioconductor project curates a variety of R packages aimed at integrating omic data:
The RGCCA package implements a versatile framework for data integration. This package is freely available on the Comprehensive R Archive Network (CRAN).
The OmicTools [49] database further highlights R packages and other tools for multi-omic data analysis:
A major limitation of classical omic studies is the isolation of only one level of biological complexity. For example, transcriptomic studies may provide information at the transcript level, but many different entities contribute to the biological state of the sample (genomic variants, post-translational modifications, metabolic products, interacting organisms, among others). With the advent of high-throughput biology, it is becoming increasingly affordable to make multiple measurements, allowing transdomain (e.g. RNA and protein levels) correlations and inferences. These correlations aid the construction of more complete biological networks, filling gaps in our knowledge.
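A transdomain correlation of the kind described above can be sketched as follows. This is a minimal illustration, assuming matched RNA and protein measurements taken on the same samples; the gene names, the values, and the helper `rna_protein_correlations` are invented for the example and do not come from any real dataset or package.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def rna_protein_correlations(rna, protein):
    """Correlate RNA and protein levels for every gene present in both layers.

    rna, protein: dicts mapping gene name -> list of per-sample values,
    with samples in the same order in both dicts.
    """
    shared = sorted(set(rna) & set(protein))
    return {gene: pearson(rna[gene], protein[gene]) for gene in shared}


# Hypothetical measurements for four matched samples.
rna_levels = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 5.0, 2.0, 8.0],  # no protein measurement available
}
protein_levels = {
    "GENE_A": [10.0, 20.0, 30.0, 40.0],
    "GENE_B": [1.0, 2.0, 3.0, 4.0],
}

correlations = rna_protein_correlations(rna_levels, protein_levels)
for gene, r in correlations.items():
    print(gene, round(r, 3))
```

Genes measured in only one layer are skipped here; in practice, handling such missing measurements is part of what makes integration difficult.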
Integration of data, however, is not an easy task. To facilitate the process, groups have curated databases and pipelines to systematically explore multiomic data:
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The process of analyzing and interpreting data is sometimes referred to as computational biology; however, the distinction between the two terms is often disputed. To some, the term computational biology refers to building and using models of biological systems.
Systems biology is the computational and mathematical analysis and modeling of complex biological systems. It is a biology-based interdisciplinary field of study that focuses on complex interactions within biological systems, using a holistic approach to biological research.
The branches of science known informally as omics are various disciplines in biology whose names end in the suffix -omics, such as genomics, proteomics, metabolomics, metagenomics, phenomics and transcriptomics. Omics aims at the collective characterization and quantification of pools of biological molecules that translate into the structure, function, and dynamics of an organism or organisms.
The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The term transcriptome is a portmanteau of the words transcript and genome; it is associated with the process of transcript production during the biological process of transcription.
Fluxomics describes the various approaches that seek to determine the rates of metabolic reactions within a biological entity. While metabolomics can provide instantaneous information on the metabolites in a biological sample, metabolism is a dynamic process. The significance of fluxomics is that metabolic fluxes determine the cellular phenotype. It has the added advantage of being based on the metabolome which has fewer components than the genome or proteome.
The Human Microbiome Project (HMP) was a United States National Institutes of Health (NIH) research initiative to improve understanding of the microbiota involved in human health and disease. Launched in 2007, the first phase (HMP1) focused on identifying and characterizing human microbiota. The second phase, known as the Integrative Human Microbiome Project (iHMP), launched in 2014 with the aim of generating resources to characterize the microbiome and elucidating the roles of microbes in health and disease states. The program received $170 million in funding from the NIH Common Fund from 2007 to 2016.
RNA-Seq is a technique that uses next-generation sequencing to reveal the presence and quantity of RNA molecules in a biological sample, providing a snapshot of gene expression in the sample, also known as the transcriptome.
Cancer systems biology encompasses the application of systems biology approaches to cancer research, in order to study the disease as a complex adaptive system with emerging properties at multiple biological scales. Cancer systems biology represents the application of systems biology approaches to the analysis of how the intracellular networks of normal cells are perturbed during carcinogenesis to develop effective predictive models that can assist scientists and clinicians in the validations of new therapies and drugs. Tumours are characterized by genomic and epigenetic instability that alters the functions of many different molecules and networks in a single cell as well as altering the interactions with the local environment. Cancer systems biology approaches, therefore, are based on the use of computational and mathematical methods to decipher the complexity in tumorigenesis as well as cancer heterogeneity.
Single-cell sequencing examines the nucleic acid sequence information from individual cells with optimized next-generation sequencing technologies, providing a higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment. For example, in cancer, sequencing the DNA of individual cells can give information about mutations carried by small populations of cells. In development, sequencing the RNAs expressed by individual cells can give insight into the existence and behavior of different cell types. In microbial systems, a population of the same species can appear genetically clonal. Still, single-cell sequencing of RNA or epigenetic modifications can reveal cell-to-cell variability that may help populations rapidly adapt to survive in changing environments.
Metatranscriptomics is the set of techniques used to study gene expression of microbes within natural environments, i.e., the metatranscriptome.
Pathway is the term from molecular biology for a curated schematic representation of a well-characterized segment of the molecular physiological machinery, such as a metabolic pathway describing an enzymatic process within a cell or tissue or a signaling pathway model representing a regulatory process that might, in its turn, enable a metabolic or another regulatory process downstream. A typical pathway model starts with an extracellular signaling molecule that activates a specific receptor, thus triggering a chain of molecular interactions. A pathway is most often represented as a relatively small graph with gene, protein, and/or small molecule nodes connected by edges of known functional relations. While a simpler pathway might appear as a chain, complex pathway topologies with loops and alternative routes are much more common. Computational analyses employ special formats of pathway representation. In the simplest form, however, a pathway might be represented as a list of member molecules with order and relations unspecified. Such a representation, generally called a Functional Gene Set (FGS), can also refer to other functionally characterised groups such as protein families, Gene Ontology (GO) and Disease Ontology (DO) terms, etc. In bioinformatics, methods of pathway analysis might be used to identify key genes or proteins within a previously known pathway in relation to a particular experiment or pathological condition, or to build a pathway de novo from proteins that have been identified as key affected elements. By examining changes in, e.g., gene expression in a pathway, its biological activity can be explored. Most frequently, however, pathway analysis refers to a method of initial characterization and interpretation of an experimental condition that was studied with omics tools or a genome-wide association study. Such studies might identify long lists of altered genes.
A visual inspection is then challenging and the information is hard to summarize, since the altered genes map to a broad range of pathways, processes, and molecular functions. In such situations, the most productive way of exploring the list is to identify enrichment of specific FGSs in it. The general approach of enrichment analyses is to identify FGSs, members of which were most frequently or most strongly altered in the given condition, in comparison to a gene set sampled by chance. In other words, enrichment can map canonical prior knowledge structured in the form of FGSs to the condition represented by altered genes.
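The comparison against "a gene set sampled by chance" is commonly formalized as an over-representation test. The following is a minimal sketch using a one-sided hypergeometric test, which is one common choice rather than the only one; the function names and the toy gene identifiers are hypothetical, and multiple-testing correction across many FGSs is deliberately left out.

```python
from math import comb


def hypergeom_tail(n_background, n_set, n_altered, n_overlap):
    """P(overlap >= n_overlap) when n_altered genes are drawn at random
    from n_background genes, n_set of which belong to the gene set."""
    max_overlap = min(n_set, n_altered)
    total = comb(n_background, n_altered)
    tail = sum(
        comb(n_set, k) * comb(n_background - n_set, n_altered - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def enrichment_p_value(gene_set, altered_genes, background_genes):
    """Over-representation p-value of one functional gene set (FGS)."""
    background = set(background_genes)
    members = set(gene_set) & background      # ignore genes never measured
    altered = set(altered_genes) & background
    overlap = len(members & altered)
    return hypergeom_tail(len(background), len(members), len(altered), overlap)


# Toy example: 20 measured genes, 5 altered, and a 5-gene FGS sharing 4 of them.
background = {f"g{i}" for i in range(20)}
altered = {"g0", "g1", "g2", "g3", "g4"}
fgs = {"g0", "g1", "g2", "g3", "g10"}

p_value = enrichment_p_value(fgs, altered, background)
print("enrichment p-value:", p_value)
```

Restricting both lists to the measured background matters: using the whole genome as background when only part of it was assayed tends to overstate enrichment.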
Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.
Cellular deconvolution refers to computational techniques aiming at estimating the proportions of different cell types in samples collected from a tissue. For example, samples collected from the human brain are a mixture of various neuronal and glial cell types in different proportions, where each cell type has a diverse gene expression profile. Since most high-throughput technologies use bulk samples and measure the aggregated levels of molecular information for all cells in a sample, the measured values would be an aggregate of the values pertaining to the expression landscape of different cell types. Therefore, many downstream analyses such as differential gene expression might be confounded by the variations in cell type proportions when using the output of high-throughput technologies applied to bulk samples. The development of statistical methods to identify cell type proportions in large-scale bulk samples is an important step for better understanding of the relationship between cell type composition and diseases.
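One simple way to frame deconvolution is as a constrained regression: the bulk profile is modeled as a non-negative mixture of cell-type reference profiles. The sketch below assumes a known signature matrix (genes by cell types) and solves the non-negative least-squares problem with projected gradient descent, then rescales the solution to sum to one. The function name, the signature values, and the mixing proportions are invented for illustration; published deconvolution tools use more elaborate models.

```python
import numpy as np


def estimate_proportions(signature, bulk, n_iter=2000):
    """Estimate cell-type proportions from one bulk expression profile.

    signature: genes x cell-types reference matrix (assumed known).
    bulk: expression of the same genes in the bulk sample.
    Minimizes ||signature @ p - bulk||^2 subject to p >= 0 by projected
    gradient descent, then normalizes p to sum to 1.
    """
    S = np.asarray(signature, dtype=float)
    b = np.asarray(bulk, dtype=float)

    # Step size 1/L, with L the squared largest singular value of S.
    step = 1.0 / np.linalg.norm(S, 2) ** 2
    p = np.full(S.shape[1], 1.0 / S.shape[1])

    for _ in range(n_iter):
        gradient = S.T @ (S @ p - b)
        p = np.maximum(p - step * gradient, 0.0)  # project onto p >= 0

    total = p.sum()
    if total == 0:
        raise ValueError("all estimated proportions are zero")
    return p / total


# Hypothetical signature: 5 marker genes x 3 cell types.
signature = np.array([
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 1.0],
    [1.0, 1.0, 9.0],
    [5.0, 5.0, 1.0],
    [2.0, 1.0, 6.0],
])
true_proportions = np.array([0.5, 0.3, 0.2])
bulk_profile = signature @ true_proportions  # noise-free simulated bulk sample

estimated = estimate_proportions(signature, bulk_profile)
print("estimated proportions:", np.round(estimated, 3))
```

With noise-free simulated data the mixture is recovered almost exactly; with real data the estimates depend strongly on how well the reference profiles match the cell types actually present.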
The Tohoku Medical Megabank Project is a national project in Japan, which started in 2012. The mission of the Tohoku Medical Megabank (TMM) project is to carry out a long-term health survey in the Miyagi and Iwate prefectures, which were affected by the Great East Japan Earthquake, and provide the research infrastructure for the development of personalized medicine by establishing a biobank and conducting cohort studies.
Deterministic Barcoding in Tissue for Spatial Omics Sequencing (DBiT-seq) was developed at Yale University by Rong Fan and colleagues in 2020 to create a multi-omics approach for studying spatial gene expression heterogeneity within a tissue sample. This method can be used for co-mapping mRNA and protein levels at near single-cell resolution in fresh or frozen formaldehyde-fixed tissue samples. DBiT-seq utilizes next-generation sequencing (NGS) and microfluidics. This method allows for simultaneous spatial transcriptomic and proteomic analysis of a tissue sample. DBiT-seq improves upon previous spatial transcriptomics applications such as High-Definition Spatial Transcriptomics (HDST) and Slide-seq by increasing the number of detectable genes per pixel, increasing cellular resolution, and easing implementation.
Single-cell genome and epigenome by transposases sequencing (scGET-seq) is a DNA sequencing method for profiling open and closed chromatin. In contrast to single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq), which only targets active euchromatin, scGET-seq is also capable of probing inactive heterochromatin.
Precision diagnostics is a branch of precision medicine that involves managing a patient's healthcare model and diagnosing specific diseases based on omics data analytics.
Single-cell multi-omics integration describes a suite of computational methods used to harmonize information from multiple "omes" to jointly analyze biological phenomena. This approach allows researchers to discover intricate relationships between different chemical-physical modalities by drawing associations across various molecular layers simultaneously. Multi-omics integration approaches can be grouped into broad categories that include early integration, intermediate integration, and late integration methods. Multi-omics integration can enhance experimental robustness by providing independent sources of evidence to address hypotheses, leveraging modality-specific strengths to compensate for another's weaknesses through imputation, and offering cell-type clustering and visualizations that are more aligned with reality.
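The difference between early and late integration can be sketched in a few lines, under the common reading that early integration combines the features of all modalities before analysis, while late integration analyzes each modality separately and combines the results. The function names, the toy matrices, and the per-modality class scores below are hypothetical; in practice the scores would come from models fitted to each modality.

```python
import numpy as np


def zscore(matrix):
    """Standardize every feature (column); constant features become zeros."""
    matrix = np.asarray(matrix, dtype=float)
    centered = matrix - matrix.mean(axis=0)
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0
    return centered / scale


def early_integration(modalities):
    """Concatenate standardized feature matrices (cells x features) so that
    a single downstream model sees all modalities at once."""
    return np.hstack([zscore(m) for m in modalities])


def late_integration(per_modality_scores):
    """Average class scores (cells x classes) produced separately for each
    modality, then pick the best class per cell."""
    mean_scores = np.mean([np.asarray(s, dtype=float) for s in per_modality_scores], axis=0)
    return mean_scores, mean_scores.argmax(axis=1)


# Two cells measured in two hypothetical modalities.
rna = np.array([[1.0, 2.0], [3.0, 4.0]])
atac = np.array([[0.0, 1.0, 5.0], [2.0, 1.0, 7.0]])
joint = early_integration([rna, atac])
print("joint matrix shape:", joint.shape)

# Hypothetical class scores from two modality-specific models.
rna_scores = np.array([[0.8, 0.2], [0.4, 0.6]])
atac_scores = np.array([[0.6, 0.4], [0.2, 0.8]])
mean_scores, labels = late_integration([rna_scores, atac_scores])
print("consensus labels:", labels.tolist())
```

Standardizing before concatenation keeps a modality with many or large-valued features from dominating the joint matrix; intermediate integration methods, which learn a shared representation, are not covered by this sketch.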