The metabolome refers to the complete set of small-molecule chemicals found within a biological sample. [1] The biological sample can be a cell, a cellular organelle, an organ, a tissue, a tissue extract, a biofluid or an entire organism. The small molecule chemicals found in a given metabolome may include both endogenous metabolites that are naturally produced by an organism (such as amino acids, organic acids, nucleic acids, fatty acids, amines, sugars, vitamins, co-factors, pigments, antibiotics, etc.) as well as exogenous chemicals (such as drugs, environmental contaminants, food additives, toxins and other xenobiotics) that are not naturally produced by an organism. [2] [3]
In other words, there is both an endogenous metabolome and an exogenous metabolome. The endogenous metabolome can be further subdivided to include a "primary" and a "secondary" metabolome (particularly when referring to plant or microbial metabolomes). A primary metabolite is directly involved in normal growth, development, and reproduction. A secondary metabolite is not directly involved in those processes, but usually has an important ecological function. Secondary metabolites may include pigments, antibiotics or waste products derived from partially metabolized xenobiotics. The study of the metabolome is called metabolomics.
The word metabolome appears to be a blending of the words "metabolite" and "chromosome". It was constructed to imply that metabolites are indirectly encoded by genes or act on genes and gene products. The term "metabolome" was first used in 1998 [1] [4] and was likely coined to match with existing biological terms referring to the complete set of genes (the genome), the complete set of proteins (the proteome) and the complete set of transcripts (the transcriptome). The first book on metabolomics was published in 2003. [5] The first journal dedicated to metabolomics (titled simply "Metabolomics") was launched in 2005 and is currently edited by Prof. Roy Goodacre. Some of the more significant early papers on metabolome analysis are listed in the references below. [6] [7] [8] [9]
The metabolome reflects the interaction between an organism's genome and its environment. As a result, an organism's metabolome can serve as an excellent probe of its phenotype (i.e. the product of its genotype and its environment). Metabolites can be measured (identified, quantified or classified) using a number of different technologies including NMR spectroscopy and mass spectrometry. [10] Most mass spectrometry (MS) methods must be coupled to various forms of liquid chromatography (LC), gas chromatography (GC) or capillary electrophoresis (CE) to facilitate compound separation. Each method is typically able to identify or characterize 50-5,000 different metabolites or metabolite "features" at a time, depending on the instrument or protocol being used. Currently it is not possible to analyze the entire range of metabolites by a single analytical method.
Nuclear magnetic resonance (NMR) spectroscopy is an analytical chemistry technique that measures the absorption of radiofrequency radiation by specific nuclei when molecules containing those nuclei are placed in strong magnetic fields. The frequency (i.e. the chemical shift) at which a given atom or nucleus absorbs is highly dependent on the chemical environment (bonding, chemical structure, nearest neighbours, solvent) of that atom in a given molecule. The NMR absorption patterns produce "resonance" peaks at different frequencies or different chemical shifts; this collection of peaks is called an NMR spectrum. Because each chemical compound has a different chemical structure, each compound will have a unique (or almost unique) NMR spectrum. As a result, NMR is particularly useful for the characterization, identification and quantification of small molecules, such as metabolites. The widespread use of NMR for "classical" metabolic studies, along with its exceptional capacity to handle complex metabolite mixtures, is likely the reason why NMR was one of the first technologies to be widely adopted for routine metabolome measurements. As an analytical technique, NMR is non-destructive, non-biased, easily quantifiable, requires little or no separation, permits the identification of novel compounds and needs no chemical derivatization. NMR is particularly amenable to detecting compounds that are less tractable to LC-MS analysis, such as sugars, amines or volatile liquids, or to GC-MS analysis, such as large molecules (>500 Da) or relatively non-reactive compounds. NMR is not a very sensitive technique, with a lower limit of detection of about 5 μM. Typically 50-150 compounds can be identified by NMR-based metabolomic studies.
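The relationship between absorption frequency and chemical shift can be illustrated with a short calculation. The following is a minimal sketch, assuming the conventional definition of chemical shift in parts per million (the frequency offset from a reference signal divided by the spectrometer operating frequency); the function name and the example numbers are illustrative and do not come from any particular instrument or software package.

```python
def chemical_shift_ppm(nu_sample_hz, nu_reference_hz, spectrometer_mhz):
    """Return the chemical shift in parts per million (ppm).

    A frequency offset in Hz divided by the operating frequency in MHz
    is numerically equal to the offset in ppm, so no extra factor of
    1e6 is needed here.
    """
    return (nu_sample_hz - nu_reference_hz) / spectrometer_mhz


# Illustrative values: a peak 1800 Hz downfield of the reference signal
# on a 600 MHz spectrometer, and the same nucleus observed 1200 Hz
# downfield on a 400 MHz spectrometer.
shift_600 = chemical_shift_ppm(1800.0, 0.0, 600.0)
shift_400 = chemical_shift_ppm(1200.0, 0.0, 400.0)
print(shift_600, shift_400)
```

Both calls describe the same chemical shift, which is why spectra are reported in ppm rather than in Hz: the value depends on the chemical environment of the nucleus and not on the field strength of the instrument.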
Mass spectrometry is an analytical technique that measures the mass-to-charge ratio of molecules. Molecules or molecular fragments are typically charged or ionized by spraying them through a charged field (electrospray ionization), bombarding them with electrons from a hot filament (electron ionization) or blasting them with a laser when they are placed on specially coated plates (matrix-assisted laser desorption ionization). The charged molecules are then propelled through space using electrodes or magnets and their speed, rate of curvature, or other physical characteristics are measured to determine their mass-to-charge ratio. From these data the mass of the parent molecule can be determined. Further fragmentation of the molecule through controlled collisions with gas molecules or with electrons can help determine the structure of molecules. Very accurate mass measurements can also be used to determine the elemental formulas or elemental composition of compounds. Most forms of mass spectrometry require some form of separation using liquid chromatography or gas chromatography. This separation step is required to simplify the resulting mass spectra and to permit more accurate compound identification. Some mass spectrometry methods also require that the molecules be derivatized or chemically modified so that they are more amenable to chromatographic separation (this is particularly true for GC-MS). As an analytical technique, MS is a very sensitive method that requires very little sample (<1 ng of material or <10 μL of a biofluid) and can generate signals for thousands of metabolites from a single sample. MS instruments can also be configured for very high throughput metabolome analyses (hundreds to thousands of samples a day). Quantification of metabolites and the characterization of novel compound structures are more difficult by MS than by NMR.
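The link between an elemental formula, a neutral mass and a measured mass-to-charge ratio can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: it handles only simple formulas built from C, H, N, O, P and S, it considers only protonated ions of the form [M+zH]z+, and the helper names (monoisotopic_mass, mz_protonated, ppm_error) are hypothetical rather than taken from any mass spectrometry package.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376200,
    "S": 31.97207117,
}
PROTON_MASS = 1.00727647  # Da


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses for a simple formula such as 'C6H12O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def mz_protonated(neutral_mass, charge=1):
    """Mass-to-charge ratio of an ion formed by adding `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def ppm_error(measured_mz, theoretical_mz):
    """Relative difference between measured and theoretical m/z, in ppm."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


glucose = monoisotopic_mass("C6H12O6")  # about 180.0634 Da
glucose_mh = mz_protonated(glucose)     # about 181.0707 for [M+H]+
print(round(glucose, 4), round(glucose_mh, 4))
```

A small ppm_error between a measured peak and the theoretical value for a candidate formula is what allows accurate-mass instruments to narrow down the elemental composition of a compound.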
LC-MS is particularly amenable to detecting hydrophobic molecules (lipids, fatty acids) and peptides while GC-MS is best for detecting small molecules (<500 Da) and highly volatile compounds (esters, amines, ketones, alkanes, thiols).
Unlike the genome or even the proteome, the metabolome is a highly dynamic entity that can change dramatically over a period of just seconds or minutes. As a result, there is growing interest in measuring metabolites over multiple time periods or over short time intervals using modified versions of NMR or MS-based metabolomics.
Because an organism's metabolome is largely defined by its genome, different species will have different metabolomes. Indeed, the fact that the metabolome of a tomato is different from the metabolome of an apple is the reason why these two fruits taste so different. Furthermore, different tissues, different organs and biofluids associated with those organs and tissues can also have distinctly different metabolomes. The fact that different organisms and different tissues/biofluids have such different metabolomes has led to the development of a number of organism-specific and biofluid-specific metabolome databases. Some of the better known metabolome databases include the Human Metabolome Database or HMDB, [11] the Yeast Metabolome Database or YMDB, [12] the E. coli Metabolome Database or ECMDB, [13] the Arabidopsis metabolome database or AraCyc [14] as well as the Urine Metabolome Database, [15] the Cerebrospinal Fluid (CSF) Metabolome Database [16] and the Serum Metabolome Database. [17] The latter three databases are specific to human biofluids. A number of very popular general metabolite databases also exist including KEGG, [18] MetaboLights, [19] the Golm Metabolome Database, [20] MetaCyc, [21] LipidMaps [22] and Metlin. [23] Metabolome databases can be distinguished from metabolite databases in that metabolite databases contain lightly annotated or synoptic metabolite data from multiple organisms while metabolome databases contain richly detailed and heavily referenced chemical, pathway, spectral and metabolite concentration data for specific organisms.
The Human Metabolome Database (HMDB) is a freely available, open-access database containing detailed data on more than 40,000 metabolites that have already been identified or are likely to be found in the human body. The HMDB contains three kinds of information: chemical information, clinical information and biochemical information.
The chemical data includes >40,000 metabolite structures with detailed descriptions, extensive chemical classifications, synthesis information and observed/calculated chemical properties. It also contains nearly 10,000 experimentally measured NMR, GC-MS and LC/MS spectra from more than 1,100 different metabolites. The clinical information includes data on >10,000 metabolite-biofluid concentrations, metabolite concentration information on more than 600 different human diseases and pathway data for more than 200 different inborn errors of metabolism. The biochemical information includes nearly 6,000 protein (and DNA) sequences and more than 5,000 biochemical reactions that are linked to these metabolite entries. The HMDB supports a wide variety of online queries including text searches, chemical structure searches, sequence similarity searches and spectral similarity searches. This makes it particularly useful for metabolomic researchers who are attempting to identify or understand metabolites in clinical metabolomic studies. The first version of the HMDB was released on January 1, 2007 and was compiled by scientists at the University of Alberta and the University of Calgary. At that time, they reported data on 2,500 metabolites, 1,200 drugs and 3,500 food components. Since then these scientists have greatly expanded the collection. Version 3.5 of the HMDB contains >16,000 endogenous metabolites, >1,500 drugs and >22,000 food constituents or food metabolites. [24]
Scientists at the University of Alberta have been systematically characterizing specific biofluid metabolomes including the serum metabolome, [17] the urine metabolome, [15] the cerebrospinal fluid (CSF) metabolome [16] and the saliva metabolome. These efforts have involved both experimental metabolomic analysis (involving NMR, GC-MS, ICP-MS, LC-MS and HPLC assays) as well as extensive literature mining. According to their data, the human serum metabolome contains at least 4,200 different compounds (including many lipids), the human urine metabolome contains at least 3,000 different compounds (including hundreds of volatiles and gut microbial metabolites), the human CSF metabolome contains nearly 500 different compounds while the human saliva metabolome contains approximately 400 different metabolites, including many bacterial products.
The Yeast Metabolome Database is a freely accessible, online database of >2,000 small molecule metabolites found in or produced by Saccharomyces cerevisiae (Baker's yeast). The YMDB contains two kinds of information: chemical information and biochemical information.
The chemical information in YMDB includes 2,027 metabolite structures with detailed metabolite descriptions, extensive chemical classifications, synthesis information and observed/calculated chemical properties. It also contains nearly 4,000 NMR, GC-MS and LC/MS spectra obtained from more than 500 different metabolites. The biochemical information in YMDB includes >1,100 protein (and DNA) sequences and >900 biochemical reactions. The YMDB supports a wide variety of queries including text searches, chemical structure searches, sequence similarity searches and spectral similarity searches. This makes it particularly useful for metabolomic researchers who are studying yeast as a model organism or who are looking into optimizing the production of fermented beverages (wine, beer).
Secondary electrospray ionization-high resolution mass spectrometry (SESI-HRMS) is a non-invasive analytical technique that allows yeast metabolic activity to be monitored. SESI-HRMS has detected around 300 metabolites in the yeast fermentation process, which suggests that a large number of glucose metabolites are not reported in the literature. [25]
The E. coli Metabolome Database is a freely accessible, online database of >2,700 small molecule metabolites found in or produced by Escherichia coli (E. coli strain K12, MG1655). The ECMDB contains two kinds of information: chemical information and biochemical information.
The chemical information includes more than 2,700 metabolite structures with detailed metabolite descriptions, extensive chemical classifications, synthesis information and observed/calculated chemical properties. It also contains nearly 5,000 NMR, GC-MS and LC-MS spectra from more than 600 different metabolites. The biochemical information includes >1,600 protein (and DNA) sequences and >3,100 biochemical reactions that are linked to these metabolite entries. The ECMDB supports many different types of online queries including text searches, chemical structure searches, sequence similarity searches and spectral similarity searches. This makes it particularly useful for metabolomic researchers who are studying E. coli as a model organism.
Secondary electrospray ionization mass spectrometry (SESI-MS) can discriminate between eleven E. coli strains by profiling their volatile organic compounds. [26]
In 2021, the first metabolome atlas of the mouse brain, which was also the first such atlas of any animal (a mammal) across different life stages, was released online. The data are differentiated by brain region, and the metabolic changes could be "mapped to existing gene and protein brain atlases". [27] [28]
Human intestinal microbiota contribute to the etiology of colorectal cancer via their metabolome. [29] In particular, the conversion of primary bile acids to secondary bile acids as a consequence of bacterial metabolism in the colon promotes carcinogenesis. [29]
Metabolomics is the scientific study of chemical processes involving metabolites, the small molecule substrates, intermediates, and products of cell metabolism. Specifically, metabolomics is the "systematic study of the unique chemical fingerprints that specific cellular processes leave behind", the study of their small-molecule metabolite profiles. The metabolome represents the complete set of metabolites in a biological cell, tissue, organ, or organism, which are the end products of cellular processes. Messenger RNA (mRNA), gene expression data, and proteomic analyses reveal the set of gene products being produced in the cell, data that represents one aspect of cellular function. Conversely, metabolic profiling can give an instantaneous snapshot of the physiology of that cell, and thus, metabolomics provides a direct "functional readout of the physiological state" of an organism. There are indeed quantifiable correlations between the metabolome and the other cellular ensembles, which can be used to predict metabolite abundances in biological samples from, for example mRNA abundances. One of the ultimate challenges of systems biology is to integrate metabolomics with all other -omics information to provide a better understanding of cellular biology.
BRENDA is an information system representing one of the most comprehensive enzyme repositories. It is an electronic resource that comprises molecular and biochemical information on enzymes that have been classified by the IUBMB. Every classified enzyme is characterized with respect to its catalyzed biochemical reaction. Kinetic properties of the corresponding reactants are described in detail. BRENDA contains enzyme-specific data manually extracted from primary scientific literature and additional data derived from automatic information retrieval methods such as text mining. It provides a web-based user interface that allows a convenient and sophisticated access to the data.
The DrugBank database is a comprehensive, freely accessible, online database containing information on drugs and drug targets created and maintained by the University of Alberta and The Metabolomics Innovation Centre located in Alberta, Canada. As both a bioinformatics and a cheminformatics resource, DrugBank combines detailed drug data with comprehensive drug target information. DrugBank has used content from Wikipedia; Wikipedia also often links to Drugbank, posing potential circular reporting issues.
KEGG is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. KEGG is utilized for bioinformatics research and education, including data analysis in genomics, metagenomics, metabolomics and other omics studies, modeling and simulation in systems biology, and translational research in drug development.
The Human Metabolome Database (HMDB) is a comprehensive, high-quality, freely accessible, online database of small molecule metabolites found in the human body. It has been created by the Human Metabolome Project funded by Genome Canada and is one of the first dedicated metabolomics databases. The HMDB facilitates human metabolomics research, including the identification and characterization of human metabolites using NMR spectroscopy, GC-MS spectrometry and LC/MS spectrometry. To aid in this discovery process, the HMDB contains three kinds of data: 1) chemical data, 2) clinical data, and 3) molecular biology/biochemistry data (Fig. 1–3). The chemical data includes 41,514 metabolite structures with detailed descriptions along with nearly 10,000 NMR, GC-MS and LC/MS spectra.
The Toxin and Toxin-Target Database (T3DB), also known as the Toxic Exposome Database, is a freely accessible online database of common substances that are toxic to humans, along with their protein, DNA or organ targets. The database currently houses nearly 3,700 toxic compounds or poisons described by nearly 42,000 synonyms. This list includes various groups of toxins, including common pollutants, pesticides, drugs, food toxins, household and industrial/workplace toxins, cigarette toxins, and uremic toxins. These toxic substances are linked to 2,086 corresponding protein/DNA target records. In total there are 42,433 toxic substance-toxin target associations. Each toxic compound record (ToxCard) in T3DB contains nearly 100 data fields and holds information such as chemical properties and descriptors, mechanisms of action, toxicity or lethal dose values, molecular and cellular interactions, medical information, NMR and MS spectra, and up- and down-regulated genes. This information has been extracted from over 18,000 sources, which include other databases, government documents, books, and scientific literature.
The Small Molecule Pathway Database (SMPDB) is a comprehensive, high-quality, freely accessible, online database containing more than 600 small molecule (i.e. metabolic) pathways found in humans. SMPDB is designed specifically to support pathway elucidation and pathway discovery in metabolomics, transcriptomics, proteomics and systems biology. It is able to do so, in part, by providing colorful, detailed, fully searchable, hyperlinked diagrams of five types of small molecule pathways: 1) general human metabolic pathways; 2) human metabolic disease pathways; 3) human metabolite signaling pathways; 4) drug-action pathways and 5) drug metabolism pathways. SMPDB pathways may be navigated, viewed and zoomed interactively using a Google Maps-like interface. All SMPDB pathways include information on the relevant organs, subcellular compartments, protein cofactors, protein locations, metabolite locations, chemical structures and protein quaternary structures (Fig. 1). Each small molecule in SMPDB is hyperlinked to detailed descriptions contained in the HMDB or DrugBank and each protein or enzyme complex is hyperlinked to UniProt. Additionally, all SMPDB pathways are accompanied with detailed descriptions and references, providing an overview of the pathway, condition or processes depicted in each diagram. Users can browse the SMPDB (Fig. 2) or search its contents by text searching (Fig. 3), sequence searching, or chemical structure searching. More powerful queries are also possible including searching with lists of gene or protein names, drug names, metabolite names, GenBank IDs, Swiss-Prot IDs, Agilent or Affymetrix microarray IDs. These queries will produce lists of matching pathways and highlight the matching molecules on each of the pathway diagrams. Gene, metabolite and protein concentration data can also be visualized through SMPDB's mapping interface.
MetaboAnalyst is a set of online tools for metabolomic data analysis and interpretation, created by members of the Wishart Research Group at the University of Alberta. It was first released in May 2009 and version 2.0 was released in January 2012. MetaboAnalyst provides a variety of analysis methods that have been tailored for metabolomic data. These methods include metabolomic data processing, normalization, multivariate statistical analysis, and data annotation. The current version is focused on biomarker discovery and classification.
Pharmacometabolomics, also known as pharmacometabonomics, is a field which stems from metabolomics, the quantification and analysis of metabolites produced by the body. It refers to the direct measurement of metabolites in an individual's bodily fluids, in order to predict or evaluate the metabolism of pharmaceutical compounds, and to better understand the pharmacokinetic profile of a drug. Alternatively, pharmacometabolomics can be applied to measure metabolite levels following the administration of a pharmaceutical compound, in order to monitor the effects of the compound on certain metabolic pathways (pharmacodynamics). This provides detailed mapping of drug effects on metabolism and of the pathways that are implicated in the mechanism of variation in response to treatment. In addition, the metabolic profile of an individual at baseline (metabotype) provides information about how individuals respond to treatment and highlights heterogeneity within a disease state. All three approaches require the quantification of metabolites found in bodily fluids and tissue, such as blood or urine, and can be used in the assessment of pharmaceutical treatment options for numerous disease states.
The Yeast Metabolome Database (YMDB) is a comprehensive, high-quality, freely accessible, online database of small molecule metabolites found in or produced by Saccharomyces cerevisiae. The YMDB was designed to facilitate yeast metabolomics research, specifically in the areas of general fermentation as well as wine, beer and fermented food analysis. YMDB supports the identification and characterization of yeast metabolites using NMR spectroscopy, GC-MS spectrometry and Liquid chromatography–mass spectrometry. The YMDB contains two kinds of data: 1) chemical data and 2) molecular biology/biochemistry data. The chemical data includes 2027 metabolite structures with detailed metabolite descriptions along with nearly 4000 NMR, GC-MS and LC/MS spectra.
Metabolite Set Enrichment Analysis (MSEA) is a method designed to help metabolomics researchers identify and interpret patterns of metabolite concentration changes in a biologically meaningful way. It is conceptually similar to another widely used tool developed for transcriptomics called Gene Set Enrichment Analysis or GSEA. GSEA uses a collection of predefined gene sets to rank the lists of genes obtained from gene chip studies. By using this “prior knowledge” about gene sets researchers are able to readily identify significant and coordinated changes in gene expression data while at the same time gaining some biological context. MSEA does the same thing by using a collection of predefined metabolite pathways and disease states obtained from the Human Metabolome Database. MSEA is offered as a service both through a stand-alone web server and as part of a larger metabolomics analysis suite called MetaboAnalyst.
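The idea of testing a list of metabolites against predefined metabolite sets can be illustrated with over-representation analysis based on the hypergeometric distribution. This is one common and simple form of enrichment analysis; the sketch below is not claimed to be the algorithm that MSEA itself implements, and the function names, metabolite names and set names are hypothetical examples only.

```python
from math import comb


def enrichment_p_value(universe_size, set_size, hits, overlap):
    """Probability of seeing at least `overlap` set members among `hits`
    metabolites drawn without replacement from the universe
    (upper tail of the hypergeometric distribution)."""
    upper = min(set_size, hits)
    total = comb(universe_size, hits)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def over_representation(hit_metabolites, metabolite_sets, universe):
    """Return {set name: p-value} for each predefined metabolite set."""
    universe = set(universe)
    hits = set(hit_metabolites) & universe
    results = {}
    for name, members in metabolite_sets.items():
        members = set(members) & universe
        overlap = len(hits & members)
        results[name] = enrichment_p_value(
            len(universe), len(members), len(hits), overlap
        )
    return results


# Hypothetical toy data: eight measurable metabolites, two predefined sets.
universe = ["glucose", "pyruvate", "lactate", "citrate",
            "alanine", "glycine", "serine", "urea"]
sets = {
    "glycolysis (toy)": ["glucose", "pyruvate", "lactate"],
    "amino acids (toy)": ["alanine", "glycine", "serine"],
}
changed = ["glucose", "pyruvate", "lactate"]
print(over_representation(changed, sets, universe))
```

A small p-value for a set indicates that more of its members appear in the list of changed metabolites than would be expected by chance; in practice the p-values would also need correction for testing many sets at once.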
Metabolomic Pathway Analysis, shortened to MetPA, is a freely available, user-friendly web server to assist with the identification, analysis and visualization of metabolic pathways using metabolomic data. MetPA makes use of advances originally developed for pathway analysis in microarray experiments and applies those principles and concepts to the analysis of metabolic pathways. For input, MetPA expects either a list of compound names or a metabolite concentration table with phenotypic labels. The list of compounds can include common names, HMDB IDs or KEGG IDs with one compound per row. Compound concentration tables must have samples in rows and compounds in columns. MetPA's output is a series of tables indicating which pathways are significantly enriched as well as a variety of graphs or pathway maps illustrating where and how certain pathways were enriched. MetPA's graphical output uses a colorful Google Maps-like visualization system that allows simple, intuitive data exploration: users can employ a computer mouse or track pad to select, drag and place images and to seamlessly zoom in and out. Users can explore MetPA's output using three different views or levels: 1) a metabolome view; 2) a pathway view; 3) a compound view.
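The concentration-table layout described above (samples in rows, compounds in columns, with a phenotypic label for each sample) can be sketched as a small CSV file. This is only an illustration of that general layout: the column headers "Sample" and "Label", the position of the label column, the sample names and the concentration values are all assumptions made for the example, not a specification of what MetPA requires.

```python
import csv
import io


def build_concentration_table(samples):
    """Build CSV text with one row per sample and one column per compound.

    `samples` is a list of (sample_id, phenotype_label, {compound: value})
    tuples. Compounds missing from a sample are left as empty cells.
    """
    compounds = sorted({name for _, _, conc in samples for name in conc})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Sample", "Label"] + compounds)
    for sample_id, label, conc in samples:
        writer.writerow([sample_id, label] + [conc.get(c, "") for c in compounds])
    return buffer.getvalue()


# Hypothetical measurements for two control and two treated samples.
table = build_concentration_table([
    ("S1", "control", {"Glucose": 5.1, "Lactate": 1.2}),
    ("S2", "control", {"Glucose": 4.8, "Lactate": 1.0}),
    ("S3", "treated", {"Glucose": 7.9, "Lactate": 2.4}),
    ("S4", "treated", {"Glucose": 8.3}),
])
print(table)
```

Keeping the phenotypic label in its own column next to the sample identifier makes it straightforward to separate the grouping information from the numeric concentration matrix when the table is read back in.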
FooDB is a freely available, open-access database containing chemical composition data on common, unprocessed foods. It also contains extensive data on flavour and aroma constituents, food additives as well as positive and negative health effects associated with food constituents. The database contains information on more than 28,000 chemicals found in more than 1000 raw or unprocessed food products. The data in FooDB was collected from many sources including textbooks, scientific journals, on-line food composition or nutrient databases, flavour and aroma databases and various on-line metabolomic databases. This literature-derived information has been combined with experimentally derived data measured on thousands of compounds from more than 40 very common food products through the Alberta Food Metabolome Project which is led by David S. Wishart. Users are able to browse through the FooDB data by food source, name, descriptors or function. Chemical structures and molecular weights for compounds in FooDB may be searched via a specialized chemical structure search utility. Users are able to view the content of FooDB using two different “Viewing” options: FoodView, which lists foods by their chemical compounds, or ChemView, which lists chemicals by their food sources. Knowledge about the precise chemical composition of foods can be used to guide public health policies, assist food companies with improved food labelling, help dieticians prepare better dietary plans, support nutraceutical companies with their submissions of health claims and guide consumer choices with regard to food purchases.
The CyberCell Database (CCDB) is a freely available, web-accessible database that provides quantitative genomic, proteomic as well as metabolomic data on Escherichia coli. Escherichia coli is perhaps the best-studied bacterium on the planet and has been the organism of choice for several international efforts in cell simulation. These cell simulation efforts require up-to-date web-accessible resources that provide comprehensive, non-redundant, and quantitative data on this bacterium. The intent of CCDB is to facilitate the collection, revision, coordination and storage of the key information required for in silico E. coli simulation.
The E. coli Metabolome Database (ECMDB) is a freely accessible, online database of small molecule metabolites found in or produced by Escherichia coli. Escherichia coli is perhaps the best studied bacterium on earth and has served as the "model microbe" in microbiology research for more than 60 years. The ECMDB is essentially an E. coli "omics" encyclopedia containing detailed data on the genome, proteome and metabolome of E. coli. ECMDB is part of a suite of organism-specific metabolomics databases that includes DrugBank, HMDB, YMDB and SMPDB. As a metabolomics resource, the ECMDB is designed to facilitate research in the area of gut/microbiome metabolomics and environmental metabolomics. The ECMDB contains two kinds of data: 1) chemical data and 2) molecular biology and/or biochemical data. The chemical data includes more than 2700 metabolite structures with detailed metabolite descriptions along with nearly 5000 NMR, GC-MS and LC-MS spectra corresponding to these metabolites. The biochemical data includes nearly 1600 protein sequences and more than 3100 biochemical reactions that are linked to these metabolite entries. Each metabolite entry in the ECMDB contains more than 80 data fields with approximately 65% of the information being devoted to chemical data and the other 35% of the information devoted to enzymatic or biochemical data. Many data fields are hyperlinked to other databases. The ECMDB also has a variety of structure and pathway viewing applets. The ECMDB database offers a number of text, sequence, spectral, chemical structure and relational query searches. These are described in more detail below.
MetaboLights is a data repository founded in 2012 for cross-species and cross-platform metabolomic studies that provides primary research data and metadata for metabolomic studies as well as a knowledge base for properties of individual metabolites. The database is maintained by the European Bioinformatics Institute (EMBL-EBI) and the development is funded by the Biotechnology and Biological Sciences Research Council (BBSRC). As of July 2018, the MetaboLights browse functionality covers 383 studies and two analytical platforms: NMR spectroscopy and mass spectrometry.
Exometabolomics, also known as 'metabolic footprinting', is the study of extracellular metabolites and is a sub-field of metabolomics.
NAIL-MS is a technique based on mass spectrometry used for the investigation of nucleic acids and their modifications. It enables a variety of experiment designs to study the underlying mechanisms of RNA biology in vivo. For example, the dynamic behaviour of nucleic acids in living cells, especially of RNA modifications, can be followed in more detail.
David S. Wishart is a Canadian researcher and a Distinguished University Professor in the Department of Biological Sciences and the Department of Computing Science at the University of Alberta. Wishart also holds cross appointments in the Faculty of Pharmacy and Pharmaceutical Sciences and the Department of Laboratory Medicine and Pathology in the Faculty of Medicine and Dentistry. Additionally, Wishart holds a joint appointment in metabolomics at the Pacific Northwest National Laboratory in Richland, Washington. Wishart is well known for his pioneering contributions to the fields of protein NMR spectroscopy, bioinformatics, cheminformatics and metabolomics. In 2011, Wishart founded the Metabolomics Innovation Centre (TMIC), which is Canada's national metabolomics laboratory.