Metabolic network modelling, also known as metabolic network reconstruction or metabolic pathway analysis, allows for an in-depth insight into the molecular mechanisms of a particular organism. In particular, these models correlate the genome with molecular physiology. [1] A reconstruction breaks down metabolic pathways (such as glycolysis and the citric acid cycle) into their respective reactions and enzymes, and analyzes them within the perspective of the entire network. In simplified terms, a reconstruction collects all of the relevant metabolic information of an organism and compiles it in a mathematical model. Validation and analysis of reconstructions can allow identification of key features of metabolism such as growth yield, resource distribution, network robustness, and gene essentiality. This knowledge can then be applied to create novel biotechnology.
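In general, the process to build a reconstruction is as follows: (1) draft a reconstruction; (2) refine the model; (3) convert the model into a mathematical or computational representation; and (4) evaluate and debug the model through experimentation.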
The related method of flux balance analysis seeks to mathematically simulate metabolism in genome-scale reconstructions of metabolic networks.
A metabolic reconstruction provides a highly mathematical, structured platform on which to understand the systems biology of metabolic pathways within an organism. [2] The integration of biochemical metabolic pathways with rapidly available, annotated genome sequences has developed what are called genome-scale metabolic models. Simply put, these models correlate metabolic genes with metabolic pathways. In general, the more information about physiology, biochemistry and genetics is available for the target organism, the better the predictive capacity of the reconstructed models. Mechanically speaking, the process of reconstructing prokaryotic and eukaryotic metabolic networks is essentially the same. Having said this, eukaryote reconstructions are typically more challenging because of the size of genomes, coverage of knowledge, and the multitude of cellular compartments. [2] The first genome-scale metabolic model was generated in 1995 for Haemophilus influenzae. [3] The first multicellular organism, C. elegans, was reconstructed in 1998. [4] Since then, many reconstructions have been formed. For a list of reconstructions that have been converted into a model and experimentally validated, see http://sbrg.ucsd.edu/InSilicoOrganisms/OtherOrganisms.
Organism | Genes in Genome | Genes in Model | Reactions | Metabolites | Date of reconstruction | Reference |
---|---|---|---|---|---|---|
Haemophilus influenzae | 1,775 | 296 | 488 | 343 | June 1999 | [3] |
Escherichia coli | 4,405 | 660 | 627 | 438 | May 2000 | [5] |
Saccharomyces cerevisiae | 6,183 | 708 | 1,175 | 584 | February 2003 | [6] |
Mus musculus | 28,287 | 473 | 1,220 | 872 | January 2005 | [7] |
Homo sapiens | 21,090 [8] | 3,623 | 3,673 | -- | January 2007 | [9] |
Mycobacterium tuberculosis | 4,402 | 661 | 939 | 828 | June 2007 | [10] |
Bacillus subtilis | 4,114 | 844 | 1,020 | 988 | September 2007 | [11] |
Synechocystis sp. PCC6803 | 3,221 | 633 | 831 | 704 | October 2008 | [12] |
Salmonella typhimurium | 4,489 | 1,083 | 1,087 | 774 | April 2009 | [13] |
Arabidopsis thaliana | 27,379 | 1,419 | 1,567 | 1,748 | February 2010 | [14] |
Because the field is so young, most reconstructions to date have been built manually. Given the time and effort this requires, a number of resources now support semi-automatic assembly. An initial draft can be generated automatically using resources such as PathoLogic or ERGO in combination with encyclopedias such as MetaCyc, and then curated manually using resources such as PathwayTools. These semi-automatic methods produce a draft quickly while still allowing the fine-tuning required as new experimental data become available. Only in this manner can the field of metabolic reconstruction keep pace with the ever-increasing number of annotated genomes.
Database | Enzymes | Genes | Reactions | Pathways | Metabolites |
---|---|---|---|---|---|
KEGG | X | X | X | X | X |
BioCyc | X | X | X | X | X |
MetaCyc | X | X | X | X | |
ENZYME | X | X | X | ||
BRENDA | X | X | X | ||
BiGG | X | X | X |
A reconstruction is built by compiling data from the resources above. Database tools such as KEGG and BioCyc can be used in conjunction with each other to find all the metabolic genes in the organism of interest. These genes are compared with those of closely related organisms for which reconstructions have already been developed, in order to find homologous genes and reactions. The homologous genes and reactions are carried over from the known reconstructions to form the draft reconstruction of the organism of interest. Tools such as ERGO, Pathway Tools and Model SEED can compile data into pathways to form a network of metabolic and non-metabolic pathways. These networks are then verified and refined before being made into a mathematical simulation. [2]
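The carry-over step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene identifiers, reaction identifiers, and homology hits are hypothetical, and a real pipeline would obtain them from sequence comparison and a curated reference reconstruction.

```python
# Hypothetical reference reconstruction: reference gene -> reactions it enables.
reference_gene_reactions = {
    "refA": ["HEX1"],
    "refB": ["PGI"],
    "refC": ["PFK"],
}

# Hypothetical homology hits: gene in the target organism -> reference gene.
homologs = {
    "tgt001": "refA",
    "tgt002": "refC",
    "tgt003": "refX",  # homolog that has no reaction in the reference model
}


def draft_reconstruction(homologs, reference_gene_reactions):
    """Carry reactions over from the reference for every homologous gene."""
    draft = {}
    for target_gene, ref_gene in homologs.items():
        reactions = reference_gene_reactions.get(ref_gene, [])
        if reactions:
            draft[target_gene] = list(reactions)
    return draft


draft = draft_reconstruction(homologs, reference_gene_reactions)

# Reference reactions with no homolog in the target: candidates for manual review.
all_reference = {r for rs in reference_gene_reactions.values() for r in rs}
carried_over = {r for rs in draft.values() for r in rs}
missing = sorted(all_reference - carried_over)

print(draft)
print(missing)
```

Reactions that are not carried over are not necessarily absent from the target organism; they are simply the entries that the verification and refinement stage would need to examine.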
The predictive aspect of a metabolic reconstruction hinges on the ability to predict the biochemical reaction catalyzed by a protein using that protein's amino acid sequence as an input, and to infer the structure of a metabolic network based on the predicted set of reactions. A network of enzymes and metabolites is drafted to relate sequences and function. When an uncharacterized protein is found in the genome, its amino acid sequence is first compared to those of previously characterized proteins to search for homology. When a homologous protein is found, the proteins are considered to have a common ancestor and their functions are inferred as being similar. However, the quality of a reconstruction model is dependent on its ability to accurately infer phenotype directly from sequence, so this rough estimation of protein function will not be sufficient. A number of algorithms and bioinformatics resources have been developed for refinement of sequence homology-based assignments of protein functions:
Once the proteins have been identified, more information about enzyme structure, the reactions catalyzed, substrates and products, mechanisms, and more can be acquired from databases such as KEGG, MetaCyc and NC-IUBMB. Accurate metabolic reconstructions also require information about the reversibility and preferred physiological direction of each enzyme-catalyzed reaction, which can come from databases such as BRENDA or MetaCyc. [24]
An initial metabolic reconstruction of a genome is typically far from perfect due to the high variability and diversity of microorganisms. Often, metabolic pathway databases such as KEGG and MetaCyc will have "holes", meaning that there is a conversion from a substrate to a product (i.e., an enzymatic activity) for which no protein encoded in the genome is known to catalyze the reaction. Semi-automatically drafted reconstructions can also contain falsely predicted pathways that do not actually occur in the predicted manner. [24] For this reason, the reconstruction is systematically verified to ensure that no inconsistencies are present and that all entries are correct. [1] Furthermore, the literature can be searched to support information obtained from the metabolic reaction and genome databases. This provides an added level of assurance that the enzyme and the reaction it catalyzes actually occur in the organism.
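As a minimal sketch of how such "holes" might be flagged in a draft, the following Python snippet lists the reactions that have no associated gene. The reaction table and gene names are hypothetical and serve only to illustrate the check.

```python
# Hypothetical draft: reaction id -> (substrates, products, associated genes).
draft_reactions = {
    "R1": (["A"], ["B"], ["g1"]),
    "R2": (["B"], ["C"], []),       # no known gene: a "hole" in the pathway
    "R3": (["C"], ["D"], ["g3"]),
}


def find_holes(reactions):
    """Return ids of reactions for which no encoding gene is known."""
    return sorted(rid for rid, (_, _, genes) in reactions.items() if not genes)


holes = find_holes(draft_reactions)
print(holes)
```

Each flagged reaction is a candidate for literature review: either a gene encoding the activity has been missed, or the reaction does not occur in the organism.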
Enzyme promiscuity and spontaneous chemical reactions can damage metabolites. This metabolite damage, and its repair or pre-emption, create energy costs that need to be incorporated into models. It is likely that many genes of unknown function encode proteins that repair or pre-empt metabolite damage, but most genome-scale metabolic reconstructions only include a fraction of all genes. [25] [26]
Any new reaction not present in the databases needs to be added to the reconstruction. This is an iterative process that cycles between the experimental phase and the coding phase. As new information is found about the target organism, the model will be adjusted to predict the metabolic and phenotypical output of the cell. The presence or absence of certain reactions of the metabolism will affect the amount of reactants/products that are present for other reactions within the particular pathway. This is because products in one reaction go on to become the reactants for another reaction, i.e. products of one reaction can combine with other proteins or compounds to form new proteins/compounds in the presence of different enzymes or catalysts. [1]
Francke et al. [1] provide an excellent example as to why the verification step of the project needs to be performed in significant detail. During a metabolic network reconstruction of Lactobacillus plantarum , the model showed that succinyl-CoA was one of the reactants for a reaction that was a part of the biosynthesis of methionine. However, an understanding of the physiology of the organism would have revealed that due to an incomplete tricarboxylic acid pathway, Lactobacillus plantarum does not actually produce succinyl-CoA, and the correct reactant for that part of the reaction was acetyl-CoA.
Therefore, systematic verification of the initial reconstruction brings to light inconsistencies that would otherwise adversely affect the final interpretation of the reconstruction, whose purpose is an accurate understanding of the molecular mechanisms of the organism. Furthermore, the simulation step also ensures that all the reactions present in the reconstruction are properly balanced. In short, an accurate reconstruction provides greater insight into the functioning of the organism of interest. [1]
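One way to check that reactions are properly balanced is to sum the elemental composition of each reaction's participants. The sketch below assumes neutral molecular formulas and ignores charge; the two example reactions are chosen only for illustration and do not represent complete biochemical steps.

```python
from collections import Counter

# Neutral molecular formulas (charge and protonation state are ignored here).
formulas = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "lactate": {"C": 3, "H": 6, "O": 3},
    "pyruvate": {"C": 3, "H": 4, "O": 3},
}


def element_imbalance(stoich, formulas):
    """Net count of each element across a reaction.

    stoich maps metabolite -> coefficient (negative = consumed,
    positive = produced). An empty result means the reaction is balanced.
    """
    total = Counter()
    for metabolite, coeff in stoich.items():
        for element, count in formulas[metabolite].items():
            total[element] += coeff * count
    return {element: n for element, n in total.items() if n != 0}


balanced = element_imbalance({"glucose": -1, "lactate": 2}, formulas)
unbalanced = element_imbalance({"glucose": -1, "pyruvate": 2}, formulas)
print(balanced)
print(unbalanced)
```

The second reaction is reported as lacking four hydrogen atoms on the product side, which in a real reconstruction would point to missing cofactors or protons in the reaction definition.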
A metabolic network can be broken down into a stoichiometric matrix in which the rows represent the compounds of the reactions and the columns correspond to the reactions themselves. Stoichiometry is the quantitative relationship between the substrates and products of a chemical reaction. In order to deduce what the metabolic network suggests, recent research has centered on a few approaches, such as extreme pathways, elementary mode analysis, [27] flux balance analysis, and a number of other constraint-based modeling methods. [28] [29]
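To make the matrix layout concrete, the following sketch builds a stoichiometric matrix for a hypothetical three-reaction network and checks whether a flux vector leaves every metabolite at steady state. The network, the reaction names, and the helper `net_production` are assumptions introduced for illustration.

```python
# Hypothetical toy network: reaction id -> {metabolite: coefficient}.
# Negative coefficients are consumed, positive coefficients are produced.
reactions = {
    "R1": {"A": 1},            # uptake:     -> A
    "R2": {"A": -1, "B": 2},   # conversion: A -> 2 B
    "R3": {"B": -1},           # secretion:  B ->
}

metabolites = sorted({m for stoich in reactions.values() for m in stoich})
reaction_ids = list(reactions)

# Rows are metabolites, columns are reactions.
S = [[reactions[r].get(m, 0) for r in reaction_ids] for m in metabolites]


def net_production(S, v):
    """Multiply S by a flux vector v: net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


print(S)
print(net_production(S, [1, 1, 2]))  # all zeros: a steady-state flux vector
print(net_production(S, [1, 1, 1]))  # B accumulates: not at steady state
```

A flux vector for which every entry of the product is zero satisfies the mass-balance condition on which the constraint-based methods listed above rely.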
Price, Reed, and Papin, [30] from the Palsson lab, used singular value decomposition (SVD) of extreme pathways to understand the regulation of human red blood cell metabolism. Extreme pathways are convex basis vectors that consist of steady state functions of a metabolic network. [31] For any particular metabolic network, there is always a unique set of extreme pathways available. [27] The authors also define a constraint-based approach in which constraints such as mass balance and maximum reaction rates delimit a "solution space" that contains all feasible flux distributions. Then, using a kinetic model approach, a single solution that falls within the extreme pathway solution space can be determined. [30] Their study therefore combines constraint-based and kinetic approaches to understand human red blood cell metabolism. In this way, extreme pathways allow the regulatory mechanisms of a metabolic network to be studied in further detail.
Elementary mode analysis closely matches the approach used by extreme pathways. Similar to extreme pathways, there is always a unique set of elementary modes available for a particular metabolic network. [27] These are the smallest sub-networks that allow a metabolic reconstruction network to function in steady state. [32] [33] [34] According to Stelling (2002), [33] elementary modes can be used to understand cellular objectives for the overall metabolic network. Furthermore, elementary mode analysis takes into account stoichiometry and thermodynamics when evaluating whether a particular metabolic route or network is feasible and likely for a set of proteins/enzymes. [32]
In 2009, Larhlimi and Bockmayr presented a new approach called "minimal metabolic behaviors" (MMBs) for the analysis of metabolic networks. [35] Like elementary modes or extreme pathways, these are uniquely determined by the network and yield a complete description of the flux cone. However, the new description is much more compact. In contrast with elementary modes and extreme pathways, which use an inner description based on generating vectors of the flux cone, MMBs use an outer description of the flux cone. This approach is based on sets of non-negativity constraints. These can be identified with irreversible reactions, and thus have a direct biochemical interpretation. One can characterize a metabolic network by MMBs and the reversible metabolic space.
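The distinction between inner and outer descriptions can be written out as follows. The notation is an assumption introduced here for illustration: S is the stoichiometric matrix with n reactions, Irr is the index set of irreversible reactions, and the e^(k) are generating vectors such as elementary modes.

```latex
% Outer description: the flux cone as the solution set of constraints
C = \{\, v \in \mathbb{R}^{n} \;:\; S v = 0,\ \ v_i \ge 0 \ \text{for all } i \in \mathrm{Irr} \,\}

% Inner description: every flux vector in C as a non-negative
% combination of generating vectors e^{(k)}
v = \sum_{k} \alpha_k \, e^{(k)}, \qquad \alpha_k \ge 0
```

In the outer form, each inequality corresponds to one irreversible reaction, which is why a description based on non-negativity constraints has a direct biochemical reading.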
A different technique to simulate the metabolic network is to perform flux balance analysis. This method uses linear programming, but in contrast to elementary mode analysis and extreme pathways, it yields only a single solution. Linear programming is typically used to find the maximum of a chosen objective function, so flux balance analysis returns a single solution to the optimization problem. [33] In a flux balance analysis approach, exchange fluxes are assigned only to those metabolites that enter or leave the network. Metabolites that are consumed within the network are not assigned an exchange flux. The exchange fluxes, like the enzyme-catalyzed reactions, can be constrained to a range running from a negative to a positive value (e.g., -10 to 10).
Furthermore, this approach can determine whether the reaction stoichiometry is in line with predictions by providing fluxes for the balanced reactions. Flux balance analysis can also highlight the most efficient pathway through the network for achieving a particular objective function. In addition, gene knockout studies can be performed using flux balance analysis: the enzyme that corresponds to the gene being removed is given a constraint value of 0, which effectively removes the reaction that the enzyme catalyzes from the analysis.
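The optimization problem behind flux balance analysis can be stated compactly. The symbols are assumptions introduced for illustration: S is the stoichiometric matrix, v the vector of n reaction fluxes, c the weights of the objective function, and l_j and u_j the lower and upper bounds on flux j (for example -10 and 10 for an exchange flux).

```latex
\begin{aligned}
\max_{v} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S v = 0, \\
& l_j \le v_j \le u_j, \qquad j = 1, \dots, n
\end{aligned}

% Knockout of the gene whose enzyme catalyzes reaction k:
% set l_k = u_k = 0 and solve the same problem again.
```

Comparing the optimal objective value before and after setting the bounds of a reaction to zero gives the predicted effect of the corresponding gene knockout.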
In order to perform a dynamic simulation with such a network, it is necessary to construct an ordinary differential equation system that describes the rates of change in each metabolite's concentration or amount. To this end, a rate law, i.e., a kinetic equation that determines the rate of reaction based on the concentrations of all reactants, is required for each reaction. Software packages that include numerical integrators, such as COPASI or SBMLsimulator, are then able to simulate the system dynamics given an initial condition. Often these rate laws contain kinetic parameters with uncertain values. In many cases it is desirable to estimate these parameter values from given time-series data of metabolite concentrations, so that the system reproduces the data. For this purpose, the distance between the given data set and the result of the simulation, i.e., the numerically or, in a few cases, analytically obtained solution of the differential equation system, is computed. The values of the parameters are then estimated so as to minimize this distance. [36] Going one step further, it may be desirable to estimate the mathematical structure of the differential equation system, because the real rate laws are not known for the reactions within the system under study. To this end, the program SBMLsqueezer allows automatic creation of appropriate rate laws for all reactions within the network. [37]
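The simulate-then-fit loop described above can be sketched for a single hypothetical reaction A -> B with a mass-action rate law. Everything in the snippet is an illustrative assumption: the rate constant, the synthetic "data", the fourth-order Runge-Kutta integrator, and the grid search, which stands in for the optimizers that dedicated packages provide.

```python
import math


def simulate(k, a0, b0, t_end, dt):
    """Integrate dA/dt = -k*A, dB/dt = k*A with classical RK4.

    Returns a list of (t, A, B) tuples, one per time step.
    """
    def deriv(a, b):
        rate = k * a            # mass-action rate law for A -> B
        return -rate, rate

    t, a, b = 0.0, a0, b0
    trajectory = [(t, a, b)]
    for _ in range(round(t_end / dt)):
        k1a, k1b = deriv(a, b)
        k2a, k2b = deriv(a + 0.5 * dt * k1a, b + 0.5 * dt * k1b)
        k3a, k3b = deriv(a + 0.5 * dt * k2a, b + 0.5 * dt * k2b)
        k4a, k4b = deriv(a + dt * k3a, b + dt * k3b)
        a += dt * (k1a + 2 * k2a + 2 * k3a + k4a) / 6
        b += dt * (k1b + 2 * k2b + 2 * k3b + k4b) / 6
        t += dt
        trajectory.append((t, a, b))
    return trajectory


def distance(k, data, a0=10.0, dt=0.1):
    """Sum of squared differences between simulated and observed A."""
    t_end = max(t for t, _ in data)
    trajectory = simulate(k, a0, 0.0, t_end, dt)
    return sum((trajectory[round(t / dt)][1] - a_obs) ** 2 for t, a_obs in data)


# Synthetic time series generated from the analytical solution with k = 0.5.
data = [(t, 10.0 * math.exp(-0.5 * t)) for t in (0.0, 1.0, 2.0, 3.0, 4.0)]

# Crude parameter estimation: pick the candidate that minimizes the distance.
candidates = [i / 100 for i in range(1, 101)]
k_best = min(candidates, key=lambda k: distance(k, data))
print(k_best)
```

For networks with many reactions and parameters, a grid search quickly becomes impractical, which is one reason dedicated simulation and estimation software is used in practice.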
Synthetic accessibility is a simple approach to network simulation whose goal is to predict which metabolic gene knockouts are lethal. The synthetic accessibility approach uses the topology of the metabolic network to calculate the sum of the minimum number of steps needed to traverse the metabolic network graph from the inputs, those metabolites available to the organism from the environment, to the outputs, metabolites needed by the organism to survive. To simulate a gene knockout, the reactions enabled by the gene are removed from the network and the synthetic accessibility metric is recalculated. An increase in the total number of steps is predicted to cause lethality. Wunderlich and Mirny showed this simple, parameter-free approach predicted knockout lethality in E. coli and S. cerevisiae as well as elementary mode analysis and flux balance analysis in a variety of media. [38]
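A simplified version of this idea can be sketched as a breadth-first expansion over the network. The sketch makes two assumptions that are not taken from the source: a reaction can fire only once all of its substrates are available, and a "step" is counted as one round of expansion. The network, gene names, and reaction names are hypothetical.

```python
import math

# Hypothetical network: reaction id -> (substrates, products).
reactions = {
    "R1": (["glc"], ["g6p"]),
    "R2": (["g6p"], ["pyr"]),
    "R3": (["glc"], ["pyr"]),      # shorter alternative route
    "R4": (["pyr"], ["biomass"]),
}
gene_reactions = {"geneA": ["R1"], "geneB": ["R3"], "geneC": ["R4"]}
inputs = {"glc"}
outputs = {"biomass"}


def accessibility(reactions, inputs, outputs):
    """Sum over outputs of the expansion round in which each becomes producible."""
    steps = {m: 0 for m in inputs}
    round_no = 0
    while True:
        round_no += 1
        newly_made = set()
        for substrates, products in reactions.values():
            if all(s in steps for s in substrates):
                newly_made.update(p for p in products if p not in steps)
        if not newly_made:
            break
        for m in newly_made:
            steps[m] = round_no
    # Outputs that are never reached count as infinitely far away.
    return sum(steps.get(m, math.inf) for m in outputs)


def knockout(reactions, gene_reactions, gene):
    """Remove every reaction enabled by the given gene."""
    removed = set(gene_reactions[gene])
    return {rid: rxn for rid, rxn in reactions.items() if rid not in removed}


wild_type = accessibility(reactions, inputs, outputs)
predicted_lethal = {
    gene: accessibility(knockout(reactions, gene_reactions, gene), inputs, outputs)
    > wild_type
    for gene in gene_reactions
}
print(wild_type)
print(predicted_lethal)
```

In this toy network, removing geneA leaves the shorter route intact and is predicted to be tolerated, whereas removing geneB lengthens the route and removing geneC disconnects the output entirely.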
Reconstructions and their corresponding models allow the formulation of hypotheses about the presence of certain enzymatic activities and the production of metabolites that can be experimentally tested, complementing the primarily discovery-based approach of traditional microbial biochemistry with hypothesis-driven research. [41] The results of these experiments can uncover novel pathways and metabolic activities and resolve discrepancies in previous experimental data. Information about the chemical reactions of metabolism and the genetic background of various metabolic properties (sequence to structure to function) can be utilized by genetic engineers to modify organisms to produce high-value outputs, whether those products are medically relevant, such as pharmaceuticals; high-value chemical intermediates, such as terpenoids and isoprenoids; or biotechnological outputs, such as biofuels [42] or polyhydroxybutyrates, also known as bioplastics. [43]
Metabolic network reconstructions and models are used to understand how an organism or parasite functions inside the host cell. For example, if the parasite compromises the immune system by lysing macrophages, then the goal of metabolic reconstruction and simulation would be to determine the metabolites that are essential to the organism's proliferation inside macrophages. If the proliferation cycle is inhibited, the parasite can no longer evade the host's immune system. A reconstruction model serves as a first step toward deciphering the complicated mechanisms surrounding disease. These models can also be used to identify the minimal set of genes necessary for a cell to maintain virulence. The next step would be to use the predictions and postulates generated from a reconstruction model to discover novel biological functions and to guide drug engineering and drug delivery techniques.