PottersWheel

Last updated
PottersWheel
Developer(s) TIKANIS GmbH, Freiburg, Germany
Initial releaseOctober 6, 2006 (2006-10-06)
Stable release
4.1.1 / May 20, 2017;5 years ago (2017-05-20)
Written in MATLAB, C
Operating system Microsoft Windows, Mac OS X, Linux
Size 9  MB (250.000 lines)
Type Mathematical modeling
License Free trial license
Website www.potterswheel.de

PottersWheel is a MATLAB toolbox for mathematical modeling of time-dependent dynamical systems that can be expressed as chemical reaction networks or ordinary differential equations (ODEs). [1] It allows the automatic calibration of model parameters by fitting the model to experimental measurements. CPU-intensive functions are written or – in case of model dependent functions – dynamically generated in C. Modeling can be done interactively using graphical user interfaces or based on MATLAB scripts using the PottersWheel function library. The software is intended to support the work of a mathematical modeler as a real potter's wheel eases the modeling of pottery.

Contents

Seven modeling phases

The basic use of PottersWheel covers seven phases from model creation to the prediction of new experiments.

Model creation

Formalization-of-Jak-Stat-Pathway.png

The dynamical system is formalized into a set of reactions or differential equations using a visual model designer or a text editor. The model is stored as a MATLAB *.m ASCII file. Modifications can therefore be tracked using a version control system like subversion or git. Model import and export is supported for SBML. Custom import-templates may be used to import custom model structures. Rule-based modeling is also supported, where a pattern represents a set of automatically generated reactions.

Example for a simple model definition file for a reaction network A → B → C → A with observed species A and C:

functionm=getModel()% Starting with an empty modelm=pwGetEmtptyModel();% Adding reactionsm=pwAddR(m,'A','B');m=pwAddR(m,'B','C');m=pwAddR(m,'C','A');% Adding observablesm=pwAddY(m,'A');m=pwAddY(m,'C');end

Data import

External data saved in *.xls or *.txt files can be added to a model creating a model-data-couple. A mapping dialog allows to connect data column names to observed species names. Meta information in the data files comprise information about the experimental setting. Measurement errors are either stored in the data files, will be calculated using an error model, or are estimated automatically.

Parameter calibration

To fit a model to one or more data sets, the corresponding model-data-couples are combined into a fitting-assembly. Parameters like initial values, rate constants, and scaling factors can be fitted in an experiment-wise or global fashion. The user may select from several numerical integrators, optimization algorithms, and calibration strategies like fitting in normal or logarithmic parameter space.

Interpretation of the goodness-of-fit

Fitting-data-with-PottersWheel.png

The quality of a fit is characterized by its chi-squared value. As a rule of thumb, for N fitted data points and p calibrated parameters, the chi-squared value should have a similar value as N  p or at least N. Statistically, this is expressed using a chi-squared test resulting in a p-value above a significance threshold of e.g. 0.05. For lower p-values, the model is

Apart from further chi-squared based characteristics like AIC and BIC, data-model-residual analyses exist, e.g. to investigate whether the residuals follow a Gaussian distribution. Finally, parameter confidence intervals may be estimated using either the Fisher information matrix approximation or based on the profile-likelihood function, if parameters are not unambiguously identifiable.

If the fit is not acceptable, the model has to be refined and the procedure continues with step 2. Else, the dynamic model properties can be examined and predictions calculated.

Model refinement

If the model structure is not able to explain the experimental measurements, a set of physiologically reasonable alternative models should be created. In order to avoid redundant model paragraphs and copy-and-paste errors, this can be done using a common core-model which is the same for all variants. Then, daughter-models are created and fitted to the data, preferably using batch processing strategies based on MATLAB scripts. As a starting point to envision suitable model variants, the PottersWheel equalizer may be used to understand the dynamic behavior of the original system.

Model analysis and prediction

A mathematical model may serve to display the concentration time-profile of unobserved species, to determine sensitive parameters representing potential targets within a clinical setting, or to calculate model characteristics like the half-life of a species.

Each analysis step may be stored into a modeling report, which may be exported as a Latex-based PDF.

Experimental design

An experimental setting corresponds to specific characteristics of driving input functions and initial concentrations. In a signal transduction pathway model the concentration of a ligand like EGF may be controlled experimentally. The driving input designer allows investigating the effect of a continuous, ramp, or pulse stimulation in combination with varying initial concentrations using the equalizer. In order to discriminate competing model hypotheses, the designed experiment should have as different observable time-profiles as possible.

Parameter identifiability

Many dynamical systems can only be observed partially, i.e. not all system species are accessible experimentally. For biological applications the amount and quality of experimental data is often limited. In this setting parameters can be structurally or practically non-identifiable. Then, parameters may compensate each other and fitted parameter values strongly depend on initial guesses. In PottersWheel non-identifiability can be detected using the Profile Likelihood Approach. [2] For characterizing functional relationships between the non-identifiable parameters PottersWheel applies random and systematic fit sequences. [3]

Related Research Articles

<span class="mw-page-title-main">Least squares</span> Approximation method in statistics

The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems by minimizing the sum of the squares of the residuals made in the results of each individual equation.

<span class="mw-page-title-main">Overfitting</span> Flaw in machine learning computer model

In mathematical modeling, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably". An overfitted model is a mathematical model that contains more parameters than can be justified by the data. The essence of overfitting is to have unknowingly extracted some of the residual variation as if that variation represented underlying model structure.

<span class="mw-page-title-main">Logistic regression</span> Statistical model for a binary dependent variable

In statistics, the logistic model is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression analysis, logistic regression is estimating the parameters of a logistic model. Formally, in binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable or a continuous variable. The corresponding probability of the value labeled "1" can vary between 0 and 1, hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names. See § Background and § Definition for formal mathematics, and § Example for a worked example.

<span class="mw-page-title-main">Time series</span> Sequence of data points over time

In mathematics, a time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.

In statistics, deviance is a goodness-of-fit statistic for a statistical model; it is often used for statistical hypothesis testing. It is a generalization of the idea of using the sum of squares of residuals (SSR) in ordinary least squares to cases where model-fitting is achieved by maximum likelihood. It plays an important role in exponential dispersion models and generalized linear models.

<span class="mw-page-title-main">Curve fitting</span> Process of constructing a curve that has the best fit to a series of data points

Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. Curve fitting can involve either interpolation, where an exact fit to the data is required, or smoothing, in which a "smooth" function is constructed that approximately fits the data. A related topic is regression analysis, which focuses more on questions of statistical inference such as how much uncertainty is present in a curve that is fit to data observed with random errors. Fitted curves can be used as an aid for data visualization, to infer values of a function where no data are available, and to summarize the relationships among two or more variables. Extrapolation refers to the use of a fitted curve beyond the range of the observed data, and is subject to a degree of uncertainty since it may reflect the method used to construct the curve as much as it reflects the observed data.

<span class="mw-page-title-main">Nonlinear regression</span> Regression analysis

In statistics, nonlinear regression is a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables. The data are fitted by a method of successive approximations.

Random sample consensus (RANSAC) is an iterative method to estimate parameters of a mathematical model from a set of observed data that contains outliers, when outliers are to be accorded no influence on the values of the estimates. Therefore, it also can be interpreted as an outlier detection method. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. The algorithm was first published by Fischler and Bolles at SRI International in 1981. They used RANSAC to solve the Location Determination Problem (LDP), where the goal is to determine the points in the space that project onto an image into a set of landmarks with known locations.

<span class="mw-page-title-main">Structural equation modeling</span> Form of causal modeling that fit networks of constructs to data

Structural equation modeling (SEM) is a label for a diverse set of methods used by scientists in both experimental and observational research across the sciences, business, and other fields. It is used most in the social and behavioral sciences. A definition of SEM is difficult without reference to highly technical language, but a good starting place is the name itself.

In statistics, a generalized additive model (GAM) is a generalized linear model in which the linear response variable depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions.

<span class="mw-page-title-main">Local regression</span> Moving average and polynomial regression method for smoothing data

Local regression or local polynomial regression, also known as moving regression, is a generalization of the moving average and polynomial regression. Its most common methods, initially developed for scatterplot smoothing, are LOESS and LOWESS, both pronounced. They are two strongly related non-parametric regression methods that combine multiple regression models in a k-nearest-neighbor-based meta-model. In some fields, LOESS is known and commonly referred to as Savitzky–Golay filter.

Omnibus tests are a kind of statistical test. They test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall. One example is the F-test in the analysis of variance. There can be legitimate significant effects within a model even if the omnibus test is not significant. For instance, in a model with two independent variables, if only one variable exerts a significant effect on the dependent variable and the other does not, then the omnibus test may be non-significant. This fact does not affect the conclusions that may be drawn from the one significant variable. In order to test effects within an omnibus test, researchers often use contrasts.

In statistics, the restrictedmaximum likelihood (REML) approach is a particular form of maximum likelihood estimation that does not base estimates on a maximum likelihood fit of all the information, but instead uses a likelihood function calculated from a transformed set of data, so that nuisance parameters have no effect.

In statistics, a generalized linear mixed model (GLMM) is an extension to the generalized linear model (GLM) in which the linear predictor contains random effects in addition to the usual fixed effects. They also inherit from GLMs the idea of extending linear mixed models to non-normal data.

In statistics, identifiability is a property which a model must satisfy for precise inference to be possible. A model is identifiable if it is theoretically possible to learn the true values of this model's underlying parameters after obtaining an infinite number of observations from it. Mathematically, this is equivalent to saying that different values of the parameters must generate different probability distributions of the observable variables. Usually the model is identifiable only under certain technical restrictions, in which case the set of these requirements is called the identification conditions.

Linear least squares (LLS) is the least squares approximation of linear functions to data. It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals. Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.

In statistics Wilks' theorem offers an asymptotic distribution of the log-likelihood ratio statistic, which can be used to produce confidence intervals for maximum-likelihood estimates or as a test statistic for performing the likelihood-ratio test.

Probability distribution fitting or simply distribution fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon. The aim of distribution fitting is to predict the probability or to forecast the frequency of occurrence of the magnitude of the phenomenon in a certain interval.

Identifiability analysis is a group of methods found in mathematical statistics that are used to determine how well the parameters of a model are estimated by the quantity and quality of experimental data. Therefore, these methods explore not only identifiability of a model, but also the relation of the model to particular experimental data or, more generally, the data collection process.

References

  1. T. Maiwald and J. Timmer (2008) "Dynamical Modeling and Multi-Experiment Fitting with PottersWheel", Bioinformatics 24(18):2037–2043
  2. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood, A. Raue, C. Kreutz, T. Maiwald, J. Bachmann, M. Schilling, U. Klingmüller and J. Timmer, Bioinformatics 2009
  3. Data-based identifiability analysis of non-linear dynamical models, S. Hengl, C. Kreutz, J. Timmer and T. Maiwald, Bioinformatics 2007 23(19):2612–2618