Identifiability analysis is a group of methods found in mathematical statistics that are used to determine how well the parameters of a model can be estimated from the quantity and quality of experimental data.[1] These methods therefore explore not only the identifiability of a model, but also the relation of the model to particular experimental data or, more generally, to the data collection process.
Assuming that a model is fit to experimental data, the goodness of fit does not reveal how reliable the parameter estimates are, nor is it sufficient to prove that the model was chosen correctly. For example, if the experimental data are noisy or if there are too few data points, the estimated parameter values may vary drastically without significantly affecting the goodness of fit. To address these issues, identifiability analysis can be applied as an important step to ensure a correct choice of model and a sufficient amount of experimental data. Its purpose is either to provide quantified evidence that the model was chosen correctly and that the acquired experimental data are adequate, or to detect non-identifiable and sloppy parameters, thereby helping to plan experiments and to build and improve the model at an early stage.
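As a hedged illustration of this point, the sketch below uses a made-up exponential-decay model and synthetic noisy data (not taken from the article): the same small dataset is refit across noise realizations, and the spread of the parameter estimates is compared with the residual sum of squares.

```python
# Minimal sketch (hypothetical model and synthetic data): with few, noisy
# observations, rather different parameter values can fit almost equally well.
import numpy as np
from scipy.optimize import curve_fit

def model(t, a, b):
    # simple exponential decay; over a short window a and b can trade off
    return a * np.exp(-b * t)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 5)                 # only five time points
y_true = model(t, 2.0, 0.5)

for trial in range(3):
    y_noisy = y_true + rng.normal(scale=0.2, size=t.size)   # noisy measurements
    popt, _ = curve_fit(model, t, y_noisy, p0=[1.0, 1.0])
    rss = np.sum((model(t, *popt) - y_noisy) ** 2)
    print(f"trial {trial}: a = {popt[0]:.2f}, b = {popt[1]:.2f}, RSS = {rss:.3f}")
# Comparable residuals can coexist with a noticeable spread in (a, b).
```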
Structural identifiability analysis is a particular type of analysis in which the model structure itself is investigated for non-identifiability.[2] Recognized non-identifiabilities may be removed analytically, for example by substituting the non-identifiable parameters with combinations of them or in some other way. A model overloaded with independent parameters may, when applied to a finite experimental dataset, provide a good fit to the data at the price of making the fitting results insensitive to changes in the parameter values, thereby leaving those values undetermined. Structural methods are also referred to as a priori, because in this case non-identifiability can be detected before any fitting score function is calculated, by examining the model's number of degrees of freedom and the number of independent experimental conditions that can be varied.
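A minimal sketch of such a structural non-identifiability, using a hypothetical model y = a·b·x in which only the product a·b enters the observable output (the model and names are illustrative assumptions, not from the article):

```python
# Minimal sketch (hypothetical model): y = a * b * x.  Only the product a*b
# enters the observable output, so a and b are structurally non-identifiable;
# substituting c = a*b removes the non-identifiability.
import numpy as np

x = np.linspace(0, 10, 20)

def observe(a, b):
    return a * b * x              # output depends on a and b only via a*b

y1 = observe(a=2.0, b=3.0)        # (a, b) = (2, 3)
y2 = observe(a=6.0, b=1.0)        # (a, b) = (6, 1), same product
print(np.allclose(y1, y2))        # True: no data of this form can separate a from b

def observe_reparametrized(c):
    return c * x                  # identifiable reparameterization with c = a*b
```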
Practical identifiability analysis can be performed by exploring the fit of an existing model to experimental data. Once a fit has been obtained in some measure, parameter identifiability analysis can be performed either locally, near a given point (usually near the parameter values that provided the best model fit), or globally, over the extended parameter space. A common example of practical identifiability analysis is the profile likelihood method.[3]
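A minimal profile-likelihood sketch, assuming a hypothetical exponential-decay model and synthetic data (none of this comes from the cited reference): one parameter is stepped along a grid while the remaining parameter is re-optimized at each step, and the resulting profile of the objective is inspected for flatness.

```python
# Minimal profile-likelihood sketch (hypothetical model y = a*exp(-b*t) with
# additive noise): hold b fixed on a grid, re-fit a at every grid point, and
# inspect how the best achievable objective changes along the profile.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
t = np.linspace(0, 2, 10)
y = 2.0 * np.exp(-0.5 * t) + rng.normal(scale=0.1, size=t.size)

def rss(a, b):
    return np.sum((a * np.exp(-b * t) - y) ** 2)

for b_fixed in np.linspace(0.1, 1.0, 10):
    # re-optimize the remaining parameter a with b held fixed
    res = minimize_scalar(lambda a: rss(a, b_fixed))
    print(f"b = {b_fixed:.2f}: profile RSS = {res.fun:.4f}")
# A nearly flat profile signals practical non-identifiability of b; a
# well-defined minimum signals that b is practically identifiable.
```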
In mathematical modeling, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably". An overfitted model is a mathematical model that contains more parameters than can be justified by the data. In a simple example, such excess parameters might be the coefficients of an unnecessarily high-degree polynomial. The essence of overfitting is to have unknowingly extracted some of the residual variation (i.e., the noise) as if that variation represented underlying model structure.
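A minimal sketch of this behaviour with polynomials of increasing degree, using synthetic data (the data, degrees, and error measures are illustrative assumptions):

```python
# Minimal overfitting sketch (synthetic data): polynomials of increasing degree
# fitted to ten noisy points; training error keeps falling while the highest
# degree mostly tracks the noise.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
x_dense = np.linspace(0, 1, 200)            # held-out evaluation grid
y_dense = np.sin(2 * np.pi * x_dense)       # noise-free reference curve

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_dense) - y_dense) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
# The degree-9 fit interpolates the training points but typically does worse
# than degree 3 away from them.
```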
In mathematics, a time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.
Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation includes resampling and sample splitting methods that use different portions of the data to test and train a model on different iterations. It is often used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It can also be used to assess the quality of a fitted model and the stability of its parameters.
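A minimal k-fold cross-validation sketch with synthetic data and a straight-line model (both are illustrative assumptions; real applications typically use library implementations):

```python
# Minimal k-fold cross-validation sketch (synthetic data, straight-line model):
# each fold is held out once for testing while the remaining folds train the fit.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 30)
y = 1.5 * x + 2.0 + rng.normal(scale=1.0, size=x.size)

k = 5
folds = np.array_split(rng.permutation(x.size), k)

scores = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    slope, intercept = np.polyfit(x[train_idx], y[train_idx], 1)   # train
    pred = slope * x[test_idx] + intercept                         # test
    scores.append(np.mean((pred - y[test_idx]) ** 2))

print("per-fold test MSE:", np.round(scores, 3))
print("cross-validated MSE:", round(float(np.mean(scores)), 3))
```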
Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be exercised when using the results as evidence for shared evolutionary ancestry, because of the possible confounding effects of convergent evolution, by which multiple unrelated amino acid sequences converge on a common tertiary structure.
Random sample consensus (RANSAC) is an iterative method to estimate parameters of a mathematical model from a set of observed data that contains outliers, when outliers are to be accorded no influence on the values of the estimates. Therefore, it also can be interpreted as an outlier detection method. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. The algorithm was first published by Fischler and Bolles at SRI International in 1981. They used RANSAC to solve the Location Determination Problem (LDP), where the goal is to determine the point in space from which an image was obtained, given a set of landmarks with known locations that project into the image.
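A minimal RANSAC-style sketch for fitting a line to data contaminated with outliers (the data, inlier threshold, and iteration count are illustrative assumptions, not the original Fischler–Bolles setting):

```python
# Minimal RANSAC-style sketch (synthetic data): fit a line y = m*x + c in the
# presence of outliers by repeatedly fitting minimal two-point samples and
# keeping the model with the largest consensus set of inliers.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.size)
y[:20] += rng.uniform(-20, 20, 20)               # contaminate 20% with outliers

best_count, best_model = 0, None
for _ in range(200):                             # fixed iteration budget
    i, j = rng.choice(x.size, size=2, replace=False)    # minimal sample
    if x[i] == x[j]:
        continue
    m = (y[j] - y[i]) / (x[j] - x[i])
    c = y[i] - m * x[i]
    inliers = np.sum(np.abs(y - (m * x + c)) < 1.0)     # consensus under a threshold
    if inliers > best_count:
        best_count, best_model = inliers, (m, c)

print("estimated (slope, intercept):", best_model, "with", best_count, "inliers")
```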
Structural equation modeling (SEM) is a diverse set of methods used by scientists doing both observational and experimental research. SEM is used mostly in the social and behavioral sciences but it is also used in epidemiology, business, and other fields. A definition of SEM is difficult without reference to technical language, but a good starting place is the name itself.
Model selection is the task of selecting the best model from among various candidates on the basis of a performance criterion. In the context of machine learning and more generally statistical analysis, this may be the selection of a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. However, the task can also involve the design of experiments such that the data collected are well-suited to the problem of model selection. Given candidate models of similar predictive or explanatory power, the simplest model is most likely to be the best choice.
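A minimal sketch of criterion-based model selection, comparing polynomial candidates with a Gaussian-likelihood Akaike information criterion on synthetic data (the models, data, and criterion choice are illustrative assumptions):

```python
# Minimal sketch of criterion-based model selection (synthetic data): compare
# polynomial candidates using a Gaussian-likelihood AIC, up to an additive constant.
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)   # the truth is linear

n = x.size
for degree in (1, 2, 5):
    k = degree + 1                                # number of fitted parameters
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    aic = n * np.log(rss / n) + 2 * k             # AIC for Gaussian errors
    print(f"degree {degree}: AIC = {aic:.1f}")
# The linear candidate usually attains the lowest AIC here, reflecting the
# preference for the simplest model of comparable explanatory power.
```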
Nucleic acid structure prediction is a computational method to determine the secondary and tertiary structure of a nucleic acid from its sequence. Secondary structure can be predicted from one or several nucleic acid sequences. Tertiary structure can be predicted from the sequence, or by comparative modeling.
Uncertainty quantification (UQ) is the science of quantitative characterization and estimation of uncertainties in both computational and real world applications. It tries to determine how likely certain outcomes are if some aspects of the system are not exactly known. An example would be to predict the acceleration of a human body in a head-on crash with another car: even if the speed was exactly known, small differences in the manufacturing of individual cars, how tightly every bolt has been tightened, etc., will lead to different results that can only be predicted in a statistical sense.
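A minimal Monte Carlo sketch of this idea, using a toy braking-distance model rather than the crash example above (the model, input distributions, and numbers are all illustrative assumptions): uncertainty in one input is propagated through the model and the resulting spread of outputs is summarized statistically.

```python
# Minimal Monte Carlo sketch of uncertainty quantification (toy model, made-up
# numbers): propagate uncertainty in one input through the model and summarize
# the spread of the output statistically.
import numpy as np

rng = np.random.default_rng(6)

def braking_distance(speed, friction):
    # toy model: d = v^2 / (2 * mu * g)
    return speed ** 2 / (2.0 * friction * 9.81)

speed = 30.0                                       # input assumed exactly known (m/s)
friction = rng.normal(0.7, 0.05, size=100_000)     # uncertain input coefficient

distance = braking_distance(speed, friction)
print(f"mean = {distance.mean():.1f} m, std = {distance.std():.1f} m")
print("95% interval (m):", np.percentile(distance, [2.5, 97.5]).round(1))
```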
NONMEM is a non-linear mixed-effects modeling software package developed by Stuart L. Beal and Lewis B. Sheiner in the late 1970s at the University of California, San Francisco, and expanded by Robert Bauer at Icon PLC. Its name is an acronym for NON-linear Mixed-Effects Modeling; it is especially powerful in the context of population pharmacokinetics, pharmacometrics, and PK/PD models. NONMEM models are written in NMTRAN, a dedicated model specification language that is translated into FORTRAN, compiled on the fly, and executed by a command-line script. Results are presented as text output files, including tables. There are multiple interfaces that assist modelers with the housekeeping of files, tracking of model development, goodness-of-fit evaluations, and graphical output, such as PsN, xpose, and Wings for NONMEM. The current version of NONMEM is 7.5.
In statistics, confirmatory factor analysis (CFA) is a special form of factor analysis, most commonly used in social science research. It is used to test whether measures of a construct are consistent with a researcher's understanding of the nature of that construct. As such, the objective of confirmatory factor analysis is to test whether the data fit a hypothesized measurement model. This hypothesized model is based on theory and/or previous analytic research. CFA was first developed by Jöreskog (1969) and has built upon and replaced older methods of analyzing construct validity such as the MTMM Matrix as described in Campbell & Fiske (1959).
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics that can be used to estimate the posterior distributions of model parameters.
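A minimal sketch of ABC by rejection sampling, assuming a toy Gaussian model, a uniform prior, and the sample mean as summary statistic (all illustrative assumptions): parameters are drawn from the prior, data are simulated, and draws whose summary statistic lies close to the observed one are kept.

```python
# Minimal ABC rejection-sampling sketch (toy Gaussian model, uniform prior,
# sample mean as summary statistic; all choices are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(7)
observed = rng.normal(loc=3.0, scale=1.0, size=50)    # stand-in "observed" data
obs_summary = observed.mean()

accepted = []
for _ in range(20_000):
    theta = rng.uniform(0.0, 10.0)                    # draw from the prior
    simulated = rng.normal(loc=theta, scale=1.0, size=50)
    if abs(simulated.mean() - obs_summary) < 0.1:     # tolerance on the summary
        accepted.append(theta)

accepted = np.array(accepted)
print(f"approximate posterior mean: {accepted.mean():.2f} "
      f"({accepted.size} accepted draws)")
```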
In statistics, identifiability is a property which a model must satisfy for precise inference to be possible. A model is identifiable if it is theoretically possible to learn the true values of this model's underlying parameters after obtaining an infinite number of observations from it. Mathematically, this is equivalent to saying that different values of the parameters must generate different probability distributions of the observable variables. Usually the model is identifiable only under certain technical restrictions, in which case the set of these requirements is called the identification conditions.
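In standard notation (not specific to this article), for a parametric family of distributions the condition can be written as follows:

```latex
% Identifiability condition for a parametric family \{P_\theta : \theta \in \Theta\}:
% distinct parameter values must not produce the same distribution.
\[
  P_{\theta_1} = P_{\theta_2} \;\Longrightarrow\; \theta_1 = \theta_2
  \qquad \text{for all } \theta_1, \theta_2 \in \Theta .
\]
```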
In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. Measured variables are any one of several attributes of people that may be observed and measured. Examples of measured variables could be the physical height, weight, and pulse rate of a human being. Usually, researchers would have a large number of measured variables, which are assumed to be related to a smaller number of "unobserved" factors. Researchers must carefully consider the number of measured variables to include in the analysis. EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis.
In statistics, model validation is the task of evaluating whether a chosen statistical model is appropriate or not. Oftentimes in statistical inference, inferences from models that appear to fit their data may be flukes, resulting in a misunderstanding by researchers of the actual relevance of their model. To combat this, model validation is used to test whether a statistical model can hold up to permutations in the data. This topic is not to be confused with the closely related task of model selection, the process of discriminating between multiple candidate models: model validation does not concern so much the conceptual design of models as it tests only the consistency between a chosen model and its stated outputs.
PottersWheel is a MATLAB toolbox for mathematical modeling of time-dependent dynamical systems that can be expressed as chemical reaction networks or ordinary differential equations (ODEs). It allows the automatic calibration of model parameters by fitting the model to experimental measurements. CPU-intensive functions are written or – in case of model dependent functions – dynamically generated in C. Modeling can be done interactively using graphical user interfaces or based on MATLAB scripts using the PottersWheel function library. The software is intended to support the work of a mathematical modeler as a real potter's wheel eases the modeling of pottery.
In computational chemistry, conformational ensembles, also known as structural ensembles, are experimentally constrained computational models describing the structure of intrinsically unstructured proteins. Such proteins are flexible in nature, lacking a stable tertiary structure, and therefore cannot be described with a single structural representation. The techniques of ensemble calculation are relatively new in the field of structural biology and still face certain limitations that need to be addressed before they become comparable to classical structural description methods such as biological macromolecular crystallography.
Jeffrey Scott Tanaka was an American psychologist and statistician, known for his work in educational psychology, social psychology and various fields of statistics including structural equation modeling.
SAAM II, short for "Simulation Analysis and Modeling" version 2.0, is a computer program designed for scientific research in the field of bioscience. It is a descriptive and exploratory tool used in drug development, tracer studies, metabolic disorders, and pharmacokinetic/pharmacodynamic research. It is grounded in the principles of multi-compartment model theory, a widely used approach for modeling complex biological systems. SAAM II facilitates the construction and simulation of models, providing a user interface that allows quick runs and multi-fitting of both simple and complex model structures and datasets. It is used by many pharmaceutical companies and pharmacy schools as a drug development, research, and educational tool.
In the area of system identification, a dynamical system is structurally identifiable if it is possible to infer its unknown parameters by measuring its output over time. This problem arises in many branches of applied mathematics, since dynamical systems are commonly used to model physical processes, and these models contain unknown parameters that are typically estimated using experimental data.