Response modeling methodology (RMM) is a general platform for statistical modeling of a linear or nonlinear relationship between a response variable (dependent variable) and a linear predictor (a linear combination of predictors/effects/factors/independent variables), often denoted the linear predictor function. It is generally assumed that the modeled relationship is monotone convex (delivering a monotone convex function) or monotone concave (delivering a monotone concave function). However, many non-monotone functions, like the quadratic function, are special cases of the general model.
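For concreteness, the linear predictor referred to throughout takes the standard form (the β notation is this sketch's assumption, not notation fixed by RMM):

η = β0 + β1X1 + ... + βkXk,

where X1, ..., Xk are the regressor-variables (predictors) and β0, ..., βk are coefficients estimated from the data.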
RMM was initially developed as a series of extensions to the original inverse Box–Cox transformation,

y = (1 + λz)^(1/λ),

where y is a percentile of the modeled response, Y (the modeled random variable), z is the respective percentile of a normal variate and λ is the Box–Cox parameter. As λ goes to zero, the inverse Box–Cox transformation becomes

y = exp(z),

an exponential model. Therefore, the original inverse Box–Cox transformation contains a trio of models: linear (λ = 1), power (λ ≠ 1, λ ≠ 0) and exponential (λ = 0). This implies that on estimating λ, using sample data, the final model is not determined in advance (prior to estimation) but rather emerges as a result of the estimation. In other words, data alone determine the final model.
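The trio can be verified numerically. A minimal sketch (the helper name inverse_box_cox and the z-grid are this example's choices, not established notation):

```python
import numpy as np

def inverse_box_cox(z, lam):
    """Inverse Box-Cox transformation: y = (1 + lam*z)**(1/lam) for lam != 0,
    and y = exp(z) in the limit lam -> 0."""
    if abs(lam) < 1e-12:
        return np.exp(z)
    return (1.0 + lam * z) ** (1.0 / lam)

z = np.linspace(0.0, 1.0, 5)
print(inverse_box_cox(z, 1.0))    # linear model: y = 1 + z
print(inverse_box_cox(z, 0.5))    # power model: y = (1 + 0.5*z)**2
print(inverse_box_cox(z, 1e-8))   # approaches the exponential model y = exp(z)
print(np.exp(z))                  # exponential model, for comparison
```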
Extensions to the inverse Box–Cox transformation were developed by Shore (2001a[1]) and were denoted Inverse Normalizing Transformations (INTs). They were applied to model monotone convex relationships in various engineering areas, mostly to model physical properties of chemical compounds (Shore et al., 2001a,[1] and references therein). Once it was realized that INT models may be perceived as special cases of a much broader general approach for modeling non-linear monotone convex relationships, the new Response Modeling Methodology was initiated and developed (Shore, 2005a,[2] 2011[3] and references therein).
The RMM model expresses the relationship between a response, Y (the modeled random variable), and two components that deliver variation to Y: the linear predictor, LP (denoted η), which delivers systematic variation via the regressor-variables, and normal errors, which deliver random variation to the response.
The basic RMM model describes Y in terms of the LP, two possibly correlated zero-mean normal errors, ε1 and ε2 (with correlation ρ and standard deviations σε1 and σε2, respectively) and a vector of parameters {α, λ, μ} (Shore, 2005a,[2] 2011[3]):

W = log(Y) = μ + (α/λ)[(η + ε1)^λ − 1] + ε2,

where ε1 represents uncertainty (measurement imprecision or otherwise) in the explanatory variables (included in the LP). This is in addition to the uncertainty associated with the response (ε2). Expressing ε1 and ε2 in terms of standard normal variates, Z1 and Z2, respectively, having correlation ρ, and conditioning Z2 | Z1 = z1 (Z2 given that Z1 is equal to a given value z1), we may write W in terms of a single error, ε:

ε2 = σε2[ρz1 + (1 − ρ²)^(1/2)·Z] = dz1 + ε,

where Z is a standard normal variate, independent of Z1, ε is a zero-mean error and d is a parameter. From these relationships, the associated RMM quantile function is (Shore, 2011[3]):

y = exp{μ + (α/λ)[(η + σε1·z)^λ − 1] + dz + ε},
or, after re-parameterization:

y = M_Y·exp{(a/b)[(η + cz)^b − η^b] + dz + ε},

where y is the percentile of the response (Y), z is the respective standard normal percentile, ε is the model's zero-mean normal error with constant variance, σ, {a, b, c, d} are parameters and M_Y is the response median (z = 0), dependent on values of the parameters and the value of the LP, η:

M_Y = exp[m + (a/b)(η^b − 1)],

where μ (or m) is an additional parameter.
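The single-error representation above rests on the standard bivariate-normal decomposition Z2 = ρZ1 + (1 − ρ²)^(1/2)·Z, with Z independent of Z1. A minimal Monte Carlo check of that identity follows; the sample size, seed and σε2 value are arbitrary choices of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.6
n = 1_000_000

# Construct correlated standard normal variates Z1, Z2 from independent Z1, Z.
z1 = rng.standard_normal(n)
z = rng.standard_normal(n)                 # independent of z1
z2 = rho * z1 + np.sqrt(1 - rho**2) * z    # bivariate-normal decomposition

# Z2 is standard normal with corr(Z1, Z2) = rho, as required.
print(z2.mean(), z2.std())                 # ~0, ~1
print(np.corrcoef(z1, z2)[0, 1])           # ~0.6

# Conditionally on Z1 = z1, the error eps2 = sigma2*Z2 splits into
# d*z1 + eps with d = rho*sigma2 and eps = sigma2*sqrt(1-rho^2)*Z.
sigma2 = 2.0
d = rho * sigma2
eps = sigma2 * np.sqrt(1 - rho**2) * z
print(np.allclose(sigma2 * z2, d * z1 + eps))   # True
```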
If it may be assumed that cz ≪ η, the above model for the RMM quantile function can be approximated by:

y = M_Y·exp(acη^(b−1)·z + dz + ε).
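The approximation rests on a first-order Taylor expansion in cz/η applied to the quantile form above:

(η + cz)^b = η^b(1 + cz/η)^b ≈ η^b + bη^(b−1)·cz,

so that (a/b)[(η + cz)^b − η^b] ≈ acη^(b−1)·z.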
The parameter “c” cannot be “absorbed” into the parameters of the LP (η) since “c” and LP are estimated in two separate stages (as expounded below).
If the response data used to estimate the model contain values that change sign, or if the lowest response value is far from zero (for example, when data are left-truncated), a location parameter, L, may be added to the response, so that the expressions for the quantile function and for the median become, respectively:

y = L + exp{m + (a/b)[(η + cz)^b − 1] + dz + ε},

M_Y = L + exp[m + (a/b)(η^b − 1)].
As shown earlier, the inverse Box–Cox transformation depends on a single parameter, λ, which determines the final form of the model (whether linear, power or exponential). All three models thus constitute mere points on a continuous spectrum of monotonic convexity, spanned by λ. This property, where different known models become mere points on a continuous spectrum, spanned by the model's parameters, is denoted the Continuous Monotonic Convexity (CMC) property. The latter characterizes all RMM models, and it allows the basic "linear-power-exponential" cycle (underlying the inverse Box–Cox transformation) to be repeated ad infinitum, allowing for ever more convex models to be derived. Examples of such models are an exponential-power model or an exponential-exponential-power model (see explicit models expounded further on). Since the final form of the model is determined by the values of the RMM parameters, the data used to estimate the parameters also determine the final form of the estimated RMM model (as with the inverse Box–Cox transformation). The CMC property thus grants RMM models high flexibility in accommodating the data used to estimate the parameters. The references given below report published comparisons between RMM models and existing models; these comparisons demonstrate the effectiveness of the CMC property.
Ignoring the RMM errors (setting the terms cz, dz and ε in the percentile model to zero), we obtain the following RMM models, presented in an increasing order of monotone convexity: linear, power, exponential and exponential-power.
Adding two new parameters, by introducing for η (in the percentile model) a further inner inverse Box–Cox transformation, a new cycle of "linear-power-exponential" is iterated to produce models with stronger monotone convexity (Shore, 2005a,[2] 2011,[3] 2012[4]).
This series of monotonic convex models, presented in the hierarchical order in which they appear on the "Ladder of Monotonic Convex Functions" (Shore, 2011[3]), is unbounded from above. However, all models are mere points on a continuous spectrum, spanned by the RMM parameters. Note also that numerous growth models, like the Gompertz function, are exact special cases of the RMM model.
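The first rungs of this ladder can be sketched numerically; the functional forms and parameter values below are illustrative assumptions, not Shore's exact parameterizations. Increasing monotone convexity shows up as growing second differences over a fixed grid:

```python
import numpy as np

eta = np.linspace(1.0, 3.0, 201)

# Illustrative members of the "ladder of monotonic convex functions":
ladder = {
    "linear":                1.0 + 0.5 * eta,
    "power":                 eta ** 1.5,
    "exponential":           np.exp(0.8 * eta),
    "exponential-power":     np.exp(0.8 * eta ** 1.5),
    "exp-exponential-power": np.exp(np.exp(0.5 * eta ** 1.2)),
}

for name, y in ladder.items():
    curvature = np.diff(y, 2).mean()  # mean second difference: grows up the ladder
    print(f"{name:>22s}: mean second difference = {curvature:.4g}")
```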
The k-th non-central moment of Y is (assuming L = 0; Shore, 2005a,[2] 2011[3]):

E(Y^k) = (M_Y)^k·E(exp{k[(a/b)((η + cZ)^b − η^b) + dZ + ε]}).
Expanding Y^k, as given on the right-hand side, into a Taylor series around zero, in terms of powers of Z (the standard normal variate), and then taking expectation on both sides, assuming that cZ ≪ η so that η + cZ ≈ η, yields an approximate simple expression for the k-th non-central moment, based on the first six terms in the expansion.
An analogous expression may be derived without assuming cZ ≪ η. This would result in a more accurate, though lengthy and cumbersome, expression. Once cZ in the above expression is neglected, Y becomes a log-normal random variable (with parameters that depend on η).
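With cZ neglected, the quantile model reduces to Y = M_Y·exp(dZ + ε), so log Y is normal and the standard log-normal moment formula E(Y^k) = (M_Y)^k·exp(k²(d² + σ²)/2) applies (σ here denoting the standard deviation of ε). A quick Monte Carlo check, with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
M_y, d, sigma = 2.0, 0.3, 0.2   # arbitrary illustrative values
n, k = 2_000_000, 3

Z = rng.standard_normal(n)
eps = sigma * rng.standard_normal(n)   # model error, independent of Z
Y = M_y * np.exp(d * Z + eps)          # cZ neglected: Y is log-normal

mc_moment = (Y ** k).mean()
exact = M_y**k * np.exp(0.5 * k**2 * (d**2 + sigma**2))
print(mc_moment, exact)                # close agreement
```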
RMM models may be used to model random variation (as a general platform for distribution fitting) or to model systematic variation (analogously to generalized linear models, GLM).
In the former case (no systematic variation, namely η = constant), the RMM quantile function is fitted to known distributions. If the underlying distribution is unknown, the RMM quantile function is estimated using available sample data. Modeling random variation with RMM is addressed and demonstrated in Shore (2011[3] and references therein).
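A minimal distribution-fitting sketch in this spirit, assuming the quantile form above with η normalized to 1 and the residual error dropped; the synthetic data, loss and optimizer are this example's choices, not Shore's published estimation procedures:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import norm

rng = np.random.default_rng(2)
sample = rng.gamma(shape=3.0, scale=1.5, size=5000)   # "unknown" data

# Empirical quantiles and the matching standard-normal percentiles.
p = np.linspace(0.02, 0.98, 49)
q_emp = np.quantile(sample, p)
z = norm.ppf(p)

def log_quantile(theta, z):
    # log y = log M + (a/b)*((1 + c*z)**b - 1) + d*z   (eta = 1)
    logM, a, b, c, d = theta
    return logM + (a / b) * ((1.0 + c * z) ** b - 1.0) + d * z

# Bounds keep b away from 0 and 1 + c*z positive over the z-grid.
res = least_squares(
    lambda th: log_quantile(th, z) - np.log(q_emp),
    x0=[np.log(np.median(sample)), 0.1, 1.0, 0.1, 0.3],
    bounds=([-10.0, -5.0, 0.05, -0.4, -5.0], [10.0, 5.0, 5.0, 0.4, 5.0]),
)
print(res.x)                                                 # fitted {log M, a, b, c, d}
print(np.exp(log_quantile(res.x, 0.0)), np.median(sample))   # fitted vs. sample median
```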
In the latter case (modeling systematic variation), RMM models are estimated assuming that variation in the linear predictor (generated via variation in the regressor-variables) contributes to the overall variation of the modeled response variable (Y). This case is addressed and demonstrated in Shore (2005a,[2] 2012[4] and relevant references therein). Estimation is conducted in two stages. First, the median is estimated by minimizing the sum of absolute deviations of the fitted model from the sample data points. In the second stage, the remaining two parameters (not estimated in the first stage, namely {c, d}) are estimated. Three estimation approaches are presented in Shore (2012[4]): maximum likelihood, moment matching and nonlinear quantile regression.
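A sketch of the first estimation stage only (LAD fitting of the median model); the exponential-power median form and the synthetic data below are assumptions of this example, and {c, d} would be estimated in the second stage:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Synthetic data: one regressor x; response with multiplicative log-normal noise.
x = np.linspace(1.0, 5.0, 200)
y = np.exp(0.4 * x ** 0.9) * np.exp(0.1 * rng.standard_normal(x.size))

# Stage 1: estimate the median model M_Y = exp(m + (a/b)*(eta**b - 1)),
# with eta = b0 + b1*x, by minimizing the sum of absolute deviations (LAD).
def sad(theta):
    m, a, b, b0, b1 = theta
    eta = b0 + b1 * x
    if b <= 0.05 or np.any(eta <= 0.0):
        return np.inf                      # keep the power and the LP in a valid range
    med = np.exp(m + (a / b) * (eta ** b - 1.0))
    return np.abs(y - med).sum()

fit = minimize(sad, x0=[0.0, 0.5, 1.0, 0.5, 1.0], method="Nelder-Mead",
               options={"maxiter": 20000, "fatol": 1e-9})
print(fit.x)   # stage-1 estimates; {c, d} remain for stage 2
```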
As of 2021, the RMM literature addresses three areas:
(1) Developing INTs and later the RMM approach, with allied estimation methods;
(2) Exploring the properties of RMM and comparing RMM effectiveness to other current modelling approaches (for distribution fitting or for modelling systematic variation);
(3) Applications.
Shore (2003a[5]) developed Inverse Normalizing Transformations (INTs) in the first years of the 21st century and applied them to various engineering disciplines like statistical process control (Shore, 2000a,[1] b,[6] 2001a,[7] b,[8] 2002a[9]) and chemical engineering (Shore et al., 2002[10]). Subsequently, as the new Response Modeling Methodology (RMM) emerged and developed into a full-fledged platform for modeling monotone convex relationships (ultimately presented in a book, Shore, 2005a[2]), RMM properties were explored (Shore, 2002b,[11] 2004a,[12] b,[13] 2008a,[14] 2011[3]), estimation procedures were developed (Shore, 2005a,[2] b,[15] 2012[4]) and the new modeling methodology was compared to other approaches, both for modeling random variation (Shore 2005c,[16] 2007,[17] 2010;[18] Shore and A'wad 2010[19]) and for modeling systematic variation (Shore, 2008b[20]).
Concurrently, RMM has been applied to various scientific and engineering disciplines and compared to current models and modeling approaches practiced therein, for example, chemical engineering (Shore, 2003b;[21] Benson-Karhi et al., 2007;[22] Shacham et al., 2008;[23] Shore and Benson-Karhi, 2010[24]), statistical process control (Shore, 2014;[25] Shore et al., 2014;[26] Danoch and Shore, 2016[27]), reliability engineering (Shore, 2004c;[28] Ladany and Shore, 2007[29]), forecasting (Shore and Benson-Karhi, 2007[30]), ecology (Shore, 2014[25]), and the medical profession (Shore et al., 2014;[26] Benson-Karhi et al., 2017[31]).