In statistics, a random vector x is classically represented by a probability density function. In the set-membership (or set-estimation) approach, x is instead represented by a set X to which x is assumed to belong; this means that the support of the probability distribution of x is included in X. On the one hand, representing random vectors by sets requires fewer assumptions about the random variables (such as independence) and makes it easier to deal with nonlinearities. On the other hand, a probability density function provides more precise information than a set enclosing its support.
Set membership estimation (or set estimation for short) is an estimation approach which considers that the measurements are represented by a set Y (most of the time a box of ℝᵐ, where m is the number of measurements) in the measurement space. If p is the parameter vector and f is the model function, then the set of all feasible parameter vectors is

P = {p ∈ P₀ | f(p) ∈ Y} = f⁻¹(Y) ∩ P₀,

where P₀ is the prior set for the parameters. Characterizing P corresponds to a set-inversion problem.[1]
When f is linear, the feasible set P can be described by linear inequalities and can be approximated using linear programming techniques.[2]
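As a brief illustration, the following sketch computes the tightest axis-aligned box around the feasible set of a linear model with interval measurements, by minimising and maximising each parameter with a linear program. The matrix A and the measurement intervals are illustrative choices, not taken from the article.

```python
# Sketch: outer box approximation of the feasible set of a linear model
# y = A p with interval measurements, via linear programming.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])          # linear model: y_i = A[i] @ p
y_lo = np.array([0.5, 1.5, 2.0])     # lower measurement bounds
y_hi = np.array([1.5, 2.5, 3.5])     # upper measurement bounds

# Feasible set P = {p : y_lo <= A p <= y_hi}, written as A_ub p <= b_ub.
A_ub = np.vstack([A, -A])
b_ub = np.concatenate([y_hi, -y_lo])

# One LP per bound: minimise and maximise each coordinate of p over P.
for j in range(A.shape[1]):
    c = np.zeros(A.shape[1]); c[j] = 1.0
    lo = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None)).fun
    hi = -linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None)).fun
    print(f"p{j+1} in [{lo:.3f}, {hi:.3f}]")
```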
When f is nonlinear, the resolution can be performed using interval analysis. The feasible set P is then approximated by inner and outer subpavings. The main limitation of the method is its exponential complexity with respect to the number of parameters.[3]
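A minimal sketch of this approach (the SIVIA algorithm: bisect a box, keep it if its interval image lies inside Y, discard it if the image misses Y, otherwise split) is given below for a hypothetical function f(p) = p₁² + p₂² and Y = [1, 4], so the feasible set is a ring; neither f nor Y is taken from the article's example.

```python
# Sketch of SIVIA (Set Inversion Via Interval Analysis) for the
# hypothetical f(p) = p1^2 + p2^2 and Y = [1, 4].  Interval arithmetic
# is reduced to the two operations this f needs.
def isqr(a, b):
    """Interval extension of x -> x^2 on [a, b]."""
    lo = 0.0 if a <= 0.0 <= b else min(a * a, b * b)
    return lo, max(a * a, b * b)

def f_box(box):
    """Interval image of f on a box ((a1, b1), (a2, b2))."""
    (a1, b1), (a2, b2) = box
    l1, u1 = isqr(a1, b1)
    l2, u2 = isqr(a2, b2)
    return l1 + l2, u1 + u2

def sivia(box0, y_lo, y_hi, eps=0.05):
    inner, boundary = [], []            # outer subpaving = inner + boundary
    stack = [box0]
    while stack:
        box = stack.pop()
        lo, hi = f_box(box)
        if y_lo <= lo and hi <= y_hi:   # image inside Y: feasible box
            inner.append(box)
        elif hi < y_lo or lo > y_hi:    # image disjoint from Y: discard
            continue
        elif max(b - a for a, b in box) < eps:
            boundary.append(box)        # small undecided box
        else:                           # bisect along the widest side
            i = max(range(2), key=lambda k: box[k][1] - box[k][0])
            a, b = box[i]; m = 0.5 * (a + b)
            left, right = list(box), list(box)
            left[i], right[i] = (a, m), (m, b)
            stack += [tuple(left), tuple(right)]
    return inner, boundary

inner, boundary = sivia(((-3.0, 3.0), (-3.0, 3.0)), 1.0, 4.0)
print(len(inner), "inner boxes,", len(boundary), "boundary boxes")
```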
Consider a model t ↦ y(t, p) whose parameter vector p = (p₁, p₂) contains the two parameters to be estimated.
Assume that at times t₁ = −1, t₂ = 1, t₃ = 2, the interval measurements [y₁], [y₂], [y₃] have been collected, as illustrated by Figure 1. The corresponding measurement set (here a box of ℝ³) is Y = [y₁] × [y₂] × [y₃].
The model function is defined by f(p) = (y(t₁, p), y(t₂, p), y(t₃, p)); the components of f are obtained by evaluating the model at each measurement time. After solving the set inversion problem, we get the approximation depicted in Figure 2. Red boxes are inside the feasible set P and blue boxes are outside P.
Set estimation can be used to estimate the state of a system described by state equations using a recursive implementation. When the system is linear, the corresponding feasible set for the state vector can be described by polytopes or by ellipsoids.[4][5] When the system is nonlinear, the set can be enclosed by subpavings.[6]
When outliers occur, the set estimation method generally returns an empty set, because the intersection of the sets of parameter vectors consistent with each data bar is empty. To be robust with respect to outliers, we generally characterize the set of parameter vectors that are consistent with all data bars except q of them. This is possible using the notion of q-relaxed intersection, as sketched below.
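The following sketch computes the q-relaxed intersection of m intervals (the set of points covered by at least m − q of them) with an endpoint sweep; the intervals, one of which plays the role of an outlier, are illustrative.

```python
# q-relaxed intersection of m intervals via an endpoint sweep.
def q_relaxed_intersection(intervals, q):
    m = len(intervals)
    events = []                        # +1 when an interval opens, -1 closes
    for a, b in intervals:
        events += [(a, +1), (b, -1)]
    events.sort()
    pieces, depth, start = [], 0, None
    for x, step in events:
        depth += step
        if depth >= m - q and start is None:
            start = x                  # entering a region covered m-q times
        elif depth < m - q and start is not None:
            pieces.append((start, x))  # leaving it
            start = None
    return pieces

data = [(0.0, 2.0), (1.0, 3.0), (1.5, 4.0), (5.0, 6.0)]  # last one: outlier
print(q_relaxed_intersection(data, q=0))   # [] -- plain intersection empty
print(q_relaxed_intersection(data, q=1))   # [(1.5, 2.0)]
```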
In machine learning, supervised learning (SL) is a paradigm where a model is trained using input objects and desired output values, which are often human-made labels. The training process builds a function that maps new data to expected output values. An optimal scenario will allow for the algorithm to accurately determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way. This statistical quality of an algorithm is measured via a generalization error.
In regression analysis, least squares is a parameter estimation method based on minimizing the sum of the squares of the residuals (a residual being the difference between an observed value and the value predicted by the model) of each individual equation.
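A minimal sketch, fitting a line y = a + b t by least squares with numpy; the data points are illustrative.

```python
# Least-squares fit of a line y = a + b t.
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

A = np.column_stack([np.ones_like(t), t])      # design matrix [1, t]
coef, residuals, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b = coef
print(f"y ~ {a:.3f} + {b:.3f} t, sum of squared residuals = {residuals[0]:.4f}")
```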
In statistics and control theory, Kalman filtering is an algorithm that uses a series of measurements observed over time, including statistical noise and other inaccuracies, to produce estimates of unknown variables that tend to be more accurate than those based on a single measurement, by estimating a joint probability distribution over the variables for each time-step. The filter is typically derived as a minimum mean-square-error estimator, although it can also be derived from a maximum-likelihood perspective. The filter is named after Rudolf E. Kálmán.
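A minimal scalar sketch of the predict/update cycle, assuming a random-walk state model; the noise levels and simulated data are illustrative, and the full filter works with vectors and covariance matrices.

```python
# Scalar Kalman filter for a random-walk state observed with noise.
import numpy as np

rng = np.random.default_rng(0)
Q, R = 1e-3, 0.25            # process and measurement noise variances
x_true, xs, zs = 0.0, [], []
for _ in range(50):          # simulate the walk and noisy measurements
    x_true += rng.normal(0, np.sqrt(Q))
    xs.append(x_true)
    zs.append(x_true + rng.normal(0, np.sqrt(R)))

x, P = 0.0, 1.0              # initial state estimate and variance
for z in zs:
    P = P + Q                # predict: x_k = x_{k-1} + w, w ~ N(0, Q)
    K = P / (P + R)          # update: Kalman gain for z = x + v, v ~ N(0, R)
    x = x + K * (z - x)
    P = (1 - K) * P

print(f"final truth {xs[-1]:.3f}, estimate {x:.3f}, variance {P:.4f}")
```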
Fractal flames are a member of the iterated function system class of fractals created by Scott Draves in 1992. Draves' open-source code was later ported into Adobe After Effects graphics software and translated into the Apophysis fractal flame editor.
In statistics, a confidence region is a multi-dimensional generalization of a confidence interval. It is a set of points in an n-dimensional space, often represented as an ellipsoid around a point which is an estimated solution to a problem, although other shapes can occur.
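As a brief illustration, the following sketch computes the semi-axes of a 95% confidence ellipse for a 2-dimensional Gaussian estimate from the eigendecomposition of its covariance matrix; the covariance values are illustrative.

```python
# Axes of a 95% confidence ellipse {x : x' inv(cov) x <= chi2_quantile}.
import numpy as np
from scipy.stats import chi2

cov = np.array([[2.0, 0.6],
                [0.6, 1.0]])             # estimated covariance (illustrative)
k2 = chi2.ppf(0.95, df=2)                # quantile for a 95% region in 2-D

eigval, eigvec = np.linalg.eigh(cov)
half_axes = np.sqrt(k2 * eigval)         # semi-axis lengths of the ellipse
print("semi-axes:", half_axes)
print("axis directions (columns):\n", eigvec)
```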
In statistics, nonlinear regression is a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables. The data are fitted by a method of successive approximations (iterations).
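A minimal sketch of such an iterative fit using scipy's curve_fit; the exponential model, the true parameter values, and the noise level are illustrative.

```python
# Iterative nonlinear regression of y = p1 * exp(p2 * t).
import numpy as np
from scipy.optimize import curve_fit

def model(t, p1, p2):
    return p1 * np.exp(p2 * t)

t = np.linspace(0.0, 2.0, 20)
rng = np.random.default_rng(1)
y = model(t, 2.0, -1.0) + rng.normal(0, 0.02, t.size)

popt, pcov = curve_fit(model, t, y, p0=[1.0, 0.0])   # p0: starting guess
print("estimated parameters:", popt)
```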
In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the input dataset and the output of the (linear) function of the independent variable. Some sources consider OLS to be linear regression.
This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics and Glossary of experimental design.
In statistics, M-estimators are a broad class of extremum estimators for which the objective function is a sample average. Both non-linear least squares and maximum likelihood estimation are special cases of M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators. However, M-estimators are not inherently robust, as is clear from the fact that they include maximum likelihood estimators, which are in general not robust. The statistical procedure of evaluating an M-estimator on a data set is called M-estimation. The "M" initial stands for "maximum likelihood-type".
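A minimal sketch of M-estimation of a location parameter: the objective is the sample average of the Huber loss applied to the residuals, so the estimate is pulled far less by the outlier than the mean is. The data and the tuning constant c = 1.345 are illustrative.

```python
# M-estimate of location with the Huber loss.
import numpy as np
from scipy.optimize import minimize_scalar

def huber(r, c=1.345):
    r = np.abs(r)
    return np.where(r <= c, 0.5 * r**2, c * r - 0.5 * c**2)

data = np.array([0.9, 1.1, 1.0, 0.8, 1.2, 8.0])   # 8.0 is an outlier

# M-estimate: argmin over mu of the sample average of huber(x_i - mu)
res = minimize_scalar(lambda mu: huber(data - mu).mean())
print(f"mean = {data.mean():.3f}, Huber M-estimate = {res.x:.3f}")
```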
Uncertainty quantification (UQ) is the science of quantitative characterization and estimation of uncertainties in both computational and real-world applications. It tries to determine how likely certain outcomes are if some aspects of the system are not exactly known. An example would be to predict the acceleration of a human body in a head-on crash with another car: even if the speed were exactly known, small differences in the manufacturing of individual cars, in how tightly every bolt has been tightened, and so on, will lead to different results that can only be predicted in a statistical sense.
Polynomial chaos (PC), also called polynomial chaos expansion (PCE) and Wiener chaos expansion, is a method for representing a random variable in terms of a polynomial function of other random variables. The polynomials are chosen to be orthogonal with respect to the joint probability distribution of these random variables. Note that despite its name, PCE has no immediate connections to chaos theory. The word "chaos" here should be understood as "random".
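As a small illustration, the sketch below uses the known Wiener-Hermite expansion of Y = exp(ξ) with ξ ~ N(0, 1), namely exp(ξ) = e^(1/2) Σₖ Heₖ(ξ)/k! in probabilists' Hermite polynomials (a standard identity from the Hermite generating function), and checks a truncation of it on a Monte Carlo sample.

```python
# Truncated Wiener-Hermite chaos expansion of Y = exp(xi), xi ~ N(0, 1).
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval   # probabilists' Hermite

order = 8
coef = [math.exp(0.5) / math.factorial(k) for k in range(order + 1)]

rng = np.random.default_rng(2)
xi = rng.normal(size=100_000)
y_exact = np.exp(xi)
y_pce = hermeval(xi, coef)                        # evaluate the expansion

print("mean exact/PCE:", y_exact.mean(), y_pce.mean())
print("rms truncation error:", np.sqrt(np.mean((y_exact - y_pce) ** 2)))
```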
In statistical signal processing, the goal of spectral density estimation (SDE) or simply spectral estimation is to estimate the spectral density of a signal from a sequence of time samples of the signal. Intuitively speaking, the spectral density characterizes the frequency content of the signal. One purpose of estimating the spectral density is to detect any periodicities in the data, by observing peaks at the frequencies corresponding to these periodicities.
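A minimal sketch of the simplest such estimator, the raw periodogram (the squared magnitude of the FFT), used to detect a periodicity in noise; the 7 Hz signal, sampling rate, and noise level are illustrative.

```python
# Raw periodogram of a noisy sinusoid.
import numpy as np

fs = 100.0                           # sampling frequency, Hz
t = np.arange(0, 10, 1 / fs)         # 10 s of samples
rng = np.random.default_rng(3)
x = np.sin(2 * np.pi * 7.0 * t) + rng.normal(0, 1.0, t.size)

freqs = np.fft.rfftfreq(t.size, d=1 / fs)
pxx = np.abs(np.fft.rfft(x)) ** 2 / t.size       # unnormalised periodogram
print("peak at", freqs[np.argmax(pxx[1:]) + 1], "Hz")   # skip the DC bin
```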
The linear-nonlinear-Poisson (LNP) cascade model is a simplified functional model of neural spike responses. It has been successfully used to describe the response characteristics of neurons in early sensory pathways, especially the visual system. The LNP model is generally implicit when using reverse correlation or the spike-triggered average to characterize neural responses with white-noise stimuli.
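A toy simulation of the three stages (linear filter, pointwise nonlinearity, Poisson spiking); the filter shape, gain, and rates are illustrative, not drawn from any particular neuron model.

```python
# Simulate an LNP cascade driven by white noise.
import numpy as np

rng = np.random.default_rng(4)
dt = 0.001                                    # 1 ms bins
stim = rng.normal(size=5000)                  # white-noise stimulus

k = np.exp(-np.arange(30) / 10.0)             # linear filter (30 ms decay)
drive = np.convolve(stim, k)[:stim.size]      # L: causal linear stage

rate = 20.0 * np.exp(0.5 * drive)             # N: exponential nonlinearity (Hz)
spikes = rng.poisson(rate * dt)               # P: Poisson spike counts per bin

print("mean firing rate:", spikes.sum() / (spikes.size * dt), "Hz")
```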
Moving horizon estimation (MHE) is an optimization approach that uses a series of measurements observed over time, containing noise and other inaccuracies, and produces estimates of unknown variables or parameters. Unlike deterministic approaches, MHE solves an optimization problem over a sliding window of recent measurements at each time step, relying on linear programming or nonlinear programming solvers to find a solution.
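A toy sketch of the idea for a scalar random-walk system: at each step a nonlinear programming solver minimises a cost over the states in the window, and only the most recent estimate is kept. The system, window length, and cost weights (chosen as inverse noise variances) are illustrative.

```python
# Moving horizon estimation for a scalar random walk.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
N, H = 40, 10                                 # time steps, horizon length
x_true = np.cumsum(rng.normal(0, 0.1, N))     # random-walk state
z = x_true + rng.normal(0, 0.5, N)            # noisy measurements

def mhe_cost(x, z_win, w_meas=4.0, w_proc=100.0):
    # weighted measurement residuals plus weighted process increments
    return (w_meas * np.sum((z_win - x) ** 2)
            + w_proc * np.sum(np.diff(x) ** 2))

est = []
for k in range(H, N + 1):
    z_win = z[k - H:k]
    res = minimize(mhe_cost, x0=z_win.copy(), args=(z_win,))
    est.append(res.x[-1])                     # keep the newest state estimate

print("rms error:", np.sqrt(np.mean((np.array(est) - x_true[H - 1:]) ** 2)))
```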
In mathematics, set inversion is the problem of characterizing the preimage X of a set Y by a function f, i.e., X = f⁻¹(Y) = {x ∈ ℝⁿ | f(x) ∈ Y}. It can also be viewed as the problem of describing the solution set of the quantified constraint "Y(f(x))", where Y(y) is a constraint, e.g. an inequality, describing the set Y.
In mathematics, a subpaving is a set of nonoverlapping boxes of ℝⁿ. A subset X of ℝⁿ can be approximated by two subpavings X⁻ and X⁺ such that
X⁻ ⊂ X ⊂ X⁺.
The relaxed intersection of m sets corresponds to the classical intersection between sets, except that it is allowed to relax a few sets in order to avoid an empty intersection. This notion can be used to solve constraint satisfaction problems that are inconsistent, by relaxing a small number of constraints. When a bounded-error approach is considered for parameter estimation, the relaxed intersection makes it possible to be robust with respect to some outliers.
In statistics, the class of vector generalized linear models (VGLMs) was proposed to enlarge the scope of models catered for by generalized linear models (GLMs). In particular, VGLMs allow for response variables outside the classical exponential family and for more than one parameter. Each parameter can be transformed by a link function. The VGLM framework is also large enough to naturally accommodate multiple responses; these are several independent responses each coming from a particular statistical distribution with possibly different parameter values.
In regression analysis, an interval predictor model (IPM) is an approach to regression where bounds on the function to be approximated are obtained. This differs from other techniques in machine learning, where usually one wishes to estimate point values or an entire probability distribution. Interval predictor models are sometimes referred to as a nonparametric regression technique, because a potentially infinite set of functions is contained by the IPM, and no specific distribution is implied for the regressed variables.
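A minimal sketch of one simple IPM variant: two bounding lines that enclose all observations, with the average width minimised by linear programming. The data and the linear parametrisation are illustrative, not the general IPM formulation.

```python
# Linear interval predictor: lines a_lo + b_lo x <= y <= a_hi + b_hi x.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 1, 30))
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, x.size)

n = x.size
# variables: [a_lo, b_lo, a_hi, b_hi]; minimise total width at the data
c = np.array([-n, -x.sum(), n, x.sum()])
A_ub = np.block([
    [np.ones((n, 1)), x[:, None], np.zeros((n, 2))],    # lower line <= y
    [np.zeros((n, 2)), -np.ones((n, 1)), -x[:, None]],  # upper line >= y
])
b_ub = np.concatenate([y, -y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
a_lo, b_lo, a_hi, b_hi = res.x
print(f"lower: y = {a_lo:.3f} + {b_lo:.3f} x")
print(f"upper: y = {a_hi:.3f} + {b_hi:.3f} x")
```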
Nonlinear mixed-effects models constitute a class of statistical models generalizing linear mixed-effects models. Like linear mixed-effects models, they are particularly useful in settings where there are multiple measurements within the same statistical units or when there are dependencies between measurements on related statistical units. Nonlinear mixed-effects models are applied in many fields including medicine, public health, pharmacology, and ecology.