Well-behaved statistic

Although the term well-behaved statistic is often used in the scientific literature in much the same way as well-behaved in mathematics (that is, to mean "non-pathological" [1] [2]), it can also be given a precise mathematical meaning, and in more than one way. In the former case, the meaning of the term varies from context to context. In the latter case, the mathematical conditions can be used to derive classes of combinations of distributions with statistics which are well-behaved in each sense.

First Definition: A well-behaved statistical estimator has finite variance, and its mean is differentiable in the parameter being estimated. [3]

Second Definition: The statistic is monotonic, well-defined, and locally sufficient. [4]

Conditions for a Well-Behaved Statistic: First Definition

More formally, the conditions can be expressed in this way. T is a statistic for θ that is a function of the sample X_1, …, X_n. For T to be well-behaved we require:

Var(T) < ∞ : Condition 1

E_θ[T] differentiable in θ, and the derivative satisfies:

∂/∂θ E_θ[T] = ∫ T(x) ∂f(x; θ)/∂θ dx : Condition 2

where f(x; θ) is the joint density of the sample; that is, differentiation under the integral sign is valid.
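As an illustrative sanity check (not part of the original text; NumPy, the sample sizes, and the choice of the Bernoulli example are assumptions), the two conditions can be verified numerically for the sample mean of a Bernoulli(p) sample, whose expected value E_p[T] = p has derivative 1 in p:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10          # sample size
n_mc = 200_000  # Monte Carlo replications

def mean_of_statistic(p):
    # Monte Carlo estimate of E_p[T] for T = mean of m Bernoulli(p) draws
    return rng.binomial(1, p, size=(n_mc, m)).mean()

# Condition 1: Var(T) is finite (exactly p(1 - p)/m here)
p = 0.3
T = rng.binomial(1, p, size=(n_mc, m)).mean(axis=1)
assert np.isfinite(T.var())

# Condition 2: E_p[T] = p is differentiable in p; a central finite
# difference of the Monte Carlo mean should be close to the exact slope 1
h = 0.05
slope = (mean_of_statistic(p + h) - mean_of_statistic(p - h)) / (2 * h)
assert abs(slope - 1.0) < 0.05
```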

Conditions for a Well-Behaved Statistic: Second Definition

In order to derive the distribution law of the parameter T compatible with an observed sample x, the statistic must obey some technical properties. Namely, a statistic s is said to be well-behaved if it satisfies the following three statements:

  1. monotonicity. A uniformly monotone relation exists between s and θ for any fixed seed {z_1, …, z_m}, so as to have a unique solution of (1);
  2. well-defined. The statistic is well defined for every value of θ, i.e. any sample specification has a probability density different from 0, so as to avoid considering a non-surjective mapping, i.e. associating via (1) to a sample a θ that could not generate the sample itself;
  3. local sufficiency. The seeds constitute a true T sample for the observed s, so that the same probability distribution can be attributed to each sampled value. A solution of (1) is obtained with the seeds {z_1, …, z_m}. Since the seeds are equally distributed, the sole caveat comes from their independence of, or, conversely, their dependence on, θ itself. This check can be restricted to the seeds involved by s, i.e. the drawback can be avoided by requiring that the distribution of the seeds, mapped into s specifications, is independent of θ. The mapping of course depends on θ, but the distribution of the mapped seeds will not depend on θ if the above seed independence holds, a condition that looks like a local sufficiency of the statistic s.
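These requirements can be checked numerically in simple cases. The sketch below is a hypothetical illustration (it assumes the rate parametrization x = −ln(u)/λ of the exponential distribution, and NumPy): for fixed seeds, the sum statistic is strictly monotone in λ, which guarantees a unique solution of the master equation (1):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5
seeds = rng.uniform(size=m)  # fixed uniform seeds z_1, ..., z_m

def statistic(lam, u):
    # sum statistic under the explaining function x = -ln(u)/lam
    return np.sum(-np.log(u)) / lam

# Monotonicity: for fixed seeds, s is strictly decreasing in lambda,
# hence the master equation s(lambda) = s_obs has a unique root
lams = np.linspace(0.1, 10.0, 100)
vals = [statistic(lam, seeds) for lam in lams]
assert all(a > b for a, b in zip(vals, vals[1:]))

# The unique solution is available in closed form in this case
s_obs = 3.7
lam_hat = np.sum(-np.log(seeds)) / s_obs
assert np.isclose(statistic(lam_hat, seeds), s_obs)
```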

The remainder of the present article is mainly concerned with the context of data mining procedures applied to statistical inference and, in particular, with the group of computationally intensive procedures that have been called algorithmic inference.

Algorithmic inference

In algorithmic inference, the property of a statistic that is of most relevance is the pivoting step, which allows the transfer of probability considerations from the sample distribution to the distribution of the parameters representing the population distribution, in such a way that the conclusion of this statistical inference step is compatible with the sample actually observed.

By default, capital letters (such as U, X) will denote random variables, small letters (u, x) their corresponding realizations, and gothic letters the domain where the variable takes its specifications. Facing a sample {x_1, …, x_m}, given a sampling mechanism (g_θ, Z) for the random variable X, with θ scalar, we have

x_j = g_θ(z_j),   j = 1, …, m.

The sampling mechanism of the statistic s, as a function ρ of {x_1, …, x_m}, has an explaining function defined by the master equation:

s = ρ(x_1, …, x_m) = ρ(g_θ(z_1), …, g_θ(z_m)) = h(θ; z_1, …, z_m),    (1)

for suitable seeds {z_1, …, z_m} and parameter θ.

Example

For instance, for both the Bernoulli distribution with parameter p and the exponential distribution with parameter λ, the statistic s = Σ_{i=1}^m x_i is well-behaved. The satisfaction of the above three properties is straightforward when looking at both explaining functions: x = 1 if u ≤ p, x = 0 otherwise, in the case of the Bernoulli random variable, and x = −ln(u)/λ for the exponential random variable, giving rise to the statistics

s_p = Σ_{i=1}^m x_i

and

s_λ = Σ_{i=1}^m x_i.

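The two explaining functions can be sketched as inverse-CDF maps from uniform seeds to observations. This hypothetical snippet (assuming the rate parametrization of the exponential, and NumPy) builds the sum statistics directly from the seeds:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 8
u = rng.uniform(size=m)  # seeds u_1, ..., u_m

def g_bernoulli(p, u):
    # explaining function for Bernoulli(p): x = 1 if u <= p, 0 otherwise
    return (u <= p).astype(int)

def g_exponential(lam, u):
    # explaining function for the exponential with rate lam (inverse CDF)
    return -np.log(u) / lam

# both statistics are the sample sums of the generated observations
s_p = int(g_bernoulli(0.4, u).sum())
s_lam = float(g_exponential(2.0, u).sum())

assert 0 <= s_p <= m
assert s_lam > 0
```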
Vice versa, in the case of X following a continuous uniform distribution on [0, a], the same statistics do not meet the second requirement. The explaining function of this X is x = a·u, so the master equation for the sum statistic, Σ_{i=1}^m a·u_i = s_a, may produce with a U sample a solution â smaller than the largest observed value. This conflicts with the observed sample, since an observed value would then be greater than the right extreme of the X range. The statistic s_a = max{x_1, …, x_m} is, instead, well-behaved in this case.
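The incompatibility can be seen by simulation. In this hypothetical sketch (the observed sample values are made up for illustration), solving the master equation of the sum statistic often returns a range end â smaller than the largest observation, while the max statistic never does:

```python
import numpy as np

rng = np.random.default_rng(3)
x_obs = np.array([0.26, 0.91, 0.48])  # made-up observed sample from U[0, a]
m = len(x_obs)

incompatible = 0
trials = 10_000
for _ in range(trials):
    u = rng.uniform(size=m)
    # master equation for the sum statistic: a * sum(u) = sum(x_obs)
    a_hat = x_obs.sum() / u.sum()
    # incompatible whenever a_hat < max(x_obs): an observation would
    # then exceed the right extreme of the X range
    if a_hat < x_obs.max():
        incompatible += 1
assert incompatible > 0

# the max statistic cannot fail this way:
# a_hat = max(x_obs) / max(u) >= max(x_obs), because max(u) <= 1
u = rng.uniform(size=m)
a_hat_max = x_obs.max() / u.max()
assert a_hat_max >= x_obs.max()
```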

Analogously, for a random variable X following the Pareto distribution with parameters K and A (see the Pareto example for more detail of this case), the statistics

s_1 = Σ_{i=1}^m ln x_i

and

s_2 = min{x_1, …, x_m}

can be used as joint statistics for these parameters.
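A hypothetical sketch of the Pareto case (assuming the inverse-CDF explaining function x = k·u^(−1/a), with scale k and shape a; the parametrization and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
m, k, a = 50, 2.0, 3.0  # assumed scale k and shape a

u = rng.uniform(size=m)
x = k * u ** (-1.0 / a)  # explaining function: inverse of the Pareto CDF

# joint statistics for the two parameters
s1 = float(np.log(x).sum())  # log-sum, informative about the shape
s2 = float(x.min())          # sample minimum, informative about the scale

assert s2 >= k               # every Pareto draw is at least the scale k
assert s1 >= m * np.log(k)   # hence the log-sum is at least m * ln(k)
```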

As a general statement that holds under weak conditions, sufficient statistics are well-behaved with respect to the related parameters. The table below gives sufficient/well-behaved statistics for the parameters of some of the most commonly used probability distributions.

Common distribution laws together with related sufficient and well-behaved statistics:

  Uniform discrete: f(x; n) = 1/n for x ∈ {1, …, n}; statistic: max{x_1, …, x_m}
  Bernoulli: f(x; p) = p^x (1 − p)^(1−x) for x ∈ {0, 1}; statistic: Σ x_i
  Binomial: f(x; p) = C(n, x) p^x (1 − p)^(n−x); statistic: Σ x_i
  Geometric: f(x; p) = p (1 − p)^x; statistic: Σ x_i
  Poisson: f(x; μ) = e^(−μ) μ^x / x!; statistic: Σ x_i
  Uniform continuous: f(x; a) = 1/a for x ∈ [0, a]; statistic: max{x_1, …, x_m}
  Negative exponential: f(x; λ) = λ e^(−λx); statistic: Σ x_i
  Pareto: f(x; a, k) = (a/k)(k/x)^(a+1) for x ≥ k; statistics: Σ ln x_i and min{x_1, …, x_m}
  Gaussian: f(x; μ, σ) = e^(−(x−μ)²/(2σ²)) / (σ√(2π)); statistics: Σ x_i and Σ x_i²
  Gamma: f(x; r, λ) = λ^r x^(r−1) e^(−λx) / Γ(r); statistics: Σ ln x_i and Σ x_i


References

  1. Dawn Iacobucci. "Mediation analysis and categorical variables: The final frontier" (PDF). Retrieved 7 February 2017.
  2. John DiNardo; Jason Winfree. "The Law of Genius and Home Runs Refuted" (PDF). Retrieved 7 February 2017.
  3. A. DasGupta. "(no title)" (PDF). Retrieved 7 February 2017.
  4. Apolloni, B.; Bassis, S.; Malchiodi, D.; Pedrycz, W. (2008). The Puzzle of Granular Computing. Studies in Computational Intelligence. Vol. 138. Berlin: Springer.