Bayes linear statistics

Bayes linear statistics is a subjectivist statistical methodology and framework. Traditional subjective Bayesian analysis is based upon fully specified probability distributions, which are very difficult to specify at the necessary level of detail. Bayes linear analysis attempts to solve this problem by developing theory and practice for using partially specified probability models. Bayes linear statistics in its current form has been primarily developed by Michael Goldstein. Mathematically and philosophically, it extends Bruno de Finetti's operational subjective approach to probability and statistics.

Motivation

Consider first a traditional Bayesian analysis in which you expect to know D shortly and would like to know more about some other observable B. The traditional Bayesian approach requires that every possible outcome be enumerated, i.e. every element of the cross product of the partitions of B and D. If represented on a computer where B requires n bits and D requires m bits, then the number of states required is $2^{n+m}$. The first step in such an analysis is to determine a person's subjective probabilities, e.g. by asking about their betting behaviour for each of these outcomes. When we learn D, conditional probabilities for B are determined by the application of Bayes' rule.
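To make the size of this enumeration concrete: if B takes n bits and D takes m bits, each joint outcome is an (n+m)-bit string, so there are 2^(n+m) of them. The short sketch below just evaluates that count; the function name and the bit sizes are illustrative choices, not part of the original text.

```python
def joint_state_count(n_bits_b: int, m_bits_d: int) -> int:
    """Number of joint outcomes when B needs n bits and D needs m bits."""
    return 2 ** (n_bits_b + m_bits_d)


# Each pair is (bits for B, bits for D); the sizes are illustrative only.
for n, m in [(1, 1), (8, 8), (32, 32)]:
    print(f"n={n}, m={m}: {joint_state_count(n, m)} joint states")
```

Even two 32-bit quantities already give roughly 1.8 × 10^19 joint states, far more than anyone could assign individual subjective probabilities to.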

Practitioners of subjective Bayesian statistics routinely analyse datasets where this set is so large that subjective probabilities cannot be meaningfully determined for every element of D×B. This is normally handled by assuming exchangeability, using parameterized models with prior distributions over the parameters, and appealing to de Finetti's theorem to justify that this produces valid operational subjective probabilities over D×B. The difficulty with such an approach is that the validity of the statistical analysis requires the subjective probabilities to be a good representation of an individual's beliefs. However, this method results in a very precise specification over D×B, and it is often difficult to articulate what it would mean to adopt these belief specifications.

In contrast to the traditional Bayesian paradigm, Bayes linear statistics, following de Finetti, uses prevision (subjective expectation) as a primitive; probability is then defined as the expectation of an indicator variable. Instead of specifying a subjective probability for every element in the partition D×B, the analyst specifies subjective expectations for just a few quantities that they are interested in or feel knowledgeable about. Then, instead of conditioning, an adjusted expectation is computed by a rule that generalizes Bayes' rule and is based upon expectation.

The use of the word linear in the title refers to de Finetti's arguments that probability theory is a linear theory (de Finetti argued against the more common measure theory approach).

Example

In Bayes linear statistics, the probability model is only partially specified, so it is not possible to calculate conditional probabilities by Bayes' rule. Instead, Bayes linear analysis suggests calculating an adjusted expectation.

To conduct a Bayes linear analysis it is necessary to identify some values D that you expect to know shortly by making measurements, and some future values B which you would like to know. Here D refers to a vector containing data and B to a vector containing quantities you would like to predict. For the following example B and D are taken to be two-dimensional vectors, i.e. $B = (B_1, B_2)$ and $D = (D_1, D_2)$.

In order to specify a Bayes linear model it is necessary to supply expectations for the vectors B and D, and to also specify the correlation between each component of B and each component of D.

For example the expectations are specified as:

and the covariance matrix is specified as :

The repetition in this matrix has some interesting implications, to be discussed shortly.

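An adjusted expectation is a linear estimator of the form

$$c_0 + c_1 D_1 + c_2 D_2$$

where $D_1$ and $D_2$ are the components of D, and $c_0$, $c_1$ and $c_2$ are chosen to minimise the prior expected loss in estimating the quantities of interest, i.e. the components $B_1$ and $B_2$ of B in this case. That is, for $B_1$ the constants $c_0$, $c_1$ and $c_2$ are chosen to minimise

$$E\left(\left[B_1 - c_0 - c_1 D_1 - c_2 D_2\right]^2\right),$$

the prior expected squared-error loss in estimating $B_1$.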
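The expected squared loss above depends only on means, variances and covariances, which is why a partial specification is enough. The following sketch, written with D_1, D_2 for the components of D and B_1 for the first component of B, evaluates that loss from moments alone and finds the minimising coefficients from the normal equations. All numerical values and names are hypothetical illustrations, not the specification used in the article.

```python
import numpy as np

# Hypothetical prior specification (illustrative numbers, not from the article).
E_B1 = 2.0                              # E(B_1)
E_D = np.array([1.0, 4.0])              # E(D_1), E(D_2)
var_B1 = 1.0                            # Var(B_1)
var_D = np.array([[1.0, 0.3],
                  [0.3, 1.0]])          # Var(D)
cov_D_B1 = np.array([0.5, 0.5])         # Cov(D_1, B_1), Cov(D_2, B_1)


def expected_loss(c0, c):
    """E([B_1 - c0 - c1*D_1 - c2*D_2]^2), written in terms of moments only."""
    c = np.asarray(c, dtype=float)
    bias = E_B1 - c0 - c @ E_D                              # mean of the error
    variance = var_B1 - 2.0 * c @ cov_D_B1 + c @ var_D @ c  # variance of the error
    return bias ** 2 + variance


# Setting the gradient to zero gives Var(D) c = Cov(D, B_1) and a zero-mean error.
c_opt = np.linalg.solve(var_D, cov_D_B1)
c0_opt = E_B1 - c_opt @ E_D
best = expected_loss(c0_opt, c_opt)

print("c0, c1, c2:", c0_opt, c_opt)
print("minimal expected loss:", best)
```

Moving any coefficient away from these values increases `expected_loss`, since the loss is a convex quadratic in the coefficients whenever Var(D) is positive definite.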

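In general, the adjusted expectation of a quantity X given data $D = (D_1, \ldots, D_k)$ is the linear combination

$$E_D(X) = \sum_{i=0}^{k} h_i D_i,$$

where $D_0 = 1$ is a constant term, with $h_0, \ldots, h_k$ chosen to minimise

$$E\left(\left[X - \sum_{i=0}^{k} h_i D_i\right]^2\right).$$

From a proof provided in (Goldstein and Wooff 2007) it can be shown that:

$$E_D(X) = E(X) + \mathrm{Cov}(X, D)\,\mathrm{Var}(D)^{-1}\,(D - E(D)).$$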

For the case where Var(D) is not invertible the Moore–Penrose pseudoinverse should be used instead.
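A minimal NumPy sketch of this calculation follows, assuming the standard closed form E_D(X) = E(X) + Cov(X, D) Var(D)^+ (d - E(D)), where d is the observed value of D and ^+ denotes the Moore–Penrose pseudoinverse. The function name, argument names and numbers are hypothetical; `np.linalg.pinv` coincides with the ordinary inverse whenever Var(D) is invertible, so the same code covers both cases.

```python
import numpy as np


def adjusted_expectation(e_x, e_d, cov_xd, var_d, d_obs):
    """Return E(X) + Cov(X, D) Var(D)^+ (d_obs - E(D)).

    cov_xd has one row per component of X and one column per component of D.
    """
    e_x = np.atleast_1d(np.asarray(e_x, dtype=float))
    e_d = np.atleast_1d(np.asarray(e_d, dtype=float))
    cov_xd = np.atleast_2d(np.asarray(cov_xd, dtype=float))
    var_d = np.atleast_2d(np.asarray(var_d, dtype=float))
    d_obs = np.atleast_1d(np.asarray(d_obs, dtype=float))
    return e_x + cov_xd @ np.linalg.pinv(var_d) @ (d_obs - e_d)


# Hypothetical specification: two quantities of interest, two data values.
e_x = [10.0, 20.0]
e_d = [1.0, 2.0]
var_d = [[1.0, 0.3], [0.3, 1.0]]
cov_xd = [[0.5, 0.5], [0.5, 0.5]]

adj = adjusted_expectation(e_x, e_d, cov_xd, var_d, d_obs=[2.0, 2.5])
print(adj)

# Singular Var(D): the two data values are perfectly correlated copies.
adj_singular = adjusted_expectation(
    0.0, [0.0, 0.0], [0.5, 0.5], [[1.0, 1.0], [1.0, 1.0]], d_obs=[1.0, 1.0]
)
print(adj_singular)
```

In the singular case the pseudoinverse treats the duplicated data as a single observation, so the result matches what one copy alone would give.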

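Furthermore, the adjusted variance of the variable X after observing the data D is given by

$$\mathrm{Var}_D(X) = \mathrm{Var}(X) - \mathrm{Cov}(X, D)\,\mathrm{Var}(D)^{-1}\,\mathrm{Cov}(D, X).$$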
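The adjusted variance can be sketched in the same way, assuming the standard form Var_D(X) = Var(X) - Cov(X, D) Var(D)^+ Cov(D, X), with the pseudoinverse standing in for the inverse so that a singular Var(D) is also handled. The function name and the numerical specification are hypothetical.

```python
import numpy as np


def adjusted_variance(var_x, cov_xd, var_d):
    """Return Var(X) - Cov(X, D) Var(D)^+ Cov(D, X)."""
    var_x = np.atleast_2d(np.asarray(var_x, dtype=float))
    cov_xd = np.atleast_2d(np.asarray(cov_xd, dtype=float))
    var_d = np.atleast_2d(np.asarray(var_d, dtype=float))
    return var_x - cov_xd @ np.linalg.pinv(var_d) @ cov_xd.T


# Hypothetical second-order specification (illustrative numbers only).
var_x = [[1.0, 0.2], [0.2, 1.0]]
var_d = [[1.0, 0.3], [0.3, 1.0]]
cov_xd = [[0.5, 0.5], [0.5, 0.5]]   # rows index X, columns index D

adj_var = adjusted_variance(var_x, cov_xd, var_d)
print(adj_var)
```

Note that the observed values of D do not enter this calculation: the adjusted variance is determined by the prior variances and covariances alone, and the amount removed from Var(X) is a positive semi-definite matrix.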

References

- de Finetti, B. "Foresight: Its Logical Laws, Its Subjective Sources" (translation of the 1937 article in French), in H. E. Kyburg and H. E. Smokler (eds), Studies in Subjective Probability, New York: Wiley, 1964.