Bayesian multivariate linear regression

In statistics, Bayesian multivariate linear regression is a Bayesian approach to multivariate linear regression, i.e. linear regression where the predicted outcome is a vector of correlated random variables rather than a single scalar random variable. A more general treatment of this approach can be found in the article MMSE estimator.

Details

Consider a regression problem where the dependent variable to be predicted is not a single real-valued scalar but an m-length vector of correlated real numbers. As in the standard regression setup, there are n observations, where each observation i consists of k−1 explanatory variables, grouped into a vector $\mathbf{x}_i$ of length k (where a dummy variable with a value of 1 has been added to allow for an intercept coefficient). This can be viewed as a set of m related regression problems for each observation i:

$$
\begin{aligned}
y_{i,1} &= \mathbf{x}_i^{\mathsf T}\boldsymbol{\beta}_1 + \epsilon_{i,1} \\
&\;\;\vdots \\
y_{i,m} &= \mathbf{x}_i^{\mathsf T}\boldsymbol{\beta}_m + \epsilon_{i,m}
\end{aligned}
$$

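where the errors $\{\epsilon_{i,1}, \ldots, \epsilon_{i,m}\}$ are all correlated. Equivalently, it can be viewed as a single regression problem where the outcome is a row vector $\mathbf{y}_i^{\mathsf T}$ and the regression coefficient vectors are stacked next to each other, as follows:

$$
\mathbf{y}_i^{\mathsf T} = \mathbf{x}_i^{\mathsf T}\mathbf{B} + \boldsymbol{\epsilon}_i^{\mathsf T}.
$$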

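The coefficient matrix $\mathbf{B}$ is a $k \times m$ matrix where the coefficient vectors $\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_m$ for each regression problem are stacked horizontally:

$$
\mathbf{B} = \begin{bmatrix} \boldsymbol{\beta}_1 & \cdots & \boldsymbol{\beta}_m \end{bmatrix}
= \begin{bmatrix}
\beta_{1,1} & \cdots & \beta_{1,m} \\
\vdots & \ddots & \vdots \\
\beta_{k,1} & \cdots & \beta_{k,m}
\end{bmatrix}.
$$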

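The noise vector $\boldsymbol{\epsilon}_i$ for each observation i is jointly normal, so that the outcomes for a given observation are correlated:

$$
\boldsymbol{\epsilon}_i \sim N(\mathbf{0}, \boldsymbol{\Sigma}_\epsilon).
$$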

We can write the entire regression problem in matrix form as:

$$
\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E},
$$

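where $\mathbf{Y}$ and $\mathbf{E}$ are $n \times m$ matrices. The design matrix $\mathbf{X}$ is an $n \times k$ matrix with the observations stacked vertically, as in the standard linear regression setup:

$$
\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^{\mathsf T} \\ \mathbf{x}_2^{\mathsf T} \\ \vdots \\ \mathbf{x}_n^{\mathsf T} \end{bmatrix}
= \begin{bmatrix}
x_{1,1} & \cdots & x_{1,k} \\
x_{2,1} & \cdots & x_{2,k} \\
\vdots & \ddots & \vdots \\
x_{n,1} & \cdots & x_{n,k}
\end{bmatrix}.
$$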

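The classical, frequentist linear least squares solution is to estimate the matrix of regression coefficients $\hat{\mathbf{B}}$ using the Moore–Penrose pseudoinverse, which for a design matrix of full column rank is:

$$
\hat{\mathbf{B}} = (\mathbf{X}^{\mathsf T}\mathbf{X})^{-1}\mathbf{X}^{\mathsf T}\mathbf{Y}.
$$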
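As an illustration of the least squares estimate, the following NumPy sketch simulates a small multivariate regression and computes the coefficient matrix three equivalent ways. The sizes, the random seed, and the names `B_true` and `Sigma_eps` are assumptions made for the example, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 200, 3, 2  # observations, regressors (incl. intercept), outcomes

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# Hypothetical "true" parameters, used only to simulate data.
B_true = rng.normal(size=(k, m))
Sigma_eps = np.array([[1.0, 0.6],
                      [0.6, 2.0]])

# Each row of E is one jointly normal noise vector epsilon_i.
E = rng.multivariate_normal(np.zeros(m), Sigma_eps, size=n)
Y = X @ B_true + E

# Three equivalent ways to obtain the least squares estimate.
B_hat_pinv = np.linalg.pinv(X) @ Y                   # Moore-Penrose pseudoinverse
B_hat_normal = np.linalg.solve(X.T @ X, X.T @ Y)     # normal equations
B_hat_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)  # QR/SVD-based solver
```

In practice a solver such as `np.linalg.lstsq` is usually preferred to forming the inverse of $\mathbf{X}^{\mathsf T}\mathbf{X}$ explicitly, since it is better conditioned numerically.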

To obtain the Bayesian solution, we need to specify the conditional likelihood and then find the appropriate conjugate prior. As with the univariate case of linear Bayesian regression, we will find that we can specify a natural conditional conjugate prior (which is scale dependent).

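Let us write our conditional likelihood as [1]

$$
\rho(\mathbf{E} \mid \boldsymbol{\Sigma}_\epsilon) \propto |\boldsymbol{\Sigma}_\epsilon|^{-n/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left(\mathbf{E}^{\mathsf T}\mathbf{E}\,\boldsymbol{\Sigma}_\epsilon^{-1}\right)\right).
$$

Writing the error $\mathbf{E}$ in terms of $\mathbf{Y}$, $\mathbf{X}$, and $\mathbf{B}$ yields

$$
\rho(\mathbf{Y} \mid \mathbf{X}, \mathbf{B}, \boldsymbol{\Sigma}_\epsilon) \propto |\boldsymbol{\Sigma}_\epsilon|^{-n/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left((\mathbf{Y}-\mathbf{X}\mathbf{B})^{\mathsf T}(\mathbf{Y}-\mathbf{X}\mathbf{B})\,\boldsymbol{\Sigma}_\epsilon^{-1}\right)\right).
$$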

We seek a natural conjugate prior, that is, a joint density $\rho(\mathbf{B}, \boldsymbol{\Sigma}_\epsilon)$ of the same functional form as the likelihood. Since the likelihood is quadratic in $\mathbf{B}$, we rewrite the likelihood so that it is normal in $(\mathbf{B} - \hat{\mathbf{B}})$ (the deviation from the classical sample estimate).

Using the same technique as with Bayesian linear regression, we decompose the exponential term using a matrix form of the sum-of-squares technique. Here, however, we will also need tools from matrix differential calculus (the Kronecker product and vectorization transformations).

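First, let us apply sum-of-squares to obtain a new expression for the likelihood:

$$
\rho(\mathbf{Y} \mid \mathbf{X}, \mathbf{B}, \boldsymbol{\Sigma}_\epsilon) \propto |\boldsymbol{\Sigma}_\epsilon|^{-(n-k)/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left(\mathbf{S}^{\mathsf T}\mathbf{S}\,\boldsymbol{\Sigma}_\epsilon^{-1}\right)\right)
|\boldsymbol{\Sigma}_\epsilon|^{-k/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left((\mathbf{B}-\hat{\mathbf{B}})^{\mathsf T}\mathbf{X}^{\mathsf T}\mathbf{X}(\mathbf{B}-\hat{\mathbf{B}})\,\boldsymbol{\Sigma}_\epsilon^{-1}\right)\right),
$$

where $\mathbf{S} = \mathbf{Y} - \mathbf{X}\hat{\mathbf{B}}$ is the matrix of least squares residuals. The decomposition holds because $\mathbf{X}^{\mathsf T}\mathbf{S} = \mathbf{0}$, so the cross terms vanish.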
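The sum-of-squares step can be checked numerically. The sketch below, using arbitrary random matrices (the sizes and seed are assumptions for illustration), confirms that the residual cross-product splits into a least squares residual part and a quadratic form in the deviation of the coefficients from the least squares estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 50, 4, 3
X = rng.normal(size=(n, k))
Y = rng.normal(size=(n, m))
B = rng.normal(size=(k, m))  # an arbitrary coefficient matrix

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # least squares estimate
S = Y - X @ B_hat                          # least squares residuals
D = B - B_hat                              # deviation from the estimate

# (Y - XB)^T (Y - XB) = S^T S + (B - B_hat)^T X^T X (B - B_hat)
lhs = (Y - X @ B).T @ (Y - X @ B)
rhs = S.T @ S + D.T @ (X.T @ X) @ D

# The cross terms vanish because the residuals are orthogonal to X.
cross = X.T @ S
```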

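We would like to develop a conditional form for the priors:

$$
\rho(\mathbf{B}, \boldsymbol{\Sigma}_\epsilon) = \rho(\boldsymbol{\Sigma}_\epsilon)\,\rho(\mathbf{B} \mid \boldsymbol{\Sigma}_\epsilon),
$$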

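where $\rho(\boldsymbol{\Sigma}_\epsilon)$ is an inverse-Wishart distribution and $\rho(\mathbf{B} \mid \boldsymbol{\Sigma}_\epsilon)$ is some form of normal distribution in the matrix $\mathbf{B}$. This is accomplished using the vectorization transformation, which converts the likelihood from a function of the matrices $\mathbf{B}, \hat{\mathbf{B}}$ to a function of the vectors $\boldsymbol{\beta} = \operatorname{vec}(\mathbf{B})$, $\hat{\boldsymbol{\beta}} = \operatorname{vec}(\hat{\mathbf{B}})$.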

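Write

$$
\operatorname{tr}\left((\mathbf{B}-\hat{\mathbf{B}})^{\mathsf T}\mathbf{X}^{\mathsf T}\mathbf{X}(\mathbf{B}-\hat{\mathbf{B}})\,\boldsymbol{\Sigma}_\epsilon^{-1}\right) = \operatorname{vec}(\mathbf{B}-\hat{\mathbf{B}})^{\mathsf T}\operatorname{vec}\left(\mathbf{X}^{\mathsf T}\mathbf{X}(\mathbf{B}-\hat{\mathbf{B}})\,\boldsymbol{\Sigma}_\epsilon^{-1}\right).
$$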

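Let

$$
\operatorname{vec}\left(\mathbf{X}^{\mathsf T}\mathbf{X}(\mathbf{B}-\hat{\mathbf{B}})\,\boldsymbol{\Sigma}_\epsilon^{-1}\right) = \left(\boldsymbol{\Sigma}_\epsilon^{-1} \otimes \mathbf{X}^{\mathsf T}\mathbf{X}\right)\operatorname{vec}(\mathbf{B}-\hat{\mathbf{B}}),
$$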

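where $\mathbf{A} \otimes \mathbf{B}$ denotes the Kronecker product of matrices A and B (generic matrices here, not the coefficient matrix), a generalization of the outer product which multiplies a $p \times q$ matrix by an $r \times s$ matrix to generate a $pr \times qs$ matrix, consisting of every combination of products of elements from the two matrices.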

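Then

$$
\operatorname{vec}(\mathbf{B}-\hat{\mathbf{B}})^{\mathsf T}\left(\boldsymbol{\Sigma}_\epsilon^{-1} \otimes \mathbf{X}^{\mathsf T}\mathbf{X}\right)\operatorname{vec}(\mathbf{B}-\hat{\mathbf{B}}) = (\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})^{\mathsf T}\left(\boldsymbol{\Sigma}_\epsilon^{-1} \otimes \mathbf{X}^{\mathsf T}\mathbf{X}\right)(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}}),
$$

which will lead to a likelihood which is normal in $(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})$.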
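The vectorization step relies on the standard identity relating the vec operator to the Kronecker product. A small numerical check is sketched below; the helper `vec`, the matrix sizes, and the way the covariance `Sigma` is generated are assumptions for illustration. Note that `vec` must stack columns (column-major order) for the identity to hold with `np.kron`.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(2)
n, k, m = 30, 3, 2
X = rng.normal(size=(n, k))
D = rng.normal(size=(k, m))      # stands in for B - B_hat

A = rng.normal(size=(m, m))
Sigma = A @ A.T + m * np.eye(m)  # a symmetric positive definite covariance
Sigma_inv = np.linalg.inv(Sigma)
XtX = X.T @ X

# vec(X^T X D Sigma^{-1}) = (Sigma^{-1} kron X^T X) vec(D), using Sigma symmetric.
vec_lhs = vec(XtX @ D @ Sigma_inv)
vec_rhs = np.kron(Sigma_inv, XtX) @ vec(D)

# The trace in the likelihood equals the quadratic form in the vectorized deviation.
trace_form = np.trace(D.T @ XtX @ D @ Sigma_inv)
kron_form = vec(D) @ np.kron(Sigma_inv, XtX) @ vec(D)
```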

With the likelihood in a more tractable form, we can now find a natural (conditional) conjugate prior.

Conjugate prior distribution

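The natural conjugate prior using the vectorized variable $\boldsymbol{\beta}$ is of the form: [1]

$$
\rho(\boldsymbol{\beta}, \boldsymbol{\Sigma}_\epsilon) = \rho(\boldsymbol{\Sigma}_\epsilon)\,\rho(\boldsymbol{\beta} \mid \boldsymbol{\Sigma}_\epsilon),
$$

where

$$
\boldsymbol{\Sigma}_\epsilon \sim \mathcal{W}^{-1}(\mathbf{V}_0, \nu_0)
$$

and

$$
\boldsymbol{\beta} \mid \boldsymbol{\Sigma}_\epsilon \sim N\left(\boldsymbol{\beta}_0,\; \boldsymbol{\Sigma}_\epsilon \otimes \boldsymbol{\Lambda}_0^{-1}\right).
$$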
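A normal prior on the vectorized coefficients with a Kronecker-structured covariance is equivalent to a matrix normal prior on the coefficient matrix itself. The sketch below draws the coefficient matrix conditional on a fixed error covariance using Cholesky factors and checks the implied covariance of the vectorized draw. The hyperparameter values and the helper `draw_B` are hypothetical, and the inverse-Wishart draw of the covariance is omitted.

```python
import numpy as np

rng = np.random.default_rng(5)
k, m = 3, 2

# Hypothetical prior hyperparameters and a fixed error covariance.
B0 = np.zeros((k, m))
Lambda0 = np.diag([1.0, 2.0, 4.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

Lambda0_inv = np.linalg.inv(Lambda0)
L_row = np.linalg.cholesky(Lambda0_inv)  # L_row @ L_row.T == Lambda0_inv
L_col = np.linalg.cholesky(Sigma)        # L_col @ L_col.T == Sigma

def draw_B(rng):
    """Draw B given Sigma so that vec(B) has covariance Sigma kron Lambda0_inv."""
    Z = rng.standard_normal((k, m))
    return B0 + L_row @ Z @ L_col.T

# vec(L_row Z L_col^T) = (L_col kron L_row) vec(Z), so the covariance of vec(B) is K K^T.
K = np.kron(L_col, L_row)
implied_cov = K @ K.T
```

This avoids forming the full $km \times km$ covariance when sampling, which matters when k or m is large.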

Posterior distribution

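Using the above prior and likelihood, the posterior distribution can be expressed as: [1]

$$
\begin{aligned}
\rho(\boldsymbol{\beta}, \boldsymbol{\Sigma}_\epsilon \mid \mathbf{Y}, \mathbf{X}) \propto{} & |\boldsymbol{\Sigma}_\epsilon|^{-(\nu_0+m+1)/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left(\mathbf{V}_0\boldsymbol{\Sigma}_\epsilon^{-1}\right)\right) \\
& \times |\boldsymbol{\Sigma}_\epsilon|^{-k/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left((\mathbf{B}-\mathbf{B}_0)^{\mathsf T}\boldsymbol{\Lambda}_0(\mathbf{B}-\mathbf{B}_0)\,\boldsymbol{\Sigma}_\epsilon^{-1}\right)\right) \\
& \times |\boldsymbol{\Sigma}_\epsilon|^{-n/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left((\mathbf{Y}-\mathbf{X}\mathbf{B})^{\mathsf T}(\mathbf{Y}-\mathbf{X}\mathbf{B})\,\boldsymbol{\Sigma}_\epsilon^{-1}\right)\right),
\end{aligned}
$$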

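where $\operatorname{vec}(\mathbf{B}_0) = \boldsymbol{\beta}_0$. The terms involving $\mathbf{B}$ can be grouped (with $\boldsymbol{\Lambda}_0 = \mathbf{U}^{\mathsf T}\mathbf{U}$) using:

$$
\begin{aligned}
& (\mathbf{B}-\mathbf{B}_0)^{\mathsf T}\boldsymbol{\Lambda}_0(\mathbf{B}-\mathbf{B}_0) + (\mathbf{Y}-\mathbf{X}\mathbf{B})^{\mathsf T}(\mathbf{Y}-\mathbf{X}\mathbf{B}) \\
={} & \left(\begin{bmatrix}\mathbf{Y}\\ \mathbf{U}\mathbf{B}_0\end{bmatrix} - \begin{bmatrix}\mathbf{X}\\ \mathbf{U}\end{bmatrix}\mathbf{B}\right)^{\mathsf T}\left(\begin{bmatrix}\mathbf{Y}\\ \mathbf{U}\mathbf{B}_0\end{bmatrix} - \begin{bmatrix}\mathbf{X}\\ \mathbf{U}\end{bmatrix}\mathbf{B}\right) \\
={} & \left(\begin{bmatrix}\mathbf{Y}\\ \mathbf{U}\mathbf{B}_0\end{bmatrix} - \begin{bmatrix}\mathbf{X}\\ \mathbf{U}\end{bmatrix}\mathbf{B}_n\right)^{\mathsf T}\left(\begin{bmatrix}\mathbf{Y}\\ \mathbf{U}\mathbf{B}_0\end{bmatrix} - \begin{bmatrix}\mathbf{X}\\ \mathbf{U}\end{bmatrix}\mathbf{B}_n\right) + (\mathbf{B}-\mathbf{B}_n)^{\mathsf T}\left(\mathbf{X}^{\mathsf T}\mathbf{X}+\boldsymbol{\Lambda}_0\right)(\mathbf{B}-\mathbf{B}_n) \\
={} & (\mathbf{Y}-\mathbf{X}\mathbf{B}_n)^{\mathsf T}(\mathbf{Y}-\mathbf{X}\mathbf{B}_n) + (\mathbf{B}_0-\mathbf{B}_n)^{\mathsf T}\boldsymbol{\Lambda}_0(\mathbf{B}_0-\mathbf{B}_n) + (\mathbf{B}-\mathbf{B}_n)^{\mathsf T}\left(\mathbf{X}^{\mathsf T}\mathbf{X}+\boldsymbol{\Lambda}_0\right)(\mathbf{B}-\mathbf{B}_n),
\end{aligned}
$$

with

$$
\mathbf{B}_n = \left(\mathbf{X}^{\mathsf T}\mathbf{X}+\boldsymbol{\Lambda}_0\right)^{-1}\left(\mathbf{X}^{\mathsf T}\mathbf{X}\hat{\mathbf{B}}+\boldsymbol{\Lambda}_0\mathbf{B}_0\right) = \left(\mathbf{X}^{\mathsf T}\mathbf{X}+\boldsymbol{\Lambda}_0\right)^{-1}\left(\mathbf{X}^{\mathsf T}\mathbf{Y}+\boldsymbol{\Lambda}_0\mathbf{B}_0\right).
$$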
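The grouping of the terms in the coefficient matrix is a completing-the-square argument on an augmented regression, and it can be verified numerically. In the sketch below the random inputs, sizes, and seed are assumptions for illustration; `U` plays the role of a square-root factor of the prior precision.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 40, 3, 2
X = rng.normal(size=(n, k))
Y = rng.normal(size=(n, m))
B = rng.normal(size=(k, m))   # arbitrary coefficient matrix
B0 = rng.normal(size=(k, m))  # prior mean
U = rng.normal(size=(k, k))
Lambda0 = U.T @ U             # prior precision, Lambda0 = U^T U

Lambda_n = X.T @ X + Lambda0
B_n = np.linalg.solve(Lambda_n, X.T @ Y + Lambda0 @ B0)

# Left side: prior quadratic form plus likelihood quadratic form.
lhs = (B - B0).T @ Lambda0 @ (B - B0) + (Y - X @ B).T @ (Y - X @ B)

# Right side: terms free of B, plus a quadratic form centred at B_n.
rhs = ((Y - X @ B_n).T @ (Y - X @ B_n)
       + (B0 - B_n).T @ Lambda0 @ (B0 - B_n)
       + (B - B_n).T @ Lambda_n @ (B - B_n))

# B_n is also the least squares solution of the augmented (stacked) regression.
X_aug = np.vstack([X, U])
Y_aug = np.vstack([Y, U @ B0])
B_aug, *_ = np.linalg.lstsq(X_aug, Y_aug, rcond=None)
```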

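This now allows us to write the posterior in a more useful form:

$$
\begin{aligned}
\rho(\boldsymbol{\beta}, \boldsymbol{\Sigma}_\epsilon \mid \mathbf{Y}, \mathbf{X}) \propto{} & |\boldsymbol{\Sigma}_\epsilon|^{-(\nu_0+m+n+1)/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left(\left(\mathbf{V}_0 + (\mathbf{Y}-\mathbf{X}\mathbf{B}_n)^{\mathsf T}(\mathbf{Y}-\mathbf{X}\mathbf{B}_n) + (\mathbf{B}_n-\mathbf{B}_0)^{\mathsf T}\boldsymbol{\Lambda}_0(\mathbf{B}_n-\mathbf{B}_0)\right)\boldsymbol{\Sigma}_\epsilon^{-1}\right)\right) \\
& \times |\boldsymbol{\Sigma}_\epsilon|^{-k/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left((\mathbf{B}-\mathbf{B}_n)^{\mathsf T}\left(\mathbf{X}^{\mathsf T}\mathbf{X}+\boldsymbol{\Lambda}_0\right)(\mathbf{B}-\mathbf{B}_n)\,\boldsymbol{\Sigma}_\epsilon^{-1}\right)\right).
\end{aligned}
$$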

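This takes the form of an inverse-Wishart distribution times a matrix normal distribution:

$$
\boldsymbol{\Sigma}_\epsilon \mid \mathbf{Y}, \mathbf{X} \sim \mathcal{W}^{-1}(\mathbf{V}_n, \nu_n)
$$

and

$$
\mathbf{B} \mid \mathbf{Y}, \mathbf{X}, \boldsymbol{\Sigma}_\epsilon \sim \mathcal{MN}_{k,m}\left(\mathbf{B}_n,\; \boldsymbol{\Lambda}_n^{-1},\; \boldsymbol{\Sigma}_\epsilon\right).
$$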

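The parameters of this posterior are given by:

$$
\begin{aligned}
\mathbf{V}_n &= \mathbf{V}_0 + (\mathbf{Y}-\mathbf{X}\mathbf{B}_n)^{\mathsf T}(\mathbf{Y}-\mathbf{X}\mathbf{B}_n) + (\mathbf{B}_n-\mathbf{B}_0)^{\mathsf T}\boldsymbol{\Lambda}_0(\mathbf{B}_n-\mathbf{B}_0) \\
\nu_n &= \nu_0 + n \\
\mathbf{B}_n &= \left(\mathbf{X}^{\mathsf T}\mathbf{X}+\boldsymbol{\Lambda}_0\right)^{-1}\left(\mathbf{X}^{\mathsf T}\mathbf{Y}+\boldsymbol{\Lambda}_0\mathbf{B}_0\right) \\
\boldsymbol{\Lambda}_n &= \mathbf{X}^{\mathsf T}\mathbf{X}+\boldsymbol{\Lambda}_0
\end{aligned}
$$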
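The posterior update can be sketched in a few lines of NumPy. The function name `posterior_params`, the simulated data, and the weak prior hyperparameters are assumptions chosen for illustration. The last line uses the standard inverse-Wishart mean, which exists only when the degrees of freedom exceed m + 1.

```python
import numpy as np

def posterior_params(X, Y, B0, Lambda0, V0, nu0):
    """Return (B_n, Lambda_n, V_n, nu_n) for the conjugate posterior."""
    n = X.shape[0]
    Lambda_n = X.T @ X + Lambda0
    B_n = np.linalg.solve(Lambda_n, X.T @ Y + Lambda0 @ B0)
    resid = Y - X @ B_n  # residuals at the posterior mean
    shift = B_n - B0     # movement away from the prior mean
    V_n = V0 + resid.T @ resid + shift.T @ Lambda0 @ shift
    nu_n = nu0 + n
    return B_n, Lambda_n, V_n, nu_n

# Simulated data with hypothetical "true" parameters.
rng = np.random.default_rng(4)
n, k, m = 500, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
B_true = np.array([[1.0, -2.0],
                   [0.5, 0.0],
                   [0.0, 3.0]])
Sigma_true = np.array([[1.0, 0.5],
                       [0.5, 1.5]])
Y = X @ B_true + rng.multivariate_normal(np.zeros(m), Sigma_true, size=n)

# A weak prior (assumed values): zero mean, small precision.
B0 = np.zeros((k, m))
Lambda0 = 1e-2 * np.eye(k)
V0 = np.eye(m)
nu0 = m + 2

B_n, Lambda_n, V_n, nu_n = posterior_params(X, Y, B0, Lambda0, V0, nu0)

# Posterior mean of the error covariance (valid because nu_n > m + 1).
Sigma_post_mean = V_n / (nu_n - m - 1)
```

With a prior precision this small, the posterior mean of the coefficients is nearly the least squares estimate; a larger precision shrinks it toward the prior mean.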

References

  1. Peter E. Rossi, Greg M. Allenby, Rob McCulloch. Bayesian Statistics and Marketing. John Wiley & Sons, 2012, p. 32.