GHK algorithm

Last updated January 03, 2025

The GHK algorithm (Geweke, Hajivassiliou and Keane)^[1] is an importance sampling method for simulating choice probabilities in the multivariate probit model. The simulated probabilities can be used to recover parameter estimates by maximizing the simulated likelihood with any of the usual maximization methods (Newton's method, BFGS, etc.). Train^[2] documents the steps for implementing this algorithm for a multinomial probit model. What follows applies to the binary multivariate probit model.

Consider evaluating the choice probability $\Pr(\mathbf {y_{i}} |\mathbf {X_{i}\beta } ,\Sigma )$ , where $\mathbf {y_{i}} =(y_{1},...,y_{J}),\ (i=1,...,N)$ , $j$ indexes choices and $i$ indexes individuals or observations, $\mathbf {X_{i}\beta }$ is the mean and $\Sigma$ is the covariance matrix of the latent variable $\mathbf {y} _{i}^{*}$ . The probability of observing choice $\mathbf {y_{i}}$ is

{\begin{aligned}\Pr(\mathbf {y_{i}} |\mathbf {X_{i}\beta } ,\Sigma )=&\int _{A_{J}}\cdots \int _{A_{1}}f_{N}(\mathbf {y} _{i}^{*}|\mathbf {X_{i}\beta } ,\Sigma )dy_{1}^{*}\dots dy_{J}^{*}\\\Pr(\mathbf {y_{i}} |\mathbf {X_{i}\beta } ,\Sigma )=&\int \mathbb {1} _{y^{*}\in A}f_{N}(\mathbf {y} _{i}^{*}|\mathbf {X_{i}\beta } ,\Sigma )d\mathbf {y} _{i}^{*}\end{aligned}}

where $A=A_{1}\times \cdots \times A_{J}$ and

A_{j}={\begin{cases}(-\infty ,0]&y_{j}=0\\(0,\infty )&y_{j}=1\end{cases}}

Unless $J$ is small (less than or equal to 2), there is no closed-form solution for the integrals defined above (some work has been done with $J=3$ ^[3]). The alternative to evaluating these integrals in closed form or by quadrature is simulation. GHK simulates the probability above using importance sampling.

Evaluating $\Pr(\mathbf {y_{i}} |\mathbf {X_{i}\beta } ,\Sigma )=\int \mathbb {1} _{y^{*}\in A}f_{N}(\mathbf {y} _{i}^{*}|\mathbf {X_{i}\beta } ,\Sigma )d\mathbf {y} _{i}^{*}$ is simplified by recognizing that the latent data model $\mathbf {y_{i}^{*}} =\mathbf {X_{i}\beta } +\epsilon _{i}$ , with $\epsilon _{i}\sim N(0,\Sigma )$ , can be rewritten using a Cholesky factorization, $\Sigma =CC'$ , where $C$ is lower triangular. This gives $\mathbf {y_{i}^{*}} =\mathbf {X_{i}\beta } +C\eta _{i}$ , where the elements of $\eta _{i}$ are independent standard normals, $\eta _{i}\sim N(0,\mathbf {I} )$ .
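The factorization step can be illustrated with a minimal sketch in plain Python. The helper names `cholesky` and `draw_latent`, the covariance matrix and the mean vector are all assumptions chosen for illustration; they do not come from the references.

```python
import math
import random

def cholesky(sigma):
    """Return the lower-triangular C with C C' = sigma.

    Assumes sigma is symmetric positive definite (no checks are made).
    """
    n = len(sigma)
    c = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1):
            s = sum(c[j][m] * c[k][m] for m in range(k))
            if j == k:
                c[j][j] = math.sqrt(sigma[j][j] - s)
            else:
                c[j][k] = (sigma[j][k] - s) / c[k][k]
    return c

def draw_latent(mean, c, rng):
    """One unrestricted draw of y* = mean + C eta, with eta ~ N(0, I)."""
    eta = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(c[j][k] * eta[k] for k in range(j + 1))
            for j, m in enumerate(mean)]

# Illustrative inputs (hypothetical, not taken from the article).
sigma = [[1.0, 0.5], [0.5, 1.0]]
C = cholesky(sigma)
y_star = draw_latent([0.1, -0.2], C, random.Random(1))
```

Because $C$ is lower triangular, the $j$-th element of the draw depends only on $\eta _{1},\dots ,\eta _{j}$ , which is what makes the recursive truncation possible.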

Using this factorization and the independence of the elements of $\eta _{i}$ , one can simulate draws from a truncated multivariate normal distribution using only univariate normal draws.

For example, if the region of truncation $\mathbf {A}$ has lower and upper limits equal to $[a,b]$ (including a,b = $\pm \infty$ ) then the task becomes

{\begin{array}{lcl}a<&y_{1}^{*}&<b\\a<&y_{2}^{*}&<b\\\vdots &\vdots &\vdots \\a<&y_{J}^{*}&<b\\\end{array}}

Since $\mathbf {y_{i}^{*}} =\mathbf {X_{i}\beta } +C\eta _{i}$ , substituting gives:

{\begin{array}{lcl}a<&x_{1}\beta _{1}+c_{11}\eta _{1}&<b\\a<&x_{2}\beta _{2}+c_{21}\eta _{1}+c_{22}\eta _{2}&<b\\\vdots &\vdots &\vdots \\a<&x_{J}\beta _{J}+\sum _{k=1}^{J}c_{J,k}\eta _{k}&<b\\\end{array}}

Rearranging above,

{\begin{array}{ccc}{\frac {a-x_{1}\beta _{1}}{c_{11}}}&<\eta _{1}<&{\frac {b-x_{1}\beta _{1}}{c_{11}}}\\{\frac {a-(x_{2}\beta _{2}+c_{21}\eta _{1})}{c_{22}}}&<\eta _{2}<&{\frac {b-(x_{2}\beta _{2}+c_{21}\eta _{1})}{c_{22}}}\\\vdots &\vdots &\vdots \\{\frac {a-(x_{J}\beta _{J}+\sum _{k=1}^{J-1}c_{J,k}\eta _{k})}{c_{J,J}}}&<\eta _{J}<&{\frac {b-(x_{J}\beta _{J}+\sum _{k=1}^{J-1}c_{J,k}\eta _{k})}{c_{J,J}}}\\\end{array}}
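
As a worked illustration (the choice of $J=2$ and of the outcome is an assumption made here, not an example from the references), take $\mathbf {y_{i}} =(1,1)$ , so that $A_{1}=A_{2}=(0,\infty )$ , i.e. $a=0$ and $b=\infty$ . The bounds reduce to

{\begin{array}{ccc}-{\frac {x_{1}\beta _{1}}{c_{11}}}&<\eta _{1}<&\infty \\-{\frac {x_{2}\beta _{2}+c_{21}\eta _{1}}{c_{22}}}&<\eta _{2}<&\infty \\\end{array}}

If in addition $\Sigma$ has unit variances and correlation $\rho$ , then $c_{11}=1$ , $c_{21}=\rho$ and $c_{22}={\sqrt {1-\rho ^{2}}}$ , so the lower bound on $\eta _{2}$ shifts with the value already drawn for $\eta _{1}$ .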

Now all one needs to do is draw iteratively from the truncated univariate normal distribution with the bounds given above. This can be done by the inverse CDF method, noting that the CDF of a normal distribution with mean $\mu$ and standard deviation $\sigma$ truncated to $[a,b]$ is,

u={\frac {\Phi ({\frac {x-\mu }{\sigma }})-\Phi ({\frac {a-\mu }{\sigma }})}{\Phi ({\frac {b-\mu }{\sigma }})-\Phi ({\frac {a-\mu }{\sigma }})}}

where $u$ is a number between 0 and 1 because the expression above is a CDF. To generate random draws from the truncated distribution, one draws $u$ from a uniform distribution on $(0,1)$ and solves for $x$ , giving,

x=\sigma \Phi ^{-1}(u(\Phi (\beta )-\Phi (\alpha ))+\Phi (\alpha ))+\mu

where $\alpha ={\frac {a-\mu }{\sigma }}$ and $\beta ={\frac {b-\mu }{\sigma }}$ (here $\beta$ denotes the standardized upper bound, not the coefficient vector) and $\Phi$ is the standard normal CDF. With such draws one can reconstruct $\mathbf {y_{i}^{*}}$ from $\mathbf {y_{i}^{*}} =\mathbf {X_{i}\beta } +C\eta _{i}$ . Each draw is conditional on the draws that come before it, and the product of the conditional densities is the joint density of $\mathbf {y_{i}^{*}}$ ,
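
A minimal sketch of the inverse CDF draw, using only the Python standard library. The function name `draw_truncated_normal` and the small clamp that keeps the argument of the inverse CDF strictly inside $(0,1)$ are assumptions of this sketch, not part of the published algorithm.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf() and inv_cdf()

def draw_truncated_normal(mu, sigma, a, b, u):
    """Map a uniform u in (0, 1) to a draw from N(mu, sigma^2) truncated to (a, b).

    a may be -inf and b may be +inf.
    """
    lo = STD_NORMAL.cdf((a - mu) / sigma)   # CDF at the standardized lower bound
    hi = STD_NORMAL.cdf((b - mu) / sigma)   # CDF at the standardized upper bound
    p = lo + u * (hi - lo)
    # Numerical safeguard (an assumption of this sketch): inv_cdf needs 0 < p < 1.
    p = min(max(p, 1e-16), 1.0 - 1e-16)
    return mu + sigma * STD_NORMAL.inv_cdf(p)

# Median of a standard normal truncated to (0, inf): u = 0.5 maps to the 0.75 quantile.
x_median = draw_truncated_normal(0.0, 1.0, 0.0, float("inf"), 0.5)
```

In practice `u` would come from a uniform random number generator; passing it in explicitly keeps the mapping deterministic and easy to check.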

q(\mathbf {y_{i}^{*}} |\mathbf {X_{i}\beta } ,\Sigma )=q(y_{1}^{*}|\mathbf {X_{i}\beta } ,\Sigma )q(y_{2}^{*}|y_{1}^{*},\mathbf {X_{i}\beta } ,\Sigma )\dots q(y_{J}^{*}|y_{1}^{*},\dots ,y_{J-1}^{*},\mathbf {X_{i}\beta } ,\Sigma )

where $q(\cdot )$ is the density from which the draws are generated.

Because each $y_{j}^{*}$ , conditional on $y_{k}^{*},\ k<j$ , is restricted to $A_{j}$ by the recursive bounds obtained from the Cholesky factorization, $q(\cdot )$ is supported on $A$ and each of its conditional factors is a truncated univariate normal density. The density of a normal distribution truncated to $[a,b]$ is,

{\frac {\phi ({\frac {x-\mu }{\sigma }})}{\sigma (\Phi ({\frac {b-\mu }{\sigma }})-\Phi ({\frac {a-\mu }{\sigma }}))}}

Therefore, $\mathbf {y_{i}^{*}}$ has density,

{\begin{aligned}q(\mathbf {y_{i}^{*}} |\mathbf {X_{i}\beta } ,\Sigma )&={\frac {{\frac {1}{c_{11}}}\phi _{1}{\Big (}{\frac {y_{1}^{*}-x_{1}\beta _{1}}{c_{11}}}{\Big )}}{\Phi _{1}{\Big (}{\frac {b-x_{1}\beta _{1}}{c_{11}}}{\Big )}-\Phi _{1}{\Big (}{\frac {a-x_{1}\beta _{1}}{c_{11}}}{\Big )}}}\times \dots \times {\frac {{\frac {1}{c_{JJ}}}\phi _{J}{\Big (}{\frac {y_{J}^{*}-(x_{J}\beta _{J}+\sum _{k=1}^{J-1}c_{Jk}\eta _{k})}{c_{JJ}}}{\Big )}}{\Phi _{J}{\Big (}{\frac {b-(x_{J}\beta _{J}+\sum _{k=1}^{J-1}c_{Jk}\eta _{k})}{c_{JJ}}}{\Big )}-\Phi _{J}{\Big (}{\frac {a-(x_{J}\beta _{J}+\sum _{k=1}^{J-1}c_{Jk}\eta _{k})}{c_{JJ}}}{\Big )}}}\\&={\frac {\prod _{j=1}^{J}{\frac {1}{c_{jj}}}\phi _{j}{\Big (}{\frac {y_{j}^{*}-(x_{j}\beta _{j}+\sum _{k=1}^{j-1}c_{jk}\eta _{k})}{c_{jj}}}{\Big )}}{\prod _{j=1}^{J}{\Big (}\Phi _{j}{\Big (}{\frac {b-(x_{j}\beta _{j}+\sum _{k=1}^{j-1}c_{jk}\eta _{k})}{c_{jj}}}{\Big )}-\Phi _{j}{\Big (}{\frac {a-(x_{j}\beta _{j}+\sum _{k=1}^{j-1}c_{jk}\eta _{k})}{c_{jj}}}{\Big )}{\Big )}}}\end{aligned}}

where $\phi _{j}$ and $\Phi _{j}$ are the standard normal PDF and CDF for choice $j$ .

Because $y_{j}^{*}|\{y_{k}^{*}\}_{k<j}\sim N(x_{j}\beta _{j}+\sum _{k=1}^{j-1}c_{jk}\eta _{k},c_{jj}^{2})$ , the standardization above gives each term mean 0 and variance 1.

Write the denominator as $\prod _{j=1}^{J}l_{jj}$ , where $l_{jj}=\Phi _{j}{\Big (}{\frac {b-(x_{j}\beta _{j}+\sum _{k=1}^{j-1}c_{jk}\eta _{k})}{c_{jj}}}{\Big )}-\Phi _{j}{\Big (}{\frac {a-(x_{j}\beta _{j}+\sum _{k=1}^{j-1}c_{jk}\eta _{k})}{c_{jj}}}{\Big )}$ . The numerator is $\prod _{j=1}^{J}{\frac {1}{c_{jj}}}\phi _{j}{\Big (}{\frac {y_{j}^{*}-(x_{j}\beta _{j}+\sum _{k=1}^{j-1}c_{jk}\eta _{k})}{c_{jj}}}{\Big )}=f_{N}(\mathbf {y_{i}^{*}} |\mathbf {X_{i}\beta } ,\Sigma )$ , where $f_{N}(\cdot )$ is the multivariate normal PDF; the equality holds because the standardized arguments are the $\eta _{j}$ and $\prod _{j=1}^{J}c_{jj}=|\Sigma |^{1/2}$ .

Returning to the original goal, we wish to evaluate

{\begin{aligned}\Pr(\mathbf {y_{i}} |\mathbf {X_{i}\beta } ,\Sigma )=&\int _{A}f_{N}(\mathbf {y} _{i}^{*}|\mathbf {X_{i}\beta } ,\Sigma )d\mathbf {y} _{i}^{*}\\\end{aligned}}

Using importance sampling with $q$ as the importance density, we can evaluate this integral,

{\begin{aligned}\Pr(\mathbf {y_{i}} |\mathbf {X_{i}\beta } ,\Sigma )=&\int _{A}f_{N}(\mathbf {y} _{i}^{*}|\mathbf {X_{i}\beta } ,\Sigma )d\mathbf {y} _{i}^{*}\\=&\int _{A}{\frac {f_{N}(\mathbf {y} _{i}^{*}|\mathbf {X_{i}\beta } ,\Sigma )}{q(\mathbf {y_{i}^{*}} |\mathbf {X_{i}\beta } ,\Sigma )}}q(\mathbf {y_{i}^{*}} |\mathbf {X_{i}\beta } ,\Sigma )d\mathbf {y} _{i}^{*}\\=&\int _{A}{\frac {f_{N}(\mathbf {y} _{i}^{*}|\mathbf {X_{i}\beta } ,\Sigma )}{\frac {f_{N}(\mathbf {y} _{i}^{*}|\mathbf {X_{i}\beta } ,\Sigma )}{\prod _{j=1}^{J}l_{jj}}}}q(\mathbf {y_{i}^{*}} |\mathbf {X_{i}\beta } ,\Sigma )d\mathbf {y} _{i}^{*}\\=&\mathbb {E} _{\mathbf {q} }{\Big (}\prod _{j=1}^{J}l_{jj}{\Big )}\\\end{aligned}}

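This expectation is approximated by the sample average ${\frac {1}{S}}\sum _{s=1}^{S}\prod _{j=1}^{J}l_{jj}^{(s)}$ , where $l_{jj}^{(s)}$ is $l_{jj}$ evaluated at the $s$-th set of simulated draws. The average is an unbiased estimate of the choice probability and converges to it as $S$ grows.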
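
The whole procedure can be sketched in a few lines of plain Python for the binary case, where $A_{j}=(0,\infty )$ if $y_{j}=1$ and $(-\infty ,0]$ if $y_{j}=0$ . This is a sketch under stated assumptions rather than a reference implementation: the function names `cholesky` and `ghk_probability`, the default number of draws, the seed handling and the clamp on the inverse CDF argument are all choices made here for illustration.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal: cdf() and inv_cdf()

def cholesky(sigma):
    """Lower-triangular C with C C' = sigma (assumes sigma is positive definite)."""
    n = len(sigma)
    c = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1):
            s = sum(c[j][m] * c[k][m] for m in range(k))
            if j == k:
                c[j][j] = math.sqrt(sigma[j][j] - s)
            else:
                c[j][k] = (sigma[j][k] - s) / c[k][k]
    return c

def ghk_probability(y, mean, sigma, n_draws=1000, seed=0):
    """Simulate Pr(y | mean, sigma) for binary outcomes y_j in {0, 1}."""
    rng = random.Random(seed)
    c = cholesky(sigma)
    total = 0.0
    for _ in range(n_draws):
        eta = []
        weight = 1.0  # running product of the l_jj
        for j in range(len(y)):
            # Mean of y_j* given the eta values already drawn.
            m = mean[j] + sum(c[j][k] * eta[k] for k in range(j))
            # CDF at the standardized cut point where y_j* = 0.
            cut = PHI.cdf(-m / c[j][j])
            lo, hi = (cut, 1.0) if y[j] == 1 else (0.0, cut)
            weight *= hi - lo  # l_jj
            # Inverse-CDF draw of eta_j from the truncated standard normal.
            p = lo + rng.random() * (hi - lo)
            p = min(max(p, 1e-16), 1.0 - 1e-16)  # safeguard assumed by this sketch
            eta.append(PHI.inv_cdf(p))
        total += weight
    return total / n_draws

# Illustrative check (inputs are hypothetical): zero means, correlation 0.5.
estimate = ghk_probability([1, 1], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], n_draws=5000)
exact = 0.25 + math.asin(0.5) / (2.0 * math.pi)  # bivariate orthant probability
```

For this two-dimensional case the exact probability is known in closed form, $1/4+\arcsin(\rho )/(2\pi )$ , which gives a convenient check on the simulated value. Note that $l_{11}$ does not depend on any draw, so for $J=1$ the simulator returns the exact probit probability.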

References

[1] Hajivassiliou, Vassilis (1994). "Classical Estimation Methods for LDV Models Using Simulation". Handbook of Econometrics. doi:10.1016/S1573-4412(05)80009-1.
[2] Train, Kenneth (2003). Discrete Choice Methods With Simulation. Cambridge University Press.
[3] Greene, William (2003). Econometric Analysis. Prentice Hall.
