Mixed logit is a fully general statistical model for examining discrete choices. It overcomes three important limitations of the standard logit model by allowing for random taste variation across choosers, unrestricted substitution patterns across choices, and correlation in unobserved factors over time. [1] Mixed logit can use any distribution for the random coefficients, unlike probit, which is limited to the normal distribution. It is called "mixed logit" because the choice probability is a mixture of logits, with $f$ as the mixing distribution. [2] It has been shown that a mixed logit model can approximate to any degree of accuracy any true random utility model of discrete choice, given an appropriate specification of variables and coefficient distribution. [3]
The standard logit model's "taste" coefficients, or $\beta$'s, are fixed, which means the $\beta$'s are the same for everyone. Mixed logit has a different $\beta_n$ for each person $n$ (i.e., each decision maker).
In the standard logit model, the utility of person $n$ for alternative $i$ is

$$U_{ni} = \beta x_{ni} + \varepsilon_{ni}$$

with

$$\varepsilon_{ni} \sim \text{iid extreme value.}$$
For the mixed logit model, this specification is generalized by allowing $\beta_n$ to be random. The utility of person $n$ for alternative $i$ in the mixed logit model is

$$U_{ni} = \beta_n x_{ni} + \varepsilon_{ni}$$

with

$$\varepsilon_{ni} \sim \text{iid extreme value}, \qquad \beta_n \sim f(\beta \mid \theta),$$

where $\theta$ denotes the parameters of the distribution of the $\beta_n$'s over the population, such as the mean and variance of $\beta_n$.
Conditional on $\beta_n$, the probability that person $n$ chooses alternative $i$ is the standard logit formula:

$$L_{ni}(\beta_n) = \frac{e^{\beta_n x_{ni}}}{\sum_j e^{\beta_n x_{nj}}}.$$

However, since $\beta_n$ is random and not known, the (unconditional) choice probability is the integral of this logit formula over the density of $\beta_n$:

$$P_{ni} = \int L_{ni}(\beta) \, f(\beta \mid \theta) \, d\beta.$$
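To make the two formulas concrete, here is a minimal sketch in Python (the attribute matrix `X` and the coefficient draw `beta` are hypothetical): conditional on a draw of $\beta$, the logit probability is just a softmax of the representative utilities, and the unconditional probability averages this over draws of $\beta$, as in the simulation section below.

```python
import numpy as np

def conditional_logit(beta, X):
    """L_ni(beta): standard logit probabilities conditional on one draw of beta.

    beta : (K,) coefficient vector (a single draw of the random coefficients)
    X    : (J, K) attribute matrix with one row per alternative
    """
    v = X @ beta              # representative utility beta * x_ni per alternative
    v -= v.max()              # subtract the max to avoid overflow in exp
    expv = np.exp(v)
    return expv / expv.sum()  # (J,) vector of choice probabilities
```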
This model is also called the random coefficient logit model, since $\beta_n$ is a random variable. It allows the slopes of utility (i.e., the marginal utilities) to be random, extending the random effects model, in which only the intercept is stochastic.
Any probability density function can be specified for the distribution of the coefficients in the population, i.e., for $f(\beta \mid \theta)$. The most widely used distribution is the normal, mainly for its simplicity. For coefficients that take the same sign for all people, such as a price coefficient that is necessarily negative or the coefficient of a desirable attribute, distributions with support on only one side of zero, like the lognormal, are used. [4] [5] When coefficients cannot logically be unboundedly large or small, bounded distributions are often used, such as the $S_B$ or triangular distributions.
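As an illustrative sketch (the location and scale parameters below are made up), each of these mixing distributions is straightforward to draw from with NumPy; the lognormal draws are negated to represent a price coefficient that must be negative for everyone:

```python
import numpy as np

rng = np.random.default_rng(0)
R = 10_000  # number of draws (arbitrary)

# Normal: simple, but places mass on both sides of zero.
beta_normal = rng.normal(loc=1.0, scale=0.5, size=R)

# Lognormal (negated): a price coefficient that is negative for everyone.
beta_price = -rng.lognormal(mean=0.0, sigma=0.5, size=R)

# Triangular: bounded support, for coefficients that cannot be
# unboundedly large or small.
beta_bounded = rng.triangular(left=0.0, mode=0.5, right=1.0, size=R)
```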
The mixed logit model can represent general substitution patterns because it does not exhibit logit's restrictive independence of irrelevant alternatives (IIA) property. The percentage change in person $n$'s unconditional probability of choosing alternative $i$ given a percentage change in the $m$th attribute of alternative $j$ (the elasticity of $P_{ni}$ with respect to $x_{nj}^m$) is

$$E_{P_{ni}, x_{nj}^m} = -\frac{x_{nj}^m}{P_{ni}} \int \beta^m L_{ni}(\beta) \, L_{nj}(\beta) \, f(\beta \mid \theta) \, d\beta,$$

where $\beta^m$ is the $m$th element of $\beta$. [1] [5] It can be seen from this formula that a ten-percent reduction in $P_{nj}$ need not imply (as with logit) a ten-percent reduction in each other alternative's probability $P_{ni}$. [1] The reason is that the relative percentages depend on the correlation, over various draws of $\beta$, between $L_{ni}(\beta)$, the conditional likelihood that person $n$ chooses alternative $i$, and $L_{nj}(\beta)$, the conditional likelihood that person $n$ chooses alternative $j$.
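A hedged sketch of how this cross-elasticity (for $i \neq j$) could be simulated, replacing the integral by an average over draws of $\beta$; the attribute matrix `X` and draw matrix `draws` are hypothetical inputs:

```python
import numpy as np

def cross_elasticity(X, draws, i, j, m):
    """Simulated mixed logit cross-elasticity of P_ni w.r.t. x_nj^m (i != j).

    X     : (J, K) attribute matrix (hypothetical data)
    draws : (R, K) draws of beta from the mixing density f(beta | theta)
    Approximates -(x_nj^m / P_ni) * E[beta^m L_ni(beta) L_nj(beta)]
    by averaging the integrand over the draws.
    """
    V = draws @ X.T                        # (R, J) utilities for each draw
    V -= V.max(axis=1, keepdims=True)      # numerical stability
    L = np.exp(V)
    L /= L.sum(axis=1, keepdims=True)      # conditional logit probs per draw
    P_i = L[:, i].mean()                   # simulated unconditional P_ni
    integrand = (draws[:, m] * L[:, i] * L[:, j]).mean()
    return -(X[j, m] / P_i) * integrand
```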
Standard logit does not take into account unobserved factors that persist over time for a given decision maker. This is a problem when using panel data, which represent repeated choices over time. Applying a standard logit model to panel data amounts to assuming that the unobserved factors affecting a person's choice are new every time the person makes a choice, which is a very unlikely assumption. To take into account both random taste variation and correlation in unobserved factors over time, the utility of respondent $n$ for alternative $i$ at time $t$ is specified as

$$U_{nit} = \beta_n x_{nit} + \varepsilon_{nit},$$

where the subscript $t$ is the time dimension. We still make the logit assumption that $\varepsilon_{nit}$ is iid extreme value, which means that $\varepsilon_{nit}$ is independent over time, people, and alternatives; it is essentially just white noise. However, correlation over time and over alternatives arises from the common effect of the $\beta_n$'s, which enter utility in every time period and for every alternative.
To examine the correlation explicitly, assume that the $\beta$'s are normally distributed with mean $\bar{\beta}$ and variance $\sigma^2$. Then the utility equation becomes

$$U_{nit} = (\bar{\beta} + \sigma \eta_n) x_{nit} + \varepsilon_{nit},$$

where $\eta_n$ is a draw from the standard normal density. Rearranging, the equation becomes

$$U_{nit} = \bar{\beta} x_{nit} + (\sigma \eta_n x_{nit} + \varepsilon_{nit}) = \bar{\beta} x_{nit} + e_{nit},$$

where the unobserved factors are collected in $e_{nit} = \sigma \eta_n x_{nit} + \varepsilon_{nit}$. Of the unobserved factors, $\varepsilon_{nit}$ is independent over time, while $\sigma \eta_n x_{nit}$ is not independent over time or alternatives.
Then the covariance between alternatives $i$ and $j$ is

$$\operatorname{Cov}(e_{nit}, e_{njt}) = \sigma^2 x_{nit} x_{njt},$$

and the covariance between times $t$ and $q$ is

$$\operatorname{Cov}(e_{nit}, e_{niq}) = \sigma^2 x_{nit} x_{niq}.$$
By specifying the $x$'s appropriately, one can obtain any pattern of covariance over time and alternatives.
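As a quick simulation check of the first covariance formula (all numbers below are made up): draw $\eta_n$ from a standard normal and the $\varepsilon$'s from the extreme value (Gumbel) distribution, build $e_{nit} = \sigma \eta_n x_{nit} + \varepsilon_{nit}$, and compare the empirical covariance with $\sigma^2 x_{nit} x_{njt}$:

```python
import numpy as np

rng = np.random.default_rng(1)
R = 200_000                   # number of simulated decision makers
sigma = 0.8                   # hypothetical spread of beta
x_it, x_jt = 2.0, 3.0         # attributes of alternatives i and j at time t

eta = rng.standard_normal(R)  # eta_n ~ N(0, 1), common across alternatives
eps_it = rng.gumbel(size=R)   # iid extreme value errors
eps_jt = rng.gumbel(size=R)

e_it = sigma * eta * x_it + eps_it   # e_nit = sigma*eta_n*x_nit + eps_nit
e_jt = sigma * eta * x_jt + eps_jt

print(np.cov(e_it, e_jt)[0, 1])      # empirical covariance across draws
print(sigma**2 * x_it * x_jt)        # theoretical sigma^2 * x_nit * x_njt
```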
Conditional on $\beta_n$, the probability of the sequence of choices made by a person is simply the product of the logit probabilities of each individual choice by that person:

$$L_n(\beta_n) = \prod_t L_{n i(n,t) t}(\beta_n),$$

where $i(n,t)$ denotes the alternative chosen by person $n$ at time $t$, since $\varepsilon_{nit}$ is independent over time. Then the (unconditional) probability of the sequence of choices is simply the integral of this product of logits over the density of $\beta$:

$$P_n = \int L_n(\beta) \, f(\beta \mid \theta) \, d\beta.$$
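A minimal sketch of this sequence probability under the same assumptions; the per-period attribute arrays, the observed choices, and the pre-drawn coefficient vectors are all hypothetical inputs:

```python
import numpy as np

def simulate_panel_prob(X_seq, chosen, draws):
    """Simulated probability P_n of a person's observed choice sequence.

    X_seq  : (T, J, K) attributes per time period (hypothetical data)
    chosen : (T,) index of the alternative chosen in each period
    draws  : (R, K) draws of beta from f(beta | theta)
    Averages the product of per-period conditional logits over the draws.
    """
    total = 0.0
    for beta_r in draws:
        prob_seq = 1.0
        for t, X in enumerate(X_seq):
            v = X @ beta_r
            v -= v.max()                # numerical stability
            L = np.exp(v)
            L /= L.sum()
            prob_seq *= L[chosen[t]]    # L_{n i(n,t) t}(beta^r)
        total += prob_seq
    return total / len(draws)
```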
Unfortunately, there is no closed form for the integral that enters the choice probability, so the researcher must simulate $P_n$. Fortunately, simulating $P_n$ can be very simple. There are four basic steps to follow:
1. Take a draw from the probability density function specified for the "taste" coefficients. That is, take a draw from $f(\beta \mid \theta)$ and label the draw $\beta^r$, with $r = 1$ for the first draw.
2. Calculate $L_{ni}(\beta^r)$. (The conditional probability.)
3. Repeat steps 1 and 2 many times, for $r = 2, \dots, R$.
4. Average the results.
The formula for the simulated probability then looks like the following:

$$\tilde{P}_{ni} = \frac{1}{R} \sum_{r=1}^{R} L_{ni}(\beta^r),$$

where $R$ is the total number of draws taken from the distribution, and $r$ indexes a single draw.
Once this is done, you will have a value for the simulated probability of each alternative $i$ for each respondent $n$.
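Putting the four steps together, here is a minimal sketch for one respondent, assuming (purely for illustration) an independent normal mixing distribution for each coefficient:

```python
import numpy as np

def simulate_mixed_logit_probs(X, mean, sd, R=1000, seed=0):
    """Simulate mixed logit probabilities P_ni for one decision maker.

    X        : (J, K) attribute matrix, one row per alternative (made-up data)
    mean, sd : parameters theta of the assumed normal mixing distribution
    R        : total number of draws
    """
    rng = np.random.default_rng(seed)
    K = X.shape[1]
    P = np.zeros(X.shape[0])
    for r in range(R):
        # Step 1: draw beta^r from f(beta | theta)
        beta_r = rng.normal(mean, sd, size=K)
        # Step 2: conditional logit probability L_ni(beta^r)
        v = X @ beta_r
        v -= v.max()          # numerical stability
        L = np.exp(v)
        L /= L.sum()
        # Steps 3-4: accumulate over draws, then average
        P += L
    return P / R

# Usage with made-up numbers: three alternatives, two attributes
X = np.array([[1.0, 0.5],
              [0.2, 1.5],
              [0.8, 0.8]])
print(simulate_mixed_logit_probs(X, mean=0.0, sd=1.0))
```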