Choice model simulation

Although the concept of choice models is widely understood and practiced these days, it is often difficult to acquire hands-on knowledge of simulating them. While many statistical packages provide useful tools for simulation, researchers attempting to test and simulate new choice models against data often encounter problems ranging from parameter scaling to misspecification. This article goes beyond simply defining discrete choice models; rather, it aims to provide a comprehensive overview of how to simulate such models on a computer.

Defining choice set

When a researcher has consumer choice data in hand and wants to construct a choice model and simulate it against the data, he/she first needs to define a choice set. A choice set in discrete choice models is defined to be finite, exhaustive, and mutually exclusive. For instance, consider households' choice of how many laptops to own. The researcher can define the choice set depending on the nature of the data and the interpretation they wish to draw, as long as it satisfies the three properties mentioned above. Some examples of choice sets that satisfy these properties are the following (a short code sketch after the list shows one way to encode such a set):

  1. 0, 1, More than 1 laptop
  2. 0, 1, 2, More than 2 laptops
  3. Less than 2, 2, 3, 4, More than 4 laptops
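
As a quick illustration of these properties, the first choice set above can be encoded as a simple mapping from raw counts to categories. This is a minimal sketch; the string labels are assumed for illustration:

```python
def laptop_choice(count: int) -> str:
    """Map a household's raw laptop count into the choice set
    {0, 1, more than 1}."""
    if count == 0:
        return "0"
    if count == 1:
        return "1"
    return "more than 1"

# Every non-negative count lands in exactly one category, so the set is
# finite, exhaustive, and mutually exclusive.
assert {laptop_choice(c) for c in range(10)} == {"0", "1", "more than 1"}
```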

Defining consumer utility

Suppose a student is trying to decide which pub to go to for a beer after his/her last final exam. Suppose there are two pubs in the college town: an Irish pub and an American pub. The researcher wishes to predict which pub the student will choose based on the price (P) of beer and the distance (D) to each pub, assuming both are known to the researcher. Then, the consumer utilities for choosing the Irish pub (I) and the American pub (A) can be defined as:

U_I = \beta_1 P_I + \beta_2 D_I + \varepsilon_I \qquad (1)

U_A = \beta_1 P_A + \beta_2 D_A + \varepsilon_A \qquad (2)

where ε_I and ε_A capture unobserved variables that affect consumer utilities.
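
To simulate one realization of these utilities, the researcher draws the unobservables from an assumed distribution and adds them to the observed part. A minimal sketch, where the prices, distances, taste parameters, and the Gumbel error distribution are all assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical attributes observed by the researcher.
P_I, D_I = 6.0, 1.2   # Irish pub: beer price, distance (assumed values)
P_A, D_A = 5.0, 2.0   # American pub

beta1, beta2 = -0.8, -0.5   # assumed taste parameters on price and distance

# Systematic (observed) parts of (1) and (2).
V_I = beta1 * P_I + beta2 * D_I
V_A = beta1 * P_A + beta2 * D_A

# One draw of the unobservables; the Gumbel choice is an assumption here.
eps_I, eps_A = rng.gumbel(size=2)
U_I, U_A = V_I + eps_I, V_A + eps_A
print("Student picks the Irish pub:", U_I > U_A)
```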

Defining choice probabilities

Once the consumer utilities have been specified, the researcher can derive choice probabilities. Namely, the probability of the student choosing the Irish pub over the American pub is

\text{Prob}(I) = \text{Prob}(U_I > U_A) = \text{Prob}(\beta_1 P_I + \beta_2 D_I + \varepsilon_I > \beta_1 P_A + \beta_2 D_A + \varepsilon_A)

Denoting the observed portion of the utility function as V, so that V_I = \beta_1 P_I + \beta_2 D_I and V_A = \beta_1 P_A + \beta_2 D_A,

\text{Prob}(I) = \text{Prob}(\varepsilon_A - \varepsilon_I < V_I - V_A) \qquad (3)

In the end, discrete choice modeling comes down to specifying the distribution of ε_A − ε_I (or of the ε's) and solving the integral over its range to calculate the choice probability. Extending this to the more general situation with

  1. N consumers (n = 1, 2, ..., N),
  2. J choices of consumption (j = 1, 2, ... , J),

the choice probability of consumer n choosing alternative j can be written as

P_{nj} = \text{Prob}(V_{nj} + \varepsilon_{nj} > V_{ni} + \varepsilon_{ni}) = \text{Prob}(\varepsilon_{ni} - \varepsilon_{nj} < V_{nj} - V_{ni}) \qquad (4)

for all i other than j.
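
Equation (4) rarely has a closed form, so in practice it is approximated by simulation: draw the unobservables many times and record how often each alternative yields the highest utility. A minimal sketch, with hypothetical utilities and an assumed Gumbel error distribution:

```python
import numpy as np

def simulate_choice_probs(V: np.ndarray, n_draws: int = 100_000,
                          seed: int = 0) -> np.ndarray:
    """Approximate equation (4) by Monte Carlo: draw the unobservables,
    add them to the systematic utilities V, and record how often each
    alternative has the highest total utility."""
    rng = np.random.default_rng(seed)
    eps = rng.gumbel(size=(n_draws, V.size))   # assumed error distribution
    winners = np.argmax(V + eps, axis=1)       # chosen alternative per draw
    return np.bincount(winners, minlength=V.size) / n_draws

V = np.array([1.0, 0.5, 0.0])    # hypothetical systematic utilities, J = 3
print(simulate_choice_probs(V))  # relative frequencies approximate P_nj
```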

Identification

1. What's irrelevant

From equation (4), it is clear that P_nj does not change as long as the inequality inside the probability on the right-hand side is preserved. In other words, adding a constant to all of the utilities, or multiplying them all by a positive constant, does not change the choice probabilities, and thus does not change the interpretation.
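
This invariance is easy to verify numerically. A sketch using the closed-form logit probabilities introduced under "General models" below, with hypothetical utilities:

```python
import numpy as np

def softmax(v: np.ndarray) -> np.ndarray:
    """Logit choice probabilities for systematic utilities v."""
    e = np.exp(v - v.max())   # this shift by v.max() itself changes nothing
    return e / e.sum()

V = np.array([1.0, 0.5, 0.0])
print(np.allclose(softmax(V), softmax(V + 7.3)))   # True: same probabilities
```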

2. Alternative-specific constants

Unlike adding the same constant to all of the utilities, adding alternative-specific constants does alter the choice probabilities. Suppose alternative-specific constants C_I and C_A are added to (1) and (2):

U_I = \beta_1 P_I + \beta_2 D_I + C_I + \varepsilon_I

U_A = \beta_1 P_A + \beta_2 D_A + C_A + \varepsilon_A

Then, depending on the values of the estimated alternative-specific constants, the choice probabilities may change. Also, if we write the choice probability in the format of (3),

\text{Prob}(I) = \text{Prob}(\varepsilon_A - \varepsilon_I < V_I - V_A + C_I - C_A)

only the difference C_I − C_A affects the choice probability (i.e., the estimation can only identify the difference, not the levels). So it is convenient to normalize the alternative-specific constants with respect to one of the alternatives. If we normalize C_A to 0, then we estimate the following model:

U_I = \beta_1 P_I + \beta_2 D_I + C_I^* + \varepsilon_I

U_A = \beta_1 P_A + \beta_2 D_A + \varepsilon_A

where C_I^* = C_I − C_A.

When there are more than two choices in the choice set, we can pick any alternative i and normalize the alternative-specific constants with respect to it by subtracting C_i from all of the alternative-specific constants.
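
A quick numerical check of this normalization, again using logit probabilities with hypothetical values:

```python
import numpy as np

def softmax(v: np.ndarray) -> np.ndarray:
    e = np.exp(v - v.max())
    return e / e.sum()

V = np.array([1.0, 0.5, 0.0])   # hypothetical systematic utilities
C = np.array([2.0, 1.2, 0.7])   # raw alternative-specific constants

# Normalize with respect to the last alternative: subtract its constant.
C_norm = C - C[-1]              # -> [1.3, 0.5, 0.0]

# Identical probabilities: only the differences between constants matter.
print(np.allclose(softmax(V + C), softmax(V + C_norm)))   # True
```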

3. Sociodemographic variables

In deciding between the Irish pub and the American pub, if the researcher has access to additional sociodemographic variables such as income, these can enter the consumer utility equation in various ways. Denote the student's income as Y. If the researcher believes that income affects utility linearly, it can enter with alternative-specific coefficients (since Y does not vary across the alternatives, only the difference of these coefficients is identified):

U_I = \beta_1 P_I + \beta_2 D_I + \beta_3 Y + \varepsilon_I

U_A = \beta_1 P_A + \beta_2 D_A + \beta_4 Y + \varepsilon_A

If the researcher believes that the sociodemographic variable interacts with other variables such as price, then the utility can be written, for instance, with price divided by income, so that the disutility of price falls as income rises:

U_I = \beta_1 (P_I / Y) + \beta_2 D_I + \varepsilon_I

U_A = \beta_1 (P_A / Y) + \beta_2 D_A + \varepsilon_A
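
A sketch of such an interacted specification, with assumed parameter values, shows how the weight placed on price falls as income rises:

```python
import numpy as np

def utilities(P, D, Y, beta1=-4.0, beta2=-0.5):
    """Systematic utilities with a price/income interaction: the effective
    price coefficient beta1 / Y shrinks as income Y grows (assumed spec)."""
    return beta1 * (np.asarray(P) / Y) + beta2 * np.asarray(D)

P = [6.0, 5.0]   # beer prices: Irish pub, American pub (hypothetical)
D = [1.2, 2.0]   # distances to each pub

print(utilities(P, D, Y=10.0))    # low income: price dominates the utility
print(utilities(P, D, Y=100.0))   # high income: price matters far less
```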

General models

As mentioned earlier, the calculation and justification of choice probabilities rely on the distribution of the errors (i.e., the unobservables) that the researcher specifies. Here is a quick overview of frequently used models, each of which differs in that specification.

1. Logit: the errors ε_nj are assumed to be i.i.d. extreme value (Gumbel), which yields the closed-form choice probability P_{nj} = e^{V_{nj}} / \sum_i e^{V_{ni}}.

2. GEV (generalized extreme value): the errors follow a generalized extreme value distribution, which allows correlation in the unobservables across alternatives (the nested logit is the best-known member of this family) while retaining closed-form choice probabilities.

3. Probit: the errors follow a joint normal distribution, which permits flexible correlation patterns across alternatives; the choice probabilities have no closed form and are typically evaluated by simulation.

4. Mixed logit: the taste coefficients themselves vary randomly across consumers with a specified density, so the choice probability is an integral of logit probabilities over that density, again evaluated by simulation.
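
Since the logit probabilities have a closed form, they provide a convenient check on the simulation approach sketched earlier: with i.i.d. extreme value draws, the simulated frequencies should converge to the logit formula. A minimal sketch with hypothetical utilities:

```python
import numpy as np

rng = np.random.default_rng(0)
V = np.array([1.0, 0.5, 0.0])   # hypothetical systematic utilities

# Closed-form logit probabilities: P_j = exp(V_j) / sum_i exp(V_i).
logit = np.exp(V) / np.exp(V).sum()

# Simulated probabilities under i.i.d. extreme value (Gumbel) errors.
eps = rng.gumbel(size=(200_000, V.size))
freq = np.bincount(np.argmax(V + eps, axis=1), minlength=V.size) / 200_000

print(logit)   # approximately [0.506, 0.307, 0.186]
print(freq)    # matches the closed form up to simulation noise
```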
