Polytomous Rasch model

The polytomous Rasch model is a generalization of the dichotomous Rasch model. It is a measurement model with potential application in any context in which the objective is to measure a trait or ability through a process in which responses to items are scored with successive integers. For example, the model is applicable to the use of Likert scales, rating scales, and educational assessment items for which successively higher integer scores are intended to indicate increasing levels of competence or attainment.

Background and overview

The polytomous Rasch model was derived by Andrich (1978), subsequent to derivations by Rasch (1961) and Andersen (1977), through resolution of relevant terms of a general form of Rasch's model into threshold and discrimination parameters. When the model was derived, Andrich focused on the use of Likert scales in psychometrics, both for illustrative purposes and to aid in the interpretation of the model.

The model is sometimes referred to as the Rating Scale Model when (i) items have the same number of thresholds and (ii) the difference between any given threshold location and the mean of the threshold locations is uniform across items. This is, however, a potentially misleading name for the model, because it is far more general in its application than to so-called rating scales. The model is also sometimes referred to as the Partial Credit Model, particularly when applied in educational contexts. The Partial Credit Model (Masters, 1982) has an identical algebraic form but was derived from a different starting point at a later time, and is interpreted in a somewhat different manner. The Partial Credit Model also allows different thresholds for different items. Although this name for the model is often used, Andrich (2005) provides a detailed analysis of problems associated with elements of Masters' approach, which relate specifically to the type of response process that is compatible with the model, and to empirical situations in which estimates of threshold locations are disordered. These issues are discussed in the elaboration of the model that follows.

The model is a general probabilistic measurement model which provides a theoretical foundation for the use of sequential integer scores, in a manner that preserves the distinctive property that defines Rasch models: specifically, total raw scores are sufficient statistics for the parameters of the models. See the main article for the Rasch model for elaboration of this property. In addition to preserving this property, the model permits a stringent empirical test of the hypothesis that response categories represent increasing levels of a latent attribute or trait, hence are ordered. The reason the model provides a basis for testing this hypothesis is that it is empirically possible that thresholds will fail to display their intended ordering.

In this more general form of the Rasch model, the score on a particular item is defined as the count of the number of threshold locations on the latent trait surpassed by the individual. This does not mean that a measurement process entails making such counts in a literal sense; rather, threshold locations on a latent continuum are usually inferred from a matrix of response data through an estimation process such as conditional maximum likelihood estimation. In general, the central feature of the measurement process is that individuals are classified into one of a set of contiguous, or adjoining, ordered categories. A response format employed in a given experimental context may achieve this in a number of ways. For example, respondents may choose a category they perceive best captures their level of endorsement of a statement (such as 'strongly agree'), judges may classify persons into categories based on well-defined criteria, or a person may categorise a physical stimulus based on perceived similarity to a set of reference stimuli.

The polytomous Rasch model specialises to the model for dichotomous data when responses are classifiable into only two categories. In this special case, the item difficulty and (single) threshold are identical. The concept of a threshold is elaborated on in the following section.

The polytomous Rasch model

First, let

$$X_{ni} = x \in \{0, 1, \dots, m_i\}$$

be an integer random variable, where $m_i$ is the maximum score for item $i$. That is, $X_{ni}$ is a random variable that can take on integer values between 0 and a maximum of $m_i$.

In the polytomous Rasch model (Andrich, 1978), the probability of the outcome $X_{ni} = x$ is

$$\Pr\{X_{ni} = x\} = \frac{\exp \sum_{k=0}^{x} (\beta_n - \tau_{ki})}{\sum_{j=0}^{m_i} \exp \sum_{k=0}^{j} (\beta_n - \tau_{ki})},$$

where $\tau_{ki}$ is the $k$th threshold location of item $i$ on a latent continuum, $\beta_n$ is the location of person $n$ on the same continuum, and $m_i$ is the maximum score for item $i$. These equations are the same as

$$\Pr\{X_{ni} = x\} = \frac{\exp\left(x\beta_n - \sum_{k=0}^{x} \tau_{ki}\right)}{\sum_{j=0}^{m_i} \exp\left(j\beta_n - \sum_{k=0}^{j} \tau_{ki}\right)},$$

where the value of $\tau_{0i}$ is chosen for computational convenience, that is: $\tau_{0i} \equiv 0$.
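These probabilities are straightforward to evaluate numerically. The following is a minimal Python sketch of the second form above, with $\tau_{0i} \equiv 0$; the function name and example values are illustrative rather than taken from any published package:

```python
import numpy as np

def prm_probabilities(beta, thresholds):
    """Category probabilities Pr{X = x}, x = 0..m, under the polytomous Rasch model.

    beta       : location of the person on the latent continuum.
    thresholds : sequence (tau_1, ..., tau_m) of threshold locations for the item;
                 tau_0 is taken as 0 by the usual convention.
    """
    tau = np.concatenate(([0.0], np.asarray(thresholds, dtype=float)))
    scores = np.arange(len(tau))              # x = 0, 1, ..., m
    log_num = scores * beta - np.cumsum(tau)  # x*beta - sum_{k<=x} tau_k
    num = np.exp(log_num - log_num.max())     # subtract max for numerical stability
    return num / num.sum()

# Example: an item with four thresholds at -1.5, -0.5, 0.5 and 1.5 (as in Figure 1)
print(prm_probabilities(beta=0.0, thresholds=[-1.5, -0.5, 0.5, 1.5]).round(3))
```

With a single threshold ($m = 1$), the same calculation reproduces the model for dichotomous data, in which the item difficulty and the single threshold coincide.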

The rating scale model

Similarly, the Rasch "Rating Scale" model (Andrich, 1978) is

$$\Pr\{X_{ni} = x\} = \frac{\exp\left(x(\beta_n - \delta_i) - \sum_{k=0}^{x} \tau_k\right)}{\sum_{j=0}^{m} \exp\left(j(\beta_n - \delta_i) - \sum_{k=0}^{j} \tau_k\right)},$$

where $\delta_i$ is the difficulty of item $i$ and $\tau_k$ is the $k$th threshold location of the rating scale, which is common to all the items. Here $m$ is the maximum score and is identical for all the items, and $\tau_0 \equiv 0$ is chosen for computational convenience.
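In other words, the rating scale model constrains the thresholds of the general model to decompose as $\tau_{ki} = \delta_i + \tau_k$. A brief sketch of this constraint, reusing the illustrative prm_probabilities function above (the item difficulties and shared thresholds are made-up values):

```python
shared_tau = [-1.5, -0.5, 0.5, 1.5]             # thresholds common to all items
difficulties = {"item_1": -0.4, "item_2": 0.7}  # illustrative item difficulties

for name, delta in difficulties.items():
    # Rating scale constraint: tau_ki = delta_i + tau_k for every item i.
    probs = prm_probabilities(beta=0.0, thresholds=[delta + t for t in shared_tau])
    print(name, probs.round(3))
```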

Application

Applied in a given empirical context, the model can be considered a mathematical hypothesis that the probability of a given outcome is a probabilistic function of these person and item parameters. The graph showing the relation between the probability of a given category as a function of person location is referred to as a Category Probability Curve (CPC). An example of the CPCs for an item with five categories, scored from 0 to 4, is shown in Figure 1.

Figure 1: Rasch category probability curves for an item with five ordered categories

A given threshold partitions the continuum into regions above and below its location. The threshold corresponds with the location on a latent continuum at which it is equally likely a person will be classified into adjacent categories, and therefore equally likely to obtain one of two successive scores. The first threshold of item $i$, $\tau_{1i}$, is the location on the continuum at which a person is equally likely to obtain a score of 0 or 1, the second threshold is the location at which a person is equally likely to obtain a score of 1 or 2, and so on. In the example shown in Figure 1, the threshold locations are −1.5, −0.5, 0.5, and 1.5 respectively.
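This defining property of a threshold can be checked directly from the model: when $\beta_n$ equals the $k$th threshold location, the probabilities of scores $k-1$ and $k$ are equal. A short numerical check, again using the illustrative function defined above:

```python
import numpy as np

thresholds = [-1.5, -0.5, 0.5, 1.5]  # the ordered thresholds of Figure 1

for k, tau_k in enumerate(thresholds, start=1):
    probs = prm_probabilities(beta=tau_k, thresholds=thresholds)
    # At beta = tau_k, the categories adjacent to the kth threshold are equiprobable.
    assert np.isclose(probs[k - 1], probs[k])
    print(f"at tau_{k} = {tau_k:+.1f}: Pr(X={k-1}) = Pr(X={k}) = {probs[k]:.3f}")
```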

Respondents may obtain scores in many different ways. For example, where Likert response formats are employed, Strongly Disagree may be assigned 0, Disagree a 1, Agree a 2, and Strongly Agree a 3. In the context of assessment in educational psychology, successively higher integer scores may be awarded according to explicit criteria or descriptions which characterise increasing levels of attainment in a specific domain, such as reading comprehension. The common and central feature is that some process must result in classification of each individual into one of a set of ordered categories that collectively comprise an assessment item.

Elaboration of the model

In elaborating on features of the model, Andrich (2005) clarifies that its structure entails a simultaneous classification process, which results in a single manifest response and involves a series of dichotomous latent responses. In addition, the latent dichotomous responses operate within a Guttman structure and associated response space, as characterised below.

Let

$$Y_{nik} \in \{0, 1\}, \quad k = 1, 2, \dots, m,$$

be a set of independent dichotomous random variables. Andrich (1978, 2005) shows that the polytomous Rasch model requires that these dichotomous responses conform with a latent Guttman response subspace:

$$\Omega = \{(y_{ni1}, y_{ni2}, \dots, y_{nim}) : \underbrace{1, 1, \dots, 1}_{x}\; \underbrace{0, 0, \dots, 0}_{m-x}\},$$

in which $x$ ones are followed by $m - x$ zeros. For example, in the case of two thresholds, the permissible patterns in this response subspace are:

$$\{0, 0\} \leftrightarrow x = 0, \qquad \{1, 0\} \leftrightarrow x = 1, \qquad \{1, 1\} \leftrightarrow x = 2,$$

where the integer score $x$ implied by each pattern (and vice versa) is as shown. The reason this subspace is implied by the model is as follows. Let

$$p_{nik} \equiv \Pr\{Y_{nik} = 1\} = \frac{\exp(\beta_n - \tau_{ki})}{1 + \exp(\beta_n - \tau_{ki})}$$

be the probability that $Y_{nik} = 1$, and let $p'_{nik} \equiv 1 - p_{nik}$. This function has the structure of the Rasch model for dichotomous data. Next, consider the following conditional probability in the case of two thresholds:

$$\Pr\{X_{ni} = 1 \mid (Y_{ni1}, Y_{ni2}) \in \Omega\} = \frac{p_{ni1}\,p'_{ni2}}{p'_{ni1}\,p'_{ni2} + p_{ni1}\,p'_{ni2} + p_{ni1}\,p_{ni2}}.$$

It can be shown that this conditional probability is equal to

$$\frac{\exp(\beta_n - \tau_{1i})}{1 + \exp(\beta_n - \tau_{1i}) + \exp(2\beta_n - \tau_{1i} - \tau_{2i})},$$

which, in turn, is the probability $\Pr\{X_{ni} = 1\}$ given by the polytomous Rasch model. From the denominator of these equations, it can be seen that the probability in this example is conditional on response patterns of $\{0, 0\}$, $\{1, 0\}$, or $\{1, 1\}$. It is therefore evident that in general, the response subspace $\Omega$, as defined earlier, is intrinsic to the structure of the polytomous Rasch model. This restriction on the subspace is necessary to the justification for integer scoring of responses: i.e. such that the score is simply the count of ordered thresholds surpassed. Andrich (1978) showed that equal discrimination at each of the thresholds is also necessary to this justification.
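The equivalence described above is also easy to verify numerically. The following sketch, with made-up person and threshold locations, conditions two independent dichotomous Rasch variables on the Guttman subspace and compares the result with the polytomous model (again using the illustrative prm_probabilities function):

```python
from math import exp, isclose

beta, tau1, tau2 = 0.3, -0.8, 0.6  # illustrative locations

# Dichotomous Rasch probabilities of surpassing each threshold.
p1 = exp(beta - tau1) / (1 + exp(beta - tau1))
p2 = exp(beta - tau2) / (1 + exp(beta - tau2))

# Probabilities of the Guttman patterns {0,0}, {1,0} and {1,1}.
guttman = [(1 - p1) * (1 - p2), p1 * (1 - p2), p1 * p2]

# Conditioning on the Guttman subspace recovers the polytomous Rasch model.
conditional = [g / sum(guttman) for g in guttman]
prm = prm_probabilities(beta, thresholds=[tau1, tau2])
assert all(isclose(c, p) for c, p in zip(conditional, prm))
```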

In the polytomous Rasch model, a score of x on a given item implies that an individual has simultaneously surpassed x thresholds below a certain region on the continuum, and failed to surpass the remaining m − x thresholds above that region. In order for this to be possible, the thresholds must be in their natural order, as shown in the example of Figure 1. Disordered threshold estimates indicate a failure to construct an assessment context in which classifications represented by successive scores reflect increasing levels of the latent trait. For example, consider a situation in which there are two thresholds, and in which the estimate of the second threshold is lower on the continuum than the estimate of the first threshold. If the locations are taken literally, classification of a person into category 1 implies that the person's location simultaneously surpasses the second threshold but fails to surpass the first threshold. In turn, this implies a response pattern {0,1}, a pattern which does not belong to the subspace of patterns that is intrinsic to the structure of the model, as described above.
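In the two-threshold case this has a visible consequence for the category probability curves: if the threshold estimates are reversed, the middle category is never the most probable response at any location on the continuum. A brief sketch with illustrative disordered values:

```python
import numpy as np

disordered = [0.5, -0.5]  # tau_1 > tau_2: disordered thresholds (illustrative)

# Scan person locations and record which score is modal at each.
modal = {int(np.argmax(prm_probabilities(b, disordered)))
         for b in np.linspace(-4.0, 4.0, 161)}
print(sorted(modal))  # [0, 2]: a score of 1 is never the most probable outcome
```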

When threshold estimates are disordered, the estimates cannot therefore be taken literally; rather the disordering, in itself, inherently indicates that the classifications do not satisfy criteria that must logically be satisfied in order to justify the use of successive integer scores as a basis for measurement. To emphasise this point, Andrich (2005) uses an example in which grades of fail, pass, credit, and distinction are awarded. These grades, or classifications, are usually intended to represent increasing levels of attainment. Consider a person A, whose location on the latent continuum is at the threshold between regions on the continuum at which a pass and credit are most likely to be awarded. Consider also another person B, whose location is at the threshold between the regions at which a credit and distinction are most likely to be awarded. In the example considered by Andrich (2005, p. 25), disordered thresholds would, if taken literally, imply that the location of person A (at the pass/credit threshold) is higher than that of person B (at the credit/distinction threshold). That is, taken literally, the disordered threshold locations would imply that a person would need to demonstrate a higher level of attainment to be at the pass/credit threshold than would be needed to be at the credit/distinction threshold. Clearly, this disagrees with the intent of such a grading system. The disordering of the thresholds would, therefore, indicate that the manner in which grades are being awarded is not in agreement with the intention of the grading system. That is, the disordering would indicate that the hypothesis implicit in the grading system - that grades represent ordered classifications of increasing performance - is not substantiated by the structure of the empirical data.

References

Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42, 69–81.

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.

Andrich, D. (2005). The Rasch model explained. In S. Alagumalai, D. D. Curtis, & N. Hungi (Eds.), Applied Rasch measurement: A book of exemplars. Dordrecht: Springer-Kluwer.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.

Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability (Vol. IV). Berkeley: University of California Press.