Incremental validity

Incremental validity is a type of validity used to determine whether a new psychometric assessment improves prediction beyond what an existing method of assessment already provides.[1] In other words, incremental validity asks whether the new test adds information that cannot be obtained with simpler, already existing methods.[2]

Definition and examples

When an assessment is used with the purpose of predicting an outcome (perhaps another test score or some other behavioral measure), a new instrument must show that it is able to increase our knowledge or prediction of the outcome variable beyond what is already known based on existing instruments.[3]

For example, suppose a clinician uses both an interview technique and a specific questionnaire to determine whether a patient has a mental illness, and reaches more accurate determinations than a clinician who uses the interview technique alone. Because the questionnaire, used in conjunction with the interview, produces more accurate determinations and adds information for the clinician, the questionnaire has incremental validity.

Statistical tests

Incremental validity is usually assessed using multiple regression methods. A regression model containing the existing variables is fitted to the data first, and the focal variable is then added to the model. A significant change in the R-square statistic (evaluated with an F-test) is interpreted as an indication that the newly added variable offers significant additional predictive power for the dependent variable over the variables previously included in the model. Recall that the R-square statistic in multiple regression reflects the percent of variance in the Y variable accounted for by all of the X variables together. The change in R-square therefore reflects the additional percent of variance explained by the variable added to the model. The change in R-square is more appropriate than simply inspecting the raw correlations, because the raw correlations do not reflect the overlap between the newly introduced measure and the existing measures.[3]
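
Concretely, with n observations, a reduced model containing k existing predictors, and a full model that adds q new predictors, the change in R-square is tested with the standard F statistic for nested regression models:

\[
F = \frac{\left(R^2_{\mathrm{full}} - R^2_{\mathrm{reduced}}\right)/q}{\left(1 - R^2_{\mathrm{full}}\right)/(n - k - q - 1)}
\]

which is referred to an F distribution with q and n − k − q − 1 degrees of freedom.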

An example of this method is the prediction of college grade point average (GPA), where high school GPA and admissions test scores (e.g., SAT, ACT) usually account for a large proportion of variance in college GPA. The use of admissions tests is supported by incremental validity evidence. For example, the pre-2000 SAT correlated .34 with freshman GPA, while high school GPA correlated .36 with freshman GPA.[4] It might seem that both measures are strong predictors of freshman GPA, but high school GPA and SAT scores are themselves strongly correlated, so we need to test how much predictive power the SAT adds once high school GPA is taken into account. The incremental validity is indicated by the change in R-square when SAT scores are added to a model that already contains high school GPA. In this case, high school GPA alone accounts for 13% of the variance in freshman GPA, while high school GPA plus the SAT accounts for 20%; the SAT therefore adds 7 percentage points to our predictive power. If this change is statistically significant and deemed an important improvement, then we can say that the SAT has incremental validity over using high school GPA alone to predict freshman GPA. Any new admissions criterion or test must add predictive power (incremental validity) in order to be useful in predicting college GPA when high school GPA and existing test scores are already known.
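
The hierarchical-regression procedure is straightforward to carry out with standard statistical software. The following Python sketch implements the comparison using simulated data standing in for the real GPA and SAT records; all variable names and generated numbers here are illustrative assumptions, not values from the studies cited above.

```python
import numpy as np
from scipy import stats

def r_squared(y, X):
    """R-square of an OLS fit of y on X (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

# Simulated data: the new predictor (SAT) is deliberately correlated with
# the existing predictor (high school GPA), as in the real setting.
rng = np.random.default_rng(0)
n = 500
hs_gpa = rng.normal(3.0, 0.5, n)                      # existing predictor
sat = 0.8 * hs_gpa + rng.normal(0.0, 0.4, n)          # correlated new predictor
college_gpa = 0.5 * hs_gpa + 0.3 * sat + rng.normal(0.0, 0.5, n)

ones = np.ones(n)
X_reduced = np.column_stack([ones, hs_gpa])           # baseline: high school GPA only
X_full = np.column_stack([ones, hs_gpa, sat])         # baseline plus the new measure

r2_reduced = r_squared(college_gpa, X_reduced)
r2_full = r_squared(college_gpa, X_full)

# F-test for the change in R-square (q = 1 predictor added).
q = X_full.shape[1] - X_reduced.shape[1]
df_den = n - X_full.shape[1]
F = ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / df_den)
p = stats.f.sf(F, q, df_den)

print(f"delta R^2 = {r2_full - r2_reduced:.3f}, F({q}, {df_den}) = {F:.1f}, p = {p:.2g}")
```

A significant p-value here would support the claim that the new measure has incremental validity over the baseline predictor.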

Related Research Articles

Standard score

In statistics, the standard score is the number of standard deviations by which the value of a raw score is above or below the mean value of what is being observed or measured. Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.
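
In symbols, the standard score of an observed value x drawn from a population with mean μ and standard deviation σ is:

\[
z = \frac{x - \mu}{\sigma}
\]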

Validity is the extent to which a concept, conclusion, or measurement is well-founded and likely corresponds accurately to the real world. The word "valid" is derived from the Latin validus, meaning strong. The validity of a measurement tool is the degree to which the tool measures what it claims to measure. Validity is based on the strength of a collection of different types of evidence.

Linear trend estimation is a statistical technique to aid interpretation of data. When a series of measurements of a process are treated as, for example, a sequence or time series, trend estimation can be used to make and justify statements about tendencies in the data by relating the measurements to the times at which they occurred. The fitted model can then be used to describe the behaviour of the observed data without explaining it.

In psychometrics, predictive validity is the extent to which a score on a scale or test predicts scores on some criterion measure.

Regression analysis

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables. The most common form of regression analysis is linear regression, in which one finds the line that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line that minimizes the sum of squared differences between the true data and that line. For specific mathematical reasons, this allows the researcher to estimate the conditional expectation of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters or estimate the conditional expectation across a broader collection of non-linear models.
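
For the simple case of one explanatory variable, the ordinary least squares criterion chooses the intercept and slope that minimize the sum of squared residuals:

\[
\min_{\beta_0, \beta_1} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2
\]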

In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is the correlation between the variable's values and the best predictions that can be computed linearly from the predictive variables.
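
Equivalently, if ŷ denotes the fitted values from the linear regression of y on the set of predictors, the coefficient of multiple correlation can be written as:

\[
R = \operatorname{corr}(y, \hat{y})
\]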

Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.

In statistics, multicollinearity is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. In this situation, the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data set; it only affects calculations regarding individual predictors. That is, a multivariable regression model with collinear predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others.

Coefficient of determination

In statistics, the coefficient of determination, denoted R2 or r2 and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
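
In terms of the residual and total sums of squares, with fitted values ŷᵢ and sample mean ȳ:

\[
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\]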

In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment. Intuitively, IVs are used when an explanatory variable of interest is correlated with the error term, in which case ordinary least squares and ANOVA give biased results. A valid instrument induces changes in the explanatory variable but has no independent effect on the dependent variable, allowing a researcher to uncover the causal effect of the explanatory variable on the dependent variable.

In statistics, unit-weighted regression is a simplified and robust version of multiple regression analysis where only the intercept term is estimated. That is, it fits a model of the form \( \hat{y} = \hat{b} + \sum_i x_i \), in which each predictor enters with a weight of one and only the intercept \( \hat{b} \) is estimated from the data.

In statistics, resampling is the creation of new samples based on one observed sample. Common resampling methods include the following (a minimal bootstrap sketch appears after the list):

  1. Permutation tests
  2. Bootstrapping
  3. Cross validation
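
As a minimal sketch of the bootstrap, the following Python snippet resamples an observed sample with replacement to form a percentile confidence interval for the mean; the data here are simulated and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=100.0, scale=15.0, size=200)   # hypothetical observed sample

# Bootstrap: repeatedly resample with replacement and recompute the statistic.
n_boot = 10_000
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(n_boot)
])

# Percentile 95% confidence interval for the mean.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"bootstrap 95% CI for the mean: ({lo:.2f}, {hi:.2f})")
```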

Predictive analytics is a form of business analytics applying machine learning to generate a predictive model for certain business applications. As such, it encompasses a variety of statistical techniques from predictive modeling and machine learning that analyze current and historical facts to make predictions about future or otherwise unknown events. It represents a major subset of machine learning applications; in some contexts, it is synonymous with machine learning.

This entry describes the narrow, technical meaning of "ecological validity" as proposed by Egon Brunswik as part of the Brunswik lens model, the relation of "ecological validity" to "representative design" in research, and common misuses of the term. For a more detailed explanation, see Hammond (1998).

Omnibus tests are a kind of statistical test. They test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall. One example is the F-test in the analysis of variance. There can be legitimate significant effects within a model even if the omnibus test is not significant. For instance, in a model with two independent variables, if only one variable exerts a significant effect on the dependent variable and the other does not, then the omnibus test may be non-significant. This fact does not affect the conclusions that may be drawn from the one significant variable. In order to test effects within an omnibus test, researchers often use contrasts.

In statistics, Mallows's Cp, named for Colin Lingwood Mallows, is used to assess the fit of a regression model that has been estimated using ordinary least squares. It is applied in the context of model selection, where a number of predictor variables are available for predicting some outcome, and the goal is to find the best model involving a subset of these predictors. A small value of Cp means that the model is relatively precise.

In statistics, regression validation is the process of deciding whether the numerical results quantifying hypothesized relationships between variables, obtained from regression analysis, are acceptable as descriptions of the data. The validation process can involve analyzing the goodness of fit of the regression, analyzing whether the regression residuals are random, and checking whether the model's predictive performance deteriorates substantially when applied to data that were not used in model estimation.

Exploratory factor analysis

In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. Measured variables are any one of several attributes of people that may be observed and measured. Examples of measured variables could be the physical height, weight, and pulse rate of a human being. Usually, researchers would have a large number of measured variables, which are assumed to be related to a smaller number of "unobserved" factors. Researchers must carefully consider the number of measured variables to include in the analysis. EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis.

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

References

  1. Sackett, Paul R.; Lievens, Filip (2008). "Personnel Selection". Annual Review of Psychology. 59: 419–450. doi:10.1146/annurev.psych.59.103006.093716. PMID 17854285.
  2. Lilienfeld et al. (2005). "What's wrong with this picture?". www.psychologicalscience.org. http://www.psychologicalscience.org/newsresearch/publications/journals/sa1_2.pdf
  3. Haynes, S.N.; Lench, H.C. (2003). "Incremental Validity of New Clinical Assessment Measures" (PDF). Psychological Assessment. 15 (4): 456–466. doi:10.1037/1040-3590.15.4.456. PMID 14692842. Retrieved 13 December 2013.
  4. Bridgeman, B.; McCamley-Jenkins, L.; Ervin, N. (2000). "Predictions of Freshman Grade-Point Average From the Revised and Recentered SAT® I: Reasoning Test" (PDF). ETS RR-00-1. Educational Testing Service, Princeton, NJ. Retrieved 13 December 2013.