Predictive validity

In psychometrics, predictive validity is the extent to which a score on a scale or test predicts scores on some criterion measure. [1] [2]

For example, the predictive validity of a cognitive test for job performance is the correlation between test scores and a criterion such as supervisor performance ratings. Such a cognitive test would have predictive validity if the observed correlation were statistically significant.
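The correlation described above can be computed directly from paired scores. The following is a minimal sketch, not a prescribed procedure: the data are invented for illustration, and the names `pearson_r` and `t_statistic` are hypothetical helpers. The t statistic is the usual one for testing whether a correlation differs from zero; it would be compared against a t distribution with n − 2 degrees of freedom to judge statistical significance.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation (n - 2 df)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical data: cognitive test scores and later supervisor ratings
test_scores = [12, 15, 19, 22, 25, 27, 30, 33, 36, 40]
ratings = [3.1, 2.8, 3.5, 3.0, 3.9, 3.2, 4.1, 3.6, 4.4, 3.8]

r = pearson_r(test_scores, ratings)
t = t_statistic(r, len(test_scores))
print(f"r = {r:.2f}, t({len(test_scores) - 2}) = {t:.2f}")
```

With real data, a statistics package would normally be used to obtain the p-value as well.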

Predictive validity shares similarities with concurrent validity in that both are generally measured as correlations between a test and some criterion measure. In a study of concurrent validity the test is administered at the same time as the criterion is collected. This is a common method of developing validity evidence for employment tests: A test is administered to incumbent employees, then a rating of those employees' job performance is, or has already been, obtained independently of the test (often, as noted above, in the form of a supervisor rating). Note the possibility for restriction of range both in test scores and performance scores: The incumbent employees are likely to be a more homogeneous and higher performing group than the applicant pool at large.
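The effect of restriction of range can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a model of any real study: scores are drawn from a standard normal distribution, the population correlation (`RHO`) is set to 0.5, and "incumbents" are taken to be the top 30 percent of the pool on the test. All of these values are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)   # fixed seed so the run is repeatable
RHO = 0.5                # assumed correlation in the full applicant pool
N = 5000                 # assumed number of applicants

# Simulate (test score, job performance) pairs with correlation RHO
applicants = []
for _ in range(N):
    test = rng.gauss(0, 1)
    perf = RHO * test + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1)
    applicants.append((test, perf))

# Incumbents: only the highest-scoring 30% were hired
applicants.sort(key=lambda pair: pair[0], reverse=True)
incumbents = applicants[: int(0.3 * N)]

r_pool = pearson_r(*zip(*applicants))
r_incumbents = pearson_r(*zip(*incumbents))
print(f"full pool r = {r_pool:.2f}, incumbents only r = {r_incumbents:.2f}")
```

Because the incumbent group spans a narrower range of test scores, the correlation observed within it is lower than the correlation in the full pool, even though the underlying relationship is the same.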

In a strict study of predictive validity, the test scores are collected first. Then, at some later time, the criterion measure is collected. Thus, for predictive validity, the employment test example is slightly different: Tests are administered, perhaps to job applicants, and then after those individuals work in the job for a year, their test scores are correlated with their first-year job performance scores. Another relevant example is SAT scores: These are validated by collecting the scores during the examinee's senior year of high school and then waiting a year (or more) to correlate the scores with their first-year college grade point average. Thus predictive validity provides somewhat more useful data about test validity because it has greater fidelity to the real situation in which the test will be used. After all, most tests are administered to find out something about future behavior.

As with many aspects of social science, the magnitude of the correlations obtained from predictive validity studies is usually not high. [3] A typical predictive validity study of an employment test might obtain a correlation in the neighborhood of r = .35. Higher values are occasionally seen and lower values are very common. Nonetheless, the utility (that is, the benefit obtained by making decisions using the test) provided by a test with a correlation of .35 can be quite substantial. More information, including an explanation of the relationship between variance and predictive validity, is given in the cited source. [4]
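A correlation of .35 corresponds to a squared correlation of about .12, meaning that the test accounts for roughly 12 percent of the variance in the criterion. The sketch below illustrates why such a test can still be useful for selection. It is an illustration under assumptions, not a utility analysis: scores are simulated as standard normal, and the selection ratio (`SELECT_TOP`, hiring the top 20 percent on the test) is a hypothetical value.

```python
import math
import random

R = 0.35           # validity coefficient from the text
SELECT_TOP = 0.2   # hypothetical selection ratio: hire the top 20% on the test
N = 20000          # assumed size of the simulated applicant pool

# Share of criterion variance associated with the test
variance_explained = R ** 2
print(f"r^2 = {variance_explained:.4f}")

rng = random.Random(1)  # fixed seed so the run is repeatable
pool = []
for _ in range(N):
    test = rng.gauss(0, 1)
    perf = R * test + math.sqrt(1 - R ** 2) * rng.gauss(0, 1)
    pool.append((test, perf))

# Hire the highest scorers on the test
pool.sort(key=lambda pair: pair[0], reverse=True)
hired = pool[: int(SELECT_TOP * N)]

# Performance is in standard-deviation units of the whole pool
mean_all = sum(perf for _, perf in pool) / len(pool)
mean_hired = sum(perf for _, perf in hired) / len(hired)
print(f"mean performance, all applicants: {mean_all:+.2f} SD")
print(f"mean performance, hired group:    {mean_hired:+.2f} SD")
```

In this simulation, the hired group's average performance sits noticeably above the pool average even though the test explains only a modest share of the variance.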

Predictive validity in modern validity theory

The latest Standards for Educational and Psychological Testing [5] reflect Samuel Messick's model of validity [6] and do not use the term "predictive validity." Rather, the Standards describe validity-supporting "Evidence Based on Relationships [between the test scores and] Other Variables."

Predictive validity involves testing a group of subjects for a certain construct, and then comparing their scores with results obtained at some point in the future.

Related Research Articles

Psychological statistics

Psychological statistics is the application of formulas, theorems, numbers and laws to psychology. Statistical methods for psychology include the development and application of statistical theory and methods for modeling psychological data. These methods include psychometrics, factor analysis, experimental designs, and Bayesian statistics. The article also discusses journals in the same field.

Psychometrics: Theory and technique of psychological measurement

Psychometrics is a field of study within psychology concerned with the theory and technique of measurement. Psychometrics generally refers to specialized fields within psychology and education devoted to testing, measurement, assessment, and related activities. Psychometrics is concerned with the objective measurement of latent constructs that cannot be directly observed. Examples of latent constructs include intelligence, introversion, mental disorders, and educational achievement. The levels of individuals on nonobservable latent variables are inferred through mathematical modeling based on what is observed from individuals' responses to items on tests and scales.

Myers–Briggs Type Indicator: Model of personality types

In personality typology, the Myers–Briggs Type Indicator (MBTI) is an introspective self-report questionnaire indicating differing psychological preferences in how people perceive the world and make decisions. Despite its popularity, it has been widely regarded as pseudoscience by the scientific community. The test attempts to assign a value to each of four categories: introversion or extraversion, sensing or intuition, thinking or feeling, and judging or perceiving. One letter from each category is taken to produce a four-letter test result, such as "ISTJ" or "ENFP".

Emotional intelligence (EI) is most often defined as the ability to perceive, use, understand, manage, and handle emotions. People with high emotional intelligence can recognize their own emotions and those of others, use emotional information to guide thinking and behavior, discern between different feelings and label them appropriately, and adjust emotions to adapt to environments.

Psychological testing: Administration of psychological tests

Psychological testing is the administration of psychological tests. Psychological tests are administered by trained evaluators. A person's responses are evaluated according to carefully prescribed guidelines. Scores are thought to reflect individual or group differences in the construct the test purports to measure. The science behind psychological testing is psychometrics.

In statistics and psychometrics, reliability is the overall consistency of a measure. A measure is said to have a high reliability if it produces similar results under consistent conditions:

"It is the characteristic of a set of test scores that relates to the amount of random error from the measurement process that might be embedded in the scores. Scores that are highly reliable are precise, reproducible, and consistent from one testing occasion to another. That is, if the testing process were repeated with a group of test takers, essentially the same results would be obtained. Various kinds of reliability coefficients, with values ranging between 0.00 and 1.00, are usually used to indicate the amount of error in the scores."

Validity is the extent to which a concept, conclusion or measurement is well-founded and likely corresponds accurately to the real world. The word "valid" is derived from the Latin validus, meaning strong. The validity of a measurement tool is the degree to which the tool measures what it claims to measure. Validity is based on the strength of a collection of different types of evidence.

The g factor is a construct developed in psychometric investigations of cognitive abilities and human intelligence. It is a variable that summarizes positive correlations among different cognitive tasks, reflecting the fact that an individual's performance on one type of cognitive task tends to be comparable to that person's performance on other kinds of cognitive tasks. The g factor typically accounts for 40 to 50 percent of the between-individual performance differences on a given cognitive test, and composite scores based on many tests are frequently regarded as estimates of individuals' standing on the g factor. The terms IQ, general intelligence, general cognitive ability, general mental ability, and simply intelligence are often used interchangeably to refer to this common core shared by cognitive tests. However, the g factor itself is a mathematical construct indicating the level of observed correlation between cognitive tasks. The measured value of this construct depends on the cognitive tasks that are used, and little is known about the underlying causes of the observed correlations.

Construct validity concerns how well a set of indicators represent or reflect a concept that is not directly measurable. Construct validation is the accumulation of evidence to support the interpretation of what a measure reflects. Modern validity theory defines construct validity as the overarching concern of validity research, subsuming all other types of validity evidence such as content validity and criterion validity.

Personnel selection: Methodical process used to hire

Personnel selection is the methodical process used to hire individuals. Although the term can apply to all aspects of the process, the most common meaning focuses on the selection of workers. In this respect, selected prospects are separated from rejected applicants with the intention of choosing the person who will be the most successful and make the most valuable contributions to the organization. Its effect on the group is seen when those selected achieve their desired impact on the group, through achievement or tenure. The selection procedure follows a strategy for gathering information about a person in order to determine whether that individual should be employed. The strategies used must comply with the various laws governing workforce selection.

Wonderlic test: Intelligence test

The Wonderlic Contemporary Cognitive Ability Test is an assessment used to measure the cognitive ability and problem-solving aptitude of prospective employees for a range of occupations. The test was created in 1939 by Eldon F. Wonderlic. It consists of 50 multiple choice questions to be answered in 12 minutes. The score is calculated as the number of correct answers given in the allotted time, and a score of 20 is intended to indicate average intelligence.

In psychometrics, criterion validity, or criterion-related validity, is the extent to which an operationalization of a construct, such as a test, relates to, or predicts, a theoretical representation of the construct (the criterion). Criterion validity is often divided into concurrent and predictive validity based on the timing of measurement for the "predictor" and outcome. Concurrent validity refers to a comparison between the measure in question and an outcome assessed at the same time. Standards for Educational & Psychological Tests states, "concurrent validity reflects only the status quo at a particular time." Predictive validity, on the other hand, compares the measure in question with an outcome assessed at a later time. Although concurrent and predictive validity are similar, the terms and findings should be kept separate. "Concurrent validity should not be used as a substitute for predictive validity without an appropriate supporting rationale." Criterion validity is typically assessed by comparison with a gold standard test.

Concurrent validity is a type of evidence that can be gathered to defend the use of a test for predicting other outcomes. It is a parameter used in sociology, psychology, and other psychometric or behavioral sciences. Concurrent validity is demonstrated when a test correlates well with a measure that has previously been validated. The two measures may be for the same construct, but more often they are for different, but presumably related, constructs.

The Vividness of Visual Imagery Questionnaire (VVIQ) was developed in 1973 by the British psychologist David Marks. The VVIQ consists of 16 items in four groups of 4 items in which the participant is invited to consider the mental image formed in thinking about specific scenes and situations. The vividness of the image is rated along a 5-point scale. The questionnaire has been widely used as a measure of individual differences in vividness of visual imagery. A large body of evidence confirms that the VVIQ is a valid and reliable psychometric measure of visual image vividness.

Employment testing is the practice of administering written, oral, or other tests as a means of determining the suitability or desirability of a job applicant. The premise is that if scores on a test correlate with job performance, then it is economically useful for the employer to select employees based on scores from that test.

Situational judgement test

A situational judgement test (SJT), also called a situational stress test (SStT) or inventory (SSI), is a type of psychological test that presents the test-taker with realistic, hypothetical scenarios and asks them to identify the most appropriate response or to rank the responses in the order they feel is most effective. SJTs can be presented to test-takers through a variety of modalities, such as booklets, films, or audio recordings. SJTs represent a distinct psychometric approach from the common knowledge-based multiple choice item. They are often used in industrial-organizational psychology applications such as personnel selection. Situational judgement tests tend to determine behavioral tendencies, assessing how an individual will behave in a certain situation, and knowledge instruction, which evaluates the effectiveness of possible responses. Situational judgement tests could also reinforce the status quo within an organization.

Test validity is the extent to which a test accurately measures what it is supposed to measure. In the fields of psychological testing and educational testing, "validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests". Although classical models divided the concept into various "validities", the currently dominant view is that validity is a single unitary construct.

Incremental validity is a type of validity that is used to determine whether a new psychometric assessment will increase the predictive ability beyond that provided by an existing method of assessment. In other words, incremental validity seeks to answer whether the new test adds information that cannot be obtained with simpler, already existing methods.
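One common way to quantify incremental validity is the gain in squared multiple correlation (ΔR²) when the new test is added to a regression that already contains the existing measure. The sketch below uses the standard two-predictor formula for R² in terms of pairwise correlations. The function name `incremental_r2` and the example correlations are hypothetical, chosen only for illustration.

```python
def incremental_r2(r_y_old, r_y_new, r_old_new):
    """Gain in R^2 from adding a new predictor to an existing one.

    r_y_old   : correlation of the existing test with the criterion
    r_y_new   : correlation of the new test with the criterion
    r_old_new : correlation between the two tests (must not be +/-1)
    """
    if abs(r_old_new) >= 1:
        raise ValueError("the two predictors must not be perfectly correlated")
    # Squared multiple correlation for two predictors
    r2_both = (
        r_y_old ** 2 + r_y_new ** 2 - 2 * r_y_old * r_y_new * r_old_new
    ) / (1 - r_old_new ** 2)
    # Subtract what the existing test already explains on its own
    return r2_both - r_y_old ** 2

# Hypothetical values: existing test r = .35, new test r = .30,
# and the two tests correlate .50 with each other
delta = incremental_r2(0.35, 0.30, 0.50)
print(f"Delta R^2 = {delta:.4f}")
```

When the two tests are uncorrelated, the gain equals the new test's own squared validity; when the new test relates to the criterion only through its overlap with the existing test, the gain is zero.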

A pre-hire assessment is a test or questionnaire that candidates complete as part of the job application process. The use of a valid and expert assessment is an effective way to determine which applicants are the most qualified for a specific job based on their strengths and preferences. Employers typically use the results to determine how well each candidate's strengths and preferences match the job requirements.

References

  1. Cronbach, L.J., & Meehl, P.E. (1955). Construct validity for psychological tests. Psychological Bulletin, 52, 281–302.
  2. The Marketing Accountability Standards Board (MASB) endorses this definition as part of its ongoing Common Language in Marketing Project.
  3. "Where Predictive Validity May Fail To Make The Grade".
  4. "Do Psychometric Tests Work?".
  5. American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  6. Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.