Person-fit analysis is a technique for determining whether a person's results on a given test are valid, meaning that they reflect the trait being tested rather than some external factor such as cheating or falling asleep in the middle of the test.
An item-score vector is a list of "scores" that a person gets on the items of a test, where "1" is correct and "0" is incorrect. For example, if a person took a ten-item quiz and answered only the first five questions correctly, the vector would be {1, 1, 1, 1, 1, 0, 0, 0, 0, 0}. The analysis can determine how unlikely an item-score vector is compared to a hypothesized test theory model such as item response theory, or compared with the majority of item-score vectors in the sample.
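As a rough illustration of how an item-score vector can be checked for unlikely patterns, the sketch below counts Guttman errors, a simple nonparametric index that is not named in the text above and is used here only as an example. It assumes the items are ordered from easiest to hardest (the `item_order` argument is a hypothetical parameter); a Guttman error is any pair in which an easier item is answered incorrectly while a harder item is answered correctly.

```python
def guttman_errors(scores, item_order=None):
    """Count (easier item wrong, harder item right) pairs in a 0/1 score vector.

    scores: list of 0/1 item scores.
    item_order: item indices from easiest to hardest. This ordering is an
        assumption; by default the items are taken to be already ordered.
    """
    if item_order is None:
        item_order = range(len(scores))
    ordered = [scores[i] for i in item_order]

    errors = 0
    wrong_so_far = 0  # number of easier items answered incorrectly so far
    for score in ordered:
        if score == 0:
            wrong_so_far += 1
        else:
            # Each correct answer forms an error with every easier item missed.
            errors += wrong_so_far
    return errors


# The vector from the example above: first five items right, last five wrong.
consistent = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# The mirror image: only the five hardest items answered correctly.
reversed_pattern = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print(guttman_errors(consistent))        # 0
print(guttman_errors(reversed_pattern))  # 25
```

A high count relative to other examinees suggests an unusual response pattern, but, as with any person-fit index, it does not by itself establish why the pattern occurred.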
In individual decision-making in fields such as education, psychology, and personnel selection, it is important that test users have confidence in the test scores used. The validity of individual test scores may be threatened when the examinee's answers are governed by factors other than the psychological trait of interest. These factors can range from something as benign as the examinee dozing off to concerted fraud efforts. Person-fit methods are used to detect item-score vectors in which such external factors may have played a role and which may therefore indicate invalid measurement.
Unfortunately, person-fit statistics can only indicate whether a set of responses is likely or unlikely to be valid; they do not prove anything. The results of the analysis might suggest that an examinee cheated, but it is not possible to go back to the test administration and prove it. This limits the practical applicability of the analysis on an individual scale. However, it might be useful on a larger scale; if most examinees at a certain test site or with a certain proctor have unlikely responses, an investigation might be warranted.
Psychological statistics is the application of formulas, theorems, numbers, and laws to psychology. Statistical methods for psychology include the development and application of statistical theory and methods for modeling psychological data. These methods include psychometrics, factor analysis, experimental designs, and Bayesian statistics. The article also discusses journals in the same field.
Psychometrics is a field of study within psychology concerned with the theory and technique of measurement. Psychometrics generally covers specialized fields within psychology and education devoted to testing, measurement, assessment, and related activities. Psychometrics is concerned with the objective measurement of latent constructs that cannot be directly observed. Examples of latent constructs include intelligence, introversion, mental disorders, and educational achievement. The levels of individuals on nonobservable latent variables are inferred through mathematical modeling based on what is observed from individuals' responses to items on tests and scales.
In statistics and psychometrics, reliability is the overall consistency of a measure. A measure is said to have a high reliability if it produces similar results under consistent conditions:
"It is the characteristic of a set of test scores that relates to the amount of random error from the measurement process that might be embedded in the scores. Scores that are highly reliable are precise, reproducible, and consistent from one testing occasion to another. That is, if the testing process were repeated with a group of test takers, essentially the same results would be obtained. Various kinds of reliability coefficients, with values ranging between 0.00 and 1.00, are usually used to indicate the amount of error in the scores."
In psychometrics, item response theory is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring abilities, attitudes, or other variables. It is a theory of testing based on the relationship between individuals' performances on a test item and the test takers' levels of performance on an overall measure of the ability that item was designed to measure. Several different statistical models are used to represent both item and test taker characteristics. Unlike simpler alternatives for creating scales and evaluating questionnaire responses, it does not assume that each item is equally difficult. This distinguishes IRT from, for instance, Likert scaling, in which "All items are assumed to be replications of each other or in other words items are considered to be parallel instruments". By contrast, item response theory treats the difficulty of each item as information to be incorporated in scaling items.
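To make the role of item difficulty concrete, here is a minimal sketch of one commonly used item response function, the two-parameter logistic (2PL) model. The choice of model and the names `theta` (ability), `a` (discrimination), and `b` (difficulty) are assumptions made for illustration; the text above does not single out a particular model.

```python
import math


def irf_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model.

    theta: test taker's ability level.
    a: item discrimination (slope).
    b: item difficulty (location).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical items: one easy (b = -1) and one hard (b = +1),
# both with discrimination a = 1.
ability = 0.0
p_easy = irf_2pl(ability, a=1.0, b=-1.0)
p_hard = irf_2pl(ability, a=1.0, b=1.0)

# The same person is more likely to answer the easy item correctly.
print(round(p_easy, 3), round(p_hard, 3))
```

When ability equals the item's difficulty, the function returns 0.5, which is one way the model encodes difficulty as information rather than treating all items as interchangeable.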
The Thematic Apperception Test (TAT) is a projective psychological test developed during the 1930s by Henry A. Murray and Christiana D. Morgan at Harvard University. Proponents of the technique assert that subjects' responses, in the narratives they make up about ambiguous pictures of people, reveal their underlying motives, concerns, and the way they see the social world. Historically, the test has been among the most widely researched, taught, and used of such techniques.
A personality test is a method of assessing human personality constructs. Most personality assessment instruments are in fact introspective self-report questionnaire measures or reports from life records (L-data) such as rating scales. Attempts to construct actual performance tests of personality have been very limited, even though Raymond Cattell and his colleague Frank Warburton compiled a list of over 2000 separate objective tests that could be used in constructing objective personality tests. One exception, however, was the Objective-Analytic Test Battery, a performance test designed to quantitatively measure 10 factor-analytically discerned personality trait dimensions. A major problem with both L-data and Q-data methods is that, because of item transparency, rating scales and self-report questionnaires are highly susceptible to motivational and response distortion, ranging from lack of adequate self-insight to downright dissimulation, depending on the reason or motivation for the assessment being undertaken.
In psychology, a projective test is a personality test designed to let a person respond to ambiguous stimuli, presumably revealing hidden emotions and internal conflicts projected by the person into the test. This is sometimes contrasted with a so-called "objective test" or "self-report test", which adopts a "structured" approach: responses are analyzed according to a presumed universal standard and are limited to the content of the test. The responses to projective tests are content analyzed for meaning rather than being based on presuppositions about meaning, as is the case with objective tests. Projective tests have their origins in psychoanalysis, which argues that humans have conscious and unconscious attitudes and motivations that are beyond or hidden from conscious awareness.
In trait theory, the Big Five personality traits are a group of five characteristics used to study personality: openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism.
Computerized adaptive testing (CAT) is a form of computer-based test that adapts to the examinee's ability level. For this reason, it has also been called tailored testing. In other words, it is a form of computer-administered test in which the next item or set of items selected to be administered depends on the correctness of the test taker's responses to the most recent items administered.
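The adaptive loop described above can be sketched very roughly as follows. This is a deliberately simplified illustration, not how any particular CAT system works: the fixed-step ability update, the closest-difficulty selection rule, and all names and values are hypothetical assumptions. Operational systems typically use statistical ability estimation and information-based item selection.

```python
def update_ability(theta, correct, step=0.5):
    """Move the provisional ability estimate up after a correct answer,
    down after an incorrect one (a simplified, assumed update rule)."""
    return theta + step if correct else theta - step


def next_item(theta, difficulties, administered):
    """Pick the unused item whose difficulty is closest to the current estimate."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return min(candidates, key=lambda i: abs(difficulties[i] - theta))


# Hypothetical item bank, indexed 0..4, with difficulties on an arbitrary scale.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]

theta = 0.0
administered = set()

first = next_item(theta, bank, administered)   # item 2 (difficulty 0.0)
administered.add(first)

theta = update_ability(theta, correct=True)    # estimate rises to 0.5
second = next_item(theta, bank, administered)  # item 3 (difficulty 1.0)
print(first, second)
```

The point of the sketch is only that the item administered next depends on the responses given so far, which is what distinguishes an adaptive test from a fixed form.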
The Rasch model, named after Georg Rasch, is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between the respondent's abilities, attitudes, or personality traits, and the item difficulty. For example, it may be used to estimate a student's reading ability or the extremity of a person's attitude to capital punishment from responses on a questionnaire. In addition to psychometrics and educational research, the Rasch model and its extensions are used in other areas, including the health profession, agriculture, and market research.
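The trade-off between ability and difficulty can be illustrated with the dichotomous Rasch model, in which the probability of a correct response depends only on the difference between the two. The sketch below also computes the log-likelihood of an item-score vector under the model, which is one ingredient of likelihood-based person-fit checks. The ability value and item difficulties are hypothetical numbers chosen for illustration.

```python
import math


def rasch_probability(theta, difficulty):
    """P(correct) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def log_likelihood(scores, theta, difficulties):
    """Log-likelihood of a 0/1 item-score vector for a given ability."""
    total = 0.0
    for score, b in zip(scores, difficulties):
        p = rasch_probability(theta, b)
        total += math.log(p) if score == 1 else math.log(1.0 - p)
    return total


# Hypothetical five-item test, ordered from easiest to hardest.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta = 0.0  # assumed ability of the respondent

# Easy items right, hard items wrong: consistent with the model.
expected_pattern = log_likelihood([1, 1, 1, 0, 0], theta, difficulties)
# Easy items wrong, hard items right: much less likely under the model.
aberrant_pattern = log_likelihood([0, 0, 1, 1, 1], theta, difficulties)

print(expected_pattern > aberrant_pattern)  # True
```

Both vectors have the same total score, so the difference in likelihood comes entirely from which items were answered correctly.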
A computerized classification test (CCT) is a performance appraisal system that is administered by computer for the purpose of classifying examinees. The most common CCT is a mastery test that classifies examinees as "Pass" or "Fail," but the term also includes tests that classify examinees into more than two categories. While the term may generally be considered to refer to all computer-administered tests for classification, it is usually used to refer to tests that are interactively administered or of variable length, similar to computerized adaptive testing (CAT). Like CAT, variable-length CCTs can accomplish the goal of the test with a fraction of the number of items used in a conventional fixed-form test.
The Millon Clinical Multiaxial Inventory – Fourth Edition (MCMI-IV) is the most recent edition of the Millon Clinical Multiaxial Inventory. The MCMI is a psychological assessment tool intended to provide information on personality traits and psychopathology, including specific mental disorders outlined in the DSM-5. It is intended for adults with at least a 5th grade reading level who are currently seeking mental health services. The MCMI was developed and standardized specifically on clinical populations, and the authors are very specific that it should not be used with the general population or adolescents. However, there is an evidence base showing that it may still retain validity in non-clinical populations, and so psychologists will sometimes administer the test to members of the general population, with caution. The concepts involved in the questions and their presentation make it unsuitable for those with below average intelligence or reading ability.
Differential item functioning (DIF) is a statistical property of a test item that indicates how likely it is for individuals from distinct groups, possessing similar abilities, to respond differently to the item. It manifests when individuals from different groups, with comparable skill levels, do not have an equal likelihood of answering a question correctly. There are two primary types of DIF: uniform DIF, where one group consistently has an advantage over the other, and nonuniform DIF, where the advantage varies based on the individual's ability level. The presence of DIF requires review and judgment, but it does not always signify bias. DIF analysis provides an indication of unexpected behavior of items on a test. The DIF characteristic of an item is not determined solely by differing probabilities of selecting a specific response among individuals from different groups. Rather, DIF becomes pronounced when individuals from different groups who possess the same underlying true ability exhibit differing probabilities of giving a certain response. Even when uniform bias is present, test developers sometimes fall back on assumptions, such as that DIF biases may offset each other, because of the extensive work required to address it; this compromises test ethics and perpetuates systemic biases. Common procedures for assessing DIF are the Mantel-Haenszel procedure, logistic regression, item response theory (IRT) based methods, and confirmatory factor analysis (CFA) based methods.
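As an illustration of the first procedure listed above, the sketch below computes the Mantel-Haenszel common odds ratio for a single item. Examinees are assumed to have been grouped into strata of comparable ability (for example by total score), and each stratum is summarized as a 2×2 table of counts. The group labels "reference" and "focal", the tuple layout, and the counts are assumptions made for this example.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across ability strata.

    strata: iterable of (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
        counts, one tuple per ability stratum.
    A value near 1 suggests no uniform DIF; values far from 1 suggest the item
    favors one group among examinees of comparable ability.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


# Hypothetical counts: within each stratum both groups have the same odds
# of answering correctly, so the common odds ratio is 1.
no_dif = [(10, 10, 5, 5), (30, 10, 15, 5)]
print(mantel_haenszel_odds_ratio(no_dif))  # 1.0
```

In practice the odds ratio is usually accompanied by a significance test and an effect-size classification, both of which are omitted from this sketch.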
The Sixteen Personality Factor Questionnaire (16PF) is a self-reported personality test developed over several decades of empirical research by Raymond B. Cattell, Maurice Tatsuoka and Herbert Eber. The 16PF provides a measure of personality and can also be used by psychologists, and other mental health professionals, as a clinical instrument to help diagnose psychiatric disorders, and help with prognosis and therapy planning. The 16PF can also provide information relevant to the clinical and counseling process, such as an individual's capacity for insight, self-esteem, cognitive style, internalization of standards, openness to change, capacity for empathy, level of interpersonal trust, quality of attachments, interpersonal needs, attitude toward authority, reaction toward dynamics of power, frustration tolerance, and coping style. Thus, the 16PF instrument provides clinicians with a normal-range measurement of anxiety, adjustment, emotional stability and behavioral problems. Clinicians can use 16PF results to identify effective strategies for establishing a working alliance, to develop a therapeutic plan, and to select effective therapeutic interventions or modes of treatment. It can also be used within other areas of psychology, such as career and occupational selection.
The attribute hierarchy method (AHM) is a cognitively based psychometric procedure developed by Jacqueline Leighton, Mark Gierl, and Steve Hunka at the Centre for Research in Applied Measurement and Evaluation (CRAME) at the University of Alberta. The AHM is one form of cognitive diagnostic assessment that aims to integrate cognitive psychology with educational measurement for the purposes of enhancing instruction and student learning. A cognitive diagnostic assessment (CDA) is designed to measure specific knowledge states and cognitive processing skills in a given domain. The results of a CDA yield a profile of scores with detailed information about a student's cognitive strengths and weaknesses. This cognitive diagnostic feedback has the potential to guide instructors, parents, and students in their teaching and learning processes.
Psychometric software refers to specialized programs used for the psychometric analysis of data obtained from tests, questionnaires, polls or inventories that measure latent psychoeducational variables. Although some psychometric analyses can be performed using general statistical software such as SPSS, most require specialized tools designed specifically for psychometric purposes.
The Revised NEO Personality Inventory is a personality inventory that assesses an individual on five dimensions of personality. These are the same dimensions found in the Big Five personality traits. These traits are openness to experience, conscientiousness, extraversion(-introversion), agreeableness, and neuroticism. In addition, the NEO PI-R also reports on six subcategories of each Big Five personality trait.
Measurement invariance or measurement equivalence is a statistical property of measurement that indicates that the same construct is being measured across some specified groups. For example, measurement invariance can be used to study whether a given measure is interpreted in a conceptually similar manner by respondents representing different genders or cultural backgrounds. Violations of measurement invariance may preclude meaningful interpretation of measurement data. Tests of measurement invariance are increasingly used in fields such as psychology to supplement evaluation of measurement quality rooted in classical test theory.
The Mokken scale is a psychometric method of data reduction. A Mokken scale is a unidimensional scale that consists of hierarchically ordered items that measure the same underlying, latent concept. The method is named after the political scientist Rob Mokken, who suggested it in 1971.
In statistical models applied to psychometrics, congeneric reliability is a single-administration test score reliability coefficient, commonly referred to as composite reliability, construct reliability, and coefficient omega. It is a structural equation model (SEM)-based reliability coefficient and is obtained from a unidimensional model. It is the second most commonly used reliability coefficient after tau-equivalent reliability (also known as Cronbach's alpha), and is often recommended as its alternative.
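Given the estimates from a fitted unidimensional factor model, the coefficient is commonly computed as the squared sum of the factor loadings divided by that same quantity plus the sum of the error variances. The sketch below assumes the factor variance is fixed at 1 and the errors are uncorrelated; fitting the model itself is outside its scope, and the loadings and error variances shown are hypothetical values, not results from any data set.

```python
def congeneric_reliability(loadings, error_variances):
    """Composite (omega) reliability from unidimensional factor-model estimates.

    loadings: factor loadings of the items (factor variance assumed to be 1).
    error_variances: unique (error) variances of the items, assumed uncorrelated.
    """
    true_variance = sum(loadings) ** 2
    return true_variance / (true_variance + sum(error_variances))


# Hypothetical estimates for a three-item scale.
loadings = [0.8, 0.7, 0.6]
error_variances = [0.36, 0.51, 0.64]

omega = congeneric_reliability(loadings, error_variances)
print(round(omega, 3))
```

Unlike Cronbach's alpha, this calculation does not require the loadings to be equal across items, which is why the coefficient is often suggested as an alternative.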