Mokken scale


The Mokken scale is a psychometric method of data reduction. A Mokken scale is a unidimensional scale that consists of hierarchically ordered items that measure the same underlying, latent concept. The method is named after the political scientist Rob Mokken, who proposed it in 1971. [1]


Mokken scales have been used in psychology, [2] education, [3] [4] political science, [1] [5] public opinion, [6] medicine [7] and nursing. [8] [9]

Overview

Figure: An example of an item response function
Figure: Item response functions that differ in their difficulty
Figure: Item response functions that differ in their discrimination function

Mokken scaling belongs to item response theory. In essence, a Mokken scale is a non-parametric, probabilistic version of the Guttman scale. Both Guttman and Mokken scaling can be used to assess whether a number of items measure the same underlying concept, and both are based on the assumption that the items are hierarchically ordered, that is, ordered by degree of "difficulty". Difficulty here is gauged by the percentage of respondents who answer the question affirmatively: the fewer who do, the more difficult the item. The hierarchical order means that a respondent who answers a difficult question affirmatively is assumed to answer an easier question affirmatively as well. [10] Response patterns that violate this order are called Guttman errors.

The key difference between a Guttman and a Mokken scale is that Mokken scaling is probabilistic in nature. The assumption is not that every respondent who answers a difficult question affirmatively will necessarily answer an easy question affirmatively. Instead, the assumption is that such respondents are more likely to answer an easy question affirmatively. The scalability of the scale is measured by Loevinger's coefficient H, which compares the observed number of Guttman errors with the number expected if the items were unrelated. [10]
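As an illustration of how H relates observed to expected Guttman errors, the following is a minimal Python sketch for dichotomous (0/1) items. The function name `loevinger_h` and the data layout (one list of item scores per respondent) are assumptions made for this example and are not taken from any particular software package. The sketch also assumes that no item is answered identically by every respondent, since the expected error count would then be zero.

```python
from itertools import combinations


def loevinger_h(data):
    """Scale-level Loevinger's H for dichotomous (0/1) items.

    data: list of respondent rows, each a list of 0/1 item scores
    (hypothetical layout chosen for this sketch).
    """
    n = len(data)
    k = len(data[0])
    # Popularity of each item: proportion of affirmative answers.
    p = [sum(row[i] for row in data) / n for i in range(k)]

    observed = 0.0
    expected = 0.0
    for i, j in combinations(range(k), 2):
        # `easy` is the more popular item of the pair, `hard` the less popular.
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        # Guttman error: affirmative on the hard item, negative on the easy one.
        observed += sum(1 for row in data if row[hard] == 1 and row[easy] == 0)
        # Expected count of that pattern if the two items were unrelated.
        expected += n * p[hard] * (1 - p[easy])

    return 1 - observed / expected


# A perfect Guttman pattern: nobody passes a harder item after failing an easier one.
perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(loevinger_h(perfect))
```

With no Guttman errors in the data, the observed count is zero and H equals 1; as errors approach the number expected for unrelated items, H falls towards 0.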

The chance that a respondent will answer an item correctly is described by an item response function. Mokken scales are similar to Rasch scales in that both adapt Guttman scales to a probabilistic model. However, Mokken scaling is described as 'non-parametric' because it makes no assumptions about the precise shape of the item response function, only that it is monotonically non-decreasing. The key difference between Mokken scales and Rasch scales is that the latter assume that all items have item response functions of the same shape. In Mokken scaling the item response functions may differ between items. [5]

Mokken scales can come in two forms. The first is the Double Monotonicity model, in which the items can differ in their difficulty; it is essentially an ordinal version of the Rasch scale. The second is the Monotone Homogeneity model, in which items can differ in their discrimination parameter, meaning that some items may be more weakly related to the latent variable than others. [5] Double Monotonicity models are used most often.

Monotone homogeneity

Monotone homogeneity models are based on three assumptions. [5]

  1. There is a unidimensional latent trait on which subjects and items can be ordered.
  2. The item response function is monotonically nondecreasing. This means that as one moves from one side of the latent variable to the other, the chance of giving a positive response should never decrease.
  3. The items are locally stochastically independent: this means that responses to any two items by the same respondent should not be a function of any aspect of the respondent or the item other than the respondent's position on the latent trait. [5]
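Because the latent trait is not observed directly, the monotonicity assumption is commonly examined using an observable stand-in. The Python sketch below uses the rest score (a respondent's total on all other items) for that purpose and checks whether the proportion of affirmative answers on one item never decreases as the rest score rises. The function names and data layout are assumptions for this example. It is a simplified illustration: it does not merge small rest-score groups or test whether a decrease is statistically significant, both of which a practical analysis would normally need.

```python
from collections import defaultdict


def rest_score_proportions(data, item):
    """Proportion of affirmative answers on `item` within each rest-score group.

    data: list of respondent rows of 0/1 scores (hypothetical layout).
    Returns a list of (rest_score, proportion) pairs sorted by rest score.
    """
    groups = defaultdict(list)
    for row in data:
        rest = sum(row) - row[item]  # total score on all other items
        groups[rest].append(row[item])
    return [(rest, sum(vals) / len(vals)) for rest, vals in sorted(groups.items())]


def is_manifest_monotone(data, item):
    """True if the item's proportions never decrease as the rest score increases."""
    props = [p for _, p in rest_score_proportions(data, item)]
    return all(a <= b for a, b in zip(props, props[1:]))


responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(rest_score_proportions(responses, 0))
print(is_manifest_monotone(responses, 0))
```

Using the rest score rather than the total score keeps the item under inspection out of the grouping variable, so the item is not compared with itself.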

Double monotonicity and invariant item ordering

The Double Monotonicity model adds a fourth assumption, namely non-intersecting item response functions, resulting in items whose rank ordering remains invariant. [11] There has been some confusion in Mokken scaling between the concepts of the Double Monotonicity model and invariant item ordering. [12] The latter implies that the items are ordered in the same way for all respondents across the whole range of the latent trait. For dichotomously scored items, the Double Monotonicity model can mean invariant item ordering; however, for polytomously scored items this does not necessarily hold. [13] For invariant item ordering to hold, not only must the item response functions not intersect, but the item step response functions between one level and the next within each item must not intersect either. [14]
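The idea of non-intersecting item response functions can be illustrated with a small Python sketch. Mokken scaling itself assumes no parametric form, so the logistic curves used here are purely an assumption to obtain concrete functions to compare; the helper names `logistic_irf` and `irfs_intersect` are likewise hypothetical. The check evaluates two functions on a grid of latent trait values and reports whether their ordering flips anywhere on that grid.

```python
import math


def logistic_irf(difficulty, discrimination=1.0):
    """Return an illustrative logistic item response function (an assumption;
    Mokken scaling does not require this shape)."""
    def irf(theta):
        return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return irf


def irfs_intersect(irf_a, irf_b, grid):
    """True if irf_a lies above irf_b at some grid points and below it at others."""
    signs = set()
    for theta in grid:
        diff = irf_a(theta) - irf_b(theta)
        if diff > 0:
            signs.add(1)
        elif diff < 0:
            signs.add(-1)
    return len(signs) == 2


grid = [x / 10 for x in range(-40, 41)]  # latent trait values from -4.0 to 4.0

easy = logistic_irf(difficulty=-1.0)
hard = logistic_irf(difficulty=1.0)
steep = logistic_irf(difficulty=0.0, discrimination=3.0)

print(irfs_intersect(easy, hard, grid))   # same slope, different difficulty
print(irfs_intersect(easy, steep, grid))  # different slopes
```

In this sketch the two curves with equal slope keep the same order over the whole grid, whereas the curve with the steeper slope crosses the other one, so the ordering of those two items by difficulty would depend on where a respondent sits on the latent trait.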

Sample size

The issue of sample size for Mokken scaling is largely unresolved. Work using simulated samples and varying the item quality in the scales (Loevinger's coefficient and the correlation between scales) suggests that where the quality of the items is high, smaller samples in the region of 250–500 are sufficient, whereas samples of 1250–1750 are required where the item quality is low. [3] Work using real data from the Warwick–Edinburgh Mental Well-Being Scale (WEMWBS) [15] suggests that the required sample size depends on the Mokken scaling parameters of interest, as they do not all respond in the same way to varying sample size. [16]

Extensions

While Mokken scaling analysis was originally developed to measure the extent to which individual dichotomous items form a scale, it has since been extended to polytomous items. [5] Moreover, while Mokken scaling analysis is a confirmatory method, meant to test whether a number of items form a coherent scale (like confirmatory factor analysis), an Automatic Item Selection Procedure has been developed to explore which latent dimensions structure responses on a number of observable items (like exploratory factor analysis). [17]

Analysis

Mokken scaling software is available within the free statistical software R and also within the data analysis and statistical software Stata. MSP5 for Windows, formerly used on personal computers, is no longer compatible with current versions of Microsoft Windows. Also within R, unusual response patterns in Mokken scales can be checked using the package PerFit. [18] Two guides on how to conduct a Mokken scale analysis have been published. [19] [20]

References

  1. Mokken, Rob (1971). A theory and procedure of scale analysis: With applications in political research. Walter de Gruyter.
  2. Bedford, A.; Watson, R.; Lyne, J.; Tibbles, J.; Davies, F.; Deary, I.J. (2009). "Mokken scaling and principal components analyses of the CORE-OM in a large clinical sample". Clinical Psychology and Psychotherapy. 17 (1): 51–62. doi:10.1002/cpp.649. PMID 19728291. S2CID 10445195.
  3. Straat, J.H.; van der Ark, L.A.; Sijtsma, K. (2014). "Minimum Sample Size Requirements for Mokken Scale Analysis". Educational and Psychological Measurement. 74 (5): 809–822.
  4. Palmgren, P.J.; Brodin, U.; Nilsson, G.H.; Watson, R.; Stenfors, T. (2018). "Investigating psychometric properties and dimensional structure of an educational environment measure (DREEM) using Mokken scale analysis – a pragmatic approach". BMC Medical Education. 18 (1): 235. doi:10.1186/s12909-018-1334-8.
  5. van Schuur, Wijbrandt (2003). "Mokken scale analysis: Between the Guttman scale and parametric item response theory". Political Analysis. 11 (2): 139–163. doi:10.1093/pan/mpg002.
  6. Gillespie, M.; Tenvergert, E.M.; Kingma, J. (1987). "Using Mokken scale analysis to develop unidimensional scales". Quality and Quantity. 21 (4): 393–408. doi:10.1007/BF00172565. S2CID 118280333.
  7. Stochl, J.; Jones, P.B.; Croudace, T.J. (2012). "Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers". BMC Medical Research Methodology. 12: 74. doi:10.1186/1471-2288-12-74. PMC 3464599. PMID 22686586.
  8. Cook, N.F.; McCance, T.; McCormack, B.; Barr, O.; Slater, P. (2018). "Perceived caring attributes and priorities of pre‐registration nursing students throughout a nursing curriculum underpinned by person‐centredness". Journal of Clinical Nursing. doi:10.1111/jocn.14341.
  9. Aleo, G.; Bagnasco, A.; Watson, R.; Dyson, J.; Cowdell, F.; Catania, G.; Zanini, M.P.; Cozani, E.; Parodi, A.; Saso, L. (2019). "Comparing questionnaires across cultures: Using Mokken scaling to compare the Italian and English versions of the MOLES index". Nursing Open. doi:10.1002/nop2.297.
  10. Crichton, N. (1999). "Mokken Scale Analysis". Journal of Clinical Nursing. 8: 388.
  11. Sijtsma, Klaas; Molenaar, Ivo (2002). Introduction to Nonparametric Item Response Theory. SAGE Research Methods. doi:10.4135/9781412984676. ISBN 9780761908128. Retrieved 2019-11-06.
  12. Meijer, R.R. (2010). "A comment on Watson, Deary, and Austin (2007) and Watson, Roberts, Gow, and Deary (2008): How to investigate whether personality items form a hierarchical scale?". Personality and Individual Differences. doi:10.1016/j.paid.2009.11.004.
  13. Ligtvoet, R.; van der Ark, L.A.; te Marvelde, J.M.; Sijtsma, K. (2010). "Investigating an Invariant Item Ordering for Polytomously Scored Items". Educational and Psychological Measurement. 70 (4): 578–595.
  14. Sijtsma, K.; Meijer, R.R.; van der Ark, L.A. (2011). "Mokken scale analysis as time goes by: An update for scaling practitioners". Personality and Individual Differences. 50: 31–37.
  15. Stewart-Brown, Sarah; Janmohamed, Kulsum. Parkinson, Jane (ed.). "Warwick-Edinburgh Mental Well-being Scale: User Guide" (PDF). Retrieved 1 December 2022.
  16. Watson, Roger; Egberink, Iris JL; Kirke, Lisa; Tendeiro, Jorge N.; Doyle, Frank (2018). "What are the minimal sample size requirements for Mokken scaling? An empirical example with the Warwick–Edinburgh Mental Well-Being Scale". Health Psychology and Behavioral Medicine. 6 (1): 203–213. doi:10.1080/21642850.2018.1505520. PMC 8114397. PMID 34040828.
  17. van der Ark, L.A. (Andries) (2012). "New Developments in Mokken Scale Analysis in R". Journal of Statistical Software. 48 (5). doi:10.18637/jss.v048.i05.
  18. Meijer, R.R.; Niessen, A.S.M.; Tendeiro, J.N. (2015). "A practical guide to check the consistency of item response patterns in clinical research through person-fit statistics: examples and a computer programme". Assessment. 23: 56–62.
  19. Sijtsma, K.; van der Ark, A. (2016). "A tutorial on how to do a Mokken scale analysis on your test and questionnaire data". British Journal of Mathematical and Statistical Psychology. 70 (1): 137–185. doi:10.1111/bmsp.12078. hdl:11245.1/459fd643-a539-445a-a67a-b62b88c5a262. PMID 27958642.
  20. Wind, Stefanie A. (2017). "An Instructional Module on Mokken Scale Analysis". Educational Measurement: Issues and Practice. 36 (2): 50–66. doi:10.1111/emip.12153.