Scale (social sciences)

In the social sciences, scaling is the process of measuring or ordering entities with respect to quantitative attributes or traits. For example, a scaling technique might involve estimating individuals' levels of extraversion, or the perceived quality of products. Certain methods of scaling permit estimation of magnitudes on a continuum, while other methods provide only for relative ordering of the entities.

The level of measurement is the type of data that is measured.

The word scale, including in the academic literature, is sometimes used to refer to another type of composite measure, the index. The two concepts are, however, distinct. [1]

Scale construction decisions

Scale construction method

A constructed scale should be representative of the construct it is intended to measure. [2] It is possible that a scale similar to the one a person intends to create already exists, so including such scale(s) and possible dependent variables in one's survey may increase the validity of one's scale.

  1. Begin by generating at least ten items to represent each of the sub-scales. Administer the survey; the more representative and larger the sample, the more credibility one will have in the scales.
  2. Review the means and standard deviations for the items, dropping any items with skewed means or very low variance.
  3. Run an exploratory factor analysis with oblique rotation on the items for the scales; it is important to differentiate items by their loadings on the factors in order to create sub-scales that represent the construct. Request factors with eigenvalues greater than 1 (to calculate the eigenvalue for each factor, square the factor loadings and sum down the column). It is easier to group the items by targeted scales. The more distinct the other items, the better the chance the items will load cleanly on one's own scale.
  4. “Cleanly loaded items” are those items that load at least .40 on one factor and more than .10 greater on that factor than on any others. Identify those in the factor pattern.
  5. “Cross loaded items” are those that do not meet the above criterion. These are candidates to drop.
  6. Identify factors with only a few items that do not represent clear concepts, these are “uninterpretable scales.” Also identify any factors with only one item. These components and their items are candidates to drop.
  7. Look at the candidates to drop and the factors to be dropped. Is there anything that needs to be retained because it is critical to one's construct? For example, if a conceptually important item only cross-loads on a factor to be dropped, it is worth keeping for the next round.
  8. Drop the items, and run a confirmatory factor analysis asking the program to give only the number of factors after dropping the uninterpretable and single-item ones. Go through the process again starting at Step 3. Here various test reliability measures could also be taken.
  9. Keep running through the process until one gets “clean factors” (until all factors consist of cleanly loaded items).
  10. Run the coefficient alpha in the statistical program (asking for the alpha with each item dropped). Any scales with insufficient alphas should be dropped and the process repeated from Step 3. [Standardized coefficient alpha = (number of items)² × average correlation between different items ÷ sum of all the entries in the correlation matrix (including the diagonal values).]
  11. Run correlation or regression statistics to ensure the validity of the scale. As a matter of good practice, report the final factors and all loadings of one's own and similar scales in an appendix of the created scale.
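The eigenvalue screen in Step 3 and the coefficient alpha in Step 10 can be sketched with NumPy. The response data below are simulated, and the functions are an illustrative sketch, not a full factor-analysis workflow:

```python
import numpy as np

def kaiser_factor_count(responses):
    """Count factors with eigenvalues > 1 (the Kaiser criterion of Step 3)."""
    corr = np.corrcoef(responses, rowvar=False)  # inter-item correlation matrix
    eigenvalues = np.linalg.eigvalsh(corr)
    return int(np.sum(eigenvalues > 1.0))

def cronbach_alpha(responses):
    """Standardized coefficient alpha, matching the formula in Step 10:
    (number of items)^2 times the average off-diagonal correlation,
    divided by the sum of all entries in the correlation matrix
    (diagonal included)."""
    corr = np.corrcoef(responses, rowvar=False)
    k = corr.shape[0]
    off_diag = corr[~np.eye(k, dtype=bool)]
    return k**2 * off_diag.mean() / corr.sum()

# Simulated responses: 200 respondents, 4 items driven by one latent trait
rng = np.random.default_rng(0)
trait = rng.normal(size=(200, 1))
responses = trait + 0.8 * rng.normal(size=(200, 4))

print(kaiser_factor_count(responses))        # one dominant factor expected
print(round(cronbach_alpha(responses), 2))   # alpha for the 4-item scale
```

With one latent trait driving all four items, only one eigenvalue exceeds 1 and alpha is high; dropping a poorly loading item and re-running both checks mirrors the iteration from Step 3.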

Multi-Item and Single-Item Scales

In most practical situations, multi-item scales predict outcomes more effectively than single items. Single-item measures should be used cautiously in research, and their use should be limited to specific circumstances. [3] [4]

Criterion | Multi-item scale | Single-item scale
Construct concreteness | Abstract | Concrete
Construct dimensionality/complexity | Multidimensional, moderately complex | Unidimensional or extremely complex
Semantic redundancy | Low | High
Primary role of construct | Dependent or independent variable | Moderator or control variable
Desired precision | High | Low
Monitoring changes | Appropriate | Problematic
Sampled population | Homogeneous | Diverse
Sample size | Large | Limited

Table: Criteria for Assessing the Potential Use of Single-Item Measures [4]

Data types

The type of information collected can influence scale construction. Different types of information are measured in different ways.

  1. Some data are measured at the nominal level. That is, any numbers used are mere labels; they express no mathematical properties. Examples are SKU inventory codes and UPC bar codes.
  2. Some data are measured at the ordinal level. Numbers indicate the relative position of items, but not the magnitude of difference. An example is a preference ranking.
  3. Some data are measured at the interval level. Numbers indicate the magnitude of difference between items, but there is no absolute zero point. Examples are attitude scales and opinion scales.
  4. Some data are measured at the ratio level. Numbers indicate magnitude of difference and there is a fixed zero point. Ratios can be calculated. Examples include: age, income, price, costs, sales revenue, sales volume, and market share.
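The permissible statistics grow cumulatively from nominal to ratio level. A minimal sketch, using a hypothetical lookup of which operations each level supports:

```python
# Each level of measurement permits all the statistics of the levels
# below it, plus some of its own (illustrative mapping, not exhaustive).
PERMISSIBLE = {
    "nominal":  {"count", "mode"},
    "ordinal":  {"count", "mode", "median", "rank"},
    "interval": {"count", "mode", "median", "rank", "mean", "difference"},
    "ratio":    {"count", "mode", "median", "rank", "mean", "difference", "ratio"},
}

def allows(level, statistic):
    """Return True if the given statistic is meaningful at that level."""
    return statistic in PERMISSIBLE[level]

print(allows("ordinal", "mean"))   # preference ranks cannot be meaningfully averaged
print(allows("ratio", "ratio"))    # e.g. "twice the income" is well defined
```

The table explains, for instance, why ratios of attitude-scale scores (interval level) are not meaningful, while ratios of incomes (ratio level) are.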

Composite measures

Composite measures of variables are created by combining two or more separate empirical indicators into a single measure. Composite measures measure complex concepts more adequately than single indicators, extend the range of scores available and are more efficient at handling multiple items.

In addition to scales, there are two other types of composite measures. Indexes are similar to scales except multiple indicators of a variable are combined into a single measure. The index of consumer confidence, for example, is a combination of several measures of consumer attitudes. A typology is similar to an index except the variable is measured at the nominal level.

Indexes are constructed by accumulating scores assigned to individual attributes, while scales are constructed through the assignment of scores to patterns of attributes.
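The distinction can be sketched in Python: an index accumulates item scores, while a Guttman-style scale scores the response pattern. The items and patterns below are hypothetical, with items assumed to be ordered from easiest to hardest to endorse:

```python
def index_score(item_responses):
    """Index: accumulate scores assigned to individual attributes."""
    return sum(item_responses)

def guttman_scale_score(item_responses):
    """Scale: score the *pattern* of attributes -- here, the position of
    the last endorsed item in a cumulative (Guttman-style) ordering."""
    score = 0
    for position, endorsed in enumerate(item_responses, start=1):
        if endorsed:
            score = position
    return score

cumulative = [1, 1, 1, 0, 0]   # endorsed the three easiest items only
mixed      = [1, 0, 1, 0, 0]   # same kind of items, inconsistent pattern

print(index_score(cumulative))         # -> 3
print(guttman_scale_score(cumulative)) # -> 3
print(index_score(mixed))              # -> 2 (the index only counts)
print(guttman_scale_score(mixed))      # -> 3 (the scale reacts to the pattern)
```

The two response vectors show the difference: the index treats them as interchangeable sums, while the pattern-based score distinguishes a clean cumulative response from an inconsistent one.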

While indexes and scales provide measures of a single dimension, typologies are often employed to examine the intersection of two or more dimensions. Typologies are very useful analytical tools and can be easily used as independent variables, although since they are not unidimensional it is difficult to use them as a dependent variable.

Comparative and non-comparative scaling

With comparative scaling, the items are directly compared with each other (for example: does one prefer Pepsi or Coke?). In non-comparative scaling, each item is scaled independently of the others (for example: how does one feel about Coke?).
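The two approaches can be contrasted in a short sketch; the brands, choice data, and ratings below are hypothetical:

```python
from collections import Counter

# Comparative scaling: aggregate paired-comparison choices into a rank order.
# Each entry is the winner of one presented pair.
choices = ["Pepsi", "Coke", "Coke", "Pepsi", "Coke"]
wins = Counter(choices)
ranking = [brand for brand, _ in wins.most_common()]
print(ranking)  # -> ['Coke', 'Pepsi']

# Non-comparative scaling: each item is rated independently on its own scale
# (e.g. a hypothetical 1-9 rating), with no reference to the other item.
ratings = {"Coke": 7, "Pepsi": 6}
print(ratings["Coke"])
```

Note that comparative data yield only an ordering (ordinal information), whereas independent ratings can be treated as approximately interval-level scores.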

Comparative scaling techniques

Comparative scaling techniques include paired comparison, rank order, constant sum, and Q-sort scaling.

Non-comparative scaling techniques

Non-comparative scaling techniques include continuous rating scales, Likert scales, semantic differential scales, and Stapel scales.

Scale evaluation

Scales should be tested for reliability, generalizability, and validity. Generalizability is the ability to make inferences from a sample to the population, given the scale one has selected. Reliability is the extent to which a scale produces consistent results. Test-retest reliability checks how similar the results are if the research is repeated under similar circumstances. Alternative-forms reliability checks how similar the results are if the research is repeated using different forms of the scale. Internal-consistency reliability checks how consistently the individual measures included in the scale combine into a composite measure.
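Test-retest reliability can be illustrated as the correlation between scores from two administrations of the same scale; the data below are simulated for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Each respondent has a stable true score; each administration adds
# independent measurement error.
true_scores = rng.normal(50, 10, size=300)
test = true_scores + rng.normal(0, 4, size=300)    # first administration
retest = true_scores + rng.normal(0, 4, size=300)  # second administration

# Test-retest reliability: correlation between the two sets of scores.
reliability = np.corrcoef(test, retest)[0, 1]
print(round(reliability, 2))  # close to 1 when measurement error is small
```

Increasing the error standard deviation relative to the spread of true scores drives the coefficient down, which is exactly what an unreliable scale looks like in practice.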

Scales and indexes have to be validated. Internal validation checks the relation between the individual measures included in the scale and the composite scale itself. External validation checks the relation between the composite scale and other indicators of the variable, indicators not included in the scale. Content validation (also called face validity) checks how well the scale measures what it is supposed to measure. Criterion validation checks how meaningful the scale criteria are relative to other possible criteria. Construct validation checks what underlying construct is being measured. There are three variants of construct validity: convergent validity, discriminant validity, and nomological validity (Campbell and Fiske, 1959; Krus and Ney, 1978). The coefficient of reproducibility indicates how well the data from the individual measures included in the scale can be reconstructed from the composite scale.
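The coefficient of reproducibility can be sketched as one minus the proportion of item responses that differ from the responses predicted by the composite scale score. The response matrices below are hypothetical:

```python
def reproducibility(observed_matrix, predicted_matrix):
    """Coefficient of reproducibility = 1 - (response errors / total responses),
    where an error is an observed item response that differs from the one
    predicted by the respondent's composite (e.g. Guttman) scale score."""
    errors = sum(
        1
        for observed_row, predicted_row in zip(observed_matrix, predicted_matrix)
        for observed, predicted in zip(observed_row, predicted_row)
        if observed != predicted
    )
    total = sum(len(row) for row in observed_matrix)
    return 1 - errors / total

observed  = [[1, 1, 0], [1, 0, 1], [1, 1, 1]]  # actual item responses
predicted = [[1, 1, 0], [1, 1, 0], [1, 1, 1]]  # ideal cumulative patterns
print(round(reproducibility(observed, predicted), 2))  # -> 0.78
```

Here the second respondent deviates from the ideal cumulative pattern on two of nine responses, so the coefficient is 1 − 2/9 ≈ 0.78; a value near 1 indicates the composite score reproduces the item data well.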

Related Research Articles

Psychometrics is a field of study within psychology concerned with the theory and technique of measurement. Psychometrics generally covers specialized fields within psychology and education devoted to testing, measurement, assessment, and related activities. Psychometrics is concerned with the objective measurement of latent constructs that cannot be directly observed. Examples of latent constructs include intelligence, introversion, mental disorders, and educational achievement. The levels of individuals on nonobservable latent variables are inferred through mathematical modeling based on what is observed from individuals' responses to items on tests and scales.

In statistics and psychometrics, reliability is the overall consistency of a measure. A measure is said to have a high reliability if it produces similar results under consistent conditions:

"It is the characteristic of a set of test scores that relates to the amount of random error from the measurement process that might be embedded in the scores. Scores that are highly reliable are precise, reproducible, and consistent from one testing occasion to another. That is, if the testing process were repeated with a group of test takers, essentially the same results would be obtained. Various kinds of reliability coefficients, with values ranging between 0.00 and 1.00, are usually used to indicate the amount of error in the scores."

Validity is the extent to which a concept, conclusion, or measurement is well-founded and likely corresponds accurately to the real world. The word "valid" is derived from the Latin validus, meaning strong. The validity of a measurement tool is the degree to which the tool measures what it claims to measure. Validity is based on the strength of a collection of different types of evidence.

Questionnaire construction refers to the design of a questionnaire to gather statistically useful information about a given topic. When properly constructed and responsibly administered, questionnaires can provide valuable data about any given subject.

Survey methodology is "the study of survey methods". As a field of applied statistics concentrating on human-research surveys, survey methodology studies the sampling of individual units from a population and associated techniques of survey data collection, such as questionnaire construction and methods for improving the number and accuracy of responses to surveys. Survey methodology targets instruments or procedures that ask one or more questions that may or may not be answered.

Quantitative marketing research is the application of quantitative research techniques to the field of marketing research. It has roots in both the positivist view of the world, and the modern marketing viewpoint that marketing is an interactive process in which both the buyer and seller reach a satisfying agreement on the "four Ps" of marketing: Product, Price, Place (location) and Promotion.

In psychometrics, item response theory (IRT) is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring abilities, attitudes, or other variables. It is a theory of testing based on the relationship between individuals' performances on a test item and the test takers' levels of performance on an overall measure of the ability that item was designed to measure. Several different statistical models are used to represent both item and test taker characteristics. Unlike simpler alternatives for creating scales and evaluating questionnaire responses, it does not assume that each item is equally difficult. This distinguishes IRT from, for instance, Likert scaling, in which "All items are assumed to be replications of each other or in other words items are considered to be parallel instruments". By contrast, item response theory treats the difficulty of each item as information to be incorporated in scaling items.


A Likert scale is a psychometric scale named after its inventor, American social psychologist Rensis Likert, which is commonly used in research questionnaires. It is the most widely used approach to scaling responses in survey research, such that the term is often used interchangeably with rating scale, although there are other types of rating scales.

In statistics and research, internal consistency is typically a measure based on the correlations between different items on the same test. It measures whether several items that propose to measure the same general construct produce similar scores. For example, if a respondent expressed agreement with the statements "I like to ride bicycles" and "I've enjoyed riding bicycles in the past", and disagreement with the statement "I hate bicycles", this would be indicative of good internal consistency of the test.

Construct validity concerns how well a set of indicators represent or reflect a concept that is not directly measurable. Construct validation is the accumulation of evidence to support the interpretation of what a measure reflects. Modern validity theory defines construct validity as the overarching concern of validity research, subsuming all other types of validity evidence such as content validity and criterion validity.

Level of measurement or scale of measure is a classification that describes the nature of information within the values assigned to variables. Psychologist Stanley Smith Stevens developed the best-known classification with four levels, or scales, of measurement: nominal, ordinal, interval, and ratio. This framework of distinguishing levels of measurement originated in psychology and has since had a complex history, being adopted and extended in some disciplines and by some scholars, and criticized or rejected by others. Other classifications include those by Mosteller and Tukey, and by Chrisman.


A questionnaire is a research instrument that consists of a set of questions for the purpose of gathering information from respondents through a survey or statistical study. A research questionnaire is typically a mix of close-ended questions and open-ended questions. Open-ended, long-form questions offer the respondent the ability to elaborate on their thoughts. The research questionnaire was developed by the Statistical Society of London in 1838.

Personality Assessment Inventory (PAI), developed by Leslie Morey, is a self-report 344-item personality test that assesses a respondent's personality and psychopathology. Each item is a statement about the respondent that the respondent rates with a 4-point scale. It is used in various contexts, including psychotherapy, crisis/evaluation, forensic, personnel selection, pain/medical, and child custody assessment. The test construction strategy for the PAI was primarily deductive and rational. It shows good convergent validity with other personality tests, such as the Minnesota Multiphasic Personality Inventory and the Revised NEO Personality Inventory.

Phrase completion scales are a type of psychometric scale used in questionnaires. Developed in response to the problems associated with Likert scales, phrase completions are concise, unidimensional measures that tap ordinal level data in a manner that approximates interval level data.

A rating scale is a set of categories designed to obtain information about a quantitative or a qualitative attribute. In the social sciences, particularly psychology, common examples are the Likert response scale and 0-10 rating scales, where a person selects the number that reflects the perceived quality of a product.

In psychology, discriminant validity tests whether concepts or measurements that are not supposed to be related are actually unrelated.

In statistics, confirmatory factor analysis (CFA) is a special form of factor analysis, most commonly used in social science research. It is used to test whether measures of a construct are consistent with a researcher's understanding of the nature of that construct. As such, the objective of confirmatory factor analysis is to test whether the data fit a hypothesized measurement model. This hypothesized model is based on theory and/or previous analytic research. CFA was first developed by Jöreskog (1969) and has built upon and replaced older methods of analyzing construct validity such as the MTMM Matrix as described in Campbell & Fiske (1959).

The multitrait-multimethod (MTMM) matrix is an approach to examining construct validity developed by Campbell and Fiske (1959). It organizes convergent and discriminant validity evidence for comparison of how a measure relates to other measures. The conceptual approach has influenced experimental design and measurement theory in psychology, including applications in structural equation models.


In statistics and research design, an index is a composite statistic – a measure of changes in a representative group of individual data points, or in other words, a compound measure that aggregates multiple indicators. Indexes – also known as composite indicators – summarize and rank specific observations.

The Comrey Personality Scales (CPS) is a personality test developed by Andrew L. Comrey in 1970. The CPS measures eight main scales and two validity scales. The test is currently distributed by Educational and Industrial Testing Service. The test consists of 180 items rated on a seven-point scale.

References

  1. Earl Babbie (1 January 2012). The Practice of Social Research. Cengage Learning. p. 162. ISBN 978-1-133-04979-1.
  2. McDonald, Roderick P. (2013-06-17). Test Theory: A Unified Treatment. Psychology Press. ISBN 978-1-135-67531-8.
  3. Diamantopoulos, Adamantios; Sarstedt, Marko; Fuchs, Christoph (2012). "Guidelines for choosing between multi-item and single-item scales for construct measurement: a predictive validity perspective". Journal of the Academy of Marketing Science. 40. doi:10.1007/s11747-011-0300-3. hdl:1959.13/1052296.
  4. Fuchs, Christoph; Diamantopoulos, Adamantios (2009). "Using single-item measures for construct measurement in management research: Conceptual issues and application guidelines" (PDF). Die Betriebswirtschaft. 69 (2).
  5. Reips, U.-D.; Funke, F. (2008). "Interval level measurement with visual analogue scales in Internet-based research: VAS Generator". doi:10.3758/BRM.40.3.699.
