Cultural consensus theory is an approach to information pooling [1] (aggregation, data fusion) which supports a framework for the measurement and evaluation of beliefs as cultural: shared to some extent by a group of individuals. Cultural consensus models guide the aggregation of responses from individuals to estimate (1) the culturally appropriate answers to a series of related questions (when the answers are unknown) and (2) individual competence (cultural competence) in answering those questions. The theory is applicable when there is sufficient agreement across people to assume that a single set of answers exists. The agreement between pairs of individuals is used to estimate individual cultural competence. Answers are estimated by weighting the responses of individuals by their competence and then combining them.
Cultural consensus theory assumes that cultural beliefs are learned and shared across people and that there is a common understanding of what the world and society are all about. [2] Since the amount of information in a culture is too large for any one individual to master, individuals know different subsets of the cultural knowledge and vary in their cultural competence. Cultural beliefs are beliefs held by a majority of culture members. Given a set of questions on the same topic, shared cultural beliefs or norms regarding the answers can be estimated by aggregating the responses across a sample of culture members. When agreement is close to absolute, estimating the answers is straightforward. The problem addressed by cultural consensus theory is how to estimate beliefs when there is some degree of heterogeneity in responses. In general, cultural consensus theory provides a framework for determining whether responses are sufficiently homogeneous to estimate a single set of shared answers and then estimating those answers and individual cultural competence in answering the questions. The theory is designed for the estimation of “culturally correct” answers to questions that are unknown a priori to the researcher, as well as item response effects (e.g., knowledge level, response biases, item difficulty). [3]
Cultural consensus models do not create consensus or explain why consensus exists; they simply facilitate the discovery and description of possible consensus. A high degree of agreement among raters must be present in responses to use consensus theory – only with high agreement does it make sense to aggregate responses to estimate beliefs of the group. Although there are statistical methods to evaluate whether agreement among raters is greater than chance (Binomial test, Friedman test, or Kendall's coefficient of concordance), these methods do not provide a best estimate of the “true” answers nor do they estimate competence of the raters. Cultural consensus theory can estimate competence from the agreement between subjects; answers are then estimated by “weighting” individual responses by competence prior to aggregation.
A very important feature of the aggregation of responses is that the combined responses will be more accurate than the responses of any single individual included in the aggregation. Reliability theory in psychology (specifically, the reliability coefficient and the Spearman–Brown prediction formula) provides a mathematical estimate of the accuracy or validity of aggregated responses from the number of units being combined and the level of agreement among the units. In this case, the accuracy of the aggregated responses can be calculated from the number of subjects and the average Pearson correlation coefficient between all pairs of subjects (across questions).
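The following is a minimal sketch of this calculation, applying the Spearman–Brown formula to people rather than test items; the function names and example values are illustrative and not taken from the cited sources.

```python
# Spearman-Brown formula applied to people rather than test items: given n
# respondents and the average Pearson correlation r_bar between all pairs of
# respondents (computed across questions), estimate the reliability of their
# averaged responses and the expected validity of that aggregate.

def aggregate_reliability(n: int, r_bar: float) -> float:
    """Reliability of the average of n respondents (Spearman-Brown)."""
    return n * r_bar / (1 + (n - 1) * r_bar)

def aggregate_validity(n: int, r_bar: float) -> float:
    """Approximate correlation of the aggregate with the true answers,
    taken here as the square root of the aggregate reliability."""
    return aggregate_reliability(n, r_bar) ** 0.5

# Example: 10 respondents whose answers correlate .25 with each other on average.
print(round(aggregate_reliability(10, 0.25), 2))  # 0.77
print(round(aggregate_validity(10, 0.25), 2))     # 0.88
```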
To use cultural consensus theory, at least three assumptions must be met: (1) informants should share a common culture, so that there is a single set of culturally appropriate answers to the questions; (2) informants should provide their answers independently of one another; and (3) the questions should all come from a single domain of knowledge, so that an individual's competence is consistent across questions.
Cultural consensus theory encompasses formal and informal models. Practically speaking, these models are often used to estimate cultural beliefs, including the degree to which individuals report such beliefs. [5] The formal cultural consensus model describes the decision-making process involved in answering questions. [6] [7] This version is limited to categorical-type responses: multiple-choice questions (including those with dichotomous true/false or yes/no responses) and responses to open-ended questions (with a single word or short phrase response for each question). This version of the model has a series of additional assumptions that must be met, e.g., no response bias. [6] [8] The formal model has direct parallels in signal detection theory and latent class analysis. An informal version of the model is available as a set of analytic procedures and obtains similar information with fewer assumptions. [9] The informal model parallels a factor analysis on people (without rotation) and thus has similarities to Q factor analysis (as in Q Methodology). The informal version of the model can accommodate interval estimates and ranked response data. Both approaches provide estimates of the culturally correct answers and estimates of individual differences in the accuracy of reported information.
In the formal version, data are analyzed with a mathematical model: a set of logical axioms, derived propositions, and assumptions that specify how the empirical variables fit the model's parameters. [5] The informal model, on the other hand, uses reliability analysis. [10]
Cultural competence is estimated from the similarity in responses between pairs of subjects, since the agreement between a pair of respondents is a function of their individual competencies. In the formal model, the similarity is the probability that matched responses occur (the match method [6]) or the probability that particular response combinations occur (the covariance method [7]). Simple match or covariance measures are then corrected for guessing and for the proportion of positive responses, respectively. In the informal model, similarity is calculated with a Pearson correlation coefficient. [9]
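The sketch below illustrates both kinds of agreement measures under simplifying assumptions: dichotomous true/false data coded 0/1, a guessing correction for L = 2 response alternatives, and made-up example data; it is not the implementation used in the cited software.

```python
import numpy as np

# Example response matrix: one row per respondent, one column per question (0/1).
responses = np.array([
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0, 1, 0, 0],
])
L = 2  # number of response alternatives (true/false)
n = responses.shape[0]

# Formal model, match method: proportion of matching answers for each pair of
# respondents, corrected for chance agreement under guessing.
raw_match = np.array([[np.mean(responses[i] == responses[j]) for j in range(n)]
                      for i in range(n)])
corrected_match = (L * raw_match - 1) / (L - 1)

# Informal model: Pearson correlations between respondents, across questions.
pearson_agreement = np.corrcoef(responses)
```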
A matrix of agreement coefficients between all pairs of subjects is then factored with a minimum residual factoring method (principal axis factoring without rotation) to solve for the unknown competence values on the main diagonal. (For the informal model, the maximum likelihood factor analysis algorithm is preferred, but principal axis factoring can be used as well.) To determine whether the solution meets the cultural consensus criterion that only a single factor is present, a goodness-of-fit rule is used. If the ratio of the first to the second eigenvalue is large, the subsequent eigenvalues are small, and all first-factor loadings are positive, then it is assumed that the data contain only a single factor or a single response pattern.
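This step can be approximated numerically as in the rough sketch below, which uses a simple iterative eigendecomposition in place of the exact minimum-residual routines in the software packages mentioned later; the argument name `agreement` refers to a pairwise agreement matrix such as the one computed above.

```python
import numpy as np

def consensus_factor(agreement: np.ndarray, n_iter: int = 50):
    """Approximate a one-factor minimum-residual solution by iteratively
    replacing the main diagonal with the communalities implied by the
    first factor, then checking the eigenvalue ratio."""
    A = agreement.astype(float).copy()
    off_diag = A[np.triu_indices_from(A, k=1)]
    np.fill_diagonal(A, off_diag.mean())          # starting guess for the diagonal
    for _ in range(n_iter):
        eigvals, eigvecs = np.linalg.eigh(A)      # eigenvalues in ascending order
        loadings = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
        if loadings.sum() < 0:                    # resolve the arbitrary sign
            loadings = -loadings
        np.fill_diagonal(A, loadings ** 2)        # updated communalities
    eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]
    ratio = eigvals[0] / eigvals[1] if eigvals[1] > 0 else float("inf")
    return loadings, ratio

# The loadings estimate individual competence; a large first-to-second
# eigenvalue ratio (commonly taken as > 3:1) together with all-positive
# loadings supports the single-factor (consensus) interpretation.
```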
Individual competence values are used to weight the responses and estimate the culturally correct answers. In the formal model, a confidence level (Bayesian adjusted probability) is obtained for each answer from the pattern of responses and the individual competence scores. In the informal model, responses are also weighted, using a linear model. When a correlation matrix is factored, the estimated answers appear as the first set of factor scores. Note that factor scores are usually provided as standardized variables (with a mean of zero), but they may be transformed back to the original data-collection units.
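For dichotomous true/false data, both weighting schemes can be sketched as follows, assuming no response bias, a 50% guessing rate, equal prior probability for the two answers, and competence scores from the previous step; this is a simplified illustration rather than the exact published estimators.

```python
import numpy as np

def informal_answers(responses: np.ndarray, competence: np.ndarray) -> np.ndarray:
    """Informal model (simplified): competence-weighted average per question."""
    w = np.clip(competence, 0.0, None)
    return (w @ responses) / w.sum()

def formal_confidence_true(responses: np.ndarray, competence: np.ndarray) -> np.ndarray:
    """Formal model (true/false, no bias): posterior probability that each
    question's culturally correct answer is 'true' (coded 1)."""
    d = np.clip(competence, 1e-6, 1 - 1e-6)
    p_true_if_true = d + (1 - d) / 2          # knows the answer, or guesses right
    p_true_if_false = (1 - d) / 2             # does not know and guesses wrong
    like_true = np.where(responses == 1, p_true_if_true[:, None], 1 - p_true_if_true[:, None])
    like_false = np.where(responses == 1, p_true_if_false[:, None], 1 - p_true_if_false[:, None])
    log_true = np.log(like_true).sum(axis=0)
    log_false = np.log(like_false).sum(axis=0)
    return 1.0 / (1.0 + np.exp(log_false - log_true))  # equal priors on true/false
```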
When used as a method of analysis, cultural consensus theory allows the following: determining whether the observed variability in knowledge is cultural; measuring the cultural competence of each individual; and determining the culturally correct knowledge. [4]
Cultural consensus analyses may be performed with software applications. The formal consensus model is currently only available in the software packages ANTHROPAC and UCINET. Analysis procedures for the informal model are available in most statistical packages. The informal model can be run within a factor analysis procedure by requesting the minimum-residual (principal axis) factoring method, which solves for the missing diagonal, without rotation. When factor analysis is used for consensus applications, the data must be transposed, so that questions are the unit of analysis (the rows in the data matrix) and people are the variables (the columns in the data matrix).
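The data layout described above can be illustrated as in the small sketch below (the array names and example values are hypothetical; the factoring itself would be done by the statistical package's minimum-residual routine or by a sketch like the one given earlier).

```python
import numpy as np

# Raw data are typically entered with one row per respondent and one column per
# question (here a tiny made-up example).
respondent_by_question = np.array([
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
])

# For a consensus analysis run through a general factor-analysis procedure, the
# matrix is transposed so that questions become the cases (rows) and people
# become the variables (columns).
question_by_respondent = respondent_by_question.T

# Person-by-person correlation matrix on which the (unrotated) factoring operates.
people_corr = np.corrcoef(question_by_respondent, rowvar=False)
```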
An advantage of cultural consensus analysis is that the necessary sample size can be determined in advance and that necessary sample sizes do not need to be very large. Sample size determination in a consensus analysis is similar to that in other types of analyses; namely, when variability is low, power is high and small samples will suffice. Here, variability refers to the agreement (competence) among subjects. For the formal model, sample size can be estimated from the level of agreement, the proportion of items to be correctly classified, and the desired confidence level: assuming a low average competence level of .50, a high proportion of items to be correctly classified (.95), and high confidence (.999), a minimum sample size of 29 (per subgroup) is necessary. [1] [5] For higher levels of competence and lower levels of accuracy and confidence, smaller sample sizes are necessary. Similarly, sample size can be estimated with reliability theory and the Spearman–Brown prophecy formula (applied to people instead of items). For a relatively low level of agreement (an average correlation of .25 between people, comparable to an average competence of .50) and a high degree of desired validity (a .95 correlation between the estimated answers and the true answers), a study would require a minimum sample size of 30 subjects. [11]
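To complement the aggregate-validity sketch given earlier, the same Spearman–Brown relation can be inverted to approximate the number of respondents required for a desired validity; the function name is illustrative, and the published tables cited above may use more conservative rounding or slightly different criteria, so results can differ by a few subjects.

```python
import math

def respondents_needed(r_bar: float, desired_validity: float) -> int:
    """Smallest n for which the average of n respondents' answers is expected
    to correlate at least `desired_validity` with the true answers, given an
    average inter-respondent correlation of r_bar (inverted Spearman-Brown)."""
    target_reliability = desired_validity ** 2
    n = target_reliability * (1 - r_bar) / (r_bar * (1 - target_reliability))
    return math.ceil(n)

# Example: an average correlation of .25 between people and a desired validity
# of .95 gives a minimum in the high twenties, close to the reported figure of
# about 30 subjects.
print(respondents_needed(0.25, 0.95))  # 28
```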
In summary, cultural consensus theory offers a framework for estimating cultural beliefs. The formal model is based on a decision-making model of how questions are answered (with parameters for competence, response bias, and guessing); it proceeds from axioms and uses mathematical proofs to arrive at estimates of competence and of the answers to a series of questions. The informal model is a set of statistical procedures that provides similar information. Given a series of related questions, the agreement between people's reported answers is used to estimate their cultural competence. Cultural competence is the extent to which an individual knows, or shares, the group's beliefs. Since the extraction of individual competencies depends upon having a single-factor solution, the ratio of the first to the second eigenvalue (> 3:1) serves as a goodness-of-fit indicator that a single factor is present in the pattern of responses. Culturally correct answers are estimated by weighting and combining individuals’ responses.
Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures used to analyze the differences among means. ANOVA was developed by the statistician Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means. In other words, ANOVA is used to test for differences among two or more means.
Analysis is the process of breaking a complex topic or substance into smaller parts in order to gain a better understanding of it. The technique has been applied in the study of mathematics and logic since before Aristotle, though analysis as a formal concept is a relatively recent development.
Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.
An experiment is a procedure carried out to support or refute a hypothesis, or determine the efficacy or likelihood of something previously untried. Experiments provide insight into cause-and-effect by demonstrating what outcome occurs when a particular factor is manipulated. Experiments vary greatly in goal and scale but always rely on repeatable procedure and logical analysis of the results. There also exist natural experimental studies.
In statistics, quality assurance, and survey methodology, sampling is the selection of a subset or a statistical sample of individuals from within a statistical population to estimate characteristics of the whole population. The subset is meant to reflect the whole population and statisticians attempt to collect samples that are representative of the population. Sampling has lower costs and faster data collection compared to recording data from the entire population, and thus, it can provide insights in cases where it is infeasible to measure an entire population.
In statistics and psychometrics, reliability is the overall consistency of a measure. A measure is said to have a high reliability if it produces similar results under consistent conditions:
"It is the characteristic of a set of test scores that relates to the amount of random error from the measurement process that might be embedded in the scores. Scores that are highly reliable are precise, reproducible, and consistent from one testing occasion to another. That is, if the testing process were repeated with a group of test takers, essentially the same results would be obtained. Various kinds of reliability coefficients, with values ranging between 0.00 and 1.00, are usually used to indicate the amount of error in the scores."
Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed variables mainly reflect the variations in two unobserved (underlying) variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors plus "error" terms, hence factor analysis can be thought of as a special case of errors-in-variables models.
In psychometrics, item response theory (IRT) is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring abilities, attitudes, or other variables. It is a theory of testing based on the relationship between individuals' performances on a test item and the test takers' levels of performance on an overall measure of the ability that item was designed to measure. Several different statistical models are used to represent both item and test taker characteristics. Unlike simpler alternatives for creating scales and evaluating questionnaire responses, it does not assume that each item is equally difficult. This distinguishes IRT from, for instance, Likert scaling, in which "All items are assumed to be replications of each other or in other words items are considered to be parallel instruments". By contrast, item response theory treats the difficulty of each item as information to be incorporated in scaling items.
A Likert scale is a psychometric scale named after its inventor, American social psychologist Rensis Likert, which is commonly used in research questionnaires. It is the most widely used approach to scaling responses in survey research, such that the term is often used interchangeably with rating scale, although there are other types of rating scales.
In psychology, the false consensus effect, also known as consensus bias, is a pervasive cognitive bias that causes people to "see their own behavioral choices and judgments as relatively common and appropriate to existing circumstances". In other words, they assume that their personal qualities, characteristics, beliefs, and actions are relatively widespread through the general population.
A personality test is a method of assessing human personality constructs. Most personality assessment instruments are in fact introspective self-report questionnaire measures or reports from life records (L-data) such as rating scales. Attempts to construct actual performance tests of personality have been very limited, even though Raymond Cattell and his colleague Frank Warburton compiled a list of over 2,000 separate objective tests that could be used in constructing objective personality tests. One exception was the Objective-Analytic Test Battery, a performance test designed to quantitatively measure 10 factor-analytically discerned personality trait dimensions. A major problem with both L-data and Q-data methods is that, because of item transparency, rating scales and self-report questionnaires are highly susceptible to motivational and response distortion, ranging from lack of adequate self-insight to downright dissimulation, depending on the reason/motivation for the assessment being undertaken.
In psychology, the Asch conformity experiments or the Asch paradigm were a series of studies directed by Solomon Asch studying if and how individuals yielded to or defied a majority group and the effect of such influences on beliefs and opinions.
The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations, published in 2004, is a book written by James Surowiecki about the aggregation of information in groups, resulting in decisions that, he argues, are often better than could have been made by any single member of the group. The book presents numerous case studies and anecdotes to illustrate its argument, and touches on several fields, primarily economics and psychology.
Mathematical psychology is an approach to psychological research that is based on mathematical modeling of perceptual, thought, cognitive and motor processes, and on the establishment of law-like rules that relate quantifiable stimulus characteristics with quantifiable behavior. The mathematical approach is used with the goal of deriving hypotheses that are more exact and thus yield stricter empirical validations. There are five major research areas in mathematical psychology: learning and memory, perception and psychophysics, choice and decision-making, language and thinking, and measurement and scaling.
Consensus-based assessment expands on the common practice of consensus decision-making and the theoretical observation that expertise can be closely approximated by large numbers of novices or journeymen. It creates a method for determining measurement standards for very ambiguous domains of knowledge, such as emotional intelligence, politics, religion, values and culture in general. From this perspective, the shared knowledge that forms cultural consensus can be assessed in much the same way as expertise or general intelligence.
The wisdom of the crowd is the collective opinion of a diverse and independent group of individuals rather than that of a single expert. This process, while not new to the Information Age, has been pushed into the mainstream spotlight by social information sites such as Quora, Reddit, Stack Exchange, Wikipedia, Yahoo! Answers, and other web resources which rely on collective human knowledge. An explanation for this phenomenon is that there is idiosyncratic noise associated with each individual judgment, and taking the average over a large number of responses will go some way toward canceling the effect of this noise.
Absolute probability judgement is a technique used in the field of human reliability assessment (HRA) to evaluate the probability of a human error occurring throughout the completion of a specific task. From such analyses, measures can then be taken to reduce the likelihood of errors occurring within a system and therefore lead to an improvement in the overall levels of safety. There are three primary reasons for conducting an HRA: error identification, error quantification, and error reduction. As a number of techniques exist for such purposes, they can be split into one of two classifications: first-generation techniques and second-generation techniques. First-generation techniques work on the basis of the simple dichotomy of 'fits/doesn't fit' in matching the error situation in context with related error identification and quantification, whereas second-generation techniques are more theory-based in their assessment and quantification of errors. HRA techniques have been utilised in a range of industries, including healthcare, engineering, nuclear, transportation, and the business sector; each technique has varying uses within different disciplines.
For the last several decades research in cross-cultural psychology has focused on the cultural patterning and positioning of values. Unfortunately, values have low predictive power for actual behavior. Researchers at the Chinese University of Hong Kong decided to develop a questionnaire to measure beliefs, i.e., what is believed to be true about the world, to add to the power of values, i.e., what the person believes is valuable, in predicting behavior.
The Mokken scale is a psychometric method of data reduction. A Mokken scale is a unidimensional scale that consists of hierarchically-ordered items that measure the same underlying, latent concept. This method is named after the political scientist Rob Mokken who suggested it in 1971.
William W. "Bill" Dressler is an American anthropologist known for his concept of cultural consonance and work on cultural models especially in the context of biocultural medical anthropology. He has done fieldwork in Mexico, Brazil, the West Indies, and the United States, and worked at the University of Alabama since 1978. He is now Professor Emeritus. In 2023, he was elected to the National Academy of Sciences.