Configural frequency analysis

Configural frequency analysis (CFA) is a method of exploratory data analysis, introduced by Gustav A. Lienert in 1969. [1] The goal of a configural frequency analysis is to detect patterns in the data that occur significantly more often (such patterns are called types) or significantly less often (such patterns are called antitypes) than expected by chance. The identified types and antitypes thus provide insight into the structure of the data. Types are interpreted as concepts which are constituted by a pattern of variable values. Antitypes are interpreted as patterns of variable values that do not in general occur together.

Basic idea of the CFA algorithm

We explain the basic idea of CFA with a simple example. Assume that we have a data set that describes, for each of n patients, whether they show certain symptoms s1, ..., sm. We assume for simplicity that a symptom is either shown or not, i.e. we have a dichotomous data set.

Each record in the data set is thus an m-tuple (x1, ..., xm) where each xi is either equal to 0 (the patient does not show symptom i) or 1 (the patient does show symptom i). Each such m-tuple is called a configuration. Let C be the set of all possible configurations, i.e. the set {0,1}^m of all possible m-tuples over {0,1}. The data set can thus be described by listing the observed frequencies f(c) of all possible configurations in C.

The basic idea of CFA is to estimate the frequency of each configuration under the assumption that the m symptoms are statistically independent. Let e(c) be this estimated frequency under the assumption of independence.

Let pi(1) be the probability that a member of the investigated population shows symptom si and pi(0) = 1 − pi(1) be the probability that a member of the investigated population does not show symptom si. Under the assumption that all symptoms are independent we can calculate the expected relative frequency p(c) of a configuration c = (c1, ..., cm) by:

p(c) = p1(c1) · p2(c2) · ... · pm(cm)

The expected frequency is then e(c) = n · p(c).
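Under these assumptions, the expected frequencies can be computed directly from the marginal probabilities. The following sketch (using a made-up toy data set) estimates pi(1) from the data and computes e(c) for every configuration:

```python
from itertools import product

# Toy data: each row is one patient's configuration (x1, x2, x3).
# These records are invented purely for illustration.
records = [
    (1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 1, 1),
    (0, 0, 1), (1, 1, 1), (1, 1, 0), (0, 0, 0),
]
n = len(records)
m = len(records[0])

# Marginal probabilities p_i(1): share of patients showing symptom i.
p1 = [sum(r[i] for r in records) / n for i in range(m)]

# Expected frequency e(c) = n * prod_i p_i(c_i) under independence.
def expected(c):
    e = float(n)
    for i, ci in enumerate(c):
        e *= p1[i] if ci == 1 else 1 - p1[i]
    return e

for c in product((0, 1), repeat=m):
    f = records.count(c)  # observed frequency f(c)
    print(c, f, round(expected(c), 3))
```

Because the expected frequencies are built from the marginals, they always sum to n over all configurations, which is a useful sanity check.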

Now f(c) and e(c) can be compared by a statistical test (typical tests applied in CFA are Pearson's chi-squared test, the binomial test or the hypergeometric test of Lehmacher).

If the statistical test suggests for a given α-level that the difference between f(c) and e(c) is significant, then c is called a type if f(c) > e(c) and an antitype if f(c) < e(c). If there is no significant difference between f(c) and e(c), then c is neither a type nor an antitype. Thus, each configuration c can in principle have three different states: it can be a type, an antitype, or not classified.
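As a sketch of this classification step, the following standard-library-only code applies a two-sided binomial test (helper names are our own, not part of any CFA package) to decide whether a configuration is a type, an antitype, or neither:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p(k, n, p):
    """Two-sided binomial p-value: sum over all outcomes no more likely than k."""
    pk = binom_pmf(k, n, p)
    return sum(binom_pmf(j, n, p) for j in range(n + 1)
               if binom_pmf(j, n, p) <= pk * (1 + 1e-9))

def classify(f, e, n, alpha=0.05):
    """Classify a configuration with observed frequency f and expected frequency e."""
    p = e / n  # probability of the configuration under the chance model
    if two_sided_p(round(f), n, p) < alpha:
        return "type" if f > e else "antitype"
    return "none"

print(classify(30, 10, 100))  # far more observations than expected -> "type"
print(classify(0, 10, 100))   # far fewer than expected -> "antitype"
print(classify(10, 10, 100))  # matches expectation -> "none"
```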

Types and antitypes are defined symmetrically, but in practical applications researchers are mainly interested in detecting types. For example, clinical studies typically aim to detect symptom combinations that are indicators for a disease. These are by definition symptom combinations which occur more often than expected by chance, i.e. types.

Control of the alpha level

Since in CFA a significance test is applied in parallel for each configuration c, there is a high risk of committing a type I error (i.e. detecting a type or antitype when the null hypothesis is true). The most popular method to control this is the Bonferroni correction for the α-level. [2] There are a number of alternative methods to control the α-level. One alternative, the Holm–Bonferroni method introduced by Sture Holm, considers the number of tests already performed when the ith test is carried out. [3] Thus, in this method the α-level is not constant across tests.
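The Holm–Bonferroni step-down procedure can be sketched as follows (a minimal implementation; the function name is ours):

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Return a list of booleans: True where the corresponding test is rejected."""
    k = len(pvalues)
    reject = [False] * k
    order = sorted(range(k), key=lambda i: pvalues[i])
    for rank, i in enumerate(order):
        # Step-down: compare the smallest p-value with alpha/k, the next
        # with alpha/(k-1), and so on; stop at the first non-rejection.
        if pvalues[i] <= alpha / (k - rank):
            reject[i] = True
        else:
            break
    return reject

print(holm_bonferroni([0.001, 0.02, 0.04]))  # [True, True, True]
```

With α = 0.05 and p-values [0.001, 0.02, 0.04], all three hypotheses are rejected, whereas a plain Bonferroni correction (comparing every p-value with α/3 ≈ 0.0167) would reject only the first; this illustrates why the Holm method is at least as powerful.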

Algorithm in the non-dichotomous case

In our example above we assumed for simplicity that the symptoms are dichotomous. This is, however, not a necessary restriction. CFA can also be applied to symptoms (or, more generally, attributes of an object) that are not dichotomous but have a finite number of degrees. In this case a configuration is an element of C = S1 × ... × Sm, where Si is the set of possible degrees for symptom si. [2] [4] [5] [6]
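For example, the configuration space C is simply the Cartesian product of the degree sets (the degree sets below are hypothetical):

```python
from itertools import product

# Hypothetical degree sets: s1 and s3 are dichotomous, s2 has three degrees.
S = [(0, 1), (0, 1, 2), (0, 1)]

# C = S1 x S2 x S3: the set of all possible configurations.
C = list(product(*S))
print(len(C))  # 2 * 3 * 2 = 12
```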

Chance model

The assumption of the independence of the symptoms can be replaced by another method to calculate the expected frequencies e(c) of the configurations. Such a method is called a chance model.

In most applications of CFA the assumption that all symptoms are independent is used as the chance model. A CFA using that chance model is called first-order CFA. This is the classical method of CFA; in many publications it is even considered the only CFA method. An example of an alternative chance model is the assumption that all configurations have the same probability. A CFA using that chance model is called zero-order CFA.
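Under a zero-order chance model the expected frequencies no longer depend on the marginals: every configuration gets e(c) = n / |C|. A minimal sketch (sample size and symptom sets are made up):

```python
from itertools import product

n = 80                 # hypothetical sample size
S = [(0, 1)] * 3       # three dichotomous symptoms
C = list(product(*S))  # 8 configurations

# Zero-order chance model: all configurations are equally likely,
# so every expected frequency is simply n / |C|.
e_zero = {c: n / len(C) for c in C}
print(e_zero[(0, 0, 0)])  # 10.0
```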

Related Research Articles

Biostatistics is a branch of statistics that applies statistical methods to a wide range of topics in biology. It encompasses the design of biological experiments, the collection and analysis of data from those experiments and the interpretation of the results.


A statistical hypothesis test is a method of statistical inference used to decide whether the data sufficiently supports a particular hypothesis. A statistical hypothesis test typically involves a calculation of a test statistic. Then a decision is made, either by comparing the test statistic to a critical value or equivalently by evaluating a p-value computed from the test statistic. Roughly 100 specialized statistical tests have been defined.

Nonparametric statistics is a type of statistical analysis that makes minimal assumptions about the underlying distribution of the data being studied. Often these models are infinite-dimensional, rather than finite-dimensional as in parametric statistics. Nonparametric statistics can be used for descriptive statistics or statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are evidently violated.

In statistics, hypotheses suggested by a given dataset, when tested with the same dataset that suggested them, are likely to be accepted even when they are not true. This is because circular reasoning would be involved: something seems true in the limited data set; therefore we hypothesize that it is true in general; therefore we wrongly test it on the same, limited data set, which seems to confirm that it is true. Generating hypotheses based on data already observed, in the absence of testing them on new data, is referred to as post hoc theorizing.


The Kruskal–Wallis test by ranks, Kruskal–Wallis test, or one-way ANOVA on ranks is a non-parametric statistical test for testing whether samples originate from the same distribution. It is used for comparing two or more independent samples of equal or different sample sizes. It extends the Mann–Whitney U test, which is used for comparing only two groups. The parametric equivalent of the Kruskal–Wallis test is the one-way analysis of variance (ANOVA).


Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.

The Rasch model, named after Georg Rasch, is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between the respondent's abilities, attitudes, or personality traits, and the item difficulty. For example, it may be used to estimate a student's reading ability or the extremity of a person's attitude to capital punishment from responses on a questionnaire. In addition to psychometrics and educational research, the Rasch model and its extensions are used in other areas, including the health professions, agriculture, and market research.

The Friedman test is a non-parametric statistical test developed by Milton Friedman. Similar to the parametric repeated measures ANOVA, it is used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row together, then considering the values of ranks by columns. Applicable to complete block designs, it is thus a special case of the Durbin test.

In statistics, family-wise error rate (FWER) is the probability of making one or more false discoveries (type I errors) when performing multiple hypothesis tests.

Chi-square automatic interaction detection (CHAID) is a decision tree technique based on adjusted significance testing. The technique was developed in South Africa in 1975 and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on this topic. CHAID can be used for prediction as well as classification, and for detection of interaction between variables. CHAID is based on a formal extension of AID and THAID procedures of the 1960s and 1970s, which in turn were extensions of earlier research, including that performed by Belson in the UK in the 1950s. A history of earlier supervised tree methods together with a detailed description of the original CHAID algorithm and the exhaustive CHAID extension by Biggs, De Ville, and Suen, can be found in Ritschard.

In statistics, the Bonferroni correction is a method to counteract the multiple comparisons problem.


In statistics, the multiple comparisons, multiplicity or multiple testing problem occurs when one considers a set of statistical inferences simultaneously or estimates a subset of parameters selected based on the observed values.

Statistical conclusion validity is the degree to which conclusions about the relationship among variables based on the data are correct or "reasonable". This began as being solely about whether the statistical conclusion about the relationship of the variables was correct, but there is now a movement towards "reasonable" conclusions that draw on quantitative, statistical, and qualitative data. Fundamentally, two types of errors can occur: type I and type II. Statistical conclusion validity concerns the qualities of the study that make these types of errors more likely. Statistical conclusion validity involves ensuring the use of adequate sampling procedures, appropriate statistical tests, and reliable measurement procedures.

Item tree analysis (ITA) is a data analytical method which allows constructing a hierarchical structure on the items of a questionnaire or test from observed response patterns.
Assume that we have a questionnaire with m items and that subjects can answer positive (1) or negative (0) to each of these items, i.e. the items are dichotomous. If n subjects answer the items this results in a binary data matrix D with m columns and n rows. Typical examples of this data format are test items which can be solved (1) or failed (0) by subjects. Other typical examples are questionnaires where the items are statements to which subjects can agree (1) or disagree (0).
Depending on the content of the items it is possible that the response of a subject to an item j determines her or his responses to other items. It is, for example, possible that each subject who agrees to item j will also agree to item i. In this case we say that item j implies item i. The goal of an ITA is to uncover such deterministic implications from the data set D.

Boolean analysis was introduced by Flament (1976). The goal of a Boolean analysis is to detect deterministic dependencies between the items of a questionnaire or similar data structures in observed response patterns. These deterministic dependencies have the form of logical formulas connecting the items. Assume, for example, that a questionnaire contains items i, j, and k. Examples of such deterministic dependencies are then i → j, i ∧ j → k, and i ∨ j → k.

Psychometric software refers to specialized programs used for the psychometric analysis of data obtained from tests, questionnaires, polls or inventories that measure latent psychoeducational variables. Although some psychometric analyses can be performed using general statistical software such as SPSS, most require specialized tools designed specifically for psychometric purposes.


Mark Stemmler was Professor of Psychological Methodology and Quality Assurance at the Faculty of Psychology and Sports Science, Bielefeld University, from 2007 to 2011. He was also a member of the Center for Statistics at Bielefeld University. He is currently Professor of Psychological Assessment at the Department of Psychology and Sports Science at the University of Erlangen-Nuremberg. Since 2010 he has also been an adjunct professor at the College of Health and Human Development at Pennsylvania State University.

In statistics, a false coverage rate (FCR) is the average rate of false coverage, i.e. not covering the true parameters, among the selected intervals.

Measurement invariance or measurement equivalence is a statistical property of measurement that indicates that the same construct is being measured across some specified groups. For example, measurement invariance can be used to study whether a given measure is interpreted in a conceptually similar manner by respondents representing different genders or cultural backgrounds. Violations of measurement invariance may preclude meaningful interpretation of measurement data. Tests of measurement invariance are increasingly used in fields such as psychology to supplement evaluation of measurement quality rooted in classical test theory.

Alexander von Eye is a German-American psychologist and former professor of Methods in Psychology at the University of Vienna in Vienna, Austria. Before joining the University of Vienna in 2012, he taught at Michigan State University, where he served as chair of the Unit of Developmental Psychology from 2003 to 2008. Before joining Michigan State University in 1993, he served as Professor of Human Development and Psychology at Penn State University. He has developed methods for analyzing categorical and longitudinal data in psychology. He is a fellow of the American Psychological Association and the American Psychological Society. As of 2015, he lived in Montpellier, France.

References

  1. Lienert, G. A. (1969). "Die Konfigurationsfrequenzanalyse als Klassifikationsmethode in der klinischen Psychologie" [Configural frequency analysis as a classification method in clinical psychology]. In Irle, M. (ed.). Bericht über den 26. Kongress der Deutschen Gesellschaft für Psychologie in Tübingen 1968. Göttingen: Hogrefe. pp. 244–253.
  2. Krauth, J.; Lienert, G. A. (1973). KFA. Die Konfigurationsfrequenzanalyse und ihre Anwendungen in Psychologie und Medizin [CFA. Configural frequency analysis and its applications in psychology and medicine]. Freiburg: Alber.
  3. Holm, S. (1979). "A simple sequential rejective multiple test procedure". Scandinavian Journal of Statistics. 6 (2): 65–70. JSTOR 4615733.
  4. von Eye, A. (1990). Introduction to Configural Frequency Analysis: The search for types and antitypes in cross-classifications. Cambridge, UK: Cambridge University Press. ISBN 0521380901.
  5. Lautsch, E.; Weber, S. (1990). Konfigurationsfrequenzanalyse (KFA). Berlin: Volk und Wissen.
  6. Krauth, J. (1993). Einführung in die Konfigurationsfrequenzanalyse (KFA) [Introduction to configural frequency analysis (CFA)]. Weinheim: Beltz, Psychologie Verlags Union. ISBN 3621271821.

Further reading