Latent class model

In statistics, a latent class model (LCM) relates a set of observed (usually discrete) multivariate variables to a set of latent variables. It is a type of latent variable model. It is called a latent class model because the latent variable is discrete. A class is characterized by a pattern of conditional probabilities that indicate the chance that variables take on certain values.

Latent class analysis (LCA) is a subset of structural equation modeling, used to find groups or subtypes of cases in multivariate categorical data. These subtypes are called "latent classes". [1] [2]

Confronted with a situation as follows, a researcher might choose to use LCA to understand the data: Imagine that symptoms a-d have been measured in a range of patients with diseases X, Y, and Z, and that disease X is associated with the presence of symptoms a, b, and c, disease Y with symptoms b, c, d, and disease Z with symptoms a, c and d.

The LCA will attempt to detect the presence of latent classes (the disease entities), which create the patterns of association among the symptoms. As in factor analysis, the LCA can also be used to classify cases according to their maximum likelihood class membership. [1] [3]

The criterion for solving the LCA is to find latent classes within which there is no longer any association of one symptom with another, because the class is the disease that causes the association: the set of diseases a patient has (or the class a case belongs to) is what makes the symptoms co-occur. Within a class the symptoms are therefore "conditionally independent", i.e., conditional on class membership, they are no longer related. [1]

Model

Within each latent class, the observed variables are statistically independent. This is an important aspect: usually the observed variables are statistically dependent. By introducing the latent variable, independence is restored in the sense that within classes the variables are independent (local independence). We then say that the association between the observed variables is explained by the classes of the latent variable (McCutcheon, 1987).

In one form, the latent class model is written as

    P(x_1, \ldots, x_N) = \sum_{k=1}^{K} p_k \prod_{n=1}^{N} p(x_n \mid k),

where K is the number of latent classes and the p_k are the so-called recruitment or unconditional probabilities, which should sum to one. The p(x_n \mid k) are the marginal or conditional probabilities.
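
To make the formula concrete, here is a minimal NumPy sketch that evaluates this sum for binary observed variables; the class weights and conditional probabilities are illustrative made-up values, not estimates from any data:

    import numpy as np

    # Illustrative parameters: K = 3 latent classes, N = 4 binary variables.
    p_class = np.array([0.5, 0.3, 0.2])        # p_k, must sum to one
    p_cond = np.array([[0.9, 0.8, 0.7, 0.1],   # p(x_n = 1 | k), one row per class
                       [0.1, 0.8, 0.7, 0.9],
                       [0.8, 0.1, 0.7, 0.9]])

    def joint_prob(x, p_class, p_cond):
        """P(x_1, ..., x_N) = sum_k p_k * prod_n p(x_n | k) for a 0/1 vector x."""
        lik = np.prod(np.where(x == 1, p_cond, 1.0 - p_cond), axis=1)
        return float(np.sum(p_class * lik))

    print(joint_prob(np.array([1, 1, 1, 0]), p_class, p_cond))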

For a two-way latent class model (two observed categorical variables), the form is

    P_{ij} = \sum_{k=1}^{K} p_k \, p_{i \mid k} \, p_{j \mid k},

where P_{ij} is the joint probability of observing the value pair (i, j).

This two-way model is related to probabilistic latent semantic analysis and non-negative matrix factorization.
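
In matrix form this connection is immediate. The following identity (a standard observation, stated here for illustration rather than taken from the article) collects the conditional probabilities into non-negative factor matrices:

    % Two-way latent class model as a non-negative matrix factorization,
    % with A_{ik} = p_{i \mid k} and B_{jk} = p_{j \mid k}.
    P_{ij} = \sum_{k=1}^{K} p_k \, p_{i \mid k} \, p_{j \mid k}
    \quad\Longleftrightarrow\quad
    P = A \, \operatorname{diag}(p_1, \dots, p_K) \, B^{\top}

Every factor on the right-hand side is non-negative, which is exactly the structure that non-negative matrix factorization and PLSA exploit.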

The probability model used in LCA is closely related to the Naive Bayes classifier. The main difference is that in LCA, the class membership of an individual is a latent variable, whereas in Naive Bayes classifiers the class membership is an observed label.
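
For illustration, the posterior class-membership probabilities follow from Bayes' rule, exactly as in a Naive Bayes classifier; here is a minimal self-contained sketch reusing the illustrative parameters from the earlier example:

    import numpy as np

    p_class = np.array([0.5, 0.3, 0.2])        # illustrative values, as above
    p_cond = np.array([[0.9, 0.8, 0.7, 0.1],
                       [0.1, 0.8, 0.7, 0.9],
                       [0.8, 0.1, 0.7, 0.9]])

    def class_posterior(x, p_class, p_cond):
        """P(k | x) is proportional to p_k * P(x | k), normalized over classes."""
        lik = np.prod(np.where(x == 1, p_cond, 1.0 - p_cond), axis=1)
        unnorm = p_class * lik
        return unnorm / unnorm.sum()

    post = class_posterior(np.array([1, 1, 1, 0]), p_class, p_cond)
    print(post, post.argmax())                 # argmax gives the modal (maximum likelihood) class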

There are a number of methods with distinct names and uses that share a common relationship. Cluster analysis is, like LCA, used to discover taxon-like groups of cases in data. Multivariate mixture estimation (MME) is applicable to continuous data, and assumes that such data arise from a mixture of distributions: imagine a set of heights arising from a mixture of men and women. If a multivariate mixture estimation is constrained so that measures must be uncorrelated within each distribution, it is termed latent profile analysis. Modified to handle discrete data, this constrained analysis is known as LCA. Discrete latent trait models further constrain the classes to form from segments of a single dimension, essentially allocating members to classes along that dimension; an example would be assigning cases to social classes on a dimension of ability or merit.

As a practical instance, the variables could be multiple-choice items of a political questionnaire. The data in this case consist of an N-way contingency table with the answers to the items for a number of respondents. In this example, the latent variable refers to political opinion and the latent classes to political groups. Given group membership, the conditional probabilities specify the chance that certain answers are chosen.
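
A short simulation sketch (with hypothetical group sizes and item probabilities) shows how such data arise under local independence and how the observed contingency table is tallied:

    import numpy as np

    rng = np.random.default_rng(0)
    K, N, R = 3, 4, 1000                         # hypothetical: 3 groups, 4 binary items, 1000 respondents
    p_class = np.array([0.5, 0.3, 0.2])          # group sizes
    p_cond = rng.uniform(0.1, 0.9, size=(K, N))  # P(answer = 1 | group), illustrative

    group = rng.choice(K, size=R, p=p_class)     # latent group per respondent (unobserved in practice)
    answers = (rng.random((R, N)) < p_cond[group]).astype(int)

    # The observed data: an N-way contingency table of answer patterns (2^4 = 16 cells).
    patterns, counts = np.unique(answers, axis=0, return_counts=True)
    for pattern, count in zip(patterns, counts):
        print(pattern, count)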

Application

LCA may be used in many fields, such as collaborative filtering, [4] behavior genetics, [5] and the evaluation of diagnostic tests. [6]

Related Research Articles

Probability distribution: Mathematical function for the probability a given outcome occurs in an experiment

In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events.

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process X with unobservable ("hidden") states. As part of the definition, HMM requires that there be an observable process Y whose outcomes are "influenced" by the outcomes of X in a known way. Since X cannot be observed directly, the goal is to learn about X by observing Y. HMM has an additional requirement that the outcome of Y at time t = t_0 must be "influenced" exclusively by the outcome of X at t = t_0, and that the outcomes of X and Y at t < t_0 must be conditionally independent of Y at t = t_0 given X at time t = t_0.

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

Logistic regression: Statistical model for a binary dependent variable

In statistics, the logistic model is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression analysis, logistic regression estimates the parameters of a logistic model. Formally, in binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable or a continuous variable. The corresponding probability of the value labeled "1" can vary between 0 and 1, hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names logit model and logit regression.

Expectation–maximization algorithm: Iterative method for finding maximum likelihood estimates in statistical models

In statistics, an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step.
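
In the latent class setting, both steps have closed forms. The following is a minimal sketch of EM for a latent class model with binary indicators; it is an illustrative implementation under assumed data shapes, not a reference one (in practice one would use several random restarts and monitor the log-likelihood for convergence):

    import numpy as np

    def fit_lca_em(X, K, n_iter=200, seed=0):
        """Fit a latent class model to a 0/1 matrix X (respondents x items) by EM."""
        rng = np.random.default_rng(seed)
        R, N = X.shape
        p_class = np.full(K, 1.0 / K)                  # start from equal class weights
        p_cond = rng.uniform(0.25, 0.75, size=(K, N))  # random start for P(x_n = 1 | k)
        for _ in range(n_iter):
            # E step: responsibilities resp[r, k] = P(class k | x_r) under current parameters.
            lik = np.prod(np.where(X[:, None, :] == 1, p_cond, 1.0 - p_cond), axis=2)
            unnorm = lik * p_class
            resp = unnorm / unnorm.sum(axis=1, keepdims=True)
            # M step: weighted maximum-likelihood re-estimates of both parameter sets.
            p_class = resp.mean(axis=0)
            p_cond = (resp.T @ X) / resp.sum(axis=0)[:, None]
        return p_class, p_cond

    # Hypothetical usage, e.g. on the simulated questionnaire data from the earlier sketch:
    # p_class_hat, p_cond_hat = fit_lca_em(answers, K=3)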

In statistics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations which are approximated from a specified multivariate probability distribution, when direct sampling is difficult. This sequence can be used to approximate the joint distribution; to approximate the marginal distribution of one of the variables, or some subset of the variables; or to compute an integral (such as the expected value of one of the variables). Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled.
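
As a toy illustration (a standard textbook example, not specific to latent class models), a Gibbs sampler for a bivariate standard normal with correlation rho alternates draws from the two univariate conditional normals:

    import numpy as np

    rng = np.random.default_rng(0)
    rho = 0.8                                         # target correlation
    x, y, samples = 0.0, 0.0, []
    for _ in range(5000):
        x = rng.normal(rho * y, np.sqrt(1 - rho**2))  # draw x | y ~ N(rho*y, 1 - rho^2)
        y = rng.normal(rho * x, np.sqrt(1 - rho**2))  # draw y | x ~ N(rho*x, 1 - rho^2)
        samples.append((x, y))
    print(np.corrcoef(np.array(samples).T))           # empirical correlation approaches rho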

In probability and statistics, a mixture distribution is the probability distribution of a random variable that is derived from a collection of other random variables as follows: first, a random variable is selected by chance from the collection according to given probabilities of selection, and then the value of the selected random variable is realized. The underlying random variables may be random real numbers, or they may be random vectors, in which case the mixture distribution is a multivariate distribution.

In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs. Formally a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population. However, while problems associated with "mixture distributions" relate to deriving the properties of the overall population from those of the sub-populations, "mixture models" are used to make statistical inferences about the properties of the sub-populations given only observations on the pooled population, without sub-population identity information.

In statistical classification, two main approaches are called the generative approach and the discriminative approach. These compute classifiers by different approaches, differing in the degree of statistical modelling. Terminology is inconsistent, but three major types can be distinguished, following Jebara (2004):

  1. A generative model is a statistical model of the joint probability distribution P(X, Y) on a given observable variable X and target variable Y;
  2. A discriminative model is a model of the conditional probability P(Y | X = x) of the target Y, given an observation x; and
  3. Classifiers computed without using a probability model are also referred to loosely as "discriminative".

Probabilistic latent semantic analysis (PLSA), also known as probabilistic latent semantic indexing (PLSI), is a statistical technique for the analysis of two-mode and co-occurrence data. In effect, one can derive a low-dimensional representation of the observed variables in terms of their affinity to certain hidden variables, just as in latent semantic analysis, from which PLSA evolved.

Conditional random field: Class of statistical modeling methods

Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. What kind of graph is used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions.

A latent variable model is a statistical model that relates a set of observable variables to a set of latent variables.

In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes. That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables.

In statistics, binomial regression is a regression analysis technique in which the response has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial has probability of success p. In binomial regression, the probability of a success is related to explanatory variables: the corresponding concept in ordinary regression is to relate the mean value of the unobserved response to explanatory variables.

In probability theory and statistics, the Dirichlet-multinomial distribution is a family of discrete multivariate probability distributions on a finite support of non-negative integers. It is also called the Dirichlet compound multinomial distribution (DCM) or multivariate Pólya distribution. It is a compound probability distribution, where a probability vector p is drawn from a Dirichlet distribution with parameter vector α, and an observation is drawn from a multinomial distribution with probability vector p and number of trials n. The Dirichlet parameter vector α captures the prior belief about the situation and can be seen as a pseudocount: observations of each outcome that occur before the actual data is collected. The compounding corresponds to a Pólya urn scheme. It is frequently encountered in Bayesian statistics, machine learning, empirical Bayes methods and classical statistics as an overdispersed multinomial distribution.

In probability theory and statistics, a categorical distribution is a discrete probability distribution that describes the possible results of a random variable that can take on one of K possible categories, with the probability of each category separately specified. There is no innate underlying ordering of these outcomes, but numerical labels are often attached for convenience in describing the distribution. The K-dimensional categorical distribution is the most general distribution over a K-way event; any other discrete distribution over a size-K sample space is a special case. The parameters specifying the probabilities of each possible outcome are constrained only by the fact that each must be in the range 0 to 1, and all must sum to 1.

In statistics and econometrics, the multinomial probit model is a generalization of the probit model used when there are several possible categories that the dependent variable can fall into. As such, it is an alternative to the multinomial logit model as one method of multiclass classification. It is not to be confused with the multivariate probit model, which is used to model correlated binary outcomes for more than one independent variable.

A vine is a graphical tool for labeling constraints in high-dimensional probability distributions. A regular vine is a special case for which all constraints are two-dimensional or conditional two-dimensional. Regular vines generalize trees, and are themselves specializations of Cantor tree.

References

  1. Lazarsfeld, P. F. and Henry, N. W. (1968). Latent Structure Analysis. Boston: Houghton Mifflin.
  2. Formann, A. K. (1984). Latent Class Analyse: Einführung in die Theorie und Anwendung [Latent class analysis: Introduction to theory and application]. Weinheim: Beltz.
  3. Teichert, Thorsten (2000). "Das Latent-Class Verfahren zur Segmentierung von wahlbasierten Conjoint-Daten. Befunde einer empirischen Anwendung" [The latent class method for segmenting choice-based conjoint data: Findings of an empirical application]. Marketing ZFP. 22 (3): 227–240. doi:10.15358/0344-1369-2000-3-227. ISSN 0344-1369.
  4. Cheung, Kwok-Wai; Tsui, Kwok-Ching; Liu, Jiming (2004). "Extended latent class models for collaborative recommendation". IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans. 34 (1): 143–148. CiteSeerX 10.1.1.6.2234. doi:10.1109/TSMCA.2003.818877. S2CID 11628144.
  5. Eaves, L. J.; Silberg, J. L.; Hewitt, J. K.; Rutter, M.; Meyer, J. M.; Neale, M. C.; Pickles, A. (1993). "Analyzing twin resemblance in multisymptom data: genetic applications of a latent class model for symptoms of conduct disorder in juvenile boys". Behavior Genetics. 23 (1): 5–19. doi:10.1007/bf01067550. PMID 8476390. S2CID 40678009.
  6. Bermingham, M. L.; Handel, I. G.; Glass, E. J.; Woolliams, J. A.; de Clare Bronsvoort, B. M.; McBride, S. H.; Skuce, R. A.; Allen, A. R.; McDowell, S. W. J.; Bishop, S. C. (2015). "Hui and Walter's latent-class model extended to estimate diagnostic test properties from surveillance data: a latent model for latent data". Scientific Reports. 5: 11861. Bibcode:2015NatSR...511861B. doi:10.1038/srep11861. PMC 4493568. PMID 26148538.