Constrained clustering

Last updated June 27, 2025

In computer science, constrained clustering is a class of semi-supervised learning algorithms. Typically, constrained clustering incorporates a set of must-link constraints, a set of cannot-link constraints, or both into a data clustering algorithm.^[1] A cluster whose members satisfy all must-link and cannot-link constraints is called a chunklet.

Types of constraints

Both must-link and cannot-link constraints define a relationship between two data instances. Together, these sets of constraints guide a constrained clustering algorithm as it attempts to find chunklets (clusters in the dataset that satisfy the specified constraints).

A must-link constraint is used to specify that the two instances in the must-link relation should be associated with the same cluster.
A cannot-link constraint is used to specify that the two instances in the cannot-link relation should not be associated with the same cluster.
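As a minimal sketch, assuming a clustering is stored as a list `labels` in which `labels[i]` is the cluster index of instance i, and each constraint is a pair of instance indices, a check that a clustering honors both kinds of constraint could look like this. The function name `satisfies_constraints` and the pair representation are illustrative choices, not part of any particular library.

```python
from typing import Sequence, Tuple

Pair = Tuple[int, int]  # a constraint between two instance indices


def satisfies_constraints(labels: Sequence[int],
                          must_link: Sequence[Pair],
                          cannot_link: Sequence[Pair]) -> bool:
    """Return True if the cluster assignment honors every pairwise constraint.

    labels[i] is the cluster index assigned to instance i.
    """
    # A must-link pair (i, j) requires both instances to be in the same cluster.
    if any(labels[i] != labels[j] for i, j in must_link):
        return False
    # A cannot-link pair (i, j) requires the instances to be in different clusters.
    if any(labels[i] == labels[j] for i, j in cannot_link):
        return False
    return True


# Instances 0 and 1 must share a cluster; instances 1 and 2 must not.
must_link = [(0, 1)]
cannot_link = [(1, 2)]
print(satisfies_constraints([0, 0, 1], must_link, cannot_link))  # True
print(satisfies_constraints([0, 1, 1], must_link, cannot_link))  # False
```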

Some constrained clustering algorithms abort if no clustering exists that satisfies the specified constraints. Others try to minimize the amount of constraint violation when no clustering can satisfy all of them. Constraints can also be used to guide the selection of a clustering model among several possible solutions.^[2]
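For algorithms that tolerate infeasible constraint sets, one simple way to quantify constraint violation is to count the broken pairs. The hypothetical helpers below give every constraint equal weight and pick, from several candidate clusterings, the one with the fewest violations. This is only a simplified illustration of ranking solutions by their constraints; the weighted penalties used by PCKmeans and the model-selection procedure of Pourrajabi et al. are not reproduced here.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # a constraint between two instance indices


def count_violations(labels: Sequence[int],
                     must_link: Sequence[Pair],
                     cannot_link: Sequence[Pair]) -> int:
    """Count broken constraints, weighting every constraint equally."""
    broken_ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    broken_cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return broken_ml + broken_cl


def pick_least_violating(candidates: List[Sequence[int]],
                         must_link: Sequence[Pair],
                         cannot_link: Sequence[Pair]) -> Tuple[int, int]:
    """Return (index, violation count) of the candidate with the fewest violations.

    Ties are broken in favor of the earliest candidate.
    """
    scores = [count_violations(c, must_link, cannot_link) for c in candidates]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return best, scores[best]


must_link = [(0, 1), (2, 3)]
cannot_link = [(1, 2)]
candidates = [
    [0, 1, 1, 0],  # breaks both must-links and the cannot-link: 3 violations
    [0, 0, 0, 1],  # breaks must-link (2, 3) and the cannot-link: 2 violations
    [0, 0, 1, 1],  # satisfies every constraint: 0 violations
]
print(pick_least_violating(candidates, must_link, cannot_link))      # (2, 0)
print(pick_least_violating(candidates[:2], must_link, cannot_link))  # (1, 2)
```

When no candidate satisfies every constraint, as in the second call, the function still returns the least-violating clustering instead of aborting.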

Examples

Examples of constrained clustering algorithms include:

COP K-means ^[1]
PCKmeans (Pairwise Constrained K-means) ^[3]
CMWK-Means (Constrained Minkowski Weighted K-Means) ^[4]
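The following is a simplified sketch of the COP K-means idea in plain Python: in each pass, every instance is assigned, in order, to the nearest center that does not conflict with a constraint involving an instance already assigned in that pass, and the run fails if no such center exists. The function names, the random initialization from data points, the `seed` and `max_iter` parameters, and the convergence test are assumptions made for illustration and may differ from the published algorithm.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]


def violates(i: int, cluster: int, labels: List[Optional[int]],
             must_link: Sequence[Pair], cannot_link: Sequence[Pair]) -> bool:
    """True if putting instance i into `cluster` conflicts with an assigned instance."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        # Must-link partner already placed in a different cluster.
        if other is not None and labels[other] is not None and labels[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        # Cannot-link partner already placed in this same cluster.
        if other is not None and labels[other] == cluster:
            return True
    return False


def cop_kmeans(points: Sequence[Point], k: int, must_link: Sequence[Pair],
               cannot_link: Sequence[Pair], max_iter: int = 100,
               seed: int = 0) -> Optional[List[int]]:
    """Return one cluster label per point, or None if some point has no legal cluster."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]  # assumed initialization
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels: List[Optional[int]] = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from the nearest center to the farthest.
            order = sorted(range(k), key=lambda c: math.dist(p, centers[c]))
            for c in order:
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                return None  # no cluster can take point i without breaking a constraint
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(cop_kmeans(points, k=2, must_link=[(0, 1)], cannot_link=[(1, 2)]))
print(cop_kmeans(points, k=1, must_link=[(0, 1)], cannot_link=[(1, 2)]))  # None
```

With a single cluster, instances 1 and 2 cannot be separated, so the sketch reports failure rather than returning a clustering that breaks the cannot-link constraint.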

References

[1] Wagstaff, K.; Cardie, C.; Rogers, S.; Schrödl, S. (2001). "Constrained K-means Clustering with Background Knowledge". Proceedings of the Eighteenth International Conference on Machine Learning. pp. 577–584.
[2] Pourrajabi, M.; Moulavi, D.; Campello, R. J. G. B.; Zimek, A.; Sander, J.; Goebel, R. (2014). "Model Selection for Semi-Supervised Clustering". Proceedings of the 17th International Conference on Extending Database Technology (EDBT). pp. 331–342. doi:10.5441/002/edbt.2014.31.
[3] Basu, S.; Banerjee, A.; Mooney, R. J. (April 2004). "Active Semi-Supervision for Pairwise Constrained Clustering". Proceedings of the 2004 SIAM International Conference on Data Mining. pp. 333–344.
[4] de Amorim, R. C. (2012). "Constrained Clustering with Minkowski Weighted K-Means". Proceedings of the 13th IEEE International Symposium on Computational Intelligence and Informatics. pp. 13–17. doi:10.1109/CINTI.2012.6496753.


This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
