Clustering with Constraints

Davidson, Ian

doi:10.1007/978-1-4614-8265-9_610

Synonyms

Semi-supervised clustering

Definition

The area of clustering with constraints makes use of hints or advice in the form of constraints to aid or bias the clustering process. The most prevalent form of advice is a conjunction of pairwise instance-level constraints of the form must-link (ML) and cannot-link (CL), which state that a pair of instances should be in the same cluster or in different clusters, respectively. Given a set of points P to cluster and a set of constraints C, the aim of clustering with constraints is to use the constraints to improve the clustering results. Constraints have so far been used in two main ways: (i) writing algorithms that use a standard distance metric but attempt to satisfy all or as many constraints as possible, and (ii) using the constraints to learn a distance function that is then used in the clustering algorithm.
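To make the ML and CL constraints and approach (i) concrete, the following Python sketch may help. It is an illustration only, not an algorithm taken from this entry: the function names (count_violations, constrained_assign), the greedy assignment order, the fixed cluster centers, and the toy data are all assumptions chosen for brevity. Each point is assigned to the nearest center that does not conflict with the constraints on points already assigned.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[int, int]          # a constraint between two point indices
Point = Tuple[float, ...]


def count_violations(labels: Sequence[int],
                     must_link: List[Pair],
                     cannot_link: List[Pair]) -> int:
    """Count the ML and CL constraints that a cluster assignment breaks."""
    ml_broken = sum(1 for i, j in must_link if labels[i] != labels[j])
    cl_broken = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return ml_broken + cl_broken


def squared_distance(p: Point, q: Point) -> float:
    """Standard squared Euclidean distance (the metric is left unchanged)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def partner(pair: Pair, i: int) -> Optional[int]:
    """Return the other index in the pair if i takes part in it, else None."""
    a, b = pair
    if a == i:
        return b
    if b == i:
        return a
    return None


def feasible(i: int, cluster: int, labels: List[Optional[int]],
             must_link: List[Pair], cannot_link: List[Pair]) -> bool:
    """Check point i against constraints on points that already have labels."""
    for pair in must_link:
        other = partner(pair, i)
        if other is not None and labels[other] is not None \
                and labels[other] != cluster:
            return False
    for pair in cannot_link:
        other = partner(pair, i)
        if other is not None and labels[other] == cluster:
            return False
    return True


def constrained_assign(points: List[Point], centers: List[Point],
                       must_link: List[Pair],
                       cannot_link: List[Pair]) -> Optional[List[int]]:
    """Greedy assignment: nearest center that breaks no constraint.

    Returns None if some point has no feasible cluster (hypothetical
    convention for this sketch; the result depends on the point order).
    """
    labels: List[Optional[int]] = [None] * len(points)
    for i, p in enumerate(points):
        by_distance = sorted(range(len(centers)),
                             key=lambda c: squared_distance(p, centers[c]))
        for c in by_distance:
            if feasible(i, c, labels, must_link, cannot_link):
                labels[i] = c
                break
        else:
            return None
    return labels


# Toy data (assumed for illustration): two tight groups of two points each.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
must_link = [(0, 2)]      # points 0 and 2 should share a cluster
cannot_link = [(0, 1)]    # points 0 and 1 should be separated

labels = constrained_assign(points, centers, must_link, cannot_link)
print(labels, count_violations(labels, must_link, cannot_link))
```

On this toy input, a purely distance-based assignment would group points 0 and 1 together and so break both constraints, whereas the constrained pass trades distance for constraint satisfaction. Approach (ii) would instead leave the assignment rule alone and change the distance function.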

Historical Background

The idea of using constraints to guide clustering was first introduced by Wagstaff and Cardie in their seminal paper...

Recommended Reading

Basu S, Banerjee A, Mooney R. Semi-supervised clustering by seeding. In: Proceedings of the 19th International Conference on Machine Learning; 2002. p. 27–34.
Basu S, Banerjee A, Mooney RJ. Active semi-supervision for pairwise constrained clustering. In: Proceedings of the SIAM International Conference on Data Mining; 2004.
Basu S, Davidson I, Wagstaff K, editors. Constrained clustering: advances in algorithms, theory and applications. New York: Chapman & Hall/CRC Press; 2008.
Cohn D, Caruana R, McCallum A. Semi-supervised clustering with user feedback. Technical Report 2003–1892. Cornell University; 2003.
Davidson I, Ravi SS. Agglomerative hierarchical clustering with constraints: theoretical and empirical results. In: Principles of Data Mining and Knowledge Discovery, 9th European Conference; 2005. p. 59–70.
Davidson I, Ravi SS. Clustering with constraints: feasibility issues and the k-means algorithm. In: Proceedings of the SIAM International Conference on Data Mining; 2005.
Davidson I, Ravi SS. Identifying and generating easy sets of constraints for clustering. In: Proceedings of the 15th National Conference on AI; 2006.
Davidson I, Ester M, Ravi SS. Efficient incremental clustering with constraints. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2007. p. 204–49.
Davidson I, Ravi SS. Intractability and clustering with constraints. In: Proceedings of the 24th International Conference on Machine Learning; 2007. p. 201–8.
Davidson I, Ravi SS. The complexity of non-hierarchical clustering with instance and cluster level constraints. Data Mining Knowl Discov. 2007;14(1):25–61.
Gondek D, Hofmann T. Non-redundant data clustering. In: Proceedings of the 2004 IEEE International Conference on Data Mining; 2004. p. 75–82.
Klein D, Kamvar SD, Manning CD. From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. In: Proceedings of the 19th International Conference on Machine Learning; 2002. p. 307–14.
Wagstaff K, Cardie C. Clustering with instance-level constraints. In: Proceedings of the 17th International Conference on Machine Learning; 2000. p. 1103–10.
Wagstaff K, Cardie C, Rogers S, Schroedl S. Constrained K-means clustering with background knowledge. In: Proceedings of the 18th International Conference on Machine Learning; 2001. p. 577–84.
Xing E, Ng A, Jordan M, Russell S. Distance metric learning, with application to clustering with side-information. Adv Neural Inf Process Syst. 2002;15:505.

Author information

Authors and Affiliations

University of California-Davis, Davis, CA, USA
Ian Davidson

Corresponding author

Correspondence to Ian Davidson.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Section Editor information

Department of Computer Science and Engineering, The University of California at Riverside, Bourns College of Engineering, Riverside, CA, USA
Dimitrios Gunopulos

About this entry

Cite this entry

Davidson, I. (2018). Clustering with Constraints. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_610

DOI: https://doi.org/10.1007/978-1-4614-8265-9_610
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
