Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Clustering with Constraints

  • Ian Davidson
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_610

Synonyms

Semi-supervised clustering

Definition

The area of clustering with constraints uses hints or advice, expressed as constraints, to aid or bias the clustering process. The most prevalent form of advice is a conjunction of pairwise instance-level constraints of two types, must-link (ML) and cannot-link (CL), which state that a pair of instances should be in the same cluster or in different clusters, respectively. Given a set of points P to cluster and a set of constraints C, the aim of clustering with constraints is to use the constraints to improve the clustering results. Constraints have so far been used in two main ways: (i) writing algorithms that use a standard distance metric but attempt to satisfy all, or as many as possible, of the constraints; and (ii) using the constraints to learn a distance function that is then used in the clustering algorithm.
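To make the must-link and cannot-link semantics concrete, the following minimal sketch checks a given cluster assignment against a set of pairwise constraints. It is an illustration only, not an algorithm from the literature cited here: the function name `violated_constraints`, the dictionary-based `labels` representation, and the sample points `p1` to `p4` are all assumptions made for this example.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# A pairwise constraint is an (instance, instance) pair (hypothetical representation).
Pair = Tuple[Hashable, Hashable]


def violated_constraints(
    labels: Dict[Hashable, int],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> List[Tuple[str, Hashable, Hashable]]:
    """Return every constraint that the clustering `labels` fails to satisfy."""
    violations = []
    for a, b in must_link:
        # ML(a, b): both instances should be in the same cluster.
        if labels[a] != labels[b]:
            violations.append(("ML", a, b))
    for a, b in cannot_link:
        # CL(a, b): the instances should be in different clusters.
        if labels[a] == labels[b]:
            violations.append(("CL", a, b))
    return violations


# Illustrative data: four points assigned to two clusters.
labels = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}
must_link = [("p1", "p2"), ("p2", "p3")]
cannot_link = [("p1", "p4"), ("p3", "p4")]

print(violated_constraints(labels, must_link, cannot_link))
# prints [('ML', 'p2', 'p3'), ('CL', 'p3', 'p4')]
```

An algorithm of type (i) could use a check of this kind to reject or penalize assignments that violate constraints, while an approach of type (ii) would instead use the constraint pairs as training signal for a distance function.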

Historical Background

The idea of using constraints to guide clustering was first introduced by Wagstaff and Cardie in their seminal paper...

Recommended Reading

  1. Basu S, Banerjee A, Mooney R. Semi-supervised clustering by seeding. In: Proceedings of the 19th International Conference on Machine Learning; 2002. p. 27–34.
  2. Basu S, Banerjee A, Mooney RJ. Active semi-supervision for pairwise constrained clustering. In: Proceedings of the SIAM International Conference on Data Mining; 2004.
  3. Basu S, Davidson I, Wagstaff K, editors. Constrained clustering: advances in algorithms, theory and applications. New York: Chapman & Hall/CRC Press; 2008.
  4. Cohn D, Caruana R, McCallum A. Semi-supervised clustering with user feedback. Technical Report 2003–1892. Cornell University; 2003.
  5. Davidson I, Ravi SS. Agglomerative hierarchical clustering with constraints: theoretical and empirical results. In: Principles of Data Mining and Knowledge Discovery, 9th European Conference; 2005. p. 59–70.
  6. Davidson I, Ravi SS. Clustering with constraints: feasibility issues and the k-means algorithm. In: Proceedings of the SIAM International Conference on Data Mining; 2005.
  7. Davidson I, Ravi SS. Identifying and generating easy sets of constraints for clustering. In: Proceedings of the 15th National Conference on AI; 2006.
  8. Davidson I, Ester M, Ravi SS. Efficient incremental clustering with constraints. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2007. p. 204–49.
  9. Davidson I, Ravi SS. Intractability and clustering with constraints. In: Proceedings of the 24th International Conference on Machine Learning; 2007. p. 201–8.
  10. Davidson I, Ravi SS. The complexity of non-hierarchical clustering with instance and cluster level constraints. Data Mining Knowl Discov. 2007;14(1):25–61.
  11. Gondek D, Hofmann T. Non-redundant data clustering. In: Proceedings of the 2004 IEEE International Conference on Data Mining; 2004. p. 75–82.
  12. Klein D, Kamvar SD, Manning CD. From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. In: Proceedings of the 19th International Conference on Machine Learning; 2002. p. 307–14.
  13. Wagstaff K, Cardie C. Clustering with instance-level constraints. In: Proceedings of the 17th International Conference on Machine Learning; 2000. p. 1103–10.
  14. Wagstaff K, Cardie C, Rogers S, Schroedl S. Constrained K-means clustering with background knowledge. In: Proceedings of the 18th International Conference on Machine Learning; 2001. p. 577–84.
  15. Xing E, Ng A, Jordan M, Russell S. Distance metric learning, with application to clustering with side-information. Adv Neural Inf Process Syst. 2002;15:505.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of California-Davis, Davis, USA

Section editors and affiliations

  • Dimitrios Gunopulos (1)
  1. Department of Computer Science and Engineering, The University of California at Riverside, Bourns College of Engineering, Riverside, USA