Clustering with Constraints

  • Reference work entry
Encyclopedia of Database Systems

Synonyms

Semi-supervised clustering

Definition

The area of clustering with constraints makes use of hints or advice in the form of constraints to aid or bias the clustering process. The most prevalent form of advice is a conjunction of pairwise instance-level constraints of the form must-link (ML) and cannot-link (CL), which state that pairs of instances should be in the same or different clusters, respectively. Given a set of points P to cluster and a set of constraints C, the aim of clustering with constraints is to use the constraints to improve the clustering results. Constraints have so far been used in two main ways: (i) writing algorithms that use a standard distance metric but attempt to satisfy all or as many constraints as possible, and (ii) using the constraints to learn a distance function that is then used in the clustering algorithm.
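
To make the ML and CL definitions concrete, the following minimal Python sketch checks which constraints a given cluster assignment violates, which is the quantity that algorithms of type (i) try to drive to zero. It is an illustration only: the function name `violated_constraints`, the index-pair representation of constraints, and the toy data are assumptions made for this example and do not come from the entry.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # a pair of instance indices (assumed representation)


def violated_constraints(
    labels: Sequence[int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> Tuple[List[Pair], List[Pair]]:
    """Return the ML pairs and CL pairs that the assignment `labels` violates."""
    # A must-link pair is violated when its two instances get different clusters.
    broken_ml = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is violated when its two instances share a cluster.
    broken_cl = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return broken_ml, broken_cl


# Hypothetical toy example: five instances, indexed 0..4.
labels = [0, 0, 1, 1, 1]          # cluster label of each instance
must_link = [(0, 1), (1, 2)]      # pairs that should share a cluster
cannot_link = [(2, 3), (0, 4)]    # pairs that should be separated

broken_ml, broken_cl = violated_constraints(labels, must_link, cannot_link)
print(broken_ml)  # [(1, 2)]
print(broken_cl)  # [(2, 3)]
```

An assignment satisfies the constraint set C exactly when both returned lists are empty; an algorithm that tolerates some violations could instead use their combined length as a penalty term.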

Historical Background

The idea of using constraints to guide clustering was first introduced by Wagstaff and Cardie in their seminal paper...

Recommended Reading

  1. Basu S, Banerjee A, Mooney R. Semi-supervised clustering by seeding. In: Proceedings of the 19th International Conference on Machine Learning; 2002. p. 27–34.

  2. Basu S, Banerjee A, Mooney RJ. Active semi-supervision for pairwise constrained clustering. In: Proceedings of the SIAM International Conference on Data Mining; 2004.

  3. Basu S, Davidson I, Wagstaff K, editors. Constrained clustering: advances in algorithms, theory and applications. New York: Chapman & Hall/CRC Press; 2008.

  4. Cohn D, Caruana R, McCallum A. Semi-supervised clustering with user feedback. Technical Report 2003–1892. Cornell University; 2003.

  5. Davidson I, Ravi SS. Agglomerative hierarchical clustering with constraints: theoretical and empirical results. In: Principles of Data Mining and Knowledge Discovery, 9th European Conference; 2005. p. 59–70.

  6. Davidson I, Ravi SS. Clustering with constraints: feasibility issues and the k-means algorithm. In: Proceedings of the SIAM International Conference on Data Mining; 2005.

  7. Davidson I, Ravi SS. Identifying and generating easy sets of constraints for clustering. In: Proceedings of the 15th National Conference on AI; 2006.

  8. Davidson I, Ester M, Ravi SS. Efficient incremental clustering with constraints. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2007. p. 204–49.

  9. Davidson I, Ravi SS. Intractability and clustering with constraints. In: Proceedings of the 24th International Conference on Machine Learning; 2007. p. 201–8.

  10. Davidson I, Ravi SS. The complexity of non-hierarchical clustering with instance and cluster level constraints. Data Mining Knowl Discov. 2007;14(1):25–61.

  11. Gondek D, Hofmann T. Non-redundant data clustering. In: Proceedings of the 2004 IEEE International Conference on Data Mining; 2004. p. 75–82.

  12. Klein D, Kamvar SD, Manning CD. From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. In: Proceedings of the 19th International Conference on Machine Learning; 2002. p. 307–14.

  13. Wagstaff K, Cardie C. Clustering with instance-level constraints. In: Proceedings of the 17th International Conference on Machine Learning; 2000. p. 1103–10.

  14. Wagstaff K, Cardie C, Rogers S, Schroedl S. Constrained K-means clustering with background knowledge. In: Proceedings of the 18th International Conference on Machine Learning; 2001. p. 577–84.

  15. Xing E, Ng A, Jordan M, Russell S. Distance metric learning, with application to clustering with side-information. Adv Neural Inf Process Syst. 2002;15:505.

Author information

Corresponding author

Correspondence to Ian Davidson.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Davidson, I. (2018). Clustering with Constraints. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_610
