Abstract
Constrained clustering investigates how to incorporate domain knowledge into the clustering process. The domain knowledge takes the form of constraints that must hold on the set of clusters. We consider instance-level constraints, such as must-link and cannot-link. Constraints of this type have been successfully used in popular clustering algorithms, such as k-means and hierarchical agglomerative clustering. This paper shows how clustering trees can support instance-level constraints. Clustering trees are decision trees that partition the instances into homogeneous clusters and provide a symbolic description for each cluster. To handle non-trivial constraint sets, we extend clustering trees to support disjunctive descriptions. The paper’s main contribution is ClusILC, an efficient algorithm for building such trees. We present experiments comparing ClusILC to COP-k-means.
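As an illustration of the terms used in the abstract, the following is a minimal sketch, not the ClusILC algorithm itself. It assumes the usual reading of the constraints: a must-link pair has to end up in the same cluster, and a cannot-link pair has to end up in different clusters. The function `violated_constraints`, the toy tree `tree_cluster`, its thresholds, and the data points are all hypothetical and chosen only for the example. The toy tree shows why a disjunctive description can be needed: two separate leaves carry the same cluster label, so that cluster is described by an OR of two conditions.

```python
from typing import Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]


def violated_constraints(
    assignment: Dict[Hashable, Hashable],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> Tuple[List[Pair], List[Pair]]:
    """Return the must-link and cannot-link pairs that a clustering breaks."""
    # A must-link pair is broken when its two instances land in different clusters.
    broken_ml = [(a, b) for a, b in must_link if assignment[a] != assignment[b]]
    # A cannot-link pair is broken when its two instances land in the same cluster.
    broken_cl = [(a, b) for a, b in cannot_link if assignment[a] == assignment[b]]
    return broken_ml, broken_cl


def tree_cluster(x: float) -> str:
    """Hypothetical clustering tree over one numeric attribute.

    Two leaves share the label "A", so cluster A has the disjunctive
    description: x < 2.0 OR x >= 5.0.
    """
    if x < 2.0:
        return "A"
    elif x < 5.0:
        return "B"
    else:
        return "A"


# Toy data (assumed for illustration only).
points = {"p1": 1.0, "p2": 3.0, "p3": 6.0}
must_link = [("p1", "p3")]
cannot_link = [("p1", "p2")]

# Assign each instance to the cluster of the leaf it reaches.
assignment = {name: tree_cluster(x) for name, x in points.items()}
broken_ml, broken_cl = violated_constraints(assignment, must_link, cannot_link)
```

In this toy setting, a tree restricted to one leaf per cluster could not place p1 and p3 together while keeping p2 apart using tests on this single attribute, whereas letting two leaves share a label can.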
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Struyf, J., Džeroski, S. (2007). Clustering Trees with Instance Level Constraints. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. Lecture Notes in Computer Science, vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_34
DOI: https://doi.org/10.1007/978-3-540-74958-5_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5
eBook Packages: Computer Science (R0)