Abstract
Incorporating background knowledge into unsupervised algorithms has been shown to improve both model quality and execution speed. However, the improvement depends on the quantity and quality of the background knowledge being exploited. In this work, we study the selection of Must-Link and Cannot-Link constraints for semi-supervised clustering. We propose “ConstraintSelector”, an algorithm that takes as input a set of labeled data instances from which constraints can be derived, ranks these instances by their usability, and then derives constraints from the top-ranked instances only. Our experiments show that ConstraintSelector selects, and thereby reduces, the set of candidate constraints without compromising the quality of the derived model.
This work was partially financed under project TIN2008-05924 of the Ministry of Science and Innovation.
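The rank-then-derive idea can be illustrated with a minimal sketch. The abstract does not specify how usability is computed, so the `scores` argument below is a hypothetical stand-in supplied by the caller, and the function name `derive_constraints` and the cutoff `top_k` are likewise assumptions for illustration, not the paper's method. The only part taken from the text is the final step: pairs of top-ranked instances with the same label yield Must-Link constraints, and pairs with different labels yield Cannot-Link constraints.

```python
from itertools import combinations


def derive_constraints(labeled, scores, top_k):
    """Derive pairwise constraints from the top_k highest-scored labeled instances.

    labeled: dict mapping instance id -> class label
    scores:  dict mapping instance id -> usability score (hypothetical;
             the abstract does not define the ranking criterion)
    top_k:   number of top-ranked instances to keep (assumed parameter)
    """
    # Rank by descending score; break ties by id so the result is deterministic.
    ranked = sorted(labeled, key=lambda i: (-scores[i], i))
    selected = ranked[:top_k]

    must_link, cannot_link = [], []
    for a, b in combinations(selected, 2):
        if labeled[a] == labeled[b]:
            must_link.append((a, b))    # same label: must share a cluster
        else:
            cannot_link.append((a, b))  # different labels: must be separated
    return must_link, cannot_link


# Toy data: four labeled instances with made-up scores.
labeled = {"x1": "A", "x2": "A", "x3": "B", "x4": "B"}
scores = {"x1": 0.9, "x2": 0.2, "x3": 0.8, "x4": 0.7}
ml, cl = derive_constraints(labeled, scores, top_k=3)
```

With four labeled instances, all pairs would give six candidate constraints; keeping only the three top-ranked instances leaves three, which is the kind of reduction the abstract describes.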
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ruiz, C., Vallejo, C.G., Spiliopoulou, M., Menasalvas, E. (2010). Automated Constraint Selection for Semi-supervised Clustering Algorithm. In: Meseguer, P., Mandow, L., Gasca, R.M. (eds) Current Topics in Artificial Intelligence. CAEPIA 2009. Lecture Notes in Computer Science, vol 5988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14264-2_16
DOI: https://doi.org/10.1007/978-3-642-14264-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14263-5
Online ISBN: 978-3-642-14264-2
eBook Packages: Computer Science, Computer Science (R0)