Automated Constraint Selection for Semi-supervised Clustering Algorithm

Ruiz, Carlos; Vallejo, Carlos G.; Spiliopoulou, Myra; Menasalvas, Ernestina

doi:10.1007/978-3-642-14264-2_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5988)

Included in the following conference series:

Conference of the Spanish Association for Artificial Intelligence

Abstract

The incorporation of background knowledge in unsupervised algorithms has been shown to yield performance improvements in terms of model quality and execution speed. However, performance depends on the quantity and quality of the background knowledge being exploited. In this work, we study the issue of selecting Must-Link and Cannot-Link constraints for semi-supervised clustering. We propose “ConstraintSelector”, an algorithm that takes as input a set of labeled data instances from which constraints can be derived, ranks these instances by their usability, and then derives constraints from the top-ranked instances only. Our experiments show that ConstraintSelector selects constraints, and thereby reduces the set of candidate constraints, without compromising the quality of the derived model.
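
The general idea of deriving Must-Link and Cannot-Link constraints from a ranked subset of labeled instances can be illustrated with a minimal sketch. This is not the authors' ConstraintSelector: the abstract does not specify how usability is computed, so the `scores` mapping below is a hypothetical stand-in for the ranking, and the function names and the `top_k` parameter are likewise assumptions made for illustration.

```python
from itertools import combinations


def derive_constraints(labeled):
    """Derive pairwise constraints from (instance_id, label) pairs.

    Two instances with the same label yield a Must-Link constraint;
    two instances with different labels yield a Cannot-Link constraint.
    """
    must_link, cannot_link = [], []
    for (a, label_a), (b, label_b) in combinations(labeled, 2):
        if label_a == label_b:
            must_link.append((a, b))
        else:
            cannot_link.append((a, b))
    return must_link, cannot_link


def select_constraints(labeled, scores, top_k):
    """Keep the top_k highest-scoring labeled instances, then derive constraints.

    `scores` is a hypothetical usability score per instance id; the
    ranking criterion used in the paper is not given in the abstract.
    """
    ranked = sorted(labeled, key=lambda item: scores[item[0]], reverse=True)
    return derive_constraints(ranked[:top_k])


# Toy data: four labeled instances and made-up usability scores.
labeled = [("a", 0), ("b", 0), ("c", 1), ("d", 1)]
scores = {"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.7}

must_link, cannot_link = select_constraints(labeled, scores, top_k=3)
print(must_link)    # [('c', 'd')]
print(cannot_link)  # [('a', 'c'), ('a', 'd')]
```

With all four instances, six pairwise constraints would be derived; restricting to the three top-ranked instances leaves three, which shows how ranking instances first shrinks the candidate constraint set.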

Part of this work was financed under project TIN2008-05924 of the Ministry of Science and Innovation.

Author information

Authors and Affiliations

Facultad de Informatica, Universidad Politecnica de Madrid, Spain
Carlos Ruiz & Ernestina Menasalvas
Department of Computer Languages and Systems, Universidad de Sevilla, Spain
Carlos G. Vallejo
Faculty of Computer Science, Magdeburg University, Germany
Myra Spiliopoulou

Editor information

Editors and Affiliations

IIIA - CSIC, Campus UAB s/n, 08193, Bellaterra, Spain
Pedro Meseguer
Dpto. Lenguajes y Ciencias de la Computación, Universidad de Málaga, Campus de Teatinos, 29071, Málaga, Spain
Lawrence Mandow
Dpto. Lenguajes y Sistemas Informáticos, ETS Ingeniería Informática, University of Seville, Av. Reina Mercedes S/N, 41012, Sevilla, Spain
Rafael M. Gasca

About this paper

Cite this paper

Ruiz, C., Vallejo, C.G., Spiliopoulou, M., Menasalvas, E. (2010). Automated Constraint Selection for Semi-supervised Clustering Algorithm. In: Meseguer, P., Mandow, L., Gasca, R.M. (eds) Current Topics in Artificial Intelligence. CAEPIA 2009. Lecture Notes in Computer Science, vol 5988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14264-2_16

DOI: https://doi.org/10.1007/978-3-642-14264-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14263-5
Online ISBN: 978-3-642-14264-2
eBook Packages: Computer Science, Computer Science (R0)