Automated Constraint Selection for Semi-supervised Clustering Algorithm

  • Conference paper
Current Topics in Artificial Intelligence (CAEPIA 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5988)

Abstract

The incorporation of background knowledge in unsupervised algorithms has been shown to yield performance improvements in terms of model quality and execution speed. However, performance depends on the quantity and quality of the background knowledge being exploited. In this work, we study the issue of selecting Must-Link and Cannot-Link constraints for semi-supervised clustering. We propose “ConstraintSelector”, an algorithm that takes as input a set of labeled data instances from which constraints can be derived, ranks these instances by their usability, and then derives constraints from the top-ranked instances only. Our experiments show that ConstraintSelector selects a reduced set of candidate constraints without compromising the quality of the derived model.
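The rank-then-derive pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' ConstraintSelector: the abstract does not say how usability is computed, so the `scores` argument is a hypothetical stand-in supplied by the caller, and the function name `select_constraints` and the parameter `top_k` are likewise assumptions. The only part taken as standard is the derivation step, where two labeled instances with the same label give a Must-Link constraint and two with different labels give a Cannot-Link constraint.

```python
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple

Pair = Tuple[int, int]


def select_constraints(
    labels: Sequence[Hashable],
    scores: Sequence[float],
    top_k: int,
) -> Tuple[List[Pair], List[Pair]]:
    """Derive constraints from the top_k highest-scored labeled instances.

    `scores` is a hypothetical usability score per instance (higher is
    better); how such a score is computed is not specified here.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Rank instance indices by score, best first, and keep the top_k.
    ranked = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    selected = sorted(ranked[:top_k])

    must_link: List[Pair] = []
    cannot_link: List[Pair] = []
    # Every pair of selected instances yields exactly one constraint.
    for i, j in combinations(selected, 2):
        if labels[i] == labels[j]:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


# Illustrative data only: five labeled instances with made-up scores.
example_labels = ["a", "a", "b", "b", "a"]
example_scores = [0.9, 0.1, 0.8, 0.3, 0.7]
ml, cl = select_constraints(example_labels, example_scores, top_k=3)
```

With `top_k=3` the sketch derives 3 pairwise constraints instead of the 10 that all five labeled instances would produce, which is the kind of reduction in the candidate set that the abstract refers to.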

Part of this work was financed under project TIN2008-05924 of the Ministry of Science and Innovation.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ruiz, C., Vallejo, C.G., Spiliopoulou, M., Menasalvas, E. (2010). Automated Constraint Selection for Semi-supervised Clustering Algorithm. In: Meseguer, P., Mandow, L., Gasca, R.M. (eds) Current Topics in Artificial Intelligence. CAEPIA 2009. Lecture Notes in Computer Science (LNAI), vol 5988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14264-2_16

  • DOI: https://doi.org/10.1007/978-3-642-14264-2_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14263-5

  • Online ISBN: 978-3-642-14264-2

  • eBook Packages: Computer Science, Computer Science (R0)
