Background Knowledge Integration in Clustering Using Purity Indexes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6291)

Abstract

In recent years, the use of background knowledge to improve the data mining process has been intensively studied. Background knowledge, along with knowledge provided directly or indirectly by the user, is often available. However, this kind of knowledge is often difficult to formalize because it depends on the domain. In this article, we study the integration of knowledge, in the form of labeled objects, into clustering algorithms. Several criteria for evaluating the purity of a clustering are presented, and their behaviours are compared on artificial datasets. The advantages and drawbacks of each criterion are analyzed to help the user choose among them.
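
As an illustration of what a purity criterion measures, the sketch below computes the standard cluster purity found in information retrieval textbooks, restricted to the labeled objects. It is not necessarily one of the criteria compared in the paper, whose definitions are not reproduced in this preview. The function name `purity` and the convention of marking unlabeled objects with `None` are assumptions made for the example.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, labels):
    """Standard cluster purity, computed over the labeled objects only.

    cluster_ids[i] is the cluster assigned to object i.
    labels[i] is its known class, or None if the object is unlabeled
    (a hypothetical convention chosen for this sketch).
    """
    per_cluster = defaultdict(Counter)
    n_labeled = 0
    for cluster, label in zip(cluster_ids, labels):
        if label is None:
            continue  # unlabeled objects carry no background knowledge
        per_cluster[cluster][label] += 1
        n_labeled += 1
    if n_labeled == 0:
        raise ValueError("purity is undefined without labeled objects")
    # Each cluster contributes the size of its majority class.
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / n_labeled


# Two clusters, five labeled objects, one labeled object in the minority.
score = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", None, "b"])

# One object per cluster: every cluster is trivially pure.
singletons = purity([0, 1, 2], ["a", "b", "a"])
```

A known drawback of this particular measure is visible in the second call: a clustering that isolates every object in its own cluster reaches the maximum score, so purity alone tends to favour a large number of clusters.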

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Forestier, G., Wemmert, C., Gançarski, P. (2010). Background Knowledge Integration in Clustering Using Purity Indexes. In: Bi, Y., Williams, MA. (eds) Knowledge Science, Engineering and Management. KSEM 2010. Lecture Notes in Computer Science, vol 6291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15280-1_6

  • DOI: https://doi.org/10.1007/978-3-642-15280-1_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15279-5

  • Online ISBN: 978-3-642-15280-1

  • eBook Packages: Computer Science, Computer Science (R0)
