Background Knowledge Integration in Clustering Using Purity Indexes

Forestier, Germain; Wemmert, Cédric; Gançarski, Pierre

doi:10.1007/978-3-642-15280-1_6

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6291)

Abstract

In recent years, the use of background knowledge to improve the data mining process has been studied intensively. Background knowledge, along with knowledge provided directly or indirectly by the user, is often available. However, this kind of knowledge is often difficult to formalize, as it tends to be domain-dependent. In this article, we study the integration of knowledge in the form of labeled objects into clustering algorithms. Several criteria for evaluating the purity of a clustering are presented, and their behaviours are compared on artificial datasets. The advantages and drawbacks of each criterion are analyzed to help users choose among them.
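The abstract does not reproduce the criteria themselves. As an illustration only, the following minimal Python sketch computes two widely used external criteria, classic purity and the Rand index, restricted to the labeled objects of a partially labeled dataset. The function names and the convention of marking unlabeled objects with `None` are assumptions made for this example; this is not the authors' formulation of their purity indexes.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Hashable, Optional, Sequence


def labeled_purity(clusters: Sequence[Hashable],
                   labels: Sequence[Optional[Hashable]]) -> float:
    """Classic purity, computed only over objects that carry a label.

    clusters[i] is the cluster id of object i; labels[i] is its known
    class, or None when the object is unlabeled (assumed convention).
    """
    if len(clusters) != len(labels):
        raise ValueError("clusters and labels must have the same length")
    # For each cluster, count how many labeled objects fall in each class.
    counts = defaultdict(Counter)
    n_labeled = 0
    for c, y in zip(clusters, labels):
        if y is not None:
            counts[c][y] += 1
            n_labeled += 1
    if n_labeled == 0:
        raise ValueError("no labeled objects")
    # Each cluster contributes the size of its majority class;
    # clusters containing no labeled object contribute nothing.
    majority = sum(max(cnt.values()) for cnt in counts.values())
    return majority / n_labeled


def labeled_rand_index(clusters: Sequence[Hashable],
                       labels: Sequence[Optional[Hashable]]) -> float:
    """Rand index over all pairs of labeled objects."""
    if len(clusters) != len(labels):
        raise ValueError("clusters and labels must have the same length")
    pairs = [(c, y) for c, y in zip(clusters, labels) if y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two labeled objects")
    agree = 0
    for (c1, y1), (c2, y2) in combinations(pairs, 2):
        # A pair agrees when the clustering and the labels both group the
        # two objects together, or both keep them apart.
        agree += (c1 == c2) == (y1 == y2)
    total = len(pairs) * (len(pairs) - 1) // 2
    return agree / total


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 1, 2]
    labels = ["a", "a", "b", "b", "b", None, None]
    print(labeled_purity(clusters, labels))      # 0.8
    print(labeled_rand_index(clusters, labels))  # 0.6
```

One drawback is easy to see with this sketch: assigning every object to its own cluster yields a purity of 1.0 regardless of the labels, which is why purity is usually weighed against other criteria rather than used alone.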

Author information

Authors and Affiliations

Image Sciences, Computer Sciences and Remote Sensing Laboratory, University of Strasbourg, France
Germain Forestier, Cédric Wemmert & Pierre Gançarski

Editor information

Editors and Affiliations

School of Computing and Mathematics, University of Ulster, Newtownabbey, Co. Antrim, BT37 0QB, UK
Yaxin Bi
Innovation and Technology Research Laboratory, University of Technology Sydney, Sydney, NSW 2007, Australia
Mary-Anne Williams

About this paper

Cite this paper

Forestier, G., Wemmert, C., Gançarski, P. (2010). Background Knowledge Integration in Clustering Using Purity Indexes. In: Bi, Y., Williams, M.-A. (eds) Knowledge Science, Engineering and Management. KSEM 2010. Lecture Notes in Computer Science, vol 6291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15280-1_6

DOI: https://doi.org/10.1007/978-3-642-15280-1_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15279-5
Online ISBN: 978-3-642-15280-1
eBook Packages: Computer Science (R0)