A Probabilistic Model Using Information Theoretic Measures for Cluster Ensembles

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3077)

Abstract

This paper presents a probabilistic model for combining cluster ensembles using information theoretic measures. Starting from a co-association matrix that summarizes the ensemble, we extract a set of association distributions, which are modelled as discrete probability distributions of the object labels, conditional on each data object. The key objectives are, first, to model the associations of neighboring data objects, and second, to allow the defined probability distributions to be manipulated using statistical and information theoretic means. A Jensen-Shannon Divergence based Clustering Combination (JSDCC) method is proposed. The method selects cluster prototypes from the set of association distributions based on entropy maximization and maximization of the generalized JS divergence among the selected prototypes. It then groups the association distributions by minimizing their JS divergences to the selected prototypes. By aggregating the grouped association distributions, we obtain empirical cluster conditional probability distributions of the object labels for each of the combined clusters. Finally, data objects are assigned to their most likely clusters, and their cluster assignment probabilities are estimated. Experiments assess the presented method and compare its performance with alternative co-association based methods.
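
The first stages described in the abstract can be illustrated with a small sketch. It is not the authors' JSDCC implementation: the function names, the row normalization used to turn the co-association matrix into association distributions, the uniform default weights, and the pairwise form of the grouping step are all assumptions made for illustration. Prototype selection is left to the caller, and only the generalized JS divergence formula, H(Σ w_i p_i) − Σ w_i H(p_i), is the standard definition.

```python
import numpy as np

def co_association(labelings):
    """Fraction of ensemble partitions in which each pair of objects shares a cluster."""
    labelings = np.asarray(labelings)              # shape: (n_partitions, n_objects)
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)                       # shape: (n_objects, n_objects)

def association_distributions(C):
    """Assumed construction: normalize each row of C so that it sums to 1."""
    C = np.asarray(C, dtype=float)
    return C / C.sum(axis=1, keepdims=True)        # diagonal is 1, so row sums are >= 1

def entropy(p):
    """Shannon entropy in bits, treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-(p[nz] * np.log2(p[nz])).sum())

def js_divergence(dists, weights=None):
    """Generalized JS divergence: H(sum_i w_i p_i) - sum_i w_i H(p_i)."""
    dists = np.asarray(dists, dtype=float)
    if weights is None:                            # assumption: uniform weights
        weights = np.full(len(dists), 1.0 / len(dists))
    weights = np.asarray(weights, dtype=float)
    mixture = weights @ dists
    return entropy(mixture) - float(sum(w * entropy(p) for w, p in zip(weights, dists)))

def assign_to_prototypes(P, prototype_idx):
    """Group each distribution with the prototype of smallest pairwise JS divergence.

    prototype_idx is supplied by the caller; the entropy-based prototype
    selection from the paper is not reproduced here.
    """
    return np.array([
        int(np.argmin([js_divergence([p, P[k]]) for k in prototype_idx]))
        for p in P
    ])

# Hypothetical ensemble: three partitions of four objects.
labelings = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [0, 1, 1, 1]]
C = co_association(labelings)
P = association_distributions(C)
groups = assign_to_prototypes(P, prototype_idx=[0, 3])
```

Here each row of `P` plays the role of one association distribution, and `groups` holds, for every object, the position in `prototype_idx` of its nearest prototype. Aggregating the rows of `P` within each group would give a rough analogue of the cluster conditional distributions mentioned in the abstract.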

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ayad, H., Basir, O., Kamel, M. (2004). A Probabilistic Model Using Information Theoretic Measures for Cluster Ensembles. In: Roli, F., Kittler, J., Windeatt, T. (eds) Multiple Classifier Systems. MCS 2004. Lecture Notes in Computer Science, vol 3077. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25966-4_14

  • DOI: https://doi.org/10.1007/978-3-540-25966-4_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22144-9

  • Online ISBN: 978-3-540-25966-4

  • eBook Packages: Springer Book Archive
