Information Theoretic Clustering Using Minimum Spanning Trees

  • Conference paper
Pattern Recognition (DAGM/OAGM 2012)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7476)

Abstract

In this work we propose a new information-theoretic clustering algorithm that infers cluster memberships by directly optimizing a non-parametric estimate of the mutual information between the data distribution and the cluster assignment. Although this objective has a solid theoretical foundation, it is hard to optimize. We therefore propose an approximate formulation that leads to an efficient algorithm with low runtime complexity. The algorithm has a single free parameter: the number of clusters to find. We demonstrate superior performance on several synthetic and real datasets.
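The abstract does not spell out the optimization itself. As a point of reference only, the sketch below shows the basic step that clustering with minimum spanning trees builds on: compute a Euclidean MST and delete its k−1 longest edges, which leaves exactly k connected components. This is the classic single-linkage (Zahn-style) cut, not the paper's mutual-information objective; like the proposed algorithm, it takes only the number of clusters as a parameter. The function names `euclidean_mst` and `mst_cut_clustering` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean_mst(points: List[Point]) -> List[Tuple[float, int, int]]:
    """Prim's algorithm on the complete Euclidean graph, O(n^2) time.

    Returns the n-1 tree edges as (length, i, j) tuples.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each vertex to the tree
    parent = [-1] * n      # tree vertex that realizes that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Update distances of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_cut_clustering(points: List[Point], k: int) -> List[int]:
    """Hypothetical baseline: split the MST into k parts by dropping its k-1 longest edges."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    edges = sorted(euclidean_mst(points))
    kept = edges[: n - k]  # keeping the n-k shortest edges leaves k components
    root = list(range(n))  # union-find forest used to label the components

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    # Relabel component roots as 0..k-1 in order of first appearance.
    labels, ids = [], {}
    for x in range(n):
        r = find(x)
        ids.setdefault(r, len(ids))
        labels.append(ids[r])
    return labels
```

For example, with `pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]`, `mst_cut_clustering(pts, 2)` returns `[0, 0, 0, 1, 1, 1]`. The quadratic Prim step keeps the sketch short; for large datasets, dual-tree Euclidean MST algorithms are the usual replacement.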

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Müller, A.C., Nowozin, S., Lampert, C.H. (2012). Information Theoretic Clustering Using Minimum Spanning Trees. In: Pinz, A., Pock, T., Bischof, H., Leberl, F. (eds) Pattern Recognition. DAGM/OAGM 2012. Lecture Notes in Computer Science, vol 7476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32717-9_21

  • DOI: https://doi.org/10.1007/978-3-642-32717-9_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32716-2

  • Online ISBN: 978-3-642-32717-9

  • eBook Packages: Computer Science, Computer Science (R0)
