
On Autonomous k-Means Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3488)

Abstract

Clustering is a basic tool in unsupervised machine learning and data mining. One of the simplest clustering approaches is the iterative k-means algorithm. The quality of k-means clustering suffers because the algorithm is confined to a fixed k and cannot dynamically alter the value of k. Moreover, it would be much more elegant if the user did not have to supply the number of clusters to the algorithm.
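To make the fixed-k setting concrete, here is a minimal sketch of the classic iterative (Lloyd-style) k-means procedure in Python. It assumes points are tuples of numbers and that the initial centroids are k data points sampled at random without replacement; the function names and the seed and max_iter parameters are illustrative assumptions, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Illustrative fixed-k k-means (Lloyd iterations); not the paper's code.

    Returns (centroids, labels, sse), where sse is the within-cluster
    sum of squared distances.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= len(points)")
    rng = random.Random(seed)
    # Initialisation assumption: k data points sampled without replacement.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse
```

Note that k is an input the caller must fix in advance: nothing in the loop can add or remove clusters, which is the limitation the abstract describes.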

In this paper we consider recently proposed autonomous versions of the k-means algorithm. We demonstrate some of their shortcomings and put forward solutions to these deficiencies. In particular, we examine the problem of automatically determining a good initial candidate for the number of clusters.
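As a point of comparison only, the sketch below chooses k automatically with one widely used heuristic: run k-means for each k from 2 to a user-chosen k_max and keep the k with the highest average silhouette width. This is not the method studied or proposed in the paper. The helper names (farthest_first, lloyd, mean_silhouette, choose_k) are hypothetical, and the deterministic farthest-first seeding is an assumption made so that results are reproducible.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points: List[Point], k: int) -> List[Point]:
    """Hypothetical deterministic seeding: start at points[0], then keep
    adding the point farthest from every centre chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        far = max(range(len(points)),
                  key=lambda i: min(sq_dist(points[i], c) for c in centres))
        centres.append(points[far])
    return centres


def lloyd(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Fixed-k Lloyd iterations from farthest-first seeds; returns labels."""
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def mean_silhouette(points: List[Point], labels: List[int]) -> float:
    """Average silhouette width with Euclidean distance; singletons score 0."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        others = [m for m in clusters if m != lab]
        if not own or not others:
            continue  # singleton or single cluster: s(i) is taken as 0
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(math.dist(points[i], points[j]) for j in clusters[m]) / len(clusters[m])
                for m in others)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points: List[Point], k_max: int) -> Tuple[int, Dict[int, float]]:
    """Score k = 2..k_max by mean silhouette and return the best k."""
    k_max = min(k_max, len(points) - 1)  # silhouette needs 2 <= k <= n - 1
    if k_max < 2:
        raise ValueError("need at least 3 points and k_max >= 2")
    scores = {k: mean_silhouette(points, lloyd(points, k))
              for k in range(2, k_max + 1)}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    best_k, scores = choose_k(data, k_max=4)
    print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

Farthest-first seeding makes repeated runs identical, but like any single start it can still settle in a poor local optimum; running several restarts and keeping the lowest-error solution is a common safeguard.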

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Elomaa, T., Koivistoinen, H. (2005). On Autonomous k-Means Clustering. In: Hacid, MS., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science, vol 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_24

  • DOI: https://doi.org/10.1007/11425274_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25878-0

  • Online ISBN: 978-3-540-31949-8

  • eBook Packages: Computer Science, Computer Science (R0)
