On Autonomous k-Means Clustering

Elomaa, Tapio; Koivistoinen, Heidi

doi:10.1007/11425274_24

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3488)

Abstract

Clustering is a basic tool in unsupervised machine learning and data mining. One of the simplest clustering approaches is the iterative k-means algorithm. The quality of k-means clustering suffers because the algorithm is confined to a fixed k and cannot alter the value of k dynamically. Moreover, it would be much more elegant if the user did not have to supply the number of clusters to the algorithm.

In this paper we consider recently proposed autonomous versions of the k-means algorithm. We demonstrate some of their shortcomings and put forward solutions to these deficiencies. In particular, we examine the problem of automatically determining a good initial candidate for the number of clusters.
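
For orientation, the fixed-k iterative procedure that the abstract refers to can be sketched as follows. This is a generic Lloyd-style k-means, not the autonomous variants studied in the paper. The function name `kmeans` and the parameters `iters` and `seed` are illustrative assumptions. Note that the caller must supply `k`, which is exactly the limitation the abstract points out.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Generic Lloyd-style k-means on a list of equal-length numeric tuples.

    Illustrative sketch only: k is fixed and supplied by the caller.
    """
    rng = random.Random(seed)
    # Assumed initialization: pick k data points as the starting centers.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance; ties go to the lower index).
        new_assignment = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the iteration has converged
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment
```

Calling `kmeans(points, 2)` returns the final centers and one cluster index per point. Running it for several values of `k` and comparing the results by hand is the manual step that autonomous variants aim to remove.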

Author information

Authors and Affiliations

Institute of Software Systems, Tampere University of Technology, P.O. Box 553, FI-33101, Tampere, Finland
Tapio Elomaa & Heidi Koivistoinen

Editor information

Editors and Affiliations

LIRIS - UFR d’Informatique, Université Claude Bernard Lyon 1, 43, boulevard du 11 novembre 1918, 69622, Villeurbanne, France
Mohand-Said Hacid
Department of Computer Science, State University of New York, 12222, Albany, NY, USA
Neil V. Murray
Department of Computer Science, University of North Carolina, 28223, Charlotte, NC, USA
Zbigniew W. Raś
Shimane University, 89-1 Enya-cho Izumo, 6938501, Shimane, Japan
Shusaku Tsumoto

About this paper

Cite this paper

Elomaa, T., Koivistoinen, H. (2005). On Autonomous k-Means Clustering. In: Hacid, MS., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science, vol 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_24

DOI: https://doi.org/10.1007/11425274_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25878-0
Online ISBN: 978-3-540-31949-8
eBook Packages: Computer Science, Computer Science (R0)