K-Means Clustering Seeds Initialization Based on Centrality, Sparsity, and Isotropy

Kang, Pilsung; Cho, Sungzoon

doi:10.1007/978-3-642-04394-9_14

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5788)

Abstract

K-Means is the most commonly used clustering algorithm. Despite its numerous advantages, it has a crucial drawback: the final cluster structure entirely relies on the choice of initial seeds. In this paper, a new seeds initialization algorithm based on centrality, sparsity, and isotropy is proposed. Preliminary experiments show that the proposed algorithm not only resulted in better clustering structures, but also accelerated the convergence.
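
The sensitivity to initial seeds described above can be illustrated with a minimal sketch of standard K-Means (Lloyd's iterations). This is not the initialization algorithm proposed in the paper, whose details are not given on this page; the four-point dataset, the two seed sets, and the function name `kmeans` are hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd's K-Means. Returns (centers, sse, iterations used)."""
    centers = [tuple(s) for s in seeds]
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    # Sum of squared distances from each point to its nearest center.
    sse = sum(min(dist(p, c) for c in centers) ** 2 for p in points)
    return centers, sse, iteration


# Hypothetical data: corners of a 4 x 1 rectangle, clustered with k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Seeds on opposite short sides recover the natural left/right split.
good_centers, good_sse, good_iters = kmeans(points, [(0, 0), (4, 1)])

# Seeds on the midpoints of the long sides are already a fixed point,
# so the algorithm stops at a much worse top/bottom split.
bad_centers, bad_sse, bad_iters = kmeans(points, [(2, 0), (2, 1)])

print(good_centers, good_sse)  # [(0.0, 0.5), (4.0, 0.5)] 1.0
print(bad_centers, bad_sse)    # [(2.0, 0.0), (2.0, 1.0)] 16.0
```

Both runs use the same data and the same update rule; only the seeds differ, yet the second run settles on a partition whose within-cluster squared error is sixteen times larger. Seed-selection methods such as the one proposed in the paper aim to avoid starting points of this kind.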

Author information

Authors and Affiliations

Department of Industrial Engineering, Seoul National University, 599 Gwanangno, Gwanak-gu, Seoul, Republic of Korea
Pilsung Kang & Sungzoon Cho

Editor information

Editors and Affiliations

Escuela Politécnica Superior, Universidad de Burgos, Calle Francisco de Vitoria, S/N, Edificio C, 09006, Burgos, Spain
Emilio Corchado
School of Electrical and Electronic Engineering, University of Manchester, Sackville Street Building, Sackville Street, M60 1QD, Manchester, UK
Hujun Yin

About this paper

Cite this paper

Kang, P., Cho, S. (2009). K-Means Clustering Seeds Initialization Based on Centrality, Sparsity, and Isotropy. In: Corchado, E., Yin, H. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2009. IDEAL 2009. Lecture Notes in Computer Science, vol 5788. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04394-9_14

DOI: https://doi.org/10.1007/978-3-642-04394-9_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04393-2
Online ISBN: 978-3-642-04394-9
eBook Packages: Computer Science, Computer Science (R0)
