Abstract
K-Means is the most commonly used clustering algorithm. Despite its numerous advantages, it has a crucial drawback: the final cluster structure depends entirely on the choice of initial seeds. In this paper, a new seed initialization algorithm based on centrality, sparsity, and isotropy is proposed. Preliminary experiments show that the proposed algorithm not only yields better cluster structures but also accelerates convergence.
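The claim that the final clustering depends on the initial seeds can be illustrated with plain Lloyd's K-Means. The sketch below is not the authors' centrality, sparsity, and isotropy method. It runs the same standard K-Means from two hand-picked seed sets on a hypothetical toy dataset and compares the resulting partitions and their within-cluster sum of squared errors (SSE). The function names `kmeans` and `sq_dist` are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], seeds: List[Point], max_iter: int = 100):
    """Standard Lloyd's K-Means started from the given seeds.

    Returns (centroids, labels, sse, iterations).
    """
    centroids = [tuple(s) for s in seeds]
    labels: List[int] = []
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid (ties go to the lower index).
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse, iterations


if __name__ == "__main__":
    # Hypothetical 1-D data with three obvious groups.
    pts = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
    _, good_labels, good_sse, _ = kmeans(pts, [(0.0,), (10.0,), (20.0,)])
    _, bad_labels, bad_sse, _ = kmeans(pts, [(0.0,), (1.0,), (10.0,)])
    print(good_labels, good_sse)  # [0, 0, 1, 1, 2, 2] 1.5
    print(bad_labels, bad_sse)    # [0, 1, 2, 2, 2, 2] 101.0
```

Both runs converge, but the second seed set places two seeds inside the same natural group. K-Means then settles in a local minimum that splits that group and merges the other two. A seed initialization scheme aims to avoid this kind of outcome.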
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kang, P., Cho, S. (2009). K-Means Clustering Seeds Initialization Based on Centrality, Sparsity, and Isotropy. In: Corchado, E., Yin, H. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2009. IDEAL 2009. Lecture Notes in Computer Science, vol 5788. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04394-9_14
DOI: https://doi.org/10.1007/978-3-642-04394-9_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04393-2
Online ISBN: 978-3-642-04394-9
eBook Packages: Computer Science, Computer Science (R0)