Abstract
K-means is a relatively simple and fast clustering algorithm. However, the traditional k-means algorithm draws its initial cluster centers at random from the dataset, so its clustering results are unstable. In this paper, we propose a novel method for selecting the initial centroids of the k-means algorithm based on the small-world network. We first model a set of text documents as a network that exhibits the small-world phenomenon, and then use the network's small-world characteristics to form k initial centroids. Experimental evaluation on document corpora shows that the clustering results (total cohesion, purity, recall) obtained by the proposed method are comparable with those of the traditional k-means algorithm. The experiments also show that the results of the proposed algorithm are relatively stable and are obtained efficiently. Therefore, this method can be considered effective in the domain of text documents, especially when text clustering is used for topic detection.
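The abstract does not say how the small-world characteristics are turned into centroids, so the following Python sketch is only an illustration of the general idea, not the authors' procedure. It assumes documents are bag-of-words count vectors, that two documents are linked when their cosine similarity reaches a hypothetical `threshold`, and that seeds are taken as high-degree nodes that are not neighbours of an already chosen seed. The function names `build_graph` and `pick_seeds` and the degree-based rule are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(vectors, threshold=0.3):
    """Link documents i and j when their similarity reaches the threshold.

    The threshold value is a hypothetical choice for this sketch.
    """
    adj = {i: set() for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def pick_seeds(adj, k):
    """Pick up to k seed documents deterministically.

    Assumed heuristic: visit nodes by descending degree (ties broken by
    index) and keep a node only if it is not adjacent to an earlier seed.
    If fewer than k such nodes exist, fill up with the remaining
    highest-degree nodes.
    """
    if k <= 0:
        return []
    ranked = sorted(adj, key=lambda i: (-len(adj[i]), i))
    seeds = []
    for node in ranked:
        if all(node not in adj[s] for s in seeds):
            seeds.append(node)
            if len(seeds) == k:
                return seeds
    for node in ranked:
        if node not in seeds:
            seeds.append(node)
            if len(seeds) == k:
                break
    return seeds


docs = [
    "cat dog pet",
    "dog cat animal pet",
    "cat pet food",
    "stock market trade",
    "market trade price",
    "stock price market",
]
vectors = [Counter(d.split()) for d in docs]
seeds = pick_seeds(build_graph(vectors), k=2)
print(seeds)  # indices of the documents to use as initial centroids
```

Because the selection involves no random draw, repeated runs on the same corpus give the same seeds, which is the kind of stability the abstract contrasts with random initialization. The vectors of the selected documents could then be passed to any k-means implementation as its initial centroids.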
Copyright information
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Shen, S., Meng, Z. (2012). Optimization of Initial Centroids for K-Means Algorithm Based on Small World Network. In: Shi, Z., Leake, D., Vadera, S. (eds) Intelligent Information Processing VI. IIP 2012. IFIP Advances in Information and Communication Technology, vol 385. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32891-6_13
DOI: https://doi.org/10.1007/978-3-642-32891-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32890-9
Online ISBN: 978-3-642-32891-6
eBook Packages: Computer Science, Computer Science (R0)