MST-Based Cluster Initialization for K-Means

Reddy, Damodar; Mishra, Devender; Jana, Prasanta K.

doi:10.1007/978-3-642-17857-3_33

Part of the book series: Communications in Computer and Information Science (CCIS, volume 131)

Included in the following conference series:

International Conference on Computer Science and Information Technology

Abstract

Clustering is an exploratory data analysis tool that has gained enormous attention in recent years, particularly for gene expression data analysis. K-means clustering is a method of cluster analysis that aims to partition n data points into K clusters. It is possibly the best-known and most widely used clustering technique. However, K-means does not necessarily find the optimal cluster configuration because it is highly sensitive to the random selection of the initial cluster centers. Minimum spanning tree (MST)-based clustering algorithms, on the other hand, depend on selecting the right inconsistent edges to produce quality clusters. In this paper, we present a novel method that bridges the K-means and MST-based clustering algorithms. The proposed method not only overcomes the problem of random selection of the initial cluster centers for the former and of inconsistent edges for the latter, but also automates both selections. We perform extensive experiments on the proposed method using both synthetic and real-world data sets. The experimental results show that the algorithm is able to produce the desired clusters even for complex and high-dimensional data points.
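
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of seeding K-means from an MST, not the authors' algorithm. It assumes a Zahn-style heuristic (cut the K-1 longest MST edges and use the mean of each resulting component as an initial center) followed by plain Lloyd iterations. The function names `prim_mst`, `mst_initial_centers`, and `kmeans` are hypothetical and chosen for illustration.

```python
import math


def prim_mst(points):
    """MST edges (weight, u, v) of the complete Euclidean graph, Prim in O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_initial_centers(points, k):
    """Assumed heuristic: drop the k-1 longest MST edges, return component means."""
    edges = sorted(prim_mst(points))
    kept = edges[: len(edges) - (k - 1)]
    root = list(range(len(points)))  # union-find over the kept edges

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the supplied centers."""
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(tuple(old))  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Calling `kmeans(points, mst_initial_centers(points, k))` makes the seeding deterministic for a given data set, which is the property the abstract contrasts with random initialization. Cutting only the longest edges is a simplification; it is sensitive to outliers, and the paper's criterion for inconsistent edges may differ.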

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Indian School of Mines, Dhanbad, 826 004, India
Damodar Reddy, Devender Mishra & Prasanta K. Jana

Editor information

Editors and Affiliations

Jackson State University, 39217, Jackson, MS, USA
Natarajan Meghanathan
Deptt. of Electronics and Computer Engg., Indian Institute of Technology, Roorkee, India
Brajesh Kumar Kaushik
Wireilla Net Solutions PTY Ltd, Melbourne, Victoria, Australia
Dhinaharan Nagamalai

About this paper

Cite this paper

Reddy, D., Mishra, D., Jana, P.K. (2011). MST-Based Cluster Initialization for K-Means. In: Meghanathan, N., Kaushik, B.K., Nagamalai, D. (eds) Advances in Computer Science and Information Technology. CCSIT 2011. Communications in Computer and Information Science, vol 131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17857-3_33

DOI: https://doi.org/10.1007/978-3-642-17857-3_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17856-6
Online ISBN: 978-3-642-17857-3
eBook Packages: Computer Science, Computer Science (R0)
