Abstract
Clustering is an exploratory data analysis tool that has gained enormous attention in recent years, particularly for gene expression data analysis. K-means clustering is a method of cluster analysis that aims to partition n data points into K clusters, and it is possibly the best known and most widely used clustering technique. However, K-means does not necessarily find the optimal cluster configuration because it is highly sensitive to the random selection of the initial cluster centers. MST-based clustering algorithms, on the other hand, depend on selecting the inconsistent edges correctly to produce quality clusters. In this paper, we present a novel method that bridges the K-means and MST-based clustering algorithms. The proposed method not only overcomes the problem of randomly selecting the initial cluster centers for the former and of selecting the inconsistent edges for the latter, but also automates both selections. We perform extensive experiments on the proposed method using both synthetic and real-world data sets. The experimental results show that the algorithm is able to produce the desired clusters even for complex and high-dimensional data points.
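The abstract does not spell out how the two algorithms are combined, so the following is only a minimal sketch of one common way to seed K-means from a minimum spanning tree, not the authors' procedure. It assumes Euclidean distance, treats the K-1 heaviest MST edges as the "inconsistent" ones, and uses the centroid of each resulting component as an initial center. All function names (`mst_edges`, `mst_initial_centers`, `kmeans`) are hypothetical, and the code assumes 1 <= K <= n.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, u, v) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_initial_centers(points, k):
    """Assumption: cut the k-1 heaviest MST edges, return each component's centroid."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    root = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]

def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations starting from the given (deterministic) centers."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        new = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
               for j, g in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return centers, clusters

# Usage on two well-separated toy groups (illustrative data only).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, mst_initial_centers(pts, 2))
```

Because the seeds come from the tree structure rather than from random draws, repeated runs on the same data give the same partition. The fixed "K-1 heaviest edges" rule is the simplest possible cut criterion and stands in for whatever automated edge selection the paper actually uses.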
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Reddy, D., Mishra, D., Jana, P.K. (2011). MST-Based Cluster Initialization for K-Means. In: Meghanathan, N., Kaushik, B.K., Nagamalai, D. (eds) Advances in Computer Science and Information Technology. CCSIT 2011. Communications in Computer and Information Science, vol 131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17857-3_33
DOI: https://doi.org/10.1007/978-3-642-17857-3_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17856-6
Online ISBN: 978-3-642-17857-3
eBook Packages: Computer Science (R0)