MST-Based Cluster Initialization for K-Means

  • Conference paper
Advances in Computer Science and Information Technology (CCSIT 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 131)

Abstract

Clustering is an exploratory data analysis tool that has gained enormous attention in recent years, specifically for gene expression data analysis. K-means clustering is a method of cluster analysis that aims to partition n data points into K clusters. It is possibly the best known and most widely used clustering technique. However, K-means does not necessarily find the optimal cluster configuration, because it is highly sensitive to the random selection of the initial cluster centers. Minimum spanning tree (MST) based clustering algorithms, on the other hand, depend on the selection of inconsistent edges to produce quality clusters. In this paper, we present a novel method that bridges the K-means and MST-based clustering algorithms. The proposed method not only overcomes the problem of random selection of the initial cluster centers for the former and of the inconsistent edges for the latter, but also automates both. We perform extensive experiments on the proposed method using both synthetic and real-world data sets. The experimental results show that the algorithm is able to produce the desired clusters even for complex and high-dimensional data points.
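The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of seeding K-means from an MST, not the authors' method. It assumes the simplest possible edge rule (cut the K-1 heaviest MST edges) in place of the paper's automated selection of inconsistent edges, and it uses the mean of each resulting component as an initial center. The function names `mst_edges`, `mst_initial_centers`, and `kmeans` are hypothetical and chosen for illustration.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_initial_centers(points, k):
    """ASSUMPTION: cut the k-1 heaviest MST edges, then average each component."""
    edges = sorted(mst_edges(points))
    kept = edges[:max(0, len(edges) - (k - 1))]
    root = list(range(len(points)))  # union-find over the kept edges

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    groups = {}
    for idx, p in enumerate(points):
        groups.setdefault(find(idx), []).append(p)
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]

def kmeans(points, centers, iters=100):
    """Standard Lloyd iterations starting from the supplied centers."""
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous center.
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members))
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

if __name__ == "__main__":
    # Two well-separated toy groups (illustrative data, not from the paper).
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    init = mst_initial_centers(pts, 2)
    print(kmeans(pts, init))
```

Because the seeds are derived from the data rather than drawn at random, repeated runs give the same result. The heaviest-edge rule is known to be fragile in the presence of outliers, which is the kind of inconsistent-edge problem the abstract says the proposed method addresses.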

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Reddy, D., Mishra, D., Jana, P.K. (2011). MST-Based Cluster Initialization for K-Means. In: Meghanathan, N., Kaushik, B.K., Nagamalai, D. (eds) Advances in Computer Science and Information Technology. CCSIT 2011. Communications in Computer and Information Science, vol 131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17857-3_33

  • DOI: https://doi.org/10.1007/978-3-642-17857-3_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17856-6

  • Online ISBN: 978-3-642-17857-3

  • eBook Packages: Computer Science, Computer Science (R0)
