A Study on Initial Centroids Selection for Partitional Clustering Algorithms

  • Conference paper
Software Engineering

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 731)

Abstract

Data mining tools and techniques allow an organization to make creative decisions and subsequently plan properly. Clustering determines which objects are similar in characteristics and groups them together. The K-means clustering method chooses random cluster centres (initial centroids), one for each cluster, and this is its major weakness: the performance and quality of K-means depend strongly on the initial guess of the centres. Several modifications that augment K-means with a technique for selecting the initial centroids have been suggested in the clustering literature. The first two authors of this paper have also developed three algorithms that, unlike K-means, do not generate the initial centres randomly and produce the same set of initial centroids for the same input data. These algorithms are sum of distance clustering (SODC), the distance-based clustering algorithm (DBCA) and farthest distributed centroid clustering (FDCC). In this paper we present a brief survey of the algorithms available in the research on modifying the initial centroids for the K-means clustering algorithm, and we further describe the farthest distributed centroid clustering algorithm. The experimental results show that the farthest distributed centroid clustering algorithm produces better-quality clusters than the partitional clustering algorithm, the agglomerative hierarchical clustering algorithm and the hierarchical partitioning clustering algorithm.
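The idea of replacing random seeding with a deterministic rule that spreads the initial centres far apart can be illustrated with a short sketch. The abstract does not give the FDCC procedure, so the code below is not the authors' algorithm; it is a generic farthest-first traversal, and the function name `farthest_first_centroids` and the choice of seeding from the point farthest from the data mean are assumptions made purely for illustration.

```python
import math


def farthest_first_centroids(points, k):
    """Pick k initial centroids deterministically by farthest-first traversal.

    Illustrative sketch only: this is NOT the paper's FDCC procedure,
    whose steps are not stated in the abstract.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    n = len(points)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]

    # Assumed seed rule: start from the point farthest from the data mean.
    # max() returns the first maximum, so ties are broken by input order.
    first = max(range(n), key=lambda i: math.dist(points[i], mean))
    chosen = [first]

    # Distance from every point to its nearest already-chosen centroid.
    nearest = [math.dist(p, points[first]) for p in points]

    while len(chosen) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        nxt = max(range(n), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [
            min(nearest[i], math.dist(points[i], points[nxt]))
            for i in range(n)
        ]

    return [points[i] for i in chosen]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 10)]
    print(farthest_first_centroids(data, 3))
```

Because no random numbers are involved, repeated calls on the same input return the same centroids, which is the property the abstract attributes to SODC, DBCA and FDCC. The returned points would then be passed to an ordinary K-means iteration as its starting centres.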

Author information

Corresponding author

Correspondence to Mahesh Motwani.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Motwani, M., Arora, N., Gupta, A. (2019). A Study on Initial Centroids Selection for Partitional Clustering Algorithms. In: Hoda, M., Chauhan, N., Quadri, S., Srivastava, P. (eds) Software Engineering. Advances in Intelligent Systems and Computing, vol 731. Springer, Singapore. https://doi.org/10.1007/978-981-10-8848-3_21

  • DOI: https://doi.org/10.1007/978-981-10-8848-3_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8847-6

  • Online ISBN: 978-981-10-8848-3

  • eBook Packages: Engineering, Engineering (R0)
