Automatic clustering using an improved artificial bee colony optimization for customer segmentation

  • R. J. Kuo
  • Ferani E. Zulvia
Regular Paper


In cluster analysis, determining the number of clusters is an important issue because information about the most appropriate number of clusters is not available in real-world problems. Automatic clustering is a clustering approach that automatically finds the most suitable number of clusters and assigns the instances to the corresponding clusters. This study proposes a novel automatic clustering algorithm, iABC, which hybridizes an improved artificial bee colony (ABC) optimization algorithm with the K-means algorithm. The proposed iABC algorithm improves the onlooker bee exploration scheme by directing the bees' movements toward better locations. Instead of using a random neighborhood location, the improved onlooker bee considers the data centroid to find a better initial centroid for the K-means algorithm. To increase the efficiency of the improvement, the updating process is applied only to the worst cluster centroid. The proposed iABC algorithm is verified using several benchmark datasets. The computational results indicate that the proposed iABC algorithm outperforms the original ABC algorithm on the automatic clustering problem. Furthermore, the proposed iABC algorithm is applied to a customer segmentation problem. The results reveal that the iABC algorithm yields better and more stable results than the original ABC algorithm.
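The abstract does not state the update rule itself, so the following Python sketch is only one plausible reading of the improved onlooker step, not the authors' implementation. It assumes that the "worst" cluster is the one with the largest mean distance between its members and its centroid, that the "data centroid" is the mean of the instances currently assigned to that cluster, and that the step size `phi` is drawn uniformly from [0, 1). All function names are hypothetical. The selection of the number of clusters and the fitness function used by the ABC search are not covered.

```python
import math
import random


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean)."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def worst_cluster(points, centroids, labels):
    """Assumed criterion: largest mean member-to-centroid distance.

    An empty cluster is treated as the worst one (infinite scatter).
    """
    scatter = []
    for k, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            scatter.append(sum(math.dist(p, c) for p in members) / len(members))
        else:
            scatter.append(math.inf)
    return max(range(len(centroids)), key=scatter.__getitem__)


def onlooker_move(points, centroids, rng):
    """Move only the worst centroid toward the mean of its assigned data."""
    labels = assign(points, centroids)
    k = worst_cluster(points, centroids, labels)
    members = [p for p, lab in zip(points, labels) if lab == k]
    # Assumption: an empty cluster is pulled toward a randomly chosen instance.
    target = mean(members) if members else rng.choice(points)
    phi = rng.random()  # assumed step size in [0, 1)
    moved = tuple(c + phi * (t - c) for c, t in zip(centroids[k], target))
    updated = list(centroids)
    updated[k] = moved
    return updated


def kmeans(points, centroids, iters=10):
    """Plain Lloyd iterations; an empty cluster keeps its previous centroid."""
    centroids = list(centroids)
    for _ in range(iters):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = mean(members)
    return centroids


# Toy data: two well-separated groups and a poorly placed second centroid.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
start = [(0.5, 0.5), (3.0, 3.0)]
moved = onlooker_move(points, start, random.Random(0))
final = kmeans(points, moved)
```

In this toy run only the second centroid is moved, since the first one already sits at the mean of its members; K-means then refines the adjusted solution. Under these assumptions, updating a single centroid per step keeps the cost of each onlooker move close to that of one assignment pass over the data.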


Automatic clustering; Artificial bee colony optimization algorithm; K-means algorithm; Customer segmentation



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Department of Industrial Management, National Taiwan University of Science and Technology, Taipei, Taiwan
  2. Department of Logistics Engineering, Pertamina University, Jakarta, Indonesia
