
Dynamic frequency based parallel k-bat algorithm for massive data clustering (DFBPKBA)

  • Ashish Kumar Tripathi
  • Kapil Sharma
  • Manju Bala
Original Article

Abstract

Over the past decade, the volume of digital data has grown significantly, so good data mining techniques are important for better decision making. Clustering is one of the key elements of data mining, and K-means is a very popular clustering algorithm that is widely used in the literature. However, K-means tends to get stuck in a locally optimal solution because it depends on the random initialization of the initial cluster centers. In this paper, a novel variant of the Bat algorithm based on dynamic frequency is introduced. The proposed variant is then hybridized with K-means to present a new approach for clustering in a distributed environment. Since evolutionary computation is very computation-intensive, traditional sequential algorithms cannot provide satisfactory results within a reasonable amount of time for large-scale data problems. To mitigate this problem, the proposed variant is parallelized using the MapReduce model in the Hadoop framework. The experimental results show that the proposed algorithm outperforms K-means, PSO and the Bat algorithm on eighty percent of the benchmark datasets in terms of intra-cluster distance. Furthermore, DFBPKBA achieves significant speedup on massive datasets as the number of nodes increases.
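The abstract evaluates clustering quality by intra-cluster distance and computes it in a MapReduce setting. The following is only a minimal single-machine sketch of that idea, not the authors' implementation. It assumes intra-cluster distance means the sum of Euclidean distances from each point to its nearest cluster center, which the abstract does not spell out. The function names (`map_phase`, `reduce_phase`, `intra_cluster_distance`) and the toy data are hypothetical, and the map and reduce steps are imitated with plain Python rather than Hadoop.

```python
import math
from collections import defaultdict


def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to the point."""
    distances = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=distances.__getitem__)
    return idx, distances[idx]


def map_phase(points, centroids):
    """Map step (illustrative): emit one (cluster index, distance) pair per point."""
    for p in points:
        yield nearest_centroid(p, centroids)


def reduce_phase(pairs):
    """Reduce step (illustrative): sum the emitted distances per cluster index."""
    totals = defaultdict(float)
    for idx, d in pairs:
        totals[idx] += d
    return dict(totals)


def intra_cluster_distance(points, centroids):
    """Assumed fitness: total distance of all points to their nearest centroid."""
    return sum(reduce_phase(map_phase(points, centroids)).values())


# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
good = [(0.0, 1.0), (10.0, 2.0)]   # one center per group
poor = [(0.0, 0.0), (0.0, 2.0)]    # both centers in the same group

print(intra_cluster_distance(points, good))
print(intra_cluster_distance(points, poor))
```

A poor initialization such as `poor` gives a larger total than `good`, which illustrates the kind of initialization-dependent local optimum that the abstract says the hybrid approach aims to avoid.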

Keywords

Bat algorithm, Hadoop, MapReduce, Large data sets, DFBPKBA


Copyright information

© The Society for Reliability Engineering, Quality and Operations Management (SREQOM), India and The Division of Operation and Maintenance, Lulea University of Technology, Sweden 2017

Authors and Affiliations

  1. Delhi Technological University, Delhi, India
  2. IP College of Women, Delhi, India
