Parallel Bat Algorithm-Based Clustering Using MapReduce

Ashish, Tripathi; Kapil, Sharma; Manju, Bala

doi:10.1007/978-981-10-4600-1_7

Conference paper
First Online: 03 November 2017

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 4)

Abstract

We are in the era of big data, where data volumes are growing so rapidly that traditional clustering methods fail on such massive data sets. If the size of the data exceeds the storage capacity or memory of the system, clustering becomes more complex and time-intensive. To overcome this problem, this paper proposes a fast and efficient parallel bat algorithm (PBA) for data clustering using the MapReduce architecture. It is efficient because it uses an evolutionary approach to clustering rather than a traditional algorithm such as k-means, and fast because it is parallelized using Hadoop and the MapReduce architecture. The PBA algorithm works by dividing the large data set into small blocks and clustering these smaller data blocks in parallel. The proposed algorithm inherits the features of the bat algorithm to cluster the data set. It is validated on five benchmark data sets against particle swarm optimization with different numbers of nodes. Experimental results show that PBA gives results competitive with particle swarm optimization and provides significant speedup as the number of nodes increases.
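The abstract gives no implementation details, so the following is only a minimal single-process sketch of the general idea: the data set is split into blocks, each block contributes a partial clustering cost (the "map" step), the partial costs are summed (the "reduce" step), and a simplified bat-style search moves candidate centroid sets toward the best one found. Every name here (`split_blocks`, `map_block`, `fitness`, `bat_cluster`), the sum-of-squared-distances objective, the fixed loudness and pulse rate, and all parameter values are assumptions made for illustration; this is not the authors' PBA and it does not use Hadoop.

```python
import random

def split_blocks(data, n_blocks):
    """Divide the data set into contiguous blocks of roughly equal size."""
    size = -(-len(data) // n_blocks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def map_block(block, centroids):
    """'Map' step: partial cost of one candidate solution on one block."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in block)

def fitness(blocks, centroids):
    """'Reduce' step: sum the partial costs over all blocks."""
    return sum(map_block(block, centroids) for block in blocks)

def bat_cluster(data, k, n_blocks=4, n_bats=10, iters=50, seed=0,
                f_min=0.0, f_max=1.0, loudness=0.5, pulse_rate=0.5):
    """Simplified bat-style search (fixed loudness and pulse rate: an assumption)."""
    rng = random.Random(seed)
    dim = len(data[0])
    blocks = split_blocks(data, n_blocks)
    # Each bat is a candidate set of k centroids, initialised from data points.
    bats = [[list(p) for p in rng.sample(data, k)] for _ in range(n_bats)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_bats)]
    fit = [fitness(blocks, b) for b in bats]
    best_i = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = [c[:] for c in bats[best_i]], fit[best_i]
    for _ in range(iters):
        for i in range(n_bats):
            # Random frequency scales the pull toward the current best solution.
            freq = f_min + (f_max - f_min) * rng.random()
            for c in range(k):
                for d in range(dim):
                    vel[i][c][d] += (best[c][d] - bats[i][c][d]) * freq
            cand = [[bats[i][c][d] + vel[i][c][d] for d in range(dim)]
                    for c in range(k)]
            if rng.random() > pulse_rate:
                # Local random walk around the best solution.
                cand = [[best[c][d] + 0.01 * rng.gauss(0, 1) for d in range(dim)]
                        for c in range(k)]
            cand_fit = fitness(blocks, cand)
            # Accept a candidate only if it is no worse, with probability `loudness`.
            if cand_fit <= fit[i] and rng.random() < loudness:
                bats[i], fit[i] = cand, cand_fit
            if fit[i] < best_fit:
                best, best_fit = [c[:] for c in bats[i]], fit[i]
    return best, best_fit
```

Because `map_block` depends only on its own block and the candidate centroids, the per-block calls are independent of one another; that independence is what would let a MapReduce framework run them on separate nodes. How the paper actually distributes the work is not stated in the abstract.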

Author information

Authors and Affiliations

Jaypee Institute of Information Technology Noida, Delhi Technological University Delhi, IP College of Women Delhi, New Delhi, India
Tripathi Ashish, Sharma Kapil & Bala Manju

Corresponding author

Correspondence to Tripathi Ashish.

Editor information

Editors and Affiliations

University of Murcia, Murcia, Spain
Gregorio Martinez Perez
Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology, Allahabad, Uttar Pradesh, India
Krishn K. Mishra
Department of Computer Science and Engineering, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
Shailesh Tiwari
Department of Computer Science and Engineering, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
Munesh C. Trivedi

About this paper

Cite this paper

Ashish, T., Kapil, S., Manju, B. (2018). Parallel Bat Algorithm-Based Clustering Using MapReduce. In: Perez, G., Mishra, K., Tiwari, S., Trivedi, M. (eds) Networking Communication and Data Knowledge Engineering. Lecture Notes on Data Engineering and Communications Technologies, vol 4. Springer, Singapore. https://doi.org/10.1007/978-981-10-4600-1_7

DOI: https://doi.org/10.1007/978-981-10-4600-1_7
Published: 03 November 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-4599-8
Online ISBN: 978-981-10-4600-1
eBook Packages: Engineering, Engineering (R0)
