Big data refers to the very large volumes of information that organizations process, analyse and store. In real-world scenarios, some big data sets have additional characteristics such as class imbalance, as in credit card fraud detection or extreme weather forecasting. To address the problem of classifying binary imbalanced big data, this paper proposes an enhanced classification model based on the MapReduce framework (MRF). An MRF-based optimization handles the imbalanced big data, and a deep learning network performs the classification. The mappers in the MRF carry out feature selection with the proposed Adaptive E-Bat algorithm, which combines an adaptive scheme, the Exponential Weighted Moving Average (EWMA) and the Bat algorithm (BA). Using the selected features, the reducers perform classification with a Deep Belief Network (DBN) that is trained with the proposed Adaptive E-Bat algorithm. The performance of the proposed Adaptive E-Bat DBN method is evaluated in terms of two metrics, accuracy and true positive rate (TPR); it attains an accuracy of 0.8998 and a TPR of 0.9144, which show the superiority of the proposed Adaptive E-Bat DBN method for effective big data classification.
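The two evaluation metrics named above can be made concrete with a short sketch. The code below is an illustration only, not the authors' evaluation code: the function names, the `ConfusionCounts` type and the toy labels are assumptions introduced here. It uses the standard definitions, accuracy = (TP + TN) / (TP + TN + FP + FN) and TPR = TP / (TP + FN), and shows why TPR is reported alongside accuracy when the classes are imbalanced.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper type)."""
    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    tn: int  # negatives predicted as negative
    fn: int  # positives predicted as negative


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN and FN for a binary labelling."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c):
    """Fraction of all samples that were classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def true_positive_rate(c):
    """Fraction of actual positives that were detected (recall)."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


if __name__ == "__main__":
    # Toy imbalanced data (made up): 2 positives among 10 samples,
    # and the classifier misses one of the two positives.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    counts = confusion_counts(y_true, y_pred)
    print("accuracy:", accuracy(counts))
    print("TPR:", true_positive_rate(counts))
```

In this toy case the accuracy is 0.9 while the TPR is only 0.5, because the majority class dominates the accuracy figure. That gap is the reason a minority-class metric such as TPR is informative for imbalanced data.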
About this article
Cite this article
Md Mujeeb, S., Praveen Sam, R. & Madhavi, K. Adaptive Exponential Bat algorithm and deep learning for big data classification. Sādhanā 46, 15 (2021). https://doi.org/10.1007/s12046-020-01521-z
Keywords
- Big data
- MapReduce framework
- Exponential Weighted Moving Average
- Deep Belief Network