Adaptive Exponential Bat algorithm and deep learning for big data classification

Abstract

Big data refers to the huge amounts of information that organizations process, analyse and store. In real-world scenarios, some big data sets, such as those used for credit card fraud detection and extreme weather forecasting, have additional characteristics such as class imbalance. To address the problem of classifying binary imbalanced big data, this paper proposes an enhanced classification model based on the MapReduce framework (MRF). An MRF-based optimization handles the imbalanced big data, and a deep learning network performs the classification. The mappers in the MRF carry out feature selection with the proposed Adaptive E-Bat algorithm, which combines an adaptive concept, the Exponential Weighted Moving Average (EWMA) and the Bat algorithm (BA). Using the selected features, the reducers perform classification with a Deep Belief Network (DBN) that is trained with the proposed Adaptive E-Bat algorithm. The performance of the proposed Adaptive E-Bat DBN method is evaluated in terms of accuracy and true positive rate (TPR); it attains an accuracy of 0.8998 and a TPR of 0.9144, indicating the superiority of the proposed Adaptive E-Bat DBN method for big data classification.
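The abstract names two generic building blocks that are easy to state precisely: the EWMA recursion and the two evaluation metrics. The sketch below is only an illustration of those standard definitions, not the authors' implementation. The function names, the smoothing factor `alpha`, the choice to seed the average with the first value, and the label encoding (1 for the positive class) are all assumptions made here for the example.

```python
from typing import List, Sequence, Tuple


def ewma(values: Sequence[float], alpha: float) -> List[float]:
    """Textbook EWMA recursion: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    `alpha` and seeding with the first observation are illustrative
    assumptions, not settings taken from the paper.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for x in values:
        if not smoothed:
            smoothed.append(float(x))  # seed with the first observation
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def accuracy_and_tpr(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (accuracy, TPR) for binary labels.

    accuracy = correct predictions / all predictions
    TPR      = true positives / (true positives + false negatives)
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    correct = tp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct += 1
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
    accuracy = correct / len(y_true)
    # TPR is reported as 0.0 when there are no positive examples at all.
    tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return accuracy, tpr


# Imbalanced toy case: one positive among four samples, all predicted negative.
acc, tpr = accuracy_and_tpr([1, 0, 0, 0], [0, 0, 0, 0])
```

In the toy case, a classifier that always predicts the majority class still reaches an accuracy of 0.75 while its TPR is 0.0. This is a common reason to report TPR alongside accuracy on imbalanced data.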

Author information

Corresponding author

Correspondence to S Md Mujeeb.

Cite this article

Md Mujeeb, S., Praveen Sam, R. & Madhavi, K. Adaptive Exponential Bat algorithm and deep learning for big data classification. Sādhanā 46, 15 (2021). https://doi.org/10.1007/s12046-020-01521-z

Keywords

  • Big data
  • MapReduce framework
  • Exponential Weighted Moving Average
  • adaptive
  • Deep Belief Network