
A Multi-Optimization Technique for Improvement of Hadoop Performance with a Dynamic Job Execution Method Based on Artificial Neural Network

  • Original Research

SN Computer Science

A Publisher Correction to this article was published on 28 September 2023

This article has been updated

Abstract

The improvement of Hadoop performance has received considerable attention from researchers in cloud computing. Most studies have focused on improving the performance of a Hadoop cluster. Notably, configuring Hadoop involves many parameters that must be tuned to improve performance. This paper proposes a mechanism that improves Hadoop through job scheduling, resource allocation, and resource utilization. Specifically, we present an improved ant colony optimization (ACO) method that schedules jobs according to their data size and expected execution time, giving priority to the job with the smallest data size and the shortest response time. An artificial neural network (ANN) predicts the resource usage and running jobs of each data node, and the resource manager monitors job activity and resource usage. Moreover, we enhance name node performance by adding an aggregator node to the default HDFS architecture. The modified architecture involves four entities: the name node, the secondary name node, aggregator nodes, and data nodes. The aggregator nodes assign jobs among the data nodes, so the name node needs to keep track of the aggregator nodes only. We test the overall scheme on Amazon EC2 and S3 and report throughput and CPU response time for different data sizes. The results show that the proposed approach achieves a significant improvement compared with native Hadoop and other approaches.
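The abstract names the scheduling technique but not its details. As a rough illustration only, the sketch below shows one conventional way an ACO search could order a job queue so that small, short jobs tend to run first. Everything specific here is an assumption, not the authors' improved ACO: the cost model (sum of completion times on a single queue), the heuristic η = 1/(data size × expected time), pheromone indexed by queue position, the parameter values (`alpha`, `beta`, `rho`, `q`, ant and iteration counts), and the hypothetical job names.

```python
"""Hypothetical ACO sketch for ordering a Hadoop job queue (illustrative only)."""
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    data_size_gb: float     # input size; unit is an assumption
    expected_time_s: float  # estimated run time; unit is an assumption


def total_response_time(order):
    """Sum of job completion times when jobs run one after another in `order`."""
    elapsed = total = 0.0
    for job in order:
        elapsed += job.expected_time_s
        total += elapsed
    return total


def best_possible_cost(jobs):
    """Brute-force optimum; only feasible for a handful of jobs (sanity check)."""
    return min(total_response_time(p) for p in itertools.permutations(jobs))


def aco_schedule(jobs, n_ants=20, n_iters=50, alpha=1.0, beta=2.0,
                 rho=0.1, q=1.0, seed=0):
    """Return a job order found by a basic ant colony search (assumed variant)."""
    if not jobs:
        return []
    rng = random.Random(seed)
    n = len(jobs)
    # Assumed heuristic: smaller data and shorter expected time -> more attractive.
    eta = [1.0 / (j.data_size_gb * j.expected_time_s) for j in jobs]
    # tau[pos][j]: pheromone for placing job j at queue position pos.
    tau = [[1.0] * n for _ in range(n)]
    best_order, best_cost = None, float("inf")

    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant builds a full order, choosing the next job with
            # probability proportional to tau^alpha * eta^beta.
            remaining = list(range(n))
            order = []
            for pos in range(n):
                weights = [tau[pos][j] ** alpha * eta[j] ** beta for j in remaining]
                pick = rng.choices(remaining, weights=weights, k=1)[0]
                order.append(pick)
                remaining.remove(pick)
            cost = total_response_time([jobs[j] for j in order])
            if cost < best_cost:
                best_order, best_cost = order, cost
        # Evaporate every trail, then reinforce the best order found so far.
        for row in tau:
            for j in range(n):
                row[j] *= 1.0 - rho
        for pos, j in enumerate(best_order):
            tau[pos][j] += q / best_cost
    return [jobs[j] for j in best_order]


if __name__ == "__main__":
    # Hypothetical queue: names, sizes and times are made up for illustration.
    queue = [Job("sort", 4.0, 40.0), Job("grep", 1.0, 10.0),
             Job("wordcount", 2.0, 20.0), Job("join", 5.0, 50.0),
             Job("pagerank", 3.0, 30.0)]
    plan = aco_schedule(queue)
    print([j.name for j in plan], total_response_time(plan),
          best_possible_cost(queue))
```

Running the module prints the chosen order, its cost, and the brute-force optimum for comparison. Note that with a single queue and this cost model, shortest-expected-time-first is already optimal, so a metaheuristic only pays off once a scheduler adds further criteria (for example per-node load or data locality), which this sketch does not model.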

Figs. 1–8 (images not reproduced)

Acknowledgements

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the National Program for Excellence in SW supervised by the IITP (Institute for Information & communications Technology Promotion) (2017-0-00091). This work was also supported by the “Human Resources Program in Energy Technology” of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), with financial resources granted by the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20174030201740).

Author information

Corresponding author

Correspondence to Rayan Alanazi.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Alanazi, R., Alhazmi, F., Chung, H. et al. A Multi-Optimization Technique for Improvement of Hadoop Performance with a Dynamic Job Execution Method Based on Artificial Neural Network. SN COMPUT. SCI. 1, 184 (2020). https://doi.org/10.1007/s42979-020-00182-3

  • DOI: https://doi.org/10.1007/s42979-020-00182-3