A Multi-Optimization Technique for Improvement of Hadoop Performance with a Dynamic Job Execution Method Based on Artificial Neural Network

Abstract

The improvement of Hadoop performance has received considerable attention from researchers in the cloud computing field. Most studies have focused on improving the performance of a Hadoop cluster. Notably, many parameters are required to configure Hadoop and must be adjusted to improve performance. This paper proposes a mechanism to improve Hadoop by scheduling jobs and by allocating and utilizing resources. Specifically, we present an improved ant colony optimization method to schedule jobs according to the job size and the expected execution time. Priority is given to the job with the minimum data size and minimum response time. The resource usage and running jobs of each data node are predicted using an artificial neural network, and job activity and resource usage are monitored using the resource manager. Moreover, we enhance Hadoop name node performance by adding an aggregator node to the default HDFS framework architecture. The changes involve four entities: the name node, secondary name node, aggregator nodes, and data nodes, where the aggregator nodes are responsible for assigning jobs among the data nodes and the name node tracks only the aggregator nodes. We test the overall scheme on Amazon EC2 and S3 and report the throughput and CPU response time for different data sizes. Finally, we show that the proposed approach achieves a significant improvement compared to native Hadoop and other approaches.

Introduction

The process of improving Hadoop performance by enhancing Hadoop distributed file system (HDFS; name node and data nodes) capabilities is a critical task. Hadoop is a collection of open-source utilities developed for writing distributed applications [1] that process structured and unstructured data stored in HDFS. MapReduce was developed to organize large volumes of data using a cluster of commodity hardware. The primary challenge observed in using MapReduce is poor performance in map and reduce tasks, which is negatively affected by the large number of configuration parameters: there are approximately 190 parameters in Hadoop.

Any user who submits a job must adjust these configuration parameters. Most users lack the practical knowledge needed to properly configure that many parameters, which could explain the notable performance degradation. Job scheduling is an important strategy in Hadoop processing systems. A user's job is divided into smaller sub-tasks and distributed among several data blocks. In such cases, the name node maintains a file index system that stores the data block information. Several schemes can be used to schedule tasks, many of which have been proposed to improve the performance of job scheduling [2]. Job schedulers in Hadoop can be classified according to several criteria, including the environment, resources (CPU time, free slots, I/O utilization, and disk space), priority, and time. The main objective of a job scheduler is to reduce overhead and computational resource usage and to increase throughput by allocating jobs to processors and reducing the job completion time. The major types of job schedulers in Hadoop are static schedulers, dynamic schedulers, time-based schedulers, FIFO (first in, first out) schedulers, and delay schedulers [3]. According to a previous study [4], the performance of Apache Spark is better than that of Hadoop: Spark can perform streaming, batch processing, and machine learning tasks on a single cluster and is faster than Hadoop, although this advantage may disappear if the system memory is full. According to other research, Hadoop performs better than Spark in terms of both computational time and cost. Currently, Hadoop can support client tasks with high throughput, high fault tolerance, high flexibility, and high availability. In this paper, the importance of the configuration parameters in Hadoop is analyzed.

Background and Motivation

This section briefly describes Hadoop and the motivations of this paper, which focuses on the improvement of Hadoop performance in dynamic job execution. Hadoop clusters are gaining more popularity daily because of their efficient computational times and high cost-effectiveness. There are two major components of Hadoop: (1) an HDFS and (2) MapReduce. The HDFS provides users with distributed storage access, and MapReduce provides distributed processing [5].

An HDFS contains two parts: the name node (master node) and the data nodes (slave nodes). These components are used for the efficient management of distributed environments and storage. Generally, the Hadoop framework provides users with the ability to create clusters of commodity servers. This framework is based on a write-once, read-many approach to data file processing. In each cluster, the master node is responsible for overall file storage, such as saving data and directing jobs to the respective slave nodes; thus, it can scale to thousands of nodes. Moreover, the master node stores large data files in blocks. Each data block is managed by a different node in the cluster, and a data block can be replicated on multiple nodes [6].

MapReduce then runs tasks on the cluster, which permits data management over the distributed data storage system. MapReduce splits the input dataset into a number of blocks and stores them in the HDFS; each block is 64 or 128 MB in size. A common Hadoop package comprises the necessary Java archive (JAR) scripts and the files required in a large cluster. The MapReduce component uses two functions.

  • Mapper The map operation runs for each block, isolated from other blocks, on the data node where the data are actually stored. In the mapping phase, this operation emits a < key, value > pair for each term, such as < Seoul, 1 > .

  • Reducer The reducer operation combines the results from various mappers and produces a single final output.
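To make the two functions concrete, the following is a minimal word-count sketch written in the style of Hadoop Streaming. It is a hypothetical illustration rather than the workload used in this paper: the mapper emits < term, 1 > pairs such as < Seoul, 1 >, and the reducer sums the counts for each key after the framework's shuffle and sort.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming-style word count (illustrative sketch only).
import sys

def run_mapper():
    """Map phase: emit <term, 1> for every term in each line of the input block."""
    for line in sys.stdin:
        for term in line.strip().split():
            # Hadoop Streaming expects tab-separated key/value pairs on stdout.
            print(f"{term}\t1")

def run_reducer():
    """Reduce phase: combine the sorted mapper output into one count per key."""
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key == current_key:
            count += int(value)
        else:
            if current_key is not None:
                print(f"{current_key}\t{count}")
            current_key, count = key, int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")

if __name__ == "__main__":
    # Run as "mapper.py map" for the map phase, otherwise act as the reducer.
    if len(sys.argv) > 1 and sys.argv[1] == "map":
        run_mapper()
    else:
        run_reducer()
```

With Hadoop Streaming, such scripts would be supplied through the -mapper and -reducer options, and the framework performs the shuffle and sort between the two phases.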

Figure 1 shows the native Hadoop MapReduce paradigm. However, the built-in MapReduce component can lead to poor performance in the mapper and reducer functions for certain tasks [7]. These limitations are described in detail in the following “Related Work” section.

Fig. 1 Native Hadoop MapReduce paradigm

Paper Structure

The remainder of the paper is structured as follows. The “Related Work” section reviews previous work on HDFSs and MapReduce for improving Hadoop performance; its “Problem Statement” subsection summarizes the limitations of that work. The proposed framework and the contributions of this paper are described in the “Proposed Work” section. The experiments involving the proposed Hadoop framework are implemented and discussed in the “Experimental Results” section, where the results of the proposed scheme are also compared with those of previous works. Finally, the conclusions of the study and some future research directions are given in the “Conclusion and Future Work” section.

Related Work

Literature Review

Several studies have focused on improving Hadoop performance [8,9,10]. Aydin et al. [11] developed a Hadoop program for distributed log analysis on a small cloud. The distributed log analysis uses several cluster nodes and splits large log files in an HDFS using the MapReduce programming model, automatically resizing clusters based on the data analysis requirements. The results indicated that the Hadoop-based framework performed as well as the more recent Spark framework for the given tasks. Yao et al. [12] addressed the high response time problem under different workloads. Specifically, the authors designed LsPS, a scheduler that reduces the average job response time by leveraging job size patterns and adapting the scheduling schemes among users. In [13], the authors described the prominent role of machine learning (ML)-based methods and algorithms, with their latest versions, in big data processing and analytics. A genetic algorithm-based job scheduling scheme was proposed in [14] for big data analytics to improve the efficiency of analysis; the job scheduling reduced the time and cost of analysis.

The authors of [15] proposed metaheuristic optimization and ensemble modeling approaches for Hadoop configuration tuning (H-Tune). A non-intrusive performance profiler was used to acquire the runtime details of MapReduce applications with a runtime overhead of less than 2%. An ensemble modeling approach considered the input data size and Hadoop configuration of each application, and a metaheuristic-based configuration optimizer was used to determine the optimal configuration based on the performance of a given application.

A native Hadoop MapReduce model was proposed for applications in data mining, namely, frequent item set mining. Moreover, an improved weighted HashT Apriori algorithm was proposed using the Hadoop MapReduce model. This model also employed a transaction filtering technique to remove infrequent transactions from the dataset because the native MapReduce model requires considerable memory for frequent item set mining [16]. A MapReduce-based Apriori algorithm was also proposed for performance optimization [17]. Mbarka et al. [18] proposed a failure- and dynamically aware framework that can be integrated into Hadoop to schedule tasks and change the scheduling decisions based on information collected from the Hadoop environment. The authors of [19] presented a new concept in MapReduce, a reconfigurable shuffle component added to the default MapReduce pipeline that reduces the required operations. This method provides a flexible operating sequence and identifies the most important factors that affect shuffle-phase performance, including the number of keys, the number of spilled files, and the variance of the intermediate results. The paper therefore focuses on dynamically adjusting the order of operations in the shuffle to improve the performance of MapReduce applications, since the map-side shuffle may suffer from high I/O utilization and require a long execution time. Brahmwar et al. [22] proposed a new scheduling algorithm for the Hadoop cluster called Tolhit. The algorithm is based on effective resource utilization and the identification of the nodes at which jobs are scheduled: according to information from the Hadoop cluster, the best node is selected to schedule the tasks. The experimental results of Tolhit displayed a 27% improvement in Hadoop performance in terms of makespan (the completion time of a task). This algorithm performs better than the Hadoop fair scheduler, but it is only suitable for slow tasks. The authors of [20] presented a Hadoop job scheduler that satisfies given deadlines. User jobs are scheduled in FIFO order, and an optimization criterion was established for job scheduling based on user-specified constraints (deadlines). A deadline constraint scheduler was used that maintains a priority queue of jobs. A two-phase computation scheme (map and reduce) was proposed in which key-value pair tuples are shuffled across different reduce nodes. Task assignment is completed based on a heartbeat interval (every 3 s), which increases the run time of MapReduce tasks. Guo et al. [21] applied an improved job scheduling algorithm to the Hadoop framework in which scheduling is based on Bayes classification. In a job queue, the jobs were classified as good or bad jobs, and a task tracker selected good jobs and then allocated the appropriate resources. At time t, Bayes classification was used to execute the most appropriate jobs selected by the task tracker. However, the Bayes classifier introduces some unnecessary error and computational burden. Gu et al. [23] proposed the SHadoop method, which improves the Hadoop MapReduce framework through task/job execution optimization. The setup and cleanup tasks of a MapReduce job are optimized to reduce the total computational time, and an instant messaging mechanism was proposed to accelerate the scheduling and execution of performance-sensitive tasks. The setup tasks launched from the job tracker to the task tracker consist of state information reports, and the task tracker forwards regular heartbeat messages once the task is completed.
However, high overhead occurs in the Hadoop cluster, which limits the dynamic scheduling of time slots for resource utilization and workload distribution. The authors of [24] proposed the H2Hadoop architecture to improve Hadoop performance for metadata-related jobs. In this work, the name node is responsible for receiving jobs from a client, dividing the jobs into tasks, and assigning the tasks. Hence, the name node directly forwards jobs to a particular data node without knowledge of the entire cluster. Jeon et al. [25] showed the effect of MapReduce parameters on the distributed processing of machine learning programs. Chung and Nah [26] showed how different virtualization methods affect the distributed processing of a massive volume of data in terms of processing performance. The results showed that Docker-based virtual clusters are usually faster than Xen-based virtual clusters; however, in some cases, Xen performs faster than Docker depending on the established parameters, such as the block size and the number of virtual nodes.

Problem Statement

Today, large volumes of data are generated from different sources around the world. With the rapid growth in dataset sizes, it has become difficult to process data in a reasonable amount of time, and vast computational power and resources are required to overcome this issue. Thus, this paper focuses on enhancing Hadoop by improving the abilities of the HDFS. Based on the memory requirements, a metaheuristic algorithm called the genetic simulated annealing algorithm was proposed to adjust the major configuration parameters of Hadoop; the drawback of this optimization method is the high processing time, which is largely due to the slow convergence from the initial guess [9]. A heterogeneous job allocation scheduler with a dynamic grouping integrated neighboring search algorithm was proposed in [27]. This algorithm processes tasks based on (1) job classification (the job type, such as CPU bound or I/O bound), (2) a ratio table (a capability ratio table created for task trackers and data nodes), (3) data block allocation and grouping (grouping with CPU slot numbers), and (4) neighboring searches (CPU task allocation and I/O task allocation). When the dataset size is sufficiently large, the name node processes can lead to failure, or high overhead may be accrued. To overcome these issues of the conventional fair scheduler in Hadoop, the authors of [28] proposed an improved fair scheduling algorithm for clustering user jobs. The advantage of the improved fair scheduling scheme is its efficiency in producing throughput for datasets of variable size; the disadvantages are that long jobs can slow the algorithm and cause overloading at a node. The authors of [29] proposed a data-locality-aware enhanced task scheduling algorithm to improve the job completion time. When an input split consists of multiple data blocks that are distributed and stored on different nodes, conventional data-locality methods fail to cope with the degradation in processing performance caused by the increased frequency of data block copying. To solve this issue, the authors proposed a task scheduling algorithm that defines a method to classify data locality by taking into account the locations of all data blocks that comprise an input split, categorizes tasks based on the defined method, and sequentially assigns tasks according to a given priority.

Motivation of this Paper

  • Most of the existing schedulers, such as LATE and FCFS, provide poor performance, including unbalanced resource utilization, because they neglect the workload of a job, which is the main reason for imbalanced resource allocation. Based on previous research, our motivation in this paper is to improve Hadoop performance by proposing new job scheduling and resource utilization and allocation methods. We use Amazon EC2 nodes in which one node is established as the master node and the others are slave nodes. In this HDFS cluster, each master node is associated with a number of aggregator nodes, and each slave node runs three phases: map, shuffle, and reduce. The primary contributions of this paper are as follows.

  • We propose an improvement to Hadoop to overcome the issues of the native Hadoop. An improved ant colony optimization (ACO) algorithm is proposed for job scheduling based on the job size and execution time.

  • We design an enriched name node by enhancing its capabilities. Three different entities are used in the HDFS: the name node, data nodes, and aggregator nodes. User jobs are first scheduled with the improved ACO algorithm based on the job size and execution time.

  • After job scheduling is complete, resources are allocated for each job using an artificial neural network (ANN), which predicts the current usage of each node.

  • The proposed scheme is evaluated based on the throughput and response time results, and the results of the scheme are compared to those of previous works to validate the proposed method.

Proposed Work

System Overview

An overview of the proposed system is shown in Fig. 2. The following subsections describe each component.

Fig. 2 System overview

User Job Scheduling Based on the Improved Ant Colony Optimization Algorithm

The native job scheduling problem in Hadoop is solved using an improved ACO algorithm. Job scheduling has played a vital role in recent years due to the rapid growth of big data.

The problem of job scheduling can be formulated as follows.

  • Assume that n users assign N jobs that must be processed with M data nodes.

  • The data nodes are denoted as D, D = {D1, D2,…, Dm}.

  • Each user job j (1 ≤ j ≤ N) comprises a set of tasks Tj1, Tj2, …, Tjn.

  • The job schedule for job j of user U requires one data node out of the set of data nodes Dij ∈ D.

The problem is that a job is assigned based on the job size and expected execution time. The major objective in this paper is to assign a sequence of user jobs to data nodes and reduce the overall response time and increase throughput. The assumptions of job scheduling in this paper are as follows:

  • User jobs/tasks are independent of each other;

  • Data nodes vary based on their current usage;

  • More than one job per user is not permitted at time t;

  • There are no constraints among the jobs of different users; and

  • In a given time, t, a name node can execute at most one job.

ACO algorithms are commonly used for solving optimization problems. A set of ants is established to explore the solution space. At each iteration, the ants search for paths (sets of the least loaded nodes) and leave pheromones along their routes. The pheromones along a route deliver a message to ants at adjacent nodes and evaporate with a variable velocity, whereas traditional ACO uses a fixed velocity: when a new ant moves, the pheromone level increases at a constant speed. An ant chooses the next node based on two factors, the available pheromone information and the heuristic information. Thus, the major objective of the ACO algorithm is to find the data node with the smallest load.

For example, suppose there are four users who submit four jobs, and each job is split into a fixed number of tasks. The processing of each job is based on two constraints: the minimal job size and the minimal expected execution time. According to these constraints, jobs are sequentially executed in the available space of the data nodes; in this example, the artificial ants execute three jobs, or a total of 9 tasks, in a sequential manner. Generally, the steps in traditional ACO are based on the computed transition probability, visibility, and pheromone level. The goal is to select the data nodes chosen by the ants based on the pheromone level and visibility. At time t, the probability that ant a selects the path from one point to another is computed as follows:

$$p_{ij}^{a} (t) = \left\{ {\begin{array}{*{20}l} {\frac{{\mu_{ij}^{\beta } (t)\delta_{ij}^{\gamma } (t)}}{{\sum\nolimits_{{S \in P_{a} }} {\mu_{ij}^{\beta } (t)\delta_{ij}^{\gamma } (t)} }}} \hfill &\quad {{\text{if}}\;j \in P_{a} } \hfill \\ 0 \hfill &\quad {\text{otherwise}} \hfill \\ \end{array} } \right.$$
(1)

where \(P_{a}\) represents the set of candidate data nodes, provided by the name node, from which ant a can select, β and γ are the pheromone and expectation factors, respectively, \(\mu_{ij}^{\beta }(t)\) is the pheromone level, and \(\delta_{ij}^{\gamma }(t)\) is a heuristic (visibility) factor. The pheromone level and visibility together determine the probability of an ant selecting a path. In addition, the fitness function f(x) of ant a is used to choose a path and is expressed as follows.

$$f\left( x \right) = w_{1} \times S + w_{2} \times {\text{EET}}$$
(2)

Here, S is the job size and EET is the expected execution time; the user job with the minimum job size and execution time is given first priority, and \(w_{1}\) and \(w_{2}\) represent the weighting constants. The job execution strategy of an ant is shown in Fig. 3.

Fig. 3 Ant execution job strategy based on data nodes

Algorithm 1 below explains the improved ant colony optimization procedure step by step.

Algorithm 1 Algorithm for the improved ant colony optimization
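As an illustration of the selection rule in Eqs. (1) and (2), the following is a minimal sketch of the job-to-node assignment loop. It is not the authors' exact Algorithm 1: the load-based visibility (the inverse of the current node load), the pheromone update rule, and the weight values are illustrative assumptions.

```python
import random

def fitness(job, w1=0.5, w2=0.5):
    """Eq. (2): jobs with smaller size S and expected execution time EET rank first."""
    return w1 * job["size"] + w2 * job["eet"]

def select_node(pheromone, load, beta=1.0, gamma=2.0):
    """Eq. (1): roulette-wheel selection over candidate data nodes,
    weighting pheromone level against a load-based visibility term."""
    weights = [(pheromone[d] ** beta) * ((1.0 / (1.0 + load[d])) ** gamma)
               for d in range(len(load))]
    total = sum(weights)
    r, acc = random.uniform(0, total), 0.0
    for d, w in enumerate(weights):
        acc += w
        if acc >= r:
            return d
    return len(weights) - 1

def schedule(jobs, num_nodes, evaporation=0.1, deposit=1.0):
    pheromone = [1.0] * num_nodes          # initial pheromone per data node
    load = [0.0] * num_nodes               # current (predicted) node usage
    assignment = {}
    for job in sorted(jobs, key=fitness):  # minimum size/EET gets first priority
        node = select_node(pheromone, load)
        assignment[job["id"]] = node
        load[node] += job["size"]
        # evaporate all trails, then reinforce the chosen (least loaded) path
        pheromone = [(1 - evaporation) * p for p in pheromone]
        pheromone[node] += deposit / (1.0 + load[node])
    return assignment

# Hypothetical jobs: size in MB and expected execution time in seconds.
jobs = [{"id": "J1", "size": 256, "eet": 40},
        {"id": "J2", "size": 128, "eet": 20},
        {"id": "J3", "size": 512, "eet": 90}]
print(schedule(jobs, num_nodes=3))
```

In this sketch, the least loaded nodes accumulate more pheromone and therefore attract subsequent jobs, which mirrors the objective stated above of directing each job to the data node with the smallest load.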

Artificial Neural Network for Node Usage Prediction

This section explains how an ANN is used to compute the node load. The proposed ANN is based on the name node and aggregator node information. We consider a set of input variables and compute weights for the input variables to obtain the output variable. This approach can be represented as a set of inputs, an activation function, and an output. The links between the inputs and the hidden layer are called weights; the weights define the connection strengths between neurons, and the output value is produced at the output layer, as shown in Fig. 4.

Fig. 4 Artificial neural network

In the hidden layer, each neuron receives weighted inputs and bias from other neurons in the input layer as follows:

$$Z_{i} = \left( {\sum\limits_{K = 1}^{{N_{j} - 1}} {x_{k}^{j - 1} w_{k,i} - b_{k} } } \right)$$
(3)

where wk,i is the weight value of the connection between node K of the previous layer and node i, \(x_{k}^{j - 1}\) is the input from the Kth node in layer j − 1, Nj−1 is the total number of nodes in layer j − 1, and bk is the bias of the node.

The weighted sum is then passed through the activation function to compute the output of the node as follows.

$$y_{i} = f\left( {Z_{i} } \right)$$
(4)

The sigmoid function can be used for the activation function and is formulated as follows.

$$f\left( {Z_{i} } \right) = \frac{1}{{1 + e^{{ - Z_{i} }} }}$$
(5)

This equation serves to model the relevant nonlinear behaviors.
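To make Eqs. (3)-(5) concrete, the following is a minimal forward-pass sketch. The layer sizes, weight and bias values, and the choice of node-usage features (normalized CPU, memory, and task counts) are illustrative assumptions, not the trained network used in the experiments.

```python
import math

def sigmoid(z):
    """Eq. (5): nonlinear activation."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, weights, biases):
    """Eqs. (3)-(4): weighted sum of the previous layer minus the bias,
    passed through the activation to give each neuron's output y_i."""
    outputs = []
    for i in range(len(biases)):
        z_i = sum(x_k * weights[k][i] for k, x_k in enumerate(inputs)) - biases[i]
        outputs.append(sigmoid(z_i))
    return outputs

# Example: 3 input features -> 2 hidden neurons -> 1 predicted usage value.
x = [0.6, 0.3, 0.8]                        # e.g., normalized CPU, memory, running tasks
w_hidden = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]
b_hidden = [0.1, -0.2]
w_out = [[0.6], [-0.8]]
b_out = [0.05]

hidden = forward(x, w_hidden, b_hidden)
usage = forward(hidden, w_out, b_out)[0]   # predicted node usage in [0, 1]
print(round(usage, 3))
```

In training, the weights would be adjusted from observed node usage; here they are fixed only to show how a usage prediction is produced for one data node.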

The memory requirement of a data node can be calculated as follows. For a data node with a process memory of 4–8 GB, a task tracker memory of 4–8 GB, an OS memory of 4–8 GB, and 4 CPU cores with 4–8 GB of memory per core, the total memory of the data node is (at least) 4 × 4 + 4 + 4 + 4 = 28 GB.
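The same lower bound can be checked with a short calculation, using the minimum values quoted above:

```python
# Minimum memory budget for one data node, using the lower bounds quoted above.
cores, mem_per_core = 4, 4                        # 4 CPU cores, 4 GB per core
process_mem, tasktracker_mem, os_mem = 4, 4, 4    # GB each (lower bounds)

total_gb = cores * mem_per_core + process_mem + tasktracker_mem + os_mem
print(total_gb)   # 4*4 + 4 + 4 + 4 = 28 GB
```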

Experimental Results

This section describes the evaluation of the proposed Improved Hadoop on the Amazon cloud. An AWS account was created to use Amazon EC2 and S3. We then compare the performance evaluation results of our algorithm with those of previous Hadoop improvement algorithms, including DGNS [27]. The experimental results are used to validate the performance of the proposed system relative to that of the native Hadoop configuration.

Experimental Environment

The experimental environment included a separate Improved Hadoop model for each given application. The software description for Improved Hadoop is given in Table 1. Each Hadoop cluster node included a Pentium Dual-Core CPU E5700 @ 3.00 GHz and 2.00 GB of installed memory on a 32-bit operating system. Apache Hadoop release 2.7.2, a stable version, was installed on the Hadoop cluster, which is composed of one name node and multiple data nodes. The name node runs the resource manager for Map-Shuffle-Reduce tasks. Each data node runs the Map-Shuffle-Reduce tasks, and the default block size is 128 MB. We conduct diverse experiments using the default, or native, Hadoop configuration to obtain the parameters for the Improved Hadoop configuration. The system configuration is shown in Tables 1 and 2.

Table 1 System configuration
Table 2 Hadoop configuration parameters

In our Improved Hadoop configuration, the input data block size is set to 128 MB, but the input data size can be varied up to 2 GB. There are 12 Hadoop configuration parameters considered in the experiments, and the selection of these parameters is described in Table 2. In our implementation, the Amazon EC2 Hadoop cluster nodes are as follows:

  • Master node (name node),

  • Secondary master node (secondary name node),

  • Slave nodes (data nodes), and

  • Aggregator nodes.

Configuration of Amazon S3

A standard scheme to store, share, and retrieve large quantities of input data in Amazon Web Services is the Amazon S3 object storage service. In this paper, we recommend that this service be used to keep all the data required to run the Hadoop workload clusters. The features of Amazon S3 are as follows: it can store long-term data; it supports objects up to 5 TB, with many petabytes of data allowed in a single bucket; it is easily writable and readable from Apache Hadoop; and it is available in all the AWS regions where instances are launched, based on the storage options suggested in Amazon S3. S3 provides high durability for input data, is a cheap option for reducing storage redundancy, and offers 99.99% availability over a given year. The performance of the HDFS is high because data storage and processing occur on the same cluster, which improves the processing speed and reduces latency. Table 2 describes the core Hadoop configuration parameters.
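As an example of staging input data in S3 before running a workload, a minimal boto3 sketch is shown below. The bucket and object names are hypothetical, and AWS credentials are assumed to be available through the environment or an IAM role attached to the EC2 instances.

```python
import boto3

# Hypothetical bucket/key names; credentials are assumed to be configured
# in the environment or via an IAM role on the EC2 cluster nodes.
s3 = boto3.client("s3")
bucket, key = "improved-hadoop-input", "terasort/input-2gb.dat"

# Stage the input data set in S3 so every cluster node can read it.
s3.upload_file("input-2gb.dat", bucket, key)

# The object can then be referenced from the Hadoop cluster, for example
# through the s3a:// connector (s3a://improved-hadoop-input/terasort/input-2gb.dat).
print(s3.head_object(Bucket=bucket, Key=key)["ContentLength"], "bytes stored")
```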

Terasort Benchmark Description

TeraSort is used to test our proposed work. It is a very popular benchmark that estimates the time required to sort one terabyte of randomly distributed data on the configured system, and it is commonly used to estimate the MapReduce performance of a Hadoop cluster. TeraSort has been recorded sorting 1 TB of data in 209 s on a Hadoop cluster of 910 nodes. The main goal of TeraSort is to sort 1 TB of data as quickly as possible, and it jointly tests the MapReduce and HDFS layers of a Hadoop cluster; it can also be used to check whether MapReduce slot assignments are sound. A full TeraSort run follows three steps: (1) generate the input data with TeraGen, (2) run the actual TeraSort over the input data, and (3) validate the output data using TeraValidate.
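The three steps can be driven, for instance, with a short script such as the sketch below. The examples-jar path and HDFS directories are assumptions matching a default Hadoop 2.7.2 installation, and the row count (10,000,000 rows of 100 bytes, roughly 1 GB) is scaled down from the full 1 TB benchmark.

```python
import os
import subprocess

# Hedged driver for the three TeraSort phases (TeraGen -> TeraSort -> TeraValidate).
# The jar path and HDFS directories are assumptions for a default Hadoop 2.7.2 setup.
hadoop_home = os.environ["HADOOP_HOME"]
jar = f"{hadoop_home}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar"

def run(*args):
    # Invoke "hadoop jar <examples-jar> <benchmark> <arguments...>" and fail fast on errors.
    subprocess.run(["hadoop", "jar", jar, *args], check=True)

run("teragen", "10000000", "/terasort/input")                 # (1) generate input data
run("terasort", "/terasort/input", "/terasort/output")        # (2) sort the input
run("teravalidate", "/terasort/output", "/terasort/report")   # (3) validate the output
```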

Results and Discussion

In this section, the acceleration of Hadoop and the improvements in Hadoop performance are discussed. In addition, we present designs that take advantage of Map-Shuffle-Reduce and the enhanced name node capabilities. The proposed system results are promising, and various open research issues are addressed. Finally, we show that the proposed approach is fast and scalable compared to previous methods.

Definitions of Performance Measures

  • Throughput is the total number of bytes processed per unit of time. This metric is computed as follows.

    $${\text{Total}}\;{\text{throughput}}\;\left[ {{\text{MB}}/{\text{s}}} \right] = \frac{{{\text{Total}}\;{\text{bytes}}\;{\text{processed}}}}{{{\text{Test}}\;{\text{Execution}}\;{\text{Time}}}}$$
    (6)

To obtain accurate throughput, the number of map slots associated with a given cluster should be determined.

  • The response time is the time required for processing in Hadoop, which involves hashing for mappers and reducers and inputting and outputting HDFS data blocks.

We compute the response time for a given task as follows:

$${\text{Response}}\;{\text{Time}} = \frac{M}{\text{HT}} + \frac{R}{S}$$
(7)
$$= \left( {{\text{BI}} + H + O} \right) + \left( {{\text{Sh}} + {\text{So}} + {\text{BO}}} \right)$$
(8)

where M represents the map time, HT represents the hash time, R represents the reduce time, and S is the sort time. In Eq. (8), BI represents the block input time, H represents hashing, O represents the output time, Sh represents the shuffle time, So represents the sort time, and BO represents the block output time.
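As a concrete illustration of Eqs. (6)-(8), the two measures can be computed from logged task timings as follows; the timing and size values are placeholders, not measurements from the experiments.

```python
# Illustrative computation of Eqs. (6)-(8); the values below are placeholders.

def throughput_mb_s(total_bytes, execution_time_s):
    """Eq. (6): total bytes processed per unit of test execution time (MB/s)."""
    return (total_bytes / 1e6) / execution_time_s

def response_time_s(block_in, hashing, output, shuffle, sort, block_out):
    """Eq. (8): map-side time (BI + H + O) plus reduce-side time (Sh + So + BO)."""
    return (block_in + hashing + output) + (shuffle + sort + block_out)

print(throughput_mb_s(total_bytes=2 * 1024**3, execution_time_s=95.0))  # ~22.6 MB/s
print(response_time_s(3.1, 1.4, 2.0, 4.2, 2.5, 1.8))                    # 15.0 s
```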

Effectiveness of the Proposed Approach Versus Those in Previous Works

In this paper, we first obtain throughput results for the native and improved versions of Hadoop. A study of data representations in Hadoop is presented to optimize data storage and retrieval. In the HDFS, a client job is divided into a number of tasks that are allocated among different data blocks. Through the write-once, read-many option, high-throughput data access is supported. The HDFS works on the data locality principle, in which the computation moves to the data instead of the data to the computation, which reduces network congestion. Owing to the benefits stated above, the overall system throughput is improved by the proposed method. We test the proposed approach against the original Hadoop, and Fig. 5 shows the throughput evaluation results for variable data sizes.

Fig. 5 Throughput results

As shown in Fig. 5, Improved Hadoop displays better performance than Hadoop because the improved ACO is used to schedule client jobs. Next, the proposed approach is compared with state-of-the-art approaches, including DGNS [27] and iShuffle [30].

Figure 6 illustrates the throughput performance of DGNS [27] and Improved Hadoop. The number of tasks is varied from 500 to 2000, and the throughput increased for each task run. DGNS was proposed for a heterogeneous computing environment with almost 10 GB of data and 100 nodes; however, the CPU time and memory requirements of its master node are high compared to those of Improved Hadoop. With iShuffle [30], the Hadoop cluster yielded less throughput due to longer completion times, as shown in Fig. 6 (Table 3).

Fig. 6 Comparison of throughput

Table 3 Throughput for Hadoop versus improved Hadoop
Response time: The HDFS configuration parameters and design factors affect the performance of Hadoop frameworks. Such factors include the block size, the number of data nodes used, and the number of clients (Table 4).

    Table 4 Throughput for Hadoop versus improved Hadoop

The results of user job scheduling and the resource utilization responses are shown in Fig. 7. The results show that native Hadoop has a high response time, whereas Improved Hadoop has a lower response time because user jobs are distributed and executed using Map-Shuffle-Reduce tasks in the Hadoop cluster. Furthermore, the native Hadoop cluster uses default schedulers, including static (FIFO, fair, capacity, delay, and matchmaking), dynamic (resource-aware and deadline-constraint), resource-based (delay, matchmaking, and resource-aware) and time-based (delay and deadline-constraint) schedulers.

Fig. 7 Response time results

Different factors are considered in native Hadoop, and we investigate these factors and compare them with those in Improved Hadoop. Figure 7 shows the response time results when the number of tasks is varied from 1000 to 5000. In [30], the authors evaluated iShuffle operations in a multiuser Hadoop environment and ran multiple jobs; furthermore, they used a modified Hadoop fair scheduler to support two different MapReduce jobs at a time. This reduced the overall job completion time by 30.2%, but small jobs have long wait times, which decreases the overall system performance. In our proposed version of Improved Hadoop, the optimization problem is solved using an improved ACO, and the usage of data nodes is determined and periodically updated for task execution (refer to Fig. 8).

Fig. 8 Comparison of response time

In contrast to iShuffle, new user tasks can be dynamically added to the scheduler without affecting the system performance. In summary, we have presented various ways of enhancing the capabilities of native Hadoop, used aggregator nodes to initiate HDFS jobs, and analyzed the results in terms of throughput and response time.

Conclusion and Future Work

Rising interest in the improvement of Hadoop performance by tuning the relevant configuration parameters is evident in the recent increase in research. In this paper, we have integrated multiple optimization techniques to improve Hadoop performance in a cloud environment. We presented an improved ant colony optimization method to schedule jobs according to the job size and the expected execution time. We also proposed an artificial neural network to predict the number of running tasks and the resource usage of the data nodes. Moreover, we enhanced the Hadoop name node capabilities by adding an aggregator node to the default HDFS framework architecture. The experimental results showed a significant improvement in Hadoop performance in terms of maximized throughput and reduced CPU response time. The results illustrate the improvements offered by the proposed approach and indicate that it performs better than the other approaches. In the future, we will perform additional experiments considering more parameters. We also look forward to investigating the configuration factors that influence the performance and speed of Hadoop.

References

  1. Wang T, Wang J, Nguyen SN, Yang Z, Mi N, Sheng B. Ea2s2: an efficient application-aware storage system for big data processing in heterogeneous clusters. In: 2017 26th international conference on computer communication and networks (ICCCN). IEEE; 2017. p. 1–9.

  2. Subrahmanyam K, Thanekar SA, Bagwan A. Improving Hadoop performance by enhancing name node capabilities. J Soc Technol Environ Sci. 2017;6(2):1–8.

  3. Usama M, Liu M, Chen M. Job schedulers for big data processing in Hadoop environment: testing real-life schedulers using benchmark programs. Digit Commun Netw. 2017;3(4):260–73.

  4. Han S, Choi W, Muwafiq R, Nah Y. Impact of memory size on bigdata processing based on Hadoop and Spark. In: Proceedings of the international conference on research in adaptive and convergent systems. ACM; 2017. p. 275–80.

  5. Nghiem PP, Figueira SM. Towards efficient resource provisioning in MapReduce. J Parallel Distrib Comput. 2016;95:29–41.

  6. Wang K, Yang Y, Qiu X, Gao Z. MOSM: an approach for efficient storing massive small files on Hadoop. In: 2017 IEEE 2nd international conference on big data analysis (ICBDA). IEEE; 2017. p. 397–401.

  7. Kim H-G. Effects of design factors of HDFS on I/O performance. J Comput Sci. 2018;14:304–9.

  8. Nazini H, Sasikala T. Simulating aircraft landing and take off scheduling in distributed framework environment using Hadoop file system. Cluster Comput. 2018;22:1–9.

  9. Luo X, Fu X. Configuration optimization method of Hadoop system performance based on genetic simulated annealing algorithm. Cluster Comput. 2018;22:1–9.

  10. Guo M. Design and realization of bank history data management system based on Hadoop 2.0. Cluster Comput. 2018;22:1–7.

  11. Aydin G, Hallac IR. Distributed log analysis on the cloud using MapReduce. arXiv preprint arXiv:1802.03589. 2018.

  12. Yao Y, Tai J, Sheng B, Mi N. LsPS: a job size-based scheduler for efficient task assignments in Hadoop. IEEE Trans Cloud Comput. 2015;3(4):411–24.

  13. Bhatnagar R. Machine learning and big data processing: a technological perspective and review. In: International conference on advanced machine learning technologies and applications. Springer; 2018. p. 468–78.

  14. Lu Q, Li S, Zhang W, Zhang L. A genetic algorithm-based job scheduling model for big data analytics. EURASIP J Wirel Commun Netw. 2016;2016(1):152.

  15. Hua X, Huang MC, Liu P. Hadoop configuration tuning with ensemble modeling and metaheuristic optimization. IEEE Access. 2018;6:44161–74.

  16. Ba-Alwi FM, Ammar SM. Improved FTWeighted HashT Apriori algorithm for big data using Hadoop MapReduce model. J Adv Math Comput Sci. 2018;27(1):1–11.

  17. Singh S, Garg R, Mishra P. Performance optimization of MapReduce-based Apriori algorithm on Hadoop cluster. Comput Electr Eng. 2018;67:348–64.

  18. Soualhia M, Khomh F, Tahar S. A dynamic and failure-aware task scheduling framework for Hadoop. IEEE Trans Cloud Comput. 2018. https://doi.org/10.1109/TCC.2018.2805812.

  19. Wang J, Qiu M, Guo B, Zong Z. Phase-reconfigurable shuffle optimization for Hadoop MapReduce. IEEE Trans Cloud Comput. 2015. https://doi.org/10.1109/TCC.2015.2459707.

  20. Kc K, Anyanwu K. Scheduling Hadoop jobs to meet deadlines. In: 2010 IEEE second international conference on cloud computing technology and science (CloudCom). IEEE; 2010. p. 388–92.

  21. Guo Y, Wu L, Yu W, Wu B, Wang X. The improved job scheduling algorithm of Hadoop platform. arXiv preprint arXiv:1506.03004. 2015.

  22. Brahmwar M, Kumar M, Sikka G. Tolhit: a scheduling algorithm for Hadoop cluster. Proc Comput Sci. 2016;89:203–8.

  23. Gu R, Yang X, Yan J, Sun Y, Wang B, Yuan C, Huang Y. SHadoop: improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters. J Parallel Distrib Comput. 2014;74(3):2166–79.

  24. Alshammari H, Lee J, Bajwa H. H2Hadoop: improving Hadoop performance using the metadata of related jobs. IEEE Trans Cloud Comput. 2016;6:1031–40.

  25. Jeon S, Chung H, Choi W, Shin H, Chun J, Kim JT, Nah Y. MapReduce tuning to improve distributed machine learning performance. In: 2018 IEEE first international conference on artificial intelligence and knowledge engineering (AIKE). IEEE; 2018. p. 198–200.

  26. Chung H, Nah Y. Performance comparison of distributed processing of large volume of data on top of Xen and Docker-based virtual clusters. In: International conference on database systems for advanced applications. Springer; 2017. p. 103–13.

  27. Chen C-T, Hung L-J, Hsieh S-Y, Buyya R, Zomaya AY. Heterogeneous job allocation scheduler for Hadoop MapReduce using dynamic grouping integrated neighboring search. IEEE Trans Cloud Comput. 2017. https://doi.org/10.1109/TCC.2017.2748586.

  28. Sneha S, Sebastian S. Improved fair scheduling algorithm for Hadoop clustering. Oriental J Comput Sci Technol. 2017;10:194–200.

  29. Choi D, Jeon M, Kim N, Lee B-D. An enhanced data-locality-aware task scheduling algorithm for Hadoop applications. IEEE Syst J. 2017;99:1–12.

  30. Guo Y, Rao J, Cheng D, Zhou X. iShuffle: improving Hadoop performance with shuffle-on-write. IEEE Trans Parallel Distrib Syst. 2017;28(6):1649–62.


Acknowledgements

This research was supported by the MIST (Ministry of Science and ICT), Korea, under the National Program for Excellence in SW supervised by the IITP (Institute for Information & communications Technology Promotion) (2017-0-00091). This work was supported by “Human Resources Program in Energy Technology” of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), granted financial resource from the Ministry of Trade, Industry & Energy, Republic of Korea. (No. 20174030201740).

Author information


Corresponding author

Correspondence to Rayan Alanazi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Cite this article

Alanazi, R., Alhazmi, F., Chung, H. et al. A Multi-Optimization Technique for Improvement of Hadoop Performance with a Dynamic Job Execution Method Based on Artificial Neural Network. SN COMPUT. SCI. 1, 184 (2020). https://doi.org/10.1007/s42979-020-00182-3


Keywords

  • Hadoop performance enhancement
  • Job scheduling
  • Artificial neural network
  • Improved ant colony optimization
  • Map-shuffle-reduce