Abstract
It has recently been observed that Yahoo, Facebook, mobile devices, sensors, scientific instruments, etc., are generating huge amounts of data, which is challenging to store, manage, process, and analyze. Apache Hadoop Yarn is a framework that provides a solution for such big data. In this paper, we evaluate the performance of the Apache Hadoop Yarn MapReduce jobs Pi, TeraGen, TeraSort, and Wordcount on a single-node cluster. Based on this evaluation, the jobs are classified by CPU utilization (%) into classes such as low CPU-intensive jobs and high CPU-intensive jobs. Guided by this classification, the jobs are then executed in a multi-cluster environment and their performance is evaluated again. When the number of nodes increases, execution time increases for the low CPU-intensive job and decreases for the high CPU-intensive job, while total CPU time decreases for both. In addition, CPU utilization decreases for the low CPU-intensive job and increases for the high CPU-intensive job.
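The abstract does not state how CPU utilization (%) is computed or where the boundary between the job classes lies. The following Python sketch illustrates the classification step under two assumptions: utilization is taken as a job's total CPU time divided by the core-seconds available to it (elapsed time × cores), and a single hypothetical 50% threshold separates low from high CPU-intensive jobs. The names `JobRun`, `cpu_utilization`, `classify`, and `CPU_THRESHOLD_PERCENT` are illustrative only and are not part of Hadoop or the authors' tooling.

```python
from dataclasses import dataclass

# Hypothetical cut-off (percent) between the two classes; the abstract
# does not give the threshold the authors used.
CPU_THRESHOLD_PERCENT = 50.0


@dataclass
class JobRun:
    name: str
    total_cpu_time_s: float  # summed CPU time reported for the job (seconds)
    elapsed_time_s: float    # wall-clock execution time (seconds)
    cores: int               # CPU cores available to the job


def cpu_utilization(run: JobRun) -> float:
    """Assumed definition: CPU time as a percentage of available core-seconds."""
    capacity = run.elapsed_time_s * run.cores
    if capacity <= 0:
        raise ValueError("elapsed time and core count must be positive")
    return 100.0 * run.total_cpu_time_s / capacity


def classify(run: JobRun, threshold: float = CPU_THRESHOLD_PERCENT) -> str:
    """Label a job by comparing its CPU utilization against the threshold."""
    if cpu_utilization(run) >= threshold:
        return "high CPU intensive"
    return "low CPU intensive"


if __name__ == "__main__":
    # Illustrative inputs only, not measurements from the paper.
    runs = [
        JobRun("job-a", total_cpu_time_s=30.0, elapsed_time_s=60.0, cores=4),
        JobRun("job-b", total_cpu_time_s=200.0, elapsed_time_s=60.0, cores=4),
    ]
    for run in runs:
        print(f"{run.name}: {cpu_utilization(run):.1f}% -> {classify(run)}")
```

With these made-up inputs, job-a uses 12.5% of its core-seconds and is labeled low CPU intensive, while job-b uses about 83.3% and is labeled high CPU intensive. A single threshold yields exactly two classes; finer classes would need additional cut-offs.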
Copyright information
© 2016 Springer Science+Business Media Singapore
About this paper
Cite this paper
Mathiya, B.J., Desai, V.L. (2016). Apache Hadoop Yarn MapReduce Job Classification Based on CPU Utilization and Performance Evaluation on Multi-cluster Heterogeneous Environment. In: Satapathy, S., Joshi, A., Modi, N., Pathak, N. (eds) Proceedings of International Conference on ICT for Sustainable Development. Advances in Intelligent Systems and Computing, vol 408. Springer, Singapore. https://doi.org/10.1007/978-981-10-0129-1_4
DOI: https://doi.org/10.1007/978-981-10-0129-1_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0127-7
Online ISBN: 978-981-10-0129-1
eBook Packages: Engineering, Engineering (R0)