Apache Hadoop Yarn MapReduce Job Classification Based on CPU Utilization and Performance Evaluation on Multi-cluster Heterogeneous Environment

  • Conference paper
Proceedings of International Conference on ICT for Sustainable Development

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 408))

Abstract

Sources such as Yahoo, Facebook, mobile devices, sensors, and scientific instruments now generate huge amounts of data, and it is a challenge to store, manage, process, and analyze this data. Apache Hadoop Yarn is a framework that provides a solution for big data. In this paper, we evaluate the performance of Apache Hadoop Yarn MapReduce jobs such as Pi, TeraGen, TeraSort, and Wordcount on a single cluster node. After this evaluation, the jobs are classified by CPU utilization (%) into classes such as low CPU intensive and high CPU intensive. Based on this classification, the jobs are then executed on a multi-cluster environment and their performance is evaluated. We find that execution time increases for low CPU intensive jobs and decreases for high CPU intensive jobs, while total CPU time decreases for both classes. In addition, when the number of nodes is increased, CPU utilization decreases for low CPU intensive jobs and increases for high CPU intensive jobs.
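
The classification step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how CPU utilization was measured or what cutoff separates the classes, so the utilization formula (CPU time divided by available core-time), the 50% threshold, and the names `JobRun`, `cpu_utilization`, and `classify` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoff in percent; the abstract does not give the value used.
HIGH_CPU_THRESHOLD = 50.0


@dataclass
class JobRun:
    """One measured run of a MapReduce job (field names are hypothetical)."""
    name: str
    cpu_time_s: float  # total CPU time consumed by the job, in seconds
    elapsed_s: float   # wall-clock execution time, in seconds
    cores: int         # number of cores available to the job


def cpu_utilization(run: JobRun) -> float:
    """Return the percentage of available core-time the job used.

    This definition is an assumption; the paper may measure utilization
    differently (for example, with an OS-level monitoring tool).
    """
    return 100.0 * run.cpu_time_s / (run.elapsed_s * run.cores)


def classify(run: JobRun, threshold: float = HIGH_CPU_THRESHOLD) -> str:
    """Label a run as low or high CPU intensive by comparing to the cutoff."""
    if cpu_utilization(run) >= threshold:
        return "high CPU intensive"
    return "low CPU intensive"
```

With this sketch, a run that consumes 80 s of CPU time over 100 s on one core has 80% utilization and is labeled high CPU intensive, while a run that consumes 20 s over 100 s on two cores has 10% utilization and is labeled low CPU intensive. These numbers are made up to show the arithmetic and are not results from the paper.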

Author information

Corresponding author

Correspondence to Bhavin J. Mathiya.

Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper

Mathiya, B.J., Desai, V.L. (2016). Apache Hadoop Yarn MapReduce Job Classification Based on CPU Utilization and Performance Evaluation on Multi-cluster Heterogeneous Environment. In: Satapathy, S., Joshi, A., Modi, N., Pathak, N. (eds) Proceedings of International Conference on ICT for Sustainable Development. Advances in Intelligent Systems and Computing, vol 408. Springer, Singapore. https://doi.org/10.1007/978-981-10-0129-1_4

  • DOI: https://doi.org/10.1007/978-981-10-0129-1_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0127-7

  • Online ISBN: 978-981-10-0129-1

  • eBook Packages: Engineering, Engineering (R0)
