
Experimental Study on Performance and Energy Consumption of Hadoop in Cloud Environments

  • Conference paper
Cloud Computing and Services Science (CLOSER 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 740)

Abstract

Big data applications are resource- and energy-intensive. Cloud providers want to make better use of virtualization technologies to meet evolving infrastructure needs and growing demand. Container-based virtualization is increasingly popular in the high-performance computing domain; this work evaluates it in the context of big data and cloud computing. It focuses on Hadoop as a representative big data application and evaluates both performance impact and energy consumption. The objective is to understand the tradeoff between performance and energy efficiency as a function of the virtualization technology. The contributions of this paper are as follows. First, we evaluate container-based virtualization in the cloud, using Hadoop as a big data application. Second, we compare traditional virtualization with the emerging container technology, analyzing the impact of coexisting virtual machines (or containers) on CPU, memory, hard disk throughput, and network bandwidth. Third, we address reducing the deployment cost of big data applications using the cloud. Fourth, we provide the Hadoop community with an in-depth study of resource consumption depending on the deployment environment. Our evaluation shows that: (i) container (Docker) technology improves performance and saves energy compared to traditional virtualization; (ii) the performance of a container-based Hadoop cluster is significantly better than that of a cluster based on traditional virtualization; (iii) the data replication rate influences job completion time; and (iv) coexisting containers (or virtual machines) influence the energy consumption and completion time of applications.

Acknowledgements

This work was sponsored in part by the CYRES GROUP in France and the French National Research Agency under CIFRE grant no. 2012/1403.

Author information

Correspondence to Aymen Jlassi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Jlassi, A., Martineau, P. (2017). Experimental Study on Performance and Energy Consumption of Hadoop in Cloud Environments. In: Helfert, M., Ferguson, D., Méndez Muñoz, V., Cardoso, J. (eds) Cloud Computing and Services Science. CLOSER 2016. Communications in Computer and Information Science, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-62594-2_13

  • DOI: https://doi.org/10.1007/978-3-319-62594-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62593-5

  • Online ISBN: 978-3-319-62594-2

  • eBook Packages: Computer Science (R0)
