
Experimental Study on Performance and Energy Consumption of Hadoop in Cloud Environments

  • Aymen Jlassi
  • Patrick Martineau
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 740)

Abstract

Big data applications are resource- and energy-intensive. To meet evolving infrastructure needs alongside growing demand, cloud providers seek to make better use of virtualization technologies. Container-based virtualization is increasingly popular in the high-performance computing domain; this work evaluates it in the context of big data and cloud computing. Focusing on Hadoop as a big data application, it evaluates the impact on performance and energy consumption. The objective is to understand the tradeoff between performance and energy efficiency depending on the virtualization technology. The contributions of this paper are as follows. Firstly, an evaluation of container-based virtualization in the cloud using Hadoop as a big data application. Secondly, a comparison of traditional virtualization with the emerging container technology; we analyze the impact of coexisting virtual machines (or containers) on CPU, memory, hard disk throughput and network bandwidth. Thirdly, a reduction in the cost of deploying big data applications by using the cloud. Fourthly, an in-depth study, for the Hadoop community, of resource consumption depending on the deployment environment. Our evaluation shows that: (i) container (Docker) technology improves performance and saves energy compared to traditional virtualization; (ii) the performance of a container-based Hadoop cluster is significantly better than that of a cluster based on traditional virtualization; (iii) the data replication rate influences job completion time; (iv) coexisting containers (or virtual machines) influence the energy consumption and the completion time of applications.

Keywords

Cloud computing, Virtualization, Hadoop, MapReduce, Power consumption, Performance

Notes

Acknowledgements

This work was sponsored in part by the CYRES GROUP in France and the French National Research Agency under grant CIFRE n° 2012/1403.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Université François-Rabelais de Tours, CNRS, LI EA 6300, OC ERL CNRS 6305, Tours, France
