Analysis of Big Data Platform with OpenStack and Hadoop

  • Conference paper
Advances in Services Computing (APSCC 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10065)

Abstract

In the era of big data, cloud infrastructure must provide strong support for big data workloads. As a distributed computing framework, Hadoop is one of the de facto leading software tools for solving big data problems, while cloud infrastructure has already proven to be a good fit for three-tier architecture applications. In this paper, we construct a Hadoop big data platform on an OpenStack cloud. We design three experimental scenarios, run a set of experiments using the standard Hadoop benchmarks TestDFSIO, TeraSort and PI, and examine the resulting performance. Our experiments reveal that disk read operations on the physical servers can be a bottleneck for TestDFSIO and TeraSort. Allocating VMs more widely across physical servers achieves better performance for the read jobs of TestDFSIO and TeraSort. For the CPU-intensive PI job, the best practice is to centralize the allocation of VMs over physical machines.
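
The two allocation styles contrasted in the abstract (wide versus centralized placement of VMs over physical servers) can be illustrated with a minimal Python sketch. It is not the authors' setup: the function name `place_vms`, the host names, the per-host capacity, and the round-robin and fill-first rules are all hypothetical choices made for illustration, and the abstract does not state how many VMs or servers were used.

```python
from collections import Counter


def place_vms(num_vms, hosts, capacity, strategy):
    """Assign VM indices 0..num_vms-1 to physical hosts.

    Hypothetical helper, not taken from the paper.
    strategy="spread": round-robin over all hosts (wide allocation).
    strategy="pack":   fill one host to capacity before using the next
                       (centralized allocation).
    """
    if num_vms > len(hosts) * capacity:
        raise ValueError("not enough host capacity for the requested VMs")
    if strategy == "spread":
        return [hosts[i % len(hosts)] for i in range(num_vms)]
    if strategy == "pack":
        return [hosts[i // capacity] for i in range(num_vms)]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    hosts = ["host-a", "host-b", "host-c", "host-d"]  # assumed host names
    for strategy in ("spread", "pack"):
        placement = place_vms(4, hosts, capacity=4, strategy=strategy)
        # Count how many VMs share each physical server (and its disks).
        print(strategy, dict(Counter(placement)))
```

Under the spread rule each VM lands on a different server, so read-heavy jobs such as TestDFSIO and TeraSort do not all contend for one server's disks; under the pack rule all four VMs share one server, which is the centralized layout the abstract recommends for the CPU-intensive PI job.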

Acknowledgments

This work is supported by the Shanghai 2016 Innovation Action Project under Grant No. 16DZ1100200 (Data-trade-supporting Big Data Testbed). It is also supported by the National Natural Science Foundation of China (2016–2019) under Grant No. 61572137 (Multiple Clouds based CDN as a Service Key Technology Research), the Shanghai 2015 Innovation Action Project under Grant No. 1551110700 (New Media-oriented Big Data Analysis and Content Delivery Key Technology and Application), and the Fudan-Hitachi Innovative Software Technology Joint Project (Cloud Platform Design for Big Data).

Corresponding author

Correspondence to Zhihui Lu.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Li, X., Lu, Z., Wang, N., Wu, J., Huang, S. (2016). Analysis of Big Data Platform with OpenStack and Hadoop. In: Wang, G., Han, Y., Martínez Pérez, G. (eds) Advances in Services Computing. APSCC 2016. Lecture Notes in Computer Science, vol 10065. Springer, Cham. https://doi.org/10.1007/978-3-319-49178-3_29

  • DOI: https://doi.org/10.1007/978-3-319-49178-3_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49177-6

  • Online ISBN: 978-3-319-49178-3

  • eBook Packages: Computer Science, Computer Science (R0)
