Analysis of Big Data Platform with OpenStack and Hadoop

Li, Xiaoyan; Lu, Zhihui; Wang, Nini; Wu, Jie; Huang, Shalin

doi:10.1007/978-3-319-49178-3_29

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10065)

Included in the following conference series:

Asia-Pacific Services Computing Conference

Abstract

In the era of big data, cloud infrastructure needs to provide strong support for big data workloads. As a distributed computational framework, Hadoop is one of the de facto leading software tools for solving big data problems. Cloud infrastructure has already proven to be a good fit for three-tier architecture applications. In this paper, we construct a Hadoop big data platform on an OpenStack cloud. We design three experimental scenarios, run a set of experiments using the standard Hadoop benchmarks TestDFSIO, TeraSort, and PI, and examine the resulting performance. Our experiments reveal that disk read operations on the physical servers can be a bottleneck for TestDFSIO and TeraSort. Allocating VMs more widely across physical servers achieves better performance for the read jobs of TestDFSIO and TeraSort. For the CPU-intensive PI job, the best practice is to centralize the allocation of VMs across physical machines.
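
The two VM allocation styles contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' method and does not model the OpenStack scheduler: the function names, host names, and the per-host capacity are hypothetical, and "centralized" is read here as filling one physical host before using the next.

```python
from collections import Counter


def spread_placement(num_vms: int, hosts: list[str]) -> list[str]:
    """Wide allocation: assign VMs to hosts in round-robin order."""
    return [hosts[i % len(hosts)] for i in range(num_vms)]


def packed_placement(num_vms: int, hosts: list[str], capacity: int) -> list[str]:
    """Centralized allocation: fill a host up to `capacity` before using the next."""
    if num_vms > capacity * len(hosts):
        raise ValueError("not enough capacity for the requested VMs")
    return [hosts[i // capacity] for i in range(num_vms)]


# Hypothetical physical servers; the paper's actual topology is not given here.
hosts = ["host-a", "host-b", "host-c"]

spread = Counter(spread_placement(6, hosts))
packed = Counter(packed_placement(6, hosts, capacity=6))

print(dict(spread))  # VMs per host under wide allocation
print(dict(packed))  # VMs per host under centralized allocation
```

Under wide allocation each VM's disk reads land on a different physical server where possible, which matches the abstract's observation about read-heavy jobs; under centralized allocation the VMs share as few hosts as the assumed capacity allows.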

Acknowledgments

This work is supported by the Shanghai 2016 Innovation Action Project under Grant No. 16DZ1100200 (Data-trade-supporting Big Data Testbed). It is also supported by the 2016–2019 National Natural Science Foundation of China under Grant No. 61572137 (Multiple Clouds based CDN as a Service Key Technology Research), the Shanghai 2015 Innovation Action Project under Grant No. 1551110700 (New Media-oriented Big Data Analysis and Content Delivery Key Technology and Application), and the Fudan-Hitachi Innovative Software Technology Joint Project (Cloud Platform Design for Big Data).

Author information

Authors and Affiliations

School of Computer Science, Fudan University, Shanghai, 200433, China
Xiaoyan Li & Zhihui Lu
Engineering Research Center of Cyber Security, Auditing and Monitoring, Ministry of Education, Shanghai, 200433, China
Nini Wang & Jie Wu
Wangsu Science & Technology Co., Ltd., Shanghai, 200433, China
Shalin Huang

Corresponding author

Correspondence to Zhihui Lu.

Editor information

Editors and Affiliations

Guangzhou University, Guangzhou, China
Guojun Wang
North China University of Technology, Beijing, China
Yanbo Han
University of Murcia, Murcia, Spain
Gregorio Martínez Pérez

About this paper

Cite this paper

Li, X., Lu, Z., Wang, N., Wu, J., Huang, S. (2016). Analysis of Big Data Platform with OpenStack and Hadoop. In: Wang, G., Han, Y., Martínez Pérez, G. (eds) Advances in Services Computing. APSCC 2016. Lecture Notes in Computer Science, vol 10065. Springer, Cham. https://doi.org/10.1007/978-3-319-49178-3_29

DOI: https://doi.org/10.1007/978-3-319-49178-3_29
Published: 10 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49177-6
Online ISBN: 978-3-319-49178-3
eBook Packages: Computer Science, Computer Science (R0)
