Abstract
In the era of big data, cloud infrastructure must provide strong support for big data workloads. Hadoop, a distributed computing framework, is one of the de facto leading software tools for solving big data problems, and cloud infrastructure has already proven to be a good fit for three-tier applications. In this paper, we build a Hadoop big data platform on an OpenStack cloud. We design three experimental scenarios, run a set of experiments with the standard Hadoop benchmarks TestDFSIO, TeraSort and PI, and examine the resulting performance. Our experiments reveal that disk reads on the physical servers can be a bottleneck for TestDFSIO and TeraSort. Spreading VMs across more physical servers gives better performance for the read jobs of TestDFSIO and TeraSort. For the CPU-intensive PI job, the best practice is to concentrate the VMs on fewer physical machines.
Acknowledgments
This work is supported by the Shanghai 2016 Innovation Action Project under Grant No. 16DZ1100200 (Data-trade-supporting Big Data Testbed). It is also supported by the 2016–2019 National Natural Science Foundation of China under Grant No. 61572137 (Multiple Clouds based CDN as a Service Key Technology Research), the Shanghai 2015 Innovation Action Project under Grant No. 1551110700 (New media-oriented Big data analysis and content delivery key technology and application), and the Fudan-Hitachi Innovative Software Technology Joint Project (Cloud Platform Design for Big Data).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Li, X., Lu, Z., Wang, N., Wu, J., Huang, S. (2016). Analysis of Big Data Platform with OpenStack and Hadoop. In: Wang, G., Han, Y., Martínez Pérez, G. (eds) Advances in Services Computing. APSCC 2016. Lecture Notes in Computer Science, vol 10065. Springer, Cham. https://doi.org/10.1007/978-3-319-49178-3_29
DOI: https://doi.org/10.1007/978-3-319-49178-3_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49177-6
Online ISBN: 978-3-319-49178-3
eBook Packages: Computer Science, Computer Science (R0)