Analysis of Network IO Performance in Hadoop Cluster Environments Based on Docker Containers

China Venkanna Varma, P.; Kalyan Chakravarthy, K. V.; Valli Kumari, V.; Viswanadha Raju, S.

doi:10.1007/978-981-10-0451-3_22

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 437)

Abstract

Information technology (IT) generates huge volumes of data (big data) every day. Future business intelligence (BI) can be estimated from past data. Storing, organizing, and processing big data is the current trend. NoSQL (Moniruzzaman and Hossain, Int J Database Theory Appl 6(4), 2013) [1] and MapReduce (Dean and Ghemawat, MapReduce: simplified data processing on large clusters) [2] technologies provide an efficient way to store, organize, and process big data on commodity hardware using newer technologies such as virtualization and Linux containers (LXC) (Sudha et al, Int J Adv Res Comput Sci Softw Eng 4(1), 2014) [3]. Nowadays, data center services rely on virtualization and LXC technologies for better resource utilization. Docker (Anderson, Docker software engineering, 2015) [4]-based containers are lightweight virtual machines (VMs) that are being adopted rapidly for hosting big data applications. Docker containers (or simply containers) run inside an operating system (OS) based on Linux kernel version 2.6.29 and above. Running containers inside a virtual machine is a multi-tenant model for scaling data center services. This leads to higher resource utilization in data centers and better operational margins. As the number of live containers increases, the central processing unit (CPU) context switch latency for each live container increases significantly. This reduces the input and output (IO) throughput of the containers. We observed that network IO throughput is inversely proportional to the number of live containers sharing the same CPU. The scope of this paper is limited to network IO throughput, which creates a bottleneck in big data environments. In this paper, we studied how Docker networks work, the factors that contribute to CPU context switch latency, and how network IO throughput is affected by the number of live Docker containers. We built a Hadoop cluster environment and executed benchmarks such as TestDFSIO-write and TestDFSIO-read against a varying number of live containers. We observed that Hadoop throughput does not scale linearly with an increasing number of live container nodes sharing the same system CPU. Future work can analyze the practical implications of network performance and propose a solution to enhance the performance of Hadoop cluster environments.
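
The inverse-proportional relationship stated in the abstract can be written as T(n) = k / n, where n is the number of live containers sharing a CPU. The following is a minimal sketch of that model only, not the authors' measurement code: the constant `k` and the function names `modeled_throughput` and `relative_drop` are hypothetical, and the abstract reports no numeric values, so any numbers passed in are placeholders.

```python
def modeled_throughput(k: float, live_containers: int) -> float:
    """Throughput under the assumed inverse-proportional model T(n) = k / n.

    k is a hypothetical constant (the throughput with a single live
    container); it is not a value taken from the paper.
    """
    if live_containers < 1:
        raise ValueError("live_containers must be at least 1")
    return k / live_containers


def relative_drop(n_before: int, n_after: int) -> float:
    """Fraction of throughput lost when the container count changes.

    Under T(n) = k / n the constant k cancels, so the result depends
    only on the ratio n_before / n_after.
    """
    t_before = modeled_throughput(1.0, n_before)
    t_after = modeled_throughput(1.0, n_after)
    return 1.0 - t_after / t_before
```

Under this model, doubling the number of live containers halves the throughput regardless of `k`, which is one simple way to check whether a set of benchmark measurements is consistent with the stated relationship.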

References

1. Moniruzzaman, A.B.M., Hossain, S.A.: NoSQL database: new era of databases for big data analytics classification, characteristics and comparison. Int. J. Database Theory Appl. 6(4). http://www.sersc.org/journals/IJDTA/vol6_no4/1.pdf. Accessed Aug 2013
2. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Google, Inc. http://static.googleusercontent.com/media/research.google.com/en/us/archive/mapreduceosdi04.pdf
3. Sudha, M., Harish, G.M., Usha, J.: Performance analysis of Linux containers - an alternative approach to virtual machines. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4(1), Jan 2014. http://www.ijarcsse.com/docs/papers/Volume_4/1_January2014/V4I10330.pdf. Accessed Jan 2014
4. Anderson, C.: Docker software engineering. The IEEE Computer Society, 2015. https://www.computer.org/csdl/mags/so/2015/03/mso2015030102.pdf
5. Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. http://zoo.cs.yale.edu/classes/cs422/2014fa/readings/papers/shvachko10hdfs.pdf
6. Buell, J.: A benchmarking case study of virtualized Hadoop performance on VMware vSphere 5. https://www.vmware.com/files/pdf/techpaper/VMWHadoopPerformancevSphere5.pdf
7. Opencore. http://ferry.opencore.io/en/latest/

Author information

Authors and Affiliations

VistaraIT Inc., Hyderabad, India
P. China Venkanna Varma
Andhra University, Visakhapatnam, India
K. V. Kalyan Chakravarthy
Department of CSSE, Andhra University, Visakhapatnam, India
V. Valli Kumari
Department of CSE, JNTUH, Hyderabad, India
S. Viswanadha Raju

Corresponding author

Correspondence to P. China Venkanna Varma.

Editor information

Editors and Affiliations

Dept of Applied Sci & Eng, Indian Instit of Tech Roorkee, Roorkee, India
Millie Pant
Department of Mathematics, Indian Inst of Tech Roorkee, Roorkee, India
Kusum Deep
Dept of Mathematics, South Asian University New Delhi, New Delhi, India
Jagdish Chand Bansal
Department of Mathematics and Comp Sci, Liverpool Hope University, Liverpool, United Kingdom
Atulya Nagar
Department of Mathematics, National Inst of Tech Silchar, Silchar, Assam, India
Kedar Nath Das

About this paper

Cite this paper

China Venkanna Varma, P., Kalyan Chakravarthy, K.V., Valli Kumari, V., Viswanadha Raju, S. (2016). Analysis of Network IO Performance in Hadoop Cluster Environments Based on Docker Containers. In: Pant, M., Deep, K., Bansal, J., Nagar, A., Das, K. (eds) Proceedings of Fifth International Conference on Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 437. Springer, Singapore. https://doi.org/10.1007/978-981-10-0451-3_22

DOI: https://doi.org/10.1007/978-981-10-0451-3_22
Published: 21 April 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0450-6
Online ISBN: 978-981-10-0451-3
eBook Packages: Engineering, Engineering (R0)
