Improving the Map and Shuffle Phases in Hadoop MapReduce

Lakshmi, J. V. N.

doi:10.1007/978-981-10-5544-7_21

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 77)

Abstract

Network-centric applications must process massive amounts of data, and analyzing that data is becoming a challenge for data management. Processing and analyzing such data sets requires advanced tools. MapReduce, a parallel programming model, and its Hadoop implementation are widely used for large-scale data analysis. However, MapReduce still suffers from performance problems. Its shuffle phase is a central element of its logical I/O strategy, and the output of the map phase is the input to the next phase, so map-phase performance largely determines the efficiency of the whole job. The map phase therefore needs intermediate checkpoints that regularly monitor the splits generated by intermediate phases. In addition, the MapReduce model must wait until all map tasks have completed before proceeding, and this barrier limits effective resource utilization. This paper implements shuffle as a service component to reduce the overall execution time of jobs, monitors the map phase through skew handling, and increases resource utilization in a cluster.
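
The barrier and skew effects described in the abstract can be illustrated with a small sketch. This is not the chapter's implementation: the function names, the sample durations, and the "2x median" skew threshold are all assumptions made for illustration only. It shows how one slow map task delays every downstream phase, and how oversized splits might be flagged by a simple monitoring check.

```python
from statistics import median


def barrier_time(map_durations):
    """With a strict barrier, the next phase starts only when the slowest map ends."""
    return max(map_durations)


def idle_time(map_durations):
    """Total time that finished map slots spend waiting at the barrier."""
    barrier = barrier_time(map_durations)
    return sum(barrier - d for d in map_durations)


def skewed_splits(split_sizes, factor=2.0):
    """Indices of splits larger than factor * median size (hypothetical threshold)."""
    threshold = factor * median(split_sizes)
    return [i for i, size in enumerate(split_sizes) if size > threshold]


if __name__ == "__main__":
    durations = [10, 12, 11, 40]   # seconds per map task (made-up values)
    sizes = [64, 64, 60, 200]      # MB per split (made-up values)
    print("barrier at:", barrier_time(durations))
    print("idle slot-seconds:", idle_time(durations))
    print("skewed split indices:", skewed_splits(sizes))
```

In this toy input, three of the four map slots sit idle for most of the run because of a single straggler, which is the kind of wasted capacity that skew handling and a decoupled shuffle are meant to reduce.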

Author information

Authors and Affiliations

AIMS Institutes of Higher Education, Peenya, Bengaluru, Karnataka, India
J. V. N. Lakshmi

Authors

J. V. N. Lakshmi

Corresponding author

Correspondence to J. V. N. Lakshmi.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, PVP Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India
Suresh Chandra Satapathy
Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges, Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Electronics and Communication Sciences Unit, Indian Statistical Institute, Kolkata, West Bengal, India
Swagatam Das

About this paper

Cite this paper

Lakshmi, J.V.N. (2018). Improving the Map and Shuffle Phases in Hadoop MapReduce. In: Satapathy, S., Bhateja, V., Das, S. (eds) Smart Computing and Informatics. Smart Innovation, Systems and Technologies, vol 77. Springer, Singapore. https://doi.org/10.1007/978-981-10-5544-7_21

DOI: https://doi.org/10.1007/978-981-10-5544-7_21
Published: 21 December 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5543-0
Online ISBN: 978-981-10-5544-7
eBook Packages: Engineering, Engineering (R0)
