Abstract
Network-centric data-management applications must process massive amounts of data, and analyzing that data is becoming increasingly challenging. Processing such data sets for analysis requires advanced tools. MapReduce, an efficient parallel programming model, and Hadoop, its open-source implementation, are widely used for large-scale data analysis. However, MapReduce still suffers from performance problems. Its shuffle phase, which moves intermediate data from the map phase to the reduce phase, is a central element of its I/O strategy. The map phase in particular needs better performance, because its output is the input to the next phase. Since the map phase largely determines overall efficiency, it needs intermediate checkpoints that regularly monitor all the splits generated during processing. In addition, the MapReduce model is designed so that the job must wait until every map task has completed before the next phase proceeds, which acts as a barrier to effective resource utilization. This paper implements shuffle as a service component to reduce the overall execution time of jobs, monitor the map phase through skew handling, and increase resource utilization in the cluster.
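The abstract describes a pipeline where map output feeds the shuffle, the reduce phase waits for all maps, and data skew is monitored. The Python sketch below illustrates those ideas in miniature. It rests on assumptions that do not come from the paper: a word-count job, a hypothetical `ShuffleService` class that shuffles each map task's output as soon as that task finishes, CRC-32 hash partitioning (in the spirit of Hadoop's default hash partitioner), and a simple skew rule that flags any reducer partition holding more than 1.5 times the average number of records. It is not the author's implementation.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

KV = Tuple[str, int]  # one intermediate (key, value) record


def word_count_map(split: str) -> List[KV]:
    """Map task: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in split.split()]


def sum_reduce(key: str, values: List[int]) -> int:
    """Reduce task: add up all counts collected for one key."""
    return sum(values)


class ShuffleService:
    """Hypothetical stand-alone shuffle component (illustrative only):
    it takes each map task's output as soon as that task finishes,
    hash-partitions the records, and groups values by key per reducer."""

    def __init__(self, num_reducers: int) -> None:
        self.num_reducers = num_reducers
        # One key -> [values] table per reducer partition.
        self.partitions: List[Dict[str, List[int]]] = [
            defaultdict(list) for _ in range(num_reducers)
        ]
        # Records routed to each partition, used for skew monitoring.
        self.records_per_partition = [0] * num_reducers

    def partition_of(self, key: str) -> int:
        # CRC-32 is deterministic across runs, unlike Python's salted str hash.
        return zlib.crc32(key.encode("utf-8")) % self.num_reducers

    def accept(self, map_output: Iterable[KV]) -> None:
        """Shuffle one finished map task's output immediately."""
        for key, value in map_output:
            p = self.partition_of(key)
            self.partitions[p][key].append(value)
            self.records_per_partition[p] += 1

    def skewed_partitions(self, threshold: float = 1.5) -> List[int]:
        """Indices of partitions holding more than threshold times the mean load."""
        total = sum(self.records_per_partition)
        if total == 0:
            return []
        mean = total / self.num_reducers
        return [p for p, n in enumerate(self.records_per_partition)
                if n > threshold * mean]


def run_job(splits: List[str], num_reducers: int = 2
            ) -> Tuple[Dict[str, int], List[int], List[int]]:
    shuffle = ShuffleService(num_reducers)
    for split in splits:
        # Each map output goes to the shuffle service as soon as it is produced.
        shuffle.accept(word_count_map(split))
    # Barrier: reduction starts only after every map output has been shuffled.
    skewed = shuffle.skewed_partitions()
    results: Dict[str, int] = {}
    for part in shuffle.partitions:
        for key, values in part.items():
            results[key] = sum_reduce(key, values)
    return results, skewed, shuffle.records_per_partition


if __name__ == "__main__":
    counts, skewed, loads = run_job(["a b a", "b c a", "a a d"])
    print(sorted(counts.items()))  # [('a', 5), ('b', 2), ('c', 1), ('d', 1)]
    print("records per reducer:", loads, "skewed:", skewed)
```

Which partitions, if any, are flagged as skewed depends on how the hash assigns keys to reducers. Handing each map output to the shuffle as soon as it is produced is what lets data movement overlap with still-running map tasks on a real cluster; in this sequential sketch that overlap is only illustrated, not measured.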
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lakshmi, J.V.N. (2018). Improving the Map and Shuffle Phases in Hadoop MapReduce. In: Satapathy, S., Bhateja, V., Das, S. (eds) Smart Computing and Informatics. Smart Innovation, Systems and Technologies, vol 77. Springer, Singapore. https://doi.org/10.1007/978-981-10-5544-7_21
DOI: https://doi.org/10.1007/978-981-10-5544-7_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5543-0
Online ISBN: 978-981-10-5544-7
eBook Packages: Engineering, Engineering (R0)