Improving the Map and Shuffle Phases in Hadoop MapReduce

  • Conference paper
Smart Computing and Informatics

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 77)

Abstract

Massive amounts of data must be processed, and analyzing them is becoming a challenging issue for network-centric applications in data management. Advanced tools are required to process and analyze such data sets. As an efficient parallel programming model, MapReduce and its Hadoop implementation are widely employed for large-scale data analysis applications. However, MapReduce still suffers from performance problems, and its shuffle phase is a central element of its logical I/O strategy. The map phase needs better performance because its output is the input to the next phase. Since the map output determines the efficiency of the whole job, the map phase needs intermediate checkpoints that regularly monitor the splits generated by intermediate phases. The MapReduce model is designed so that a job must wait until all map tasks have completed their work, and this wait acts as a barrier to effective resource utilization. This paper implements shuffle as a service component to decrease the overall execution time of jobs, monitors the map phase through skew handling, and increases resource utilization in a cluster.
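To make the idea of monitoring map output for skew concrete, here is a minimal sketch in Python. It is not the paper's implementation: the word-count map function, the CRC32-based partitioner, the helper names (`partition_of`, `map_phase`, `skewed_partitions`), and the 1.5x-mean threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from zlib import crc32


def partition_of(key: str, num_reducers: int) -> int:
    """Deterministic hash partitioner (hypothetical stand-in for a real one)."""
    return crc32(key.encode("utf-8")) % num_reducers


def map_phase(splits, num_reducers):
    """Run a word-count map over each split and bucket output by partition."""
    partitions = defaultdict(list)
    for split in splits:
        for word in split.split():
            partitions[partition_of(word, num_reducers)].append((word, 1))
    return partitions


def skewed_partitions(partitions, num_reducers, threshold=1.5):
    """Return ids of partitions holding more than threshold * mean records.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    sizes = [len(partitions.get(p, [])) for p in range(num_reducers)]
    mean = sum(sizes) / num_reducers
    return [p for p, size in enumerate(sizes) if mean > 0 and size > threshold * mean]


if __name__ == "__main__":
    # One dominant key ("hot") overloads a single reducer partition.
    splits = ["hot " * 100 + "x y z"]
    parts = map_phase(splits, num_reducers=4)
    print("skewed partitions:", skewed_partitions(parts, num_reducers=4))
```

A check like this could run each time a map task reports its output sizes, so an overloaded partition is spotted before the reducers start pulling data. What to do about a flagged partition (splitting it, reassigning keys, and so on) is left open here.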

Author information

Corresponding author

Correspondence to J. V. N. Lakshmi.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lakshmi, J.V.N. (2018). Improving the Map and Shuffle Phases in Hadoop MapReduce. In: Satapathy, S., Bhateja, V., Das, S. (eds) Smart Computing and Informatics. Smart Innovation, Systems and Technologies, vol 77. Springer, Singapore. https://doi.org/10.1007/978-981-10-5544-7_21

  • DOI: https://doi.org/10.1007/978-981-10-5544-7_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-5543-0

  • Online ISBN: 978-981-10-5544-7

  • eBook Packages: Engineering, Engineering (R0)
