MapReduce for Cloud Computing

Abstract

Cloud computing and Big Data have attracted serious attention from both researchers and the general public. For Cloud computing and Big Data, MapReduce is one of the most widely used scheduling models: it automatically divides a job into a large number of fine-grained tasks, distributes the tasks to the computational servers, and aggregates the partial results of all the tasks into the final result. It naturally fits the requirement of processing large amounts of data in parallel. However, in heterogeneous environments, where servers have different computational capabilities, the performance of MapReduce is often seriously degraded by a few straggler tasks that run far slower than the others. In this chapter, we therefore discuss ways to improve the performance of MapReduce in heterogeneous environments. Specifically, we propose a Self-Adaptive MapReduce (SAMR) scheduling policy that can precisely identify straggler tasks and boost their execution. Experiments on a real heterogeneous cluster show that the proposed technique can significantly improve the performance of MapReduce applications without any program modification.
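
The three steps the abstract attributes to MapReduce (divide a job into fine-grained tasks, distribute them, aggregate the partial results) can be illustrated with a toy, single-machine word count. This is a minimal sketch, not Hadoop's API: the function names `split_job`, `map_task`, `shuffle`, `reduce_task`, and `run_job` are hypothetical, and a thread pool merely stands in for the cluster's computational servers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def split_job(lines: List[str], num_tasks: int) -> List[List[str]]:
    """Divide the input into roughly equal chunks, one per map task."""
    size = max(1, -(-len(lines) // num_tasks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_task(chunk: List[str]) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in the chunk."""
    return [(word, 1) for line in chunk for word in line.split()]


def shuffle(partials: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group intermediate values by key so each key reaches one reduce call."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in partials:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate the partial counts for one key."""
    return key, sum(values)


def run_job(lines: List[str], num_tasks: int = 4, workers: int = 4) -> Dict[str, int]:
    chunks = split_job(lines, num_tasks)
    # The thread pool plays the role of the servers that run map tasks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_task, chunks))
    grouped = shuffle(partials)
    return dict(reduce_task(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    text = ["the quick brown fox", "jumps over the lazy dog", "the end"]
    print(run_job(text, num_tasks=2))
```

With the three sample lines, the count for "the" is 3. In a real deployment the framework, not user code, performs the splitting, task placement, and shuffling; the user supplies only the map and reduce functions.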

Parts of this chapter, including Figures 7.1 and 7.4, have been published in The Journal of Supercomputing and are reprinted from Ref. [9] with permission from Springer.
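
The abstract does not spell out SAMR's exact rules, so the following is only an illustrative sketch of how a scheduler might flag straggler tasks in a heterogeneous cluster, in the spirit of LATE [19] and history-based tuning [9]. It assumes each task reports its current stage and the fraction of that stage completed. The stage weights, the `slow_task_cap` threshold, the `max_backups` budget, and all function names are hypothetical placeholders, not values or interfaces from the chapter.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stage weights (fraction of a task's run time spent in each stage).
# A history-based scheduler could learn these per node from completed tasks;
# these numbers are placeholders, not values from the chapter.
DEFAULT_WEIGHTS: Dict[str, List[float]] = {
    "map": [0.8, 0.2],          # map function, then sort/merge
    "reduce": [0.3, 0.3, 0.4],  # copy, sort, reduce function
}


@dataclass
class TaskStatus:
    task_id: str
    kind: str              # "map" or "reduce"
    stage: int             # index of the stage currently running
    stage_progress: float  # fraction of the current stage completed, 0..1
    elapsed: float         # seconds since the task started


def progress_score(t: TaskStatus, weights: Dict[str, List[float]]) -> float:
    """Weighted progress: finished stages count fully, the current one partially."""
    w = weights[t.kind]
    return sum(w[: t.stage]) + w[t.stage] * t.stage_progress


def progress_rate(t: TaskStatus, weights: Dict[str, List[float]]) -> float:
    """Average progress per second since the task started."""
    return progress_score(t, weights) / t.elapsed if t.elapsed > 0 else 0.0


def time_to_end(t: TaskStatus, weights: Dict[str, List[float]]) -> float:
    """Estimated remaining time, assuming the task keeps its current rate."""
    rate = progress_rate(t, weights)
    if rate <= 0:
        return float("inf")
    return (1.0 - progress_score(t, weights)) / rate


def find_stragglers(tasks: List[TaskStatus],
                    weights: Dict[str, List[float]] = DEFAULT_WEIGHTS,
                    slow_task_cap: float = 0.3,
                    max_backups: int = 2) -> List[str]:
    """Return IDs of slow tasks to back up, longest estimated time-to-end first.

    Assumed rule: a task is slow if its progress rate is below
    (1 - slow_task_cap) times the average rate of running tasks of its kind.
    """
    slow: List[TaskStatus] = []
    for kind in ("map", "reduce"):
        group = [t for t in tasks if t.kind == kind]
        if not group:
            continue
        avg = sum(progress_rate(t, weights) for t in group) / len(group)
        slow += [t for t in group if progress_rate(t, weights) < (1 - slow_task_cap) * avg]
    slow.sort(key=lambda t: time_to_end(t, weights), reverse=True)
    return [t.task_id for t in slow[:max_backups]]


if __name__ == "__main__":
    running = [
        TaskStatus("m1", "map", 1, 0.5, 10.0),
        TaskStatus("m2", "map", 1, 0.0, 10.0),
        TaskStatus("m3", "map", 0, 0.25, 10.0),  # lagging map task
        TaskStatus("r1", "reduce", 2, 0.5, 20.0),
        TaskStatus("r2", "reduce", 0, 0.5, 20.0),  # lagging reduce task
    ]
    print(find_stragglers(running))
```

Ranking candidates by estimated time to end, rather than by raw progress, follows LATE's reasoning: backing up the task expected to finish last shortens the job the most. Using per-stage weights instead of assuming every stage takes equal time is what lets the estimate adapt when, for example, the copy stage of a reduce task dominates on a slow node.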

Notes

  1. A MapReduce scheduler is a scheduler that schedules map and reduce tasks.

References

  1. A. Aboulnaga, Z. Wang, and Z.Y. Zhang. Packing the most onto your cloud. In Proceedings of the First International Workshop on Cloud Data Management, pages 25–28. ACM, 2009.

  2. F. Ahmad, S.T. Chakradhar, A. Raghunathan, and T.N. Vijaykumar. Tarazu: Optimizing MapReduce on heterogeneous clusters. In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVII), pages 61–74, New York, NY, USA, 2012. ACM.

  3. L.A. Barroso, J. Dean, and U. Hölzle. Web search for a planet: The Google cluster architecture. IEEE Micro, 23(2):22–28, 2003.

  4. H.S. Bhosale and D.P. Gadekar. Big data processing using Hadoop: Survey on scheduling. International Journal of Science and Research (IJSR), 3(10):272–277, 2014.

  5. R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, and I. Brandic. Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6):599–616, 2009.

  6. F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R.E. Gruber. Bigtable: A distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2006), 2006.

  7. J. Chauhan, D. Makaroff, and W. Grassmann. The impact of capacity scheduler configuration settings on MapReduce jobs. In 2012 Second International Conference on Cloud and Green Computing (CGC), pages 667–674. IEEE, 2012.

  8. R. Chen, H. Chen, and B. Zang. Tiled-MapReduce: Optimizing resource usages of data-parallel applications on multicore with tiling. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, pages 523–534. ACM, 2010.

  9. Q. Chen, M. Guo, Q. Deng, L. Zheng, S. Guo, and Y. Shen. HAT: history-based auto-tuning MapReduce in heterogeneous environments. The Journal of Supercomputing, 64(3):1038–1054, 2013.

  10. C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. In Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture (HPCA 2007), pages 13–24, Washington, DC, USA, 2007. IEEE Computer Society.

  11. M. De Kruijf and K. Sankaralingam. MapReduce for the cell broadband engine architecture. IBM Journal of Research and Development, 53(5):10, 2010.

  12. J. Dean and S. Ghemawat. MapReduce: a flexible data processing tool. Communications of the ACM, 53(1):72–77, 2010.

  13. P. Elespuru, S. Shakya, and S. Mishra. MapReduce system over heterogeneous mobile devices. In Software Technologies for Embedded and Ubiquitous Systems, pages 168–179, 2009.

  14. W. Fang, B. He, Q. Luo, and N.K. Govindaraju. Mars: Accelerating MapReduce with Graphics Processors. IEEE Transactions on Parallel and Distributed Systems, 2010.

  15. M.J. Fischer, X. Su, and Y. Yin. Assigning tasks for efficiency in Hadoop. In Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, pages 30–39. ACM, 2010.

  16. Hadoop. Hadoop home page. http://hadoop.apache.org/, 2011.

  17. J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI 2004), pages 137–150. USENIX Association, 2004.

  18. W. Jiang, V.T. Ravi, and G. Agrawal. A Map-Reduce System with an Alternate API for Multi-core Environments. In 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pages 84–93. IEEE, 2010.

  19. M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2008), pages 29–42. USENIX Association, 2008.

  20. M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Job scheduling for multi-user MapReduce clusters. Technical Report UCB/EECS-2009-55, EECS Department, University of California, Berkeley, April 2009.

  21. K. Morton, M. Balazinska, and D. Grossman. ParaTimer: A progress indicator for MapReduce DAGs. In Proceedings of the 2010 International Conference on Management of Data, pages 507–518. ACM, 2010.

  22. J. Polo, D. Carrera, Y. Becerra, J. Torres, E. Ayguadé, M. Steinder, and I. Whalley. Performance Management of Accelerated MapReduce Workloads in Heterogeneous Clusters. In 39th International Conference on Parallel Processing (ICPP2010). San Diego, CA, USA, 2010.

  23. M.M. Rafique, B. Rose, A.R. Butt, and D.S. Nikolopoulos. CellMR: A framework for supporting MapReduce on asymmetric Cell-based clusters. In IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009), pages 1–12. IEEE, 2009.

  24. T. Sandholm and K. Lai. Dynamic proportional share scheduling in Hadoop. In Job Scheduling Strategies for Parallel Processing, pages 110–131. Springer, 2010.

  25. S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP 2003), pages 29–43, New York, NY, USA, 2003. ACM.

  26. M.C. Schatz. CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics, 25(11):1363, 2009.

  27. Y. Shan, B. Wang, J. Yan, Y. Wang, N. Xu, and H. Yang. FPMR: MapReduce framework on FPGA. In Proceedings of the 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pages 93–102. ACM, 2010.

  28. C. Tian, H. Zhou, Y. He, and L. Zha. A dynamic MapReduce scheduler for heterogeneous workloads. In Proceedings of the Eighth International Conference on Grid and Cooperative Computing (GCC 2009), pages 218–224. IEEE Computer Society, 2009.

  29. L.M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner. A break in the clouds: towards a cloud definition. ACM SIGCOMM Computer Communication Review, 39(1):50–55, 2008.

  30. J. Varia. Cloud architectures. White Paper of Amazon, http://jineshvaria.s3.amazonaws.com/public/cloudarchitectures-varia.pdf, 2008.

  31. R.M. Yoo, A. Romano, and C. Kozyrakis. Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system. In IEEE International Symposium on Workload Characterization (IISWC 2009), pages 198–207. IEEE, 2009.

  32. M. Zaharia, D. Borthakur, J.S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Job scheduling for multi-user MapReduce clusters. Technical Report UCB/EECS-2009-55, University of California at Berkeley, 2009.

  33. M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In Proceedings of the 5th European Conference on Computer Systems, pages 265–278, Paris, France, 2010. ACM.

Author information

Correspondence to Quan Chen.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Chen, Q., Guo, M. (2017). MapReduce for Cloud Computing. In: Task Scheduling for Multi-core and Parallel Architectures. Springer, Singapore. https://doi.org/10.1007/978-981-10-6238-4_7

  • DOI: https://doi.org/10.1007/978-981-10-6238-4_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6237-7

  • Online ISBN: 978-981-10-6238-4

  • eBook Packages: Computer Science, Computer Science (R0)
