MapReduce for Cloud Computing

Chen, Quan; Guo, Minyi

doi:10.1007/978-981-10-6238-4_7

Chapter
First Online: 25 November 2017

884 Accesses
3 Citations

Abstract

Cloud computing and big data have attracted considerable attention from both researchers and public users. For cloud computing and big data, MapReduce is one of the most widely used scheduling models: it automatically divides a job into a large number of fine-grained tasks, distributes the tasks to the computational servers, and aggregates the partial results from all the tasks into the final result. It is therefore a natural fit for processing large amounts of data in parallel. However, in heterogeneous environments, where the servers have different computational capabilities, the performance of MapReduce is often seriously degraded by a few straggler tasks that run far slower than the others. In this chapter, we discuss ways to improve the performance of MapReduce in heterogeneous environments. Specifically, we propose a Self-Adaptive MapReduce (SAMR) scheduling policy that can precisely identify straggler tasks and boost their execution. Experiments on a real heterogeneous cluster show that the proposed technique significantly improves the performance of MapReduce applications without any program modification.
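The job model summarized above (split into tasks, distribute, aggregate partial results) can be illustrated with a minimal, single-process Python sketch. It assumes a word-count job and uses hypothetical function names (`map_fn`, `reduce_fn`, `run_job`); it is neither Hadoop's API nor the chapter's implementation, and a real framework would run each map and reduce task on a separate server rather than in one loop.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


# Hypothetical user functions for a word-count job.
def map_fn(chunk: str) -> Iterable[Tuple[str, int]]:
    """Map task: emit (word, 1) for every word in one input split."""
    for word in chunk.split():
        yield word.lower(), 1


def reduce_fn(key: str, values: List[int]) -> int:
    """Reduce task: aggregate all partial counts for one key."""
    return sum(values)


def run_job(data: List[str],
            mapper: Callable[[str], Iterable[Tuple[str, int]]],
            reducer: Callable[[str, List[int]], int],
            num_reducers: int = 2) -> Dict[str, int]:
    # 1. Each input split becomes one fine-grained map task.
    intermediate = [list(mapper(split)) for split in data]

    # 2. Shuffle: partition intermediate pairs by key hash so that every
    #    value for a given key reaches the same reduce task.
    partitions: List[Dict[str, List[int]]] = [
        defaultdict(list) for _ in range(num_reducers)
    ]
    for pairs in intermediate:
        for key, value in pairs:
            partitions[hash(key) % num_reducers][key].append(value)

    # 3. Each partition is one reduce task; merge their outputs.
    result: Dict[str, int] = {}
    for part in partitions:
        for key, values in part.items():
            result[key] = reducer(key, values)
    return result


if __name__ == "__main__":
    splits = ["the quick brown fox", "the lazy dog", "The fox"]
    print(sorted(run_job(splits, map_fn, reduce_fn).items()))
    # prints [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

Hash partitioning is used because it guarantees that all values for one key land in the same reduce task; Hadoop's default partitioner is likewise hash-based.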

Parts of this chapter, including Figures 7.1 and 7.4, have been published in The Journal of Supercomputing. Reprinted from Ref. [9], with permission from Springer.

Notes

1.
A MapReduce scheduler decides which map and reduce tasks run on which computational servers, and when.
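Before a scheduler can speed up straggler tasks, it has to find them. As a rough illustration only, the Python sketch below uses a generic progress-rate heuristic similar in spirit to the one proposed by Zaharia et al. for heterogeneous environments: it estimates each running task's progress per second and time left, then flags tasks that are much slower than average. This is not the SAMR policy; the `Task` fields, the `slow_ratio` threshold of 0.5, and the ordering by estimated time left are all hypothetical choices for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical view of a running task as seen by the scheduler."""
    task_id: str
    progress: float  # progress score in [0, 1]
    elapsed: float   # seconds since the task started

    @property
    def rate(self) -> float:
        # Progress per second; a task with no elapsed time reports 0.
        return self.progress / self.elapsed if self.elapsed > 0 else 0.0

    @property
    def time_left(self) -> float:
        # Estimated seconds to finish, assuming the current rate holds.
        return (1.0 - self.progress) / self.rate if self.rate > 0 else float("inf")


def find_stragglers(tasks: List[Task], slow_ratio: float = 0.5) -> List[Task]:
    """Flag tasks whose rate is below slow_ratio * mean rate (assumed
    threshold), ordered so the task expected to finish last comes first."""
    if not tasks:
        return []
    mean_rate = sum(t.rate for t in tasks) / len(tasks)
    slow = [t for t in tasks if t.rate < slow_ratio * mean_rate]
    return sorted(slow, key=lambda t: t.time_left, reverse=True)


if __name__ == "__main__":
    running = [
        Task("m1", progress=0.80, elapsed=40.0),  # 0.020 progress/s
        Task("m2", progress=0.75, elapsed=40.0),  # ~0.019 progress/s
        Task("m3", progress=0.20, elapsed=40.0),  # 0.005 progress/s
    ]
    for t in find_stragglers(running):
        print(t.task_id, round(t.time_left, 1))  # prints: m3 160.0
```

A scheduler could then launch backup (speculative) copies of the flagged tasks on faster servers; deciding which servers count as fast is a separate problem that this sketch ignores.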

References

[1] A. Aboulnaga, Z. Wang, and Z.Y. Zhang. Packing the most onto your cloud. In Proceedings of the First International Workshop on Cloud Data Management, pages 25–28. ACM, 2009.
[2] F. Ahmad, S.T. Chakradhar, A. Raghunathan, and T.N. Vijaykumar. Tarazu: Optimizing MapReduce on heterogeneous clusters. In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVII), pages 61–74, New York, NY, USA, 2012. ACM.
[3] L.A. Barroso, J. Dean, and U. Hölzle. Web search for a planet: The Google cluster architecture. IEEE Micro, 23(2):22–28, 2003.
[4] H.S. Bhosale and D.P. Gadekar. Big data processing using Hadoop: Survey on scheduling. International Journal of Science and Research (IJSR), 3(10):272–277, 2014.
[5] R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, and I. Brandic. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6):599–616, 2009.
[6] F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R.E. Gruber. Bigtable: A distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2006), 2006.
[7] J. Chauhan, D. Makaroff, and W. Grassmann. The impact of capacity scheduler configuration settings on MapReduce jobs. In 2012 Second International Conference on Cloud and Green Computing (CGC), pages 667–674. IEEE, 2012.
[8] R. Chen, H. Chen, and B. Zang. Tiled-MapReduce: Optimizing resource usages of data-parallel applications on multicore with tiling. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, pages 523–534. ACM, 2010.
[9] Q. Chen, M. Guo, Q. Deng, L. Zheng, S. Guo, and Y. Shen. HAT: History-based auto-tuning MapReduce in heterogeneous environments. The Journal of Supercomputing, 64(3):1038–1054, 2013.
[10] C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. In HPCA 2007: Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture, pages 13–24, Washington, DC, USA, 2007. IEEE Computer Society.
[11] M. De Kruijf and K. Sankaralingam. MapReduce for the Cell Broadband Engine architecture. IBM Journal of Research and Development, 53(5):10, 2010.
[12] J. Dean and S. Ghemawat. MapReduce: A flexible data processing tool. Communications of the ACM, 53(1):72–77, 2010.
[13] P. Elespuru, S. Shakya, and S. Mishra. MapReduce system over heterogeneous mobile devices. In Software Technologies for Embedded and Ubiquitous Systems, pages 168–179, 2009.
[14] W. Fang, B. He, Q. Luo, and N.K. Govindaraju. Mars: Accelerating MapReduce with graphics processors. IEEE Transactions on Parallel and Distributed Systems, 2010.
[15] M.J. Fischer, X. Su, and Y. Yin. Assigning tasks for efficiency in Hadoop. In Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, pages 30–39. ACM, 2010.
[16] Hadoop. Hadoop home page. http://hadoop.apache.org/, 2011.
[17] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI 2004: Proceedings of the 6th Symposium on Operating Systems Design and Implementation, pages 137–150, New York, 2004. ACM Press.
[18] W. Jiang, V.T. Ravi, and G. Agrawal. A Map-Reduce system with an alternate API for multi-core environments. In 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pages 84–93. IEEE, 2010.
[19] M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In 8th USENIX Symposium on Operating Systems Design and Implementation, pages 29–42, New York, 2008. ACM Press.
[20] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Job scheduling for multi-user MapReduce clusters. Technical Report UCB/EECS-2009-55, EECS Department, University of California, Berkeley, April 2009.
[21] K. Morton, M. Balazinska, and D. Grossman. ParaTimer: A progress indicator for MapReduce DAGs. In Proceedings of the 2010 International Conference on Management of Data, pages 507–518. ACM, 2010.
[22] J. Polo, D. Carrera, Y. Becerra, J. Torres, E. Ayguadé, M. Steinder, and I. Whalley. Performance management of accelerated MapReduce workloads in heterogeneous clusters. In 39th International Conference on Parallel Processing (ICPP 2010), San Diego, CA, USA, 2010.
[23] M.M. Rafique, B. Rose, A.R. Butt, and D.S. Nikolopoulos. CellMR: A framework for supporting MapReduce on asymmetric Cell-based clusters. In IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009), pages 1–12. IEEE, 2009.
[24] T. Sandholm and K. Lai. Dynamic proportional share scheduling in Hadoop. In Job Scheduling Strategies for Parallel Processing, pages 110–131. Springer, 2010.
[25] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. In SOSP 2003: Proceedings of the 19th ACM Symposium on Operating Systems Principles, pages 29–43, New York, NY, USA, 2003. ACM.
[26] M.C. Schatz. CloudBurst: Highly sensitive read mapping with MapReduce. Bioinformatics, 25(11):1363, 2009.
[27] Y. Shan, B. Wang, J. Yan, Y. Wang, N. Xu, and H. Yang. FPMR: MapReduce framework on FPGA. In Proceedings of the 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pages 93–102. ACM, 2010.
[28] C. Tian, H. Zhou, Y. He, and L. Zha. A dynamic MapReduce scheduler for heterogeneous workloads. In Proceedings of the 2009 Eighth International Conference on Grid and Cooperative Computing, pages 218–224. IEEE Computer Society, 2009.
[29] L.M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner. A break in the clouds: Towards a cloud definition. ACM SIGCOMM Computer Communication Review, 39(1):50–55, 2008.
[30] J. Varia. Cloud architectures. Amazon white paper, http://jineshvaria.s3.amazonaws.com/public/cloudarchitectures-varia.pdf, 2008.
[31] R.M. Yoo, A. Romano, and C. Kozyrakis. Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system. In IEEE International Symposium on Workload Characterization (IISWC 2009), pages 198–207. IEEE, 2009.
[32] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Job scheduling for multi-user MapReduce clusters. Technical Report UCB/EECS-2009-55, University of California at Berkeley, 2009.
[33] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In Proceedings of the 5th European Conference on Computer Systems, pages 265–278, Paris, France, 2010. ACM.

Author information

Authors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Quan Chen & Minyi Guo

Corresponding author

Correspondence to Quan Chen.

About this chapter

Cite this chapter

Chen, Q., Guo, M. (2017). MapReduce for Cloud Computing. In: Task Scheduling for Multi-core and Parallel Architectures. Springer, Singapore. https://doi.org/10.1007/978-981-10-6238-4_7

DOI: https://doi.org/10.1007/978-981-10-6238-4_7
Published: 25 November 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6237-7
Online ISBN: 978-981-10-6238-4
eBook Packages: Computer Science, Computer Science (R0)