Abstract
Cloud computing and big data have attracted serious attention from both researchers and public users. For both, MapReduce is one of the most widely used scheduling models: it automatically divides a job into a large number of fine-grained tasks, distributes the tasks to computational servers, and aggregates the partial results from all tasks into the final result. It naturally fits the requirement of processing large amounts of data in parallel. However, in heterogeneous environments, where servers differ in computational ability, the performance of MapReduce is often seriously degraded by a few straggler tasks that run far slower than the others. To this end, this chapter discusses ways to improve the performance of MapReduce in heterogeneous environments. Specifically, we propose a Self-Adaptive MapReduce (SAMR) scheduling policy that can precisely identify straggler tasks and boost their execution. Experiments on a real heterogeneous cluster show that the proposed technique can significantly improve the performance of MapReduce applications without any program modification.
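The divide, distribute, and aggregate flow described in the abstract, together with the notion of a straggler task, can be sketched in a few lines of Python. This is a minimal single-process illustration and not the SAMR policy: the function names, the word-count workload, the task durations, and the "slower than twice the median" straggler rule are all assumptions made for this example.

```python
from collections import Counter
from statistics import median


def split_job(records, num_tasks):
    """Divide a job's input into roughly equal fine-grained task inputs."""
    return [records[i::num_tasks] for i in range(num_tasks)]


def map_task(lines):
    """One map task: produce a partial word count for its share of the input."""
    partial = Counter()
    for line in lines:
        partial.update(line.split())
    return partial


def reduce_partials(partials):
    """Aggregate the partial results of all tasks into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def find_stragglers(durations, slowdown=2.0):
    """Flag tasks whose duration exceeds `slowdown` times the median.

    Hypothetical heuristic for illustration only; it is not the SAMR policy.
    """
    typical = median(durations.values())
    return sorted(t for t, d in durations.items() if d > slowdown * typical)


lines = ["big data in the cloud", "the cloud runs map tasks", "map then reduce"]
# In a real cluster each chunk would be sent to a different server.
partials = [map_task(chunk) for chunk in split_job(lines, num_tasks=2)]
word_counts = reduce_partials(partials)

# Made-up task durations in seconds; task "m3" ran on a slow server.
durations = {"m1": 10.0, "m2": 12.0, "m3": 45.0, "m4": 11.0}
stragglers = find_stragglers(durations)
```

Because the job cannot finish until every task has reported its partial result, a single slow task such as the hypothetical "m3" delays the whole job, which is why identifying stragglers early matters in heterogeneous clusters.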
Parts of this chapter, including Figures 7.1 and 7.4, have been published in The Journal of Supercomputing. Reprinted from Ref. [9], with permission from Springer.
Notes
1. A MapReduce scheduler is a scheduler that schedules map and reduce tasks.
References
A. Aboulnaga, Z. Wang, and Z.Y. Zhang. Packing the most onto your cloud. In Proceeding of the first international workshop on Cloud data management, pages 25–28. ACM, 2009.
F. Ahmad, S. T. Chakradhar, A. Raghunathan, and T. N. Vijaykumar. Tarazu: Optimizing mapreduce on heterogeneous clusters. In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, pages 61–74, New York, NY, USA, 2012. ACM.
L.A. Barroso, J. Dean, and U. Holzle. Web search for a planet: The Google cluster architecture. IEEE Micro, 23(2):22–28, 2003.
H. S. Bhosale and D. P. Gadekar. Big data processing using hadoop: Survey on scheduling. International Journal of Science and Research (IJSR), 3(10):272–277, 2014.
R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, and I. Brandic. Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6):599–616, 2009.
F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R.E. Gruber. Bigtable: A distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2006), 2006.
J. Chauhan, D. Makaroff, and W. Grassmann. The impact of capacity scheduler configuration settings on mapreduce jobs. In Cloud and Green Computing (CGC), 2012 Second International Conference on, pages 667–674. IEEE, 2012.
R. Chen, H. Chen, and B. Zang. Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, pages 523–534. ACM, 2010.
Q. Chen, M. Guo, Q. Deng, L. Zheng, S. Guo, and Y. Shen. HAT: history-based auto-tuning MapReduce in heterogeneous environments. The Journal of Supercomputing, 64(3):1038–1054, 2013.
Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, and Christos Kozyrakis. Evaluating mapreduce for multi-core and multiprocessor systems. In HPCA 2007: Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture, pages 13–24, Washington, DC, USA, 2007. IEEE Computer Society.
M. De Kruijf and K. Sankaralingam. MapReduce for the cell broadband engine architecture. IBM Journal of Research and Development, 53(5):10, 2010.
J. Dean and S. Ghemawat. MapReduce: a flexible data processing tool. Communications of the ACM, 53(1):72–77, 2010.
P. Elespuru, S. Shakya, and S. Mishra. Mapreduce system over heterogeneous mobile devices. Software Technologies for Embedded and Ubiquitous Systems, pages 168–179, 2009.
W. Fang, B. He, Q. Luo, and N.K. Govindaraju. Mars: Accelerating MapReduce with Graphics Processors. IEEE Transactions on Parallel and Distributed Systems, 2010.
M.J. Fischer, X. Su, and Y. Yin. Assigning tasks for efficiency in Hadoop. In Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures, pages 30–39. ACM, 2010.
Hadoop. Hadoop home page. http://hadoop.apache.org/, 2011.
Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large clusters. In OSDI 2004: Proceedings of 6th Symposium on Operating System Design and Implementation, pages 137–150, New York, 2004. ACM Press.
W. Jiang, V.T. Ravi, and G. Agrawal. A Map-Reduce System with an Alternate API for Multi-core Environments. In 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pages 84–93. IEEE, 2010.
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, and Ion Stoica. Improving mapreduce performance in heterogeneous environments. In 8th Usenix Symposium on Operating Systems Design and Implementation, pages 29–42, New York, 2008. ACM Press.
Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy, Scott Shenker, and Ion Stoica. Job scheduling for multi-user mapreduce clusters. Technical Report UCB/EECS-2009-55, EECS Department, University of California, Berkeley, Apr 2009.
K. Morton, M. Balazinska, and D. Grossman. ParaTimer: a progress indicator for MapReduce DAGs. In Proceedings of the 2010 international conference on Management of data, pages 507–518. ACM, 2010.
J. Polo, D. Carrera, Y. Becerra, J. Torres, E. Ayguadé, M. Steinder, and I. Whalley. Performance Management of Accelerated MapReduce Workloads in Heterogeneous Clusters. In 39th International Conference on Parallel Processing (ICPP2010). San Diego, CA, USA, 2010.
M.M. Rafique, B. Rose, A.R. Butt, and D.S. Nikolopoulos. CellMR: A framework for supporting mapreduce on asymmetric cell-based clusters. In Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, pages 1–12. IEEE, 2009.
T. Sandholm and K. Lai. Dynamic proportional share scheduling in hadoop. In Job Scheduling Strategies for Parallel Processing, pages 110–131. Springer, 2010.
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The google file system. In SOSP 2003: Proceedings of the 9th ACM Symposium on Operating Systems Principles, pages 29–43, New York, NY, USA, 2003. ACM.
M.C. Schatz. CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics, 25(11):1363, 2009.
Y. Shan, B. Wang, J. Yan, Y. Wang, N. Xu, and H. Yang. FPMR: MapReduce framework on FPGA. In Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays, pages 93–102. ACM, 2010.
C. Tian, H. Zhou, Y. He, and L. Zha. A dynamic MapReduce scheduler for heterogeneous workloads. In Proceedings of the 2009 Eighth International Conference on Grid and Cooperative Computing-Volume 00, pages 218–224. IEEE Computer Society, 2009.
L.M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner. A break in the clouds: towards a cloud definition. ACM SIGCOMM Computer Communication Review, 39(1):50–55, 2008.
J. Varia. Cloud architectures. White Paper of Amazon, http://jineshvaria.s3.amazonaws.com/public/cloudarchitectures-varia.pdf, 2008.
R.M. Yoo, A. Romano, and C. Kozyrakis. Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system. In Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on, pages 198–207. IEEE, 2009.
M. Zaharia, D. Borthakur, J.S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Job scheduling for multi-user mapreduce clusters. Technical report, Technical Report UCB/EECS-2009-55, University of California at Berkeley, 2009.
M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In Proceedings of the 5th European conference on Computer systems, pages 265–278, Paris, France, 2010. ACM.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Chen, Q., Guo, M. (2017). MapReduce for Cloud Computing. In: Task Scheduling for Multi-core and Parallel Architectures. Springer, Singapore. https://doi.org/10.1007/978-981-10-6238-4_7
DOI: https://doi.org/10.1007/978-981-10-6238-4_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6237-7
Online ISBN: 978-981-10-6238-4
eBook Packages: Computer Science, Computer Science (R0)