Predoop: Preempting Reduce Task for Job Execution Accelerations

Liang, Yi; Wang, Yufeng; Fan, Minglu; Zhang, Chen; Zhu, Yuqing

doi:10.1007/978-3-319-13021-7_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8807)

Included in the following conference series:

Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware

Abstract

Map/Reduce is a popular parallel processing framework for data-intensive computing. To overlap the execution phase of map tasks with the intermediate-data fetching and merging phase of reduce tasks, existing Map/Reduce schedulers pre-launch a job's reduce tasks once a specific threshold of its map tasks has been launched. With this pattern, a reduce task keeps occupying its resources during the idle periods in which it waits to fetch intermediate data from the map tasks. To address this issue, we propose Predoop, an extension of the Hadoop Map/Reduce framework. The basic idea of Predoop is to preempt a reduce task during its idle time and allocate the released resources to map tasks awaiting scheduling. To achieve this goal, we first introduce preemption mechanisms for reduce tasks and map tasks, respectively, so that Map/Reduce tasks can be preempted and resumed with correct state. Second, we adopt a preempting-resuming model for reduce tasks that considers both the progress of reduce-side data fetching and merging and the progress of map task execution, in order to determine when a reduce task should be preempted and when it should be resumed. Third, we introduce a preemption-aware task scheduling strategy that allocates the released resources to pending map tasks while taking data locality into account. Experimental results demonstrate that Predoop outperforms Hadoop on various workloads and reduces the average job turnaround time by up to 66.57%.
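To illustrate how the three decisions described in the abstract could fit together, the Python sketch below models a reduce task that is preempted once it has fetched every available map output while map tasks are still waiting for slots. The reduce task is resumed once enough unfetched map output has accumulated (or all maps have finished), and its released slot goes preferentially to a node-local map task. This is a minimal sketch under stated assumptions, not Predoop's implementation: the class and function names (`ReduceTask`, `MapTask`, `should_preempt`, `should_resume`, `pick_map_for_released_slot`), the idleness test, and the `resume_backlog` threshold of 0.2 are all hypothetical, and no real Hadoop API is used.

```python
# Hypothetical sketch of reduce-task preemption/resumption with locality-aware
# reuse of the released slot. All names and thresholds are illustrative
# assumptions, not Predoop's or Hadoop's actual code.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MapTask:
    task_id: str
    split_hosts: List[str]  # hosts holding a replica of this task's input split


@dataclass
class ReduceTask:
    task_id: str
    host: str
    fetched: int = 0        # map outputs already fetched and merged
    preempted: bool = False


def should_preempt(reduce_task: ReduceTask, completed_maps: int, pending_maps: int) -> bool:
    """Preempt a running reduce that has consumed every finished map output
    (so it would only sit idle) while map tasks are still waiting for a slot."""
    idle = reduce_task.fetched >= completed_maps
    return not reduce_task.preempted and idle and pending_maps > 0


def should_resume(reduce_task: ReduceTask, completed_maps: int, total_maps: int,
                  resume_backlog: float = 0.2) -> bool:
    """Resume a preempted reduce once the unfetched map output reaches
    resume_backlog (a fraction of all maps), or once every map has finished.
    Assumes total_maps > 0."""
    if not reduce_task.preempted:
        return False
    backlog = (completed_maps - reduce_task.fetched) / total_maps
    return completed_maps == total_maps or backlog >= resume_backlog


def pick_map_for_released_slot(host: str, pending: List[MapTask]) -> Optional[MapTask]:
    """Give the released slot to a node-local map task if one exists,
    otherwise to the first pending map task (None if nothing is pending)."""
    for task in pending:
        if host in task.split_hosts:
            return task
    return pending[0] if pending else None


if __name__ == "__main__":
    r = ReduceTask("r_0", host="node2", fetched=3)
    pending = [MapTask("m_4", ["node1", "node3"]), MapTask("m_5", ["node2", "node4"])]
    if should_preempt(r, completed_maps=3, pending_maps=len(pending)):
        r.preempted = True
        print(pick_map_for_released_slot(r.host, pending).task_id)  # m_5 (node-local)
    print(should_resume(r, completed_maps=6, total_maps=10))  # True: backlog 3/10 >= 0.2
```

The first-match locality scan reflects the abstract's point that released resources should go to map tasks with data locality in mind. A real system would also have to save and restore the reduce task's fetched and merged state on preemption, which this sketch reduces to the `fetched` counter.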

Acknowledgements

This work is supported by NSFC projects (Grants No. 60933003 and 61202075) and BNSF project (Grant No. 4133081).

Author information

Authors and Affiliations

Department of Computer Science, Beijing University of Technology, Beijing, China
Yi Liang, Yufeng Wang, Minglu Fan & Chen Zhang
State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yuqing Zhu

Corresponding author

Correspondence to Yi Liang.

Editor information

Editors and Affiliations

ICT, Chinese Academy of Sciences, Beijing, China
Jianfeng Zhan
ICT, Chinese Academy of Sciences, Beijing, China
Rui Han
Shannon (IT) Lab., Huawei, China
Chuliang Weng

About this paper

Cite this paper

Liang, Y., Wang, Y., Fan, M., Zhang, C., Zhu, Y. (2014). Predoop: Preempting Reduce Task for Job Execution Accelerations. In: Zhan, J., Han, R., Weng, C. (eds) Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE 2014. Lecture Notes in Computer Science, vol 8807. Springer, Cham. https://doi.org/10.1007/978-3-319-13021-7_13

DOI: https://doi.org/10.1007/978-3-319-13021-7_13
Published: 11 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13020-0
Online ISBN: 978-3-319-13021-7
eBook Packages: Computer Science, Computer Science (R0)