Abstract
MapReduce is now a significant parallel processing model for large-scale, data-intensive applications running on clusters of commodity hardware. The scheduling of jobs and tasks, and the identification of slow TaskTrackers in Hadoop clusters, have been a focus of research in recent years. MapReduce performance is currently limited by its default scheduler, which does not adapt well to heterogeneous environments. In this paper, we propose a scheduling method that identifies the TaskTrackers running slowly in the map and reduce phases of the MapReduce framework in a heterogeneous Hadoop cluster. The proposed method is integrated with the MapReduce default scheduling algorithm, and its performance is compared with that of the unmodified default scheduler. We observe that the proposed approach improves performance over the default scheduler in heterogeneous environments: overall job execution times were reduced for different workloads from the HiBench benchmark suite.
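The abstract does not give the exact rule used to flag a TaskTracker as slow, so the following is only a minimal sketch of one plausible detection step, not the authors' method. It assumes that each running task reports a progress score in [0, 1] and its elapsed time, that a tracker's speed in a phase is its mean progress rate (progress divided by elapsed time), and that a tracker is "slow" when that rate falls below a fixed fraction of the cluster-wide average. The names `TaskReport`, `slow_trackers`, `should_assign`, and the `threshold` value of 0.8 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskReport:
    """One progress sample for a running task (hypothetical record)."""
    tracker: str      # TaskTracker host name
    phase: str        # "map" or "reduce"
    progress: float   # progress score in [0, 1]
    elapsed: float    # seconds since the task started


def slow_trackers(reports, phase, threshold=0.8):
    """Return the TaskTrackers whose mean progress rate in `phase` is below
    `threshold` times the average rate over all trackers (assumed rule)."""
    rates = {}
    for r in reports:
        if r.phase == phase and r.elapsed > 0:
            # Progress rate: fraction of the task completed per second.
            rates.setdefault(r.tracker, []).append(r.progress / r.elapsed)
    if not rates:
        return set()
    # Mean rate per tracker, then the cluster-wide mean of those values.
    per_tracker = {t: sum(v) / len(v) for t, v in rates.items()}
    cluster_avg = sum(per_tracker.values()) / len(per_tracker)
    return {t for t, rate in per_tracker.items() if rate < threshold * cluster_avg}


def should_assign(tracker, phase, reports, threshold=0.8):
    """Hypothetical scheduler hook: when `tracker` asks for work, decline to
    launch a new task of `phase` on it if it is currently flagged slow."""
    return tracker not in slow_trackers(reports, phase, threshold)


if __name__ == "__main__":
    reports = [
        TaskReport("tt1", "map", 0.5, 10.0),
        TaskReport("tt2", "map", 0.6, 10.0),
        TaskReport("tt3", "map", 0.1, 10.0),
    ]
    print(slow_trackers(reports, "map"))  # {'tt3'}
```

Comparing each tracker against the cluster average, rather than against a fixed absolute rate, lets the same rule work on a heterogeneous cluster where "normal" speed differs from job to job; the map and reduce phases are evaluated separately because a node can be slow in one and not the other.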
Acknowledgments
Nenavath Srinivas Naik expresses his gratitude to Prof. P.A. Sastry (Principal), Prof. J. Prasanna Kumar (Head of the CSE Department), and Dr. B. Sandhya, MVSR Engineering College, Hyderabad, India for hosting the experimental test bed.
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Naik, N.S., Negi, A., Sastry, V.N. (2016). Enhancing the Performance of MapReduce Default Scheduler by Detecting Prolonged TaskTrackers in Heterogeneous Environments. In: Satapathy, S., Raju, K., Mandal, J., Bhateja, V. (eds) Proceedings of the Second International Conference on Computer and Communication Technologies. Advances in Intelligent Systems and Computing, vol 380. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2523-2_21
DOI: https://doi.org/10.1007/978-81-322-2523-2_21
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2522-5
Online ISBN: 978-81-322-2523-2
eBook Packages: Engineering, Engineering (R0)