An Energy-Efficient Greedy MapReduce Scheduler for Heterogeneous Hadoop YARN Cluster
Energy efficiency of MapReduce systems has become an essential part of infrastructure management in big data analytics, and the Hadoop scheduler plays a vital role in achieving it. A handful of MapReduce scheduling algorithms have been proposed in the literature for slot-based Hadoop systems (i.e., Hadoop 0.x and Hadoop 1.x) to minimize overall energy consumption. However, schedulers for YARN-based Hadoop have received little attention in the literature. In this paper, we design a scheduling model for the Hadoop YARN architecture and formulate the energy-efficient scheduling problem as an Integer Program. To solve it, we propose a Greedy scheduler that, in each iteration, selects the job with the minimum energy consumption. We evaluate the proposed algorithm against the FAIR and Capacity schedulers and find that our greedy scheduler performs better for both CPU- and I/O-intensive workloads.
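The abstract describes the greedy rule only at a high level. The Python sketch below shows one way such a rule could work. It is not the paper's scheduler. The following pieces are hypothetical and not taken from the paper:

- the `Node` and `Job` classes;
- the per-container power figure (`watts_per_container`);
- the per-node runtime estimates (`runtime_s`);
- the simplification that all of a job's containers run on a single node.

Estimated energy is containers × power × runtime, in joules. Each iteration commits the (job, node) pair with the lowest estimated energy among the pairs that still fit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a heterogeneous YARN NodeManager's free capacity."""
    name: str
    free_vcores: int
    free_mem_mb: int
    watts_per_container: float  # assumed power draw of one running container


@dataclass
class Job:
    """Hypothetical job: identical containers plus per-node runtime estimates."""
    name: str
    containers: int
    vcores_per_container: int
    mem_mb_per_container: int
    runtime_s: dict[str, float]  # assumed runtime of the job on each node


def energy_j(job: Job, node: Node) -> float:
    """Estimated energy (J) if all of the job's containers run on this node."""
    return job.containers * node.watts_per_container * job.runtime_s[node.name]


def fits(job: Job, node: Node) -> bool:
    """True if the node has enough free vcores and memory for the whole job."""
    return (job.containers * job.vcores_per_container <= node.free_vcores
            and job.containers * job.mem_mb_per_container <= node.free_mem_mb)


def greedy_schedule(jobs: list[Job], nodes: list[Node]):
    """Repeatedly commit the feasible (job, node) pair with minimum energy.

    Mutates the nodes' free capacity. Returns (plan, unscheduled), where
    plan is a list of (job name, node name, estimated joules).
    """
    pending = list(jobs)
    plan = []
    while pending:
        best = None
        for job in pending:
            for node in nodes:
                if fits(job, node):
                    e = energy_j(job, node)
                    if best is None or e < best[0]:
                        best = (e, job, node)
        if best is None:
            break  # no remaining job fits; it would wait for freed resources
        e, job, node = best
        node.free_vcores -= job.containers * job.vcores_per_container
        node.free_mem_mb -= job.containers * job.mem_mb_per_container
        plan.append((job.name, node.name, e))
        pending = [j for j in pending if j is not job]  # drop by identity
    return plan, pending
```

The following toy input uses a fast, power-hungry node and a slow, frugal node:

- `Node("fast", 8, 16384, 30.0)` and `Node("slow", 4, 8192, 12.0)`;
- `Job("wordcount", 2, 1, 2048, {"fast": 100.0, "slow": 200.0})` and `Job("terasort", 4, 1, 2048, {"fast": 300.0, "slow": 700.0})`.

On this input the greedy rule places `wordcount` on `slow` (4800 J) and then `terasort` on `fast` (36000 J), for 40800 J in total. The opposite placement would use 33600 J + 6000 J = 39600 J. This illustrates why a greedy heuristic is not guaranteed to reach the optimum of the underlying Integer Program.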
Keywords: MapReduce, Scheduling, Energy efficiency
The authors would like to thank the Ministry of Electronics and IT, Govt. of India, for providing financial support for this work under the Visvesvaraya Ph.D. scheme.