Abstract
MapReduce is a programming model for parallel, distributed processing of large-scale data, and the Hadoop framework is an implementation of it. Because MapReduce processes data in parallel on clusters of nodes, a good scheduling technique is needed to optimize performance. The performance of MapReduce scheduling depends on factors such as execution time, resource utilization across the cluster, data locality, compute capacity, energy efficiency, heterogeneity, and scalability. Researchers have developed various algorithms that each address one or more of these problems and reach a near-optimal solution. This paper summarizes most of the research work done in this regard.
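As an illustration of one of the factors listed above, data locality, the following is a minimal Python sketch of a greedy map-task placement. It is not taken from any of the surveyed algorithms or from Hadoop itself; the function name `assign_map_tasks`, the node names, and the slot counts are all hypothetical and chosen only for the example.

```python
def assign_map_tasks(block_locations, free_slots):
    """Greedy, locality-first placement (illustrative assumption, not Hadoop's scheduler).

    block_locations: task id -> list of nodes holding a replica of its input block
    free_slots:      node    -> number of free map slots
    Returns task id -> (node, is_data_local), or None if no slot is left.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for task, replicas in block_locations.items():
        # Prefer a node that already stores the task's input block.
        local = [n for n in replicas if slots.get(n, 0) > 0]
        if local:
            node = local[0]
        else:
            # Fall back to the node with the most free slots (remote read).
            candidates = [n for n, s in slots.items() if s > 0]
            if not candidates:
                assignment[task] = None
                continue
            node = max(candidates, key=lambda n: slots[n])
        slots[node] -= 1
        assignment[task] = (node, node in replicas)
    return assignment


# Hypothetical cluster: three nodes with one free map slot each.
tasks = {"t1": ["A", "B"], "t2": ["A"], "t3": ["A"]}
result = assign_map_tasks(tasks, {"A": 1, "B": 1, "C": 1})
local_count = sum(1 for v in result.values() if v and v[1])
```

In this toy run only one of the three tasks ends up data-local, because `t1` takes the single slot on node `A` even though it could have run on `B`. This kind of trade-off between locality and slot availability is what more elaborate schedulers try to handle.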
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Gaur, M., Minocha, B., Muttoo, S.K. (2018). A Study of Factors Affecting MapReduce Scheduling. In: Aggarwal, V., Bhatnagar, V., Mishra, D. (eds) Big Data Analytics. Advances in Intelligent Systems and Computing, vol 654. Springer, Singapore. https://doi.org/10.1007/978-981-10-6620-7_27
DOI: https://doi.org/10.1007/978-981-10-6620-7_27
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6619-1
Online ISBN: 978-981-10-6620-7
eBook Packages: Engineering, Engineering (R0)