Skip to main content

Robustness Comparison of Scheduling Algorithms in MapReduce Framework

  • Conference paper
  • First Online:
  • 2360 Accesses

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 283))

Abstract

Parallel computing is the fundamental base for MapReduce framework in Hadoop. A big data is split into small data chunks, where Map task is referred to processing a data chunk. Each data chunk is replicated over three servers in Hadoop for increasing the availability of data and decreasing the probability of data loss. As a result, the three servers that have the Map task stored on their disk are the fastest servers to process them, which are called local servers. All the servers in the same rack as local servers are called rack-local servers that are slower than local servers in processing Map tasks since the data chunk associated with the Map task should be fetched through the top of the rack switch. All the other servers are called remote servers that are the slowest servers for processing a Map task since they need to fetch data from a local server in another rack, so data should be transmitted through at least two top of the rack switches and a core switch. Note that the number of switches in the path of data transfer depends on the internal network structure of data centers. The First-In-First-Out (FIFO) and Hadoop Fair Scheduler (HFS) algorithms do not take the rack structure of data centers into account, so they are known to not be heavy-traffic delay optimal or even throughput optimal. The recent advances on scheduling for data centers considering the rack structure of them and the heterogeneity of servers resulted in the state-of-the-art Balanced-PANDAS algorithm that outperforms the classic MaxWeight algorithm and its derivation, JSQ-MaxWeight algorithm. In both Balanced-PANDAS and MaxWeight algorithms, the processing rate of local, rack-local, and remote servers are assumed to be known. However, with the change of traffic over time in addition to estimation errors of processing rates, it is not realistic to consider the processing rates to be known. In this work, we study the robustness of Balanced-PANDAS and MaxWeight algorithms in terms of inaccurate estimations of processing rates. We observe that Balanced-PANDAS is not as sensitive as MaxWeight on the accuracy of processing rates, making it more appealing to use in data centers.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   169.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   219.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    This manuscript is archived in https://arxiv.org/abs/2004.06254.

References

  1. Alinia, B., Sadegh Talebi, M., Hajiesmaili, M.H., Yekkehkhany, A., Crespi, N.: Competitive online scheduling algorithms with applications in deadline-constrained EV charging. In: 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), pp. 1–10. IEEE (2018)

    Google Scholar 

  2. Almalki, M.M., Chehardeh, M.I., Hatziadoniu, C.J.: Capacitor bank switching transient analysis using frequency dependent network equivalents. In: North American Power Symposium (NAPS), 2015. IEEE (2015)

    Google Scholar 

  3. Bell, S., Williams, R., et al.: Dynamic scheduling of a parallel server system in heavy traffic with complete resource pooling: asymptotic optimality of a threshold policy. Electr. J. Probab. 10, 1044–1115 (2005)

    MathSciNet  MATH  Google Scholar 

  4. Bell, S.L., Williams, R.J., et al.: Dynamic scheduling of a system with two parallel servers in heavy traffic with resource pooling: asymptotic optimality of a threshold policy. Ann. Appl. Probab. 11(3), 608–649 (2001)

    Article  MathSciNet  Google Scholar 

  5. Broder, A.: A taxonomy of web search. In: ACM SIGIR Forum, vol. 36, pp. 3–10. ACM (2002)

    Google Scholar 

  6. Byers, J.W., Considine, J., Mitzenmacher, M.: Geometric generalizations of the power of two choices. In: Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 54–63. ACM (2004)

    Google Scholar 

  7. Chehardeh, M.I., Almalki, M.M., Hatziadoniu, C.J.: Remote feeder transfer between out-of-phase sources using STS. In: Power and Energy Conference at Illinois (PECI), 2016 IEEE. IEEE (2016)

    Google Scholar 

  8. Chehardeh, M.I., Hatziadoniu, C.J.: A systematic method for reliability index evaluation of distribution networks in the presence of distributed generators. In: 2018 North American Power Symposium (NAPS). IEEE (2018)

    Google Scholar 

  9. Clower, R., Leijonhufvud, A.: The coordination of economic activities: a Keynesian perspective. Am. Econ. Rev. 65(2), 182–188 (1975)

    Google Scholar 

  10. Cooper, C., Elsässer, R., Radzik, T.: The power of two choices in distributed voting. In: International Colloquium on Automata, Languages, and Programming, pp. 435–446. Springer, Berlin (2014)

    Google Scholar 

  11. Daghighi, A.: Application of an artificial neural network as a third-party database auditing system (2019)

    Google Scholar 

  12. Daghighi, A., Kavousi, M.: Scheduling for data centers with multi-level data locality. In: 2017 Iranian Conference on Electrical Engineering (ICEE), pp. 927–936. IEEE (2017)

    Google Scholar 

  13. Dahlgaard, S., Knudsen, M.B.T., Rotenberg, E., Thorup, M.: The power of two choices with simple tabulation. In: Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, pp. 1631–1642. SIAM (2016)

    Google Scholar 

  14. Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

    Article  Google Scholar 

  15. Deilami, S., Masoum, A.S., Moses, P.S., Masoum, M.A.S: Real-time coordination of plug-in electric vehicle charging in smart grids to minimize power losses and improve voltage profile. IEEE Trans. Smart Grid 2(3), 456–467 (2011)

    Google Scholar 

  16. Doerr, B., Goldberg, L.A., Minder, L., Sauerwald, T., Scheideler, C.: Stabilizing consensus with the power of two choices. In: Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 149–158. ACM (2011)

    Google Scholar 

  17. Eisenhauer, E.: In poor health: supermarket redlining and urban nutrition. GeoJournal 53(2), 125–133 (2001)

    Article  Google Scholar 

  18. Gan, L., Topcu, U., Low, S.H.: Optimal decentralized protocol for electric vehicle charging. IEEE Trans. Power Syst. 28(2), 940–951 (2013)

    Article  Google Scholar 

  19. Gardner, K., Harchol-Balter, M., Scheller-Wolf, A., Velednitsky, M., Zbarsky, S.: Redundancy-d: the power of d choices for redundancy. Oper. Res. 65(4), 1078–1094 (2017)

    Article  MathSciNet  Google Scholar 

  20. Gast, N.: The power of two choices on graphs: the pair-approximation is accurate? ACM SIGMETRICS Perform. Eval. Rev. 43(2), 69–71 (2015)

    Article  Google Scholar 

  21. Harrison, J.M.: Heavy traffic analysis of a system with parallel servers: asymptotic optimality of discrete-review policies. Ann. Appl. Probab. 822–848 (1998)

    Google Scholar 

  22. Harrison, J.M., López, M.J.: Heavy traffic resource pooling in parallel-server systems. Queueing Syst. 33(4), 339–368 (1999)

    Article  MathSciNet  Google Scholar 

  23. He, C., Lu, Y., Swanson, D.: Matchmaking: a new MapReduce scheduling technique. In: 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom), pp. 40–47. IEEE (2011)

    Google Scholar 

  24. Hosseini, M., Jiang, Y., Yekkehkhany, A., Berlin, R.R., Sha, L.: A mobile geo-communication dataset for physiology-aware dash in rural ambulance transport. In: Proceedings of the 8th ACM on Multimedia Systems Conference. ACM (2017)

    Google Scholar 

  25. Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: distributed data-parallel programs from sequential building blocks. In: ACM SIGOPS Operating Systems Review, vol. 41, pp. 59–72. ACM (2007)

    Google Scholar 

  26. Isard, M., Prabhakaran, V., Currey, J., Wieder, U., Talwar, K., Goldberg, A.: Quincy: fair scheduling for distributed computing clusters. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pp. 261–276. ACM (2009)

    Google Scholar 

  27. Jin, J., Luo, J., Song, A., Dong, F., Xiong, R.: Bar: an efficient data locality driven task scheduling algorithm for cloud computing. In: Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 295–304. IEEE Computer Society (2011)

    Google Scholar 

  28. Kavousi, M.: Affinity scheduling and the applications on data center scheduling with data locality. arXiv preprint arXiv:1705.03125 (2017)

  29. Kreimer, J.: Real-time system with homogeneous servers and nonidentical channels in steady-state. Comput. Oper. Res. 29(11), 1465–1473 (2002)

    Article  Google Scholar 

  30. Krishna, K.H., Rani, K.A.: Reducing the energy consumption energy-efficient query processing node in web search engines (2018)

    Google Scholar 

  31. Livny, M., Melman, M.: Load balancing in homogeneous broadcast distributed systems. In: ACM SIGMETRICS Performance Evaluation Review, vol. 11, pp. 47–55. ACM (1982)

    Google Scholar 

  32. Lowery, J.C., Collins, M.A., Schroeder, B.: Systems and methods for provisioning homogeneous servers. US Patent App. 11/562,921, 22 May 2008

    Google Scholar 

  33. Luczak, M.J., McDiarmid, C., et al.: On the power of two choices: balls and bins in continuous time. Ann. Appl. Probab. 15(3), 1733–1764 (2005)

    MathSciNet  MATH  Google Scholar 

  34. Lumetta, S., Mitzenmacher, M.: Using the power of two choices to improve bloom filters. Internet Math. 4(1), 17–33 (2007)

    Article  MathSciNet  Google Scholar 

  35. Mandelbaum, A., Stolyar, A.L.: Scheduling flexible servers with convex delay costs: heavy-traffic optimality of the generalized c\(\mu \)-rule. Oper. Res. 52(6), 836–855 (2004)

    Article  MathSciNet  Google Scholar 

  36. Meyn, S.: Stability and asymptotic optimality of generalized MaxWeight policies. SIAM J. Control Optim. 47(6), 3259–3294 (2009)

    Article  MathSciNet  Google Scholar 

  37. Mitzenmacher, M.: The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst. 12(10), 1094–1104 (2001)

    Article  Google Scholar 

  38. Moaddeli, A., Ahmadi, I.N., Abhar, N.: The power of d choices in scheduling for data centers with heterogeneous servers. arXiv preprint arXiv:1904.00447 (2019)

  39. Musavi, N.: A game theoretical framework for the evaluation of unmanned aircraft systems airspace integration concepts. arXiv preprint arXiv:1904.08477 (2019)

  40. Musavi, N., Onural, D., Gunes, K., Yildiz, Y.: Unmanned aircraft systems airspace integration: a game theoretical framework for concept evaluations. J. Guidance Control Dyn. 40, 96–109 (2016)

    Article  Google Scholar 

  41. Musavi, N., Tekelioğlu, K.B., Yildiz, Y., Gunes, K., Onural, D.: A game theoretical modeling and simulation framework for the integration of unmanned aircraft systems in to the national airspace. In: AIAA Infotech@ Aerospace (2016)

    Google Scholar 

  42. Polo, J., et al.: Resource-aware adaptive scheduling for MapReduce clusters. In: ACM/IFIP/USENIX International Conference on Distributed Systems Platforms and Open Distributed Processing, pp. 187–207. Springer, Berlin (2011)

    Google Scholar 

  43. Richa, A.W., Mitzenmacher, M., Sitaraman, R.: The power of two random choices: a survey of techniques and results. Comb. Optim. 9, 255–304 (2001)

    MathSciNet  MATH  Google Scholar 

  44. Saadatmand, S., Azizi, S., Kavousi, M., Wunsch, D.: Autonomous control of a line follower robot using a q-learning controller. In: 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0556–0561. IEEE (2020)

    Google Scholar 

  45. Saadatmand, S., Kavousi, M., Azizi, S.: The voltage regulation of boost converters using dual heuristic programming. In: 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0531–0536. IEEE (2020)

    Google Scholar 

  46. Saadatmand, S., Sanjarinia, M.S., Shamsi, P., Ferdowsi, M.: Dual heuristic dynamic programing control of grid-connected synchronverters. In: North American Power Symposium (NAPS), 2019, pp. 1–6 (2019)

    Google Scholar 

  47. Saadatmand, S., Sanjarinia, M.S., Shamsi, P., Ferdowsi, M., Wunsch, D.C.: Heuristic dynamic programming for adaptive virtual synchronous generators. In: North American Power Symposium (NAPS), 2019, pp. 1–6 (2019)

    Google Scholar 

  48. Sadiq, B., De Veciana, G.: Throughput optimality of delay-driven MaxWeight scheduler for a wireless system with flow dynamics. In: 47th Annual Allerton Conference on Communication, Control, and Computing, Allerton, pp. 1097–1102. IEEE (2009)

    Google Scholar 

  49. Salehi, M.: Optimal physiology-aware scheduling of clinical states in rural ambulance transport. In: 2017 International Conference on Inventive Computing and Informatics (ICICI), pp. 247–252. IEEE (2017)

    Google Scholar 

  50. Salehi, S., Du, J.T., Ashman, H.: Use of web search engines and personalisation in information searching for educational purposes. Inf. Res. Int. Electr. J. 23(2), n2 (2018)

    Google Scholar 

  51. Schwartz, C.: Web search engines. J. Am. Soc. Inf. Sci. 49(11), 973–982 (1998)

    Article  Google Scholar 

  52. Singh, V.P.: Two-server Markovian queues with balking: heterogeneous vs. homogeneous servers. Oper. Res. 18(1), 145–159 (1970)

    Article  MathSciNet  Google Scholar 

  53. Stolyar, A.L., et al.: MaxWeight scheduling in a generalized switch: state space collapse and workload minimization in heavy traffic. Ann. Appl. Probab. 14(1), 1–53 (2004)

    Article  MathSciNet  Google Scholar 

  54. van de Ven, P., Borst, S., Shneer, S.: Instability of MaxWeight scheduling algorithms. In: INFOCOM 2009, pp. 1701–1709. IEEE (2009)

    Google Scholar 

  55. Van Mieghem J.A.: Dynamic scheduling with convex delay costs: the generalized c| mu rule. Ann. Appl. Probab. 809–833 (1995)

    Google Scholar 

  56. Wang, C.-S., Stielau, O.H., Covic, G.A.: Design considerations for a contactless electric vehicle battery charger. IEEE Trans. Ind. Electr. 52(5), 1308–1314 (2005)

    Article  Google Scholar 

  57. Wang, W., Zhu, K., Ying, L., Tan, J., Zhang, L.: A throughput optimal algorithm for map task scheduling in MapReduce with data locality. ACM SIGMETRICS Perform. Eval. Rev. 40(4), 33–42 (2013)

    Article  Google Scholar 

  58. Wang, W., Zhu, K., Ying, L., Tan, J., Zhang, L.: Maptask scheduling in MapReduce with data locality: throughput and heavy-traffic optimality. IEEE/ACM Trans. Netw. (TON) 24(1), 190–203 (2016)

    Article  Google Scholar 

  59. Tom White: Hadoop: the definitive guide, yahoo (2010)

    Google Scholar 

  60. White, T.: Hadoop: The Definitive Guide. O’Reilly Media Inc., Newton (2012)

    Google Scholar 

  61. Winkler, F.: Consumerism in health care: beyond the supermarket model. Policy Politics 15(1), 1–8 (1987)

    Article  Google Scholar 

  62. Xie, Q., Lu, Y.: Priority algorithm for near-data scheduling: throughput and heavy-traffic optimality. In: 2015 IEEE Conference on Computer Communications (INFOCOM), pp. 963–972. IEEE (2015)

    Google Scholar 

  63. Xie, Q., Yekkehkhany, A., Lu, Y.: Scheduling with multi-level data locality: throughput and heavy-traffic optimality. In: INFOCOM 2016-The 35th Annual IEEE International Conference on Computer Communications. IEEE (2016)

    Google Scholar 

  64. Xie, X., Liu, Y., de Rijke, M., He, J., Zhang, M., Ma, S.: Why people search for images using web search engines. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 655–663. ACM (2018)

    Google Scholar 

  65. Yekkehkhany, A.: Near data scheduling for data centers with multi levels of data locality. Dissertation, University of Illinois at Urbana-Champaign (2017)

    Google Scholar 

  66. Yekkehkhany, A.: Risk-averse multi-armed bandits and game theory. Dissertation, University of Illinois at Urbana-Champaign (2020)

    Google Scholar 

  67. Yekkehkhany, A., Arian, E., Hajiesmaili, M., Nagi, R.: Risk-averse explore-then-commit algorithms for finite-time bandits. In: 2019 IEEE 58th Conference on Decision and Control (CDC) (2019)

    Google Scholar 

  68. Yekkehkhany, A., Arian, E., Nagi, R., Shomorony, I.: A cost-based analysis for risk-averse explore-then-commit finite-time bandits (2020)

    Google Scholar 

  69. Yekkehkhany, A., Hojjati, A., Hajiesmaili, M.H.: GB-PANDAS: throughput and heavy-traffic optimality analysis for affinity scheduling. ACM SIGMETRICS Perform. Eval. Rev. 45(2), 2–14 (2018)

    Article  Google Scholar 

  70. Yekkehkhany, A., Nagi, R.: Blind GB-PANDAS: a blind throughput-optimal load balancing algorithm for affinity scheduling. IEEE/ACM Trans. Netw. 28(3), 1199–1212 (2020)

    Article  Google Scholar 

  71. Yildiz, Y., Agogino, A., Brat, G.: Predicting pilot behavior in medium scale scenarios using game theory and reinforcement learning. In: AIAA Modeling and Simulation Technologies (MST) Conference, p. 4908 (2013)

    Google Scholar 

  72. Zaharia, M., Borthakur, D., Sarma, J.S., Elmeleegy, K., Shenker, S., Stoica, I.: Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: Proceedings of the 5th European Conference on Computer Systems, pp. 265–278. ACM (2010)

    Google Scholar 

  73. Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R.H., Stoica, I.: Improving mapreduce performance in heterogeneous environments. In: Osdi, vol. 8, p. 7 (2008)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Amirali Daghighi .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Daghighi, A., Chen, J.Q. (2022). Robustness Comparison of Scheduling Algorithms in MapReduce Framework. In: Arai, K. (eds) Intelligent Computing. Lecture Notes in Networks and Systems, vol 283. Springer, Cham. https://doi.org/10.1007/978-3-030-80119-9_30

Download citation

Publish with us

Policies and ethics