Evaluating the Impact of Soft Walltimes on Job Scheduling Performance

  • Dalibor Klusáček
  • Václav Chlumský
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11332)

Abstract

For two decades, researchers have been analyzing the impact of inaccurate job walltime (runtime) estimates on the performance of job scheduling algorithms, especially in the case of backfilling. Several studies have analyzed the pros and cons of using accurate vs. inaccurate estimates, and some researchers have focused on ways to motivate users to provide more accurate runtime estimates. The recent addition of the so-called “soft walltime” parameter to the widely used PBS Professional enables a system administrator to actually use some of these techniques to refine user-provided walltime estimates. The obvious question for a system administrator is whether such walltime predictions are useful and “safe”, and what their impact on overall system performance will be. In this work, we use several detailed simulations to analyze the actual impact of using soft walltimes in a job scheduler, and we discuss the scenarios in which such “refined” estimates can be meaningfully used.

Keywords

Job Scheduling, Backfilling, Walltime estimate, Soft walltime

Notes

Acknowledgments

We kindly acknowledge the support and computational resources provided by MetaCentrum under the programme LM2015042 and by the CERIT Scientific Cloud under the programme LM2015085, both provided under the programme “Projects of Large Infrastructure for Research, Development, and Innovations”, and the project Reg. No. CZ.02.1.01/0.0/0.0/16_013/0001797 co-funded by the Ministry of Education, Youth and Sports of the Czech Republic. We also highly appreciate the access to the workload traces provided by the Parallel Workloads Archive, MetaCentrum and CERIT-SC.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. CESNET a.l.e., Brno, Czech Republic
