
User-Aware Metrics for Measuring Quality of Parallel Job Schedules

  • Conference paper
Job Scheduling Strategies for Parallel Processing (JSSPP 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8828)

Abstract

The work presented in this paper is motivated by the challenges in designing scheduling algorithms for the Czech National Grid MetaCentrum. One of the most notable problems is our inability to efficiently analyze the quality of schedules. While it is possible to observe and measure certain aspects of generated schedules using various metrics, it is very challenging to choose a set of metrics that is representative of schedule quality. Without quantifying quality (either relatively or absolutely), we have no way to determine the impact of new algorithms and configurations on schedule quality prior to their deployment in a production service. The only two options we are left with are to use expert assessment or to simply deploy new solutions into production and observe their impact on user satisfaction. To approach this problem, we have designed a novel user-aware model and a metric that can overcome these issues by evaluating quality at the user level. The model assigns an expected end time (EET) to each job based on a fair partitioning of the system resources, modeling users' expectations. Using this calculated EET, we can then compare generated schedules in detail while also adequately visualizing schedule artifacts, allowing an expert to analyze them further. Moreover, we show how coupling this model with a job scheduling simulator enables an in-depth evaluation of scheduling algorithms before they are deployed into a production environment.
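To make the idea of an expected end time concrete, the following minimal Python sketch assigns an EET to each job under a deliberately simplified fair-partitioning rule. It is an illustration only, not the model defined in the paper: the equal static share per user, the sequential processing of each user's jobs, the `stretch` factor, and the names `Job`, `expected_end_times`, and `lateness_per_user` are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Job:
    user: str
    arrival: float     # submission time in seconds
    cpus: int          # requested CPU cores
    runtime: float     # runtime in seconds
    actual_end: float  # end time in the schedule being evaluated


def expected_end_times(jobs, total_cpus):
    """Return one hypothetical EET per job, in the order of `jobs`."""
    users = {job.user for job in jobs}
    # Assumption: every user owns an equal, static share of the CPUs.
    share = total_cpus / len(users)
    free_at = {user: 0.0 for user in users}
    eet = [0.0] * len(jobs)
    # Process jobs in arrival order, one at a time per user.
    for i in sorted(range(len(jobs)), key=lambda k: jobs[k].arrival):
        job = jobs[i]
        start = max(job.arrival, free_at[job.user])
        # Assumption: a job wider than the share is stretched in time;
        # a narrower job simply takes its own runtime.
        stretch = max(1.0, job.cpus / share)
        eet[i] = start + job.runtime * stretch
        free_at[job.user] = eet[i]
    return eet


def lateness_per_user(jobs, eet):
    """Sum, per user, how far each job ended past its hypothetical EET."""
    late = {}
    for job, expected in zip(jobs, eet):
        delay = max(0.0, job.actual_end - expected)
        late[job.user] = late.get(job.user, 0.0) + delay
    return late


# Illustrative data, not taken from any real workload trace.
jobs = [
    Job("alice", arrival=0, cpus=4, runtime=100, actual_end=120),
    Job("alice", arrival=10, cpus=8, runtime=50, actual_end=180),
    Job("bob", arrival=5, cpus=2, runtime=40, actual_end=45),
]
eet = expected_end_times(jobs, total_cpus=8)
print(eet)
print(lateness_per_user(jobs, eet))
```

Comparing two schedules then amounts to running the same jobs with different `actual_end` values and comparing the per-user results. A production-grade model would need a more careful notion of fair share than the equal split assumed here.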

Notes

  1. Production systems (including MetaCentrum) usually employ some form of anti-starvation technique. Since this approach goes directly against the order suggested by the job-related metric, it naturally leads to skewed results.

  2. \(Resc_1\) and \(Resc_2\) represent resources, e.g., CPU cores.

  3. Depending on the implementation, fairshare can also prevent usage spikes.

  4. By default, we assume that the provided schedule is a historic schedule as found in a workload trace. If needed, it can be extended for use within a “live” scheduler.

  5. Which is better: a more dispersed distribution with a better median, or a less dispersed distribution?

  6. A box plot preserves information on the distribution of \( VEET_u \) values by showing their minimum, lower quartile, median, upper quartile, and maximum, plus possible extreme outliers marked as dots.
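The box-plot summary described in note 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the sample values are invented, the function name `box_plot_summary` is hypothetical, and the 1.5 × IQR rule for separating outliers is a common plotting convention that the source does not confirm.

```python
import statistics


def box_plot_summary(values, whisker=1.5):
    """Five-number summary plus outliers for a list of per-user values."""
    data = sorted(values)
    # Inclusive quartiles treat the data as the whole population.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    # Assumption: points beyond the fences are drawn as outlier dots,
    # and "min"/"max" are the extreme values inside the fences.
    inside = [v for v in data if low <= v <= high]
    outliers = [v for v in data if v < low or v > high]
    return {
        "min": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": inside[-1],
        "outliers": outliers,
    }


# Invented per-user values, for illustration only.
summary = box_plot_summary([0, 10, 20, 30, 40, 200])
print(summary)
```

Two schedules can then be compared by placing their summaries side by side, which makes the trade-off raised in note 5 (dispersion versus median) visible.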

Acknowledgments

We highly appreciate the support of the Grant Agency of the Czech Republic under the grant No. P202/12/0306. The access to the MetaCentrum workloads is kindly acknowledged.

Author information

Corresponding author

Correspondence to Dalibor Klusáček.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Tóth, Š., Klusáček, D. (2015). User-Aware Metrics for Measuring Quality of Parallel Job Schedules. In: Cirne, W., Desai, N. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2014. Lecture Notes in Computer Science, vol 8828. Springer, Cham. https://doi.org/10.1007/978-3-319-15789-4_6

  • DOI: https://doi.org/10.1007/978-3-319-15789-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15788-7

  • Online ISBN: 978-3-319-15789-4

  • eBook Packages: Computer Science (R0)
