Abstract
The work presented in this paper is motivated by the challenges in designing scheduling algorithms for the Czech National Grid MetaCentrum. One of the most notable problems is our inability to efficiently analyze the quality of schedules. While it is still possible to observe and measure certain aspects of generated schedules using various metrics, it is very challenging to choose a set of metrics that is representative of schedule quality. Without quality quantification (either relative or absolute), we have no way to determine the impact of new algorithms and configurations on schedule quality prior to their deployment in a production service. The only two options we are left with are to either use expert assessment or to simply deploy new solutions into production and observe their impact on user satisfaction. To approach this problem, we have designed a novel user-aware model and a metric that can overcome the presented issues by evaluating quality at the user level. The model assigns an expected end time (EET) to each job based on a fair partitioning of the system resources, modeling users' expectations. Using this calculated EET, we can then compare generated schedules in detail, while also being able to adequately visualize schedule artifacts, allowing an expert to analyze them further. Moreover, we show how coupling this model with a job scheduling simulator gives us the ability to perform an in-depth evaluation of scheduling algorithms before they are deployed into a production environment.
Notes
1. Production systems (including MetaCentrum) usually employ some type of anti-starvation technique. Since this approach goes directly against the order suggested by the job-related metric, it naturally leads to skewed results.
2. \(Resc_1\) and \(Resc_2\) represent resources, e.g., CPU cores.
3. Depending on the implementation, fairshare can also prevent usage spikes.
4. By default, we assume that this provided schedule is a historic schedule as found in a workload trace. If needed, it can be extended for use within a “live” scheduler.
5. Which is better: a more dispersed distribution with a better median, or a less dispersed distribution?
6. A box plot preserves information on the distribution of \( VEET_u \) values by showing their minimum, lower quartile, median, upper quartile, and maximum, plus possible extreme outliers marked as dots.
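The five values that a box plot displays can be computed directly. The following is a minimal sketch, not the authors' tooling: the function name `five_number_summary` and the sample values are hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since plotting tools differ in how they interpolate quartiles. Outlier detection is intentionally left out.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Quartiles use the "inclusive" interpolation method; this choice is an
    assumption made for illustration, other conventions give other values.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical per-user values, invented purely for illustration.
sample_values = [0.0, 1.0, 2.0, 3.0, 4.0]
summary = five_number_summary(sample_values)
print(summary)
```

Two distributions with the same median can differ widely in the distance between their quartiles, which is why the summary keeps all five values rather than a single number.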
Acknowledgments
We highly appreciate the support of the Grant Agency of the Czech Republic under the grant No. P202/12/0306. The access to the MetaCentrum workloads is kindly acknowledged.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Tóth, Š., Klusáček, D. (2015). User-Aware Metrics for Measuring Quality of Parallel Job Schedules. In: Cirne, W., Desai, N. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2014. Lecture Notes in Computer Science, vol 8828. Springer, Cham. https://doi.org/10.1007/978-3-319-15789-4_6
DOI: https://doi.org/10.1007/978-3-319-15789-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15788-7
Online ISBN: 978-3-319-15789-4
eBook Packages: Computer Science (R0)