
Using Inaccurate Estimates Accurately

  • Conference paper
Job Scheduling Strategies for Parallel Processing (JSSPP 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6253)


Abstract

Job schedulers improve system utilization by requiring users to estimate how long their jobs will run and by using this information to better pack (or “backfill”) the jobs. Surprisingly, many studies find that deliberately making estimates less accurate boosts (or does not affect) performance, which helps explain why production systems still rely exclusively on notoriously inaccurate estimates.

We prove these studies wrong by showing that their methodology is erroneous. The studies model an estimate e as being correlated with r·F (where r is the runtime of the associated job, F is some “badness” factor, and larger F values imply increased inaccuracy). We show this model is invalid, because: (1) it conveys too much information to the scheduler; (2) it induces favoritism of short jobs; and (3) it is inherently different from real user inaccuracy, which associates 90% of the jobs with merely 20 estimate values, hindering the scheduler’s ability to backfill.
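
As a rough illustration of the criticized model, the following Python sketch assumes the simplest deterministic form, e = r·F with a constant F. The helper name `multiplicative_estimates`, the runtimes, and the value F = 3 are all made up for illustration and are not taken from the paper. Under these assumptions the estimates rank the jobs exactly as their true runtimes do, and the absolute slack e − r grows with r, which is one possible reading of points (1) and (2).

```python
def multiplicative_estimates(runtimes, f):
    """Hypothetical helper: model each estimate as e = r * f, with f >= 1."""
    if f < 1:
        raise ValueError("f must be >= 1 so that no estimate undershoots its runtime")
    return [r * f for r in runtimes]


# Illustrative runtimes in seconds (made-up values, not from any workload trace).
runtimes = [30, 600, 3600, 86400]
estimates = multiplicative_estimates(runtimes, f=3)

# With a constant factor, sorting jobs by estimate gives the same order as
# sorting them by true runtime, so the scheduler still sees the exact ranking.
rank_by_runtime = sorted(range(len(runtimes)), key=lambda i: runtimes[i])
rank_by_estimate = sorted(range(len(estimates)), key=lambda i: estimates[i])

# The absolute overestimate e - r = r * (f - 1) is small for short jobs and
# large for long ones, so short jobs keep comparatively tight estimates.
slack = [e - r for e, r in zip(estimates, runtimes)]

print(rank_by_runtime == rank_by_estimate)
print(slack)
```

Every job also receives a distinct estimate here, in contrast to the handful of repeated values that point (3) attributes to real users.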

We conclude that researchers must stop using multiples of runtimes as estimates, or else their results will likely be invalid. We develop (and propose to use) a realistic model that preserves the estimates’ modality and makes it possible to soundly simulate increased inaccuracy by, e.g., associating more jobs with the maximal runtime allowed (an always-popular estimate, which prevents backfilling).
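
The idea of modal estimates can be sketched as follows. This is not the model developed in the paper, only a minimal Python illustration under stated assumptions: the `POPULAR` menu of estimate values, the function `modal_estimates`, and the parameter `p_max` (the probability that a job is assigned the maximal allowed runtime) are all hypothetical names and values introduced here.

```python
import bisect
import random

# Hypothetical menu of "popular" estimate values in seconds (made up for
# illustration); the last entry plays the role of the maximal runtime allowed.
POPULAR = [300, 900, 3600, 14400, 43200, 86400]


def modal_estimates(runtimes, p_max, seed=0):
    """Assign each job an estimate drawn from the small POPULAR menu.

    With probability p_max a job gets the maximal value; otherwise it gets the
    smallest popular value that is >= its runtime. Raising p_max is the assumed
    knob for simulating increased inaccuracy.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    estimates = []
    for r in runtimes:
        if r > POPULAR[-1]:
            raise ValueError("runtime exceeds the maximal runtime allowed")
        if rng.random() < p_max:
            estimates.append(POPULAR[-1])
        else:
            # bisect_left returns the index of the first menu value >= r
            estimates.append(POPULAR[bisect.bisect_left(POPULAR, r)])
    return estimates


runtimes = [30, 600, 3600, 80000]  # illustrative runtimes in seconds
accurate = modal_estimates(runtimes, p_max=0.0)  # tightest menu values
noisy = modal_estimates(runtimes, p_max=0.5)     # about half pushed to the max
worst = modal_estimates(runtimes, p_max=1.0)     # every job at the max
```

In this sketch all estimates remain on the same short menu at every inaccuracy level, so the modality is preserved; only the share of jobs sitting at the maximal value changes.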




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tsafrir, D. (2010). Using Inaccurate Estimates Accurately. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2010. Lecture Notes in Computer Science, vol 6253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16505-4_12


  • DOI: https://doi.org/10.1007/978-3-642-16505-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16504-7

  • Online ISBN: 978-3-642-16505-4

  • eBook Packages: Computer Science (R0)