Job Status Prediction – Catch Them Before They Fail

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6646)

Abstract

Jobs in a computer cluster can terminate with several exit statuses, which are caused by application properties and by user and scheduler behavior. In this paper we analyze the importance of job statuses and the potential use of predicting them prior to job execution. A method for predicting failed jobs based on a Bayesian classifier is proposed, and its accuracy is analyzed on several workloads. The method is integrated into the EASY algorithm, which is adapted to prioritize jobs that are likely to fail. System performance is analyzed for both the failed jobs and the entire workload.
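
The abstract does not say which job attributes, which Bayesian variant, or which EASY modification the authors use, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a categorical naive Bayes classifier with Laplace smoothing, hypothetical features (user name and a job size class), and hypothetical status labels "failed" and "completed". The `order_queue` helper is likewise hypothetical: it only sorts waiting jobs by predicted failure probability and does not implement EASY backfilling.

```python
from collections import Counter, defaultdict
import math


class NaiveBayesJobClassifier:
    """Categorical naive Bayes with Laplace smoothing (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing constant, assumed > 0
        self.class_counts = Counter()           # status -> number of past jobs
        # feature_counts[status][feature_index][value] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.feature_values = defaultdict(set)  # feature_index -> seen values

    def fit(self, jobs, statuses):
        """Count how often each feature value occurs with each status."""
        for features, status in zip(jobs, statuses):
            self.class_counts[status] += 1
            for i, value in enumerate(features):
                self.feature_counts[status][i][value] += 1
                self.feature_values[i].add(value)
        return self

    def predict_proba(self, features):
        """Return a dict mapping each known status to its posterior probability."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for status, n in self.class_counts.items():
            score = math.log(n / total)         # log prior
            for i, value in enumerate(features):
                count = self.feature_counts[status][i][value]
                k = len(self.feature_values[i])
                score += math.log((count + self.alpha) / (n + self.alpha * k))
            log_scores[status] = score
        # Normalize in log space to avoid underflow.
        top = max(log_scores.values())
        weights = {s: math.exp(v - top) for s, v in log_scores.items()}
        z = sum(weights.values())
        return {s: w / z for s, w in weights.items()}


def order_queue(classifier, queue, failed_label="failed"):
    """Hypothetical helper: put jobs most likely to fail at the front."""
    def failure_probability(job):
        return classifier.predict_proba(job["features"]).get(failed_label, 0.0)
    return sorted(queue, key=failure_probability, reverse=True)


# Made-up history: (user, size class) and the observed exit status.
history = [("alice", "small"), ("alice", "small"),
           ("bob", "large"), ("bob", "large"), ("bob", "small")]
outcomes = ["failed", "failed", "completed", "completed", "completed"]

model = NaiveBayesJobClassifier().fit(history, outcomes)
waiting = [{"id": 1, "features": ("bob", "large")},
           {"id": 2, "features": ("alice", "small")}]
for job in order_queue(model, waiting):
    print(job["id"], round(model.predict_proba(job["features"])["failed"], 3))
```

Running likely failures early means a job that crashes shortly after start releases its resources sooner. Whether that helps the rest of the workload is the question the paper evaluates, and this sketch makes no claim about the outcome.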

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grudenic, I., Bogunovic, N. (2011). Job Status Prediction – Catch Them Before They Fail. In: Riekki, J., Ylianttila, M., Guo, M. (eds) Advances in Grid and Pervasive Computing. GPC 2011. Lecture Notes in Computer Science, vol 6646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20754-9_2

  • DOI: https://doi.org/10.1007/978-3-642-20754-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20753-2

  • Online ISBN: 978-3-642-20754-9

  • eBook Packages: Computer Science, Computer Science (R0)
