Job Status Prediction – Catch Them Before They Fail

Grudenic, Igor; Bogunovic, Nikola

doi:10.1007/978-3-642-20754-9_2

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6646)

Abstract

Jobs in a computer cluster can end with several exit statuses, which are determined by application properties and by user and scheduler behavior. In this paper we analyze the importance of job statuses and the potential use of predicting them prior to job execution. A method for predicting failed jobs based on a Bayesian classifier is proposed, and its accuracy is analyzed on several workloads. The method is integrated into the EASY algorithm, which is adapted to prioritize jobs that are likely to fail. System performance is analyzed for both the failed jobs and the entire workload.
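The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which job attributes or which Bayesian classifier variant the paper uses, so the categorical naive Bayes model with Laplace smoothing, the feature names (`user`, `queue`), the status labels (`failed`, `completed`), the 0.5 threshold, and the helper `order_queue` are all assumptions made for illustration. The sketch also only reorders the waiting queue so that jobs predicted to fail come first; it does not implement EASY backfilling itself.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesJobClassifier:
    """Categorical naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing strength (assumed value)
        self.class_counts = Counter()           # label -> number of training jobs
        # label -> feature name -> feature value -> count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.feature_values = defaultdict(set)  # feature name -> values seen

    def fit(self, jobs, labels):
        """Count label and per-label feature frequencies from past jobs."""
        for job, label in zip(jobs, labels):
            self.class_counts[label] += 1
            for name, value in job.items():
                self.feature_counts[label][name][value] += 1
                self.feature_values[name].add(value)
        return self

    def predict_proba(self, job):
        """Return a dict mapping each label to its posterior probability."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)   # log prior
            for name, value in job.items():
                count = self.feature_counts[label][name][value]
                # Count an unseen value as one extra category for smoothing.
                n_values = len(self.feature_values[name] | {value})
                score += math.log(
                    (count + self.alpha) / (n_label + self.alpha * n_values)
                )
            log_scores[label] = score
        # Normalize in log space to avoid underflow.
        top = max(log_scores.values())
        unnorm = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(unnorm.values())
        return {k: v / z for k, v in unnorm.items()}


def order_queue(queue, clf, threshold=0.5):
    """Put jobs predicted to fail first; keep arrival order otherwise."""
    def likely_to_fail(job):
        return clf.predict_proba(job["features"]).get("failed", 0.0) >= threshold

    # sorted() is stable, and False sorts before True.
    return sorted(queue, key=lambda job: not likely_to_fail(job))


# Hypothetical job history: attributes known before execution, plus final status.
history = [
    {"user": "alice", "queue": "short"},
    {"user": "alice", "queue": "short"},
    {"user": "bob", "queue": "long"},
    {"user": "bob", "queue": "long"},
    {"user": "bob", "queue": "short"},
]
statuses = ["completed", "completed", "failed", "failed", "completed"]

clf = NaiveBayesJobClassifier().fit(history, statuses)

waiting = [
    {"id": 1, "features": {"user": "alice", "queue": "short"}},
    {"id": 2, "features": {"user": "bob", "queue": "long"}},
    {"id": 3, "features": {"user": "alice", "queue": "short"}},
]
ordered_ids = [job["id"] for job in order_queue(waiting, clf)]
print(ordered_ids)
```

In this toy history, the job submitted by `bob` to the `long` queue gets the highest failure probability and is moved to the head of the queue, while the other jobs keep their arrival order. A real scheduler would still have to apply its reservation and backfilling rules to the reordered queue.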

Author information

Authors and Affiliations

Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000, Zagreb, Croatia
Igor Grudenic & Nikola Bogunovic

Editor information

Editors and Affiliations

Department of Electrical and Information Engineering, University of Oulu, 90014, Oulu, Finland
Jukka Riekki & Mika Ylianttila
Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240, Minhang, Shanghai, China
Minyi Guo

Cite this paper

Grudenic, I., Bogunovic, N. (2011). Job Status Prediction – Catch Them Before They Fail. In: Riekki, J., Ylianttila, M., Guo, M. (eds) Advances in Grid and Pervasive Computing. GPC 2011. Lecture Notes in Computer Science, vol 6646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20754-9_2

DOI: https://doi.org/10.1007/978-3-642-20754-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20753-2
Online ISBN: 978-3-642-20754-9
eBook Packages: Computer Science, Computer Science (R0)
