Abstract
Jobs in a computer cluster terminate with one of several exit statuses, determined by application properties and by user and scheduler behavior. In this paper we analyze the importance of job statuses and the potential use of predicting them prior to job execution. A method for predicting failed jobs based on a Bayesian classifier is proposed, and its accuracy is analyzed on several workloads. The method is integrated into the EASY algorithm, which is adapted to prioritize jobs that are likely to fail. System performance is analyzed for both the failed jobs and the entire workload.
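The idea of predicting a job's status before it runs can be illustrated with a minimal sketch. The abstract does not state which job attributes or which Bayesian variant the authors use, so everything below is an assumption made for illustration: a categorical naive Bayes classifier with Laplace smoothing, two hypothetical features (user and queue), and hypothetical status labels "failed" and "completed". The `prioritize` helper only moves predicted failures to the front of the waiting queue; it is not the adapted EASY backfilling algorithm from the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesJobClassifier:
    """Categorical naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # smoothing constant
        self.class_counts = Counter()             # status -> number of jobs
        self.value_counts = defaultdict(Counter)  # (feature index, status) -> value counts
        self.vocab = defaultdict(set)             # feature index -> values seen in training

    def fit(self, jobs, statuses):
        for features, status in zip(jobs, statuses):
            self.class_counts[status] += 1
            for i, value in enumerate(features):
                self.value_counts[(i, status)][value] += 1
                self.vocab[i].add(value)
        return self

    def log_scores(self, features):
        """Return the unnormalized log posterior of each status."""
        total = sum(self.class_counts.values())
        scores = {}
        for status, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for i, value in enumerate(features):
                count = self.value_counts[(i, status)][value]
                # One extra slot in the denominator reserves mass for unseen values.
                denom = n + self.alpha * (len(self.vocab[i]) + 1)
                score += math.log((count + self.alpha) / denom)
            scores[status] = score
        return scores

    def predict(self, features):
        scores = self.log_scores(features)
        return max(scores, key=scores.get)


def prioritize(queue, classifier, failed_label="failed"):
    """Move jobs predicted to fail to the front; the sort is stable,
    so arrival order is preserved within each group."""
    return sorted(
        queue,
        key=lambda job: classifier.predict(job["features"]) != failed_label,
    )


# Hypothetical job history: (user, queue) -> observed status.
history = [
    (("alice", "short"), "completed"),
    (("alice", "short"), "completed"),
    (("bob", "long"), "failed"),
    (("bob", "long"), "failed"),
    (("bob", "short"), "completed"),
    (("carol", "long"), "failed"),
]
clf = NaiveBayesJobClassifier().fit(
    [features for features, _ in history],
    [status for _, status in history],
)

waiting = [
    {"id": 1, "features": ("alice", "short")},
    {"id": 2, "features": ("bob", "long")},
    {"id": 3, "features": ("alice", "short")},
]
ordered_ids = [job["id"] for job in prioritize(waiting, clf)]
print(ordered_ids)
```

Running likely failures early is one plausible reading of "prioritize jobs that are likely to fail": a job that is going to fail anyway gives its user feedback sooner and releases its resources quickly. A real scheduler would combine this ordering with EASY's reservation and backfilling rules, which this sketch leaves out.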
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grudenic, I., Bogunovic, N. (2011). Job Status Prediction – Catch Them Before They Fail. In: Riekki, J., Ylianttila, M., Guo, M. (eds) Advances in Grid and Pervasive Computing. GPC 2011. Lecture Notes in Computer Science, vol 6646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20754-9_2
DOI: https://doi.org/10.1007/978-3-642-20754-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20753-2
Online ISBN: 978-3-642-20754-9
eBook Packages: Computer Science, Computer Science (R0)