
To Fail or Not to Fail: Predicting Hard Disk Drive Failure Time Windows

  • Marwin Züfle
  • Christian Krupitzer
  • Florian Erhard
  • Johannes Grohmann
  • Samuel Kounev
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12040)

Abstract

Due to the increasing size of today’s data centers and the expectation of 24/7 availability, the complexity of hardware administration continuously increases. Techniques such as the Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.) support the monitoring of hardware. However, those techniques often lack algorithms for intelligent data analytics. In particular, integrating machine learning to identify potential failures in advance seems promising for reducing administration overhead. In this work, we present three machine learning approaches to (i) identify imminent failures, (ii) predict time windows for failures, and (iii) predict the exact time-to-failure. In a case study with real data from 369 hard disks, we achieve an F1-score of up to 98.0% and 97.6% for predicting potential failures with two or multiple time windows, respectively, and a hit rate of 84.9% (with a mean absolute error of 4.5 h) for predicting the time-to-failure.
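To make the three prediction tasks and their metrics concrete, the following is a minimal Python sketch. It is not the authors' implementation. The window boundaries in `WINDOW_BOUNDS_H` and all function names are hypothetical, chosen only for illustration, since the abstract does not state the actual windows or labeling method. The sketch shows how a known time-to-failure could be mapped to a time-window class, and how an F1-score (for classification) and a mean absolute error (for regression) are computed.

```python
from bisect import bisect_left

# Hypothetical upper bounds of the failure time windows, in hours.
# The windows used in the paper are not given in the abstract.
WINDOW_BOUNDS_H = [24, 48, 96, 168]


def window_label(hours_to_failure, bounds=WINDOW_BOUNDS_H):
    """Map a time-to-failure (hours) to a time-window class index.

    Returns the index of the first window whose upper bound is >= the value.
    Values beyond the last bound get the label len(bounds), which stands
    for "no failure expected within the covered horizon".
    """
    return bisect_left(bounds, hours_to_failure)


def f1_score(y_true, y_pred, positive=1):
    """F1-score of one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

With two windows only (a single bound), `window_label` reduces to the binary "imminent failure or not" case; with several bounds it yields the multi-class time-window labels. The regression task would instead predict the hours value directly and be scored with `mean_absolute_error`.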

Keywords

Failure prediction · S.M.A.R.T. · Machine learning · Labeling methods · Classification · Regression · Cloud computing

Notes

Acknowledgements

This work was co-funded by the German Research Foundation (DFG) under grant No. KO 3445/11-1 and the IHK (Industrie- und Handelskammer) Würzburg-Schweinfurt.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Marwin Züfle (1)
  • Christian Krupitzer (1)
  • Florian Erhard (1)
  • Johannes Grohmann (1)
  • Samuel Kounev (1)

  1. Software Engineering Group, University of Würzburg, Würzburg, Germany
