An empirical study of crash-inducing commits in Mozilla Firefox

Abstract

Software crashes are dreaded by both software organisations and end-users. Many software organisations embed automatic crash reporting tools in their software systems to help quality-assurance teams track and fix crash-related bugs. Previous approaches, which focused on the triaging of crash-types and crash-related bugs, can help software organisations debug crashes more efficiently. However, these approaches can only be applied after a software system has already been crashing for some time. To help software organisations detect and fix crash-prone code earlier, we examine the characteristics of commits that lead to crashes, which we call crash-inducing commits, in Mozilla Firefox. We observe that crash-inducing commits are often submitted by less experienced developers and that developers add and delete more lines of code in crash-inducing commits, yet need less effort to fix the bugs that these commits cause. We also characterise commits that lead to frequent crashes affecting a large user base, which we call highly impactful crash-inducing commits. Compared to other crash-related bugs, we observe that bugs due to highly impactful crash-inducing commits were reopened less often by developers and tend to be fixed by a single commit. We build predictive models to help software organisations detect and fix crash-prone commits early, at the time developers commit their code. Our predictive models achieve a precision of 61.2% and a recall of 94.5% when predicting crash-inducing commits, and a precision of 60.9% and a recall of 91.1% when predicting highly impactful crash-inducing commits. Software organisations could use our models and approach to track and fix crash-prone commits early, before they negatively impact users, thus increasing bug fixing efficiency and user-perceived quality.
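To make the prediction setup concrete, the sketch below illustrates the kind of commit-level classification the abstract describes: commit metrics in, a crash-inducing label out, evaluated with precision and recall. This is not the authors' implementation; the feature names, the synthetic data, and the labelling rule are illustrative assumptions, and a random forest stands in for whichever classifier the study actually used.

```python
# Minimal sketch (not the paper's code) of predicting crash-inducing
# commits from commit-level metrics, evaluated with precision and recall.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
n = 2000

# Hypothetical commit metrics, inspired by the abstract's findings:
# developer experience and the amount of code added/deleted.
X = np.column_stack([
    rng.integers(1, 500, n),   # developer experience: prior commit count
    rng.poisson(40, n),        # lines of code added
    rng.poisson(25, n),        # lines of code deleted
    rng.integers(1, 20, n),    # number of files touched
])

# Synthetic label: high churn by less experienced developers is treated
# as crash-prone (mirrors the abstract's observation; purely illustrative).
risk = (X[:, 1] + X[:, 2]) / (X[:, 0] + 1)
y = (risk > np.median(risk)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(f"precision={precision_score(y_te, pred):.3f} "
      f"recall={recall_score(y_te, pred):.3f}")
```

In a real replication, the features would be mined from the version control history and the labels derived by linking crash reports back to the commits that induced them, as the study does for Mozilla Firefox.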

Keywords

Crash analysis · Bug triaging · Prediction model · Mining software repositories

Acknowledgements

This work is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and by Fonds de Recherche du Québec – Nature et Technologies (FRQNT).


Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. SWAT Lab, DGIGL, Polytechnique Montréal, Montréal, Canada
  2. PTIDEJ Team, DGIGL, Polytechnique Montréal, Montréal, Canada
