EQ-Mine: Predicting Short-Term Defects for Software Evolution

  • Jacek Ratzinger
  • Martin Pinzger
  • Harald Gall
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4422)

Abstract

We use 63 features extracted from sources such as versioning and issue tracking systems to predict defects in short time frames of two months. Our multivariate approach covers aspects of software projects such as size, team structure, process orientation, complexity of the existing solution, difficulty of the problem, coupling aspects, time constraints, and testing data. We investigate the predictability of several severities of defects in software projects. Are defects with high severity difficult to predict? Are prediction models for defects that are discovered by internal staff similar to models for defects reported from the field?

We present both an exact numerical prediction of future defect numbers based on regression models and a classification of software components as defect-prone based on the C4.5 decision tree. We create models to accurately predict short-term defects in a study of 5 applications composed of more than 8,000 classes and 700,000 lines of code. Model quality is assessed with 10-fold cross-validation.
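
As a rough illustration of the evaluation scheme mentioned above, the following sketch runs 10-fold cross-validation on a single-feature least-squares regression. It is not the authors' implementation: the data are synthetic, the single feature (changes per class) stands in for the 63 features of the study, and all function names are hypothetical. It assumes there are at least as many samples as folds.

```python
import random
from statistics import mean

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def cross_validate(xs, ys, k=10, seed=0):
    """Return the mean absolute error averaged over k held-out folds."""
    errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        test = set(held_out)
        train = [i for i in range(len(xs)) if i not in test]
        # Fit on the other k-1 folds, then score on the held-out fold only.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mean(abs(ys[i] - (a + b * xs[i])) for i in held_out))
    return mean(errors)

# Synthetic, made-up data: changes per class versus a noisy defect count.
rng = random.Random(42)
changes = [rng.randint(0, 30) for _ in range(200)]
defects = [0.2 * c + rng.gauss(0, 1) for c in changes]
mae = cross_validate(changes, defects)
print(f"10-fold mean absolute error: {mae:.2f}")
```

Each sample is held out exactly once, so the averaged error reflects predictions on data the model did not see during fitting. The same loop could wrap a classifier such as a decision tree in place of `fit_line`.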

Keywords

Software Evolution, Defect Density, Quality Prediction, Machine Learning, Regression, Classification

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Jacek Ratzinger (1)
  • Martin Pinzger (2)
  • Harald Gall (2)
  1. Distributed Systems Group, Vienna University of Technology, Austria
  2. s.e.a.l. – software evolution and architecture lab, University of Zurich, Switzerland