EQ-Mine: Predicting Short-Term Defects for Software Evolution
We use 63 features extracted from sources such as versioning and issue tracking systems to predict defects in short time frames of two months. Our multivariate approach covers aspects of software projects such as size, team structure, process orientation, complexity of the existing solution, difficulty of the problem, coupling aspects, time constraints, and testing data. We investigate the predictability of several severities of defects in software projects. Are defects with high severity difficult to predict? Are prediction models for defects that are discovered by internal staff similar to models for defects reported from the field?
We present both an exact numerical prediction of future defect numbers based on regression models and a classification of software components as defect-prone based on the C4.5 decision tree. We create models to accurately predict short-term defects in a study of 5 applications composed of more than 8,000 classes and 700,000 lines of code. The model quality is assessed based on 10-fold cross validation.
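The evaluation scheme described above can be illustrated with a minimal sketch of 10-fold cross validation. Everything in it is an assumption made for illustration: the single feature `churn`, the synthetic data, the one-variable least-squares fit, and the `threshold` that turns a predicted defect count into a defect-prone label. In particular, the thresholding step is a simple stand-in and not the C4.5 decision tree, and the sketch does not reproduce the 63 features or the models of the study.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Split indices 0..n-1 into k shuffled, nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def cross_validate(xs, ys, k=10, threshold=1.0):
    """Return (mean absolute error, label accuracy) over k held-out folds."""
    abs_err, correct, total = 0.0, 0, 0
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            pred = a + b * xs[i]
            abs_err += abs(pred - ys[i])
            # Hypothetical rule: defect-prone if the count reaches the threshold.
            correct += (pred >= threshold) == (ys[i] >= threshold)
            total += 1
    return abs_err / total, correct / total

# Synthetic, made-up data: defect counts loosely proportional to churn.
rng = random.Random(42)
churn = [rng.uniform(0, 100) for _ in range(200)]
defects = [0.05 * c + rng.gauss(0, 0.5) for c in churn]
mae, acc = cross_validate(churn, defects)
print(f"mean absolute error: {mae:.3f}, label accuracy: {acc:.3f}")
```

Each observation is predicted exactly once by a model that never saw it during fitting, which is what makes the averaged error an estimate of performance on unseen data rather than a measure of fit.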
Keywords: Software Evolution, Defect Density, Quality Prediction, Machine Learning, Regression, Classification