Abstract
This paper reviews the main concepts of software testing, its difficulties, and the impossibility of testing software completely. It then proposes an approach to predict which modules are defective, so that the usually limited software testing resources can be allocated to maximize coverage of the modules most prone to defects. The approach employs the recently proposed Kaizen Programming (KP) to automatically discover high-quality nonlinear combinations of a database's original features for use by the classification technique, replacing a human in the feature engineering process. Using an open NASA dataset with software metrics of over 9500 modules, the experimental analysis shows that the new features can significantly boost the detection of defective modules, allowing testers to find 216% more defects than with random module selection; this is also an improvement of 1% over the original features.
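To make the "more defects than random selection" comparison concrete, the following is a minimal sketch of how such a gain could be computed for a fixed test budget. The abstract does not specify the evaluation protocol, so the function names, the budget parameter, and the toy labels and scores are all illustrative assumptions and not the authors' method or data.

```python
# Illustrative sketch only: names, budget, and data are assumptions,
# not taken from the paper.

def expected_random_hits(labels, budget):
    """Expected defective modules found when `budget` modules are
    chosen uniformly at random (labels: 1 = defective, 0 = clean)."""
    return budget * sum(labels) / len(labels)


def ranked_hits(labels, scores, budget):
    """Defective modules found when testing the `budget` modules
    with the highest predicted defect scores."""
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:budget])


def percent_gain_over_random(labels, scores, budget):
    """Relative gain (in percent) of score-based selection over the
    random-selection expectation."""
    baseline = expected_random_hits(labels, budget)
    return 100.0 * (ranked_hits(labels, scores, budget) - baseline) / baseline


# Toy example: 10 modules, 3 of them defective, made-up classifier scores.
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.1, 0.8, 0.7, 0.3, 0.2, 0.4, 0.1, 0.05]
gain = percent_gain_over_random(labels, scores, budget=3)
print(f"gain over random selection: {gain:.1f}%")
```

Under this reading, a gain of 216% would mean that score-based selection finds 3.16 times as many defective modules as random selection is expected to find with the same budget.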
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Nascimento, A.M., de Melo, V.V., Dias, L.A.V., da Cunha, A.M. (2018). Increasing the Prediction Quality of Software Defective Modules with Automatic Feature Engineering. In: Latifi, S. (eds) Information Technology - New Generations. Advances in Intelligent Systems and Computing, vol 738. Springer, Cham. https://doi.org/10.1007/978-3-319-77028-4_68
DOI: https://doi.org/10.1007/978-3-319-77028-4_68
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77027-7
Online ISBN: 978-3-319-77028-4
eBook Packages: Engineering, Engineering (R0)