Increasing the Prediction Quality of Software Defective Modules with Automatic Feature Engineering

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 738)

Abstract

This paper reviews the main concepts of software testing, its difficulties, and the impossibility of testing software completely. It then proposes an approach for predicting which modules are defective, so that the usually limited software test resources can be distributed wisely to maximize coverage of the modules most prone to defects. The approach employs the recently proposed Kaizen Programming (KP) to automatically discover high-quality nonlinear combinations of the original features of a database for use by the classification technique, replacing a human in the feature engineering process. Using a NASA open dataset with software metrics of over 9500 modules, the experimental analysis shows that the new features can significantly boost the detection of defective modules, allowing testers to find 216% more defects than with a random module selection; this is also an improvement of 1% when compared to the original features.
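To make the "more defects than with a random module selection" comparison concrete, the sketch below shows one plausible way such a gain could be computed: rank modules by a predicted defect score, test only the top-ranked modules that fit a fixed budget, and compare the defective modules found against the number expected from a uniformly random pick of the same size. The abstract does not state the evaluation protocol, so this formula, the function names, and the toy scores and labels are all illustrative assumptions, not the authors' method or data.

```python
# Hypothetical illustration: gain of a ranked test plan over random selection.
# All names and numbers here are assumptions made for the example.

def expected_random_hits(labels, budget):
    """Expected defective modules in a uniformly random sample of `budget` modules."""
    return budget * sum(labels) / len(labels)

def hits_at_budget(scores, labels, budget):
    """Defective modules found when testing the `budget` highest-scored modules."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:budget])

def gain_over_random(scores, labels, budget):
    """Percentage of additional defective modules found relative to random selection."""
    baseline = expected_random_hits(labels, budget)
    return 100.0 * (hits_at_budget(scores, labels, budget) - baseline) / baseline

# Toy data: 1 marks a defective module, 0 a clean one (made up for this sketch).
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
# Made-up classifier scores; higher means "more likely defective".
scores = [0.9, 0.2, 0.1, 0.8, 0.3, 0.4, 0.05, 0.35, 0.6, 0.15]
budget = 3  # number of modules the test team can afford to inspect

found = hits_at_budget(scores, labels, budget)
baseline = expected_random_hits(labels, budget)
gain = gain_over_random(scores, labels, budget)
print(f"found={found}, random baseline={baseline:.2f}, gain={gain:.1f}%")
```

Under this reading, a gain of 216% would mean the ranked plan finds a little more than three times as many defective modules as a random plan with the same budget.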




Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Nascimento, A.M., de Melo, V.V., Dias, L.A.V., da Cunha, A.M. (2018). Increasing the Prediction Quality of Software Defective Modules with Automatic Feature Engineering. In: Latifi, S. (eds) Information Technology - New Generations. Advances in Intelligent Systems and Computing, vol 738. Springer, Cham. https://doi.org/10.1007/978-3-319-77028-4_68

  • DOI: https://doi.org/10.1007/978-3-319-77028-4_68

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77027-7

  • Online ISBN: 978-3-319-77028-4

  • eBook Packages: Engineering, Engineering (R0)
