Abstract
The primary goal of software quality engineering is to apply techniques and processes that produce a high-quality software product. One strategy is to apply data mining techniques to software metrics and defect data collected during the development process to identify potentially low-quality program modules. In this paper, we investigate the use of feature selection in the context of software quality estimation (also referred to as software defect prediction), where a classification model predicts whether program modules (instances) are fault-prone or not fault-prone. Seven filter-based feature ranking techniques are examined: six are commonly used, while the seventh, signal-to-noise ratio (SNR), is rarely employed. The objective of the paper is to compare these seven techniques across various software data sets and assess their effectiveness for software quality modeling. A case study is performed on 16 software data sets, and classification models are built with five different learners. Our experimental results are summarized using statistical tests for significance. The main conclusion is that the SNR technique performs better than, or similarly to, the best of the six commonly used techniques.
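The abstract does not reproduce the SNR formula, but filter-based SNR ranking is commonly defined per feature as the absolute difference of the class means divided by the sum of the class standard deviations. The sketch below illustrates that standard two-class definition on toy data; the data values and the small smoothing constant are illustrative assumptions, not from the paper.

```python
import numpy as np

def snr_scores(X, y):
    """Per-feature signal-to-noise ratio: |mu_pos - mu_neg| / (sd_pos + sd_neg).

    X: 2-D array of shape (instances, features), e.g. software metrics.
    y: 1-D binary labels (1 = fault-prone, 0 = not fault-prone).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    mu_p, mu_n = pos.mean(axis=0), neg.mean(axis=0)
    sd_p, sd_n = pos.std(axis=0), neg.std(axis=0)
    # Small constant guards against division by zero when a feature is constant.
    return np.abs(mu_p - mu_n) / (sd_p + sd_n + 1e-12)

# Toy example: feature 0 separates the classes well, feature 1 does not.
X = np.array([[1.0, 5.0],
              [1.1, 3.0],
              [5.0, 4.0],
              [5.2, 4.5]])
y = np.array([0, 0, 1, 1])

scores = snr_scores(X, y)
ranking = np.argsort(scores)[::-1]  # feature indices, most discriminative first
```

Filter-based rankers like this score each feature independently of any learner; the top-k features by `ranking` would then feed the five classifiers used in the case study.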
Copyright information
© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering
Cite this paper
Khoshgoftaar, T.M., Gao, K., Napolitano, A. (2012). An Empirical Study of Predictive Modeling Techniques of Software Quality. In: Suzuki, J., Nakano, T. (eds) Bio-Inspired Models of Network, Information, and Computing Systems. BIONETICS 2010. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 87. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32615-8_29
Print ISBN: 978-3-642-32614-1
Online ISBN: 978-3-642-32615-8