Abstract
Prediction models built with various machine learning algorithms identify defect-prone modules before release, facilitating software testing and reducing testing costs. The Naïve Bayes classifier is one of the best-performing classification techniques in defect prediction, yet it assumes that features are conditionally independent given the class, and for the defect prediction problem some features are not actually conditionally independent. An interesting question is whether relaxing this conditional independence assumption improves classifier performance. We build Bayesian network structures using three classes of structure-learning algorithms, namely score-based, constraint-based, and hybrid algorithms, and propose an approach to augment these Bayesian network structures with a class node. The resulting Bayesian network classifiers, along with Random Forests, Logistic Regression, and Naïve Bayes, are then evaluated using measures such as AUC and the H-measure. We observe that the RSMAX2 and Grow-Shrink classifiers (after augmentation) consistently perform better in defect prediction.
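The conditional independence assumption discussed above can be made concrete with a minimal Naïve Bayes computation: the class posterior is taken to factor into a product of per-feature likelihoods. The sketch below is purely illustrative (the paper's experiments use R packages such as bnlearn and hmeasure); the feature names and toy counts are invented for exposition.

```python
from collections import defaultdict

# Toy defect-prediction data: each module is (features, label).
# Features are hypothetical binary code metrics; label 1 = defective.
data = [
    ({"high_loc": 1, "high_churn": 1}, 1),
    ({"high_loc": 1, "high_churn": 0}, 1),
    ({"high_loc": 0, "high_churn": 1}, 0),
    ({"high_loc": 0, "high_churn": 0}, 0),
]

def train_nb(rows):
    """Estimate class priors and per-feature conditionals, assuming
    features are conditionally independent given the class (the
    Naive Bayes assumption the paper relaxes)."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(int)  # (label, feature, value) -> count
    for feats, label in rows:
        class_counts[label] += 1
        for f, v in feats.items():
            feat_counts[(label, f, v)] += 1
    return class_counts, feat_counts, len(rows)

def posterior(model, feats):
    """P(label | feats) via the factored likelihood, Laplace-smoothed."""
    class_counts, feat_counts, n = model
    scores = {}
    for label, c in class_counts.items():
        p = c / n  # class prior
        for f, v in feats.items():
            # P(f = v | label), smoothed over the two binary values
            p *= (feat_counts[(label, f, v)] + 1) / (c + 2)
        scores[label] = p
    total = sum(scores.values())
    return {lab: s / total for lab, s in scores.items()}

model = train_nb(data)
print(posterior(model, {"high_loc": 1, "high_churn": 1}))  # -> {1: 0.75, 0: 0.25}
```

A Bayesian network classifier replaces the product over independent features with factors that follow the learned network structure, which is what augmenting a score-based, constraint-based, or hybrid structure with a class node achieves.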
References
Anagnostopoulos, C.: Measuring classification performance: the hmeasure package (2012)
Cheng, J., Greiner, R.: Comparing Bayesian network classifiers. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 101–108. Morgan Kaufmann Publishers Inc. (1999)
Cooper, G.F., Herskovits, E.: A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. 9(4), 309–347 (1992)
D’Ambros, M., Lanza, M., Robbes, R.: An extensive comparison of bug prediction approaches. In: 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010), pp. 31–41. IEEE (2010)
D’Ambros, M., Lanza, M., Robbes, R.: Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empirical Softw. Eng. 17(4–5), 531–577 (2012)
Dejaeger, K., Verbraken, T., Baesens, B.: Toward comprehensible software fault prediction models using Bayesian network classifiers. IEEE Trans. Software Eng. 39(2), 237–257 (2013)
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Fenton, N., Neil, M., Marquez, D.: Using Bayesian networks to predict software defects and reliability. Proc. Inst. Mech. Eng. Part O J. Risk Reliab. 222(4), 701–712 (2008)
Fenton, N.E., Neil, M.: A critique of software defect prediction models. IEEE Trans. Software Eng. 25(5), 675–689 (1999)
Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Mach. Learn. 29(2–3), 131–163 (1997)
Graves, T.L., Karr, A.F., Marron, J.S., Siy, H.: Predicting fault incidence using software change history. IEEE Trans. Software Eng. 26(7), 653–661 (2000)
Hand, D.J.: Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach. Learn. 77(1), 103–123 (2009)
Hand, D.J., Anagnostopoulos, C.: A better beta for the h measure of classification performance. Pattern Recogn. Lett. 40, 41–46 (2014)
Hassan, A.E.: Predicting faults using the complexity of code changes. In: Proceedings of the 31st International Conference on Software Engineering, pp. 78–88. IEEE Computer Society (2009)
Hatton, L.: Reexamining the fault density-component size connection. IEEE Softw. 14(2), 89 (1997)
Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20(3), 197–243 (1995)
Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning (1993)
Jing, Y., Pavlović, V., Rehg, J.M.: Boosted Bayesian network classifiers. Mach. Learn. 73(2), 155–184 (2008)
Keogh, E., Pazzani, M.: Learning augmented Bayesian classifiers: a comparison of distribution-based and classification-based approaches. In: Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, pp. 225–230. Citeseer (1999)
Khoshgoftaar, T.M., Allen, E.B.: Ordering fault-prone software modules. Software Qual. J. 11(1), 19–37 (2003)
Korb, K.B., Nicholson, A.E.: Bayesian Artificial Intelligence. CRC Press, Boca Raton (2010)
Lessmann, S., Baesens, B., Mues, C., Pietsch, S.: Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans. Software Eng. 34(4), 485–496 (2008)
Lipow, M.: Number of faults per line of code. IEEE Trans. Software Eng. 8(4), 437–439 (1982)
Margaritis, D.: Learning Bayesian network model structure from data. Ph.D. thesis, Carnegie Mellon University (2003)
Menzies, T., Di Stefano, J.S., Chapman, M., McGill, K.: Metrics that matter. In: Proceedings 27th Annual NASA Goddard/IEEE Software Engineering Workshop, pp. 51–57. IEEE (2002)
Moser, R., Pedrycz, W., Succi, G.: Analysis of the reliability of a subset of change metrics for defect prediction. In: Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 309–311. ACM (2008)
Muthukumaran, K., Rallapalli, A., Murthy, N.: Impact of feature selection techniques on bug prediction models. In: Proceedings of the 8th India Software Engineering Conference, pp. 120–129. ACM (2015)
Ohlsson, N., Alberg, H.: Predicting fault-prone software modules in telephone switches. IEEE Trans. Software Eng. 22(12), 886–894 (1996)
Okutan, A., Yıldız, O.T.: Software defect prediction using Bayesian networks. Empirical Softw. Eng. 19(1), 154–181 (2014)
Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs (1995)
Sacha, J.P., Goodenday, L.S., Cios, K.J.: Bayesian learning for cardiac SPECT image interpretation. Artif. Intell. Med. 26(1), 109–143 (2002)
Sahami, M.: Learning limited dependence Bayesian classifiers. In: KDD, vol. 96, pp. 335–338 (1996)
Scutari, M.: Learning Bayesian networks with the bnlearn R package. arXiv preprint (2009). arXiv:0908.3817
Shivaji, S., Whitehead, J.E.J., Akella, R., Kim, S.: Reducing features to improve bug prediction. In: Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, pp. 600–604. IEEE Computer Society (2009)
Stecklein, J.M., Dabney, J., Dick, B., Haskins, B., Lovell, R., Moroney, G.: Error cost escalation through the project life cycle (2004)
Tsamardinos, I., Brown, L.E., Aliferis, C.F.: The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 65(1), 31–78 (2006)
Wang, T., Li, W.H.: Naive Bayes software defect prediction model. In: 2010 International Conference on Computational Intelligence and Software Engineering (CiSE), pp. 1–4 (2010)
Webb, G.I., Boughton, J.R., Wang, Z.: Not so naive Bayes: aggregating one-dependence estimators. Mach. Learn. 58(1), 5–24 (2005)
Zimmermann, T., Premraj, R., Zeller, A.: Predicting defects for eclipse. In: International Workshop on Predictor Models in Software Engineering, PROMISE 2007, ICSE Workshops 2007, p. 9. IEEE (2007)
Copyright information
© 2018 Springer International Publishing AG
Cite this paper
Muthukumaran, K., Srinivas, S., Malapati, A., Neti, L.B.M. (2018). Software Defect Prediction Using Augmented Bayesian Networks. In: Abraham, A., Cherukuri, A., Madureira, A., Muda, A. (eds) Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016). SoCPaR 2016. Advances in Intelligent Systems and Computing, vol 614. Springer, Cham. https://doi.org/10.1007/978-3-319-60618-7_28
Print ISBN: 978-3-319-60617-0
Online ISBN: 978-3-319-60618-7