Cross project defect prediction for open source software

  • Anushree Agrawal
  • Ruchika Malhotra
Original Research


Software defect prediction identifies defects early in the life cycle so that testing resources can be optimized and maintenance effort reduced. Defect prediction works well when a sufficient amount of data is available to train the prediction model. However, this is not always the case, for example when the software is in its first release or the company has not maintained significant data. In such cases, cross project defect prediction, in which a model trained on one project is applied to another, may identify the defective classes. In this work, we studied the feasibility of cross project defect prediction and validated it empirically. We conducted our experiments on 12 open source datasets. The prediction model is built using 12 software metrics. After studying the various train-test combinations, we found that cross project defect prediction was feasible in 35 out of 132 cases. The success of prediction is determined by the precision, recall and AUC of the prediction model. We also analyzed 14 descriptive characteristics to construct a decision tree. The decision tree learnt from this data has 15 rules that describe the feasibility of successful cross project defect prediction.
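The figure of 132 cases is consistent with using each of the 12 datasets as a training set for each of the other 11, which gives 12 × 11 = 132 ordered train-test pairs; 35 feasible cases is then about 26.5% of the total. The sketch below is only an illustration of that counting together with the standard definitions of precision and recall. The project names are hypothetical placeholders, and the abstract does not state the thresholds the authors used to call a prediction successful, so none are assumed here.

```python
from itertools import permutations

# Hypothetical placeholder names: the abstract does not list the 12 datasets.
projects = [f"project_{i}" for i in range(1, 13)]

# Every ordered (train, test) pair in which the two projects differ.
pairs = list(permutations(projects, 2))


def precision(tp: int, fp: int) -> float:
    """Fraction of classes predicted defective that really are defective."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly defective classes that the model flags."""
    return tp / (tp + fn) if (tp + fn) else 0.0


print(len(pairs))  # 132
```

AUC is left out of the sketch because it needs ranked prediction scores rather than confusion-matrix counts.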


Cross project; Defect prediction; Software characteristics



Copyright information

© Bharati Vidyapeeth's Institute of Computer Applications and Management 2019

Authors and Affiliations

  1. Department of Computer Engineering, Delhi Technological University, Delhi, India
