
Cross Projects Defect Prediction Modeling


Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 32)

Abstract

Software defect prediction has been studied extensively in software engineering research. Within-project defect prediction works well when a sufficient amount of data is available to train a model. However, local training data for a project is rarely available. Many public defect data repositories from various organizations are available, and this availability motivates cross-project defect prediction. This chapter addresses defect prediction using cross-project defect data. We propose two experiments on open-source software projects with class-level information: one using cross-project data with a homogeneous metric set, and one using within-project data. Machine learning models, including ensemble approaches, are used for prediction. The class imbalance problem is addressed using oversampling techniques. An empirical analysis is carried out to validate the performance of the models. The results indicate that cross-project defect prediction with homogeneous metric sets is comparable to within-project defect prediction, with statistical significance.
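The abstract names two ingredients without detailing them here: a cross-project setup (train on other projects that share the same class-level metric set, then test on the target project) and oversampling to counter class imbalance. The sketch below is a minimal illustration under stated assumptions, not the chapter's implementation. It assumes plain random oversampling (duplicating randomly chosen minority-class rows), which may differ from the specific oversampling technique the chapter uses. The function names `random_oversample` and `cross_project_split` and the toy project data are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A row is (class-level metric vector, label), with label 1 = defective, 0 = clean.
Row = Tuple[Sequence[float], int]


def random_oversample(rows: List[Row], seed: int = 0) -> List[Row]:
    """Duplicate randomly chosen minority-class rows until every class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in rows)
    if len(counts) < 2:
        return list(rows)  # a single class: nothing to balance
    majority_n = max(counts.values())
    balanced = list(rows)
    for label, n in counts.items():
        pool = [r for r in rows if r[1] == label]
        # Sample with replacement to cover this class's shortfall (0 for the majority class).
        balanced.extend(rng.choice(pool) for _ in range(majority_n - n))
    rng.shuffle(balanced)
    return balanced


def cross_project_split(projects: Dict[str, List[Row]], target: str,
                        seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Train on every project except `target`, test on `target`.

    Assumes all projects share the same (homogeneous) metric set, so rows are
    directly comparable. Only the training side is oversampled.
    """
    train = [row for name, rows in projects.items() if name != target for row in rows]
    test = list(projects[target])
    return random_oversample(train, seed), test


# Hypothetical toy data: two metrics per class (for example, size and coupling).
projects = {
    "projA": [([10.0, 2.0], 0), ([12.0, 3.0], 0), ([40.0, 9.0], 1), ([11.0, 2.0], 0)],
    "projB": [([8.0, 1.0], 0), ([35.0, 7.0], 1)],
}
train, test = cross_project_split(projects, target="projB")
print(dict(sorted(Counter(label for _, label in train).items())))  # → {0: 3, 1: 3}
print(len(test))  # → 2
```

Oversampling only the training rows keeps the evaluation on the target project honest: duplicated defective classes never leak into the test set, so the target keeps its natural class distribution.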



Author information

Correspondence to Sonam Gupta.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Goel, L., Gupta, S. (2020). Cross Projects Defect Prediction Modeling. In: Hemanth, J., Bhatia, M., Geman, O. (eds) Data Visualization and Knowledge Engineering. Lecture Notes on Data Engineering and Communications Technologies, vol 32. Springer, Cham. https://doi.org/10.1007/978-3-030-25797-2_1
