Cross Projects Defect Prediction Modeling

Goel, Lipika; Gupta, Sonam

doi:10.1007/978-3-030-25797-2_1

Chapter
First Online: 10 August 2019

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 32)

Abstract

Software defect prediction has been studied extensively in software engineering research. Within-project defect prediction works well when sufficient data is available to train a model, but local training data for a project is rarely available. Many public defect data repositories are available from various organizations, and this availability motivates cross-project defect prediction. This chapter focuses on defect prediction using cross-project defect data. We present two experiments, using cross-project data with homogeneous metric sets and within-project data, on open source software projects with class-level information. Machine learning models, including ensemble approaches, are used for prediction. The class imbalance problem is addressed using oversampling techniques. An empirical analysis is carried out to validate the performance of the models. The results indicate that cross-project defect prediction with homogeneous metric sets is comparable to within-project defect prediction, with statistical significance.
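
The workflow outlined in the abstract (train on one project, oversample the minority class, predict with an ensemble, evaluate on another project sharing the same metric set) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the chapter's actual setup: the data is synthetic, the two class-level metrics are hypothetical, random oversampling stands in for the unspecified oversampling technique, and bagged decision stumps stand in for the unspecified ensemble models.

```python
import random

def make_project(n, defect_rate, rng):
    """Synthetic project: each class has two hypothetical metrics and a defect label."""
    rows, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < defect_rate else 0
        shift = 4.0 if y else 0.0  # assumption: defective classes score higher
        rows.append([rng.gauss(5.0 + shift, 1.5), rng.gauss(10.0 + shift, 3.0)])
        labels.append(y)
    return rows, labels

def oversample(rows, labels, rng):
    """Random oversampling: duplicate minority-class rows until both classes match."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(rows), list(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(rows))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]

def fit_stump(rows, labels):
    """Choose (feature, threshold) minimising errors of 'value > threshold => defective'."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            errors = sum((r[f] > t) != (y == 1) for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]

def fit_bagging(rows, labels, n_models, rng):
    """Bagging: fit one stump per bootstrap sample of the training data."""
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models

def predict(models, row):
    """Majority vote over the stumps; 1 means predicted defective."""
    votes = sum(row[f] > t for f, t in models)
    return 1 if 2 * votes > len(models) else 0

def recall(models, rows, labels):
    """Fraction of truly defective classes that the ensemble flags."""
    tp = sum(1 for r, y in zip(rows, labels) if y == 1 and predict(models, r) == 1)
    pos = sum(labels)
    return tp / pos if pos else 0.0

rng = random.Random(0)
source_rows, source_labels = make_project(200, 0.15, rng)  # training project
target_rows, target_labels = make_project(150, 0.20, rng)  # unseen project, same metrics

bal_rows, bal_labels = oversample(source_rows, source_labels, rng)
models = fit_bagging(bal_rows, bal_labels, 15, rng)
print("cross-project recall on target:", recall(models, target_rows, target_labels))
```

Only the training (source) project is oversampled; the target project is left untouched so that the evaluation reflects its real class distribution. A within-project baseline would instead split a single project into training and test portions and reuse the same functions.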

Author information

Authors and Affiliations

Ajay Kumar Garg Engineering College, Ghaziabad, India
Lipika Goel & Sonam Gupta

Corresponding author

Correspondence to Sonam Gupta.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering (ECE), Karunya University, Coimbatore, Tamil Nadu, India
Jude Hemanth
Amity University, Noida, Uttar Pradesh, India
Madhulika Bhatia
Department of Electrical Engineering and Computer Science, Ştefan cel Mare University of Suceava, Suceava, Romania
Oana Geman

About this chapter

Cite this chapter

Goel, L., Gupta, S. (2020). Cross Projects Defect Prediction Modeling. In: Hemanth, J., Bhatia, M., Geman, O. (eds) Data Visualization and Knowledge Engineering. Lecture Notes on Data Engineering and Communications Technologies, vol 32. Springer, Cham. https://doi.org/10.1007/978-3-030-25797-2_1

DOI: https://doi.org/10.1007/978-3-030-25797-2_1
Published: 10 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25796-5
Online ISBN: 978-3-030-25797-2
eBook Packages: Intelligent Technologies and Robotics (R0)