Abstract
Software defect prediction has been studied extensively in software engineering research. Within-project defect prediction works well when enough data is available to train a model, but local training data for a project is rarely available. Many public defect data repositories are available from various organizations, and this availability motivates cross-project defect prediction. This chapter focuses on defect prediction using cross-project defect data. We propose two experiments using cross-project data with homogeneous metric sets and within-project data, on open source software projects with class-level information. Machine learning models, including ensemble approaches, are used for prediction. The class imbalance problem is addressed using oversampling techniques. An empirical analysis is carried out to validate the performance of the models. The results indicate that cross-project defect prediction with homogeneous metric sets is comparable to within-project defect prediction, with statistical significance.
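The abstract does not say which oversampling technique is used. As a rough illustration only, the sketch below assumes plain random oversampling, applied to the training (source) project before a model is fit. The function name `random_oversample`, the metric columns (`wmc`, `cbo`, `loc`) and the sample values are hypothetical and are not taken from the chapter.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows of this class in the original (unbalanced) data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical source-project data: (wmc, cbo, loc) per class,
# label 1 = defective, label 0 = clean.
source_rows = [(5, 2, 120), (7, 3, 300), (2, 1, 40), (9, 6, 800), (3, 1, 60)]
source_labels = [0, 0, 0, 1, 0]

balanced_rows, balanced_labels = random_oversample(source_rows, source_labels)
print(Counter(balanced_labels))
```

Only the training data is resampled here. The target project's data would normally be left untouched, so that evaluation reflects its real class distribution.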
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Goel, L., Gupta, S. (2020). Cross Projects Defect Prediction Modeling. In: Hemanth, J., Bhatia, M., Geman, O. (eds) Data Visualization and Knowledge Engineering. Lecture Notes on Data Engineering and Communications Technologies, vol 32. Springer, Cham. https://doi.org/10.1007/978-3-030-25797-2_1
DOI: https://doi.org/10.1007/978-3-030-25797-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25796-5
Online ISBN: 978-3-030-25797-2
eBook Packages: Intelligent Technologies and Robotics (R0)