Abstract
Software change-proneness prediction, which predicts whether class files in a project will change in the next release, can help developers allocate resources more effectively and reduce software maintenance costs. Previous studies found that change-proneness prediction performs poorly when training data is limited, as is typical for new projects. Cross-project change-proneness prediction was proposed to address this issue: it builds a prediction model from the more plentiful data of other projects (the source projects) and uses it to predict the change-prone files in a target project. However, cross-project prediction is unstable because metric values differ widely between source projects, which makes change-prone files hard to classify. To improve cross-project prediction, we propose a Deep Metric Learning (DML) model that reduces these feature differences before file classification. Specifically, DML maps the files of the source projects into an embedding space in which files of the same category (e.g. change-prone files) are pulled closer together while files of different categories are pushed further apart. We also apply an over-sampling approach to handle the highly imbalanced training data. We evaluate our model on 20 change-proneness datasets and compare it with 5 cross-project change-proneness models. The results indicate that the proposed model can substantially improve the performance of change-proneness prediction.
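The abstract does not give the DML objective or name the over-sampling method, so the following is only a minimal sketch of the two general ideas, under stated assumptions. It uses a generic pairwise contrastive loss (same-category pairs are penalized for being far apart, different-category pairs for being closer than a margin) and a SMOTE-style interpolation between a minority sample and a neighbor. The function names, the margin value, and the choice of Euclidean distance are illustrative assumptions, not the authors' formulation.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(embeddings, labels, margin=1.0):
    """Mean pairwise contrastive loss over all pairs (assumed objective).

    Same-label pairs contribute d^2, so they are pulled together.
    Different-label pairs contribute max(0, margin - d)^2, so they are
    pushed apart until they are at least `margin` away from each other.
    """
    total, n_pairs = 0.0, 0
    for (u, yu), (v, yv) in combinations(zip(embeddings, labels), 2):
        d = euclidean(u, v)
        if yu == yv:
            total += d ** 2
        else:
            total += max(0.0, margin - d) ** 2
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


def interpolate_minority(x, neighbor, t):
    """SMOTE-style synthetic sample: a point on the segment from x to neighbor.

    t is expected to lie in [0, 1]; SMOTE draws it at random.
    """
    return [a + t * (b - a) for a, b in zip(x, neighbor)]


# Hypothetical 2-D embeddings: label 1 = change-prone, 0 = not change-prone.
separated = contrastive_loss([[0, 0], [0, 0], [5, 5]], [1, 1, 0])
mixed = contrastive_loss([[0, 0], [3, 4], [0, 0]], [1, 1, 0])
synthetic = interpolate_minority([0, 0], [2, 4], 0.5)
```

In this sketch a lower loss corresponds to an embedding in which the two categories are already well separated; a trained network would adjust its mapping to reduce the loss on the source-project files before a classifier is applied.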
Acknowledgement
The work described in this paper was partially supported by the Fundamental Research Funds for the Central Universities of China (No. 106112017CDJXSYY002), the National Natural Science Foundation of China (Grant no. 61402062, 61602068, 61602069, 61772093), Chongqing Research Program of Basic Science & Frontier Technology (Grant no. cstc2015jcyjA40037, cstc2016jcyjA0458, cstc2016jcyjA0468), and the Chongqing Major Theme Program (Grant No. cstc2017zdcy-zdzxX0002).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ge, Y., Chen, M., Liu, C., Chen, F., Huang, S., Wang, H. (2018). Deep Metric Learning for Software Change-Proneness Prediction. In: Peng, Y., Yu, K., Lu, J., Jiang, X. (eds) Intelligence Science and Big Data Engineering. IScIDE 2018. Lecture Notes in Computer Science, vol 11266. Springer, Cham. https://doi.org/10.1007/978-3-030-02698-1_25
DOI: https://doi.org/10.1007/978-3-030-02698-1_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02697-4
Online ISBN: 978-3-030-02698-1
eBook Packages: Computer Science, Computer Science (R0)