Deep Metric Learning for Software Change-Proneness Prediction

Ge, Yongxin; Chen, Min; Liu, Chao; Chen, Feiyi; Huang, Sheng; Wang, Hongxing

doi:10.1007/978-3-030-02698-1_25

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11266)

Included in the following conference series:

International Conference on Intelligent Science and Big Data Engineering

Abstract

Software change-proneness prediction, which predicts whether class files in a project will change in the next release, can help software developers allocate resources more effectively and reduce software maintenance costs. Previous studies found that change-proneness prediction does not work well with limited training data, especially for new projects. To address this issue, cross-project change-proneness prediction has been proposed: it builds a prediction model from sufficient data from other projects (the source projects) and predicts the change-prone files in a target project. However, cross-project prediction is unstable because of the large differences in metrics between source projects, which makes classifying change-prone files challenging. To improve cross-project prediction, we propose a Deep Metric Learning (DML) model that minimizes these feature differences before file classification. Specifically, DML maps files from the source projects into a space where files of the same category (e.g. change-prone files) are pulled closer together while files of different categories are pushed further apart. We also use an over-sampling approach to handle the highly imbalanced datasets during model training. We evaluate our model on 20 change-proneness datasets and compare it with 5 cross-project change-proneness models. The results indicate that the proposed model can substantially improve the performance of change-proneness prediction.
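The abstract does not specify the network, the loss function, or the over-sampling algorithm, so the following is only an illustrative sketch of the two ideas it names. It assumes a generic pairwise contrastive loss (same-category pairs are penalised for being far apart, different-category pairs for being closer than a margin) and a SMOTE-style interpolation between minority samples. The function names, the `margin` and `k` parameters, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(za, zb, same_label, margin=1.0):
    """Generic pairwise contrastive loss on two embedded points (assumed form).

    Same-category pairs are penalised by their squared distance, so training
    pulls them together. Different-category pairs are penalised only while
    they are closer than `margin`, so training pushes them apart.
    """
    d = euclidean(za, zb)
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE-style over-sampling (assumed technique, not confirmed by the abstract).

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Toy embeddings standing in for class files described by code metrics.
    change_prone = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    stable = [[3.0, 4.0]]
    print(contrastive_loss(change_prone[0], change_prone[1], same_label=True))
    print(contrastive_loss(change_prone[0], stable[0], same_label=False))
    print(smote_like(change_prone, n_new=2))
```

In a full model the embeddings would come from a trained network rather than being fixed vectors, and the loss would be summed over many sampled pairs. The sketch only shows which pairs are rewarded for being close and which for being far apart.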

Notes

1.
CKJM: https://www.spinellis.gr/sw/ckjm/.

Acknowledgement

The work described in this paper was partially supported by the Fundamental Research Funds for the Central Universities of China (No. 106112017CDJXSYY002), the National Natural Science Foundation of China (Grant No. 61402062, 61602068, 61602069, 61772093), the Chongqing Research Program of Basic Science & Frontier Technology (Grant No. cstc2015jcyjA40037, cstc2016jcyjA0458, cstc2016jcyjA0468), and the Chongqing Major Theme Program (Grant No. cstc2017zdcy-zdzxX0002).

Author information

Authors and Affiliations

School of Big Data and Software Engineering, Chongqing University, Chongqing, China
Yongxin Ge, Min Chen, Chao Liu, Feiyi Chen, Sheng Huang & Hongxing Wang

Corresponding author

Correspondence to Chao Liu.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Yuxin Peng
Shanghai Jiao Tong University, Shanghai, China
Kai Yu
Tsinghua University, Beijing, China
Jiwen Lu
Central China Normal University, Wuhan, China
Xingpeng Jiang

About this paper

Cite this paper

Ge, Y., Chen, M., Liu, C., Chen, F., Huang, S., Wang, H. (2018). Deep Metric Learning for Software Change-Proneness Prediction. In: Peng, Y., Yu, K., Lu, J., Jiang, X. (eds) Intelligence Science and Big Data Engineering. IScIDE 2018. Lecture Notes in Computer Science, vol 11266. Springer, Cham. https://doi.org/10.1007/978-3-030-02698-1_25

DOI: https://doi.org/10.1007/978-3-030-02698-1_25
Published: 09 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02697-4
Online ISBN: 978-3-030-02698-1
eBook Packages: Computer Science, Computer Science (R0)
