Deep Metric Learning for Software Change-Proneness Prediction

  • Conference paper
Intelligence Science and Big Data Engineering (IScIDE 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11266)

Abstract

Software change-proneness prediction, which predicts whether class files in a project will change in the next release, helps developers allocate resources more effectively and reduce software maintenance costs. Previous studies found that change-proneness prediction does not work well with limited training data, especially for new projects. To address this issue, cross-project change-proneness prediction was proposed: it builds a prediction model from the plentiful data of other projects (the source projects) and predicts the change-prone files in a target project. However, cross-project prediction is unstable because of the large differences in metrics between source projects, which makes classifying change-prone files challenging. To improve cross-project prediction, we propose a Deep Metric Learning (DML) model that minimizes these feature differences before file classification. Specifically, DML maps the files of the source projects into a space where files from the same category (e.g. change-prone files) move closer together while files from different categories move further apart. In addition, we use an over-sampling approach to handle the highly imbalanced dataset during model training. We evaluate our model on 20 change-proneness datasets and compare it with 5 cross-project change-proneness models. The results indicate that the proposed model can substantially improve the performance of change-proneness prediction.
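The two ideas named in the abstract, pulling same-category files together while pushing different-category files apart, and over-sampling the minority class, can be illustrated with a minimal sketch. The abstract does not give the paper's actual objective or sampling procedure, so everything below is an assumption: `contrastive_loss` is a generic pairwise contrastive loss with a hypothetical `margin` parameter, and `interpolate_minority` is the SMOTE-style interpolation step only, not a full over-sampler. The embeddings stand in for the output of whatever network maps file metrics into the learned space.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(pairs, margin=1.0):
    """Generic pairwise contrastive loss (assumed, not the paper's objective).

    pairs: non-empty list of (embedding_a, embedding_b, same_category).
    Same-category pairs are penalised by their squared distance;
    different-category pairs are penalised only when closer than `margin`.
    """
    total = 0.0
    for emb_a, emb_b, same_category in pairs:
        dist = euclidean(emb_a, emb_b)
        if same_category:
            total += dist ** 2                      # pull together
        else:
            total += max(0.0, margin - dist) ** 2   # push apart
    return total / len(pairs)


def interpolate_minority(sample, neighbor, rng=random):
    """SMOTE-style synthesis: a random point on the segment sample -> neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(sample, neighbor)]


if __name__ == "__main__":
    # Hypothetical 2-D embeddings of class files.
    well_separated = [
        ([0.0, 0.0], [0.0, 0.0], True),    # same category, identical
        ([0.0, 0.0], [3.0, 4.0], False),   # different category, far apart
    ]
    print(contrastive_loss(well_separated))
    print(interpolate_minority([0.0, 0.0], [2.0, 4.0], random.Random(0)))
```

A low loss means the embedding already separates the two categories; in a trained model this quantity would be minimised over the network parameters, and a classifier would then be fitted on the embedded files.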

Notes

  1. CKJM: https://www.spinellis.gr/sw/ckjm/.

Acknowledgement

The work described in this paper was partially supported by the Fundamental Research Funds for the Central Universities of China (No. 106112017CDJXSYY002), the National Natural Science Foundation of China (Grant Nos. 61402062, 61602068, 61602069, 61772093), the Chongqing Research Program of Basic Science & Frontier Technology (Grant Nos. cstc2015jcyjA40037, cstc2016jcyjA0458, cstc2016jcyjA0468), and the Chongqing Major Theme Program (Grant No. cstc2017zdcy-zdzxX0002).

Corresponding author

Correspondence to Chao Liu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Ge, Y., Chen, M., Liu, C., Chen, F., Huang, S., Wang, H. (2018). Deep Metric Learning for Software Change-Proneness Prediction. In: Peng, Y., Yu, K., Lu, J., Jiang, X. (eds) Intelligence Science and Big Data Engineering. IScIDE 2018. Lecture Notes in Computer Science, vol 11266. Springer, Cham. https://doi.org/10.1007/978-3-030-02698-1_25

  • DOI: https://doi.org/10.1007/978-3-030-02698-1_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02697-4

  • Online ISBN: 978-3-030-02698-1

  • eBook Packages: Computer Science, Computer Science (R0)