Cross-Project Issue Classification Based on Ensemble Modeling in a Social Coding World

  • Yarong Zeng
  • Yue Yu
  • Qiang Fan
  • Xunhui Zhang
  • Tao Wang
  • Gang Yin
  • Huaimin Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11304)


The simplified and deformalized contribution mechanisms of social coding are attracting more and more contributors to collaborative software development. To reduce the burden on project core teams, various automated and intelligent approaches based on machine learning and data mining have been proposed, but they are restricted by the lack of training data. In this paper, we conduct an extensive empirical study of transferring and aggregating reusable models across projects in the context of issue classification, based on a large-scale dataset of 799 open source projects and more than 795,000 issues. We propose a novel cross-project approach that integrates multiple models learned from various source projects to classify the issues of a target project. We evaluate our approach through comparative experiments against within-project classification and a typical cross-project method called Bellwether. The results show that our cross-project approach based on ensemble modeling achieves performance comparable to within-project classification and better than Bellwether.
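The idea of integrating models learned from several source projects can be illustrated with a minimal sketch. The abstract does not specify the classifiers, the features, or the aggregation rule, so everything below is an assumption made for illustration: a tiny word-count naive Bayes model per source project, a plain majority vote as the ensemble, and invented toy issues with hypothetical project names. It is not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real features may differ)."""
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words unseen in this project are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def ensemble_predict(models, text):
    """Majority vote over the per-project models (assumed aggregation rule)."""
    votes = Counter(model.predict(text) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: labeled issues from three invented source projects.
source_projects = {
    "project_a": [
        ("app crashes on startup with null pointer error", "bug"),
        ("add dark mode support to settings", "feature"),
        ("crash when saving file", "bug"),
        ("please add export to csv", "feature"),
    ],
    "project_b": [
        ("error thrown when parsing empty input", "bug"),
        ("support for python 3 would be nice", "feature"),
        ("exception in login handler", "bug"),
        ("add option to disable notifications", "feature"),
    ],
    "project_c": [
        ("build fails with error on windows", "bug"),
        ("request: add keyboard shortcuts", "feature"),
        ("wrong result returned, looks like a crash", "bug"),
        ("add support for plugins", "feature"),
    ],
}

# One model per source project; the target project contributes no training data.
models = [
    NaiveBayes().fit([t for t, _ in issues], [y for _, y in issues])
    for issues in source_projects.values()
]

print(ensemble_predict(models, "crash with error on startup"))
print(ensemble_predict(models, "please add support for themes"))
```

An odd number of source models keeps a two-class vote free of ties; a weighted vote or a learned combiner would be natural alternatives, but the abstract does not say which rule the approach uses.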


Cross-project, Issue classification, Transfer learning, Ensemble modeling



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yarong Zeng (1, corresponding author)
  • Yue Yu (1)
  • Qiang Fan (1)
  • Xunhui Zhang (1)
  • Tao Wang (1)
  • Gang Yin (1)
  • Huaimin Wang (1)
  1. National Laboratory for Parallel and Distributed Processing, National University of Defence Technology, Changsha, China
