Automated Software Engineering

Volume 25, Issue 2, pp 347–381

Automatic approval prediction for software enhancement requests

  • Zeeshan Ahmed Nizamani
  • Hui Liu
  • David Matthew Chen
  • Zhendong Niu

Abstract

Software applications often receive a large number of enhancement requests suggesting additional functionality for developers to implement. Such requests are usually checked manually by the developers, which is time-consuming and tedious. Consequently, an approach that can automatically predict whether a new enhancement report will be approved is beneficial for both the developers and the enhancement suggesters. With such an approach, developers can rank the reports according to their available time and thus limit the number of reports they must evaluate from a large collection of low-quality enhancement requests that are unlikely to be approved. The approach also helps developers respond to useful requests more quickly. To this end, we propose a multinomial naive Bayes-based approach to automatically predict whether a new enhancement report is likely to be approved or rejected. We acquire the enhancement reports of open-source software applications from Bugzilla for evaluation. Each report is preprocessed and modeled as a vector. Using these vectors together with their corresponding approval status, we train a Bayes-based classifier. The trained classifier then predicts approval or rejection of new enhancement reports. We apply different machine learning and neural network algorithms, and the multinomial naive Bayes classifier yields the highest accuracy on the given dataset. The proposed approach is evaluated with 40,000 enhancement reports from 35 open-source applications. The results of tenfold cross-validation suggest that the average accuracy is up to 89.25%.
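
The pipeline described in the abstract (preprocess each report, model it as a term vector, train a multinomial naive Bayes classifier, evaluate with tenfold cross-validation) can be approximated with standard tooling. Below is a minimal sketch assuming scikit-learn; the toy reports, labels, and preprocessing choices are purely illustrative and are not the authors' dataset or exact implementation.

# Minimal sketch of approval prediction with multinomial naive Bayes.
# Assumes scikit-learn; data and preprocessing are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical toy data: enhancement report texts and approval labels (1 = approved).
reports = [
    "Add dark mode support to the settings dialog",
    "Please allow exporting results as CSV",
    "Make the splash screen purple",
    "Support keyboard shortcuts for all menu actions",
]
labels = [1, 1, 0, 1]

# Preprocessing (lowercasing, stop-word removal) and count vectorization,
# followed by a multinomial naive Bayes classifier.
model = make_pipeline(
    CountVectorizer(lowercase=True, stop_words="english"),
    MultinomialNB(),
)

# With a realistically sized dataset (tens of thousands of reports), the
# paper's evaluation setup corresponds to ten-fold cross-validation:
# scores = cross_val_score(model, reports, labels, cv=10)

# On this toy set we simply fit and predict the status of a new report.
model.fit(reports, labels)
print(model.predict(["Add an option to export reports as CSV"]))

Any text preprocessing step from the paper (e.g., stemming or custom tokenization) would slot into the vectorizer stage; the classifier itself only sees the resulting count vectors and their approval labels.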

Keywords

Software enhancements · Machine learning · Multinomial naive Bayes · Document classification

Acknowledgements

The work is supported by the National Key Research and Development Program of China (2016YFB1000801) and the National Natural Science Foundation of China (61472034, 61772071, 61690205).

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China