Enhancing Bug Report Assignment with an Optimized Reduction of Training Set
Although automated bug triaging has great potential to reduce developers' labor costs, it has not been thoroughly investigated as a text classification problem on long bug descriptions, which are informative but often noisy. In this paper, we propose an optimized bug triage technique that builds a high-quality set of bug data by removing noisy and non-informative bug reports, while ensuring maximum bug triage accuracy with weights and binary constraints. The technique combines three feature selection algorithms and four instance selection algorithms so that bugs can be recommended and automatically assigned more accurately, even when their descriptions are noisy. Experimental results show that, in several cases, the training sets reduced by the proposed approach achieve better accuracy than the original ones, by about 4% on average.
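The reduction pipeline summarized above (feature selection followed by instance selection over a labeled set of bug reports) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's method: chi-square scoring stands in for the feature selector and a simplified Edited Nearest Neighbor (ENN) rule stands in for the instance selector, each developer is treated as a class label, and a report containing none of the selected terms is treated as non-informative. All function names, the parameters `n_features` and `k`, and the toy reports are hypothetical, and the weights and binary constraints mentioned in the abstract are not modeled.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def chi_square_scores(docs, labels):
    """Score each term by chi-square, summed over one-vs-rest developer classes."""
    n = len(docs)
    doc_terms = [set(tokenize(doc)) for doc in docs]
    vocab = set().union(*doc_terms)
    scores = {}
    for term in vocab:
        total = 0.0
        for dev in sorted(set(labels)):
            # 2x2 contingency table: term present/absent vs. assigned to dev or not.
            a = sum(1 for t, y in zip(doc_terms, labels) if term in t and y == dev)
            b = sum(1 for t, y in zip(doc_terms, labels) if term in t and y != dev)
            c = sum(1 for t, y in zip(doc_terms, labels) if term not in t and y == dev)
            d = n - a - b - c
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                total += n * (a * d - b * c) ** 2 / denom
        scores[term] = total
    return scores


def select_features(docs, labels, n_features):
    """Keep the n_features highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda term: (-scores[term], term))[:n_features]


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def edited_nearest_neighbor(vectors, labels, k=3):
    """Simplified ENN: keep an instance only if a strict majority of its k most
    similar neighbours carry the same developer label."""
    kept = []
    for i, (vec, label) in enumerate(zip(vectors, labels)):
        others = [j for j in range(len(vectors)) if j != i]
        # Most similar first; ties broken by index so the result is deterministic.
        others.sort(key=lambda j: (-cosine(vec, vectors[j]), j))
        agree = sum(1 for j in others[:k] if labels[j] == label)
        if 2 * agree > k:
            kept.append(i)
    return kept


def reduce_training_set(docs, labels, n_features=4, k=3):
    """Feature selection, then instance selection; returns (terms, kept indices)."""
    selected = select_features(docs, labels, n_features)
    vocab = set(selected)
    vectors = [Counter(t for t in tokenize(doc) if t in vocab) for doc in docs]
    # A report containing none of the selected terms is treated as non-informative.
    informative = [i for i, vec in enumerate(vectors) if vec]
    kept = edited_nearest_neighbor([vectors[i] for i in informative],
                                   [labels[i] for i in informative], k)
    return selected, [informative[j] for j in kept]


# Hypothetical toy data: report 6 is deliberately mislabeled, report 7 is uninformative.
reports = [
    "crash opening pdf file", "pdf viewer crash", "pdf renderer crash",
    "login css broken", "css layout broken", "broken css settings",
    "css broken on login page", "thanks a lot",
]
devs = ["alice", "alice", "alice", "bob", "bob", "bob", "alice", "bob"]
selected, kept = reduce_training_set(reports, devs, n_features=4, k=3)
print(selected)  # ['crash', 'pdf', 'broken', 'css']
print(kept)      # [0, 1, 2, 3, 4, 5]
```

On this toy data, feature selection keeps the four terms that best separate the two developers, report 7 is dropped because it contains none of them, and ENN removes report 6 because all three of its nearest neighbours belong to the other developer. Running feature selection first keeps the similarity computation in instance selection cheap and less sensitive to rare, noisy terms; the reverse order is equally possible and would give a different reduced set.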
Keywords: Bug triaging · Bug reports · Bug assignment · Machine learning · Text classification · Industrial scale
This work is supported by the National Natural Science Foundation of China (Nos. 61672122, 61602077), the Public Welfare Funds for Scientific Research of Liaoning Province of China (No. 20170005), the Natural Science Foundation of Liaoning Province of China (No. 20170540097), and the Fundamental Research Funds for the Central Universities (No. 3132016348).