Predicting Bug-Fix Time: Using Standard Versus Topic-Based Text Categorization Techniques

Ardimento, Pasquale; Bilancia, Massimo; Monopoli, Stefano

doi:10.1007/978-3-319-46307-0_11

Pasquale Ardimento¹⁶,
Massimo Bilancia¹⁷ &
Stefano Monopoli¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9956))

Included in the following conference series:

International Conference on Discovery Science

1730 Accesses
9 Citations

Abstract

In modern software development, finding and fixing bugs is a vital part of software development and quality assurance. Once a bug is reported, it is typically recorded in the Bug Tracking System, and is assigned to a developer to resolve (bug triage). Current practice of bug triage is largely a manual collaborative process, which is often time-consuming and error-prone. Predicting on the basis of past data the time to fix a newly-reported bug has been shown to be an important target to support the whole triage process. Many researchers have, therefore, proposed methods for automated bug-fix time prediction, largely based on statistical prediction models exploiting the attributes of bug reports. However, existing algorithms often fail to validate on multiple large projects widely-used in bug studies, mostly as a consequence of inappropriate attribute selection [2]. In this paper, instead of focusing on attribute subset selection, we explore an alternative promising approach consisting of using all available textual information. The problem of bug-fix time estimation is then mapped to a text categorization problem. We consider a multi-topic Supervised Latent Dirichlet Allocation (SLDA) model, which adds to Latent Dirichlet Allocation a response variable consisting of an unordered binary target variable, denoting time to resolution discretized into FAST (negative class) and SLOW (positive class) labels. We have evaluated SLDA on four large-scale open source projects. We show that the proposed model greatly improves recall, when compared to standard single topic algorithms.

The authors equally contributed to this paper.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

AbdelMoez, W., Kholief, M., Elsalmy, F.M.: Improving bug fix-time prediction model by filtering out outliers. In: 2013 The International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE), pp. 359–364 (2013)
Google Scholar
Bhattacharya, P., Neamtiu, I.: Bug-fix time prediction models: can we do better? In: Proceeding of the 8th Working Conference on Mining Software Repositories, MSR 2011, pp. 207–210. ACM Press, New York (2011)
Google Scholar
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
MATH Google Scholar
Blei, D.M., McAuliffe, J.D.: Supervised topic models. In: NIPS (2007)
Google Scholar
Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for statisticians, pp. 1–33 (2016). http://arxiv.org/abs/1601.00670
Boyd-Graber, J., Mimno, D., Newman, D.: Care and feeding of topic models: problems, diagnostics, and improvements. In: Airoldi, E.M., Blei, D., Erosheva, E.A., Fienberg, S.E. (eds.) Handbook of Mixed Membership Models and Their Applications. CRC Press, Boca Raton (2014)
Google Scholar
The Bugzilla Team: Bugzilla Documentation 5.0.3+ (2016). https://www.bugzilla.org/docs/
Chang, J., Blei, D.M.: Hierarchical relational models for document networks. Ann. Appl. Stat. 4(1), 124–150 (2010)
Article MathSciNet MATH Google Scholar
Dobson, A.J., Barnett, A.: An Introduction to Generalized Linear Models: Chapman & Hall/CRC Texts in Statistical Science, 3rd edn. Taylor & Francis (2008)
Google Scholar
Folino, F., Guarascio, M., Pontieri, L.: An approach to the discovery of accurate and expressive fix-time prediction models. In: Hammoudi, S., Maciaszek, L., Teniente, E., Camp, O., Cordeiro, J. (eds.) ICEIS 2015. LNBIP, vol. 241, pp. 108–128. Springer, Heidelberg (2015). doi:10.1007/978-3-319-22348-3_7
Chapter Google Scholar
Giger, E., Pinzger, M., Gall, H.: Predicting the fix time of bugs. In: Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering, RSSE 2010, pp. 52–56. ACM Press, New York (2010)
Google Scholar
Hu, H., Zhang, H., Xuan, J., Sun, W.: Effective bug triage based on historical bug-fix information. In: 2014 IEEE 25th International Symposium on Software Reliability Engineering, pp. 122–132. IEEE (2014)
Google Scholar
Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998). doi:10.1007/BFb0026683
Chapter Google Scholar
Justeson, J.S., Katz, S.M.: Technical terminology: some linguistic properties and an algorithm for identification in text. Nat. Lang. Eng. 1(01), 9–27 (1995)
Article Google Scholar
Karatzoglou, A., Meyer, D., Hornik, K.: Support vector machines in R. J. Stat. Softw. 15(1), 1–28 (2006)
Google Scholar
Lakshminarayanan, B., Raich, R.: Inference in supervised latent Dirichlet allocation. In: 2011 IEEE International Workshop on Machine Learning for Signal Processing, pp. 1–6 (2011)
Google Scholar
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
Book MATH Google Scholar
Marcus, M., Kim, G., Marcinkiewicz, M.A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., Schasberger, B.: The Penn Treebank: annotating predicate argument structure. In: Proceedings of the Workshop on Human Language Technology, pp. 114–119. Association for Computational Linguistics, Stroudsburg (1995)
Google Scholar
Marks, L., Zou, Y., Hassan, A.E.: Studying the fix-time for bugs in large open source projects. In: Proceedings of the 7th International Conference on Predictive Models in Software Engineering, Promise 2011, pp. 1–8. ACM Press, New York (2011)
Google Scholar
Panjer, L.D.: Predicting eclipse bug lifetimes. In: Fourth International Workshop on Mining Software Repositories, MSR 2007: ICSE Workshops 2007, pp. 29–32. IEEE, Washington, DC (2007). doi:10.1109/MSR.2007.25
Pressman, R.S., Maxim, B.R.: Software Engineering: A Practitioner’s Approach, 8th edn. McGraw-Hill Higher Education (2014)
Google Scholar
Core Team, R.: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2016). https://www.R-project.org/
Rennie, J.D.M., Shih, L., Teevan, J., Karger, D.R.: Tackling the poor assumptions of Naïve Bayes text classifiers. In: Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington DC, pp. 616–662 (2003)
Google Scholar
Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)
Article MATH Google Scholar
Sebastiani, F.: Machine learning in automated text categorization. ACM Comput. Surv. 34(1), 1–47 (2002)
Article Google Scholar
Wilbur, W.J., Kim, W.: The ineffectiveness of within-document term frequency in text classification. Inf. Retr. 12(5), 509–525 (2009)
Article Google Scholar
Xuan, J., Jiang, H., Hu, Y., Ren, Z., Zou, W., Luo, Z., Wu, X.: Towards effective bug triage with software data reduction techniques. IEEE Trans. Knowl. Data Eng. 27(1), 264–280 (2015)
Article Google Scholar
Zhang, C., Kjellström, H.: How to supervise topic models. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014, Part II. LNCS, vol. 8926, pp. 500–515. Springer, Heidelberg (2015). doi:10.1007/978-3-319-16181-5_39
Google Scholar
Zhang, J., Wang, X., Hao, D., Xie, B., Zhang, L., Mei, H.: A survey on bug-report analysis. Sci. China Inf. Sci. 58(2), 1–24 (2015)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Informatics, University of Bari Aldo Moro, Via Orabona, 4, 70125, Bari, Italy
Pasquale Ardimento
Ionian Department of Law, Economics and Environment, University of Bari Aldo Moro, Via Lago Maggiore angolo Via Ancona, 74121, Taranto, Italy
Massimo Bilancia
Everis Italia S.p.A., Via Gustavo Fara, 26, 20124, Milano, Italy
Stefano Monopoli

Authors

Pasquale Ardimento
View author publications
You can also search for this author in PubMed Google Scholar
Massimo Bilancia
View author publications
You can also search for this author in PubMed Google Scholar
Stefano Monopoli
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Pasquale Ardimento .

Editor information

Editors and Affiliations

Campus Middelhe, M.G.103a, Universiteit Antwerpen Campus Middelhe, M.G.103a, Antwerp, Belgium
Toon Calders
Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci
Bari, Italy
Donato Malerba

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Ardimento, P., Bilancia, M., Monopoli, S. (2016). Predicting Bug-Fix Time: Using Standard Versus Topic-Based Text Categorization Techniques. In: Calders, T., Ceci, M., Malerba, D. (eds) Discovery Science. DS 2016. Lecture Notes in Computer Science(), vol 9956. Springer, Cham. https://doi.org/10.1007/978-3-319-46307-0_11

Download citation

DOI: https://doi.org/10.1007/978-3-319-46307-0_11
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46306-3
Online ISBN: 978-3-319-46307-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics