Spark-Based Cluster Implementation of a Bug Report Assignment Recommender System

Florea, Adrian-Cătălin; Anvik, John; Andonie, Răzvan

doi:10.1007/978-3-319-59060-8_4


Conference paper
First Online: 24 May 2017


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10246)

Abstract

The use of recommenders for bug report triage decisions is especially important in large software development projects, where the high frequency of reported problems and the large number of active developers make it difficult to select the most appropriate developer to work on a given issue. From a machine learning perspective, bug report assignment in software projects may be regarded as a classification problem that can be solved by a recommender system. We describe a highly scalable SVM-based bug report assignment recommender that is able to run on massive datasets. Unlike previous desktop-based implementations of bug report assignment recommenders, our recommender is implemented on a cloud platform. The system uses a novel sequence of machine learning processing steps and compares favorably with other SVM-based bug report assignment recommender systems with respect to prediction performance. We validate our approach on real-world datasets from the NetBeans, Eclipse, and Mozilla projects.
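The abstract frames assignment as classification: each bug report is an example labelled with the developer it was assigned to, and the recommender ranks candidate developers by classifier score. The following is a minimal single-machine sketch of that framing, not the authors' Spark/MLlib pipeline. It assumes TF-IDF term weighting, a whitespace tokenizer, and a one-vs-rest set of linear SVMs (bias term omitted) trained by plain full-batch subgradient descent on the hinge loss; all function names, hyperparameters, and the toy data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical lowercase whitespace tokenizer (stand-in for real preprocessing)."""
    return text.lower().split()


def fit_idf(docs):
    """Inverse document frequency: idf(t) = log(N / df(t))."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log(n / c) for t, c in df.items()}


def tfidf(tokens, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items() if idf[t] > 0}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else {}


def dot(w, x):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(xs, ys, lam=0.01, lr=0.5, epochs=50):
    """Linear SVM (no bias term) trained by full-batch subgradient descent on
    (lam / 2) * ||w||^2 + mean(max(0, 1 - y * w.x)), with labels y in {-1, +1}."""
    w = {}
    n = len(xs)
    for _ in range(epochs):
        # Examples with margin below 1 contribute to the hinge-loss subgradient.
        violators = [(x, y) for x, y in zip(xs, ys) if y * dot(w, x) < 1]
        # Gradient step on the L2 regulariser shrinks every weight.
        w = {t: (1 - lr * lam) * v for t, v in w.items()}
        for x, y in violators:
            for t, v in x.items():
                w[t] = w.get(t, 0.0) + lr * y * v / n
    return w


def train_recommender(reports, assignees):
    """One-vs-rest: train one binary SVM per developer."""
    docs = [tokenize(r) for r in reports]
    idf = fit_idf(docs)
    xs = [tfidf(d, idf) for d in docs]
    models = {
        dev: train_linear_svm(xs, [1 if a == dev else -1 for a in assignees])
        for dev in sorted(set(assignees))
    }
    return idf, models


def recommend(report, idf, models, k=3):
    """Rank developers by their classifier's decision value; return the top k."""
    x = tfidf(tokenize(report), idf)
    scores = {dev: dot(w, x) for dev, w in models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy, made-up training data: report text and the developer it was assigned to.
reports = [
    "crash in layout engine when rendering table",
    "layout engine renders table border wrong",
    "network socket timeout on proxy",
    "proxy authentication fails socket error",
]
assignees = ["alice", "alice", "bob", "bob"]
idf, models = train_recommender(reports, assignees)
print(recommend("table layout broken", idf, models, k=1))  # ['alice']
```

One-vs-rest is used here because it reduces a multi-developer decision to independent binary SVMs that could each be trained in parallel; whether the paper uses the same decomposition is not stated in this excerpt.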


Notes

1. Data extracted from a dump of Mozilla's Bugzilla database as of March 4, 2016.
2. https://bugs.eclipse.org/bugs/.
3. http://2011.msrconf.org/msr-challenge.html.
4. https://netbeans.org/bugzilla/.
5. https://bugzilla.mozilla.org/.
6. https://www.bugzilla.org/docs/2.18/html/dbdoc.html.
7. The profiles table contains personal information, such as the names and email addresses of project members.
8. https://www.bugzilla.org/docs/2.18/html/dbdoc.html.
9. In cases where a bug report triager is also an active developer, that person will have been assigned to issues marked as FIXED.
10. Terms tagged as Noun (NN), Noun Plural (NNS), Proper Noun (NNP), or Proper Noun Plural (NNPS).
11. http://spark.apache.org/.
12. http://spark.apache.org/mllib/.
13. http://www.scala-lang.org/.
14. https://github.com/acflorea/columbugus.
15. https://cloud.google.com/dataproc/.
16. Without this reduction, the dataset was too large for WEKA to process.
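Note 10 identifies terms by four Penn Treebank noun tags. Assuming the report text has already been run through a part-of-speech tagger (the tagging step itself is not shown), the filter reduces to a set-membership test on each (token, tag) pair. The helper name `keep_nouns` and the sample tagged sentence are hypothetical illustrations, not taken from the paper.

```python
# Penn Treebank noun tags listed in note 10.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def keep_nouns(tagged_tokens):
    """Keep only tokens whose POS tag is one of the four noun tags.

    `tagged_tokens` is a list of (token, tag) pairs produced by some POS tagger.
    """
    return [tok for tok, tag in tagged_tokens if tag in NOUN_TAGS]


# Hypothetical tagger output for "Firefox crashes when printing large tables".
tagged = [("Firefox", "NNP"), ("crashes", "VBZ"), ("when", "WRB"),
          ("printing", "VBG"), ("large", "JJ"), ("tables", "NNS")]
print(keep_nouns(tagged))  # ['Firefox', 'tables']
```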


Acknowledgment

The authors are grateful to the Mozilla Foundation for providing a dump of their Bugzilla database.

Author information

Authors and Affiliations

Electronics and Computers Department, Transilvania University of Braşov, Braşov, Romania
Adrian-Cătălin Florea & Răzvan Andonie
Department of Mathematics and Computer Science, University of Lethbridge, Lethbridge, AB, Canada
John Anvik
Computer Science Department, Central Washington University, Ellensburg, WA, USA
Răzvan Andonie


Corresponding author

Correspondence to Adrian-Cătălin Florea.

Editor information

Editors and Affiliations

Częstochowa University of Technology, Częstochowa, Poland
Leszek Rutkowski
Częstochowa University of Technology, Częstochowa, Poland
Marcin Korytkowski
Częstochowa University of Technology, Częstochowa, Poland
Rafał Scherer
AGH University of Science and Technology, Kraków, Poland
Ryszard Tadeusiewicz
University of California, Berkeley, California, USA
Lotfi A. Zadeh
University of Louisville, Louisville, Kentucky, USA
Jacek M. Zurada


About this paper

Cite this paper

Florea, AC., Anvik, J., Andonie, R. (2017). Spark-Based Cluster Implementation of a Bug Report Assignment Recommender System. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2017. Lecture Notes in Computer Science, vol 10246. Springer, Cham. https://doi.org/10.1007/978-3-319-59060-8_4


DOI: https://doi.org/10.1007/978-3-319-59060-8_4
Published: 24 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59059-2
Online ISBN: 978-3-319-59060-8
eBook Packages: Computer Science, Computer Science (R0)
