Structured information in bug report descriptions—influence on IR-based bug localization and developers
Multiple information retrieval (IR)-based bug localization techniques have been proposed in recent years. These approaches rely on the textual similarity between the bug report description and the source code files. The underlying assumption is that these descriptions are well suited to query the code base. However, bug reports often contain structured information, such as stack traces and source code, alongside natural language, which may undermine this assumption. In this paper, we systematically analyze the influence of structured information on IR-based techniques. To this end, we carried out an empirical study on 7334 bug reports, of which more than 30% contain structured information. Based on the results, we conducted a follow-up user study focusing on source code fragments found in bug reports. Our results show that stack traces tend to negatively affect IR-based bug localization performance and require special handling. Compared to natural language–only reports, reports containing source code are beneficial for IR-based algorithms and help developers identify false positives in bug localization results.
Keywords: Bug report structure, Bug localization, Information retrieval
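As an illustration of the idea summarized in the abstract, the following minimal sketch ranks source files by their textual similarity to a bug report. It is not the method evaluated in the paper: it assumes a plain TF-IDF vector space model with cosine similarity, and it treats "special handling" of stack traces as simply dropping lines that look like Java stack frames. The names `STACK_FRAME`, `strip_stack_frames`, `tokenize`, and `rank_files` are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter

# Hypothetical pattern for Java stack-trace frames such as
# "at org.example.Foo.bar(Foo.java:42)"; real reports may need more cases.
STACK_FRAME = re.compile(r"^\s*at\s+[\w.$<>]+\([^)]*\)\s*$")


def strip_stack_frames(text):
    """Drop lines that look like Java stack-trace frames."""
    return "\n".join(l for l in text.splitlines() if not STACK_FRAME.match(l))


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", text.lower())


def rank_files(report, files):
    """Return file names sorted by TF-IDF cosine similarity to the report."""
    docs = {name: Counter(tokenize(src)) for name, src in files.items()}
    query = Counter(tokenize(report))
    n = len(docs)
    # Document frequency: number of files in which each term occurs.
    df = Counter(term for tf in docs.values() for term in tf)

    def weights(tf):
        # Smoothed IDF so terms present in every file keep a small weight.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = weights(query)
    scores = {name: cosine(q, weights(tf)) for name, tf in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    files = {
        "Parser.java": "class Parser { void parseHeader(String line) {} }",
        "Cache.java": "class Cache { void evict(String key) {} }",
    }
    report = (
        "Parser fails on header line\n"
        "    at org.example.Cache.evict(Cache.java:10)"
    )
    print(rank_files(strip_stack_frames(report), files))
```

In this toy input the stack frame mentions a file unrelated to the natural-language description, so removing it before querying keeps the ranking driven by the descriptive text. Whether filtering, reweighting, or separate treatment of stack frames works best in practice is an empirical question that this sketch does not answer.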
We thank Mihaela Todorova Tomova and Mario Janke for their assistance in conducting the user study.
Our work is funded by the BMBF grant: 01IS16003B, DFG grant: MA 5030/3–1, the EU EFRE/TAB grant: 2015FE9033, and DLR grant: D/943/67258261.