Structured information in bug report descriptions—influence on IR-based bug localization and developers

  • Michael Rath
  • Patrick Mäder


Multiple information retrieval (IR)-based bug localization techniques have been proposed in recent years. These approaches rely on the textual similarity between a bug report description and the source code files. The basic assumption is that these descriptions are well suited to query the code base. However, bug reports often contain structured information such as stack traces and source code alongside natural language, which may undermine this assumption. In this paper, we systematically analyze the influence of structured information on IR-based techniques. To this end, we carried out an empirical study of 7334 bug reports, more than 30% of which contain structured information. Based on the results, we conducted a follow-up user study focusing on source code fragments found in bug reports. Our results show that stack traces tend to negatively affect IR-based bug localization performance and require special handling. Compared with natural language-only reports, reports containing source code benefit IR-based algorithms and also help developers identify false positives in bug localization results.
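The textual-similarity idea described above can be illustrated with a minimal Python sketch. It is not the implementation of any technique evaluated in the study: it assumes a plain TF-IDF weighting with cosine similarity, and the names `strip_stack_frames`, `tokenize`, `rank_files`, and the `STACK_FRAME` pattern are hypothetical helpers introduced only for illustration. The stack-frame filter is a crude heuristic for Java-style frames, standing in for whatever "special handling" a real tool would apply.

```python
import math
import re
from collections import Counter

# Hypothetical pattern for Java-style stack frames such as
# "at org.example.Foo.bar(Foo.java:42)"; real reports vary widely.
STACK_FRAME = re.compile(r"^\s*at\s+[\w.$<>]+\(.*\)\s*$")


def strip_stack_frames(text):
    """Drop lines that look like stack frames (illustrative heuristic only)."""
    return "\n".join(
        line for line in text.splitlines() if not STACK_FRAME.match(line)
    )


def tokenize(text):
    """Extract alphabetic words, split camelCase identifiers, and lowercase."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word))
    return [t.lower() for t in tokens]


def rank_files(report, files):
    """Return file names sorted by descending cosine similarity to the report."""
    docs = {name: Counter(tokenize(body)) for name, body in files.items()}
    query = Counter(tokenize(report))
    n = len(docs)
    # Document frequency: in how many files each term occurs.
    df = Counter(term for counts in docs.values() for term in counts)

    def weigh(counts):
        # Smoothed IDF, so terms present in every file keep a small weight.
        return {
            t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in counts.items()
        }

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = weigh(query)
    scores = {name: cosine(q, weigh(counts)) for name, counts in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

In a constructed toy case, a report whose prose mentions a configuration path but whose stack trace names an unrelated logging class would be matched to the logging file when the trace is left in, and to the parsing file once `strip_stack_frames` is applied first. This only shows how trace tokens can dominate a query; it says nothing about how often that happens in real projects.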


Keywords: Bug report structure, Bug localization, Information retrieval



We thank Mihaela Todorova Tomova and Mario Janke for their assistance in conducting the user study.

Funding information

Our work is funded by BMBF grant 01IS16003B, DFG grant MA 5030/3–1, EU EFRE/TAB grant 2015FE9033, and DLR grant D/943/67258261.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Technische Universität Ilmenau, Ilmenau, Germany
