Using bug descriptions to reformulate queries during text-retrieval-based bug localization

  • Oscar Chaparro
  • Juan Manuel Florez
  • Andrian Marcus

Abstract

Text Retrieval (TR)-based approaches for bug localization rely on formulating an initial query based on the full text of a bug report. When the query fails to retrieve the buggy code artifacts, developers can reformulate the query and retrieve more candidate code documents. Existing research on query reformulation focuses mostly on leveraging relevance feedback from the user or on expanding the original query with additional information. We hypothesize that the title of the bug reports, the observed behavior, expected behavior, steps to reproduce, and code snippets provided by the users in bug descriptions, contain the most relevant information for retrieving the buggy code artifacts, and that other parts of the descriptions contain more irrelevant terms, which hinder retrieval. This paper proposes and evaluates a set of query reformulation strategies based on the selection of existing information in bug descriptions, and the removal of irrelevant parts from the original query. The results show that selecting the bug report title and the observed behavior is the strategy that performs best across various TR-based bug localization approaches and code granularities, as it leads to retrieving the buggy code artifacts within the top-N results for 25.6% more queries (on average) than without query reformulation. This strategy is highly applicable and consistent across different thresholds N. Selecting the steps to reproduce or the expected behavior (when provided in the bug reports) along with the bug title and the observed behavior leads to higher performance (i.e., between 31.4% and 41.7% more queries) and comparable consistency, yet it is applicable in fewer cases. These reformulation strategies are easy to use and are independent of the underlying retrieval technique.
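The selection-based reformulation described above can be illustrated with a minimal sketch. It assumes the sentences of a bug report have already been labeled by part (title, observed behavior, expected behavior, steps to reproduce, other); the `BugReport` container, the field names, and the `reformulate` function are hypothetical names chosen for illustration and are not taken from the paper or its replication package. The sketch only builds the query text and is independent of whichever retrieval engine is used afterwards.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class BugReport:
    # Hypothetical container: sentences are assumed to be labeled by part already.
    title: str
    observed_behavior: List[str] = field(default_factory=list)
    expected_behavior: List[str] = field(default_factory=list)
    steps_to_reproduce: List[str] = field(default_factory=list)
    other: List[str] = field(default_factory=list)


def initial_query(report: BugReport) -> str:
    """Baseline query: the full text of the bug report."""
    parts = [report.title, *report.observed_behavior, *report.expected_behavior,
             *report.steps_to_reproduce, *report.other]
    return " ".join(parts)


def reformulate(report: BugReport,
                parts: Sequence[str] = ("title", "observed_behavior")) -> Optional[str]:
    """Keep only the selected parts; return None if any selected part is missing."""
    selected: List[str] = []
    for name in parts:
        value = getattr(report, name)
        sentences = [value] if isinstance(value, str) else list(value)
        sentences = [s for s in sentences if s.strip()]
        if not sentences:
            return None  # the strategy is not applicable to this report
        selected.extend(sentences)
    return " ".join(selected)
```

With a made-up report whose title is "Crash when saving file" and which has one observed-behavior sentence but no steps to reproduce, `reformulate(report)` returns the title followed by that sentence, while `reformulate(report, ("title", "observed_behavior", "steps_to_reproduce"))` returns `None`. This mirrors the abstract's point that richer strategies apply to fewer bug reports.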

Keywords

Bug descriptions; Query reformulation; Bug localization; Text retrieval

Notes

Acknowledgments

This research was supported in part by the grants CCF-1848608 and CCF-1526118 from the US National Science Foundation.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, The University of Texas at Dallas, Richardson, USA
