Text Retrieval Approaches for Concept Location in Source Code

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7171)

Abstract

Concept location in source code is an essential activity during software change. It starts with a change request and results in a place in the source code where the change is to be implemented. As a program comprehension activity, it is also part of other software evolution tasks, such as bug localization, recovery of traceability links between software artifacts, and retrieval of software components for reuse. While concept location is primarily a human activity, tool support is necessary given the large amount of information encoded in source code. Many such tools rely on text retrieval techniques and help developers perform concept location much like document retrieval on the web. This paper presents and discusses the applications of text retrieval to support concept location in the context of software change.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Marcus, A., Haiduc, S. (2013). Text Retrieval Approaches for Concept Location in Source Code. In: De Lucia, A., Ferrucci, F. (eds) Software Engineering. ISSSE 2009, 2010, 2011. Lecture Notes in Computer Science, vol 7171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36054-1_5

  • DOI: https://doi.org/10.1007/978-3-642-36054-1_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36053-4

  • Online ISBN: 978-3-642-36054-1

  • eBook Packages: Computer Science, Computer Science (R0)
