Wuhan University Journal of Natural Sciences, Volume 24, Issue 5, pp 391–399

A Query Expansion Method Based on Evolving Source Code

  • Huan Jin
  • Lei Xiong
Computer Science


Existing query expansion (QE) methods sometimes fail to find the source code version most requested by users, because noise causes the query to be over-expanded. To solve this problem, we propose a QE method based on evolving contexts (EC), that is, the terms added or deleted during code evolution together with their dependent terms. When expanding a query, we append the added terms as relevant terms and exclude the deleted terms as noisy terms. We also develop a QE-integrating framework based on Support Vector Machine (SVM) Ranking, called QESR, which integrates multiple QE methods simultaneously. Our experiment shows that QESR outperforms the state-of-the-art QE methods CodeHow and Query Expansion based on Crowd Knowledge (QECK) by 13%–16% in terms of precision when only the first query result is inspected.
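The expansion rule and the SVM-based integration summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`expand_query`, `train_rank_svm`, `rank`, `precision_at_1`), the input shapes, and the example terms and scores are all hypothetical. The added and deleted term sets and the term-to-dependents map are taken as given, as if a change-extraction tool had already produced them. Candidate expansion terms from some other QE method are kept unless they were deleted during evolution, and only the deleted terms themselves are filtered, not their dependents. For the integration step, each search result is assumed to be described by the scores that several QE methods assign to it. A linear pairwise Ranking SVM is then trained by plain subgradient descent on the hinge loss, which is a simpler solver than the linear-time training of Joachims [17].

```python
# Hypothetical sketch: evolving-context (EC) query expansion plus a
# pairwise Ranking-SVM fusion of several QE methods (QESR-like).
# All names, inputs, and the training procedure are assumptions.
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]
Query = List[Tuple[Vector, int]]  # (features of one result, relevance label)


def expand_query(query: Sequence[str], candidates: Sequence[str],
                 added: Set[str], deleted: Set[str],
                 dependents: Dict[str, Set[str]]) -> List[str]:
    """Keep the query, append candidate and EC terms, drop deleted terms."""
    expanded = list(query)
    seen = set(expanded)
    # Evolving context: each added term followed by its dependent terms.
    ec_terms: List[str] = []
    for term in sorted(added):
        ec_terms.append(term)
        ec_terms.extend(sorted(dependents.get(term, set())))
    for term in list(candidates) + ec_terms:
        # Deleted terms are treated as noise; duplicates are skipped.
        if term in deleted or term in seen:
            continue
        expanded.append(term)
        seen.add(term)
    return expanded


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_rank_svm(queries: List[Query], lam: float = 0.01,
                   epochs: int = 200, lr: float = 0.1) -> Vector:
    """Minimise lam/2*||w||^2 + mean pairwise hinge loss by subgradient descent.

    For results i, j of the same query with label_i > label_j, the model
    should satisfy w.(x_i - x_j) >= 1.
    """
    diffs: List[Vector] = []
    for results in queries:
        for xi, yi in results:
            for xj, yj in results:
                if yi > yj:
                    diffs.append([a - b for a, b in zip(xi, xj)])
    dim = len(queries[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [lam * wk for wk in w]  # gradient of the L2 regulariser
        for d in diffs:
            if dot(w, d) < 1.0:  # margin violated: hinge term is active
                for k in range(dim):
                    grad[k] -= d[k] / len(diffs)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def rank(w: Vector, results: List[Vector]) -> List[Vector]:
    """Order candidate results by their fused score w.x, best first."""
    return sorted(results, key=lambda x: dot(w, x), reverse=True)


def precision_at_1(w: Vector, queries: List[Query]) -> float:
    """Fraction of queries whose top-ranked result is relevant (label > 0)."""
    hits = sum(1 for results in queries
               if max(results, key=lambda r: dot(w, r[0]))[1] > 0)
    return hits / len(queries)


if __name__ == "__main__":
    print(expand_query(["read", "file"], ["stream", "buffer", "close"],
                       {"Files"}, {"close"}, {"Files": {"Path"}}))
    # ['read', 'file', 'stream', 'buffer', 'Files', 'Path']
    # Toy training data: features are [score from method A, score from method B].
    train = [[([0.9, 0.2], 1), ([0.1, 0.8], 0)],
             [([0.7, 0.4], 1), ([0.3, 0.6], 0)]]
    w = train_rank_svm(train)
    print(precision_at_1(w, train))  # 1.0
```

The linear form keeps the fused score interpretable: each learned weight indicates how much one QE method's score contributes to the final ranking, and `precision_at_1` corresponds to inspecting only the first result of each query.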

Key words

code search; query expansion; crowd knowledge; evolving context

CLC number

TP 311 




References

[1] Haiduc S, Bavota G, Marcus A, et al. Automatic query reformulations for text retrieval in software engineering [C]// Proc 35th International Conference on Software Engineering (ICSE). Piscataway: IEEE, 2013: 842–851.
[2] Fischer G, Henninger S, Redmiles D. Cognitive tools for locating and comprehending software objects for reuse [C]// Proc 13th International Conference on Software Engineering. Piscataway: IEEE, 1991: 318–328.
[3] Lv F, Zhang H Y, Lou J G, et al. CodeHow: Effective code search based on API understanding and extended boolean model (E) [C]// Proc 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). Piscataway: IEEE Press, 2015: 260–270.
[4] Nie L, Jiang H, Ren Z, et al. Query expansion based on crowd knowledge for code search [J]. IEEE Transactions on Services Computing, 2016: 771–783.
[5] Cutting D. Java code examples for org.apache.lucene.index.IndexWriter [EB/OL]. [2019-01-02].
[6]
[7]
[8] Fluri B, Würsch M, Pinzger M, et al. Tools-changedistiller [EB/OL]. [2019-01-02].
[9] Google. Crystalsaf [EB/OL]. [2019-01-02].
[10] Fluri B, Würsch M, Pinzger M, et al. Change distilling: Tree differencing for fine-grained source code change extraction [J]. IEEE Transactions on Software Engineering, 2007: 725–743.
[11] Keivanloo I, Rilling J, Zou Y. Spotting working code examples [C]// Proc 36th International Conference on Software Engineering. Piscataway: IEEE, 2014: 664–675.
[12] Eclipse Foundation. Eclipse.jdt.core [EB/OL]. [2019-01-02].
[13] Sun X, Liu X, Hu J, et al. Empirical studies on the NLP techniques for source code data preprocessing [C]// Proc 3rd International Workshop on Evidential Assessment of Software Technologies. New York: ACM, 2014: 32–39.
[14]
[15] Manning C D, Raghavan P, Schütze H. Introduction to Information Retrieval [M]. Cambridge: Cambridge University Press, 2008.
[16] Fluri B, Würsch M, Pinzger M, et al. Tools-changedistiller [EB/OL]. [2019-01-02].
[17] Joachims T. Training linear SVMs in linear time [C]// Proc 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2006: 217–226.
[18] McMillan C, Poshyvanyk D, Grechanik M, et al. Portfolio: Searching for relevant functions and their usages in millions of lines of code [J]. ACM Transactions on Software Engineering and Methodology (TOSEM), 2013: 1–30.
[19] Nguyen A T, Hilton M, Codoban M, et al. API code recommendation using statistical learning from fine-grained changes [C]// Proc 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. New York: ACM, 2016: 511–522.
[20] Eclipse Foundation. Eclipse Documentation [EB/OL]. [2019-01-02].
[21] Salton G, Fox E A, Wu H. Extended boolean information retrieval [J]. Communications of the ACM, 1983, 26: 1022–1036.
[22] Stack Exchange Inc. Files for stackexchange [EB/OL]. [2019-01-02].
[23]
[24] Cutting D. Using Apache Lucene to search [EB/OL]. [2019-01-02].

Copyright information

© Wuhan University and Springer-Verlag GmbH Germany 2019

Authors and Affiliations

  1. College of Information Engineering, Jiangxi University of Technology, Nanchang, Jiangxi, China
  2. Center of Collaboration and Innovation, Jiangxi University of Technology, Nanchang, Jiangxi, China