Abstract
Programmers often look for a “snippet,” that is, a small piece of example code, to remind themselves of how to solve a problem or to quickly learn about a new resource. However, existing tools such as general-purpose search engines and code-specific search engines do not deal well with searches for snippets. In this chapter, we present a prototype search engine designed to work with code snippets. Our approach uses the non-code text on a web page as metadata for the snippet to improve indexing and retrieval. We discuss implementation issues that we encountered and the lessons they offer to others building similar tools. These issues include extracting snippets from web pages, selecting and indexing metadata, matching query terms with multiple metadata indexes, and identifying a text summary to be used in the presentation of results.
Acknowledgements
This material is based upon work supported by the NSF under Grant No. IIS-0846034 and by the UCI Summer Undergraduate Research Program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Tantikul, P., Thompson, C.A., Gallardo-Valencia, R.E., Sim, S.E. (2013). Novel and Applied Algorithms in a Search Engine for Java Code Snippets. In: Sim, S.E., Gallardo-Valencia, R.E. (eds) Finding Source Code on the Web for Remix and Reuse. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6596-6_14
DOI: https://doi.org/10.1007/978-1-4614-6596-6_14
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6595-9
Online ISBN: 978-1-4614-6596-6
eBook Packages: Computer Science (R0)