Abstract
Nowadays, software developers typically search online for reusable solutions to common programming problems. However, formulating the question appropriately, and then locating the best solution and integrating it into the code, can be tricky and time-consuming. As a result, several mining systems have been proposed to aid developers in the task of locating reusable snippets and integrating them into their source code. Most of these systems, however, do not model the semantics of the snippets in the context of the source code provided. In this work, we propose a snippet mining system, named StackSearch, that extracts semantic information from Stack Overflow posts and recommends useful and in-context snippets to the developer. Using a hybrid language model that combines Tf-Idf and fastText, our system effectively captures the meaning of the given query and retrieves semantically similar posts. Moreover, the results are accompanied by useful metadata extracted using a named entity recognition technique. Upon evaluating our system on a set of common programming queries, on a dataset based on post links, and against a similar tool, we argue that our approach can be useful for recommending ready-to-use snippets to the developer.
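The abstract does not specify how Tf-Idf and fastText are combined. One common way to combine them, offered here purely as an assumption, is to represent the query and each post as a Tf-Idf-weighted average of their word vectors and then rank posts by cosine similarity to the query. The minimal sketch below uses a tiny hand-made `TOY_VECTORS` table as a hypothetical stand-in for trained fastText embeddings; the helper names (`build_idf`, `weighted_embedding`, `rank_posts`) and the sample posts are illustrative only and are not StackSearch's actual API or data.

```python
import math
from collections import Counter

# Hypothetical 3-d word vectors standing in for a trained fastText model.
TOY_VECTORS = {
    "read": [1.0, 0.1, 0.0], "file": [0.9, 0.2, 0.0], "text": [0.8, 0.3, 0.1],
    "line": [0.7, 0.0, 0.2], "list": [0.3, 0.5, 0.1], "convert": [0.0, 1.0, 0.3],
    "string": [0.1, 0.9, 0.2], "integer": [0.0, 0.8, 0.5], "load": [0.95, 0.15, 0.05],
}

# Hypothetical post titles acting as the searchable corpus.
POSTS = [
    "how to read a file line by line",
    "convert string to integer",
    "read text file into list",
]


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def build_idf(docs):
    """Return a smoothed IDF function: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    def idf(token):
        # Tokens absent from the corpus have df = 0 and get the largest weight.
        return math.log((1 + n) / (1 + df[token])) + 1.0

    return idf


def weighted_embedding(tokens, vectors, idf, dim):
    """Tf-Idf-weighted average of the word vectors of `tokens`."""
    total = [0.0] * dim
    weight_sum = 0.0
    for token, tf in Counter(tokens).items():
        vec = vectors.get(token)
        if vec is None:
            continue  # a real fastText model could build an OOV vector from subwords
        w = tf * idf(token)
        weight_sum += w
        for i, x in enumerate(vec):
            total[i] += w * x
    return total if weight_sum == 0.0 else [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def rank_posts(query, posts, vectors, dim):
    """Return (post_index, score) pairs sorted by descending similarity."""
    docs = [tokenize(p) for p in posts]
    idf = build_idf(docs)
    q = weighted_embedding(tokenize(query), vectors, idf, dim)
    scores = [(i, cosine(q, weighted_embedding(d, vectors, idf, dim)))
              for i, d in enumerate(docs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    for idx, score in rank_posts("load file contents", POSTS, TOY_VECTORS, dim=3):
        print(f"{score:.3f}  {POSTS[idx]}")
```

With these toy vectors, the query "load file contents" ranks the two file-reading posts above the string-conversion post, even though "load" never appears in the corpus. The Tf-Idf weights keep frequent, uninformative words from dominating the averaged vector, while the embeddings let near-synonyms such as "load" and "read" match.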
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2020 The Author(s)
About this paper
Cite this paper
Diamantopoulos, T., Oikonomou, N., Symeonidis, A. (2020). Extracting Semantics from Question-Answering Services for Snippet Reuse. In: Wehrheim, H., Cabot, J. (eds) Fundamental Approaches to Software Engineering. FASE 2020. Lecture Notes in Computer Science, vol 12076. Springer, Cham. https://doi.org/10.1007/978-3-030-45234-6_6
DOI: https://doi.org/10.1007/978-3-030-45234-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45233-9
Online ISBN: 978-3-030-45234-6
eBook Packages: Computer Science, Computer Science (R0)