Comment-Mine—A Semantic Search Approach to Program Comprehension from Code Comments

Advanced Computing and Systems for Security

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1136)

Abstract

Annotating programs with natural language comments is a common programming practice that makes code more readable. While researchers have addressed specific tasks such as comment quality analysis and comment classification, no integrated knowledge representation based on comments exists to aid program comprehension. We propose Comment-Mine, a semantic search architecture that extracts knowledge about the design, implementation, and evolution of software in the form of a knowledge graph. This graph supports program comprehension and a range of comment analysis tasks. We manually annotate concepts for 5600 comments extracted from 672 C/C++ files and projects crawled from code repositories such as GitHub. Comment-Mine extracts 38,992 concepts, of which 79.8% are correct when validated against the manual annotations.
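
To make the idea of a comment-derived knowledge graph concrete, the following is a minimal sketch, not the chapter's actual pipeline. It assumes a hypothetical set of keyword cues (`CUES`), a hypothetical function name (`find_key`), and a made-up comment; Comment-Mine's real extraction rules and graph schema are not described in this abstract.

```python
import re
from collections import defaultdict

# Hypothetical keyword cues mapping a word in a comment to a relation name.
# These are illustrative assumptions, not the rules used by Comment-Mine.
CUES = {
    "fixes": "fixes",
    "uses": "uses",
    "returns": "returns",
}


def extract_triples(symbol, comment):
    """Return (subject, relation, object) triples found in one comment."""
    triples = []
    for cue, relation in CUES.items():
        # Capture the words after the cue, up to the next '.', ';' or ','.
        pattern = rf"\b{cue}\b\s+([^.;,]+)"
        for match in re.finditer(pattern, comment, re.IGNORECASE):
            triples.append((symbol, relation, match.group(1).strip()))
    return triples


class KnowledgeGraph:
    """A tiny in-memory triple store keyed by (subject, relation)."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].append(obj)

    def query(self, subject, relation):
        return list(self._edges.get((subject, relation), []))


# Made-up comment attached to a hypothetical function `find_key`.
graph = KnowledgeGraph()
comment = "Uses binary search; returns the index of the key. Fixes bug 42."
for subject, relation, obj in extract_triples("find_key", comment):
    graph.add(subject, relation, obj)

print(graph.query("find_key", "uses"))
```

A query such as `graph.query("find_key", "fixes")` then answers a comprehension question ("what does this function fix?") directly from the comment text, which is the kind of lookup a semantic search layer over comments could support.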

Author information

Corresponding author

Correspondence to Srijoni Majumdar.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Majumdar, S., Papdeja, S., Das, P.P., Ghosh, S.K. (2020). Comment-Mine—A Semantic Search Approach to Program Comprehension from Code Comments. In: Chaki, R., Cortesi, A., Saeed, K., Chaki, N. (eds) Advanced Computing and Systems for Security. Advances in Intelligent Systems and Computing, vol 1136. Springer, Singapore. https://doi.org/10.1007/978-981-15-2930-6_3
