Summarizing Software Artifacts: A Literature Review

  • Survey
Journal of Computer Science and Technology

Abstract

This paper presents a literature review of research on summarizing software artifacts, focusing on bug reports, source code, mailing lists, and developer discussions. From Jan. 2010 to Apr. 2016, numerous summarization techniques, approaches, and tools were proposed to meet the ongoing demand for improving software performance and quality and for helping developers understand the problems at hand. Since these artifacts contain both structured and unstructured data, researchers have applied different machine learning and data mining techniques to generate summaries. This paper therefore first provides a general perspective on the state of the art, describing the types of artifacts, the approaches to summarization, and the portions of experimental procedures that are common across these artifacts. We then discuss the applications of summarization, i.e., which tasks have been accomplished through summarization. Next, we present the tools that were developed for, or employed during, summarization tasks. In addition, we present the summarization evaluation methods employed in the selected studies, as well as other important factors used to evaluate generated summaries, such as adequacy and quality. We also briefly present modern communication channels and the complementarities and commonalities among different software artifacts. Finally, we discuss challenges applicable to the existing studies in general, as well as future research directions. This survey of existing studies is intended to give future researchers broad and useful background knowledge on the main aspects of this research field.

Author information

Corresponding author

Correspondence to He Jiang.

Additional information

Special Section on Software Systems 2016

This work was supported in part by the National Basic Research 973 Program of China under Grant No. 2013CB035906, the Fundamental Research Funds for the Central Universities of China under Grant No. DUT13RC(3)53, and in part by the New Century Excellent Talents in University of China under Grant No. NCET-13-0073 and the National Natural Science Foundation of China under Grant No. 61300017.

About this article

Cite this article

Nazar, N., Hu, Y. & Jiang, H. Summarizing Software Artifacts: A Literature Review. J. Comput. Sci. Technol. 31, 883–909 (2016). https://doi.org/10.1007/s11390-016-1671-1

  • DOI: https://doi.org/10.1007/s11390-016-1671-1
