Recognizing Potential Runtime Types from Python Docstrings

  • Conference paper
Software Analysis, Testing, and Evolution (SATE 2018)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11293)

Abstract

Docstrings play an important role in software development and maintenance, as they are used in source code to document a specific segment of code. In dynamically typed languages, docstrings are commonly used to annotate the types of parameters and return values.
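As an illustration of how a docstring can carry type information, the sketch below documents a hypothetical function `scale` with reStructuredText-style `:param:`/`:type:` fields (only one of several docstring conventions; the projects in the paper's dataset may use others) and pulls the type descriptions out with a regular expression. Both `scale` and the helper `extract_type_descriptions` are illustrative assumptions, not part of the paper.

```python
import inspect
import re


def scale(values, factor=1.0):
    """Multiply every element of ``values`` by ``factor``.

    :param values: the numbers to scale
    :type values: list of int or float
    :param factor: multiplier applied to each element
    :type factor: float
    :rtype: list
    """
    return [v * factor for v in values]


# Hypothetical pattern for reST field lines such as ":type values: list of int or float".
# ":rtype:" lines do not match because the line must start with ":type".
TYPE_FIELD = re.compile(r"^\s*:type\s+(\w+):\s*(.+?)\s*$", re.MULTILINE)


def extract_type_descriptions(func):
    """Return a {parameter name: type description} dict parsed from func's docstring."""
    doc = inspect.getdoc(func) or ""  # getdoc also strips the common indentation
    return {name: desc for name, desc in TYPE_FIELD.findall(doc)}


print(extract_type_descriptions(scale))
# {'values': 'list of int or float', 'factor': 'float'}
```

The free-text descriptions extracted this way ("list of int or float") are the kind of input from which potential runtime types must be recognized.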

Docstrings can remind developers of the expected types of a parameter without requiring them to comprehend the surrounding context, which is time-consuming. In this study, we propose an automatic approach that recognizes the potential types of a parameter from its description.
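Because a single description such as "list of int or float" can admit several types, recognizing potential types is naturally a multi-label classification problem. The following minimal sketch shows one plausible encoding, assuming binary bag-of-words features over a token vocabulary and a 0/1 target vector over a fixed set of candidate types. The label set `TYPE_LABELS`, the tokenizer, and all helper names are hypothetical; the paper's actual feature extraction may differ.

```python
import re

# Hypothetical set of candidate runtime types the classifiers may predict.
TYPE_LABELS = ["int", "float", "str", "list", "dict", "bool", "None"]


def tokenize(description):
    """Lower-case a type description and split it into alphabetic word tokens."""
    return re.findall(r"[a-z_]+", description.lower())


def build_vocabulary(descriptions):
    """Sorted list of the distinct tokens appearing in all descriptions."""
    return sorted({tok for d in descriptions for tok in tokenize(d)})


def to_features(description, vocabulary):
    """Binary bag-of-words vector: 1 where a vocabulary term occurs in the description."""
    tokens = set(tokenize(description))
    return [1 if term in tokens else 0 for term in vocabulary]


def to_label_vector(types):
    """Multi-label target: one 0/1 slot per candidate type in TYPE_LABELS."""
    return [1 if label in types else 0 for label in TYPE_LABELS]


descriptions = ["list of int or float", "a float or None"]
vocab = build_vocabulary(descriptions)
print(vocab)                                   # ['a', 'float', 'int', 'list', 'none', 'of', 'or']
print(to_features(descriptions[1], vocab))     # [1, 1, 0, 0, 1, 0, 1]
print(to_label_vector({"float", "None"}))      # [0, 1, 0, 0, 0, 0, 1]
```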

Our approach first applies feature selection to identify useful features for training the classifiers. We then adopt four different kinds of classifiers to recognize potential types and evaluate their performance using seven metrics.
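The feature-selection step can be sketched with the \(\chi^2\) statistic, which scores how strongly a feature's presence is associated with a label, keeping only the top `percent`% of features. This is a minimal pure-Python sketch for binary features and labels using the 2x2 contingency-table form of \(\chi^2\); how a feature's scores are combined across several type labels is not stated here, so taking the maximum over labels is an assumption, and the function names are hypothetical.

```python
import math


def chi2_2x2(feature_col, label_col):
    """Chi-square statistic of a binary feature vs. a binary label (2x2 table)."""
    pairs = list(zip(feature_col, label_col))
    a = sum(1 for f, y in pairs if f and y)          # feature present, label present
    b = sum(1 for f, y in pairs if f and not y)      # feature present, label absent
    c = sum(1 for f, y in pairs if not f and y)      # feature absent, label present
    d = sum(1 for f, y in pairs if not f and not y)  # feature absent, label absent
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # constant feature or label: no measurable association
    return n * (a * d - b * c) ** 2 / denom


def select_top_percent(X, Y, percent):
    """Indices of the top `percent`% features ranked by chi-square score.

    X: list of binary feature rows; Y: list of binary label rows.
    A feature's score is its maximum chi-square over all labels (assumption).
    """
    n_features, n_labels = len(X[0]), len(Y[0])
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        scores.append(max(chi2_2x2(col, [row[k] for row in Y]) for k in range(n_labels)))
    k = max(1, math.ceil(n_features * percent / 100))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


X = [[1, 0, 1, 0, 1], [1, 0, 0, 1, 1], [0, 1, 1, 0, 1], [0, 1, 0, 1, 1]]
Y = [[1], [1], [0], [0]]
print(select_top_percent(X, Y, 40))  # [0, 1]
print(select_top_percent(X, Y, 20))  # [0]
```

In practice a library routine would normally be used instead; for example, scikit-learn offers `SelectPercentile` combined with its `chi2` scoring function.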

We collect a dataset of 314 type descriptions from ten popular Python projects. Our experimental results show that the Decision Tree classifier performs best among the four classifiers studied, achieving a precision, recall, F1-score, Jaccard index, Hamming loss, accuracy, and MRR of 0.681, 0.548, 0.582, 0.542, 1.234, 0.432, and 0.778, respectively. The multi-layer perceptron performs worst. Furthermore, we find that the four classifiers achieve their best performance when the top 20% or 40% of features with the highest \(\chi^2\) statistic are selected.
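The seven evaluation metrics can be sketched as example-based multi-label measures over sets of true and predicted types. The exact definitions used in the study are not given in this abstract, so the following are assumptions: precision, recall, F1, and Jaccard are averaged per description; "accuracy" is exact-match (subset) accuracy; and Hamming loss is the average number of wrongly predicted or missed types per description, left unnormalized, since the reported value of 1.234 exceeds the [0, 1] range of the normalized variant. MRR needs a ranked list of predicted types and averages the reciprocal rank of the first correct one. Function names are hypothetical.

```python
def example_based_metrics(true_sets, pred_sets):
    """Per-description averages of multi-label metrics for true/predicted type sets."""
    n = len(true_sets)
    p = r = f = jac = ham = acc = 0.0
    for t, s in zip(true_sets, pred_sets):
        inter = len(t & s)
        prec = inter / len(s) if s else 0.0
        rec = inter / len(t) if t else 0.0
        p += prec
        r += rec
        f += 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        union = t | s
        jac += inter / len(union) if union else 1.0
        ham += len(t ^ s)  # unnormalized count of wrong or missed types (assumption)
        acc += 1.0 if t == s else 0.0  # exact-match accuracy (assumption)
    return {"precision": p / n, "recall": r / n, "f1": f / n,
            "jaccard": jac / n, "hamming_loss": ham / n, "accuracy": acc / n}


def mean_reciprocal_rank(true_sets, ranked_preds):
    """Average of 1/rank of the first correct type in each ranked prediction list."""
    total = 0.0
    for t, ranking in zip(true_sets, ranked_preds):
        for rank, label in enumerate(ranking, start=1):
            if label in t:
                total += 1.0 / rank
                break
    return total / len(true_sets)


true = [{"int", "float"}, {"str"}, {"list"}]
pred = [{"int"}, {"str"}, {"dict"}]
ranked = [["int", "float"], ["list", "str"], ["dict", "float", "list"]]
print(example_based_metrics(true, pred))
print(mean_reciprocal_rank(true, ranked))
```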

This study archives a dataset of type descriptions and proposes a framework for automatically recognizing the potential types of a parameter from its description.

Notes

  1. https://github.com/Instagram/MonkeyType

Acknowledgments

The work is supported by National Key R&D Program of China (2018YFB1003900), the Natural Science Foundation of Jiangsu Province of China (BK20140611), the National Natural Science Foundation of China (61872177, 61772263, 61432001), and the program B for Outstanding PhD candidate of Nanjing University.

Author information

Corresponding author

Correspondence to Lin Chen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Luo, Y., Ma, W., Li, Y., Chen, Z., Chen, L. (2018). Recognizing Potential Runtime Types from Python Docstrings. In: Bu, L., Xiong, Y. (eds) Software Analysis, Testing, and Evolution. SATE 2018. Lecture Notes in Computer Science, vol 11293. Springer, Cham. https://doi.org/10.1007/978-3-030-04272-1_5

  • DOI: https://doi.org/10.1007/978-3-030-04272-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04271-4

  • Online ISBN: 978-3-030-04272-1

  • eBook Packages: Computer Science, Computer Science (R0)
