
TBCNN for Programs’ Abstract Syntax Trees

Tree-Based Convolutional Neural Networks

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)


Abstract

In this chapter, we apply the tree-based convolutional neural network (TBCNN) to the source code of programming languages, a task we call programming language processing. Programming language processing is an active research topic in software engineering, and it has also attracted growing interest in the artificial intelligence community. A distinct characteristic of a program is that it contains rich, explicit, and complicated structural information, which calls for more intensive modeling of structure. In this chapter, we propose a TBCNN variant for programming language processing, in which a convolution kernel is designed for programs’ abstract syntax trees. We show the effectiveness of TBCNN on two program analysis tasks: classifying programs by functionality, and detecting code snippets of certain patterns. TBCNN outperforms baseline methods, including several neural models for NLP.
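The convolution kernel mentioned in the abstract slides a fixed-depth window over the abstract syntax tree and mixes three weight matrices (top, left, right) according to each node's position, then max-pools the window outputs into a fixed-size vector. The following is a minimal NumPy sketch of this idea under assumed feature sizes and a simplified position-coefficient scheme; all names and the toy tree are illustrative, not the authors' implementation (which also accounts for node depth and subtree size):

```python
import numpy as np

rng = np.random.default_rng(0)
F_IN, F_OUT = 8, 16  # feature dimensions (illustrative)

# "Continuous binary tree": three weight matrices, mixed per position.
W_t = rng.normal(scale=0.1, size=(F_OUT, F_IN))  # top
W_l = rng.normal(scale=0.1, size=(F_OUT, F_IN))  # left
W_r = rng.normal(scale=0.1, size=(F_OUT, F_IN))  # right
b = np.zeros(F_OUT)

def conv_window(parent, children):
    """One depth-2 convolution window: combine the parent's feature
    vector and its children's into a single output feature vector."""
    acc = W_t @ parent  # the window's top node uses W_t alone
    n = len(children)
    for i, c in enumerate(children):
        # Position coefficient: the leftmost child leans on W_l,
        # the rightmost on W_r; middle children interpolate.
        eta_r = i / (n - 1) if n > 1 else 0.5
        acc += ((1.0 - eta_r) * W_l + eta_r * W_r) @ c
    return np.tanh(acc + b)

# Toy AST: a feature vector per node (learned per node type in practice).
feat = {name: rng.normal(size=F_IN) for name in "root a b c d".split()}
tree = {"root": ["a", "b"], "a": ["c", "d"], "b": [], "c": [], "d": []}

# Slide the window over every node, then max-pool ("dynamic pooling")
# to obtain a fixed-size vector for a tree of any shape.
windows = [conv_window(feat[n], [feat[c] for c in tree[n]]) for n in tree]
pooled = np.max(np.stack(windows), axis=0)
```

Because the pooling is an elementwise maximum over all windows, `pooled` has shape `(F_OUT,)` regardless of how large the input tree is, which is what lets a classifier sit on top of trees of varying size.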

The contents of this chapter were published in [16]. Copyright © 2016, Association for the Advancement of Artificial Intelligence (https://www.aaai.org). Implementation code and the collected dataset are available through our website (https://sites.google.com/site/treebasedcnn/).


Notes

  1. Parsed by pycparser (https://pypi.python.org/pypi/pycparser/).

  2. In their original paper, they do not deal with varying-length data, but their method extends naturally to this scenario. Their method is also mathematically equivalent to average pooling.

  3. http://programming.grids.cn. The data are available on our website (Footnote 1 of this Chapter).

  4. We do not use the pretrained vector representations, as they are inimical to the recursive neural network: the weight \(W_\text {code}\) encodes children’s representations into their candidate parent’s, yet the high-level nodes in programs (e.g., a function definition) are typically non-informative.

  5. History versions can be found at https://arxiv.org/pdf/1409.3348v1 and https://arxiv.org/pdf/1409.5718v1.
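Footnote 1 above notes that the programs are parsed into abstract syntax trees by pycparser. As an analogous, self-contained illustration, Python's standard ast module exposes the same kind of node-based tree for Python code that pycparser exposes for C (the snippet below uses the stdlib module only, since pycparser is a third-party package):

```python
import ast
from collections import Counter

# Parse a tiny program into its abstract syntax tree.
tree = ast.parse("x = 1 + 2")

# Every node carries a grammar symbol (Assign, BinOp, Constant, ...);
# TBCNN-style models attach a learned vector to each such symbol.
counts = Counter(type(node).__name__ for node in ast.walk(tree))
print(counts["Constant"], counts["BinOp"])  # prints: 2 1
```

The counts reflect the tree structure of the statement: one Assign node whose value is one BinOp with the two Constant leaves 1 and 2.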

References

  1. Baxter, I., Yahin, A., Moura, L., Sant’Anna, M., Bier, L.: Clone detection using abstract syntax trees. In: Proceedings of the International Conference on Software Maintenance, pp. 368–377 (1998)


  2. Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)


  3. Bettenburg, N., Begel, A.: Deciphering the story of software development through frequent pattern mining. In: Proceedings of the 35th International Conference on Software Engineering, pp. 1197–1200 (2013)


  4. Chilowicz, M., Duris, E., Roussel, G.: Syntax tree fingerprinting for source code similarity detection. In: Proceedings of the IEEE International Conference on Program Comprehension, pp. 243–247 (2009)


  5. Collobert, R., Weston, J.: A unified architecture for natural language processing: Deep neural networks with multitask learning. In: Proceedings of 25th International Conference on Machine Learning, pp. 160–167 (2008)


  6. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)


  7. Dahl, G., Mohamed, A., Hinton, G.: Phone recognition with the mean-covariance restricted Boltzmann machine. In: Advances in Neural Information Processing Systems, pp. 469–477 (2010)


  8. Dietz, L., Dallmeier, V., Zeller, A., Scheffer, T.: Localizing bugs in program executions with graphical models. In: Advances in Neural Information Processing Systems, pp. 468–476 (2009)


  9. Ghabi, A., Egyed, A.: Code patterns for automatically validating requirements-to-code traces. In: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, pp. 200–209 (2012)


  10. Hao, D., Lan, T., Zhang, H., Guo, C., Zhang, L.: Is this a bug or an obsolete test? In: Proceedings of the European Conference on Object-Oriented Programming, pp. 602–628 (2013)


  11. Hermann, K.M., Blunsom, P.: Multilingual models for compositional distributed semantics. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 58–68 (2014)


  12. Hindle, A., Barr, E.T., Su, Z., Gabel, M., Devanbu, P.: On the naturalness of software. In: Proceedings of the 34th International Conference on Software Engineering, pp. 837–847 (2012)


  13. Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 655–665 (2014)


  14. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)


  15. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)


  16. Mou, L., Li, G., Zhang, L., Wang, T., Jin, Z.: Convolutional neural networks over tree structures for programming language processing. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 1287–1293 (2016)


  17. Mou, L., Peng, H., Li, G., Xu, Y., Zhang, L., Jin, Z.: Discriminative neural sentence modeling by tree-based convolution. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2315–2325 (2015)


  18. Pane, J., Ratanamahatana, C., Myers, B.: Studying the language and structure in non-programmers’ solutions to programming problems. Int. J. Hum. Comput. Stud. 54(2), 237–264 (2001)


  19. Peng, H., Mou, L., Li, G., Liu, Y., Zhang, L., Jin, Z.: Building program vector representations for deep learning. In: Proceedings of the 8th International Conference on Knowledge Science, Engineering and Management, pp. 547–553 (2015)


  20. Pinker, S.: The Language Instinct: The New Science of Language and Mind. Penguin Press (1994)


  21. Socher, R., Huang, E., Pennin, J., Manning, C., Ng, A.: Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In: Advances in Neural Information Processing Systems, pp. 801–809 (2011)


  22. Socher, R., Karpathy, A., Le, Q., Manning, C., Ng, A.Y.: Grounded compositional semantics for finding and describing images with sentences. Trans. Assoc. Comput. Linguist. 2, 207–218 (2014)


  23. Socher, R., Pennington, J., Huang, E., Ng, A., Manning, C.: Semi-supervised recursive autoencoders for predicting sentiment distributions. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 151–161 (2011)


  24. Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642 (2013)


  25. Steidl, D., Gode, N.: Feature-based detection of bugs in clones. In: Proceedings of the 7th International Workshop on Software Clones, pp. 76–82 (2013)


  26. Yamaguchi, F., Lottmann, M., Rieck, K.: Generalized vulnerability extrapolation using abstract syntax trees. In: Proceedings of 28th Annual Computer Security Applications Conference, pp. 359–368 (2012)



Author information

Correspondence to Lili Mou.


Copyright information

© 2018 The Author(s)

About this chapter


Cite this chapter

Mou, L., Jin, Z. (2018). TBCNN for Programs’ Abstract Syntax Trees. In: Tree-Based Convolutional Neural Networks. SpringerBriefs in Computer Science. Springer, Singapore. https://doi.org/10.1007/978-981-13-1870-2_4


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1869-6

  • Online ISBN: 978-981-13-1870-2

