
On the Impact of Linguistic Information in Kernel-Based Deep Architectures

  • Danilo Croce
  • Simone Filice
  • Roberto Basili
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10640)

Abstract

Kernel methods enable the direct use of structured representations of textual data in language learning and inference tasks. Deep neural networks, on the other hand, are effective in learning non-linear decision functions. Recent work demonstrated that expressive kernels and deep neural networks can be combined in a Kernel-based Deep Architecture (KDA), a common framework that allows structured information to be explicitly modeled within a neural network. This combination achieves state-of-the-art accuracy in different semantic inference tasks. This paper investigates the impact of linguistic information on the performance reachable by a KDA, studying the benefits that different kernels bring to inference quality. We believe that the expressiveness of data representations will play a key role in the widespread adoption of neural networks in AI problem solving. We experimentally evaluate the adoption of different kernels, each of increasing expressive power, in a Question Classification task. Results suggest the importance of rich kernel functions in optimizing the accuracy of a KDA.
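In practice, a KDA first linearizes the (implicit) kernel space with a low-rank Nyström projection computed over a small set of landmark examples, and then trains a standard feed-forward network on the projected vectors. The sketch below illustrates that pipeline; the linear kernel, the landmark selection, the layer sizes, and the toy data are illustrative assumptions and do not reproduce the setup evaluated in the paper.

```python
# Minimal KDA-style sketch: Nystrom projection of a kernel space + MLP classifier.
# Kernel, landmarks, data, and network sizes are placeholders, not the paper's setup.
import numpy as np
from sklearn.neural_network import MLPClassifier

def nystrom_projection(X, landmarks, kernel):
    """Embed each example into the low-rank space induced by the landmarks."""
    W = np.array([[kernel(a, b) for b in landmarks] for a in landmarks])  # l x l
    C = np.array([[kernel(x, b) for b in landmarks] for x in X])          # n x l
    # Nystrom embedding: C @ W^{-1/2}, via an SVD-based pseudo-inverse square root.
    U, s, Vt = np.linalg.svd(W)
    W_inv_sqrt = U @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-12))) @ Vt
    return C @ W_inv_sqrt

# Toy usage: random "sentence" vectors, six coarse question classes, linear kernel.
rng = np.random.default_rng(0)
X_train = rng.random((200, 50))
y_train = rng.integers(0, 6, size=200)
landmarks = X_train[:20]                    # landmarks drawn from the training set
kernel = lambda x, z: float(x @ z)          # stand-in for a tree/sequence kernel

Z_train = nystrom_projection(X_train, landmarks, kernel)
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(Z_train, y_train)
```

Replacing the linear kernel with a richer function, such as a tree kernel over parse trees, is what injects the linguistic structure whose impact the paper measures; the rest of the pipeline is unchanged.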


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Enterprise Engineering, University of Roma Tor Vergata, Rome, Italy
