A Subpath Kernel for Rooted Unordered Trees

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6634)

Abstract

Kernel methods are a promising approach to learning from tree-structured data, and various efficient tree kernels have been proposed to capture informative structures in trees. In this paper, we propose a new tree kernel function based on “subpath sets” to capture vertical structures in rooted unordered trees, since such tree structures are often used to encode hierarchical information in data. We also propose a simple and efficient algorithm for computing the kernel by extending the multikey quicksort algorithm used for sorting strings. The algorithm runs in $O((|T_1| + |T_2|)\log(|T_1| + |T_2|))$ time on average and uses $O(|T_1| + |T_2|)$ space, where $|T_1|$ and $|T_2|$ are the numbers of nodes in the two trees $T_1$ and $T_2$. We apply the proposed kernel to two supervised classification tasks: XML classification in web mining and glycan classification in bioinformatics. The experimental results show that the predictive performance of the proposed kernel is competitive with that of the existing efficient tree kernel for unordered trees proposed by Vishwanathan et al. [1], and that the proposed kernel is empirically faster to compute.
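
To make the idea of a subpath-based kernel concrete, here is a minimal Python sketch. The abstract does not define the kernel, so the sketch rests on assumptions: a "subpath" is taken to be a contiguous downward sequence of node labels (part of a root-to-leaf path), and the kernel is taken to be the sum, over subpaths shared by both trees, of the product of their occurrence counts, weighted by a hypothetical `decay` factor raised to the subpath length. The tree encoding and the names `subpath_counts` and `subpath_kernel` are illustrative only. This is a naive enumeration, not the multikey-quicksort algorithm proposed in the paper, and it does not achieve the stated complexity.

```python
from collections import Counter

# Assumed encoding: a tree is a (label, children) pair, where children is a
# list of trees. Child order never affects the result, so the trees are
# effectively treated as unordered.


def subpath_counts(tree):
    """Count every downward label sequence (assumed 'subpath') in the tree."""
    counts = Counter()

    def visit(node):
        label, children = node
        # Subpaths starting at this node: the node by itself, plus this
        # label prepended to every subpath that starts at one of its children.
        starting_here = Counter({(label,): 1})
        for child in children:
            for path, n in visit(child).items():
                starting_here[(label,) + path] += n
        counts.update(starting_here)
        return starting_here

    visit(tree)
    return counts


def subpath_kernel(t1, t2, decay=1.0):
    """Weighted dot product of subpath counts (an assumed kernel form)."""
    c1, c2 = subpath_counts(t1), subpath_counts(t2)
    return sum(decay ** len(p) * n * c2[p] for p, n in c1.items() if p in c2)


# Two small trees that share the subpaths a, b, c, a-b and a-c.
t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("c", []), ("b", [("d", [])])])
print(subpath_kernel(t1, t2))
```

Because the counts are keyed only by label sequences, swapping the children of any node leaves the value unchanged, which is the property one would want for rooted unordered trees. The enumeration can produce a number of subpaths that grows quadratically with tree depth, which is the cost a sorting-based method would aim to avoid.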

References

  1. Vishwanathan, S.V.N., Smola, A.: Fast kernels for string and tree matching. In: Advances in Neural Information Processing Systems, vol. 15, pp. 569–576 (2003)

  2. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge (1999)

  3. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)

  4. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995)

  5. Haussler, D.: Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, UC Santa Cruz (1999)

  6. Collins, M., Duffy, N.: Convolution kernels for natural language. In: Proceedings of the Fourteenth Annual Conference on Neural Information Processing Systems, pp. 625–632 (2001)

  7. Kashima, H., Koyanagi, T.: Kernels for semi-structured data. In: Proceedings of the Nineteenth International Conference on Machine Learning, pp. 291–298 (2002)

  8. Kuboyama, T., Hirata, K., Aoki-Kinoshita, K.F., Kashima, H., Yasuda, H.: A gram distribution kernel applied to glycan classification and motif extraction. In: Proceedings of the Seventeenth International Conference on Genome Informatics, pp. 25–34 (2006)

  9. Aiolli, F., Martino, G.D.S., Sperduti, A.: Route kernels for trees. In: Proceedings of the Twenty-sixth International Conference on Machine Learning, pp. 17–24 (2009)

  10. Daumé III, H., Marcu, D.: A tree-position kernel for document compression. In: Proceedings of the Fourth Document Understanding Conference (2004)

  11. Kashima, H.: Machine Learning Approaches for Structured-data. PhD thesis, Kyoto University (2007)

  12. Ichikawa, H., Hakoda, K., Hashimoto, T., Tokunaga, T.: Efficient sentence retrieval based on syntactic structure. In: Proceedings of the COLING/ACL, pp. 407–411 (2006)

  13. Bentley, J.L., Sedgewick, R.: Fast algorithms for sorting and searching strings. In: Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 360–369 (1997)

  14. Teo, C.H., Vishwanathan, S.V.N.: Fast and space efficient string kernels using suffix arrays. In: Proceedings of the Twenty-third International Conference on Machine Learning, pp. 929–936 (2006)

  15. Shibuya, T.: Constructing the suffix tree of a tree with a large alphabet. IEICE Transactions on Fundamentals of Electronics 86(5), 1061–1066 (2003)

  16. Kailing, K., Kriegel, H.P., Schönauer, S., Seidl, T.: Efficient similarity search for hierarchical data in large databases. In: Hwang, J., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 676–693. Springer, Heidelberg (2004)

  17. Teo, C.H., Vishwanathan, S.V.N.: SASK: suffix arrays based string kernels (2006), http://users.cecs.anu.edu.au/~chteo/SASK.html

  18. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm

  19. Zaki, M.J., Aggarwal, C.C.: XRules: An effective structural classifier for XML data. Machine Learning Journal 62(1-2), 137–170 (2006)

  20. Hashimoto, K., Hamajima, M., Goto, S., Masumoto, S., Kawashima, M., Kanehisa, M.: Glycan: The database of carbohydrate structures. Genome Informatics 14, 649–650 (2003)

  21. Doubet, S., Albersheim, P.: CarbBank. Glycobiology 2(6), 505 (1992)

  22. Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C.: Text classification using string kernels. Journal of Machine Learning Research 2, 419–444 (2002)

  23. Leslie, C., Eskin, E., Noble, W.: The spectrum kernel: A string kernel for SVM protein classification. In: Proceedings of the Pacific Symposium on Biocomputing, pp. 566–575 (2002)

  24. Leslie, C., Eskin, E., Weston, J., Noble, W.S.: Mismatch string kernels for SVM protein classification. Neural Information Processing Systems 15, 1441–1448 (2003)

  25. Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled graphs. In: Proceedings of the Twentieth International Conference on Machine Learning, pp. 321–328 (2003)

  26. Gärtner, T., Flach, P., Wrobel, S.: On graph kernels: Hardness results and efficient alternatives. In: Proceedings of the Sixteenth Annual Conference on Computational Learning Theory, pp. 129–143 (2003)

  27. Washio, T., Motoda, H.: State of the art of graph-based data mining. SIGKDD Explorations 5(1), 59–68 (2003)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kimura, D., Kuboyama, T., Shibuya, T., Kashima, H. (2011). A Subpath Kernel for Rooted Unordered Trees. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20841-6_6

  • DOI: https://doi.org/10.1007/978-3-642-20841-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20840-9

  • Online ISBN: 978-3-642-20841-6

  • eBook Packages: Computer Science, Computer Science (R0)
