A Subpath Kernel for Rooted Unordered Trees

Kimura, Daisuke; Kuboyama, Tetsuji; Shibuya, Tetsuo; Kashima, Hisashi

doi:10.1007/978-3-642-20841-6_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6634)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Kernel methods are a promising approach to learning with tree-structured data, and various efficient tree kernels have been proposed to capture informative structures in trees. In this paper, we propose a new tree kernel function based on “subpath sets” to capture vertical structures in rooted unordered trees, since such tree structures are often used to encode hierarchical information in data. We also propose a simple and efficient algorithm for computing the kernel by extending the multikey quicksort algorithm used for sorting strings. The average-case time complexity of the algorithm is O((|T₁| + |T₂|) log(|T₁| + |T₂|)), and the space complexity is O(|T₁| + |T₂|), where |T₁| and |T₂| are the numbers of nodes in the two trees T₁ and T₂. We apply the proposed kernel to two supervised classification tasks: XML classification in web mining and glycan classification in bioinformatics. The experimental results show that the predictive performance of the proposed kernel is competitive with that of the existing efficient tree kernel for unordered trees proposed by Vishwanathan et al. [1], and that the proposed kernel is also empirically faster than the existing kernel.
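
To make the idea of comparing "vertical structures" concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes that a subpath is a contiguous downward sequence of node labels and that the kernel sums, over subpaths shared by both trees, the product of their occurrence counts. The `(label, children)` tree encoding, the function names, and the `decay` weight are hypothetical choices made for illustration. The sketch enumerates subpaths naively and does not use the multikey quicksort extension, so it does not achieve the stated complexity.

```python
from collections import Counter


def subpath_counts(tree):
    """Count every contiguous downward label sequence in a tree.

    A tree is encoded as (label, [child, ...]); this encoding is an
    assumption made for this sketch.
    """
    counts = Counter()

    def visit(node, ancestors):
        label, children = node
        path = ancestors + (label,)
        # Each suffix of the root-to-node path is one subpath ending here,
        # so every occurrence is counted exactly once.
        for start in range(len(path)):
            counts[path[start:]] += 1
        for child in children:
            visit(child, path)

    visit(tree, ())
    return counts


def subpath_kernel(t1, t2, decay=1.0):
    """Sum decay**len(p) * count1(p) * count2(p) over shared subpaths p.

    The decay weighting is an illustrative assumption.
    """
    c1, c2 = subpath_counts(t1), subpath_counts(t2)
    return sum(decay ** len(p) * n * c2[p] for p, n in c1.items() if p in c2)


if __name__ == "__main__":
    t1 = ("a", [("b", []), ("c", [])])
    t2 = ("a", [("b", [("c", [])])])
    print(subpath_kernel(t1, t2))
```

Because only label sequences along parent-child chains are counted, reordering the children of any node leaves the value unchanged, which is what makes such a measure suitable for unordered trees.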

References

1. Vishwanathan, S.V.N., Smola, A.: Fast kernels for string and tree matching. In: Advances in Neural Information Processing Systems, vol. 15, pp. 569–576 (2003)
2. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge (1999)
3. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
4. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995)
5. Haussler, D.: Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, UC Santa Cruz (1999)
6. Collins, M., Duffy, N.: Convolution kernels for natural language. In: Proceedings of the Fourteenth Annual Conference on Neural Information Processing Systems, pp. 625–632 (2001)
7. Kashima, H., Koyanagi, T.: Kernels for semi-structured data. In: Proceedings of the Nineteenth International Conference on Machine Learning, pp. 291–298 (2002)
8. Kuboyama, T., Hirata, K., Aoki-Kinoshita, K.F., Kashima, H., Yasuda, H.: A gram distribution kernel applied to glycan classification and motif extraction. In: Proceedings of the Seventeenth International Conference on Genome Informatics, pp. 25–34 (2006)
9. Aiolli, F., Martino, G.D.S., Sperduti, A.: Route kernels for trees. In: Proceedings of the Twenty-sixth International Conference on Machine Learning, pp. 17–24 (2009)
10. Daumé III, H., Marcu, D.: A tree-position kernel for document compression. In: Proceedings of the Fourth Document Understanding Conference (2004)
11. Kashima, H.: Machine Learning Approaches for Structured-data. PhD thesis, Kyoto University (2007)
12. Ichikawa, H., Hakoda, K., Hashimoto, T., Tokunaga, T.: Efficient sentence retrieval based on syntactic structure. In: Proceedings of the COLING/ACL, pp. 407–411 (2006)
13. Bentley, J.L., Sedgewick, R.: Fast algorithms for sorting and searching strings. In: Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 360–369 (1997)
14. Teo, C.H., Vishwanathan, S.V.N.: Fast and space efficient string kernels using suffix arrays. In: Proceedings of the Twenty-third International Conference on Machine Learning, pp. 929–936 (2006)
15. Shibuya, T.: Constructing the suffix tree of a tree with a large alphabet. IEICE Transactions on Fundamentals of Electronics 86(5), 1061–1066 (2003)
16. Kailing, K., Kriegel, H.P., Schönauer, S., Seidl, T.: Efficient similarity search for hierarchical data in large databases. In: Hwang, J., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 676–693. Springer, Heidelberg (2004)
17. Teo, C.H., Vishwanathan, S.V.N.: SASK: suffix arrays based string kernels (2006), http://users.cecs.anu.edu.au/~chteo/SASK.html
18. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm
19. Zaki, M.J., Aggarwal, C.C.: Xrules: An effective structural classifier for XML data. Machine Learning Journal 62(1-2), 137–170 (2006)
20. Hashimoto, K., Hamajima, M., Goto, S., Masumoto, S., Kawashima, M., Kanehisa, M.: Glycan: The database of carbohydrate structures. Genome Informatics 14, 649–650 (2003)
21. Doubet, S., Albersheim, P.: Carbbank. Glycobiology 2(6), 505 (1992)
22. Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C.: Text classification using string kernels. Journal of Machine Learning Research 2, 419–444 (2002)
23. Leslie, C., Eskin, E., Noble, W.: The spectrum kernel: A string kernel for SVM protein classification. In: Proceedings of the Pacific Symposium on Biocomputing, pp. 566–575 (2002)
24. Leslie, C., Eskin, E., Weston, J., Noble, W.S.: Mismatch string kernels for SVM protein classification. Neural Information Processing Systems 15, 1441–1448 (2003)
25. Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled graphs. In: Proceedings of the Twentieth International Conference on Machine Learning, pp. 321–328 (2003)
26. Gärtner, T., Flach, P., Wrobel, S.: On graph kernels: Hardness results and efficient alternatives. In: Proceedings of the Sixteenth Annual Conference on Computational Learning Theory, pp. 129–143 (2003)
27. Washio, T., Motoda, H.: State of the art of graph-based data mining. SIGKDD Explorations 5(1), 59–68 (2003)

Author information

Authors and Affiliations

Graduate School of Information Science and Technology, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-8656, Japan
Daisuke Kimura & Hisashi Kashima
Computer Centre, Gakushuin University, Mejiro 1-5-1, Toyoshima-ku, Tokyo, 171-8588, Japan
Tetsuji Kuboyama
Human Genome Center, Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, Tokyo, 108-8639, Japan
Tetsuo Shibuya

Editor information

Editors and Affiliations

Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences, 518055, Shenzhen, China
Joshua Zhexue Huang
Faculty of Engineering and Information Technology, Center for Quantum Computation and Intelligent Systems, Data Sciences and Knowledge Discovery Lab, University of Technology Sydney, NSW 2007, Sydney, Australia
Longbing Cao
Department of Computer Science and Engineering, University of Minnesota, MN 55455, Minneapolis, USA
Jaideep Srivastava

About this paper

Cite this paper

Kimura, D., Kuboyama, T., Shibuya, T., Kashima, H. (2011). A Subpath Kernel for Rooted Unordered Trees. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20841-6_6

DOI: https://doi.org/10.1007/978-3-642-20841-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20840-9
Online ISBN: 978-3-642-20841-6
eBook Packages: Computer Science, Computer Science (R0)
