Abstract
In this paper, we study efficient computation of tree similarity for ordered trees based on compressed subtree enumeration. The compressed subtree enumeration is a new paradigm of enumeration algorithms that enumerates all subtrees of an input tree T in the form of their compressed bit signatures. For the task of enumerating all compressed bit signatures of k-subtrees in an ordered tree T, we first present an enumeration algorithm in O(k)-delay, and then, present another enumeration algorithm in constant-delay using O(n) time preprocessing that directly outputs bit signatures. These algorithms are designed based on bit-parallel speed-up technique for signature maintenance. By experiments on real and artificial datasets, both algorithms showed approximately 22% to 36% speed-up over the algorithms without bit-parallel signature maintenance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Zaki, M.J.: Efficiently mining frequent trees in a forest. In: Proc. KDD 2002, pp. 71–80 (2002)
Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., Arikawa, S.: Efficient substructure discovery from large semi-structured data. In: Proc. SDM 2002 (2002)
Chim, H., Deng, X.: A new suffix tree similarity measure for document clustering. In: Proc. WWW 2007, pp. 121–130 (2007)
Collins, M., Duffy, N.: Convolution kernels for natural language. In: Proc. of Advances in Neural Information Processing Systems, NIPS, pp. 625–632 (2001)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. The MIT Press (2001)
Goldberg, L.A.: Polynomial space polynomial delay algorithms for listing families of graphs. In: Proc. ACM STOC 1993, pp. 218–225. ACM (1993)
Jansson, J., Sadakane, K., Sung, W.-K.: CRAM: Compressed random access memory. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds.) ICALP 2012, Part I. LNCS, vol. 7391, pp. 510–521. Springer, Heidelberg (2012)
Kashima, H., Koyanagi, T.: Kernels for semi-structured data. In: Proc. 19th ICML 2002, pp. 291–298. Morgan Kaufmann Publishers Inc. (2002)
Kimura, D., Kuboyama, T., Shibuya, T., Kashima, H.: A subpath kernel for rooted unordered trees. In: Huang, J.Z., Cao, L., Srivastava, J. (eds.) PAKDD 2011, Part I. LNCS, vol. 6634, pp. 62–74. Springer, Heidelberg (2011)
Kuboyama, T., Hirata, K., Aoki-Kinoshita, K.F.: An efficient unordered tree kernel and its application to glycan classification. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS(LNAI), vol. 5012, pp. 184–195. Springer, Heidelberg (2008)
Kuboyama, T., Hirata, K., Kashima, H., Aoki-Kinoshita, K.F., Yasuda, H.: A spectrum tree kernel. Information and Media Technologies 22(2), 292–299 (2007)
Kudo, T., Maeda, E., Matsumoto, Y.: An application of boosting to graph classification. In: Proc. NIPS 2004 (2004)
Lakkaraju, P., Gauch, S., Speretta, M.: Document similarity based on concept tree distance. In: Proc. 19th ACM HT 2008, pp. 127–132 (2008)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to information retrieval. Cambridge University Press (2008)
Nakano, S.: Efficient generation of plane trees. IPL 84(3), 167–172 (2002)
Tsuda, K., Kudo, T.: Clustering graphs by weighted substructure mining. In: Proc. 23rd ICML, pp. 953–960. ACM (2006)
Wasa, K., Kaneta, Y., Uno, T., Arimura, H.: Constant time enumeration of bounded-size subtrees in trees and its application. In: Gudmundsson, J., Mestre, J., Viglas, T. (eds.) COCOON 2012. LNCS, vol. 7434, pp. 347–359. Springer, Heidelberg (2012)
Xin, D., Han, J., Yan, X., Cheng, H.: Mining compressed frequent-pattern sets. In: Proc. VLDB 2005, pp. 709–720 (2005)
Zaki, M.J.: Efficiently mining frequent trees in a forest. In: Proc. KDD 2002, pp. 71–80 (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wasa, K., Hirata, K., Uno, T., Arimura, H. (2013). Faster Algorithms for Tree Similarity Based on Compressed Enumeration of Bounded-Sized Ordered Subtrees. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds) Similarity Search and Applications. SISAP 2013. Lecture Notes in Computer Science, vol 8199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41062-8_8
Download citation
DOI: https://doi.org/10.1007/978-3-642-41062-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41061-1
Online ISBN: 978-3-642-41062-8
eBook Packages: Computer ScienceComputer Science (R0)