Abstract
We introduce a tree distance function based on multi-sets. We show that this function is a metric on tree spaces, and we design an algorithm to compute the distance between trees of size at most n in O(n^2) time and O(n) space. Unlike other tree distance functions, which require expensive memory allocations to maintain dynamic programming tables of forests, our function can be implemented over simple, static structures. Additionally, we present a case study in which we compare our function with two other distance functions.
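The abstract does not give the definition of the function, so the following is only a minimal sketch of the general idea of a multi-set-based tree distance, not the function introduced in the paper. It assumes ordered, labeled trees written as hypothetical `(label, children)` tuples, collects the multi-set of all complete subtrees of each tree, and returns the size of the symmetric difference of the two multi-sets. The names `subtree_bag` and `multiset_distance` are illustrative, and the sketch makes no attempt to meet the time and space bounds stated above.

```python
from collections import Counter


def subtree_bag(tree):
    """Return the multi-set of complete subtrees of `tree`.

    A tree is a (label, children) tuple; each subtree is keyed by a
    canonical string. Assumption: labels contain no '(', ')' or ','.
    """
    bag = Counter()

    def visit(node):
        label, children = node
        # Canonical form of the subtree rooted at this node.
        key = label + "(" + ",".join(visit(c) for c in children) + ")"
        bag[key] += 1
        return key

    visit(tree)
    return bag


def multiset_distance(t1, t2):
    """Size of the symmetric difference of the two subtree multi-sets."""
    a, b = subtree_bag(t1), subtree_bag(t2)
    # Counter subtraction keeps only positive counts, so (a - b) + (b - a)
    # is the multi-set symmetric difference.
    return sum(((a - b) + (b - a)).values())


# Two small trees that differ in one leaf label.
t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", []), ("d", [])])
print(multiset_distance(t1, t2))
```

Because the whole tree is itself one of its complete subtrees, two trees with equal multi-sets must be identical, and the symmetric-difference size is an L1 distance between count vectors, so this particular sketch satisfies the metric axioms on ordered labeled trees.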
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Müller-Molina, A.J., Hirata, K., Shinohara, T. (2009). A Tree Distance Function Based on Multi-sets. In: Chawla, S., et al. New Frontiers in Applied Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00399-8_8
DOI: https://doi.org/10.1007/978-3-642-00399-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00398-1
Online ISBN: 978-3-642-00399-8
eBook Packages: Computer Science (R0)