A Tree Distance Function Based on Multi-sets

  • Conference paper
New Frontiers in Applied Data Mining (PAKDD 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5433)

Abstract

We introduce a tree distance function based on multi-sets. We show that this function is a metric on tree spaces, and we design an algorithm that computes the distance between trees of size at most n in O(n^2) time and O(n) space. Unlike other tree distance functions, which require expensive memory allocations to maintain dynamic programming tables of forests, our function can be implemented over simple, static structures. Additionally, we present a case study in which we compare our function with two other distance functions.
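
The abstract does not give the definition of the function, so the following is only a minimal sketch of the general idea of comparing trees through multi-sets, not the authors' method. It assumes ordered, labeled trees written as hypothetical (label, children) tuples, builds the multi-set of complete subtrees of each tree, and takes the size of the symmetric difference of the two multi-sets. The names `subtree_multiset`, `multiset_distance`, and `tree_distance` are illustrative, and the sketch makes no attempt to match the time and space bounds stated above.

```python
from collections import Counter

# Assumption: a tree is a (label, children) pair, where children is a
# tuple of trees. Labels must not contain "(", ")" or ",".

def subtree_multiset(tree):
    """Return a Counter with one canonical string per complete subtree."""
    bag = Counter()

    def visit(node):
        label, children = node
        # Child order is preserved, so trees are treated as ordered.
        key = label + "(" + ",".join(visit(c) for c in children) + ")"
        bag[key] += 1
        return key

    visit(tree)
    return bag

def multiset_distance(a, b):
    """Size of the symmetric difference of two multi-sets."""
    # Counter subtraction keeps only positive counts, so the two
    # one-sided differences together give the symmetric difference.
    return sum(((a - b) + (b - a)).values())

def tree_distance(t1, t2):
    """Illustrative distance: compare the subtree multi-sets of t1 and t2."""
    return multiset_distance(subtree_multiset(t1), subtree_multiset(t2))

# Two small trees that differ in a single leaf label.
t1 = ("a", (("b", ()), ("c", ())))
t2 = ("a", (("b", ()), ("d", ())))
print(tree_distance(t1, t1))  # 0
print(tree_distance(t1, t2))  # 4
```

In this sketch, changing one leaf also changes the complete subtree rooted at each of its ancestors, which is why a single relabeling yields a distance of 4 rather than 2.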

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Müller-Molina, A.J., Hirata, K., Shinohara, T. (2009). A Tree Distance Function Based on Multi-sets. In: Chawla, S., et al. New Frontiers in Applied Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00399-8_8

  • DOI: https://doi.org/10.1007/978-3-642-00399-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00398-1

  • Online ISBN: 978-3-642-00399-8

  • eBook Packages: Computer Science, Computer Science (R0)
