A Novel Method for Finding Similarities between Unordered Trees Using Matrix Data Model

  • Conference paper
Web Information Systems Engineering – WISE 2013 (WISE 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8180)

Abstract

Trees can represent the semi-structured data that is common in the web domain. Finding similarities between trees is essential for several applications that deal with semi-structured data. Existing similarity methods examine a pair of trees by comparing their nodes and paths to find the similarity between them. However, these methods give unfavorable results on unordered tree data and incur NP-hard or MAX SNP-hard complexity. In this paper, we present a novel method that first encodes a tree using an optimal traversal approach and then uses this encoding to model the tree as an equivalent matrix representation, so that similarity between unordered trees can be found efficiently. Empirical analysis shows that the proposed method achieves high accuracy even on large data sets.
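The abstract does not specify the encoding or the matrix model, so the following is only a minimal illustrative sketch of the general idea (traverse a tree, build an order-independent matrix, compare matrices), not the method proposed in the paper. The tree representation as `(label, children)` tuples, the function names `edge_counts`, `to_matrix`, and `similarity`, and the choice of a parent-label by child-label count matrix compared with cosine similarity are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Assumed representation: a tree is (label, [children]).
# Child order carries no meaning, since the trees are unordered.

def edge_counts(tree):
    """Count (parent label, child label) edges using an iterative traversal."""
    counts = Counter()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        for child in children:
            counts[(label, child[0])] += 1
            stack.append(child)
    return counts

def to_matrix(counts, labels):
    """Dense matrix: rows are parent labels, columns are child labels."""
    return [[counts.get((p, c), 0) for c in labels] for p in labels]

def similarity(t1, t2):
    """Cosine similarity between the two trees' edge-count matrices.

    Returns 0.0 when either tree has no edges (a single node),
    which is a simplification of this sketch.
    """
    c1, c2 = edge_counts(t1), edge_counts(t2)
    labels = sorted({lab for pair in list(c1) + list(c2) for lab in pair})
    m1, m2 = to_matrix(c1, labels), to_matrix(c2, labels)
    v1 = [x for row in m1 for x in row]
    v2 = [x for row in m2 for x in row]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = sqrt(sum(a * a for a in v1) * sum(b * b for b in v2))
    return dot / norm if norm else 0.0

# Two trees that differ only in sibling order, plus a smaller tree.
t1 = ("a", [("b", []), ("c", [("d", [])])])
t2 = ("a", [("c", [("d", [])]), ("b", [])])
t3 = ("a", [("b", [])])

print(similarity(t1, t2))
print(similarity(t1, t3))
```

Because the matrix only counts labelled parent-child edges, reordering siblings leaves it unchanged, which is the property an unordered-tree comparison needs. This sketch ignores deeper structure, so distinct trees can share a matrix; it is meant to show the shape of a matrix-based comparison, not to reproduce the accuracy reported in the paper.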


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chowdhury, I.J., Nayak, R. (2013). A Novel Method for Finding Similarities between Unordered Trees Using Matrix Data Model. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds) Web Information Systems Engineering – WISE 2013. WISE 2013. Lecture Notes in Computer Science, vol 8180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41230-1_35

  • DOI: https://doi.org/10.1007/978-3-642-41230-1_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41229-5

  • Online ISBN: 978-3-642-41230-1

  • eBook Packages: Computer Science, Computer Science (R0)
