A Novel Method for Finding Similarities between Unordered Trees Using Matrix Data Model

Chowdhury, Israt Jahan; Nayak, Richi

doi:10.1007/978-3-642-41230-1_35


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8180)

Included in the following conference series: International Conference on Web Information Systems Engineering


Abstract

Trees can represent the semi-structured data that is common in the web domain. Finding similarities between trees is essential for several applications that deal with semi-structured data. Existing similarity methods compare a pair of trees through their nodes and paths to find the similarity between them. However, these methods give unfavorable results on unordered tree data and incur NP-hard or MAX SNP-hard complexity. In this paper, we present a novel method that first encodes a tree using an optimal traversal approach and then uses this encoding to model the tree as an equivalent matrix representation, so that similarity between unordered trees can be found efficiently. Empirical analysis shows that the proposed method achieves high accuracy even on large data sets.
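The abstract does not spell out the encoding or the matrix model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a tree is a hypothetical `(label, children)` tuple, uses a sorted-children string as an order-independent encoding, and stands in a sparse parent-label by child-label count matrix with cosine similarity for the paper's matrix representation. All function names are illustrative.

```python
from collections import Counter
from math import sqrt

# Assumed representation (not from the paper): a tree is (label, [child trees]).

def encode(tree):
    """Order-independent string encoding: child encodings are sorted,
    so two trees that differ only in sibling order encode identically."""
    label, children = tree
    return label + "(" + ",".join(sorted(encode(c) for c in children)) + ")"

def edge_matrix(tree):
    """Sparse matrix keyed by (parent_label, child_label) holding edge counts.
    Built with an explicit stack so that deep trees do not hit recursion limits."""
    counts = Counter()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        for child in children:
            counts[(label, child[0])] += 1
            stack.append(child)
    return counts

def similarity(t1, t2):
    """Cosine similarity between the two sparse edge matrices (0.0 if either is empty)."""
    m1, m2 = edge_matrix(t1), edge_matrix(t2)
    dot = sum(v * m2[k] for k, v in m1.items())
    norm = sqrt(sum(v * v for v in m1.values())) * sqrt(sum(v * v for v in m2.values()))
    return dot / norm if norm else 0.0

# Two trees that differ only in sibling order, and one that differs in content.
a = ("r", [("a", []), ("b", [("c", [])])])
b = ("r", [("b", [("c", [])]), ("a", [])])
c = ("r", [("a", []), ("d", [])])

print(encode(a) == encode(b))   # same unordered tree
print(similarity(a, b), similarity(a, c))
```

Because the matrix only counts labelled parent-child edges, it ignores sibling order by construction; the price is that it is a lossy summary, so two structurally different trees can share the same matrix.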



Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia
Israt Jahan Chowdhury & Richi Nayak


Editor information

Editors and Affiliations

The University of New South Wales, Sydney, NSW, Australia
Xuemin Lin
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
AT&T Labs-Research, Florham Park, NJ, USA
Divesh Srivastava
Victoria University, Melbourne, Australia
Guangyan Huang


About this paper

Cite this paper

Chowdhury, I.J., Nayak, R. (2013). A Novel Method for Finding Similarities between Unordered Trees Using Matrix Data Model. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds) Web Information Systems Engineering – WISE 2013. WISE 2013. Lecture Notes in Computer Science, vol 8180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41230-1_35


DOI: https://doi.org/10.1007/978-3-642-41230-1_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41229-5
Online ISBN: 978-3-642-41230-1
eBook Packages: Computer Science, Computer Science (R0)
