Abstract
Trees can represent the semi-structured data that is common in the web domain. Finding similarities between trees is essential for many applications that deal with semi-structured data. Existing similarity methods compare a pair of trees node by node and path by path to measure the similarity between them. However, these methods perform poorly on unordered tree data, for which the computation becomes NP-hard or MAX SNP-hard. In this paper, we present a novel method that first encodes a tree using an optimal traversal approach and then uses that encoding to model the tree with an equivalent matrix representation, so that similarity between unordered trees can be found efficiently. Empirical analysis shows that the proposed method achieves high accuracy even on large data sets.
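To make the general idea of a matrix view of an unordered tree concrete, the following is a minimal illustrative sketch. It is not the encoding or similarity measure proposed in the paper, whose details are not given in this abstract. It assumes a tree is a nested `(label, children)` tuple, records parent-label to child-label edge counts as a sparse label-by-label matrix, and compares two such matrices by cosine similarity. The names `edge_matrix`, `cosine_similarity`, and `tree_similarity` are hypothetical.

```python
from collections import Counter
from math import sqrt

# Assumption: a tree is (label, [children]); sibling order carries no meaning.

def edge_matrix(tree):
    """Count (parent label, child label) edges: a sparse label-by-label matrix."""
    counts = Counter()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        for child in children:
            counts[(label, child[0])] += 1
            stack.append(child)
    return counts

def cosine_similarity(a, b):
    """Cosine similarity of two sparse matrices stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Simplification: two edgeless trees count as identical.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)

def tree_similarity(t1, t2):
    return cosine_similarity(edge_matrix(t1), edge_matrix(t2))

t1 = ("a", [("b", []), ("c", [("d", [])])])
t2 = ("a", [("c", [("d", [])]), ("b", [])])  # same tree, siblings reordered
t3 = ("a", [("b", [("d", [])]), ("e", [])])  # shares only the a-b edge with t1

print(tree_similarity(t1, t2))
print(tree_similarity(t1, t3))
```

Because edge counts ignore sibling order, reordering children leaves the matrix unchanged, which is the property an unordered-tree measure needs. This sketch discards deeper structure (two different trees can share the same edge counts), so it should be read only as an illustration of the matrix idea.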
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chowdhury, I.J., Nayak, R. (2013). A Novel Method for Finding Similarities between Unordered Trees Using Matrix Data Model. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds) Web Information Systems Engineering – WISE 2013. WISE 2013. Lecture Notes in Computer Science, vol 8180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41230-1_35
DOI: https://doi.org/10.1007/978-3-642-41230-1_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41229-5
Online ISBN: 978-3-642-41230-1
eBook Packages: Computer Science, Computer Science (R0)