Top-down tree edit-distance of regular tree languages

  • Sang-Ki Ko
  • Yo-Sub Han
  • Kai Salomaa
Article

Abstract

We study the edit-distance of regular tree languages. The edit-distance is a useful metric for measuring the similarity or dissimilarity between two objects. A regular tree language is a set of trees accepted by a finite-state tree automaton or described by a regular tree grammar. Given two regular tree languages L and R, we define the edit-distance d(L, R) between L and R to be the minimum edit-distance between a tree in L and a tree in R. Given tree automata for L and R, we design a polynomial time algorithm that computes d(L, R). We also present an efficient algorithm that identifies a special common string between two context-free grammars using the edit-distance between two tree languages.
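To make the definition of the distance between two languages concrete, the following Python sketch computes the minimum pairwise distance by brute force over two small finite sets of trees. It is only an illustration of the definition, not the polynomial time algorithm of the paper, which works directly on tree automata and also handles infinite languages. The tree encoding `(label, [children])`, the unit edit costs, the function names, and the choice of a Selkow-style top-down distance (the root is always matched to the root, and insertions and deletions apply to whole subtrees) are all assumptions made for this example.

```python
from itertools import product

# A tree is encoded as (label, [children]); this encoding is an assumption
# of the sketch, not notation taken from the paper.

def size(tree):
    """Number of nodes in a tree."""
    _, children = tree
    return 1 + sum(size(c) for c in children)

def top_down_distance(s, t):
    """Selkow-style top-down edit distance with unit costs (assumed).

    Roots are always matched (relabel cost 0 or 1). The child sequences are
    then aligned like strings, where deleting or inserting a child costs the
    size of its whole subtree and matching two children costs their own
    top-down distance.
    """
    (label_s, kids_s), (label_t, kids_t) = s, t
    m, n = len(kids_s), len(kids_t)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        table[i][0] = table[i - 1][0] + size(kids_s[i - 1])
    for j in range(1, n + 1):
        table[0][j] = table[0][j - 1] + size(kids_t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = min(
                table[i - 1][j] + size(kids_s[i - 1]),       # delete subtree
                table[i][j - 1] + size(kids_t[j - 1]),       # insert subtree
                table[i - 1][j - 1]
                + top_down_distance(kids_s[i - 1], kids_t[j - 1]),
            )
    return (0 if label_s == label_t else 1) + table[m][n]

def language_distance(lang_l, lang_r):
    """d(L, R) for finite, non-empty sets of trees: minimum over all pairs."""
    return min(top_down_distance(s, t) for s, t in product(lang_l, lang_r))

# Example: L = { a(b, c) }, R = { a(b), x(y, z, w) }
L = [("a", [("b", []), ("c", [])])]
R = [("a", [("b", [])]),
     ("x", [("y", []), ("z", []), ("w", [])])]
print(language_distance(L, R))
```

In this example the closest pair is a(b, c) and a(b), which differ by deleting the single leaf c, so the brute-force search returns 1. Enumerating pairs only works for finite languages, which is why an automaton-based algorithm is needed in general.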

Keywords

  • Tree edit-distance
  • Regular tree languages
  • Tree automata
  • Dynamic programming

Notes

Acknowledgements

Han was supported by the Basic Science Research Program through NRF (2015R1D1A1A01060097). Salomaa was supported by the Natural Sciences and Engineering Research Council of Canada Grant OGP0147224.


Copyright information

© Indian Institute of Technology Madras 2018

Authors and Affiliations

  1. Artificial Intelligence Research Center, Korea Electronics Technology Institute, Seongnam, Republic of Korea
  2. Department of Computer Science, Yonsei University, Seoul, Republic of Korea
  3. School of Computing, Queen’s University, Kingston, Canada