CPM 2008: Combinatorial Pattern Matching pp 230-243

# Fast Algorithms for Computing Tree LCS

• Shay Mozes
• Dekel Tsur
• Oren Weimann
• Michal Ziv-Ukelson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5029)

## Abstract

The LCS of two rooted, ordered, and labeled trees F and G is the largest forest that can be obtained from both trees by deleting nodes. We present algorithms for computing tree LCS which exploit the sparsity inherent in the tree LCS problem. Assuming G is smaller than F, our first algorithm runs in time $$O(r\cdot {\rm height}(F) \cdot {\rm height}(G)\cdot \lg\lg |G|)$$, where r is the number of pairs (v ∈ F, w ∈ G) such that v and w have the same label. Our second algorithm runs in time $$O(L r \lg r \cdot \lg\lg|G|)$$, where L is the size of the LCS of F and G. For this algorithm we present a novel three-dimensional alignment graph. Our third algorithm is intended for the constrained variant of the problem in which only nodes with zero or one children can be deleted. For this case we obtain an $$O(r h \lg \lg|G|)$$ time algorithm, where h = height(F) + height(G).
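To make the parameters in these bounds concrete, the following is a minimal Python sketch that computes r, the two heights, and L for a pair of small trees. It is not any of the paper's three algorithms: L is obtained from the textbook forest recursion on rightmost roots, which ignores sparsity entirely. The tuple encoding of trees, the function names, the example trees, and the convention that a single node has height 1 are all assumptions made for illustration.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical encoding: a tree is (label, children), where children is a
# tuple of trees in left-to-right order; a forest is a tuple of trees.

def labels(tree):
    """Yield the label of every node in the tree."""
    label, children = tree
    yield label
    for child in children:
        yield from labels(child)

def height(tree):
    """Height counted in nodes (assumed convention: a single node has height 1)."""
    _, children = tree
    return 1 + max((height(c) for c in children), default=0)

def match_pairs(f, g):
    """r: number of pairs (v in f, w in g) whose labels are equal."""
    cf, cg = Counter(labels(f)), Counter(labels(g))
    return sum(cf[x] * cg[x] for x in cf)

@lru_cache(maxsize=None)
def forest_lcs(f, g):
    """Size of the largest forest obtainable from both f and g by deleting nodes.

    Naive memoized baseline on rightmost roots, not a sparse algorithm.
    """
    if not f or not g:
        return 0
    (lv, cv), (lw, cw) = f[-1], g[-1]
    # Delete the rightmost root of f, or the rightmost root of g;
    # a deleted root is replaced by its children.
    best = max(forest_lcs(f[:-1] + cv, g), forest_lcs(f, g[:-1] + cw))
    if lv == lw:
        # Match the two roots: their child forests and the remaining
        # forests to their left are then solved independently.
        best = max(best, 1 + forest_lcs(cv, cw) + forest_lcs(f[:-1], g[:-1]))
    return best

# Example trees (made up): F = a(b(c), b), G = a(b, c).
F = ("a", (("b", (("c", ()),)), ("b", ())))
G = ("a", (("b", ()), ("c", ())))

r = match_pairs(F, G)
L = forest_lcs((F,), (G,))
h = height(F) + height(G)
print(r, L, height(F), height(G), h)
```

For these example trees the sketch gives r = 4, L = 2, and h = 5. The node c is a descendant of b in F but a sibling of b in G, so only one of them can be kept alongside the root a.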

## Keywords

Match Pair, Main Path, Longest Common Subsequence, Path Decomposition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

## Authors and Affiliations

• Shay Mozes (1)
• Dekel Tsur (2)
• Oren Weimann (3)
• Michal Ziv-Ukelson (2)

1. Brown University, Providence, USA
2. Ben-Gurion University, Beer-Sheva, Israel
3. Massachusetts Institute of Technology, Cambridge, USA