
A Fast Longest Common Subsequence Algorithm for Similar Strings

  • Abdullah N. Arslan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6031)

Abstract

The longest common subsequence problem is an important computational problem for which there are many algorithms. We present a new algorithm for this problem. Let X and Y be any two given strings, each of length O(n). We observe that a longest common subsequence can be obtained by using longest common prefixes of suffixes (longest common extensions) of X and Y. The longest common extension problem asks for the longest common prefix of the suffixes starting at a given pair of positions in X and Y, respectively. Let e be the minimum number of edit operations (insert, delete, and substitute) needed to change X into Y, i.e., the edit distance between X and Y. Our algorithm visits \(O(\min\{en,(1+\sqrt{2})^{2e+1}\})\) nodes in the edit graph, and for every visited node it performs one longest common extension query. Each of these queries can be answered in constant time if we represent the strings by a suffix tree or a suffix array. These data structures can be created in linear time. We do not assume that the edit distance e is known beforehand; therefore, we try values for e starting with e = 1 (without loss of generality X ≠ Y) and double e until our algorithm finds a longest common subsequence. The total time complexity of our algorithm is \(O(\min\{en\log{n},n+e(1+\sqrt{2})^{2e+1}\})\). When e is small, this improves on the time complexity of existing solutions for the problem. For example, when \(e\leq \frac{1}{3}((\log_{(1+\sqrt{2})}~{n})-1)\) our algorithm finds an optimal solution in time O(n).
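The two ideas in the abstract, sliding along diagonals of the edit graph with longest common extension queries and doubling a guessed distance bound, can be illustrated with a small Python sketch. This is not the algorithm of the paper: it is the well-known greedy furthest-reaching diagonal method in the style of Myers' O(ND) file comparison algorithm, chosen only because it uses extension queries in a similar way. It makes several assumptions: the names `lce`, `lcs_length_bounded`, and `lcs_length` are hypothetical, the extension query is a naive character scan rather than a constant-time suffix tree or suffix array lookup, the bound counts insertions and deletions only (no substitutions), and only the length of a longest common subsequence is returned.

```python
def lce(x, y, i, j):
    """Longest common extension: length of the longest common prefix
    of x[i:] and y[j:]. Naive scan (assumption); the abstract instead
    relies on constant-time queries over a suffix tree or suffix array."""
    k = 0
    while i + k < len(x) and j + k < len(y) and x[i + k] == y[j + k]:
        k += 1
    return k


def lcs_length_bounded(x, y, bound):
    """Return the LCS length of x and y if they are within `bound`
    insertions/deletions of each other, otherwise None."""
    n, m = len(x), len(y)
    # furthest[k] = furthest position i in x reached on diagonal k = i - j.
    furthest = {1: 0}
    for d in range(bound + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                i = furthest[k + 1]        # skip one character of y
            else:
                i = furthest[k - 1] + 1    # skip one character of x
            j = i - k
            i += lce(x, y, i, j)           # slide along the matching run
            furthest[k] = i
            if i == n and i - k == m:
                # d unmatched characters in total, the rest are matched pairs.
                return (n + m - d) // 2
    return None


def lcs_length(x, y):
    """Try bounds 1, 2, 4, ... until the bounded search succeeds."""
    bound = 1
    while True:
        result = lcs_length_bounded(x, y, bound)
        if result is not None:
            return result
        bound *= 2


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))
```

Each call to `lcs_length_bounded` issues one extension query per visited diagonal endpoint, and the doubling loop stops as soon as the bound reaches the true insertion/deletion distance, so similar strings are handled after only a few small rounds.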

Keywords

algorithm, string, edit distance, longest common subsequence, suffix tree, lowest common ancestor, suffix array, longest common extension, dynamic programming

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Abdullah N. Arslan
    Department of Computer Science and Information Systems, Texas A&M University - Commerce, USA
