Advertisement

Journal of Mathematical Sciences

, Volume 158, Issue 5, pp 759–769 | Cite as

Faster subsequence recognition in compressed strings

  • A. Tiskin
Article

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to Lempel–Ziv compression. For an SLP-compressed text of length \( \overline{m} \), and an uncompressed pattern of length n, Cégielski et al. gave an algorithm for local subsequence recognition running in time \( O\left( {\overline{m} n^2 \log n}\right)\). We improve the running time to \( O\left( {\overline{m} n^{1.5} }\right)\). Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time \( O\left( {\overline{m} n^{1.5} }\right)\); the same problem with a compressed pattern is known to be NP-hard. Bibliography: 22 titles.

Keywords

Recognition Problem Massive Data Longe Common Subsequence Subsequence Recognition Algorithmic Technique 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley (1976).Google Scholar
  2. 2.
    C. E. R. Alves, E. N. Cáceres, and S. W. Song, “An all-substrings common subsequence algorithm,” Electr. Notes Discr. Math., 19, 133–139 (2005).CrossRefGoogle Scholar
  3. 3.
    J. L. Bentley, “Multidimensional divide-and-conquer,” Comm. ACM, 23, No. 4, 214–229 (1980).MATHCrossRefMathSciNetGoogle Scholar
  4. 4.
    P. Cégielski, I. Guessarian, Y. Lifshits, and Y. Matiyasevich, “Window subsequence problems for compressed texts,” in: Proceedings of CSR, Lect. Notes Comp. Sci., 3967 (2006), pp. 127–136.Google Scholar
  5. 5.
    P. Cégielski, I. Guessarian, and Y. Matiyasevich, “Multiple serial episodes matching,” Inform. Process. Lett., 98, No. 6, 211–218 (2006).CrossRefMathSciNetGoogle Scholar
  6. 6.
    M. Crochemore, G. M. Landau, and M. Ziv-Ukelson, “A subquadratic sequence alignment algorithm for unrestricted score matrices,” SIAM J. Comp., 32, No. 6, 1654–1673 (2003).MATHCrossRefMathSciNetGoogle Scholar
  7. 7.
    M. Crochemore and W. Rytter, Text Algorithms, Oxford University Press (1994).Google Scholar
  8. 8.
    G. Das, R. Fleischer, L. Gasieniec, D. Gunopulos, and J. Kärkkäinen, “Episode matching,” in: Proceedings of CPM, Lect. Notes Comp. Sci., 1264 (1997), pp. 12–27.Google Scholar
  9. 9.
    D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press (1997).Google Scholar
  10. 10.
    J. Kärkkäinen, G. Navarro, and E. Ukkonen, “Approximate string matching on Ziv–Lempel compressed text,” J. Discr. Alg., 1, 313–338 (2003).MATHCrossRefGoogle Scholar
  11. 11.
    Y. Lifshits and M. Lohrey, “Querying and embedding compressed texts,” in: Proceedings of MFCS, Lect. Notes Comp. Sci., 4162 (2006), pp. 681–692.Google Scholar
  12. 12.
    W. J. Masek and M. S. Paterson, “A faster algorithm computing string edit distances,” J. Comp. System Sci., 20, 18–31 (1980).MATHCrossRefMathSciNetGoogle Scholar
  13. 13.
    G. Myers, “Approximately matching context-free languages,” Inform. Process. Lett., 54, 85–92 (1995).MATHCrossRefMathSciNetGoogle Scholar
  14. 14.
    G. Navarro, “A guided tour to approximate string matching,” ACM Comp. Surv., 33, No. 1, 31–88 (2001).CrossRefGoogle Scholar
  15. 15.
    F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer (1985).Google Scholar
  16. 16.
    W. Rytter, “Application of Lempel–Ziv factorization to the approximation of grammar-based compression,” Theor. Comp. Sci., 302, No. 1–3, 211–222 (2003).MATHCrossRefMathSciNetGoogle Scholar
  17. 17.
    A. Tiskin, “Semi-local longest common subsequences in subquadratic time,” J. Discr. Alg., 6, No. 4, 570–581 (2008).MATHCrossRefMathSciNetGoogle Scholar
  18. 18.
    A. Tiskin, “Semi-local string comparison: Algorithmic techniques and applications,” Math. Comput. Sci., 1, No. 4, 571–603 (2008).MATHCrossRefMathSciNetGoogle Scholar
  19. 19.
    B. W. Watson and G. Zwaan, “A taxonomy of sublinear multiple keyword pattern matching algorithms,” Sci. Comput. Programm., 27, No. 2, 85–118 (1996).MATHCrossRefMathSciNetGoogle Scholar
  20. 20.
    T. A. Welch, “A technique for high-performance data compression,” Computer, 17, No. 6, 8–19 (1984).CrossRefGoogle Scholar
  21. 21.
    G. Ziv and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Transact. Inform. Theory, 23, 337–343 (1977).MATHCrossRefMathSciNetGoogle Scholar
  22. 22.
    G. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,” IEEE Transact. Inform. Theory, 24, 530–536 (1978).MATHCrossRefMathSciNetGoogle Scholar

Copyright information

© Springer Science+Business Media, Inc. 2009

Authors and Affiliations

  1. 1.Department of Computer ScienceUniversity of WarwickCoventryUnited Kingdom

Personalised recommendations