Faster Fully Compressed Pattern Matching by Recompression

  • Conference paper
Automata, Languages, and Programming (ICALP 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7391)

Abstract

In this paper, the fully compressed pattern matching problem is studied. The compression is represented by straight-line programs (SLPs), i.e. context-free grammars that each generate exactly one string; the term "fully" means that both the pattern and the text are given in compressed form. The problem is approached using a recently developed technique of local recompression: the SLPs are refactored so that substrings of the pattern and text are encoded in both SLPs in the same way. To this end, the SLPs are locally decompressed and then recompressed in a uniform way.
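To make the problem statement concrete, the following minimal Python sketch models an SLP as a dictionary of rules and solves fully compressed pattern matching the naive way, by decompressing both inputs. It is not the recompression algorithm of the paper; the rule names, the dictionary layout, and the helpers `expand`, `length`, and `naive_match` are all hypothetical and chosen only for illustration.

```python
# Toy SLPs (hypothetical encoding): each nonterminal maps either to a single
# terminal character or to a pair of nonterminals, so every nonterminal
# derives exactly one string.
TEXT_RULES = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # derives "ab"
    "X2": ("X1", "X1"),  # derives "abab"
    "X3": ("X2", "A"),   # derives "ababa"
}
PATTERN_RULES = {
    "A": "a",
    "B": "b",
    "P1": ("B", "A"),    # derives "ba"
    "P2": ("P1", "B"),   # derives "bab"
}


def expand(rules, symbol):
    """Fully decompress the string derived by `symbol` (may be exponentially long)."""
    rhs = rules[symbol]
    if isinstance(rhs, str):
        return rhs
    left, right = rhs
    return expand(rules, left) + expand(rules, right)


def length(rules, symbol, memo=None):
    """Length of the derived string, computed on the grammar without decompressing."""
    if memo is None:
        memo = {}
    if symbol not in memo:
        rhs = rules[symbol]
        if isinstance(rhs, str):
            memo[symbol] = 1
        else:
            memo[symbol] = length(rules, rhs[0], memo) + length(rules, rhs[1], memo)
    return memo[symbol]


def naive_match(text_rules, text_start, pattern_rules, pattern_start):
    """Baseline only: decompress both SLPs, then search the plain strings."""
    return expand(pattern_rules, pattern_start) in expand(text_rules, text_start)
```

The baseline runs in time proportional to the decompressed sizes, which can be exponential in the grammar sizes; avoiding that decompression is the point of working on the SLPs directly.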

This technique yields an \(\mathcal{O}((n+m)\log M \log(n+m))\) algorithm for compressed pattern matching, where \(n\) and \(m\) are the sizes of the compressed representations of the text and the pattern, respectively, and \(M\) is the size of the decompressed pattern. Since \(M \le 2^m\), this substantially improves the previously best \(\mathcal{O}(m^2n)\) algorithm.
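The comparison with the earlier bound can be unpacked in one line. As an assumption of this sketch (the abstract does not state it), take the pattern SLP in a normal form where every rule has at most two symbols on its right-hand side, so each rule at most doubles the length derived by earlier rules, and take logarithms base 2:

\[
M \le 2^m \;\Longrightarrow\; \log M \le m \;\Longrightarrow\; (n+m)\log M \log(n+m) \;\le\; (n+m)\, m \log(n+m).
\]

For \(m \le n\) the right-hand side is \(\mathcal{O}(n\, m \log n)\), so even in this worst case the new bound is below \(\mathcal{O}(m^2 n)\) whenever \(m\) grows faster than \(\log n\).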

Since LZ compression reduces in a standard way to SLPs with \(\log(N/n)\) overhead and in \(\mathcal{O}(n \log(N/n))\) time, where \(N\) is the size of the decompressed text, the presented algorithm can also be applied to the fully LZ-compressed pattern matching problem, yielding an \(\mathcal{O}(s \log s \log M)\) running time, where \(s = n \log(N/n) + m \log(M/m)\).
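As a small worked example of the parameter \(s\), the sketch below evaluates \(s = n \log(N/n) + m \log(M/m)\) for concrete sizes. The function name `lz_slp_size_bound` is hypothetical, and the base-2 logarithm is an assumption; the base only changes constant factors hidden by the \(\mathcal{O}\)-notation.

```python
import math


def lz_slp_size_bound(n, N, m, M):
    """Evaluate s = n*log(N/n) + m*log(M/m), using base-2 logs (an assumption).

    n, m: sizes of the LZ-compressed text and pattern.
    N, M: sizes of the decompressed text and pattern.
    """
    if not (0 < n <= N and 0 < m <= M):
        raise ValueError("expected 0 < n <= N and 0 < m <= M")
    return n * math.log2(N / n) + m * math.log2(M / m)


# Example: a text of length 1024 compressed to size 8, and a pattern of
# length 64 compressed to size 4, give s = 8*7 + 4*4 = 72.
s = lz_slp_size_bound(n=8, N=1024, m=4, M=64)
```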

The full version of this paper is available at http://arxiv.org/abs/1111.3244



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jeż, A. (2012). Faster Fully Compressed Pattern Matching by Recompression. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds) Automata, Languages, and Programming. ICALP 2012. Lecture Notes in Computer Science, vol 7391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31594-7_45

  • DOI: https://doi.org/10.1007/978-3-642-31594-7_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31593-0

  • Online ISBN: 978-3-642-31594-7

  • eBook Packages: Computer Science, Computer Science (R0)
