Abstract
This paper studies the fully compressed pattern matching problem. The compression is represented by straight-line programs (SLPs), i.e. context-free grammars that each generate exactly one string; the term "fully" means that both the pattern and the text are given in compressed form. The problem is approached using a recently developed technique of local recompression: the SLPs are refactored so that substrings of the pattern and text are encoded in both SLPs in the same way. To this end, the SLPs are locally decompressed and then recompressed in a uniform way.
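To make the notion of an SLP concrete, the following is a minimal sketch, not the representation used in the paper. It assumes a hypothetical encoding in which every nonterminal maps either to a single terminal character or to a pair of other nonterminals; the names `Rule`, `expand`, and the example grammar are illustrative only.

```python
from typing import Dict, Tuple, Union

# Assumed encoding: a rule is a terminal character or a pair of nonterminals.
Rule = Union[str, Tuple[str, str]]


def expand(slp: Dict[str, Rule], symbol: str) -> str:
    """Return the unique string generated by `symbol` (full decompression)."""
    rule = slp[symbol]
    if isinstance(rule, str):
        return rule  # terminal rule
    left, right = rule
    return expand(slp, left) + expand(slp, right)


# Illustrative grammar with five rules; each nonterminal derives one string.
slp = {
    "A": "a",
    "B": "b",
    "C": ("A", "B"),  # derives "ab"
    "D": ("C", "A"),  # derives "aba"
    "E": ("D", "C"),  # derives "abaab"
}

print(expand(slp, "E"))  # prints "abaab"
```

Explicit expansion is shown only to illustrate the definition; a compressed pattern matching algorithm has to avoid it, because the generated string can be exponentially longer than the grammar.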
This technique yields an \(\mathcal{O}((n+m)\log M \log(n+m))\) algorithm for compressed pattern matching, where \(n\) and \(m\) are the sizes of the compressed representations of the text and the pattern, respectively, and \(M\) is the size of the decompressed pattern. Since \(M \le 2^m\), this substantially improves on the previously best \(\mathcal{O}(m^2n)\) algorithm.
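The bound \(M \le 2^m\) reflects that each rule of an SLP can at most double the length of the string derived so far. The sketch below illustrates this under an assumed encoding (a rule is either a terminal character or a pair of earlier nonterminals); the helpers `lengths` and `doubling_slp` are hypothetical names, not taken from the paper. It computes the decompressed length from the grammar alone, without expanding the string.

```python
from typing import Dict, List, Tuple, Union

# Assumed encoding: a rule is a terminal character or a pair of nonterminals.
Rule = Union[str, Tuple[str, str]]


def lengths(rules: List[Tuple[str, Rule]]) -> Dict[str, int]:
    """Length of the string derived by each nonterminal.

    Assumes `rules` is ordered so that every rule refers only to
    nonterminals defined earlier in the list.
    """
    out: Dict[str, int] = {}
    for name, rule in rules:
        if isinstance(rule, str):
            out[name] = len(rule)
        else:
            left, right = rule
            out[name] = out[left] + out[right]
    return out


def doubling_slp(k: int) -> List[Tuple[str, Rule]]:
    """Grammar with k + 1 rules whose last symbol derives 'a' repeated 2**k times."""
    rules: List[Tuple[str, Rule]] = [("X0", "a")]
    for i in range(1, k + 1):
        rules.append((f"X{i}", (f"X{i - 1}", f"X{i - 1}")))
    return rules


rules = doubling_slp(40)
m = len(rules)               # number of rules in the grammar
M = lengths(rules)["X40"]    # decompressed length, computed without expansion
print(m, M, M <= 2 ** m)
```

With 41 rules the derived string already has \(2^{40}\) characters, so decompressing first and then running a classical matcher is not viable in general; with \(\log M \le m\), the stated bound is at most \(\mathcal{O}((n+m)\,m\log(n+m))\).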
Since LZ compression reduces in a standard way to SLPs with a \(\log(N/n)\) overhead and in \(\mathcal{O}(n \log(N/n))\) time, where \(N\) is the size of the decompressed text, the presented algorithm can also be applied to the fully LZ-compressed pattern matching problem, yielding an \(\mathcal{O}(s \log s \log M)\) running time, where \(s = n \log(N/n) + m \log(M/m)\).
The full version of this paper is available at http://arxiv.org/abs/1111.3244
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jeż, A. (2012). Faster Fully Compressed Pattern Matching by Recompression. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds) Automata, Languages, and Programming. ICALP 2012. Lecture Notes in Computer Science, vol 7391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31594-7_45
DOI: https://doi.org/10.1007/978-3-642-31594-7_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31593-0
Online ISBN: 978-3-642-31594-7
eBook Packages: Computer Science (R0)