
Compressed Indexing with Signature Grammars

  • Anders Roy Christiansen
  • Mikko Berggren Ettienne
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10807)

Abstract

The compressed indexing problem is to preprocess a string S of length n into a compressed representation that supports pattern matching queries. That is, given a string P of length m, report all occurrences of P in S.
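To make the query semantics concrete, here is a minimal uncompressed baseline in Python. It only illustrates what "report all occurrences" means; it is not the data structure from the paper, and the function name `occurrences` is a hypothetical choice. It assumes m ≥ 1 and reports 0-based start positions, including overlapping ones.

```python
def occurrences(S: str, P: str) -> list[int]:
    """Return every 0-based start position of P in S (overlaps included).

    Naive baseline that scans the uncompressed text; a compressed index
    is expected to answer the same query without storing S explicitly.
    """
    positions = []
    start = S.find(P)
    while start != -1:
        positions.append(start)
        # Resume one position later so overlapping matches are reported too.
        start = S.find(P, start + 1)
    return positions


if __name__ == "__main__":
    print(occurrences("abracadabra", "abra"))  # [0, 7]
```

Here occ, the quantity that appears in the query time bound, is the length of the returned list.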

We present a data structure that supports pattern matching queries in \(O(m + \mathsf {occ}(\lg \lg n + \lg ^\epsilon z))\) time using \(O(z \lg (n / z))\) space where z is the size of the LZ77 parse of S and \(\epsilon > 0\) is an arbitrarily small constant, when the alphabet is small or \(z = O(n^{1 - \delta })\) for any constant \(\delta > 0\). We also present two data structures for the general case; one where the space is increased by \(O(z\lg \lg z)\), and one where the query time changes from worst-case to expected. These results improve the previously best known solutions. Notably, this is the first data structure that decides if P occurs in S in O(m) time using \(O(z\lg (n/z))\) space.
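The parameter z can be illustrated with a small quadratic-time sketch of a greedy LZ77-style parse. The variant used here is an assumption, since the abstract does not fix one: each phrase is the longest prefix of the remaining text that also occurs starting at an earlier position (overlap with the phrase itself is allowed), followed by one fresh character. The function name `lz77_phrases` is hypothetical, and the exact definition in the paper may differ in detail.

```python
def lz77_phrases(S: str) -> list[str]:
    """Greedy LZ77-style parse (assumed variant: longest earlier copy + 1 char)."""
    phrases = []
    i, n = 0, len(S)
    while i < n:
        length = 0
        # Grow the copy while S[i : i+length+1] has an occurrence starting
        # before i. Restricting the search to S[0 : i+length] forces the start
        # to be < i but still lets the occurrence overlap the phrase.
        while i + length < n and S.find(S[i:i + length + 1], 0, i + length) != -1:
            length += 1
        # The phrase is the copied part plus one literal character
        # (slicing truncates it if the text ends first).
        phrases.append(S[i:i + length + 1])
        i += length + 1
    return phrases


if __name__ == "__main__":
    parse = lz77_phrases("abababab")
    print(parse, "z =", len(parse))  # ['a', 'b', 'ababab'] z = 3
```

On highly repetitive inputs z is much smaller than n, which is the regime in which an \(O(z \lg (n / z))\)-space index pays off.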

Our results are mainly obtained by a novel combination of a randomized grammar construction algorithm with well-known techniques relating pattern matching to 2D-range reporting.


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Anders Roy Christiansen (1)
  • Mikko Berggren Ettienne (1)
  1. The Technical University of Denmark, Kongens Lyngby, Denmark
