Skip to main content

Compressed Indexing with Signature Grammars

  • Conference paper
  • First Online:
LATIN 2018: Theoretical Informatics (LATIN 2018)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10807))

Included in the following conference series:

Abstract

The compressed indexing problem is to preprocess a string S of length n into a compressed representation that supports pattern matching queries. That is, given a string P of length m report all occurrences of P in S.

We present a data structure that supports pattern matching queries in \(O(m + \mathsf {occ}(\lg \lg n + \lg ^\epsilon z))\) time using \(O(z \lg (n / z))\) space where z is the size of the LZ77 parse of S and \(\epsilon > 0\) is an arbitrarily small constant, when the alphabet is small or \(z = O(n^{1 - \delta })\) for any constant \(\delta > 0\). We also present two data structures for the general case; one where the space is increased by \(O(z\lg \lg z)\), and one where the query time changes from worst-case to expected. These results improve the previously best known solutions. Notably, this is the first data structure that decides if P occurs in S in O(m) time using \(O(z\lg (n/z))\) space.

Our results are mainly obtained by a novel combination of a randomized grammar construction algorithm with well known techniques relating pattern matching to 2D-range reporting.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Alstrup, S., Brodal, G.S., Rauhe, T.: Pattern matching in dynamic texts. In: Proceedings of the 11th Annual Symposium on Discrete Algorithms. Citeseer (2000)

    Google Scholar 

  2. Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Fast prefix search in little space, with applications. In: de Berg, M., Meyer, U. (eds.) ESA 2010. LNCS, vol. 6346, pp. 427–438. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15775-2_37

    Chapter  Google Scholar 

  3. Bille, P., Ettienne, M.B., Gørtz, I.L., Vildhøj, H.W.: Time-space trade-offs for Lempel-Ziv compressed indexing. In: 28th Annual Symposium on Combinatorial Pattern Matching. Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2017)

    Google Scholar 

  4. Chan, T.M., Larsen, K.G., Patrascu, M.: Orthogonal range searching on the RAM, revisited. In: Proceedings of the 27th SOCG, pp. 1–10 (2011)

    Google Scholar 

  5. Cole, R., Vishkin, U.: Deterministic coin tossing with applications to optimal parallel list ranking. Inf. Control 70(1), 32–53 (1986)

    Article  MathSciNet  MATH  Google Scholar 

  6. Farach, M., Thorup, M.: String matching in Lempel-Ziv compressed strings. Algorithmica 20(4), 388–404 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  7. Fredman, M.L., Willard, D.E.: Blasting through the information theoretic barrier with fusion trees. In: Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, STOC 1990, pp. 1–7. ACM, New York (1990)

    Google Scholar 

  8. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 240–251. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28332-1_21

    Chapter  Google Scholar 

  9. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: LZ77-based self-indexing with faster pattern matching. In: Pardo, A., Viola, A. (eds.) LATIN 2014. LNCS, vol. 8392, pp. 731–742. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54423-1_63

    Chapter  Google Scholar 

  10. Gagie, T., Navarro, G., Prezza, N.: Optimal-time text indexing in BWT-runs bounded space. arXiv preprint arXiv:1705.10382 (2017)

  11. Gawrychowski, P., Karczmarz, A., Kociumaka, T., Łącki, J., Sankowski, P.: Optimal dynamic strings. arXiv preprint arXiv:1511.02612 (2015)

  12. Jeż, A.: Faster fully compressed pattern matching by recompression. ACM Trans. Algorithms (TALG) 11(3), 20 (2015)

    MathSciNet  MATH  Google Scholar 

  13. Kärkkäinen, J., Ukkonen, E.: Lempel-Ziv parsing and sublinear-size index structures for string matching. In: Proceedings of the 3rd South American Workshop on String Processing (WSP 1996), vol. 26, no. (Teollisuuskatu 23), pp. 141–155 (1996)

    Google Scholar 

  14. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)

    Article  MathSciNet  MATH  Google Scholar 

  15. Mehlhorn, K., Sundar, R., Uhrig, C.: Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica 17(2), 183–198 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  16. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. (CSUR) 39(1), 2 (2007)

    Article  MATH  Google Scholar 

  17. Nishimoto, T., Tomohiro, I., Inenaga, S., Bannai, H., Takeda, M.: Dynamic index, LZ factorization, and LCE queries in compressed space. arXiv preprint arXiv:1504.06954 (2015)

  18. Porat, B., Porat, E.: Exact and approximate pattern matching in the streaming model. In: Proceedings of the 50th FOCS, pp. 315–323 (2009)

    Google Scholar 

  19. Sahinalp, S.C., Vishkin, U.: Efficient approximate and dynamic matching of patterns using a labeling paradigm. In: Proceedings of 37th Conference on Foundations of Computer Science, October 1996

    Google Scholar 

  20. Tomohiro, I.: Longest common extension with recompression (2017)

    Google Scholar 

  21. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23(3), 337–343 (1977)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mikko Berggren Ettienne .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Christiansen, A.R., Ettienne, M.B. (2018). Compressed Indexing with Signature Grammars. In: Bender, M., Farach-Colton, M., Mosteiro, M. (eds) LATIN 2018: Theoretical Informatics. LATIN 2018. Lecture Notes in Computer Science(), vol 10807. Springer, Cham. https://doi.org/10.1007/978-3-319-77404-6_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-77404-6_25

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77403-9

  • Online ISBN: 978-3-319-77404-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics