Finger Search in Grammar-Compressed Strings

  • Philip Bille
  • Anders Roy Christiansen
  • Patrick Hagge Cording
  • Inge Li Gørtz
Article

Abstract

Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position in the original uncompressed string, report the character at that position. In this paper we study the random access problem with the finger search property, that is, the time for a random access query should depend on the distance between a specified index f, called the finger, and the query index i. We consider both a static variant, where we first place a finger and subsequently access indices near the finger efficiently, and a dynamic variant, which additionally supports moving the finger in time that depends on the distance moved. Let n be the size of the grammar, and let N be the size of the string. For the static variant we give a linear space representation that supports placing the finger in O(log N) time and subsequently accessing in O(log D) time, where D is the distance between the finger and the accessed index. For the dynamic variant we give a linear space representation that supports placing the finger in O(log N) time and accessing and moving the finger in O(log D + log log N) time. Compared to the best linear space solution to random access, we improve an O(log N) query bound to O(log D) for the static variant and to O(log D + log log N) for the dynamic variant, while maintaining linear space. As an application of our results we obtain an improved solution to the longest common extension problem in grammar-compressed strings. To obtain our results, we introduce several new techniques of independent interest, including a novel van Emde Boas style decomposition of grammars.
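
To make the problem statement concrete, the following is a minimal sketch of random access and a naive movable finger on a grammar in which every rule is either a single character or a pair of earlier rules. It is not the data structure of the paper: the class names `SLP` and `Finger`, the rule encoding, and the example grammar are assumptions made for illustration, and both operations take time proportional to the height of the grammar rather than the O(log D) and O(log D + log log N) bounds stated above.

```python
class SLP:
    """Grammar with rules listed bottom-up; the last rule is the start rule.

    Each rule is either a one-character string (terminal) or a pair
    (left, right) of indices of earlier rules. Hypothetical encoding.
    """

    def __init__(self, rules):
        self.rules = rules
        self.size = []  # size[v] = length of the string generated by rule v
        for rule in rules:
            if isinstance(rule, str):
                self.size.append(1)
            else:
                left, right = rule
                self.size.append(self.size[left] + self.size[right])

    def __len__(self):
        return self.size[-1]

    def access(self, i):
        """Return the character at position i by descending from the start rule."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        v = len(self.rules) - 1
        while not isinstance(self.rules[v], str):
            left, right = self.rules[v]
            if i < self.size[left]:
                v = left
            else:
                i -= self.size[left]
                v = right
        return self.rules[v]


class Finger:
    """Naive finger: remembers the root-to-leaf path of the last accessed index."""

    def __init__(self, slp, f):
        self.slp = slp
        # Each path entry is (rule index, start position of its expansion).
        self.path = [(len(slp.rules) - 1, 0)]
        self.move(f)

    def move(self, i):
        """Move the finger to index i and return the character there."""
        if not 0 <= i < len(self.slp):
            raise IndexError(i)
        # Climb until the current rule's expansion covers position i.
        while True:
            v, start = self.path[-1]
            if start <= i < start + self.slp.size[v]:
                break
            self.path.pop()
        # Descend to the leaf holding position i, extending the stored path.
        while not isinstance(self.slp.rules[v], str):
            left, right = self.slp.rules[v]
            if i < start + self.slp.size[left]:
                v = left
            else:
                start += self.slp.size[left]
                v = right
            self.path.append((v, start))
        return self.slp.rules[v]


# Example grammar generating "abaababa" (a Fibonacci-style grammar).
rules = ["a", "b", (0, 1), (2, 0), (3, 2), (4, 3)]
slp = SLP(rules)
text = "".join(slp.access(i) for i in range(len(slp)))
finger = Finger(slp, 3)
nearby = [finger.move(i) for i in (4, 5, 2)]
```

In this sketch a move to a nearby index often reuses most of the stored path, but in an unbalanced grammar a single step can still force a climb and descent over many levels, which is the gap the bounds in the abstract address.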

Keywords

Compression, Grammars, Finger search, Algorithms

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. The Technical University of Denmark, Kgs. Lyngby, Denmark