Self-indexed Text Compression Using Straight-Line Programs

  • Conference paper
Mathematical Foundations of Computer Science 2009 (MFCS 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5734)

Abstract

Straight-line programs (SLPs) offer powerful text compression by representing a text T[1,u] in terms of a restricted context-free grammar of n rules, so that T can be recovered in O(u) time. However, the problem of operating on the grammar in compressed form has received little study. We present a grammar representation whose size is of the same order as that of a plain SLP representation and which can answer queries other than expanding nonterminals. This can be of independent interest. We then extend it to obtain the first grammar representation capable of extracting text substrings, and of searching the text for patterns, in time o(n). We also give byproducts on representing binary relations.
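To make the notion of an SLP concrete, the following is a minimal sketch in Python, not the representation proposed in the paper. It assumes a hypothetical encoding in which each rule is stored in a dict under an integer index and is either a single terminal character or a pair of earlier rule indices. The function names (`expand`, `expansion_lengths`, `char_at`) and the example grammar are illustrative assumptions. `expand` recovers the text in time linear in its length, and `char_at` shows the naive way to read one symbol without expanding, by descending from the start rule using precomputed expansion lengths, which takes time proportional to the grammar height.

```python
from typing import Dict, Tuple, Union

# Hypothetical encoding: a rule is a terminal character or a pair of
# indices of previously defined rules (the "restricted" grammar form).
Rule = Union[str, Tuple[int, int]]


def expand(rules: Dict[int, Rule], start: int) -> str:
    """Expand nonterminal `start` into its text.

    The parse tree is binary with one leaf per output symbol, so the
    number of steps is linear in the length of the output.
    """
    out = []
    stack = [start]
    while stack:
        rhs = rules[stack.pop()]
        if isinstance(rhs, str):
            out.append(rhs)
        else:
            left, right = rhs
            stack.append(right)  # pushed first so that left is expanded first
            stack.append(left)
    return "".join(out)


def expansion_lengths(rules: Dict[int, Rule]) -> Dict[int, int]:
    """Length of the expansion of every rule.

    Assumes each pair rule refers only to smaller indices.
    """
    length: Dict[int, int] = {}
    for sym in sorted(rules):
        rhs = rules[sym]
        length[sym] = 1 if isinstance(rhs, str) else length[rhs[0]] + length[rhs[1]]
    return length


def char_at(rules: Dict[int, Rule], length: Dict[int, int], start: int, i: int) -> str:
    """Return the i-th symbol (0-based) of the expansion of `start`.

    Naive descent through the grammar; no text is expanded.
    """
    sym = start
    while not isinstance(rules[sym], str):
        left, right = rules[sym]
        if i < length[left]:
            sym = left
        else:
            i -= length[left]
            sym = right
    return rules[sym]


# Illustrative grammar: 6 rules generating a Fibonacci-like string.
RULES: Dict[int, Rule] = {1: "b", 2: "a", 3: (2, 1), 4: (3, 2), 5: (4, 3), 6: (5, 4)}
LENGTHS = expansion_lengths(RULES)
```

In this example, `expand(RULES, 6)` yields `"abaababa"`, and `char_at(RULES, LENGTHS, 6, 4)` yields `"b"` without materializing the text. The gap between this naive descent and sublinear substring extraction and pattern search is what the paper's representation addresses.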

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Claude, F., Navarro, G. (2009). Self-indexed Text Compression Using Straight-Line Programs. In: Královič, R., Niwiński, D. (eds) Mathematical Foundations of Computer Science 2009. MFCS 2009. Lecture Notes in Computer Science, vol 5734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03816-7_21

  • DOI: https://doi.org/10.1007/978-3-642-03816-7_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03815-0

  • Online ISBN: 978-3-642-03816-7

  • eBook Packages: Computer Science, Computer Science (R0)
