Skip to main content

Variable-Length Codes for Space-Efficient Grammar-Based Compression

  • Conference paper
String Processing and Information Retrieval (SPIRE 2012)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7608))

Included in the following conference series:

Abstract

Dictionary is a crucial data structure to implement grammar-based compression algorithms. Such a dictionary should access any codes in O(1) time for an efficient compression. A standard dictionary consisting of fixed-length codes consumes a large amount of memory of 2n logn bits for n variables. We present novel dictionaries consisting of variable-length codes for offline and online grammar-based compression algorithms. In an offline setting, we present a dictionary of at most min {nlogn + 2n + o(n), 3nlogσ(1 + o(1))} bits of space where \(\sigma<2\sqrt{n}\). In an online setting, we present a dictionary of at most \(\frac{7}{4}n\log n + 4n + o(n)\) bits of space for a constant alphabet and unknown n. Experiments revealed that memory usage in our dictionary was much smaller than that of state-of-the-art dictionaries.

Partially supported by KAKENHI(23680016, 20589824) and JST PRESTO program.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Barbay, J., Navarro, G.: Compressed Representations of Permutations, and Applications. In: STACS, pp. 111–122 (2009)

    Google Scholar 

  2. Brisaboa, N.R., Ladra, S., Navarro, G.: Directly Addressable Variable-Length Codes. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 122–130. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  3. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51, 2554–2576 (2005)

    Article  MathSciNet  Google Scholar 

  4. Clark, D.: Compact Pat Trees. PhD thesis, University of Waterloo (1996)

    Google Scholar 

  5. Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fundam. Inform. 111(3), 313–337 (2011)

    MathSciNet  MATH  Google Scholar 

  6. Erdős, P., Szekeres, G.: A combinatorial problem in geometry. Compositio Mathematica 2, 463–470 (1935)

    MathSciNet  Google Scholar 

  7. Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds. In: SODA, pp. 690–696 (2007)

    Google Scholar 

  8. Goto, K., Bannai, H., Inenaga, S., Takeda, M.: Fast q-gram Mining on SLP Compressed Strings. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 278–289. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  9. Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: SODA, pp. 636–645 (2003)

    Google Scholar 

  10. Hermelin, D., Landau, G.M., Landau, S., Weimann, O.: A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression. In: STACS, pp. 26–28 (2009)

    Google Scholar 

  11. Jacobson, G.: Space-efficient static trees and graphs. In: FOCS, pp. 549–554 (1989)

    Google Scholar 

  12. Karpinski, M., Rytter, W., Shinohara, A.: An efficient pattern-matching algorithm for strings with short descriptions. Nordic J. Comp. 4(2), 172–186 (1997)

    MathSciNet  MATH  Google Scholar 

  13. Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. Proceedings of the IEEE 88(11), 1722–1732 (2000)

    Article  Google Scholar 

  14. Maruyama, S., Nakahara, M., Kishiue, N., Sakamoto, H.: ESP-Index: A Compressed Index Based on Edit-Sensitive Parsing. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 398–409. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  15. Maruyama, S., Sakamoto, H., Takeda, M.: An online algorithm for lightweight grammar-based compression. Algorithms 5(2), 213–235 (2012)

    Article  Google Scholar 

  16. Munro, J.I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)

    Chapter  Google Scholar 

  17. Okanohara, D.: dag_vector, https://github.com/pfi/dag_vector

  18. Raman, R., Raman, V., Rao, S.S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: SODA, pp. 233–242 (2002)

    Google Scholar 

  19. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302, 211–222 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  20. Sakamoto, H., Kida, T., Shimozono, S.: A Space-Saving Linear-Time Algorithm for Grammar-Based Compression. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 218–229. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  21. Sakamoto, H., Maruyama, S., Kida, T., Shimozono, S.: A space-saving approximation algorithm for grammar-based compression. IEICE Trans. Inf. Syst. 92(2), 158–165 (2009)

    Article  Google Scholar 

  22. Tiskin, A.: Towards Approximate Matching in Compressed Strings: Local Subsequence Recognition. In: Kulikov, A., Vereshchagin, N. (eds.) CSR 2011. LNCS, vol. 6651, pp. 401–414. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  23. Yamamoto, T., Bannai, H., Inenaga, S., Takeda, M.: Faster Subsequence and Don’t-Care Pattern Matching on Compressed Texts. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 309–322. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  24. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inform. Theory 24(5), 530–536 (1978)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Takabatake, Y., Tabei, Y., Sakamoto, H. (2012). Variable-Length Codes for Space-Efficient Grammar-Based Compression. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34109-0_42

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-34109-0_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34108-3

  • Online ISBN: 978-3-642-34109-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics