The Smallest Grammar Problem Revisited

  • Conference paper
String Processing and Information Retrieval (SPIRE 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9954)

Abstract

In a seminal paper of Charikar et al. on the smallest grammar problem, the authors derive upper and lower bounds on the approximation ratios for several grammar-based compressors, but in all cases there is a gap between the lower and upper bound. Here we close the gaps for LZ78 and BISECTION by showing that the approximation ratio of LZ78 is \(\varTheta ( (n/\log n)^{2/3})\), whereas the approximation ratio of BISECTION is \(\varTheta ( (n/\log n)^{1/2})\). We also derive a lower bound for a smallest grammar for a word in terms of its number of LZ77-factors, which refines existing bounds of Rytter. Finally, we improve results of Arpe and Reischuk relating grammar-based compression for arbitrary alphabets and binary alphabets.
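For readers unfamiliar with the two compressors named above, the following is a minimal sketch rather than the authors' formalization. It assumes the textbook LZ78 parsing (each phrase is the shortest prefix of the remaining input that is not yet a phrase) and the BISECTION splitting rule as described by Charikar et al. (split a string at the largest power of two strictly below its length, with one rule per distinct substring reached). The function names `lz78_factorize` and `bisection_rules` are hypothetical, and the exact grammar-size convention used in the paper is not reproduced here.

```python
from typing import Dict, List, Tuple


def lz78_factorize(w: str) -> List[str]:
    """Split w into LZ78 phrases (assumed textbook definition)."""
    seen = set()
    phrases = []
    current = ""
    for ch in w:
        current += ch
        if current not in seen:
            # Shortest prefix not seen before: close the phrase here.
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # The final phrase may repeat an earlier one.
        phrases.append(current)
    return phrases


def bisection_rules(w: str) -> Dict[str, Tuple[str, str]]:
    """One rule per distinct substring of length >= 2 reached by bisection."""
    rules: Dict[str, Tuple[str, str]] = {}

    def visit(u: str) -> None:
        if len(u) < 2 or u in rules:
            return
        # Largest power of two strictly less than len(u).
        split = 1 << ((len(u) - 1).bit_length() - 1)
        left, right = u[:split], u[split:]
        rules[u] = (left, right)
        visit(left)
        visit(right)

    visit(w)
    return rules


if __name__ == "__main__":
    print(lz78_factorize("abababab"))
    print(len(bisection_rules("aaaaaaaa")))
```

In this sketch each LZ78 phrase extends an earlier phrase by one symbol, so the phrase count drives the size of the resulting grammar, while for BISECTION the number of distinct substrings collected in `rules` plays the same role.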

Notes

  1. It is shown in [14] that every SLP in Chomsky normal form for w has at least \(g_{\mathsf {LZ77}}(w)\) many nonterminals. But the number of nonterminals in a smallest Chomsky normal form SLP for w is bounded by g(w).

References

  1. Arpe, J., Reischuk, R.: On the complexity of optimal grammar-based compression. In: Proceedings of the DCC 2006, pp. 173–182. IEEE Computer Society (2006)

  2. Berstel, J., Brlek, S.: On the length of word chains. Inf. Process. Lett. 26(1), 23–28 (1987)

  3. Casel, K., Fernau, H., Gaspers, S., Gras, B., Schmid, M.L.: On the complexity of grammar-based compression over fixed alphabets. In: Proceedings of ICALP 2016, LNCS. Springer, Heidelberg (2016, to appear)

  4. Charikar, M., Lehman, E., Lehman, A., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)

  5. Diwan, A.A.: A New Combinatorial Complexity Measure for Languages. Tata Institute, Bombay (1986)

  6. Gasieniec, L., Karpinski, M., Plandowski, W., Rytter, W.: Efficient algorithms for Lempel-Ziv encoding (extended abstract). In: Karlsson, R., Lingas, A. (eds.) SWAT 1996. LNCS, vol. 1097, pp. 392–403. Springer, Heidelberg (1996)

  7. Jeż, A.: Approximation of grammar-based compression via recompression. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 165–176. Springer, Heidelberg (2013)

  8. Kieffer, J.C., Yang, E.-H.: Grammar-based codes: a new class of universal lossless source codes. IEEE Trans. Inf. Theory 46(3), 737–754 (2000)

  9. Kieffer, J.C., Yang, E.-H., Nelson, G.J., Cosman, P.C.: Universal lossless compression via multilevel pattern matching. IEEE Trans. Inf. Theory 46(4), 1227–1245 (2000)

  10. Larsson, N.J., Moffat, A.: Offline dictionary-based compression. In: Proceedings of the DCC 1999, pp. 296–305. IEEE Computer Society (1999)

  11. Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications, 3rd edn. Springer, Heidelberg (2008)

  12. Lohrey, M.: The Compressed Word Problem for Groups. Springer, Heidelberg (2014)

  13. Nevill-Manning, C.G., Witten, I.H.: Identifying hierarchical structure in sequences: a linear-time algorithm. J. Artif. Intell. Res. 7, 67–82 (1997)

  14. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1–3), 211–222 (2003)

  15. Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. J. ACM 29(4), 928–951 (1982)

  16. Tabei, Y., Takabatake, Y., Sakamoto, H.: A succinct grammar compression. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 235–246. Springer, Heidelberg (2013)

  17. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)

Acknowledgment

The work in this paper was supported by the DFG grant LO 748/10-1.

Author information

Corresponding author

Correspondence to Danny Hucke.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Hucke, D., Lohrey, M., Reh, C.P. (2016). The Smallest Grammar Problem Revisited. In: Inenaga, S., Sadakane, K., Sakai, T. (eds) String Processing and Information Retrieval. SPIRE 2016. Lecture Notes in Computer Science, vol 9954. Springer, Cham. https://doi.org/10.1007/978-3-319-46049-9_4

  • DOI: https://doi.org/10.1007/978-3-319-46049-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46048-2

  • Online ISBN: 978-3-319-46049-9

  • eBook Packages: Computer Science, Computer Science (R0)
