Abstract
In a seminal paper of Charikar et al. on the smallest grammar problem, the authors derive upper and lower bounds on the approximation ratios for several grammar-based compressors, but in all cases there is a gap between the lower and upper bound. Here we close the gaps for LZ78 and BISECTION by showing that the approximation ratio of LZ78 is \(\varTheta ( (n/\log n)^{2/3})\), whereas the approximation ratio of BISECTION is \(\varTheta ( (n/\log n)^{1/2})\). We also derive a lower bound for a smallest grammar for a word in terms of its number of LZ77-factors, which refines existing bounds of Rytter. Finally, we improve results of Arpe and Reischuk relating grammar-based compression for arbitrary alphabets and binary alphabets.
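For readers unfamiliar with the two compressors named above, the following is a minimal sketch of them, assuming the standard LZ78 factorization and the BISECTION splitting rule as described by Charikar et al. (the left part has length equal to the largest power of two strictly smaller than the word length). The function names `lz78_factors` and `bisection_rules` are illustrative and do not come from the paper.

```python
def lz78_factors(w):
    """Return the LZ78 factorization of w as a list of strings.

    Each factor is the shortest prefix of the remaining input that has
    not yet occurred as a factor; only the last factor may repeat.
    """
    seen = set()
    factors = []
    current = ""
    for ch in w:
        current += ch
        if current not in seen:
            seen.add(current)
            factors.append(current)
            current = ""
    if current:  # the leftover suffix repeats an earlier factor
        factors.append(current)
    return factors


def bisection_rules(w):
    """Return a dict mapping each split substring to its (left, right) parts.

    Assumption: a word of length n >= 2 is split so that the left part has
    length 2^j, where 2^j is the largest power of two strictly below n.
    Each distinct substring of length >= 2 gets exactly one rule.
    """
    rules = {}

    def split(s):
        if len(s) < 2 or s in rules:
            return
        j = 1
        while 2 * j < len(s):
            j *= 2
        left, right = s[:j], s[j:]
        rules[s] = (left, right)
        split(left)
        split(right)

    split(w)
    return rules


if __name__ == "__main__":
    print(lz78_factors("abababa"))          # ['a', 'b', 'ab', 'aba']
    print(len(bisection_rules("a" * 8)))    # 3
```

In this sketch the number of LZ78 factors and the number of BISECTION rules serve only as rough proxies for the size of the resulting grammars; the exact size measure used in the approximation-ratio bounds is defined in the paper itself.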
Notes
1. It is shown in [14] that every SLP in Chomsky normal form for w has at least \(g_{\mathsf {LZ77}}(w)\) nonterminals. Moreover, the number of nonterminals in a smallest Chomsky normal form SLP for w is bounded by g(w); together these give \(g_{\mathsf {LZ77}}(w) \le g(w)\).
References
Arpe, J., Reischuk, R.: On the complexity of optimal grammar-based compression. In: Proceedings of the DCC 2006, pp. 173–182. IEEE Computer Society (2006)
Berstel, J., Brlek, S.: On the length of word chains. Inf. Process. Lett. 26(1), 23–28 (1987)
Casel, K., Fernau, H., Gaspers, S., Gras, B., Schmid, M.L.: On the complexity of grammar-based compression over fixed alphabets. In: Proceedings of ICALP 2016, LNCS. Springer, Heidelberg (2016, to appear)
Charikar, M., Lehman, E., Lehman, A., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)
Diwan, A.A.: A New Combinatorial Complexity Measure for Languages. Tata Institute, Bombay (1986)
Gasieniec, L., Karpinski, M., Plandowski, W., Rytter, W.: Efficient algorithms for Lempel-Ziv encoding (extended abstract). In: Karlsson, R., Lingas, A. (eds.) SWAT 1996. LNCS, vol. 1097, pp. 392–403. Springer, Heidelberg (1996)
Jeż, A.: Approximation of grammar-based compression via recompression. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 165–176. Springer, Heidelberg (2013)
Kieffer, J.C., Yang, E.-H.: Grammar-based codes: a new class of universal lossless source codes. IEEE Trans. Inf. Theory 46(3), 737–754 (2000)
Kieffer, J.C., Yang, E.-H., Nelson, G.J., Cosman, P.C.: Universal lossless compression via multilevel pattern matching. IEEE Trans. Inf. Theory 46(4), 1227–1245 (2000)
Larsson, N.J., Moffat, A.: Offline dictionary-based compression. In: Proceedings of the DCC 1999, pp. 296–305. IEEE Computer Society (1999)
Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications, 3rd edn. Springer, Heidelberg (2008)
Lohrey, M.: The Compressed Word Problem for Groups. Springer, Heidelberg (2014)
Nevill-Manning, C.G., Witten, I.H.: Identifying hierarchical structure in sequences: a linear-time algorithm. J. Artif. Intell. Res. 7, 67–82 (1997)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1–3), 211–222 (2003)
Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. J. ACM 29(4), 928–951 (1982)
Tabei, Y., Takabatake, Y., Sakamoto, H.: A succinct grammar compression. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 235–246. Springer, Heidelberg (2013)
Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)
Acknowledgment
The work in this paper was supported by the DFG grant LO 748/10-1.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Hucke, D., Lohrey, M., Reh, C.P. (2016). The Smallest Grammar Problem Revisited. In: Inenaga, S., Sadakane, K., Sakai, T. (eds) String Processing and Information Retrieval. SPIRE 2016. Lecture Notes in Computer Science(), vol 9954. Springer, Cham. https://doi.org/10.1007/978-3-319-46049-9_4
DOI: https://doi.org/10.1007/978-3-319-46049-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46048-2
Online ISBN: 978-3-319-46049-9
eBook Packages: Computer Science, Computer Science (R0)