Abstract
A dictionary is a crucial data structure for implementing grammar-based compression algorithms. For compression to be efficient, such a dictionary should support access to any code in \(O(1)\) time. A standard dictionary of fixed-length codes consumes a large amount of memory: \(2n\log n\) bits for \(n\) variables. We present novel dictionaries of variable-length codes for offline and online grammar-based compression algorithms. In the offline setting, we present a dictionary that uses at most \(\min\{n\log n + 2n + o(n),\ 3n\log\sigma(1 + o(1))\}\) bits of space, where \(\sigma < 2\sqrt{n}\). In the online setting, we present a dictionary that uses at most \(\frac{7}{4}n\log n + 4n + o(n)\) bits of space for a constant alphabet and unknown \(n\). In experiments, our dictionaries used much less memory than state-of-the-art dictionaries.
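To get a feel for the space bounds quoted above, the following minimal sketch evaluates their leading terms for concrete values. It is only an illustration under stated assumptions: logarithms are taken base 2, the \(o(\cdot)\) terms are dropped, \(\sigma\) is read as the alphabet size, and the function names are hypothetical rather than taken from the paper.

```python
import math


def fixed_length_bits(n: int) -> float:
    """Baseline from the abstract: 2n log n bits for n variables (log base 2 assumed)."""
    return 2 * n * math.log2(n)


def offline_bits(n: int, sigma: int) -> float:
    """Leading terms of min{n log n + 2n, 3n log sigma}; the o(.) terms are dropped."""
    if not sigma < 2 * math.sqrt(n):
        raise ValueError("the offline bound is stated for sigma < 2*sqrt(n)")
    return min(n * math.log2(n) + 2 * n, 3 * n * math.log2(sigma))


def online_bits(n: int) -> float:
    """Leading terms of (7/4) n log n + 4n; the o(n) term is dropped."""
    return 1.75 * n * math.log2(n) + 4 * n


if __name__ == "__main__":
    n, sigma = 2**20, 4  # illustrative values, not from the paper
    print("fixed-length:", fixed_length_bits(n))
    print("offline     :", offline_bits(n, sigma))
    print("online      :", online_bits(n))
```

With these leading terms only, the online bound falls below the fixed-length baseline once \(\frac{1}{4}n\log n > 4n\), that is, for \(\log n > 16\); the omitted lower-order terms may shift this crossover in practice.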
Partially supported by KAKENHI (23680016, 20589824) and the JST PRESTO program.
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Takabatake, Y., Tabei, Y., Sakamoto, H. (2012). Variable-Length Codes for Space-Efficient Grammar-Based Compression. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34109-0_42
DOI: https://doi.org/10.1007/978-3-642-34109-0_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34108-3
Online ISBN: 978-3-642-34109-0
eBook Packages: Computer Science, Computer Science (R0)