Statistical Encoding of Succinct Data Structures

González, Rodrigo; Navarro, Gonzalo

doi:10.1007/11780441_27

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4009)

Abstract

In recent work, Sadakane and Grossi [SODA 2006] introduced a scheme to represent any sequence \(S = s_1 s_2 \ldots s_n\), over an alphabet of size \(\sigma\), using \(nH_k(S)+O(\frac{n}{\log_\sigma n} (k \log \sigma + \log\log n))\) bits of space, where \(H_k(S)\) is the k-th order empirical entropy of S. The representation permits extracting any substring of size \(\Theta(\log_\sigma n)\) in constant time, and thus it completely replaces S under the RAM model. This is extremely important because it permits converting any succinct data structure requiring \(o(|S|) = o(n \log \sigma)\) bits in addition to S, into another requiring \(nH_k(S) + o(n \log \sigma)\) bits (overall) for any \(k = o(\log_\sigma n)\). They achieve this result by using Ziv-Lempel compression, and conjecture that the result can in particular be useful to implement compressed full-text indexes.
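To make the quantity in the bound concrete, the following is a minimal sketch of how the k-th order empirical entropy of a string could be computed. It assumes the common definition in which each length-k context contributes the zero-order entropy of the symbols that follow it, weighted by the number of such symbols and divided by n; the abstract does not spell out the definition, and the function names `h0` and `empirical_entropy` are illustrative, not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def h0(counts):
    """Zero-order empirical entropy (bits per symbol) of a symbol-count table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * log2(total / c) for c in counts.values())


def empirical_entropy(s, k):
    """k-th order empirical entropy of s, in bits per symbol.

    Assumed definition: H_k(S) = (1/n) * sum over contexts w of length k
    of |w_S| * H_0(w_S), where w_S collects the symbols following w in S.
    """
    n = len(s)
    if n == 0:
        return 0.0
    if k == 0:
        return h0(Counter(s))
    # Group every symbol by the k symbols that precede it.
    followers = defaultdict(Counter)
    for i in range(k, n):
        followers[s[i - k:i]][s[i]] += 1
    bits = sum(sum(c.values()) * h0(c) for c in followers.values())
    return bits / n
```

For example, the string "abab" has zero-order entropy 1 bit per symbol, but its first-order entropy is 0 because each symbol is fully determined by the one before it. This gap is why a bound in terms of \(nH_k(S)\) can be much smaller than \(n \log \sigma\).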

In this paper we extend their result, by obtaining the same space and time complexities using a simpler scheme based on statistical encoding. We show that the scheme supports appending symbols in constant amortized time. In addition, we prove some results on the applicability of the scheme for full-text self-indexing.
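As a short worked step (not part of the original abstract), the redundancy term of the space bound can be rewritten to show why the restriction on k yields a sublinear overhead. Using \(\log_\sigma n = \log n / \log \sigma\):

\[
\frac{n}{\log_\sigma n}\,(k \log \sigma + \log\log n)
= n \log \sigma \left( \frac{k}{\log_\sigma n} + \frac{\log\log n}{\log n} \right).
\]

The second fraction always tends to 0, and the first tends to 0 whenever \(k = o(\log_\sigma n)\), so the whole term is \(o(n \log \sigma)\) under that condition.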

References

Bell, T., Cleary, J., Witten, I.: Text compression. Prentice-Hall, Englewood Cliffs (1990)
Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)
Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Structuring labeled trees for optimal succinctness, and beyond. In: Proc. 46th FOCS (2005)
Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Compressing and searching XML data via two zips. In: Proc. 15th WWW 2006 (2006)
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. 41st FOCS, pp. 390–398 (2000)
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: An alphabet-friendly FM-index. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 150–160. Springer, Heidelberg (2004) (Extended version to appear in ACM TALG)
Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. 14th SODA, pp. 841–850 (2003)
Kosaraju, R., Manzini, G.: Compression of low entropy strings with Lempel-Ziv algorithms. SIAM Journal on Computing 29(3), 893–911 (1999)
Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing 12(1), 40–66 (2005)
Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48(3), 407–430 (2001)
Munro, I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
Munro, I., Raman, R., Raman, V., Rao, S.: Succinct representations of permutations. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 345–356. Springer, Heidelberg (2003)
Munro, I., Raman, V.: Succinct representation of balanced parentheses, static trees and planar graphs. In: Proc. 38th FOCS, pp. 118–126 (1997)
Munro, I., Rao, S.S.: Succinct Representations of Functions. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 1006–1015. Springer, Heidelberg (2004)
Navarro, G.: Indexing text using the Ziv-Lempel trie. Journal of Discrete Algorithms (JDA) 2(1), 87–114 (2004)
Raman, R., Raman, V., Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. 13th SODA, pp. 233–242 (2002)
Sadakane, K., Grossi, R.: Personal communication (2005)
Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds. In: Proc. 17th SODA, pp. 1230–1239 (2006)

Author information

Authors and Affiliations

Department of Computer Science, University of Chile,
Rodrigo González & Gonzalo Navarro

Editor information

Editors and Affiliations

Department of Computer Science, Bar-Ilan University, 52900, Ramat-Gan, Israel
Moshe Lewenstein
Department of Software, Technical University of Catalonia, 08034, Barcelona, Spain
Gabriel Valiente

About this paper

Cite this paper

González, R., Navarro, G. (2006). Statistical Encoding of Succinct Data Structures. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_27

DOI: https://doi.org/10.1007/11780441_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35455-0
Online ISBN: 978-3-540-35461-1
eBook Packages: Computer Science, Computer Science (R0)
