Reducing the Space Requirement of LZ-Index

  • Conference paper
Combinatorial Pattern Matching (CPM 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4009)

Abstract

The LZ-index is a compressed full-text self-index able to represent a text \(T_{1 \ldots u}\), over an alphabet of size \(\sigma = O(\textrm{polylog}(u))\) and with k-th order empirical entropy \(H_k(T)\), using \(4uH_k(T) + o(u\log\sigma)\) bits for any \(k = o(\log_\sigma u)\). It can report all the \(occ\) occurrences of a pattern \(P_{1 \ldots m}\) in \(T\) in \(O(m^3\log\sigma + (m + occ)\log u)\) worst-case time. Its main drawback is the factor 4 in its space complexity, which makes it larger than other state-of-the-art alternatives. In this paper we present two different approaches to reduce the space requirement of the LZ-index. In both cases we achieve \((2 + \epsilon)uH_k(T) + o(u\log\sigma)\) bits of space, for any constant \(\epsilon > 0\), and we simultaneously improve the search time to \(O(m^2\log m + (m + occ)\log u)\). Both indexes support displaying any subtext of length \(\ell\) in optimal \(O(\ell/\log_\sigma u)\) time. In addition, we show how the space can be squeezed to \((1 + \epsilon)uH_k(T) + o(u\log\sigma)\) bits to obtain a structure with \(O(m^2)\) average search time for \(m \geqslant 2\log_\sigma u\).
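
To make the space bounds above more concrete, the following is a minimal sketch (not code from the paper) that computes the k-th order empirical entropy of a short string using the standard definition, \(H_k(T) = \frac{1}{u}\sum_{w} |T_w| H_0(T_w)\), where \(T_w\) collects the symbols that follow each occurrence of the length-k context \(w\) in \(T\). The function names `h0`, `hk`, and `leading_term_bits` are hypothetical, and `leading_term_bits` only evaluates the leading term \(c \cdot uH_k(T)\), ignoring the \(o(u\log\sigma)\) term and the constant \(\epsilon\).

```python
import math
from collections import Counter, defaultdict


def h0(counts):
    """Zero-order empirical entropy (bits per symbol) of a symbol multiset."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return sum(c / n * math.log2(n / c) for c in counts.values())


def hk(text, k):
    """k-th order empirical entropy of text, in bits per symbol.

    Groups every symbol by the k symbols that precede it, then sums
    |T_w| * H_0(T_w) over all contexts w and divides by the text length u.
    """
    u = len(text)
    if u == 0:
        return 0.0
    if k == 0:
        return h0(Counter(text))
    followers = defaultdict(Counter)
    for i in range(u - k):
        followers[text[i:i + k]][text[i + k]] += 1
    total = sum(sum(cnt.values()) * h0(cnt) for cnt in followers.values())
    return total / u


def leading_term_bits(text, k, factor):
    """Leading term factor * u * H_k(T) only (assumption: lower-order
    terms such as o(u log sigma) are deliberately ignored here)."""
    return factor * len(text) * hk(text, k)


if __name__ == "__main__":
    t = "abababab"
    print(hk(t, 0), hk(t, 1))
    # Compare the factor 4 of the original index with the factor 2 target.
    print(leading_term_bits(t, 0, 4), leading_term_bits(t, 0, 2))
```

For the toy string `abababab`, \(H_0 = 1\) bit per symbol while \(H_1 = 0\), since each symbol is fully determined by its predecessor. This illustrates why bounds stated in terms of \(H_k\) can be much smaller than \(u\log\sigma\) on repetitive text.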

Supported in part by CONICYT PhD Fellowship Program (first author) and Fondecyt Grant 1-050493 (second author) and the Grant-in-Aid of the Ministry of Education, Science, Sports and Culture of Japan (third author).


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Arroyuelo, D., Navarro, G., Sadakane, K. (2006). Reducing the Space Requirement of LZ-Index. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_29

  • DOI: https://doi.org/10.1007/11780441_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35455-0

  • Online ISBN: 978-3-540-35461-1

  • eBook Packages: Computer Science, Computer Science (R0)
