Lempel-Ziv Decoding in External Memory

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9685)

Abstract

Simple and fast decoding is one of the main advantages of LZ77-type text encoding, which is used in many popular file compressors such as gzip and 7zip. With the recent introduction of external memory algorithms for Lempel–Ziv factorization, there is a need for external memory LZ77 decoding, but the standard algorithm makes random accesses to the text and cannot be trivially modified for external memory computation. We describe the first external memory algorithms for LZ77 decoding, prove that their I/O complexity is optimal, and demonstrate that they are very fast in practice: only about three times slower than in-memory decoding when the time for reading the input and writing the output is included.
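
To illustrate why the standard algorithm relies on random access, here is a minimal sketch of in-memory LZ77 decoding. It is not the external memory algorithm of the paper. The phrase representation (a literal byte value, or a `(source_position, length)` pair) and the name `lz77_decode` are assumptions made for this example only.

```python
from typing import Iterable, Tuple, Union

# Hypothetical phrase representation (an assumption for this sketch):
# either a literal byte value, or a (source_position, length) pair
# that copies from the already-decoded text.
Phrase = Union[int, Tuple[int, int]]


def lz77_decode(phrases: Iterable[Phrase]) -> bytes:
    """Decode a sequence of LZ77-style phrases entirely in memory."""
    text = bytearray()
    for phrase in phrases:
        if isinstance(phrase, int):
            # Literal phrase: append a single byte.
            text.append(phrase)
            continue
        pos, length = phrase
        if length > 0 and not 0 <= pos < len(text):
            raise ValueError("phrase source lies outside the decoded text")
        # The source position can be anywhere in the text decoded so far.
        # This is the random access that is cheap in RAM but costly when
        # the text lives on disk.
        for k in range(length):
            # Copy byte by byte so that a phrase may overlap its own source.
            text.append(text[pos + k])
    return bytes(text)
```

For example, `lz77_decode([ord("a"), ord("b"), (0, 4), ord("c")])` returns `b"abababc"`: the copy phrase starts at position 0 and overlaps the text it is producing.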

This research is partially supported by the Academy of Finland through grant 258308 and grant 250345 (CoECGR).

Notes

  1. http://www.1000genomes.org/.

  2. http://dumps.wikimedia.org/.

  3. http://www.kernel.org/.

Corresponding author

Correspondence to Juha Kärkkäinen.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Belazzougui, D., Kärkkäinen, J., Kempa, D., Puglisi, S.J. (2016). Lempel-Ziv Decoding in External Memory. In: Goldberg, A., Kulikov, A. (eds) Experimental Algorithms. SEA 2016. Lecture Notes in Computer Science, vol 9685. Springer, Cham. https://doi.org/10.1007/978-3-319-38851-9_5

  • DOI: https://doi.org/10.1007/978-3-319-38851-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-38850-2

  • Online ISBN: 978-3-319-38851-9

  • eBook Packages: Computer Science (R0)
