Bicriteria Data Compression: Efficient and Usable

  • Conference paper
Algorithms - ESA 2014 (ESA 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8737)

Abstract

Lempel-Ziv’s LZ77 algorithm is the de facto choice for compressing massive datasets (see, e.g., Snappy in BigTable and Lz4 in Cassandra) because its algorithmic structure is flexible enough to guarantee very fast decompression at a reasonable compressed-space occupancy. Recent theoretical results have shown how to design a bit-optimal LZ77-compressor, which minimizes the compressed size, and how to deploy it to design a bicriteria data compressor, namely an LZ77-compressor that trades compressed-space occupancy against decompression time in a smooth and principled way. Preliminary experiments were promising but raised many algorithmic and engineering questions that must be addressed in order to turn these algorithmic results into an effective and practical tool. In this paper we address these issues by first designing a novel bit-optimal LZ77-compressor that is simple, cache-aware and asymptotically optimal. We benchmark our approach by investigating several algorithmic and implementation issues over many dataset types and sizes, and against an ample class of classic (LZ-based, PPM-based and BWT-based) as well as engineered compressors (Snappy, Lz4, and Lzma2). We conclude by noting that our novel bicriteria LZ77-compressor improves on the state of the art of the fast (de)compressors Snappy and Lz4.
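
To make the LZ77 structure mentioned in the abstract concrete, the following is a minimal illustrative sketch in Python. It is not the bit-optimal or bicriteria compressor of the paper: it uses a plain greedy longest-match parser, and the phrase representation, the function names (`lz77_greedy`, `lz77_decode`, `longest_match`) and the `window` and `min_len` parameters are assumptions chosen for illustration. It only shows why decompression can be fast: decoding reduces to emitting literals and copying earlier output.

```python
def longest_match(text, i, window):
    """Return (distance, length) of the longest earlier match for text[i:].

    Brute-force scan over the last `window` positions (illustrative only).
    """
    best_len, best_dist = 0, 0
    for j in range(max(0, i - window), i):
        length = 0
        # A match may run past position i (overlapping copy), as LZ77 allows.
        while i + length < len(text) and text[j + length] == text[i + length]:
            length += 1
        if length > best_len:
            best_len, best_dist = length, i - j
    return best_dist, best_len


def lz77_greedy(text, window=4096, min_len=3):
    """Greedy LZ77 parsing into ("lit", ch) and ("copy", distance, length) phrases."""
    phrases, i = [], 0
    while i < len(text):
        dist, length = longest_match(text, i, window)
        if length >= min_len:
            phrases.append(("copy", dist, length))
            i += length
        else:
            phrases.append(("lit", text[i]))
            i += 1
    return phrases


def lz77_decode(phrases):
    """Rebuild the text: each phrase is a literal or a copy from earlier output."""
    out = []
    for phrase in phrases:
        if phrase[0] == "lit":
            out.append(phrase[1])
        else:
            _, dist, length = phrase
            start = len(out) - dist
            # Copy one symbol at a time so that overlapping copies work.
            for k in range(length):
                out.append(out[start + k])
    return "".join(out)


sample = "abcabcabcabcx"
parsed = lz77_greedy(sample)
restored = lz77_decode(parsed)
```

A greedy parser fixes one parsing of the text. A bit-optimal compressor, as described in the abstract, instead selects among the possible parsings the one with the smallest encoded size, and a bicriteria compressor additionally accounts for decompression time.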

This work was partially supported by MIUR of Italy under the project PRIN ARS Technomedia.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Farruggia, A., Ferragina, P., Venturini, R. (2014). Bicriteria Data Compression: Efficient and Usable. In: Schulz, A.S., Wagner, D. (eds) Algorithms - ESA 2014. ESA 2014. Lecture Notes in Computer Science, vol 8737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44777-2_34

  • DOI: https://doi.org/10.1007/978-3-662-44777-2_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-44776-5

  • Online ISBN: 978-3-662-44777-2

  • eBook Packages: Computer Science (R0)
