
Optimally Partitioning a Text to Improve Its Compression

Chapter in Compressed Data Structures for Strings

Part of the book series: Atlantis Studies in Computing (ATLANTISCOMP, volume 4)


Abstract

Reorganizing data in order to improve the performance of a given compressor \(\mathcal{C}\) is a recent and important paradigm in data compression.


Notes

  1. Page 836 of Buchsbaum et al. (2003) says: “computing a good approximation to the TSP reordering before partitioning contributes significant compression improvement at minimal time cost. [...] This time is negligible compared to the time to compute the optimal, contiguous partition via DP”.

  2. We are assuming that \(\mathcal{C}(\alpha )\) is a prefix-free encoding of \(\alpha \), so that we can concatenate the compressed output of many substrings and still be able to recover them via a sequential scan.

  3. Notice that we can precompute and store, for every \(j\), the last occurrence of symbol \(T[j+1]\) in \(T[1:j]\) in linear time and space.

  4. Notice that the value \(\log ((r_i-l+1)!)\) can be stored in a variable and updated in constant time: the quantity \(r_i-l+1\) changes by exactly one after a \({\textsc {Remove}}\) or an \({\textsc {Append}}\), and \(\log ((m+1)!) = \log (m!) + \log (m+1)\).

  5. Here we assume that it contains at least one symbol. Nevertheless, as we will see, the compression gap between the booster’s partition and the optimal one grows as the cost of the model increases.
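Note 1 refers to computing the optimal contiguous partition by dynamic programming. The following is a minimal sketch of the textbook recurrence best[j] = min over i < j of best[i] + cost(T[i:j]), under assumptions that are ours rather than the chapter's: the cost of a piece is given by a hypothetical `entropy_cost` function (its 0-order empirical entropy \(|s|H_0(s)\) in bits plus a made-up fixed model overhead `model_bits`), and every piece's cost is recomputed from scratch, so the sketch takes cubic time. It only illustrates the recurrence; it is not the chapter's algorithm.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def entropy_cost(s: str, model_bits: float = 32.0) -> float:
    """Hypothetical cost model (an assumption, not the chapter's):
    |s| * H0(s) bits for the payload plus a fixed overhead for the model."""
    n = len(s)
    counts = Counter(s)
    h0_bits = sum(c * math.log2(n / c) for c in counts.values())
    return h0_bits + model_bits


def optimal_partition(t: str, cost: Callable[[str], float]) -> Tuple[float, List[str]]:
    """best[j] is the minimum total cost of splitting the prefix t[:j]
    into contiguous, non-empty pieces; O(n^2) candidate pieces are tried."""
    n = len(t)
    best = [0.0] + [math.inf] * n
    cut = [0] * (n + 1)  # cut[j] = start of the last piece in an optimal split of t[:j]
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(t[i:j])  # cost recomputed from scratch: O(j - i)
            if c < best[j]:
                best[j], cut[j] = c, i
    # Walk the cut points back from the end to recover the pieces.
    pieces: List[str] = []
    j = n
    while j > 0:
        pieces.append(t[cut[j]:j])
        j = cut[j]
    return best[n], pieces[::-1]
```

With the default (large) overhead, `optimal_partition("aaaabbbb", entropy_cost)` keeps a single piece, while with `model_bits=1.0` the string `"a"*8 + "b"*8` is split into its two runs, since each run then costs only the overhead.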
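Note 2 relies on \(\mathcal{C}(\alpha )\) being prefix-free. The sketch below uses a hypothetical stand-in for \(\mathcal{C}\), `toy_compress`, which performs no real compression: it writes the length of the piece as a LEB128-style varint followed by the raw bytes. Because the header determines where each codeword ends, no codeword is a prefix of another, and one sequential scan splits the concatenated output back into the original pieces.

```python
from typing import List


def encode_varint(n: int) -> bytes:
    """LEB128-style varint for n >= 0: 7 payload bits per byte, high bit set
    on every byte except the last, so the set of varints is prefix-free."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def toy_compress(piece: bytes) -> bytes:
    """Hypothetical stand-in for C(alpha): varint length header + payload."""
    return encode_varint(len(piece)) + piece


def decode_all(stream: bytes) -> List[bytes]:
    """Recover the concatenated pieces with a single left-to-right scan."""
    pieces: List[bytes] = []
    pos = 0
    while pos < len(stream):
        # Read the varint header: it tells us where this codeword ends.
        length, shift = 0, 0
        while True:
            byte = stream[pos]
            pos += 1
            length |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        pieces.append(stream[pos:pos + length])
        pos += length
    return pieces
```

For example, `decode_all(b"".join(toy_compress(p) for p in parts))` returns `parts` for any list of byte strings, including empty ones.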
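The precomputation in Note 3 can be sketched as one left-to-right pass that remembers where each symbol was last seen. Indices here are 0-based, so the value the note associates with \(j\) (the last occurrence of \(T[j+1]\) in \(T[1:j]\)) corresponds to `prev[j]`; the function name and the -1 convention for "no earlier occurrence" are illustrative choices of ours. With a dictionary the running time is linear in expectation; for a small integer alphabet, an array indexed by symbol makes it worst-case linear.

```python
from typing import Dict, Hashable, List, Sequence


def previous_occurrences(t: Sequence[Hashable]) -> List[int]:
    """prev[p] = largest q < p with t[q] == t[p], or -1 if t[p] is new.
    One pass; O(n) space for prev plus O(sigma) for the last-seen table."""
    last_seen: Dict[Hashable, int] = {}
    prev: List[int] = []
    for p, sym in enumerate(t):
        prev.append(last_seen.get(sym, -1))  # earlier position of sym, if any
        last_seen[sym] = p                   # sym now last occurs at p
    return prev
```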
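The constant-time update in Note 4 can be illustrated by a small class that tracks only the length \(m\) of a window and \(\log_2(m!)\). The class and its `append`/`remove` methods are hypothetical stand-ins for the chapter's \({\textsc {Append}}\) and \({\textsc {Remove}}\); we assume base-2 logarithms and that evaluating one logarithm takes constant time.

```python
import math


class WindowLogFactorial:
    """Hypothetical helper: maintains log2(m!) for the window length m,
    using log2((m+1)!) = log2(m!) + log2(m+1) and its inverse."""

    def __init__(self) -> None:
        self.length = 0       # current window length m
        self.log_fact = 0.0   # log2(m!), with log2(0!) = 0

    def append(self) -> None:
        # Window grows by one: add log2 of the new length.
        self.length += 1
        self.log_fact += math.log2(self.length)

    def remove(self) -> None:
        # Window shrinks by one: subtract log2 of the old length.
        if self.length == 0:
            raise ValueError("cannot remove from an empty window")
        self.log_fact -= math.log2(self.length)
        self.length -= 1
```

Each update is a single addition or subtraction; over very long update sequences the floating-point value can drift slightly, so an implementation might periodically recompute it exactly.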

Author information

Correspondence to Rossano Venturini.


Copyright information

© 2014 Atlantis Press and the authors

About this chapter

Cite this chapter

Venturini, R. (2014). Optimally Partitioning a Text to Improve Its Compression. In: Compressed Data Structures for Strings. Atlantis Studies in Computing, vol 4. Atlantis Press, Paris. https://doi.org/10.2991/978-94-6239-033-1_3

