Abstract
Reorganizing data in order to improve the performance of a given compressor \(\mathcal{C}\) is a recent and important paradigm in data compression.
Notes
1. Page 836 of Buchsbaum et al. (2003) says: “computing a good approximation to the TSP reordering before partitioning contributes significant compression improvement at minimal time cost. [...] This time is negligible compared to the time to compute the optimal, contiguous partition via DP”.
2. We assume that \(\mathcal{C}(\alpha )\) is a prefix-free encoding of \(\alpha \), so that we can concatenate the compressed outputs of many substrings and still recover each of them via a sequential scan.
3. Notice that we can precompute and store the last occurrence of symbol \(T[j+1]\) in \(T[1:j]\), for every \(j\), in linear time and space.
4. Notice that the value \(\log ((r_i-l+1)!)\) can be stored in a variable and updated in constant time, since \(r_i-l+1\) changes by exactly one after a \({\textsc {Remove}}\) or an \({\textsc {Append}}\).
5. Here we assume that it contains at least one symbol. Nevertheless, as we will see, the compression gap between booster’s partition and the optimal one grows as the cost of the model becomes bigger.
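The two precomputation remarks in Notes 3 and 4 can be illustrated with a minimal sketch. It is not the chapter's own code: the names `last_occurrences` and `LogFactorial` are hypothetical, positions are 0-based rather than the 1-based indexing used in the notes, and the logarithm is assumed to be base 2.

```python
import math


def last_occurrences(text):
    """For each 0-based position i, return the index of the most recent
    occurrence of text[i] inside text[:i], or -1 if there is none.

    One left-to-right pass with a dictionary gives linear expected time
    and linear space (illustrative helper, not from the chapter).
    """
    last_seen = {}
    previous = []
    for i, symbol in enumerate(text):
        previous.append(last_seen.get(symbol, -1))
        last_seen[symbol] = i
    return previous


class LogFactorial:
    """Keeps log2(n!) in a variable while n changes by one at a time.

    `append` and `remove` mirror the Append/Remove operations of Note 4;
    base 2 is an assumption, the note only writes `log`.
    """

    def __init__(self):
        self.n = 0
        self.value = 0.0  # log2(0!) = 0

    def append(self):
        # n! = (n - 1)! * n, so add log2 of the new size.
        self.n += 1
        self.value += math.log2(self.n)

    def remove(self):
        if self.n == 0:
            raise ValueError("window is already empty")
        # (n - 1)! = n! / n, so subtract log2 of the old size.
        self.value -= math.log2(self.n)
        self.n -= 1
```

Each update costs one logarithm and one addition, so it runs in constant time. Because the value is accumulated in floating point, a long sequence of updates may drift slightly from a freshly computed `math.log2(math.factorial(n))`.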
Copyright information
© 2014 Atlantis Press and the authors
About this chapter
Cite this chapter
Venturini, R. (2014). Optimally Partitioning a Text to Improve Its Compression. In: Compressed Data Structures for Strings. Atlantis Studies in Computing, vol 4. Atlantis Press, Paris. https://doi.org/10.2991/978-94-6239-033-1_3
DOI: https://doi.org/10.2991/978-94-6239-033-1_3
Publisher Name: Atlantis Press, Paris
Print ISBN: 978-94-6239-032-4
Online ISBN: 978-94-6239-033-1
eBook Packages: Mathematics and Statistics (R0)