Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms

  • Raffaele Giancarlo
  • Marinella Sciortino
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2676)

Abstract

The Burrows-Wheeler transform [1] is one of the mainstays of lossless data compression. In most cases, its output is fed to Move to Front or other variations of symbol ranking compression. One of the main open problems [2] is to establish whether Move to Front, or more generally symbol ranking compression, is an essential part of the compression process. We settle this question positively by providing a new class of Burrows-Wheeler algorithms that use optimal partitions of strings, rather than symbol ranking, for the additional step. Our technique is a quite surprising specialization to strings of partitioning techniques devised by Buchsbaum et al. [3] for two-dimensional table compression. Following Manzini [4], we analyze two algorithms in the new class in terms of the k-th order empirical entropy of a string and, for both algorithms, we obtain better compression guarantees than the ones reported in [4] for Burrows-Wheeler algorithms that use Move to Front.
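For readers unfamiliar with the conventional pipeline that the abstract refers to, the following is a minimal sketch of a Burrows-Wheeler transform followed by Move to Front. It is an illustration only and is not taken from the paper: the function names `bwt` and `move_to_front`, the use of a sentinel character, and the naive sorted-rotations construction are all assumptions made here for brevity. It does not implement the optimal-partition step that the paper proposes.

```python
def bwt(s, sentinel="\0"):
    """Burrows-Wheeler transform via sorted rotations.

    Naive construction for illustration; it sorts all rotations explicitly,
    so it is far too slow for real inputs. The sentinel (an assumption of
    this sketch) marks the end of the string and must not occur in s.
    """
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def move_to_front(s, alphabet):
    """Replace each symbol by its current rank, then move it to rank 0."""
    table = list(alphabet)
    ranks = []
    for ch in s:
        r = table.index(ch)
        ranks.append(r)
        table.insert(0, table.pop(r))
    return ranks


transformed = bwt("banana", sentinel="$")
ranks = move_to_front(transformed, sorted(set(transformed)))
print(transformed, ranks)
```

The transform tends to group equal symbols into runs, so the rank sequence produced by Move to Front is typically dominated by small values, which a later entropy coder can exploit. The algorithms described in the abstract replace this ranking step with an optimal partition of the transformed string.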

Keywords

Data Compression; Binary String; Compression Algorithm; Optimal Partition; Input String
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Burrows, M., Wheeler, D.: A block sorting data compression algorithm. Technical report, DIGITAL System Research Center (1994)
  2. Fenwick, P.: The Burrows-Wheeler transform for block sorting text compression. The Computer Journal 39 (1996) 731–740
  3. Buchsbaum, A.L., Caldwell, D.F., Church, K.W., Fowler, G.S., Muthukrishnan, S.: Engineering the compression of massive tables: An experimental approach. In: Proc. 11th ACM-SIAM Symp. on Discrete Algorithms. (2000) 175–184
  4. Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48 (2001) 407–430
  5. Bentley, J., Sleator, D., Tarjan, R., Wei, V.: A locally adaptive data compression scheme. Comm. of ACM 29 (1986) 320–330
  6. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley Interscience (1990)
  7. Effros, M.: Universal lossless source coding with the Burrows-Wheeler transform. In: Proc. IEEE Data Compression Conference, IEEE Computer Society (1999) 178–187
  8. Sadakane, K.: On optimality of variants of the block sorting compression. In: Proc. IEEE Data Compression Conference, IEEE Computer Society (1998) 570
  9. Arnavut, Z., Magliveras, S.S.: Block sorting and compression. In: Proc. IEEE Data Compression Conference, IEEE Computer Society (1997) 181–190
  10. Balkenhol, B., Kurtz, S.: Universal data compression based on the Burrows and Wheeler-transformation: Theory and practice. Technical Report 98-069, Sonderforschungsbereich: Diskrete Strukturen in der Mathematik, Universität Bielefeld, Germany (1998) Available from http://www.mathematik.uni-bielefeld.de/sfb343/preprints.
  11. Wirth, A.I., Moffat, A.: Can we do without ranks in Burrows Wheeler transform compression? In: Proc. IEEE Data Compression Conference, IEEE Computer Society (2001) 419–428
  12. Buchsbaum, A.L., Giancarlo, R., Fowler, G.S.: Improving table compression with combinatorial optimization. In: Proc. 13th ACM-SIAM Symp. on Discrete Algorithms. (2002) 213–222
  13. Lempel, A., Ziv, J.: A universal algorithm for sequential data compression. IEEE Trans. on Information Theory IT-23 (1977) 337–343
  14. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. on Information Theory IT-24 (1978) 530–578
  15. Moffat, A.: Implementing the PPM data compression scheme. IEEE Trans. on Communication COM-38 (1990) 1917–1921
  16. Cormack, G., Horspool, R.: Data compression using dynamic Markov modelling. Computer J. 30 (1987) 541–550
  17. Cleary, J., Teahan, W.: Unbounded length contexts for PPM. Computer J. 40 (1997) 67–75
  18. Elias, P.: Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory 21 (1975) 194–203
  19. Levenshtein, V.: On the redundancy and delay of decodable coding of natural numbers. (Translation from) Problems in Cybernetics, Nauka, Moscow 20 (1968) 173–179
  20. Capocelli, R.M., Giancarlo, R., Taneja, I.: Bounds on the redundancy of Huffman codes. IEEE Transactions on Information Theory 32 (1986) 854–857

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Raffaele Giancarlo (1)
  • Marinella Sciortino (1)
  1. Dipartimento di Matematica ed Applicazioni, Università degli Studi di Palermo, Palermo, Italy