Ziv-Lempel Compressors with Deferred-Innovation
The noiseless data-compression algorithms introduced by Ziv and Lempel [ZL77, ZL78] parse an input data string into successive substrings, each consisting of two parts: the citation, namely the longest prefix that has appeared earlier in the input, and the innovation, the symbol immediately following the citation. Thus the citation has appeared earlier, but was not then followed by the innovation symbol. In “extremal” versions of the LZ algorithm the citation may have begun anywhere in the input; in “incremental” versions it must have started at the beginning of a previously parsed substring. Originally, the citation and the innovation were encoded, individually or jointly, into an output word to be transmitted or stored. Subsequently, several authors [MW85, ZL78, SS82, W84] speculated that the cost of this encoding might be excessive, because the coded innovation contributes roughly lg(α) bits, where α is the size of the input alphabet, regardless of how compressible the source is. To remedy the possible excess, these authors suggested storing the parsed substring as usual but encoding only the citation for output, deferring the encoding of the innovation, which then becomes the first symbol of the next parsed substring. The innovation can thus participate in whatever compression that substring enjoys. We call this strategy deferred innovation. It is exemplified by the algorithm described by Welch [W84] and implemented in UNIX compress and its progeny.
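To make the distinction concrete, here is a minimal Python sketch, under stated assumptions, contrasting an incremental (LZ78-style) parser that emits (citation, innovation) pairs with an LZW-style encoder that defers the innovation. The function names `lz78_parse`, `lzw_encode`, and `lzw_decode` are hypothetical. Phrases are stored in a Python dict rather than a trie, the LZW dictionary is assumed to be preloaded with every single-symbol string of a caller-supplied alphabet, and codes are returned as integers rather than packed into output words. It illustrates only the parsing rule, not the encoding schemes analyzed in this paper.

```python
def lz78_parse(data):
    """Incremental parse: each phrase is (citation index, innovation symbol).

    Index 0 stands for the empty citation. The innovation is emitted
    explicitly alongside every citation.
    """
    dictionary = {"": 0}
    output = []
    citation = ""
    for symbol in data:
        if citation + symbol in dictionary:
            citation += symbol  # keep extending the longest known prefix
        else:
            output.append((dictionary[citation], symbol))
            dictionary[citation + symbol] = len(dictionary)  # store the full parsed substring
            citation = ""
    if citation:
        # Input ended inside a known phrase: emit it with no innovation.
        output.append((dictionary[citation], None))
    return output


def lzw_encode(data, alphabet):
    """Deferred innovation (LZW-style): only citations are emitted.

    Assumption: the dictionary starts with every single symbol of `alphabet`.
    """
    dictionary = {s: i for i, s in enumerate(alphabet)}
    output = []
    citation = ""
    for symbol in data:
        if citation + symbol in dictionary:
            citation += symbol
        else:
            output.append(dictionary[citation])  # code the citation only
            dictionary[citation + symbol] = len(dictionary)  # still store citation + innovation
            citation = symbol  # the innovation begins the next phrase
    if citation:
        output.append(dictionary[citation])
    return output


def lzw_decode(codes, alphabet):
    """Inverse of lzw_encode under the same preloaded-alphabet assumption."""
    if not codes:
        return ""
    dictionary = {i: s for i, s in enumerate(alphabet)}
    previous = dictionary[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # The code refers to the entry still being defined; its deferred
            # innovation must equal its own first symbol.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid code {code}")
        result.append(entry)
        # The deferred innovation of the previous phrase is entry's first symbol.
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(result)
```

On the input "abababab" over the alphabet "ab", `lz78_parse` yields five (citation, innovation) pairs while `lzw_encode` yields five bare codes; in the latter, each innovation symbol is absorbed into the following phrase and is never coded on its own.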
Keywords: Compression Ratio, Memory Size, Alphabet Size, Input Alphabet, Input Length
- MW85. Miller, V.S., and Wegman, M.N., Variations on a theme by Lempel and Ziv. In Combinatorial Algorithms on Words (A. Apostolico and Z. Galil, editors), Springer-Verlag (1985) 131–140.