Abstract
This paper revisits the optimality proof of Lempel-Ziv coding and derives a much more general compression optimality theorem. In particular, we define the property of quasi-distinct parsing. This property is much weaker than the distinct parsing required in the original proof, yet we show that the theorem still holds under it. The result provides a better understanding of the optimality proof of Lempel-Ziv coding, together with a new tool for proving the optimality of other compression schemes. To demonstrate a possible use of this generalization, we present a new coding method, APT coding. It is based on a principle very different from that of Lempel-Ziv coding and does not directly define any parsing technique. Nevertheless, using the generalized theorem, we show that APT coding is asymptotically optimal up to a constant factor, provided that the APT quasi-distinctness hypothesis holds. Empirical evidence that this hypothesis holds is also given.
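For readers unfamiliar with the term, the distinct parsing that the original proof relies on can be illustrated with the classic LZ78 incremental parsing, in which every phrase is the shortest prefix of the remaining input that has not yet appeared as a phrase. The sketch below is only an illustration of that textbook scheme; the function name `lz78_parse` is hypothetical, and the code implements neither the quasi-distinct parsing nor the APT coding introduced in the paper.

```python
def lz78_parse(text):
    """Split text into LZ78-style phrases (illustrative sketch only).

    Each phrase is the shortest prefix of the remaining input that has
    not been emitted before, so all phrases except possibly the last
    one are pairwise distinct.
    """
    seen = set()      # phrases emitted so far
    phrases = []      # the parsing, in order
    current = ""      # phrase currently being extended
    for symbol in text:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Input ended mid-phrase; this leftover may repeat an earlier phrase.
        phrases.append(current)
    return phrases


phrases = lz78_parse("ABBABBABBBAABABAA")
print(phrases)
```

On this input the parsing has eight phrases and no phrase occurs twice, which is the distinctness property that the abstract says can be weakened.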
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amir, A., Aumann, Y., Levy, A., Roshko, Y. (2009). Quasi-distinct Parsing and Optimal Compression Methods. In: Kucherov, G., Ukkonen, E. (eds) Combinatorial Pattern Matching. CPM 2009. Lecture Notes in Computer Science, vol 5577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02441-2_2
DOI: https://doi.org/10.1007/978-3-642-02441-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02440-5
Online ISBN: 978-3-642-02441-2
eBook Packages: Computer Science, Computer Science (R0)