Abstract
A practical suboptimal algorithm (source coding) for lossy (non-faithful) data compression is discussed. The scheme is based on approximate string matching, and it naturally extends the lossless (faithful) Lempel-Ziv data compression scheme. The construction of the algorithm rests on a careful probabilistic analysis of an approximate string matching problem that is of independent interest. This analysis extends the Wyner-Ziv model to a lossy environment. In this conference version, we consider only the Bernoulli model (i.e., a memoryless channel), but our results hold under much weaker probabilistic assumptions.
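The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea of string-matching compression with distortion, not the authors' algorithm. It assumes a fixed database string, Hamming distortion measured as the fraction of mismatched symbols, and a brute-force search; the names `hamming_fraction`, `longest_approx_match`, and `lossy_parse` are hypothetical and chosen for illustration.

```python
def hamming_fraction(a, b):
    """Fraction of positions at which the equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def longest_approx_match(database, source, max_distortion):
    """Return (length, position) of the longest prefix of `source` that matches
    some substring of `database` with at most `max_distortion` mismatch fraction.

    Returns (0, -1) when no prefix matches. Every length is tried, because an
    approximate match at length k does not imply one at length k - 1.
    """
    best_len, best_pos = 0, -1
    for length in range(1, min(len(source), len(database)) + 1):
        prefix = source[:length]
        for pos in range(len(database) - length + 1):
            window = database[pos:pos + length]
            if hamming_fraction(window, prefix) <= max_distortion:
                best_len, best_pos = length, pos
                break  # first position is enough for this length
    return best_len, best_pos


def lossy_parse(database, source, max_distortion):
    """Greedily parse `source` into pointers into `database` (assumed fixed).

    Each phrase is ("pointer", position, length) when an approximate match
    exists, or ("literal", symbol) otherwise.
    """
    phrases = []
    i = 0
    while i < len(source):
        length, pos = longest_approx_match(database, source[i:], max_distortion)
        if length == 0:
            phrases.append(("literal", source[i]))
            i += 1
        else:
            phrases.append(("pointer", pos, length))
            i += length
    return phrases


if __name__ == "__main__":
    db = "abracadabra"
    print(lossy_parse(db, "abrxcad", 0.0))   # lossless: exact matches only
    print(lossy_parse(db, "abrxcad", 0.25))  # lossy: up to 25% mismatches
```

With `max_distortion = 0.0` the sketch reduces to exact (lossless) matching; allowing some distortion lets a single pointer cover a longer phrase, which is the trade-off between rate and fidelity that the abstract alludes to.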
This research was partially done while the authors were visiting INRIA in Rocquencourt, France. The authors wish to thank INRIA (project ALGO) for its generous support. Additional support for the first author was provided by KBN grant 2 1087 91 01, and for the second author by NSF Grants NCR-9206315, CCR-9201078, and INT-8912631, and in part by NATO Collaborative Grant 0057/89.
References
A.V. Aho, Algorithms for Finding Patterns in Strings, in Handbook of Theoretical Computer Science. Volume A: Algorithms and Complexity (ed. J. van Leeuwen), 255–300, The MIT Press, Cambridge (1990).
N. Alon and J. Spencer, The Probabilistic Method, John Wiley & Sons, New York (1992).
M. Atallah, P. Jacquet and W. Szpankowski, Pattern matching with mismatches: A probabilistic analysis and a randomized algorithm, Proc. Combinatorial Pattern Matching, Tucson, Lecture Notes in Computer Science, 644, (eds. A. Apostolico, M. Crochemore, Z. Galil, U. Manber), pp. 27–40, Springer-Verlag 1992.
R. Arratia and M. Waterman, The Erdös-Rényi Strong Law for Pattern Matching with Given Proportion of Mismatches, Annals of Probability, 17, 1152–1169 (1989).
R. Arratia, L. Gordon, and M. Waterman, The Erdös-Rényi Law in Distribution for Coin Tossing and Sequence Matching, Annals of Statistics, 18, 539–570 (1990).
T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression, Englewood Cliffs, NJ: Prentice-Hall, 1971.
P. Billingsley, Convergence of Probability Measures, John Wiley & Sons, New York, 1968.
W. Chang and E. Lawler, Approximate String Matching in Sublinear Expected Time, Proc. of 1990 FOCS, 116–124 (1990).
T.M. Cover and J.A. Thomas, Elements of Information Theory, John Wiley & Sons, New York (1991).
I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York (1981).
J. Feldman, r-Entropy, Equipartition, and Ornstein's Isomorphism Theory in R^n, Israel J. Math., 36, 321–345 (1980).
P. Jacquet and W. Szpankowski, Autocorrelation on Words and Its Applications. Analysis of Suffix Tree by String-Ruler Approach, J. Combinatorial Theory. Ser. A, (1994); to appear.
J.C. Kieffer, Strong Converses in Source Coding Relative to a Fidelity Criterion, IEEE Trans. Information Theory, 37, 257–262 (1991).
J.C. Kieffer, Sample Converses in Source Coding Theory, IEEE Trans. Information Theory, 37, 263–268 (1991).
A. Lempel and J. Ziv, On the Complexity of Finite Sequences, IEEE Trans. Information Theory, 22, 1, 75–81 (1976).
D. Ornstein and B. Weiss, Entropy and Data Compression Schemes, IEEE Trans. Information Theory, 39, 78–83 (1993).
D. Ornstein and P. Shields, Universal Almost Sure Data Compression, Annals of Probability, 18, 441–452 (1990).
B. Pittel, Asymptotic Growth of a Class of Random Trees, Annals of Probability, 13, 414–427 (1985).
Y. Steinberg and M. Gutman, An Algorithm for Source Coding Subject to a Fidelity Criterion, Based on String Matching, IEEE Trans. Information Theory, 39, 877–886 (1993).
W. Szpankowski, Asymptotic Properties of Data Compression and Suffix Trees, IEEE Trans. Information Theory, 39, 1647–1659 (1993).
W. Szpankowski, A Generalized Suffix Tree and Its (Un)Expected Asymptotic Behaviors, SIAM J. Computing, 22, 1176–1198 (1993).
A. Wyner and J. Ziv, Some Asymptotic Properties of the Entropy of a Stationary Ergodic Data Source with Applications to Data Compression, IEEE Trans. Information Theory, 35, 1250–1258 (1989).
Z. Zhang and V. Wei, An On-Line Universal Lossy Data Compression Algorithm via Continuous Codebook Refinement, submitted to a journal.
J. Ziv and A. Lempel, A Universal Algorithm for Sequential Data Compression, IEEE Trans. Information Theory, 23, 3, 337–343 (1977).
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Luczak, T., Szpankowski, W. (1994). A lossy data compression based on string matching: Preliminary analysis and suboptimal algorithms. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_9
DOI: https://doi.org/10.1007/3-540-58094-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58094-2
Online ISBN: 978-3-540-48450-9
eBook Packages: Springer Book Archive