A lossy data compression based on string matching: Preliminary analysis and suboptimal algorithms

  • Conference paper
Combinatorial Pattern Matching (CPM 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 807)

Abstract

A practical suboptimal algorithm (source coding) for lossy (non-faithful) data compression is discussed. The scheme is based on approximate string matching and naturally extends the lossless (faithful) Lempel-Ziv data compression scheme. The construction of the algorithm rests on a careful probabilistic analysis of an approximate string matching problem that is of independent interest. This extends the Wyner-Ziv model to a lossy environment. In this conference version, we consider only the Bernoulli model (i.e., a memoryless channel), but our results hold under much weaker probabilistic assumptions.
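
As a rough illustration of how approximate string matching can turn Lempel-Ziv-style parsing into a lossy scheme, here is one possible toy sketch in Python; it is not the authors' algorithm. Its assumptions are ours: distortion is measured by Hamming distance, each phrase may differ from its copy in at most a fraction d of positions, the database is a fixed prefix of `window` source symbols sent verbatim, phrases are chosen greedily as the longest admissible match, and a symbol with no admissible match is sent as a literal. The names `hamming`, `longest_approx_match`, `lossy_lz_encode`, and `lossy_lz_decode` are hypothetical. With d = 0 the parser reduces to exact longest-match parsing, i.e., the lossless case.

```python
from typing import List, Tuple, Union

# Hypothetical token format: ("copy", start, length) points into the database;
# ("lit", symbol) sends one source symbol verbatim.
Token = Union[Tuple[str, int, int], Tuple[str, str]]


def hamming(a: str, b: str) -> int:
    """Number of positions at which equal-length strings a and b differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def longest_approx_match(database: str, text: str, d: float) -> Tuple[int, int]:
    """Longest prefix of `text` equal to some substring of `database` up to at
    most d * length mismatches. Returns (start, length); length 0 means no match.
    Brute force, roughly O(len(database) * len(text)**2) comparisons."""
    best_start, best_len = 0, 0
    for start in range(len(database)):
        limit = min(len(text), len(database) - start)
        # The mismatch fraction is not monotone in the length, so try the
        # longest candidate lengths first and stop at the first admissible one.
        for length in range(limit, best_len, -1):
            if hamming(database[start:start + length], text[:length]) <= d * length:
                best_start, best_len = start, length
                break
    return best_start, best_len


def lossy_lz_encode(source: str, window: int, d: float) -> Tuple[str, List[Token]]:
    """Send source[:window] verbatim as a fixed database, then greedily cut the
    remainder into phrases, each replaced by its longest admissible match."""
    database, rest = source[:window], source[window:]
    tokens: List[Token] = []
    pos = 0
    while pos < len(rest):
        start, length = longest_approx_match(database, rest[pos:], d)
        if length == 0:
            tokens.append(("lit", rest[pos]))  # no admissible match: send the symbol
            pos += 1
        else:
            tokens.append(("copy", start, length))
            pos += length
    return database, tokens


def lossy_lz_decode(database: str, tokens: List[Token]) -> str:
    """Rebuild the (possibly distorted) source from the database and tokens."""
    out = [database]
    for tok in tokens:
        if tok[0] == "copy":
            _, start, length = tok
            out.append(database[start:start + length])
        else:
            out.append(tok[1])
    return "".join(out)


if __name__ == "__main__":
    src = "a" * 8 + "aaabaaab"
    db, toks = lossy_lz_encode(src, window=8, d=0.25)
    print(toks)  # [('copy', 0, 8)]
    rec = lossy_lz_decode(db, toks)
    print(hamming(src, rec) / (len(src) - 8))  # 0.25
```

Because every copied phrase meets the per-phrase bound and literals are exact, the reconstructed remainder has overall Hamming distortion at most d. The brute-force search is chosen only for readability; a faster coder could rely on an index structure such as a suffix tree.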

This research was partially done while the authors were visiting INRIA in Rocquencourt, France. The authors wish to thank INRIA (project ALGO) for generous support. Additional support was provided for the first author by KBN grant 2 1087 91 01, and for the second author by NSF Grants NCR-9206315, CCR-9201078, and INT-8912631, and in part by NATO Collaborative Grant 0057/89.

References

  1. A.V. Aho, Algorithms for Finding Patterns in Strings, in Handbook of Theoretical Computer Science. Volume A: Algorithms and Complexity (ed. J. van Leeuwen), 255–300, The MIT Press, Cambridge (1990).

  2. N. Alon and J. Spencer, The Probabilistic Method, John Wiley & Sons, New York (1992).

  3. M. Atallah, P. Jacquet and W. Szpankowski, Pattern matching with mismatches: A probabilistic analysis and a randomized algorithm, Proc. Combinatorial Pattern Matching, Tucson, Lecture Notes in Computer Science, 644 (eds. A. Apostolico, M. Crochemore, Z. Galil, U. Manber), pp. 27–40, Springer-Verlag (1992).

  4. R. Arratia and M. Waterman, The Erdös-Rényi Strong Law for Pattern Matching with Given Proportion of Mismatches, Annals of Probability, 17, 1152–1169 (1989).

  5. R. Arratia, L. Gordon, and M. Waterman, The Erdös-Rényi Law in Distribution for Coin Tossing and Sequence Matching, Annals of Statistics, 18, 539–570 (1990).

  6. T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression, Englewood Cliffs, NJ: Prentice-Hall, 1971.

  7. P. Billingsley, Convergence of Probability Measures, John Wiley & Sons, New York, 1968.

  8. W. Chang and E. Lawler, Approximate String Matching in Sublinear Expected Time, Proc. of 1990 FOCS, 116–124 (1990).

  9. T.M. Cover and J.A. Thomas, Elements of Information Theory, John Wiley & Sons, New York (1991).

  10. I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York (1981).

  11. J. Feldman, r-Entropy, Equipartition, and Ornstein's Isomorphism Theory in R^n, Israel J. Math., 36, 321–345 (1980).

  12. P. Jacquet and W. Szpankowski, Autocorrelation on Words and Its Applications. Analysis of Suffix Tree by String-Ruler Approach, J. Combinatorial Theory, Ser. A (1994), to appear.

  13. J.C. Kieffer, Strong Converses in Source Coding Relative to a Fidelity Criterion, IEEE Trans. Information Theory, 37, 257–262 (1991).

  14. J.C. Kieffer, Sample Converses in Source Coding Theory, IEEE Trans. Information Theory, 37, 263–268 (1991).

  15. A. Lempel and J. Ziv, On the Complexity of Finite Sequences, IEEE Trans. Information Theory, 22, 1, 75–81 (1976).

  16. D. Ornstein and B. Weiss, Entropy and Data Compression Schemes, IEEE Trans. Information Theory, 39, 78–83 (1993).

  17. D. Ornstein and P. Shields, Universal Almost Sure Data Compression, Annals of Probability, 18, 441–452 (1990).

  18. B. Pittel, Asymptotic Growth of a Class of Random Trees, Annals of Probability, 13, 414–427 (1985).

  19. Y. Steinberg and M. Gutman, An Algorithm for Source Coding Subject to a Fidelity Criterion, Based on String Matching, IEEE Trans. Information Theory, 39, 877–886 (1993).

  20. W. Szpankowski, Asymptotic Properties of Data Compression and Suffix Trees, IEEE Trans. Information Theory, 39, 1647–1659 (1993).

  21. W. Szpankowski, A Generalized Suffix Tree and Its (Un)Expected Asymptotic Behaviors, SIAM J. Computing, 22, 1176–1198 (1993).

  22. A. Wyner and J. Ziv, Some Asymptotic Properties of the Entropy of a Stationary Ergodic Data Source with Applications to Data Compression, IEEE Trans. Information Theory, 35, 1250–1258 (1989).

  23. Z. Zhang and V. Wei, An On-Line Universal Lossy Data Compression Algorithm via Continuous Codebook Refinement, submitted to a journal.

  24. J. Ziv and A. Lempel, A Universal Algorithm for Sequential Data Compression, IEEE Trans. Information Theory, 23, 3, 337–343 (1977).

Editor information

Maxime Crochemore, Dan Gusfield

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Luczak, T., Szpankowski, W. (1994). A lossy data compression based on string matching: Preliminary analysis and suboptimal algorithms. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_9

  • DOI: https://doi.org/10.1007/3-540-58094-8_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58094-2

  • Online ISBN: 978-3-540-48450-9

  • eBook Packages: Springer Book Archive
