Data compression with substitution

  • Conference paper
Electronic Dictionaries and Automata in Computational Linguistics (LITP 1987)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 377)

Abstract

All the data compression methods described in this paper are based on substitutions acting on characters or factors occurring inside the source texts. The average expected compression ratio is often close to 2. Most methods behave badly when errors appear in the encoded text: a single lost bit can make decompression almost impossible.
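
To make the error sensitivity concrete, here is a minimal sketch of one factor-substitution scheme, in the style of the Ziv and Lempel method of reference 30. It is an illustration only and is not claimed to be the formulation used in the paper; the function names `lz78_encode` and `lz78_decode` are hypothetical. Each factor of the text is replaced by a pair (index of an earlier factor, next character), so damaging one index changes every later factor that refers to it.

```python
def lz78_encode(text):
    """Replace each factor by (index of its longest known prefix, next char)."""
    dictionary = {"": 0}          # factor -> index; index 0 is the empty factor
    pairs = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch         # keep extending a factor we already know
        else:
            pairs.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Flush a trailing factor that is already in the dictionary.
        # The dictionary is prefix-closed, so current[:-1] is always a key.
        pairs.append((dictionary[current[:-1]], current[-1]))
    return pairs


def lz78_decode(pairs):
    """Rebuild the text by replaying the same dictionary construction."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)


if __name__ == "__main__":
    text = "abababababab"
    pairs = lz78_encode(text)
    print(pairs)
    print(lz78_decode(pairs) == text)

    # Simulate a transmission error in a single index.
    damaged = list(pairs)
    damaged[2] = (2, "b")
    print(lz78_decode(damaged))
```

Because the decoder rebuilds its dictionary from the pairs it has already read, the wrong factor is stored and reused, and the damage spreads past the corrupted pair.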

Other methods can be used to increase the compression ratio. Arithmetic coding is one such method, and it leads to higher efficiency.
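
As a rough illustration of the idea behind arithmetic coding, the sketch below narrows the interval [0, 1) once per symbol and represents the whole message by a number inside the final interval. It uses exact fractions to sidestep precision issues, assumes a fixed and known symbol distribution and a known message length, and uses hypothetical names (`build_intervals`, `arithmetic_encode`, `arithmetic_decode`). Practical coders, such as the one in reference 28, work with finite-precision integers and emit bits incrementally.

```python
from fractions import Fraction


def build_intervals(probabilities):
    """Give each symbol its own slice [low, high) of the unit interval."""
    intervals = {}
    low = Fraction(0)
    for symbol, p in probabilities.items():
        intervals[symbol] = (low, low + p)
        low += p
    return intervals


def arithmetic_encode(message, probabilities):
    """Return (low, width) of the final interval for the message."""
    intervals = build_intervals(probabilities)
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        s_low, s_high = intervals[symbol]
        # Shrink the current interval to the slice owned by this symbol.
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, width


def arithmetic_decode(value, length, probabilities):
    """Recover `length` symbols from any value inside the final interval."""
    intervals = build_intervals(probabilities)
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(symbol)
                # Rescale so the next symbol is read from [0, 1) again.
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)


if __name__ == "__main__":
    probabilities = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
    low, width = arithmetic_encode("aaab", probabilities)
    print(low, width)
    print(arithmetic_decode(low, 4, probabilities))
```

The width of the final interval equals the product of the symbol probabilities, so likely messages get wide intervals that can be identified with few bits. This is where the gain over codes that spend a whole number of bits on each symbol comes from.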

Another way to increase the compression ratio is to give up the "lossless information" condition. Such compaction methods must use semantic rules to recover the original information, and they cannot be applied to create archives or to communicate. A compaction example is found in [McI 82] for the "spell" program available under the Unix operating system.

This work has been supported by PRC Math.-Info.

7. References

  1. A.V. Aho, M.J. Corasick, Efficient string matching: An aid to bibliographic research, Commun. ACM 18, 6 (1975), 333–340.

  2. J.L. Bentley, D.D. Sleator, R.E. Tarjan, V.K. Wei, A locally adaptive data compression scheme, Commun. ACM 29, 4 (1986), 320–330.

  3. J. Berstel, D. Perrin, Theory of codes, Academic Press (1985).

  4. M. Crochemore, Transducers and Repetitions, Theoret. Comput. Sci. 45 (1986), 63–86.

  5. P. Elias, Universal Codeword Sets and Representation of the Integers, I.E.E.E. Trans. Inform. Theory IT-21, 2 (1975), 194–203.

  6. N. Faller, An adaptive system for data compression, in Record of the 7th Asilomar Conference on Circuits, Systems, and Computers (1973), 593–597.

  7. R.G. Gallager, Information Theory and Reliable Communication, Wiley (1968).

  8. R.G. Gallager, Variations on a theme by Huffman, I.E.E.E. Trans. Inform. Theory IT-24, 6 (1978), 668–674.

  9. E.N. Gilbert, C.L. Monma, Multigram Codes, I.E.E.E. Trans. Inform. Theory IT-28, 2 (1982), 346–348.

  10. A. Hartman, M. Rodeh, Optimal Parsing of Strings, in Combinatorial Algorithms on Words, Apostolico & Galil eds., Springer-Verlag (1985), 155–167.

  11. R.W. Hamming, Coding and Information Theory, Prentice-Hall (1980).

  12. G. Held, La compression des données, méthodes et applications, Masson (1987).

  13. D.A. Huffman, A method for the construction of minimum redundancy codes, Proc. IRE 40 (1952), 1098–1101.

  14. M. Jakobsson, Compression of character strings by an adaptive dictionary, BIT 25 (1985), 593–603.

  15. D.E. Knuth, Dynamic Huffman Coding, J. Algorithms 6 (1985), 163–180.

  16. G.G. Langdon Jr., A Note on the Ziv-Lempel Model for Compressing Individual Sequences, I.E.E.E. Trans. Inform. Theory IT-29, 2 (1983), 284–287.

  17. A. Lempel, J. Ziv, On the Complexity of Finite Sequences, I.E.E.E. Trans. Inform. Theory IT-22, 1 (1976), 75–81.

  18. J.A. Llewellyn, Data Compression for a Source with Markov Characteristics, Comput. J. 30, 2 (1987), 149–156.

  19. M.D. McIlroy, Development of a Spelling List, I.E.E.E. Trans. Commun. COM-30, 1 (1982), 91–99.

  20. V.S. Miller, M.N. Wegman, Variations on a Theme by Ziv and Lempel, in Combinatorial Algorithms on Words, Apostolico & Galil eds., Springer-Verlag (1985), 131–140.

  21. J. Rissanen, G.G. Langdon Jr., Arithmetic Coding, IBM J. Res. Dev. 23, 2 (1979), 149–162.

  22. J. Rissanen, G.G. Langdon Jr., Universal Modeling and Coding, I.E.E.E. Trans. Inform. Theory IT-27, 1 (1981), 12–23.

  23. M. Rodeh, V.R. Pratt, S. Even, Linear Algorithm for Data Compression via String Matching, J. Assoc. Comput. Mach. 28, 1 (1981), 16–24.

  24. J.A. Storer, Textual Substitution Techniques for Data Compression, in Combinatorial Algorithms on Words, Apostolico & Galil eds., Springer-Verlag (1985), 111–129.

  25. J.A. Storer, T.G. Szymanski, Data Compression via Textual Substitution, J. Assoc. Comput. Mach. 29, 4 (1982), 928–951.

  26. J.S. Vitter, Design and Analysis of Dynamic Huffman Codes, J. Assoc. Comput. Mach. 34, 4 (1987), 825–845.

  27. T.A. Welch, A Technique for High-Performance Data Compression, I.E.E.E. Computer 17, 6 (1984), 8–19.

  28. I.H. Witten, R.M. Neal, J.G. Cleary, Arithmetic coding for data compression, Commun. ACM 30, 6 (1987), 520–540.

  29. J. Ziv, A. Lempel, A Universal Algorithm for Sequential Data Compression, I.E.E.E. Trans. Inform. Theory IT-23, 3 (1977), 337–343.

  30. J. Ziv, A. Lempel, Compression of Individual Sequences via Variable-rate Coding, I.E.E.E. Trans. Inform. Theory IT-24, 5 (1978), 530–536.

Editor information

Maurice Gross, Dominique Perrin

Copyright information

© 1989 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Crochemore, M. (1989). Data compression with substitution. In: Gross, M., Perrin, D. (eds) Electronic Dictionaries and Automata in Computational Linguistics. LITP 1987. Lecture Notes in Computer Science, vol 377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-51465-1_1

  • DOI: https://doi.org/10.1007/3-540-51465-1_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-51465-7

  • Online ISBN: 978-3-540-48140-9

  • eBook Packages: Springer Book Archive
