Abstract
All the data compression methods described in this paper are based on substitutions acting on characters or factors (substrings) occurring in the source texts. The expected compression ratio is often close to 2 on average. Most of these methods behave badly when errors appear in the encoded text: if a single bit is lost, decompression becomes almost impossible.
Other methods can be used to increase the compression ratio. Arithmetic coding is one such method, and it achieves higher efficiency.
Another way to increase the compression ratio is to give up the requirement that compression be lossless. Such compaction methods must rely on semantic rules to recover the original information, so they cannot be used to create archives or to communicate. An example of compaction is given in [McI 82] for the "spell" program available under the Unix operating system.
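The remark that a single lost bit can ruin decompression can be illustrated with a small sketch. The abstract does not name a particular method, so the code below uses LZW (the technique of the Welch reference in the list further down) as a stand-in for a substitution method. The fixed 12-bit code width, the sample text and all function names are assumptions made for this illustration only.

```python
WIDTH = 12  # bits per emitted code (an assumption of this sketch)


def lzw_compress(text):
    """Return the LZW encoding of text as a string of '0'/'1' characters."""
    table = {chr(i): i for i in range(256)}  # initial single-character factors
    w, codes = "", []
    for ch in text:
        if w + ch in table:
            w += ch                          # extend the current factor
        else:
            codes.append(table[w])           # substitute a code for the factor
            if len(table) < 2 ** WIDTH:
                table[w + ch] = len(table)   # learn a new factor
            w = ch
    if w:
        codes.append(table[w])
    return "".join(format(c, f"0{WIDTH}b") for c in codes)


def lzw_decompress(bits):
    """Invert lzw_compress; raise ValueError on a code that cannot occur."""
    codes = [int(bits[i:i + WIDTH], 2)
             for i in range(0, len(bits) - WIDTH + 1, WIDTH)]
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    if codes[0] not in table:
        raise ValueError("invalid first code")
    w = table[codes[0]]
    out = [w]
    for c in codes[1:]:
        if c in table:
            entry = table[c]
        elif c == len(table):
            entry = w + w[0]                 # factor defined by the current step
        else:
            raise ValueError("invalid code")
        out.append(entry)
        if len(table) < 2 ** WIDTH:
            table[len(table)] = w + entry[0]
        w = entry
    return "".join(out)


def try_decompress(bits):
    """Return the decoded text, or None if the bit stream is rejected."""
    try:
        return lzw_decompress(bits)
    except ValueError:
        return None


text = "abracadabra " * 20
bits = lzw_compress(text)
damaged = bits[:5] + bits[6:]                # lose one bit near the start

print("original size (bits):  ", 8 * len(text))
print("compressed size (bits):", len(bits))
print("intact stream decodes correctly: ", lzw_decompress(bits) == text)
print("damaged stream decodes correctly:", try_decompress(damaged) == text)
```

After the lost bit, every later code boundary is shifted by one position, so the decoder either rejects the stream or rebuilds a different dictionary and outputs unrelated text. A static prefix code can sometimes resynchronize after such a loss; the adaptive dictionary used here has nothing to resynchronize on.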
This work has been supported by PRC Math.-Info.
7. References
A.V. Aho, M.J. Corasick, Efficient string matching: An aid to bibliographic research, Commun. ACM 18,6 (1975), 333–340.
J.L. Bentley, D.D. Sleator, R.E. Tarjan, V.K. Wei, A locally adaptive data compression scheme, Commun. ACM 29,4 (1986), 320–330.
J. Berstel, D. Perrin, Theory of codes, Academic Press (1985).
M. Crochemore, Transducers and Repetitions, Theoret. Comput. Sci. 45 (1986), 63–86.
P. Elias, Universal Codeword Sets and Representation of the Integers, I.E.E.E. Trans. Inform. Theory IT 21,2 (1975), 194–203.
N. Faller, An adaptive system for data compression, in Record of the 7th Asilomar Conference on Circuits, Systems, and Computers (1973), 593–597.
R.G. Gallager, Information Theory and Reliable Communication, Wiley (1968).
R.G. Gallager, Variations on a theme by Huffman, I.E.E.E. Trans. Inform. Theory IT 24,6 (1978), 668–674.
E.N. Gilbert, C.L. Monma, Multigram Codes, I.E.E.E. Trans. Inform. Theory IT 28,2 (1982), 346–348.
A. Hartman, M. Rodeh, Optimal Parsing of Strings, in (Combinatorial Algorithms on Words, Apostolico & Galil ed., Springer-Verlag (1985)) 155–167.
R.W. Hamming, Coding and Information Theory, Prentice-Hall (1980).
G. Held, La compression des données, méthodes et applications, Masson (1987).
D.A. Huffman, A method for the construction of minimum redundancy codes, Proc. IRE 40 (1952), 1098–1101.
M. Jakobsson, Compression of character strings by an adaptive dictionary, BIT 25 (1985), 593–603.
D.E. Knuth, Dynamic Huffman Coding, J. Algorithms 6 (1985), 163–180.
G.G. Langdon Jr., A Note on the Ziv-Lempel Model for Compressing Individual Sequences, I.E.E.E. Trans. Inform. Theory IT 29,2 (1983), 284–287.
A. Lempel, J. Ziv, On the Complexity of Finite Sequences, I.E.E.E. Trans. Inform. Theory IT 22,1 (1976), 75–81.
J.A. Llewellyn, Data Compression for a Source with Markov Characteristics, Comput. J. 30,2 (1987), 149–156.
M.D. McIlroy, Development of a Spelling List, I.E.E.E. Trans. Commun. COM 30,1 (1982), 91–99.
V.S. Miller, M.N. Wegman, Variations on a Theme by Ziv and Lempel, in (Combinatorial Algorithms on Words, Apostolico & Galil ed., Springer-Verlag (1985)) 131–140.
J. Rissanen, G.G. Langdon Jr., Arithmetic Coding, IBM J. Res. Dev. 23,2 (1979), 149–162.
J. Rissanen, G.G. Langdon Jr., Universal Modeling and Coding, I.E.E.E. Trans. Inform. Theory IT 27,1 (1981), 12–23.
M. Rodeh, V.R. Pratt, S. Even, Linear Algorithm for Data Compression via String Matching, J. Assoc. Comput. Mach. 28,1 (1981), 16–24.
J.A. Storer, Textual Substitution Techniques for Data Compression, in (Combinatorial Algorithms on Words, Apostolico & Galil ed., Springer-Verlag (1985)) 111–129.
J.A. Storer, T.G. Szymanski, Data Compression via Textual Substitution, J. Assoc. Comput. Mach. 29,4 (1982), 928–951.
J.S. Vitter, Design and Analysis of Dynamic Huffman Codes, J. Assoc. Comput. Mach. 34,4 (1987), 825–845.
T.A. Welch, A Technique for High-Performance Data Compression, I.E.E.E. Computer 17,6 (1984), 8–19.
I.H. Witten, R.M. Neal, J.G. Cleary, Arithmetic coding for data compression, Commun. ACM 30,6 (1987), 520–540.
J. Ziv, A. Lempel, A Universal Algorithm for Sequential Data Compression, I.E.E.E. Trans. Inform. Theory IT 23,3 (1977), 337–343.
J. Ziv, A. Lempel, Compression of Individual Sequences via Variable-rate Coding, I.E.E.E. Trans. Inform. Theory IT 24,5 (1978), 530–536.
© 1989 Springer-Verlag Berlin Heidelberg
Cite this paper
Crochemore, M. (1989). Data compression with substitution. In: Gross, M., Perrin, D. (eds) Electronic Dictionaries and Automata in Computational Linguistics. LITP 1987. Lecture Notes in Computer Science, vol 377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-51465-1_1
DOI: https://doi.org/10.1007/3-540-51465-1_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-51465-7
Online ISBN: 978-3-540-48140-9
eBook Packages: Springer Book Archive