Abstract
In many types of networks (e.g., distributed computing, electronic mail), communication channels are relatively slow. The ability to put large amounts of processing power on a single chip promises to make sophisticated data compression algorithms truly practical. A data encoding/decoding chip can be placed at each end of every communication channel, with no computational overhead incurred by the communicating processes. Similarly, secondary storage capacity can be effectively increased by hardware that performs data compression invisibly to the user. For the purposes of this paper, data compression refers to transforming a string of characters into another (presumably shorter) string from which the original string can later be recovered exactly. This paper surveys research on data compression methods that employ textual substitution.
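To make the idea of textual substitution concrete, the following is a minimal sketch, not a method taken from the paper. It assumes one simple scheme in which a repeated substring is replaced by a pointer (distance, length) to an earlier occurrence in the same string. The names compress and decompress and the parameters min_len and window are hypothetical, chosen only for illustration. Decoding recovers the original string exactly, as the definition above requires.

```python
def compress(text, min_len=3, window=255):
    """Greedy left-to-right encoder (illustrative only).

    Emits a list of tokens: either a single literal character, or a
    (distance, length) pointer to an earlier occurrence of the text.
    """
    out = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        # Search earlier positions within the window for the longest match.
        for j in range(max(0, i - window), i):
            k = 0
            # A match may run past position i (an overlapping copy).
            while i + k < len(text) and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            out.append((best_dist, best_len))
            i += best_len
        else:
            out.append(text[i])
            i += 1
    return out


def decompress(tokens):
    """Invert compress: expand each pointer by copying earlier output."""
    buf = []
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            # Copy one character at a time so overlapping copies work.
            for _ in range(length):
                buf.append(buf[-dist])
        else:
            buf.append(tok)
    return "".join(buf)


if __name__ == "__main__":
    tokens = compress("abcabcabc")
    print(tokens)              # ['a', 'b', 'c', (3, 6)]
    print(decompress(tokens))  # abcabcabc
```

The quadratic search is chosen for brevity. A practical encoder would use an index such as a suffix tree or hash table to find matches, and would also fix a binary representation for the tokens, both of which this sketch leaves out.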
Copyright information
© 1985 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Storer, J.A. (1985). Textual Substitution Techniques for Data Compression. In: Apostolico, A., Galil, Z. (eds) Combinatorial Algorithms on Words. NATO ASI Series, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-82456-2_8
DOI: https://doi.org/10.1007/978-3-642-82456-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-82458-6
Online ISBN: 978-3-642-82456-2
eBook Packages: Springer Book Archive