Textual Substitution Techniques for Data Compression

  • Conference paper
Combinatorial Algorithms on Words

Part of the book series: NATO ASI Series (NATO ASI F, volume 12)

Abstract

In many types of networks (e.g., distributed computing systems and electronic mail), communication channels are relatively slow. The ability to put large amounts of processing power on a single chip promises to make sophisticated data compression algorithms truly practical. A data encoding/decoding chip can be placed at each end of every communication channel, with no computational overhead incurred by the communicating processes. Similarly, secondary storage capacity can be increased by hardware that performs data compression invisibly to the user. For the purposes of this paper, data compression refers to transforming a string of characters into another (presumably shorter) string from which the original string can be recovered exactly at some later time. This paper surveys research on data compression methods that employ textual substitution.
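
As a rough illustration of the definition above (not the specific methods surveyed in the paper), textual substitution can be sketched as replacing a repeated substring with a pointer to an earlier occurrence. The sketch below assumes a greedy, sliding-window scheme with (distance, length) pointers; the function names `compress` and `decompress` and the parameters `window` and `min_len` are hypothetical choices made for this example.

```python
def compress(text, window=255, min_len=3):
    """Greedy textual substitution (illustrative sketch only).

    Emits a list of tokens: either a single literal character or a
    (distance, length) pointer back into the already-seen text.
    """
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        # Search the previous `window` characters for the longest match.
        for j in range(max(0, i - window), i):
            k = 0
            # The match may run past position i (overlapping copy).
            while i + k < len(text) and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))  # substitute a pointer
            i += best_len
        else:
            tokens.append(text[i])  # too short to be worth a pointer
            i += 1
    return tokens


def decompress(tokens):
    """Recover the original string exactly from the token list."""
    chars = []
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            # Copy one character at a time so overlapping pointers work.
            for _ in range(length):
                chars.append(chars[-dist])
        else:
            chars.append(tok)
    return "".join(chars)
```

For a highly repetitive input such as `"abcabcabcabc"`, this sketch produces three literals followed by one pointer, and `decompress(compress(s))` returns `s` unchanged. A practical encoder would also need a bit-level encoding of the tokens, which is omitted here.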




Copyright information

© 1985 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Storer, J.A. (1985). Textual Substitution Techniques for Data Compression. In: Apostolico, A., Galil, Z. (eds) Combinatorial Algorithms on Words. NATO ASI Series, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-82456-2_8


  • DOI: https://doi.org/10.1007/978-3-642-82456-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-82458-6

  • Online ISBN: 978-3-642-82456-2

  • eBook Packages: Springer Book Archive
