Is text compression by prefixes and suffixes practical?

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 146))

Abstract

One approach to text compression is to replace high-frequency, variable-length fragments of words with fixed-length codes that point into a compression table containing those fragments. It is shown that the problem of optimal fragment compression is NP-hard even if the fragments are restricted to prefixes and suffixes. This appears to be among the simplest NP-hard fragment compression problems, since a polynomial algorithm for compressing by prefixes only (or suffixes only) has recently been found. Various compression heuristics that use both prefixes and suffixes have been tested on large Hebrew and English texts. The best of these heuristics produce a net compression of about 37% for Hebrew and 45% for English using a prefix/suffix compression table of size 256.
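To make the idea of a prefix/suffix compression table concrete, here is a minimal Python sketch. It is not one of the heuristics tested in the paper, which the abstract does not describe. It assumes a naive scoring rule (occurrences times characters saved), a greedy longest-match encoder, and a cost model in which every table code and every remaining character costs one symbol. The function names and the `max_len` parameter are hypothetical, and the ratio it reports ignores the cost of storing the table, so it is not the paper's "net compression".

```python
from collections import Counter


def build_table(words, table_size=256, max_len=4):
    """Choose up to table_size fragments by a naive savings estimate (an assumption)."""
    prefixes, suffixes = Counter(), Counter()
    for w in words:
        for k in range(2, min(max_len, len(w)) + 1):
            prefixes[w[:k]] += 1
            suffixes[w[-k:]] += 1
    # Each use of a fragment replaces len(fragment) characters by one code.
    scored = [(n * (len(f) - 1), "P", f) for f, n in prefixes.items()]
    scored += [(n * (len(f) - 1), "S", f) for f, n in suffixes.items()]
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    top = scored[:table_size]
    prefix_set = {f for _, kind, f in top if kind == "P"}
    suffix_set = {f for _, kind, f in top if kind == "S"}
    return prefix_set, suffix_set


def encode_word(word, prefix_set, suffix_set):
    """Greedy split into (prefix, middle, suffix); prefix and suffix never overlap."""
    pre = max((p for p in prefix_set if word.startswith(p)), key=len, default="")
    rest = word[len(pre):]
    suf = max((s for s in suffix_set if rest.endswith(s)), key=len, default="")
    mid = rest[:len(rest) - len(suf)]
    return pre, mid, suf


def decode_word(parts):
    """Inverse of encode_word: concatenate the three parts."""
    return "".join(parts)


def encoded_cost(parts):
    """One symbol per table code used, plus one per uncompressed character."""
    pre, mid, suf = parts
    return (1 if pre else 0) + len(mid) + (1 if suf else 0)


def compression_ratio(words, prefix_set, suffix_set):
    """Fraction of symbols saved under the simplified cost model above."""
    original = sum(len(w) for w in words)
    encoded = sum(encoded_cost(encode_word(w, prefix_set, suffix_set)) for w in words)
    return 1 - encoded / original
```

Because the encoder commits to the longest matching prefix before it looks for a suffix, it can miss a better split of the same word. That interaction between prefix and suffix choices is what separates the combined problem from the prefix-only case, although this sketch makes no attempt at optimality.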

This work was done within the Responsa Retrieval Project, developed initially at the Weizmann Institute of Science and Bar-Ilan University, now located at the Institute for Information Retrieval and Computational Linguistics (IRCOL), Bar-Ilan University, Ramat Gan, Israel. The work reported herein was done at the Weizmann Institute.

Partial affiliation with IRCOL.

Supported in part by a grant of Bank Leumi Le'Israel.

Editor information

Gerard Salton, Hans-Jochen Schneider

Copyright information

© 1983 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fraenkel, A.S., Mor, M., Perl, Y. (1983). Is text compression by prefixes and suffixes practical? In: Salton, G., Schneider, HJ. (eds) Research and Development in Information Retrieval. SIGIR 1982. Lecture Notes in Computer Science, vol 146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0036353

  • DOI: https://doi.org/10.1007/BFb0036353

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-11978-4

  • Online ISBN: 978-3-540-39440-2

  • eBook Packages: Springer Book Archive
