A New Approach to Alphabet Extension for Improving Static Compression Schemes

  • Chapter
Language, Culture, Computation. Computing - Theory and Technology

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8001)

Abstract

The performance of data compression on a large static text may be improved if certain variable-length strings are included in the character set for which a code is generated. A new method for extending the alphabet is presented, based on a reduction to a graph-theoretic problem. A related optimization problem is shown to be NP-complete, a fast heuristic is suggested, and experimental results are presented.
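The first claim of the abstract can be illustrated with a minimal sketch. It is not the graph-theoretic method of the chapter: the strings added to the alphabet are chosen by hand, the text is parsed greedily from left to right, and Huffman coding is assumed as the static code. The function names (huffman_cost, tokenize, encoded_bits) are hypothetical, and the cost of storing the code table itself is ignored.

```python
import heapq
from collections import Counter


def huffman_cost(freqs):
    """Total length in bits of a Huffman encoding for the given symbol counts."""
    weights = list(freqs.values())
    if len(weights) == 1:
        # A lone symbol still needs one bit per occurrence.
        return weights[0]
    heapq.heapify(weights)
    total = 0
    while len(weights) > 1:
        a = heapq.heappop(weights)
        b = heapq.heappop(weights)
        # Merging two subtrees adds one bit to every occurrence below them.
        total += a + b
        heapq.heappush(weights, a + b)
    return total


def tokenize(text, extra_strings):
    """Greedy left-to-right parse that prefers the longest matching extra string."""
    extras = sorted(extra_strings, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for s in extras:
            if text.startswith(s, i):
                tokens.append(s)
                i += len(s)
                break
        else:
            # No extra string matches here, so emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def encoded_bits(text, extra_strings=()):
    """Huffman-encoded size of text over the (possibly extended) alphabet."""
    return huffman_cost(Counter(tokenize(text, extra_strings)))


if __name__ == "__main__":
    sample = "abababab"
    print(encoded_bits(sample))          # plain character alphabet
    print(encoded_bits(sample, ["ab"]))  # alphabet extended with "ab"
```

On the toy input the extended alphabet halves the encoded length, because the text becomes four occurrences of one symbol instead of eight characters. On real text the gain depends on which strings are chosen, and making that choice well is the problem the chapter addresses.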

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Klein, S.T. (2014). A New Approach to Alphabet Extension for Improving Static Compression Schemes. In: Dershowitz, N., Nissan, E. (eds) Language, Culture, Computation. Computing - Theory and Technology. Lecture Notes in Computer Science, vol 8001. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45321-2_10

  • DOI: https://doi.org/10.1007/978-3-642-45321-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45320-5

  • Online ISBN: 978-3-642-45321-2

  • eBook Packages: Computer Science, Computer Science (R0)
