A New Approach to Alphabet Extension for Improving Static Compression Schemes

Klein, Shmuel T.

doi:10.1007/978-3-642-45321-2_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8001)

Abstract

The performance of data compression on a large static text may be improved if certain variable-length strings are included in the character set for which a code is generated. A new method for extending the alphabet is presented, based on a reduction to a graph-theoretic problem. A related optimization problem is shown to be NP-complete, a fast heuristic is suggested, and experimental results are presented.
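
The abstract's first sentence can be illustrated with a small sketch. This is not the chapter's method, which the abstract describes only as a reduction to a graph-theoretic problem. It simply compares the Huffman-coded size of a text over its plain character alphabet with the size obtained when one variable-length string is added as an extra symbol. The function names (`huffman_cost`, `tokenize`, `encoded_bits`), the greedy longest-first parsing, and the toy text are all assumptions made for illustration, and the cost of storing the code table is ignored.

```python
import heapq
from collections import Counter


def huffman_cost(freqs):
    """Total bits needed to Huffman-encode symbols with the given counts.

    The total equals the sum of the weights of all internal nodes
    created while building the Huffman tree.
    """
    heap = list(freqs.values())
    if len(heap) == 1:
        # A one-symbol alphabet still needs one bit per occurrence.
        return heap[0]
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def tokenize(text, extra=()):
    """Greedy left-to-right parse (an assumption, not the chapter's rule).

    At each position, emit the longest matching string from `extra`,
    or a single character if none matches.
    """
    candidates = sorted(extra, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for s in candidates:
            if text.startswith(s, i):
                tokens.append(s)
                i += len(s)
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def encoded_bits(text, extra=()):
    """Huffman-coded size of `text` over the (possibly extended) alphabet."""
    return huffman_cost(Counter(tokenize(text, extra)))


text = "ab" * 8 + "c" * 4
print(encoded_bits(text))          # 32 bits over the alphabet {a, b, c}
print(encoded_bits(text, ["ab"]))  # 12 bits over the alphabet {ab, c}
```

On this toy input, adding the string `ab` to the alphabet reduces the encoded size from 32 to 12 bits. Deciding which strings to add on a real text is the harder question, and it is the selection problem for which the abstract reports an NP-completeness result and a heuristic.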

Author information

Authors and Affiliations

Department of Computer Science, Bar Ilan University, Ramat-Gan, 52900, Israel
Shmuel T. Klein

Editor information

Editors and Affiliations

School of Computer Science, Tel Aviv University, Tel Aviv, Israel
Nachum Dershowitz
Department of Computing, University of London, Goldsmiths College, 25–27 St. James, New Cross, SE14 6NW, London, UK
Ephraim Nissan

About this chapter

Cite this chapter

Klein, S.T. (2014). A New Approach to Alphabet Extension for Improving Static Compression Schemes. In: Dershowitz, N., Nissan, E. (eds) Language, Culture, Computation. Computing - Theory and Technology. Lecture Notes in Computer Science, vol 8001. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45321-2_10

DOI: https://doi.org/10.1007/978-3-642-45321-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45320-5
Online ISBN: 978-3-642-45321-2
eBook Packages: Computer Science, Computer Science (R0)