Improving Static Compression Schemes by Alphabet Extension

  • Shmuel T. Klein
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1848)


The performance of data compression on a large static text may be improved if certain variable-length strings are included in the character set for which a code is generated. A new method for extending the alphabet is presented, based on a reduction to a graph-theoretic problem. A related optimization problem is shown to be NP-complete, a fast heuristic is suggested, and experimental results are presented.
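The premise of the abstract, that adding variable-length strings to the character set can shorten the encoded text, can be illustrated with a minimal sketch. This is not the graph-theoretic method of the paper: it only compares the Huffman-coded size of a text under a plain character alphabet with its size under an alphabet extended by hand-picked strings. The function names (`huffman_cost`, `parse`, `compressed_bits`), the greedy longest-match parse, and the choice of Huffman coding are all assumptions made for illustration, and the cost of storing the code itself is ignored.

```python
import heapq
from collections import Counter


def huffman_cost(freqs):
    """Total encoded length, in bits, of a Huffman code for the given symbol counts."""
    weights = list(freqs.values())
    if len(weights) == 1:
        # A lone symbol still needs one bit per occurrence.
        return weights[0]
    heapq.heapify(weights)
    total = 0
    # Each merge adds one bit to every occurrence below the merged node.
    while len(weights) > 1:
        a = heapq.heappop(weights)
        b = heapq.heappop(weights)
        total += a + b
        heapq.heappush(weights, a + b)
    return total


def parse(text, extra):
    """Greedy left-to-right parse that prefers the longest matching extra string."""
    # Empty strings are dropped so the scan always advances.
    candidates = sorted((s for s in extra if s), key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for s in candidates:
            if text.startswith(s, i):
                out.append(s)
                i += len(s)
                break
        else:
            # No extra string matches here: fall back to a single character.
            out.append(text[i])
            i += 1
    return out


def compressed_bits(text, extra=()):
    """Huffman-coded size of text when the alphabet is extended by `extra`."""
    return huffman_cost(Counter(parse(text, extra)))
```

For the toy text `"abababab"`, `compressed_bits("abababab")` gives 8 bits with the two-character alphabet, while `compressed_bits("abababab", ("ab",))` gives 4 bits once `"ab"` is treated as a single symbol. Which strings to add for a real text is the hard part, and is the selection problem the paper addresses.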


Keywords: Position Tree, Greedy Heuristic, Black Vertex, Arithmetic Code, Fast Heuristic





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Shmuel T. Klein, Department of Mathematics and Computer Science, Bar Ilan University, Ramat-Gan, Israel
