A Fully Linear-Time Approximation Algorithm for Grammar-Based Compression

  • Hiroshi Sakamoto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2676)


A linear-time approximation algorithm is presented for grammar-based compression, the optimization problem of minimizing the size of a context-free grammar that derives a given string. For every string of length n over an unbounded alphabet, the algorithm guarantees an O(log² n) approximation ratio without using a suffix tree, and it runs in O(n) time in the randomized model.
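To make the optimization problem concrete, here is a minimal Python sketch of what "a grammar deriving a given string" and its size look like. It assumes grammar size is measured as the total length of all right-hand sides (a common convention, not stated in the abstract) and restricts itself to rules with exactly two symbols on the right-hand side. The names `build_pair_grammar`, `grammar_size`, `expand`, and `derive` are hypothetical. The construction is a naive level-wise pairing heuristic, not the approximation algorithm of this paper, and it carries no approximation guarantee.

```python
from typing import Dict, List, Tuple, Union

# Terminals are 1-character str; nonterminals are int ids.
Symbol = Union[str, int]
Rules = Dict[int, Tuple[Symbol, Symbol]]


def build_pair_grammar(s: str) -> Tuple[Rules, List[Symbol]]:
    """Hypothetical level-wise pairing heuristic (NOT Sakamoto's algorithm).

    Each round splits the current sequence into consecutive non-overlapping
    pairs and assigns one nonterminal per distinct pair; an odd trailing
    symbol is carried over unchanged. Rounds repeat until at most one
    symbol is left, which serves as the start symbol.
    """
    rules: Rules = {}
    pair_to_id: Dict[Tuple[Symbol, Symbol], int] = {}
    seq: List[Symbol] = list(s)
    while len(seq) > 1:
        nxt: List[Symbol] = []
        for i in range(0, len(seq) - 1, 2):
            pair = (seq[i], seq[i + 1])
            if pair not in pair_to_id:  # a repeated pair reuses its rule
                nt = len(rules)
                pair_to_id[pair] = nt
                rules[nt] = pair
            nxt.append(pair_to_id[pair])
        if len(seq) % 2 == 1:
            nxt.append(seq[-1])
        seq = nxt
    return rules, seq


def grammar_size(rules: Rules) -> int:
    """Assumed size measure: total length of all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())


def expand(sym: Symbol, rules: Rules) -> str:
    """Derive the terminal string generated by a single symbol."""
    if isinstance(sym, str):
        return sym
    left, right = rules[sym]
    return expand(left, rules) + expand(right, rules)


def derive(start: List[Symbol], rules: Rules) -> str:
    """Expand the (possibly empty) start sequence back into the input."""
    return "".join(expand(x, rules) for x in start)


if __name__ == "__main__":
    rules, start = build_pair_grammar("abababab")
    print(rules)  # {0: ('a', 'b'), 1: (0, 0), 2: (1, 1)}
    print(grammar_size(rules), derive(start, rules))  # 6 abababab
```

Because pairs are cut at fixed even positions, the same substring occurring at an odd offset in one place and an even offset in another receives different nonterminals, so this heuristic can produce grammars much larger than necessary. Avoiding that sensitivity is why approximation algorithms for this problem choose the pairs to replace more carefully.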







Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Hiroshi Sakamoto
    Department of Informatics, Kyushu University, Fukuoka, Japan
