Grammar-Based Compression in a Streaming Model

  • Travis Gagie
  • Paweł Gawrychowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6031)

Abstract

We show that, given a string s of length n, using constant memory and a logarithmic number of passes over a constant number of streams, we can build a context-free grammar that generates s and only s and whose size is within an \(\mathcal{O}\left(\min\left(g \log g, \sqrt{n / \log n}\right)\right)\)-factor of the minimum size g. This stands in contrast to our previous result that, with polylogarithmic memory and a polylogarithmic number of passes over a single stream, we cannot build such a grammar whose size is within any polynomial of g.
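To make the objects in the abstract concrete, the following minimal Python sketch shows a context-free grammar that generates one string and only that string, together with one common way of measuring its size. It is illustrative only and is not the streaming algorithm of the paper. The names `expand`, `grammar_size`, and `example` are hypothetical, and the size measure (total number of symbols on right-hand sides) is an assumption, since the abstract does not define it.

```python
# Illustrative only: a tiny grammar that derives exactly one string.
# This is NOT the paper's streaming algorithm.

def expand(grammar, symbol):
    """Return the unique string derived from `symbol`."""
    if symbol not in grammar:      # symbols without a rule are terminals
        return symbol
    return "".join(expand(grammar, part) for part in grammar[symbol])

def grammar_size(grammar):
    """Assumed size measure: total number of right-hand-side symbols."""
    return sum(len(rhs) for rhs in grammar.values())

# Hypothetical example grammar for s = "abababab" (n = 8).
# Each nonterminal has exactly one rule, so only one string is derivable.
example = {
    "S": ["B", "B"],
    "B": ["A", "A"],
    "A": ["a", "b"],
}

s = expand(example, "S")
print(s, len(s), grammar_size(example))  # prints: abababab 8 6
```

Under this assumed measure the grammar has size 6 for a string of length 8. For highly repetitive strings the gap between n and the minimum grammar size g grows quickly, which is why the approximation factor is stated in terms of both g and n.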

Keywords

Turing Machine, External Memory, Constant Memory, Binary Production, Frequency Moment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Travis Gagie (1)
  • Paweł Gawrychowski (2)
  1. Department of Computer Science, University of Chile
  2. Institute of Computer Science, University of Wrocław, Poland
