A really Simple Approximation of Smallest Grammar

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8486)

Abstract

We present a really simple linear-time algorithm that constructs a context-free grammar of size \(\mathcal{O}(g \log (N/g))\) for the input string, where N is the size of the input string and g is the size of the optimal grammar generating this string. The algorithm works for alphabets of arbitrary size, but the running time is linear when the alphabet Σ of the input string can be identified with numbers from \(\{1, \ldots, N\}\). Algorithms with such an approximation guarantee and running time are known; however, all of them are non-trivial and their analyses are involved. The algorithm presented here computes the LZ77 factorisation (of size l) and transforms it into a grammar in phases. In each phase it maintains an LZ77-like factorisation of the word with at most l factors, as well as \(\mathcal{O}(l)\) additional letters. In each phase we greedily choose, by a left-to-right sweep, a set of pairs of consecutive letters to be replaced with new symbols, i.e. nonterminals of the constructed grammar. The chosen pairs cover at least 2/3 of the letters in the word, so each phase shrinks the word to at most 2/3 of its length, and there are \(\mathcal{O}(l)\) different pairs among them. Hence there are \(\mathcal{O}(\log N)\) phases, each of which introduces \(\mathcal{O}(l)\) nonterminals. A more precise analysis yields a bound of \(\mathcal{O}(l \log(N/l))\). As l ≤ g, this yields \(\mathcal{O}(g \log(N/g))\).
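
The phase structure described above can be illustrated with a minimal Python sketch. It is a simplification, not the paper's algorithm: it works directly on the explicit word rather than on the LZ77 factorisation, so it does not carry the \(\mathcal{O}(g \log(N/g))\) guarantee. The greedy rule used here (reuse a pair already introduced in the current phase, and leave a letter unpaired only when the pair right after it is already known) is an illustrative assumption, and the names `pairing_phase`, `build_grammar` and `expand` are hypothetical.

```python
from typing import Dict, List, Tuple

Symbol = int  # letters and nonterminals share one integer namespace (assumption)
Rules = Dict[Symbol, Tuple[Symbol, Symbol]]  # nonterminal -> its two children


def pairing_phase(word: List[Symbol], rules: Rules, next_id: int) -> Tuple[List[Symbol], int]:
    """One phase: a greedy left-to-right sweep replacing chosen pairs by nonterminals."""
    known: Dict[Tuple[Symbol, Symbol], Symbol] = {}  # pairs introduced in this phase
    out: List[Symbol] = []
    i, n = 0, len(word)
    while i < n:
        if i + 1 == n:  # the last letter has no right neighbour
            out.append(word[i])
            i += 1
            continue
        pair = (word[i], word[i + 1])
        if pair not in known and i + 2 < n and (word[i + 1], word[i + 2]) in known:
            # Leave word[i] unpaired so the next pair reuses a known nonterminal.
            # That next pair is then taken, so every unpaired letter except possibly
            # the last is followed by a chosen pair: at most (2n+1)/3 symbols remain.
            out.append(word[i])
            i += 1
            continue
        if pair not in known:  # introduce a fresh nonterminal for this pair
            known[pair] = next_id
            rules[next_id] = pair
            next_id += 1
        out.append(known[pair])
        i += 2
    return out, next_id


def build_grammar(text: str) -> Tuple[Symbol, Rules, List[str]]:
    """Run phases until one symbol remains; returns (start symbol, rules, alphabet)."""
    if not text:
        raise ValueError("text must be non-empty")
    alphabet = sorted(set(text))
    code = {c: k for k, c in enumerate(alphabet)}  # identify letters with 0..|Σ|-1
    word = [code[c] for c in text]
    rules: Rules = {}
    next_id = len(alphabet)
    while len(word) > 1:  # each phase chooses at least one pair, so this terminates
        word, next_id = pairing_phase(word, rules, next_id)
    return word[0], rules, alphabet


def expand(sym: Symbol, rules: Rules, alphabet: List[str]) -> str:
    """Derive the string generated by sym (recursion depth = number of phases)."""
    if sym not in rules:
        return alphabet[sym]
    left, right = rules[sym]
    return expand(left, rules, alphabet) + expand(right, rules, alphabet)
```

For example, `build_grammar("abababababab")` produces four rules whose start symbol expands back to the input. The "skip one letter" rule is what guarantees that the chosen pairs cover roughly 2/3 of the letters, which is the source of the logarithmic number of phases; bounding the number of *different* pairs by \(\mathcal{O}(l)\) is where the paper relies on the LZ77-like factorisation, which this sketch omits.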

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Jeż, A. (2014). A really Simple Approximation of Smallest Grammar. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds) Combinatorial Pattern Matching. CPM 2014. Lecture Notes in Computer Science, vol 8486. Springer, Cham. https://doi.org/10.1007/978-3-319-07566-2_19

  • DOI: https://doi.org/10.1007/978-3-319-07566-2_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07565-5

  • Online ISBN: 978-3-319-07566-2

  • eBook Packages: Computer Science, Computer Science (R0)