Abstract
Straight-line programs (SLPs) offer powerful text compression by representing a text T[1,u] in terms of a restricted context-free grammar of n rules, so that T can be recovered in O(u) time. However, the problem of operating the grammar in compressed form has not been studied much. We present a grammar representation whose size is of the same order of that of a plain SLP representation, and can answer other queries apart from expanding nonterminals. This can be of independent interest. We then extend it to achieve the first grammar representation able of extracting text substrings, and of searching the text for patterns, in time o(n). We also give byproducts on representing binary relations.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Amir, A., Benson, G.: Efficient two-dimensional compressed matching. In: Proc. 2nd DCC, pp. 279–288 (1992)
Barbay, J., Golynski, A., Munro, I., Rao, S.S.: Adaptive searching in succinctly encoded binary relations and tree-structured documents. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 24–35. Springer, Heidelberg (2006)
Bender, M., Farach-Colton, M.: The level ancestor problem simplified. Theor. Comp. Sci. 321(1), 5–12 (2004)
Benoit, D., Demaine, E., Munro, I., Raman, R., Raman, V., Rao, S.S.: Representing trees of higher degree. Algorithmica 43(4), 275–292 (2005)
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE TIT 51(7), 2554–2576 (2005)
Clark, D.: Compact Pat Trees. PhD thesis, University of Waterloo (1996)
Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. ACM 47(6), 987–1011 (2000)
Ferragina, P., Manzini, G.: Indexing compressed texts. J. ACM 52(4), 552–581 (2005)
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Alg. 3(2), 20 (2007)
Gasieniec, L., Kolpakov, R., Potapov, I., Sant, P.: Real-time traversal in grammar-based compressed files. In: Proc. 15th DCC, p. 458 (2005)
Gasieniec, L., Potapov, I.: Time/space efficient compressed pattern matching. Fund. Inf. 56(1-2), 137–154 (2003)
Golynski, A., Munro, I., Rao, S.: Rank/select operations on large alphabets: a tool for text indexing. In: Proc. 17th SODA, pp. 368–373 (2006)
Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. 14th SODA, pp. 841–850 (2003)
Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 943–955. Springer, Heidelberg (2003)
Kärkkäinen, J., Ukkonen, E.: Lempel-Ziv parsing and sublinear-size index structures for string matching. In: Proc. 3rd WSP, pp. 141–155. Carleton University Press (1996)
Karpinski, M., Rytter, W., Shinohara, A.: An efficient pattern-matching algorithm for strings with short descriptions. Nordic J. Comp. 4(2), 172–186 (1997)
Kida, T., Matsumoto, T., Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S.: Collage system: a unifying framework for compressed pattern matching. Theor. Comp. Sci. 298(1), 253–272 (2003)
Kieffer, J., Yang, E.-H.: Grammar-based codes: A new class of universal lossless source codes. IEEE TIT 46(3), 737–754 (2000)
Larsson, J., Moffat, A.: Off-line dictionary-based compression. Proc. IEEE 88(11), 1722–1732 (2000)
Mäkinen, V., Navarro, G.: Rank and select revisited and extended. Theor. Comp. Sci. 387(3), 332–347 (2007)
Morrison, D.: PATRICIA – practical algorithm to retrieve information coded in alphanumeric. J. ACM 15(4), 514–534 (1968)
Munro, J., Raman, R., Raman, V., Rao, S.S.: Succinct representations of permutations. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 345–356. Springer, Heidelberg (2003)
Navarro, G.: Indexing text using the Ziv-Lempel trie. J. Discr. Alg. 2(1), 87–114 (2004)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comp. Surv. 39(1), 2 (2007)
Nevill-Manning, C., Witten, I., Maulsby, D.: Compression by induction of hierarchical grammars. In: Proc. 4th DCC, pp. 244–253 (1994)
Raman, R., Raman, V., Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. 13th SODA, pp. 233–242 (2002)
Russo, L., Oliveira, A.: A compressed self-index using a Ziv-Lempel dictionary. Inf. Retr. 11(4), 359–388 (2008)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comp. Sci. 302(1-3), 211–222 (2003)
Sirén, J., Välimäki, N., Mäkinen, V., Navarro, G.: Run-length compressed indexes are superior for highly repetitive sequence collections. In: Amir, A., Turpin, A., Moffat, A. (eds.) SPIRE 2008. LNCS, vol. 5280, pp. 164–175. Springer, Heidelberg (2008)
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE TIT 23(3), 337–343 (1977)
Ziv, J., Lempel, A.: Compression of individual sequences via variable length coding. IEEE TIT 24(5), 530–536 (1978)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Claude, F., Navarro, G. (2009). Self-indexed Text Compression Using Straight-Line Programs. In: Královič, R., Niwiński, D. (eds) Mathematical Foundations of Computer Science 2009. MFCS 2009. Lecture Notes in Computer Science, vol 5734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03816-7_21
Download citation
DOI: https://doi.org/10.1007/978-3-642-03816-7_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03815-0
Online ISBN: 978-3-642-03816-7
eBook Packages: Computer ScienceComputer Science (R0)