An Opportunistic Text Indexing Structure Based on Run Length Encoding

Tamakoshi, Yuya; Goto, Keisuke; Inenaga, Shunsuke; Bannai, Hideo; Takeda, Masayuki

doi:10.1007/978-3-319-18173-8_29

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9079)

Abstract

We present a new text indexing structure based on the run length encoding (RLE) of a text string \(T\) which, given the RLE of a query pattern \(P\), reports all the \(occ\) occurrences of \(P\) in \(T\) in \(O(m + occ + \log n)\) time, where \(n\) and \(m\) are the sizes of the RLEs of \(T\) and \(P\), respectively. The data structure requires \(n (2\log N + \log n + \log \sigma ) + O(n)\) bits of space, where \(N\) is the length of the uncompressed text string \(T\) and \(\sigma \) is the alphabet size. Moreover, using \(n (3\log N + \log n + \log \sigma ) + 2 \sigma \log \frac{N}{\sigma } + O(n \log \log n)\) bits of total space, our data structure can be enhanced to answer the beginning position of the lexicographically \(i\)th smallest suffix of \(T\) for a given rank \(i\) in \(O(\log ^2 n)\) time. All these data structures can be constructed in \(O(n \log n)\) time using \(O(n \log N)\) bits of extra space.
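
To make the problem setting concrete, the following is a minimal sketch of run length encoding and of what it means for a pattern to occur in a text when both are given as RLEs. It is not the index described in the abstract: it is a naive scan over the text's runs with no preprocessing, so it does not achieve the \(O(m + occ + \log n)\) query time. The function names `rle` and `rle_occurrences` are hypothetical and chosen only for this illustration.

```python
from itertools import groupby


def rle(s):
    """Run-length encode a string into a list of (char, run_length) pairs."""
    return [(ch, sum(1 for _ in grp)) for ch, grp in groupby(s)]


def rle_occurrences(text_runs, pat_runs):
    """Report 0-based start positions (in the uncompressed text) of all
    occurrences of the pattern, looking only at the two RLEs.

    Naive illustration only: this scans every run of the text and is
    NOT the indexing structure from the paper.
    """
    n, m = len(text_runs), len(pat_runs)
    if m == 0:
        return []

    # starts[i] = uncompressed position where the i-th text run begins
    starts, pos = [], 0
    for _, length in text_runs:
        starts.append(pos)
        pos += length

    occ = []
    if m == 1:
        # A single-run pattern c^e fits inside any text run c^f with f >= e,
        # at f - e + 1 different offsets.
        c, e = pat_runs[0]
        for i, (tc, tl) in enumerate(text_runs):
            if tc == c and tl >= e:
                occ.extend(range(starts[i], starts[i] + tl - e + 1))
        return occ

    # For m >= 2: the first pattern run must be a suffix of a text run,
    # the last pattern run a prefix of a text run, and all middle runs
    # must match text runs exactly.
    (c_first, e_first), (c_last, e_last) = pat_runs[0], pat_runs[-1]
    middle = pat_runs[1:-1]
    for i in range(n - m + 1):
        tc, tl = text_runs[i]
        lc, ll = text_runs[i + m - 1]
        if (tc == c_first and tl >= e_first
                and lc == c_last and ll >= e_last
                and text_runs[i + 1:i + m - 1] == middle):
            occ.append(starts[i] + tl - e_first)
    return occ


if __name__ == "__main__":
    T = "aabbbaabbbb"
    print(rle(T))
    print(rle_occurrences(rle(T), rle("abb")))
```

Here \(n\) and \(m\) correspond to `len(text_runs)` and `len(pat_runs)`, while \(N\) is the length of the uncompressed string. The case split shows why only the first and last runs of the pattern may be shorter than the text runs they align with.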

Author information

Authors and Affiliations

Department of Informatics, Kyushu University, Fukuoka, Japan
Yuya Tamakoshi, Keisuke Goto, Shunsuke Inenaga, Hideo Bannai & Masayuki Takeda

Corresponding author

Correspondence to Shunsuke Inenaga.

Editor information

Editors and Affiliations

LAMSADE, Université Paris-Dauphine, Paris Cedex 16, France
Vangelis Th. Paschos
Inst. of Theoretical Computer Science, ETH Zürich, Zürich, Switzerland
Peter Widmayer

About this paper

Cite this paper

Tamakoshi, Y., Goto, K., Inenaga, S., Bannai, H., Takeda, M. (2015). An Opportunistic Text Indexing Structure Based on Run Length Encoding. In: Paschos, V., Widmayer, P. (eds) Algorithms and Complexity. CIAC 2015. Lecture Notes in Computer Science, vol 9079. Springer, Cham. https://doi.org/10.1007/978-3-319-18173-8_29

DOI: https://doi.org/10.1007/978-3-319-18173-8_29
Published: 16 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18172-1
Online ISBN: 978-3-319-18173-8
eBook Packages: Computer Science, Computer Science (R0)
