An Opportunistic Text Indexing Structure Based on Run Length Encoding

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9079)

Abstract

We present a new text indexing structure based on the run length encoding (RLE) of a text string \(T\) which, given the RLE of a query pattern \(P\), reports all \(occ\) occurrences of \(P\) in \(T\) in \(O(m + occ + \log n)\) time, where \(n\) and \(m\) are the sizes of the RLEs of \(T\) and \(P\), respectively. The data structure requires \(n (2\log N + \log n + \log \sigma ) + O(n)\) bits of space, where \(N\) is the length of the uncompressed text string \(T\) and \(\sigma \) is the alphabet size. Moreover, using \(n (3\log N + \log n + \log \sigma ) + 2 \sigma \log \frac{N}{\sigma } + O(n \log \log n)\) bits of total space, our data structure can be enhanced to report the beginning position of the lexicographically \(i\)th smallest suffix of \(T\) for a given rank \(i\) in \(O(\log ^2 n)\) time. All these data structures can be constructed in \(O(n \log n)\) time using \(O(n \log N)\) bits of extra space.
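
To make the quantities in the abstract concrete, the following is a minimal Python sketch of run length encoding and of a naive scan that finds the occurrences of an RLE pattern in an RLE text. It is not the indexing structure of the paper: the scan takes time linear in \(n\) per query rather than \(O(m + occ + \log n)\), and the function names `rle` and `naive_rle_occurrences` are hypothetical, chosen only for illustration. It assumes both inputs are valid RLEs, that is, adjacent runs carry different characters.

```python
from itertools import groupby


def rle(s):
    """Return the run length encoding of s as a list of (char, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def naive_rle_occurrences(text_runs, pat_runs):
    """Naive baseline, NOT the paper's index.

    Returns the 0-based start positions (in the uncompressed text) of every
    occurrence of the pattern, working only on the two RLEs.
    """
    m = len(pat_runs)
    if m == 0:
        return []

    # starts[i] = position in the uncompressed text where text run i begins
    starts, pos = [], 0
    for _, length in text_runs:
        starts.append(pos)
        pos += length

    out = []
    if m == 1:
        # A single-run pattern fits (te - e + 1) times inside a long enough run.
        c, e = pat_runs[0]
        for i, (tc, te) in enumerate(text_runs):
            if tc == c and te >= e:
                out.extend(range(starts[i], starts[i] + te - e + 1))
        return out

    # For m >= 2: the first pattern run must match the END of a text run,
    # the last pattern run the START of a text run, and the middle runs
    # must match whole text runs exactly.
    (c1, e1), (cm, em) = pat_runs[0], pat_runs[-1]
    middle = pat_runs[1:-1]
    for i in range(len(text_runs) - m + 1):
        tc1, te1 = text_runs[i]
        tcm, tem = text_runs[i + m - 1]
        if (tc1 == c1 and te1 >= e1 and tcm == cm and tem >= em
                and text_runs[i + 1:i + m - 1] == middle):
            out.append(starts[i] + te1 - e1)
    return out


T = "aaabbbbaab"
T_runs = rle(T)          # N = len(T) = 10 characters, n = len(T_runs) = 4 runs
print(T_runs)
print(naive_rle_occurrences(T_runs, rle("ab")))
```

In this example the text of length \(N = 10\) compresses to \(n = 4\) runs, and the pattern "ab" (\(m = 2\) runs) is matched by comparing runs rather than individual characters.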

Author information

Corresponding author

Correspondence to Shunsuke Inenaga.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Tamakoshi, Y., Goto, K., Inenaga, S., Bannai, H., Takeda, M. (2015). An Opportunistic Text Indexing Structure Based on Run Length Encoding. In: Paschos, V., Widmayer, P. (eds) Algorithms and Complexity. CIAC 2015. Lecture Notes in Computer Science, vol 9079. Springer, Cham. https://doi.org/10.1007/978-3-319-18173-8_29

  • DOI: https://doi.org/10.1007/978-3-319-18173-8_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18172-1

  • Online ISBN: 978-3-319-18173-8

  • eBook Packages: Computer Science, Computer Science (R0)
