Succinct Suffix Arrays Based on Run-Length Encoding

  • Conference paper
Combinatorial Pattern Matching (CPM 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3537)

Abstract

A succinct full-text self-index is a data structure built on a text \(T = t_1 t_2 \ldots t_n\) that takes little space (ideally close to that of the compressed text), permits efficient search for the occurrences of a pattern \(P = p_1 p_2 \ldots p_m\) in \(T\), and is able to reproduce any text substring, so the self-index replaces the text. Several remarkable self-indexes have been developed in recent years. They usually take \(O(nH_0)\) or \(O(nH_k)\) bits, where \(H_k\) is the \(k\)th order empirical entropy of \(T\). The time to count how many times \(P\) occurs in \(T\) ranges from \(O(m)\) to \(O(m \log n)\).
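As a rough illustration of the space bounds mentioned above, the zero-order empirical entropy \(H_0\) can be computed directly from symbol frequencies. The sketch below is not taken from the paper; the function name `empirical_entropy_h0` is hypothetical, and it covers only the \(k = 0\) case.

```python
import math
from collections import Counter


def empirical_entropy_h0(text: str) -> float:
    """Zero-order empirical entropy H_0(T), in bits per symbol.

    H_0(T) = sum over symbols c of (n_c / n) * log2(n / n_c),
    where n_c is the number of occurrences of c in T and n = |T|.
    """
    n = len(text)
    if n == 0:
        return 0.0
    counts = Counter(text)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    t = "mississippi"
    h0 = empirical_entropy_h0(t)
    # n * H_0 is the kind of quantity that an O(nH_0)-bit bound refers to.
    print(f"H_0 = {h0:.3f} bits/symbol, n*H_0 = {len(t) * h0:.1f} bits")
```

A text over a single symbol has \(H_0 = 0\), while a text in which four symbols are equally frequent has \(H_0 = 2\) bits per symbol.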

We present a new self-index, called the run-length FM-index (RLFM index), that counts the occurrences of \(P\) in \(T\) in \(O(m)\) time when the alphabet size is \(\sigma = O(\textrm{polylog}(n))\). The index requires \(nH_k \log_2 \sigma + O(n)\) bits of space for small \(k\). We then show how to implement the RLFM index in practice, and obtain in passing another implementation with different space-time tradeoffs. We empirically compare our implementations against the best existing implementations of other indexes and show that ours are the fastest among indexes taking less space than the text.
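To make the counting query concrete, here is a minimal sketch of FM-index style backward search over the Burrows-Wheeler transform, together with a count of the equal-symbol runs that a run-length encoding would exploit. It is an illustration under simplifying assumptions, not the RLFM index itself: the transform is built by naive suffix sorting, `rank` is a linear scan rather than a succinct constant-time structure, and all function names are hypothetical.

```python
from collections import Counter


def bwt(text: str, sentinel: str = "\0") -> str:
    """Burrows-Wheeler transform via naive suffix sorting.

    Assumes `sentinel` does not occur in `text` and sorts before every
    symbol of `text`.
    """
    s = text + sentinel
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT symbol is the one preceding each suffix (wrapping around).
    return "".join(s[i - 1] for i in suffix_array)


def count_runs(bwt_text: str) -> int:
    """Number of maximal runs of equal symbols in the transformed text."""
    return sum(
        1 for i, c in enumerate(bwt_text) if i == 0 or c != bwt_text[i - 1]
    )


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count occurrences of `pattern` by backward search (m steps)."""
    counts = Counter(bwt_text)
    # first[c] = number of symbols in the text that are smaller than c.
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    def rank(c: str, i: int) -> int:
        # Occurrences of c in bwt_text[:i]; a naive O(i) scan here.
        return bwt_text.count(c, 0, i)

    sp, ep = 0, len(bwt_text)  # half-open range of matching suffixes
    for c in reversed(pattern):
        if c not in first:
            return 0
        sp = first[c] + rank(c, sp)
        ep = first[c] + rank(c, ep)
        if sp >= ep:
            return 0
    return ep - sp


if __name__ == "__main__":
    b = bwt("mississippi")
    print(count_runs(b), count_occurrences(b, "ssi"))
```

The search loop performs one step per pattern symbol, so the total counting time depends on how fast `rank` is answered; replacing the linear scan with a constant-time rank structure is what brings the query down to \(O(m)\).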

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mäkinen, V., Navarro, G. (2005). Succinct Suffix Arrays Based on Run-Length Encoding. In: Apostolico, A., Crochemore, M., Park, K. (eds) Combinatorial Pattern Matching. CPM 2005. Lecture Notes in Computer Science, vol 3537. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11496656_5

  • DOI: https://doi.org/10.1007/11496656_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26201-5

  • Online ISBN: 978-3-540-31562-9

  • eBook Packages: Computer Science, Computer Science (R0)
