Succinct Suffix Arrays Based on Run-Length Encoding

  • Veli Mäkinen
  • Gonzalo Navarro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3537)

Abstract

A succinct full-text self-index is a data structure built on a text \(T = t_1 t_2 \ldots t_n\) that takes little space (ideally close to that of the compressed text), permits efficient search for the occurrences of a pattern \(P = p_1 p_2 \ldots p_m\) in \(T\), and is able to reproduce any text substring, so the self-index replaces the text. Several remarkable self-indexes have been developed in recent years. They usually take \(O(nH_0)\) or \(O(nH_k)\) bits, where \(H_k\) is the \(k\)th order empirical entropy of \(T\). The time to count how many times \(P\) occurs in \(T\) ranges from \(O(m)\) to \(O(m \log n)\).
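To make the entropy terms in these space bounds concrete, the following minimal sketch computes the empirical entropies of a short string. It assumes the usual definition in which \(H_k\) is the average, over all length-\(k\) contexts, of the zero-order entropy of the symbols that follow each context. The function names `h0` and `hk` are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict
from math import log2


def h0(symbols) -> float:
    """Zero-order empirical entropy, in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(symbols).values())


def hk(text: str, k: int) -> float:
    """k-th order empirical entropy, in bits per symbol.

    Assumption: each context is the k symbols preceding a position,
    and the per-context entropies are weighted by context frequency.
    """
    if k == 0:
        return h0(text)
    followers = defaultdict(list)
    for i in range(len(text) - k):
        followers[text[i:i + k]].append(text[i + k])
    return sum(len(f) * h0(f) for f in followers.values()) / len(text)
```

For a highly repetitive text such as `"abababab"`, `h0` reports one bit per symbol while `hk(text, 1)` drops to zero, which is why bounds stated in terms of \(nH_k\) can be much smaller than bounds in terms of \(nH_0\).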

We present a new self-index, called run-length FM-index (RLFM index), that counts the occurrences of \(P\) in \(T\) in \(O(m)\) time when the alphabet size is \(\sigma=O(\textrm{polylog}(n))\). The index requires \(nH_k \log_2 \sigma + O(n)\) bits of space for small \(k\). We then show how to implement the RLFM index in practice, and obtain in passing another implementation with different space-time tradeoffs. We empirically compare ours against the best existing implementations of other indexes and show that ours are fastest among indexes taking less space than the text.
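The counting query can be illustrated with the standard backward search over the Burrows-Wheeler transform, which takes one step per pattern symbol. This is a minimal sketch of the plain FM-index idea, not the RLFM index itself: it builds the transform by naive suffix sorting and answers rank queries by scanning, so it does not achieve the stated time or space bounds. The helper `run_lengths` only shows the runs of equal symbols that a run-length based index would exploit. All function names are hypothetical.

```python
from itertools import groupby


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via naive suffix sorting (illustration only)."""
    t = text + "\0"  # assumed terminator, smaller than every text symbol
    suffix_array = sorted(range(len(t)), key=lambda i: t[i:])
    return "".join(t[i - 1] for i in suffix_array)


def run_lengths(s: str):
    """Run-length encode s as a list of (symbol, run length) pairs."""
    return [(c, len(list(g))) for c, g in groupby(s)]


def count(bwt_text: str, pattern: str) -> int:
    """Count occurrences of pattern using backward search on the BWT."""
    freq = {}
    for c in bwt_text:
        freq[c] = freq.get(c, 0) + 1
    # C[c] = number of symbols in the text strictly smaller than c
    C, total = {}, 0
    for c in sorted(freq):
        C[c] = total
        total += freq[c]

    def rank(c: str, i: int) -> int:
        # Occurrences of c in bwt_text[:i]; a real index answers this in O(1).
        return bwt_text.count(c, 0, i)

    sp, ep = 0, len(bwt_text)  # half-open suffix array interval [sp, ep)
    for c in reversed(pattern):
        if c not in C:
            return 0
        sp = C[c] + rank(c, sp)
        ep = C[c] + rank(c, ep)
        if sp >= ep:
            return 0
    return ep - sp
```

With `b = bwt("mississippi")`, the call `count(b, "ssi")` performs three interval updates, one per pattern symbol, and `run_lengths(b)` groups the 12 transformed symbols into 9 runs.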

Keywords

Binary Sequence, Alphabet Size, Suffix Array, Text Size, Wavelet Tree
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Veli Mäkinen (1)
  • Gonzalo Navarro (2)
  1. AG Genominformatik, Technische Fakultät, Universität Bielefeld, Germany
  2. Center for Web Research, Dept. of Computer Science, University of Chile
