
A Probabilistic Analysis of the Reduction Ratio in the Suffix-Array IS-Algorithm

  • Conference paper
Combinatorial Pattern Matching (CPM 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9133)


Abstract

We show that there are asymptotically \(\gamma n\) LMS-factors in a random word of length \(n\), for some explicit \(\gamma\) that depends on the model of randomness under consideration. Our results hold for uniform distributions, memoryless sources and Markovian sources. From this analysis, we gain new insight into the typical behavior of the IS-algorithm [9], which is one of the most efficient algorithms available for computing the suffix array.
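
To make the quantity \(\gamma n\) concrete, the following minimal Python sketch classifies each position of a word as S-type or L-type, lists the LMS positions, and estimates the proportion of LMS positions in a random word empirically. It assumes the usual conventions of the IS-algorithm [9]: a virtual end-of-word sentinel that is smaller than every letter and is S-type. The function names `lms_positions` and `empirical_lms_ratio`, the alphabet, and the uniform sampling are illustrative choices made here, not taken from the paper.

```python
import random


def lms_positions(word):
    """Return the LMS positions of `word` followed by a virtual sentinel.

    Assumed convention: the sentinel at index len(word) is smaller than
    every letter and is S-type. Position i is S-type when suffix i is
    smaller than suffix i + 1 and L-type otherwise; it is LMS when it is
    S-type and position i - 1 is L-type.
    """
    n = len(word)
    is_s = [False] * (n + 1)
    is_s[n] = True  # the sentinel is S-type by convention
    # Index n - 1 stays L-type: the last letter is larger than the sentinel.
    # Scan right to left; equal letters inherit the type of their successor.
    for i in range(n - 2, -1, -1):
        if word[i] < word[i + 1]:
            is_s[i] = True
        elif word[i] == word[i + 1]:
            is_s[i] = is_s[i + 1]
    return [i for i in range(1, n + 1) if is_s[i] and not is_s[i - 1]]


def empirical_lms_ratio(n, alphabet="ACGT", seed=0):
    """Proportion of LMS positions in one uniform random word of length n."""
    rng = random.Random(seed)
    word = [rng.choice(alphabet) for _ in range(n)]
    return len(lms_positions(word)) / n


if __name__ == "__main__":
    print(lms_positions("mmiissiissiippii"))  # [2, 6, 10, 16]
    print(empirical_lms_ratio(100_000))
```

Replacing the sampling line with draws from a memoryless or Markovian source would give the corresponding empirical proportion. The value of \(\gamma\) itself is derived in the paper and is not reproduced here.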


Notes

  1. In their proof, they compute the mean length of an LMS-factor. The types of such a factor form a word of \(SS^{*}LL^{*}S\). For the considered model, the mean length of an element of \(S^{*}\) (and of \(L^{*}\)) is one. Hence, the average length of an LMS-factor is 5 (and not the announced 4).

  2. The formulas below hold when the extended letters are in \(\underline{\mathrm {LS}}(A)\) only. For instance, \(\alpha L=a_{1}L\) is not part of the definition, since it is not in \(\underline{\mathrm {LS}}(A)\).

  3. This may be a consequence of the well-known fact that in a vertebrate genome, a C is very rarely followed by a G. This property is well captured by a Markov chain of order 1, but invisible to a memoryless model.
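
The count in Note 1 can be written out as a short worked computation. This is only a sketch: the random variables \(X_S\) and \(X_L\) (the lengths of the \(S^{*}\) and \(L^{*}\) runs) are notation introduced here, and the only input is the note's statement that each has mean one in the considered model. The three fixed letters \(S\), \(L\), \(S\) each contribute one, and linearity of expectation gives:

```latex
\[
\mathbb{E}\bigl[\,\lvert SS^{*}LL^{*}S \rvert\,\bigr]
  = 1 + \mathbb{E}[X_S] + 1 + \mathbb{E}[X_L] + 1
  = 1 + 1 + 1 + 1 + 1
  = 5 .
\]
```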

References

  1. Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.): CPM 2003. LNCS, vol. 2676. Springer, Heidelberg (2003)


  2. Dunham, I., Hunt, A., Collins, J., Bruskiewich, R., Beare, D., Clamp, M., Smink, L., Ainscough, R., Almeida, J., Babbage, A., et al.: The DNA sequence of human chromosome 22. Nature 402(6761), 489–495 (1999)


  3. Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.): ICALP 2003. LNCS, vol. 2719. Springer, Heidelberg (2003)


  4. Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Baeza-Yates et al. [1], pp. 186–199


  5. Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. In: Baeza-Yates et al. [1], pp. 200–210


  6. Levin, D.A., Peres, Y., Wilmer, E.L.: Markov chains and mixing times. American Mathematical Soc., Providence (2009)


  7. Manber, U., Myers, E.W.: Suffix Arrays: A New Method for On-Line String Searches. SIAM J. Comput. 22(5), 935–948 (1993)


  8. Manber, U., Myers, G.: Suffix Arrays: A New Method for On-Line String Searches. In: Johnson, D.S. (ed.) Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, 22–24 January 1990, pp. 319–327. SIAM (1990)


  9. Nong, G., Zhang, S., Chan, W.H.: Two efficient algorithms for linear time suffix array construction. IEEE Trans. Comput. 60(10), 1471–1484 (2011)


  10. Norris, J.R.: Markov chains. Statistical and probabilistic mathematics. Cambridge University Press, Cambridge (1998)


  11. Powell, M.: The Canterbury Corpus (2001). http://www.corpus.canterbury.ac.nz/. Accessed 25 April 2002

  12. Puglisi, S.J., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Comput. Surv. 39(2), 1–31 (2007)


  13. Vallée, B.: Dynamical sources in information theory: fundamental intervals and word prefixes. Algorithmica 29(1–2), 262–306 (2001)




Corresponding author

Correspondence to Cyril Nicaud.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nicaud, C. (2015). A Probabilistic Analysis of the Reduction Ratio in the Suffix-Array IS-Algorithm. In: Cicalese, F., Porat, E., Vaccaro, U. (eds) Combinatorial Pattern Matching. CPM 2015. Lecture Notes in Computer Science, vol 9133. Springer, Cham. https://doi.org/10.1007/978-3-319-19929-0_32


  • DOI: https://doi.org/10.1007/978-3-319-19929-0_32


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19928-3

  • Online ISBN: 978-3-319-19929-0

  • eBook Packages: Computer Science (R0)
