Abstract
We show that there are asymptotically \(\gamma n\) LMS-factors in a random word of length \(n\), for some explicit \(\gamma \) that depends on the model of randomness under consideration. Our results hold for uniform distributions, memoryless sources and Markovian sources. This analysis gives new insight into the typical behavior of the IS-algorithm [9], which is one of the most efficient algorithms available for computing the suffix array.
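To make the counted quantity concrete, the following minimal sketch classifies the positions of a word as S-type or L-type and lists the LMS positions, whose number governs the number of LMS-factors. It assumes the usual presentation of the IS-algorithm: a sentinel smaller than every letter is appended, a position is S-type when its suffix is smaller than the next suffix, and an LMS position is an S-type position preceded by an L-type one. The function names `lms_positions` and `lms_ratio`, the sentinel encoding and the uniform sampling are illustrative assumptions, not taken from the paper.

```python
import random


def lms_positions(word):
    """Return the LMS positions of `word` followed by a sentinel.

    Assumption: the sentinel is encoded as 0 and every letter as a
    strictly larger integer, so the sentinel is the smallest symbol.
    """
    codes = [ord(c) + 1 for c in word] + [0]
    n = len(codes)
    is_s = [False] * n
    is_s[-1] = True  # the sentinel position is S-type by convention
    # Scan right to left: compare each letter with its successor and
    # inherit the successor's type when the two letters are equal.
    for i in range(n - 2, -1, -1):
        if codes[i] < codes[i + 1]:
            is_s[i] = True
        elif codes[i] == codes[i + 1]:
            is_s[i] = is_s[i + 1]
    # An LMS position is S-type with an L-type position just before it.
    return [i for i in range(1, n) if is_s[i] and not is_s[i - 1]]


def lms_ratio(n, alphabet, seed=0):
    """Empirical number of LMS positions divided by n, uniform letters."""
    rng = random.Random(seed)
    word = "".join(rng.choice(alphabet) for _ in range(n))
    return len(lms_positions(word)) / n


if __name__ == "__main__":
    print(lms_positions("mmiissiissiippii"))
    print(lms_ratio(100_000, "acgt"))
```

Running `lms_ratio` for increasing `n` gives an empirical estimate that can be compared with the constant \(\gamma \) for the uniform model; the sketch counts the sentinel position as well, which only changes the count by a bounded amount.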
Notes
- 1. In their proof, they compute the mean length of an LMS-factor. The types of such a factor form a word of \(SS^{*}LL^{*}S\). For the considered model, the mean length of an element of \(S^{*}\) (and of \(L^{*}\)) is one. Hence, by linearity of expectation, the average length of an LMS-factor is \(1+1+1+1+1=5\) (and not the announced 4).
- 2. The formulas below hold when the extended letters are in \(\underline{\mathrm {LS}}(A)\) only. For instance, \(\alpha L=a_{1}L\) is not part of the definition, since it is not in \(\underline{\mathrm {LS}}(A)\).
- 3. This may be a consequence of the well-known fact that in a vertebrate genome, a C is very rarely followed by a G. This property is well captured by a Markov chain of order 1, but invisible to a memoryless model.
References
Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.): CPM 2003. LNCS, vol. 2676. Springer, Heidelberg (2003)
Dunham, I., Hunt, A., Collins, J., Bruskiewich, R., Beare, D., Clamp, M., Smink, L., Ainscough, R., Almeida, J., Babbage, A., et al.: The DNA sequence of human chromosome 22. Nature 402(6761), 489–495 (1999)
Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.): ICALP 2003. LNCS, vol. 2719. Springer, Heidelberg (2003)
Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Baeza-Yates et al. [1], pp. 186–199
Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. In: Baeza-Yates et al. [1], pp. 200–210
Levin, D.A., Peres, Y., Wilmer, E.L.: Markov chains and mixing times. American Mathematical Soc., Providence (2009)
Manber, U., Myers, E.W.: Suffix Arrays: A New Method for On-Line String Searches. SIAM J. Comput. 22(5), 935–948 (1993)
Manber, U., Myers, G.: Suffix Arrays: A New Method for On-Line String Searches. In: Johnson, D.S. (ed.) Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, 22–24 January 1990, pp. 319–327. SIAM (1990)
Nong, G., Zhang, S., Chan, W.H.: Two efficient algorithms for linear time suffix array construction. IEEE Trans. Comput. 60(10), 1471–1484 (2011)
Norris, J.R.: Markov chains. Statistical and probabilistic mathematics. Cambridge University Press, Cambridge (1998)
Powell, M.: The Canterbury Corpus (2001). http://www.corpus.canterbury.ac.nz/. Accessed 25 April 2002
Puglisi, S.J., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Comput. Surv. 39(2), 1–31 (2007)
Vallée, B.: Dynamical sources in information theory: fundamental intervals and word prefixes. Algorithmica 29(1–2), 262–306 (2001)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nicaud, C. (2015). A Probabilistic Analysis of the Reduction Ratio in the Suffix-Array IS-Algorithm. In: Cicalese, F., Porat, E., Vaccaro, U. (eds) Combinatorial Pattern Matching. CPM 2015. Lecture Notes in Computer Science, vol 9133. Springer, Cham. https://doi.org/10.1007/978-3-319-19929-0_32
DOI: https://doi.org/10.1007/978-3-319-19929-0_32
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19928-3
Online ISBN: 978-3-319-19929-0
eBook Packages: Computer Science (R0)