A Probabilistic Analysis of the Reduction Ratio in the Suffix-Array IS-Algorithm

Nicaud, Cyril

doi:10.1007/978-3-319-19929-0_32

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9133)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

We show that there are asymptotically \(\gamma n\) LMS-factors in a random word of length \(n\), for some explicit \(\gamma \) that depends on the model of randomness under consideration. Our results hold for uniform distributions, memoryless sources and Markovian sources. From this analysis, we give new insight on the typical behavior of the IS-algorithm [9], which is one of the most efficient algorithms available for computing the suffix array.

Notes

1. In their proof, they compute the mean length of an LMS-factor. The types of such a factor form a word of \(SS^{*}LL^{*}S\). For the considered model, the mean length of an element of \(S^{*}\) (and of \(L^{*}\)) is one. Hence, the average length of an LMS-factor is 5 (and not the announced 4).
2. The formulas below hold when the extended letters are in \(\underline{\mathrm {LS}}(A)\) only. For instance, \(\alpha L=a_{1}L\) is not part of the definition, since it is not in \(\underline{\mathrm {LS}}(A)\).
3. This may be a consequence of the well-known fact that in a vertebrate genome, a C is very rarely followed by a G. This property is well captured by a Markov chain of order 1, but invisible to a memoryless model.
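
The count in note 1 can be spelled out as follows. This is a sketch under an assumption that the note does not state explicitly: in "the considered model", each type is S or L independently with probability \(1/2\), so that a run in \(S^{*}\) (or in \(L^{*}\)) has length \(k\) with probability \(2^{-(k+1)}\). The symbol \(X\) below is introduced here for the length of such a run.

\[
\mathbb {E}[X] \;=\; \sum _{k\ge 0} k\,2^{-(k+1)} \;=\; \frac{1}{2}\cdot \frac{1/2}{(1-1/2)^{2}} \;=\; 1 .
\]

A word of \(SS^{*}LL^{*}S\) consists of three fixed letters and two such runs, so its mean length is

\[
1 + \mathbb {E}[X] + 1 + \mathbb {E}[X] + 1 \;=\; 3 + 2\cdot 1 \;=\; 5 .
\]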

References

1. Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.): CPM 2003. LNCS, vol. 2676. Springer, Heidelberg (2003)
2. Dunham, I., Hunt, A., Collins, J., Bruskiewich, R., Beare, D., Clamp, M., Smink, L., Ainscough, R., Almeida, J., Babbage, A., et al.: The DNA sequence of human chromosome 22. Nature 402(6761), 489–495 (1999)
3. Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.): ICALP 2003. LNCS, vol. 2719. Springer, Heidelberg (2003)
4. Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Baeza-Yates et al. [1], pp. 186–199
5. Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. In: Baeza-Yates et al. [1], pp. 200–210
6. Levin, D.A., Peres, Y., Wilmer, E.L.: Markov chains and mixing times. American Mathematical Soc., Providence (2009)
7. Manber, U., Myers, E.W.: Suffix Arrays: A New Method for On-Line String Searches. SIAM J. Comput. 22(5), 935–948 (1993)
8. Manber, U., Myers, G.: Suffix Arrays: A New Method for On-Line String Searches. In: Johnson, D.S. (ed.) Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, pp. 319–327. SIAM, 22–24 January 1990
9. Nong, G., Zhang, S., Chan, W.H.: Two efficient algorithms for linear time suffix array construction. IEEE Trans. Comput. 60(10), 1471–1484 (2011)
10. Norris, J.R.: Markov chains. Statistical and probabilistic mathematics. Cambridge University Press, Cambridge (1998)
11. Powell, M.: The Canterbury Corpus (2001). http://www.corpus.canterbury.ac.nz/. Accessed 25 April 2002
12. Puglisi, S.J., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Comput. Surv. 39(2), 1–31 (2007)
13. Vallée, B.: Dynamical sources in information theory: fundamental intervals and word prefixes. Algorithmica 29(1–2), 262–306 (2001)

Author information

Authors and Affiliations

LIGM, Université Paris-Est and CNRS, Marne-la-Vallée Cedex 2, 77454, Paris, France
Cyril Nicaud

Corresponding author

Correspondence to Cyril Nicaud.

Editor information

Editors and Affiliations

Department of Computer Science, University of Verona, Verona, Italy
Ferdinando Cicalese
Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Ely Porat
Department of Computer Science, University of Salerno, Fisciano, Italy
Ugo Vaccaro

About this paper

Cite this paper

Nicaud, C. (2015). A Probabilistic Analysis of the Reduction Ratio in the Suffix-Array IS-Algorithm. In: Cicalese, F., Porat, E., Vaccaro, U. (eds) Combinatorial Pattern Matching. CPM 2015. Lecture Notes in Computer Science, vol 9133. Springer, Cham. https://doi.org/10.1007/978-3-319-19929-0_32

DOI: https://doi.org/10.1007/978-3-319-19929-0_32
Published: 16 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19928-3
Online ISBN: 978-3-319-19929-0
eBook Packages: Computer Science (R0)
