From Nondeterministic Suffix Automaton to Lazy Suffix Tree

  • Chapter
Algorithms and Applications

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6060)

Abstract

Given two strings, a pattern P of length m and a text T of length n over an alphabet Σ of size σ, we consider the exact string matching problem, i.e. we want to report all occurrences of P in T. The well-known Backward-Nondeterministic-DAWG-Matching (BNDM) algorithm is one of the most efficient algorithms for short to moderate-length patterns. In this paper, as a prelude, we take the underlying nondeterministic suffix automaton and apply it to the text instead of the pattern. The resulting algorithm is surprisingly simple, and in practice it is efficient for relatively short patterns and small alphabets. We then show how the algorithm can be easily adapted to construct the suffix tree of T in a lazy manner. Both algorithms are efficient if the text is static but the patterns are given on-line (with no possibility of batching the queries). We discuss several variants of the algorithms and conclude with some experimental results.
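
For orientation, the classic BNDM search that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the textbook, pattern-side algorithm only; it is not the text-side variant or the lazy suffix tree construction proposed in the chapter, whose details are not given in the abstract. The function name `bndm_search` and the dictionary-based mask table `B` are assumptions made for this example. Python integers are unbounded, so the sketch does not enforce the usual restriction that m fit in a machine word.

```python
def bndm_search(pattern, text):
    """Return the 0-based start positions of all occurrences of pattern in text.

    Textbook BNDM: a bit-parallel simulation of the nondeterministic
    suffix automaton of the pattern, scanning each text window backwards.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    mask = (1 << m) - 1    # keeps the state vector at m bits
    high = 1 << (m - 1)    # set when a prefix of the pattern has been recognised

    # B[c] has bit (m - 1 - j) set iff pattern[j] == c.
    B = {}
    for j, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - j))

    occurrences = []
    pos = 0
    while pos <= n - m:
        j = m          # number of window characters not yet read
        last = m       # shift to apply once this window is done
        D = mask       # every automaton state is initially active
        while D != 0:
            # Read the window from right to left.
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & high:
                if j > 0:
                    # The suffix read so far is a pattern prefix:
                    # remember it as the next safe window start.
                    last = j
                else:
                    # The whole window was read and matches the pattern.
                    occurrences.append(pos)
            D = (D << 1) & mask
        pos += last
    return occurrences
```

For example, `bndm_search("aba", "abababa")` reports the overlapping occurrences at positions 0, 2 and 4. The window is abandoned as soon as the state vector `D` becomes zero, which is what lets the search skip text characters on average.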

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fredriksson, K. (2010). From Nondeterministic Suffix Automaton to Lazy Suffix Tree. In: Elomaa, T., Mannila, H., Orponen, P. (eds) Algorithms and Applications. Lecture Notes in Computer Science, vol 6060. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12476-1_8

  • DOI: https://doi.org/10.1007/978-3-642-12476-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12475-4

  • Online ISBN: 978-3-642-12476-1

  • eBook Packages: Computer Science, Computer Science (R0)