Algorithmica, Volume 23, Issue 3, pp 246–260

Suffix Trees on Words

  • A. Andersson
  • N. J. Larsson
  • K. Swanson

Abstract.

We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multicharacter substrings or words. This word suffix tree represents only the m suffixes that start at word boundaries. These boundaries are determined by delimiters, whose definition depends on the application.
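To make the definition concrete, the following is a minimal sketch of what a word suffix tree indexes, not the construction algorithm of the paper. It builds a naive, uncompressed trie over only the suffixes that begin at word boundaries, which takes far more than O(m) space. The function names and the convention that a word starts at the first non-delimiter character after a delimiter are assumptions made for illustration.

```python
def word_suffix_starts(text, delimiters=" "):
    """Return the start positions of suffixes that begin at word boundaries.

    Assumed convention: a word starts at the first non-delimiter character
    of the text or at a non-delimiter character following a delimiter.
    """
    starts = []
    at_boundary = True
    for i, ch in enumerate(text):
        if ch in delimiters:
            at_boundary = True
        elif at_boundary:
            starts.append(i)
            at_boundary = False
    return starts


def build_naive_word_suffix_trie(text, delimiters=" "):
    """Insert only the word-aligned suffixes into an uncompressed trie.

    Each node is a dict from character to child node. The key None holds
    the start positions of the suffixes that end at that node.
    """
    root = {}
    for start in word_suffix_starts(text, delimiters):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        node.setdefault(None, []).append(start)
    return root


def find_word_aligned(root, pattern):
    """Return sorted positions where pattern occurs starting at a word boundary."""
    node = root
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    # Collect the start positions of every suffix below the matched node.
    positions = []
    stack = [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key is None:
                positions.extend(child)
            else:
                stack.append(child)
    return sorted(positions)


text = "to be or not to be"
trie = build_naive_word_suffix_trie(text)
print(word_suffix_starts(text))       # six word-aligned suffixes out of 18 characters
print(find_word_aligned(trie, "to"))
print(find_word_aligned(trie, "o"))   # only the occurrence that starts the word "or"
```

Here n = 18 and m = 6, so only six suffixes are indexed. A query for "o" reports the word "or" but not the "o" inside "to" or "not", because those occurrences do not start at a word boundary.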

Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In general, construction cost is Ω(n) because the entire input must be scanned. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. In either case, this is a significant improvement over previously known solutions.

Furthermore, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time.
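The packing assumption can be illustrated with a small sketch. It is not taken from the paper: the function name `pack_chars`, the 64-bit word size, and the fixed-width code per character are assumptions chosen for illustration. With an alphabet of size σ, each character needs ⌈log₂ σ⌉ bits, so a machine word of w bits holds ⌊w / ⌈log₂ σ⌉⌋ characters.

```python
def pack_chars(text, alphabet, word_bits=64):
    """Pack characters from a small alphabet into fixed-size integer words.

    Returns (words, bits_per_char). Each character is replaced by its index
    in `alphabet`, encoded with the minimum fixed number of bits.
    """
    bits = max(1, (len(alphabet) - 1).bit_length())
    per_word = word_bits // bits
    code = {ch: i for i, ch in enumerate(alphabet)}
    words = []
    for i in range(0, len(text), per_word):
        word = 0
        for ch in text[i:i + per_word]:
            word = (word << bits) | code[ch]
        words.append(word)
    return words, bits


dna = "ACGT" * 16                      # 64 characters over a 4-letter alphabet
words, bits = pack_chars(dna, "ACGT")
print(bits, len(words))                # 2 bits per character, 2 machine words
```

In this example 64 characters fit in two 64-bit words, which is the kind of compact input representation the sublinear bound relies on.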

Key words. Suffix trees, Substring searching. 

Copyright information

© Springer-Verlag New York Inc. 1999

Authors and Affiliations

  • A. Andersson (1)
  • N. J. Larsson (1)
  • K. Swanson (1)
  1. Department of Computer Science, Lund University, Box 118, S-221 00 Lund, Sweden. arne@dna.lth.se, jesper@dna.lth.se, kurt@dna.lth.se