A New Compressed Suffix Tree Supporting Fast Search and Its Construction Algorithm Using Optimal Working Space
The compressed suffix array and the compressed suffix tree for a given string S are full-text index data structures occupying O(n log|Σ|) bits, where n is the length of S and Σ is the alphabet from which the symbols of S are drawn. When they were first introduced, they were constructed from suffix arrays and suffix trees, which means they were not constructed in optimal O(n log|Σ|)-bit working space. Recently, several methods were developed for constructing compressed suffix arrays and compressed suffix trees in optimal working space. With these methods, one can construct compressed suffix trees supporting pattern search in O(m′|Σ|) time, where m′ = m log^ε n, m is the length of the pattern, and log^ε n is the time to find the i-th smallest suffix of S from the compressed suffix array, for any fixed 0 < ε ≤ 1. However, these methods do not construct compressed suffix trees supporting pattern search in O(m′ log|Σ|) time.
In this paper, we present a new compressed suffix tree supporting O(m′ log|Σ|)-time pattern search, together with its construction algorithm using optimal working space. To obtain this result, we developed a new succinct representation of suffix trees, which differs from the classic succinct representation based on the parentheses encoding of suffix trees. Our succinct representation technique is generally applicable to the succinct representation of other search trees.
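As a rough illustration of where a |Σ| versus log|Σ| factor can come from in tree-based pattern search, the sketch below contrasts two ways of choosing the child of a node by the next pattern symbol. It is not the paper's data structure: the function names and the plain sorted list of branching symbols are assumptions made only for this example, and in the compressed setting each comparison would additionally cost the log^ε n lookup time.

```python
from bisect import bisect_left

def find_child_linear(labels, c):
    """Scan the children's first symbols one by one: up to |Σ| comparisons.

    `labels` is a hypothetical list of the first symbols on the edges
    leaving one node; it is not taken from the paper.
    """
    for i, ch in enumerate(labels):
        if ch == c:
            return i
    return -1

def find_child_binary(labels, c):
    """Binary search over the same symbols, assumed sorted: O(log|Σ|) comparisons."""
    i = bisect_left(labels, c)
    if i < len(labels) and labels[i] == c:
        return i
    return -1

# Example node with four outgoing edges, sorted by first symbol.
labels = ["a", "c", "g", "t"]
child_linear = find_child_linear(labels, "g")
child_binary = find_child_binary(labels, "g")
missing = find_child_binary(labels, "x")
```

Both functions return the same child index; only the number of comparisons per node differs, and that per-node cost is multiplied over the symbols of the pattern during a search.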
Keywords: Pattern Search, Construction Algorithm, Suffix Tree, Sign Array, Suffix Array