Augmenting Suffix Trees, with Applications
Information retrieval and data compression are the two main application areas in which the rich theory of string algorithmics plays a fundamental role. In this paper, we consider one algorithmic problem from each of these areas and present highly efficient (linear or near-linear time) algorithms for both. Our algorithms rely on augmenting the suffix tree, a fundamental data structure in string algorithmics. The augmentations are nontrivial and form the technical crux of this paper. In particular, they consist of adding extra edges to suffix trees, resulting in Directed Acyclic Graphs (DAGs). Our algorithms construct these “suffix DAGs” and manipulate them to solve the two problems efficiently.
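For readers unfamiliar with the underlying structure, the following is a minimal sketch of the unaugmented starting point only. It assumes an uncompressed suffix trie built naively in quadratic time, which is a simplification of a true suffix tree and not the linear-time construction or the suffix DAG augmentation of this paper. The names `Node`, `build_suffix_trie`, and `contains` are hypothetical, and `$` is assumed not to occur in the input text.

```python
class Node:
    """One trie node; `children` maps a character to the child node."""

    def __init__(self):
        self.children = {}


def build_suffix_trie(text):
    """Insert every suffix of text + '$' into a trie (naive, O(n^2) time).

    Assumption: '$' does not occur in `text`; it serves as a terminator so
    that every suffix ends at its own leaf.
    """
    text += "$"
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Follow the existing edge for ch, or create it if missing.
            node = node.children.setdefault(ch, Node())
    return root


def contains(root, pattern):
    """Return True if `pattern` labels a path starting at the root,
    i.e. if it is a substring of the indexed text (plus terminator)."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return False
    return True
```

A substring query then walks one edge per pattern character, for example `contains(build_suffix_trie("banana"), "nan")`. Any extra edges of the kind the abstract describes would be stored alongside `children`; how they are chosen is not stated here and is not reproduced in this sketch.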
Keywords: Data Compression, Lexicographic Order, Conditional Entropy, Compression Scheme, Suffix Tree