Augmenting Suffix Trees, with Applications

  • Yossi Matias
  • S. Muthukrishnan
  • Süleyman Cenk Sahinalp
  • Jacob Ziv
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1461)


Information retrieval and data compression are the two main application areas where the rich theory of string algorithmics plays a fundamental role. In this paper, we consider one algorithmic problem from each of these areas and present highly efficient (linear or near linear time) algorithms for both problems. Our algorithms rely on augmenting the suffix tree, a fundamental data structure in string algorithmics. The augmentations are nontrivial and they form the technical crux of this paper. In particular, they consist of adding extra edges to suffix trees, resulting in Directed Acyclic Graphs (DAGs). Our algorithms construct these “suffix DAGs” and manipulate them to solve the two problems efficiently.
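As background for the abstract above, the following is a minimal sketch of the underlying idea of indexing all suffixes of a string. It is not the authors' suffix DAG, and it is not a linear-time construction such as McCreight's: it builds an uncompressed suffix trie in quadratic time purely for illustration. The names `Node`, `build_suffix_trie`, `contains`, and `count_leaves` are hypothetical, and the terminator `$` is assumed not to occur in the input.

```python
class Node:
    """One trie node; children maps a character to a child Node."""
    def __init__(self):
        self.children = {}


def build_suffix_trie(text):
    """Insert every suffix of text + '$' into a trie.

    Naive illustration only: quadratic time and space.
    Assumes '$' does not occur in text.
    """
    root = Node()
    s = text + "$"
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.children.setdefault(ch, Node())
    return root


def contains(root, pattern):
    """Return True if pattern is a substring of the indexed text."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return False
    return True


def count_leaves(node):
    """Count leaves; with a unique terminator there is one per suffix."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children.values())


root = build_suffix_trie("banana")
print(contains(root, "nan"))   # substring query by walking from the root
print(count_leaves(root))      # one leaf per suffix of "banana$"
```

Every substring of the text is a prefix of some suffix, which is why a walk from the root answers substring queries. A true suffix tree compresses unary paths into single edges, and the augmentations described in the abstract add further edges on top of that structure.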


Keywords: Data Compression, Lexicographic Order, Conditional Entropy, Compression Scheme, Suffix Tree





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Yossi Matias (1, 2)
  • S. Muthukrishnan (2)
  • Süleyman Cenk Sahinalp (3, 4)
  • Jacob Ziv (5)

  1. Department of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
  2. Bell Labs, Murray Hill, USA
  3. Department of Computer Science, University of Warwick, Coventry, UK
  4. Center for BioInformatics, University of Pennsylvania, Philadelphia, USA
  5. Department of Electrical Engineering, Technion, Haifa, Israel
