What can we learn about suffix trees from independent tries?

  • Philippe Jacquet
  • Wojciech Szpankowski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 519)

Abstract

A suffix tree of a word is a digital tree built from the suffixes of that word. We consider words that are random sequences of independent symbols over a finite alphabet. Our main finding is that the depths in a suffix tree are asymptotically equivalent to the depths in a digital tree that stores independent keys (i.e., an independent digital tree, also known as a trie). More precisely, we prove that the depths in a suffix tree built from the first n suffixes of a random word are normally distributed, with mean asymptotically equivalent to (1/h1)·log n and variance α·log n, where h1 is the entropy of the alphabet and α is a parameter of the probabilistic model. Our results provide new insights into the asymptotic properties of compression schemes, and therefore find direct applications in computer science and telecommunications, most notably in coding theory, theory of languages, and the design and analysis of algorithms.
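The comparison stated in the abstract can be explored numerically. The following is only a minimal simulation sketch, not the authors' analytic method: it builds the keys of a suffix tree and of an independent trie, computes leaf depths from longest common prefixes, and returns both mean depths next to the leading term (1/h1)·log n. All function names, the truncation of every key to `key_len` symbols, and the convention that a leaf lies one level below its longest shared prefix are assumptions made for illustration.

```python
import math
import random


def entropy(probs):
    """Shannon entropy h1 (in nats) of the symbol distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def leaf_depths(keys):
    """Leaf depth of every key in the trie over `keys`, in sorted-key order.

    Assumed convention: a leaf sits one level below the longest prefix it
    shares with any other key; that prefix is always shared with a
    lexicographic neighbour, so only adjacent sorted keys are compared.
    """
    ks = sorted(keys)
    adj = [lcp(ks[i], ks[i + 1]) for i in range(len(ks) - 1)]
    depths = []
    for i in range(len(ks)):
        left = adj[i - 1] if i > 0 else 0
        right = adj[i] if i < len(adj) else 0
        depths.append(1 + max(left, right))
    return depths


def random_word(length, probs, rng):
    """Random word of independent symbols drawn with probabilities `probs`."""
    symbols = [chr(ord("a") + i) for i in range(len(probs))]
    return "".join(rng.choices(symbols, weights=probs, k=length))


def compare(n, probs, key_len=64, seed=0):
    """Return (mean suffix-tree depth, mean trie depth, log(n) / h1).

    Keys are truncated to `key_len` symbols (a simulation shortcut), which
    is harmless as long as `key_len` is well above the typical depth.
    """
    rng = random.Random(seed)
    word = random_word(n + key_len, probs, rng)
    suffix_keys = [word[i:i + key_len] for i in range(n)]  # first n suffixes
    indep_keys = [random_word(key_len, probs, rng) for _ in range(n)]
    mean_suffix = sum(leaf_depths(suffix_keys)) / n
    mean_trie = sum(leaf_depths(indep_keys)) / n
    return mean_suffix, mean_trie, math.log(n) / entropy(probs)


if __name__ == "__main__":
    s, t, lead = compare(2000, [0.7, 0.3])
    print(f"suffix tree: {s:.2f}  independent trie: {t:.2f}  log(n)/h1: {lead:.2f}")
```

Under the abstract's result, the two simulated means should stay close to each other as n grows, and both should track (1/h1)·log n up to lower-order terms. A single run at moderate n is only suggestive, since the statement is asymptotic.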

Keywords

Compression Scheme, Repeated Pattern, Suffix Tree, Random String, Finite Alphabet
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1991

Authors and Affiliations

  • Philippe Jacquet (1)
  • Wojciech Szpankowski (2)
  1. INRIA Rocquencourt, Le Chesnay Cedex, France
  2. Department of Computer Science, Purdue University, W. Lafayette, U.S.A.
