Digital data structures and order statistics

  • Wojciech Szpankowski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 382)

Abstract

This paper studies, in a probabilistic framework, some topics concerning the way words (strings) can overlap, and the relationship of this overlap to the height of the digital trees associated with a set of words. A word is defined as a random (possibly infinite) sequence of symbols over a finite alphabet. A key notion, the alignment matrix $\{C_{ij}\}_{i,j=1}^{n}$, is introduced, where $C_{ij}$ is the length of the longest string that is a prefix of both the $i$-th and the $j$-th word. It is proved that the height of the associated digital tree is simply related to the alignment matrix through some order statistics. In particular, using this observation and proving some inequalities for order statistics, we establish that the height of a digital trie under the independent model (i.e., all words are statistically independent) is asymptotically equal to $2\log_\alpha n$, where $n$ is the number of words stored in the trie and $\alpha$ is a parameter of the probabilistic model. Some extensions of our basic model to other digital trees, such as b-tries, tries with a random number of keys (the Poisson model), and suffix trees (dependent keys!), are also briefly discussed.
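A minimal Python sketch of the alignment-matrix idea may help. It assumes finite, distinct words, none of which is a prefix of another (a stand-in for infinite words), and a trie whose root sits at depth 0 and whose external nodes each hold exactly one word. Under that convention we assume the relation $H_n = 1 + \max_{i<j} C_{ij}$, i.e. the height is one more than the largest off-diagonal order statistic of the alignment matrix. The helper names (`common_prefix_length`, `alignment_matrix`, `trie_height`, `height_from_alignment`, `random_word`) are hypothetical, and taking $\alpha = 1/\sum_i p_i^2$ for a memoryless source with symbol probabilities $p_i$ is an assumption drawn from the trie-height literature, not from this abstract.

```python
import math
import random
from itertools import combinations


def common_prefix_length(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b (the entry C_ij)."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def alignment_matrix(words):
    """n x n matrix with C[i][j] = longest common prefix length of words i and j.
    The diagonal holds each word's own length; the height formula ignores it."""
    n = len(words)
    return [[common_prefix_length(words[i], words[j]) for j in range(n)]
            for i in range(n)]


def trie_height(words, depth=0):
    """Height of the trie built directly on `words`: branch on the symbol at
    position `depth` until each subtree holds one word (an external node).
    Assumes distinct words, none a prefix of another (otherwise w[depth] fails)."""
    if len(words) <= 1:
        return depth
    children = {}
    for w in words:
        children.setdefault(w[depth], []).append(w)
    return max(trie_height(group, depth + 1) for group in children.values())


def height_from_alignment(C):
    """Assumed relation: height = 1 + largest off-diagonal entry (needs n >= 2)."""
    n = len(C)
    return 1 + max(C[i][j] for i, j in combinations(range(n), 2))


def random_word(probs, length, rng):
    """Finite prefix of a word from a memoryless source; symbols are '0', '1', ..."""
    symbols = [str(s) for s in range(len(probs))]
    return "".join(rng.choices(symbols, weights=probs, k=length))


if __name__ == "__main__":
    rng = random.Random(1989)
    probs = [0.5, 0.5]                        # symmetric binary source (assumed)
    alpha = 1 / sum(p * p for p in probs)     # assumed alpha = 1 / sum p_i^2
    n = 500
    words = [random_word(probs, 128, rng) for _ in range(n)]
    h = trie_height(words)
    assert h == height_from_alignment(alignment_matrix(words))
    print(f"height = {h}, 2*log_alpha(n) = {2 * math.log(n, alpha):.2f}")
```

Running the script prints the simulated height next to $2\log_\alpha n$; because the result is asymptotic, for moderate $n$ the two numbers are only roughly comparable. Building the trie and taking the maximum over the alignment matrix are two independent computations, so the assertion checks the assumed height relation rather than restating it.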

Keywords

Probabilistic Framework, Suffix Tree, Dependent Random Variable, Finite Alphabet, External Node
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 1989

Authors and Affiliations

  • Wojciech Szpankowski, Department of Computer Science, Purdue University, West Lafayette, USA