WADS 1989: Algorithms and Data Structures pp 206-217

# Digital data structures and order statistics

• Wojciech Szpankowski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 382)

## Abstract

This paper studies, in a probabilistic framework, how words (strings) can overlap and how this overlap relates to the height of the digital trees built from a set of words. A word is defined as a random (possibly infinite) sequence of symbols over a finite alphabet. The key notion is the alignment matrix $\{C_{ij}\}_{i,j=1}^{n}$, where $C_{ij}$ is the length of the longest string that is a prefix of both the $i$-th and the $j$-th word. It is proved that the height of the associated digital tree is simply related to the alignment matrix through some order statistics. In particular, using this observation and proving some inequalities for order statistics, we establish that the height of a digital trie under the independent model (i.e., all words are statistically independent) is asymptotically equal to $2\log_\alpha n$, where $n$ is the number of words stored in the trie and $\alpha$ is a parameter of the probabilistic model. Some extensions of the basic model to other digital trees, such as b-tries, tries with a random number of keys (Poisson model), and suffix trees (dependent keys!), are also briefly discussed.
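The link between the alignment matrix and the trie height can be illustrated with a small sketch. It is not the paper's code. It assumes the usual convention that each external node holds one word and that height counts edges from the root, in which case the height is one more than the largest off-diagonal entry of the matrix. The helper names, the fixed word length, and the choice $\alpha = 1/(p^2 + q^2)$ for a binary memoryless source are assumptions made for illustration, since the abstract does not define $\alpha$.

```python
import math
import random
from itertools import combinations


def common_prefix_length(u, v):
    """Length of the longest string that is a prefix of both u and v."""
    k = 0
    for a, b in zip(u, v):
        if a != b:
            break
        k += 1
    return k


def alignment_matrix(words):
    """Symmetric matrix C with C[i][j] = common prefix length of words i and j.

    The diagonal is left at 0 here (an illustrative choice); only the
    off-diagonal entries matter for the height.
    """
    n = len(words)
    C = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        C[i][j] = C[j][i] = common_prefix_length(words[i], words[j])
    return C


def trie_height_from_alignments(words):
    """Height as 1 + max off-diagonal alignment (needs at least two words)."""
    C = alignment_matrix(words)
    return 1 + max(C[i][j] for i, j in combinations(range(len(words)), 2))


def trie_height_direct(words):
    """Height obtained by recursively splitting words on successive symbols.

    Assumes the words are distinct and none is a prefix of another.
    """
    def depth(group, level):
        if len(group) <= 1:
            return level
        buckets = {}
        for w in group:
            buckets.setdefault(w[level], []).append(w)
        return max(depth(b, level + 1) for b in buckets.values())

    return depth(list(words), 0)


def random_words(n, length, p, rng):
    """n distinct binary words; each symbol is '1' with probability p."""
    words = set()
    while len(words) < n:
        words.add("".join("1" if rng.random() < p else "0" for _ in range(length)))
    return sorted(words)


if __name__ == "__main__":
    rng = random.Random(1989)
    n, p = 500, 0.5
    words = random_words(n, 64, p, rng)
    alpha = 1.0 / (p * p + (1 - p) * (1 - p))  # assumed form of alpha
    print("height via alignments:", trie_height_from_alignments(words))
    print("height via trie split:", trie_height_direct(words))
    print("2 * log_alpha(n):     ", 2 * math.log(n) / math.log(alpha))
```

The two height computations should agree on any set of distinct words where none is a prefix of another. The last printed value is only the asymptotic growth term, so a single finite sample need not match it closely.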

## Keywords

Probabilistic Framework, Suffix Tree, Dependent Random Variable, Finite Alphabet, External Node
