Weighted Depths of Labelled Nodes
In the permutation model, let \(A_{j,k}\) be the event that the node labelled k is in the subtree of the node labelled j. Then, \(D_k(n) = \sum _{j=1}^n \mathbf {1}_{ A_{j,k} } -1 \) and \(W_k(n) = \sum _{j=1}^n j \mathbf {1}_{ A_{j,k} }\). It is easy to see that \(A_{1, k}, \ldots , A_{k-1,k}\) and \(A_{k+1,k}, \ldots , A_{n,k}\) are two families of independent events; however, there exist subtle dependencies between the two families. Following the approach in [9], let \(B_{j,k} = A_{j,k-1}\) for \(j < k\) and \(B_{j,k} = A_{j,k+1}\) for \(j > k\). For convenience, let \(B_{k,k}\) be an almost sure event. The following lemma summarizes results in [9], and we refer to this paper for a proof. In this context, note that Devroye [8] gives distributional representations as sums of independent (or m-dependent) indicator variables for quantities growing linearly in n, such as the number of leaves.
Lemma 1
Let \(1 \le k \le n\). Then, the events \(B_{j,k}, j = 1, \ldots , n\), are independent. For \(j \ne k\), we have
$$\begin{aligned} \mathbb {P} \left( A_{j,k} \right) = \frac{1}{|k-j| + 1}, \quad \mathbb {P} \left( B_{j,k} \right) = \frac{1}{|k-j|}. \end{aligned}$$
From the lemma, it follows that
$$\begin{aligned}&\mathbf {E} \left[ \sum _{j = 1}^n \mathbf {1}_{ B_{j,k} \backslash A_{j,k} } \right] \le 2, \quad \text {and} \quad \mathbf {E} \left[ \sum _{j = 1}^n j \mathbf {1}_{ B_{j,k} \backslash A_{j,k} } \right] \le 2k + \log n. \end{aligned}$$
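The ancestor probabilities in Lemma 1 are easy to check empirically. The following Python sketch (ours, purely illustrative; the helper names are ad hoc) estimates \(\mathbb {P} \left( A_{j,k} \right) \) in the permutation model, using the standard fact that the ancestors of the node labelled k (including k itself) are exactly the values that, scanned in insertion order, fall into the interval currently bracketing k.

```python
# Monte Carlo sketch (ours): estimate P(A_{j,k}) = 1/(|k-j|+1) in the permutation model.
import random

def ancestors(perm, k):
    """Values on the root-to-k path in the BST built by inserting perm in the given order."""
    path, lo, hi = [], 0, len(perm) + 1
    for v in perm:
        if lo < v < hi:          # v is the next ancestor of k
            path.append(v)
            if v == k:
                break
            elif v < k:
                lo = v
            else:
                hi = v
    return path

def estimate(n, j, k, reps=100_000, seed=1):
    rng, hits = random.Random(seed), 0
    base = list(range(1, n + 1))
    for _ in range(reps):
        rng.shuffle(base)
        hits += j in ancestors(base, k)
    return hits / reps

print(estimate(10, 3, 7), 1 / (abs(7 - 3) + 1))   # both values should be close to 0.2
```

Replacing the membership test by `j in ancestors(base, k + 1)` gives the analogous check for \(B_{j,k}\) with \(j > k\).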
The ideas in [9] can also be used to analyse second (mixed) moments. Straightforward calculations show the following bounds:
$$\begin{aligned}&\mathbf {E} \left[ \sum _{i,j = 1}^n \mathbf {1}_{ B_{j,k} } \mathbf {1}_{ B_{i,k} \backslash A_{i,k} } \right] = O(1), \quad \text {and}\\&\mathbf {E} \left[ \sum _{i,j = 1}^n i j \mathbf {1}_{ B_{j,k} } \mathbf {1}_{ B_{i,k} \backslash A_{i,k} } \right] = O(k^2 + k(\log n)^2). \end{aligned}$$
Here, both O-terms are uniform in \(1 \le k \le n\). Define \(\bar{D}_k(n) = \sum _{j=1}^n \mathbf {1}_{ B_{j,k} }-1\) and \({\bar{W}}_k(n) = \sum _{j=1}^n j \mathbf {1}_{ B_{j,k} }\). We make the following observation:
For \(i=1,2, n \ge 0\) and \(1 \le k \le n\), set \(H^{(i)}_{n} := \sum _{j=1}^n j^{-i}\) and \(H^{(i)}_{k,n} := H^{(i)}_{k-1} + H^{(i)}_{n-k}\). Using Lemma 1, one easily computes
$$\begin{aligned} \mathbf {E} \left[ {\bar{W}}_k(n) \right]&= k (H_{k,n}^{(1)}-1) + n + 1 ,\\ \text {Var}({\bar{W}}_k(n))&= k^2 \left( H_{k,n}^{(1)} - H_{k,n}^{(2)}-3\right) + \frac{n^2}{2} + kn + 2k \left( H^{(1)}_{k-1} - H^{(1)}_{n-k}\right) - \frac{n}{2} + k + 1. \end{aligned}$$
As \(H^{(1)}_{n} = \log (n+1) + O(1)\) and \(H^{(2)}_n = O(1)\), both expansions (10) and (11) follow from observation O.
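As a sanity check of the formula for the mean (a sketch of ours; the variance can be tested along the same lines), one can compare the closed form with the direct sum \(\sum _{j \ne k} j / |j-k| + k\) obtained from Lemma 1, the term \(j = k\) contributing k since \(B_{k,k}\) holds almost surely.

```python
# Exact check (ours) of E[ bar W_k(n) ] = k (H^{(1)}_{k,n} - 1) + n + 1 against the direct
# sum of j * P(B_{j,k}); exact rational arithmetic avoids any rounding issues.
from fractions import Fraction

def H1(m):
    return sum(Fraction(1, j) for j in range(1, m + 1))

def mean_direct(n, k):
    return sum(Fraction(j, abs(j - k)) for j in range(1, n + 1) if j != k) + k

def mean_closed(n, k):
    return k * (H1(k - 1) + H1(n - k) - 1) + n + 1

for n, k in [(10, 1), (10, 4), (25, 13), (40, 40)]:
    assert mean_direct(n, k) == mean_closed(n, k)
print("closed form confirmed on the test cases")
```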
Weighted Depths of Large Nodes
We prove Theorem 1. First, (12) follows from (4) and
$$\begin{aligned} \mathbf {E} \left[ \left| k D_k(n) - W_k(n) \right| \right] \le k + \sum _{j=1}^n |k-j| \mathbb {P} \left( A_{j,k} \right) \le k+n. \end{aligned}$$
(23)
For \(k = \omega (n/\sqrt{\log n})\), combining (4), (5) and (10), in distribution,
$$\begin{aligned} \left( \frac{D_k(n) - \mathbf {E} \left[ D_k(n) \right] }{\sigma _{ D_k(n)}}, \frac{W_k(n) - \mathbf {E} \left[ W_k(n) \right] }{\sigma _{W_k(n)}} \right) \rightarrow (\mathscr {N}, \mathscr {N}). \end{aligned}$$
From here, statement (13) follows from (4) and (10).
Considering the last inserted node with value \(Y_n\), note that, conditionally on \(Y_n = k\), the correlations between the events \(A_{j,k}, j < k\) and \(A_{j,k}, j > k\) vanish. More precisely, given \(Y_n = k\), the family \(\{ \mathbf {1}_{ A_{j,k} }, j = 1, \ldots , n\}\) is distributed like a family of independent Bernoulli random variables \(\{V_{j,k}: j = 1, \ldots , n\}\) with \(\mathbb {P} \left( V_{j,k}=1 \right) = |k-j|^{-1}\) for \(j \ne k\) and \(\mathbb {P} \left( V_{k,k}=1 \right) = 1\). Thus,
$$\begin{aligned} \mathbf {E} \left[ |Y_n (X_n+1) - {\mathbb {X}}_n| \right]&\le \frac{1}{n} \sum _{k=1}^n \mathbf {E} \left[ \sum _{j=1}^n |k-j| \mathbf {1}_{ A_{j,k} } \Bigg | Y_n = k \right] \\&= \frac{1}{n} \sum _{k=1}^n\mathbf {E} \left[ \sum _{j=1}^n |k-j| V_{j,k} \right] \le n. \end{aligned}$$
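The conditional representation above also gives a convenient way to simulate the depth of the last inserted node. The following sketch (ours) draws \(X_n\) given \(Y_n = k\) as a sum of independent Bernoulli variables and illustrates the concentration around \(2 \log n\) used next.

```python
# Illustrative sketch (ours): given Y_n = k, the depth X_n of the last inserted node is a
# sum of independent Bernoulli variables with success probabilities 1/|k-j|, j != k.
import math
import random

def sample_depth_given_k(n, k, rng):
    return sum(1 for j in range(1, n + 1)
               if j != k and rng.random() < 1.0 / abs(k - j))

rng = random.Random(2024)
n = 20_000
k = rng.randint(1, n)                 # Y_n is uniform on {1, ..., n}
samples = [sample_depth_given_k(n, k, rng) for _ in range(300)]
print(sum(samples) / len(samples), 2 * math.log(n))   # empirical mean vs. 2 log n
```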
By (3), we have \(X_n / \log n \rightarrow 2\) in probability. Hence, in order to prove (14), it suffices to show that, in distribution,
$$\begin{aligned} \left( \frac{X_n - 2 \log n}{\sqrt{2 \log n}}, \frac{Y_n}{n} \right) \rightarrow \left( \mathscr {N}, \xi \right) . \end{aligned}$$
(24)
For a sequence \((k_n)\) satisfying \(sn \le k_n \le tn\) for \(0< s< t < 1\), let us condition on the event \(Y_n = k_n\). Then, by the central limit theorem for triangular arrays of row-wise independent uniformly bounded random variables with diverging variance applied to \(V_{j, k_n}, j =1, \ldots , n\), in distribution,
$$\begin{aligned} \frac{X_n - 2 \log n}{\sqrt{2 \log n}} \rightarrow \mathscr {N}. \end{aligned}$$
Hence, (24) follows from an application of the theorem of dominated convergence noting that \(Y_n\) is uniformly distributed on \(\{1, \ldots , n\}\).
Weighted Depths of Small Nodes
We prove Theorem 2. Let \({\bar{D}}^>_k(n) = \sum _{j = k+1}^n \mathbf {1}_{ B_{j,k} }\) and \({\bar{W}}^>_k(n) = \sum _{j = k+1}^n j \mathbf {1}_{ B_{j,k} }\). Since \(k = O(n/ \sqrt{\log n})\), the same calculation as in (23) shows that,
$$\begin{aligned} \frac{\mathbf {E} \left[ | {\bar{W}}_k(n) - {\bar{W}}^>_k(n) - k ({\bar{D}}_k(n) - {\bar{D}}^>_k(n) )| \right] }{n} \le \frac{k}{n} \rightarrow 0, \quad n \rightarrow \infty . \end{aligned}$$
(25)
For \(\lambda , \mu \in \mathbb {R}\), we have
$$\begin{aligned} \log&\mathbf {E} \left[ \exp \left( i \lambda \left( {\bar{D}}^>_k(n) - \log n\right) / \sqrt{\log n} + i \mu \left( {\bar{W}}^>_k(n) - k {\bar{D}}^>_k(n)\right) /n\right) \right] \\&= - i\lambda \sqrt{\log n} + \log \mathbf {E} \left[ \exp \left( i \sum _{j = k+1}^n \left( \frac{\lambda }{\sqrt{\log n}} + \mu \frac{j -k}{n}\right) \mathbf {1}_{ B_{j,k} } \right) \right] \\&= - i\lambda \sqrt{\log n} + \sum _{j=k+1}^n \log \left( 1 + \frac{ \exp \left( i \left( \frac{\lambda }{\sqrt{\log n}} + \mu \frac{j -k}{n} \right) \right) -1}{j-k} \right) . \end{aligned}$$
By a standard Taylor expansion, the last display equals
$$\begin{aligned}&-i \lambda \sqrt{\log n} + \sum _{j=k+1}^n \frac{ \exp \left( i \left( \frac{\lambda }{\sqrt{\log n}} + \mu \frac{j -k}{n} \right) \right) -1}{j-k} + o(1) \\&\quad = - i\lambda \sqrt{\log n} + \sum _{j=k+1}^n \frac{ \exp \left( i \mu \frac{j -k}{n} \right) \left( 1 + \frac{i \lambda }{\sqrt{\log n}} - \frac{\lambda ^2}{2\log n} \right) -1}{j-k} + o(1) \\&\quad = - \lambda ^2 /2 + \left( 1 + \frac{i \lambda }{\sqrt{\log n}} - \frac{\lambda ^2}{2 \log n} \right) \sum _{j=0}^{n-1} \frac{ \exp \left( i \mu \frac{j+1}{n} \right) -1}{j+1} + o(1) \\&\quad = - \lambda ^2/2 + \int _0^1 \frac{e^{i \mu x}-1}{x} dx + o(1). \end{aligned}$$
Here, in the last step, we have used that the sum on the right-hand side is a Riemann sum over the unit interval whose mesh size \(n^{-1}\) tends to zero. Thus, using the notation of the theorem, (1) and Lévy’s continuity theorem, in distribution,
$$\begin{aligned} \left( \frac{{\bar{D}}^>_k(n) - \log n}{\sqrt{\log n}}, \frac{{\bar{W}}^>_k(n) - k {\bar{D}}^>_k(n)}{n}\right) \rightarrow (\mathscr {N}, \mathscr {Y}). \end{aligned}$$
(26)
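The Riemann-sum step used above is easy to check numerically; the following sketch (ours) compares the sum with a simple quadrature approximation of \(\int _0^1 (e^{i \mu x}-1)/x \, dx\).

```python
# Numerical sanity check (ours) of the Riemann-sum step: the sum over j of
# (exp(i*mu*(j+1)/n) - 1)/(j+1) approaches the integral of (exp(i*mu*x) - 1)/x on [0, 1].
import cmath

def riemann_sum(mu, n):
    return sum((cmath.exp(1j * mu * (j + 1) / n) - 1) / (j + 1) for j in range(n))

def integral(mu, m=100_000):
    # midpoint rule; the integrand extends continuously to x = 0 with value i*mu
    h = 1.0 / m
    return h * sum((cmath.exp(1j * mu * (j + 0.5) * h) - 1) / ((j + 0.5) * h)
                   for j in range(m))

for mu in (1.0, 3.0):
    print(mu, riemann_sum(mu, 50_000), integral(mu))
```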
In order to deduce (15) note that, by Lemma 1, \({\bar{D}}_k(n) - {\bar{D}}^>_k(n)\) and \(({\bar{D}}^>_k(n), {\bar{W}}^>_k(n))\) are independent while
$$\begin{aligned} \frac{{\bar{D}}_k(n) - {\bar{D}}^>_k(n) - \mathbf {E} \left[ {\bar{D}}_k(n) - {\bar{D}}^>_k(n) \right] }{\sigma _{{\bar{D}}_k(n) - {\bar{D}}^>_k(n)}} \rightarrow \mathscr {N}, \end{aligned}$$
in distribution if and only if \(k \rightarrow \infty \); here, sufficiency follows from the central limit theorem for sums of independent and uniformly bounded random variables. Since
$$\begin{aligned} \frac{{\bar{D}}_k(n) - \mathbf {E} \left[ {\bar{D}}_k(n) \right] }{\sigma _{{\bar{D}}_k(n)}}&= \frac{{\bar{D}}^>_k(n) - \mathbf {E} \left[ {\bar{D}}^>_k(n) \right] }{\sqrt{\log n}} \frac{\sqrt{\log n}}{\sigma _{{\bar{D}}_k(n)}} \\&\quad \; + \frac{{\bar{D}}_k(n) - {\bar{D}}^>_k(n) - \mathbf {E} \left[ {\bar{D}}_k(n) - {\bar{D}}^>_k(n) \right] }{\sigma _{{\bar{D}}_k(n) - {\bar{D}}^>_k(n)}} \frac{\sigma _{{\bar{D}}_k(n) - {\bar{D}}^>_k(n)}}{\sigma _{{\bar{D}}_k(n)}}, \end{aligned}$$
we deduce
$$\begin{aligned} \left( \frac{{\bar{D}}_k(n) - \mathbf {E} \left[ {\bar{D}}_k(n) \right] }{\sigma _{{\bar{D}}_k(n)}}, \frac{{\bar{W}}^>_k(n) - k {\bar{D}}^>_k(n)}{n}\right) \rightarrow (\mathscr {N}, \mathscr {Y}), \end{aligned}$$
from (26) upon treating the cases \(k = O(1)\) and \(k = \omega (1)\) separately. From here, the assertion (15) follows with the help of (25) and observation O.
Proof of (21)
The main observation is that the kth external node visited by the depth first search process is always contained in the subtree rooted at the node labelled k. This can be proved by induction exploiting the decomposition of the tree at the root. Thus, denoting by \(H_k(n)\) the height of the subtree rooted at the node labelled k, we have
$$\begin{aligned} D_k(n)&\le D_k^*(n) \le D_k(n) + H_k(n), \\ W_k(n)&\le W_k^*(n) \le W_k(n) + M_k(n) H_k(n). \end{aligned}$$
Here, \(M_k(n)\) stands for the largest label in the subtree rooted at the node labelled k. Let \(T_k(n)\) be the size of the subtree rooted at k. Then \(T_k(n) = 1 + T^{<}_k(n) + T^{>}_k(n)\) where \(T^{<}_k(n)\) denotes the number of elements in the subtree rooted at k with values smaller than k. By Lemma 1, for \(\ell \le n - k\), we have \(\mathbb {P} \left( T^{>}_k(n) \ge \ell \right) = \mathbb {P} \left( A_{k,k+\ell } \right) = 1/(\ell +1)\). Using the same arguments for the quantity \(T^{<}_k(n)\), we deduce that, uniformly in \(1 \le k \le n\),
$$\begin{aligned} \mathbf {E} \left[ T_k(n) \right] = \varTheta (\log n), \quad \mathbf {E} \left[ (T_k(n))^2 \right] = \varTheta (n), \quad \mathbf {E} \left[ (\log T_k(n))^2 \right] = O(1). \end{aligned}$$
Thus, by an application of (2), for some \(C_1 > 0\),
$$\begin{aligned} \mathbf {E} \left[ |D_k(n) - D_k^*(n)|^2 \right] \le \mathbf {E} \left[ (H_k(n))^2 \right]&\le C_1 \mathbf {E} \left[ (\log T_k(n))^2 \right] = O(1). \end{aligned}$$
By the same arguments, for some \(C_2 > 0\), we have
$$\begin{aligned} \mathbf {E} \left[ |W_k(n) - W_k^*(n)|^2 \right]&\le \mathbf {E} \left[ (M_k(n) H_k(n))^2 \right] \le \mathbf {E} \left[ (k + T_k(n))^2 (H_k(n))^2 \right] \\&\le C_2 k^2 + C_1\left( 2 k \mathbf {E} \left[ T_k(n) (\log T_k(n))^2 \right] \right. \\&\left. \quad +\, \mathbf {E} \left[ (T_k(n))^2 (\log T_k(n))^2 \right] \right) \\&= O(k^2 + n (\log n)^{2}). \end{aligned}$$
From here, (21) follows from (10).
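The tail identity \(\mathbb {P} \left( T^{>}_k(n) \ge \ell \right) = 1/(\ell +1)\) used above can also be checked by a small simulation (ours): \(k+\ell \) lies in the subtree rooted at the node labelled k precisely when k lies on the search path to \(k+\ell \).

```python
# Monte Carlo sketch (ours): P(k + l in subtree of k) = 1/(l + 1) in the permutation model.
import random

def on_search_path(perm, anc, target):
    # anc is an ancestor of target (or equal to it) iff anc lies on the search path to target
    lo, hi = 0, len(perm) + 1
    for v in perm:
        if lo < v < hi:
            if v == anc:
                return True
            if v == target:
                return False
            if v < target:
                lo = v
            else:
                hi = v
    return False

rng = random.Random(7)
n, k, l, reps = 30, 10, 4, 100_000
base = list(range(1, n + 1))
hits = 0
for _ in range(reps):
    rng.shuffle(base)
    hits += on_search_path(base, k, k + l)
print(hits / reps, 1 / (l + 1))       # both values should be close to 0.2
```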
The Weighted Silhouette
We prove Theorem 3 and Proposition 1.
Proof of Theorem 3
We start with the uniform convergence of \((\varXi _k)\). For all \(x \in [0,1]\), \(|\varXi _k(x) - \varXi _{k-1}(x)|\) is distributed like the product of \(k+1\) independent random variables, each having the uniform distribution on [0, 1]. In particular, by the union bound and Markov’s inequality, for any \(m \ge 1\),
$$\begin{aligned} \mathbb {P} \left( \sup _{x \in [0,1]} |\varXi _k(x) - \varXi _{k-1}(x)| \ge t \right) \le 2^k \mathbb {P} \left( \prod _{i=1}^{k+1} U_i \ge t \right) \le \left( \frac{2}{m+1}\right) ^{k} t^{-m}. \end{aligned}$$
For \(k \ge 1\), let \({\mathscr {D}}_k = \{\ell 2^{-k}: \ell = 1, \ldots , 2^k - 1\}\). By construction, for \(k \ge 1\), the map \(x \rightarrow \varXi _k(x)\) is a right continuous step function. Further, it is continuous at x if and only if \(x \notin {\mathscr {D}}_k\). Next, for \(0< q < 1\),
$$\begin{aligned} \mathbf {E} \left[ \sup _{x \in [0,1]} |\varXi _k(x) - \varXi _{k-1}(x)| \right]&= \int _0^\infty \mathbb {P} \left( \sup _{x \in [0,1]} |\varXi _k(x) - \varXi _{k-1}(x)| \ge t \right) dt \\&\le q^k + \int _{q^k}^\infty \left( \frac{2}{m+1}\right) ^{k} t^{-m} dt \\&= q^k + \frac{1}{m-1}\left( \frac{2}{m+1}\right) ^{k} q^{-k(m-1)}. \end{aligned}$$
With \(m=2\) and \(q = \sqrt{2/3}\), the latter expression is bounded by \(2 q^k\). By Markov’s inequality, it follows that \(\sup _{m \ge n} \sup _{x \in [0,1]} |\varXi _m(x) - \varXi _{n}(x)| \rightarrow 0\) in probability as \(n \rightarrow \infty \). An application of the triangle inequality shows that \(\sup _{m, p \ge n} \sup _{x \in [0,1]} |\varXi _m(x) - \varXi _{p}(x)| \rightarrow 0\) in probability as \(n \rightarrow \infty \). By monotonicity, this convergence is almost sure. Thus, almost surely, \((\varXi _k)\) is uniformly Cauchy in the space of càdlàg functions endowed with the uniform topology. By completeness, \((\varXi _k)\) converges to a limit denoted by \(\varXi \) with càdlàg paths. Moreover, \(\varXi \) is continuous at \(x \notin {\mathscr {D}}\) where \(\mathscr {D} = \cup _{m \ge 1} {\mathscr {D}}_m\) since this is true for all \(\varXi _k\), \(k \ge 1\). For \(x \in {\mathscr {D}}\), let \(\varPhi (x)\) be the key of the node associated with \(x_1 \ldots x_{k-1}\) where \(k \ge 1\) is chosen minimal with \(x \in {\mathscr {D}}_k\). Then, \(\lim _{y \uparrow x} \varXi (y) = \varPhi (x) = \varXi (x).\) Thus, \(x \mapsto \varXi (x)\) is continuous. By the construction of the tree, it is clear that \(\varXi (x) < \varXi (y)\) for any \(x, y \in {\mathscr {D}}\) with \(x < y\). As \({\mathscr {D}}\) is dense in [0, 1], the process \(\varXi \) is strictly monotonically increasing. Obviously, \(\varXi (0) = 0\) and \(\varXi (1) = 1\); hence, \(\varXi \) is the distribution function of a probability measure on [0, 1].\(\square \)
We turn to the convergence of \({\mathscr {B}}_n(x)\). For any fixed \(x \in [0,1]\), display (3) implies that, as \(n \rightarrow \infty \), in probability, \(B_n(x) / \log n \rightarrow 1\). Thus, (16) follows from the convergence \(\varXi _k(x) \rightarrow \varXi (x)\). The convergence (16) is with respect to all moments since \(B_n(x) \le H_n\) and we have convergence of all moments in (2). By the theorem of dominated convergence, for any \(m \ge 1\), again using (2), we have
$$\begin{aligned} \int _0^1 \mathbf {E} \left[ \left| \frac{{{\mathscr {B}}}_n(x)}{\log n} - \varXi (x) \right| ^m \right] dx \rightarrow 0. \end{aligned}$$
This shows (17). To prove (18), note that, for any \(k \ge 1\), \(\sup _{x \in [0,1]} {{\mathscr {B}}}_n(x)\) is larger than the product of the height of the subtree rooted at the node \(w_k := 1\ldots 1\) on level k and \(\varXi _{k-1}({\mathbf {1}})\). Let \(\varepsilon > 0\). Fix k large enough such that \(\mathbb {P} \left( \varXi _{k-1}({\mathbf {1}})< 1-\varepsilon \right) < \varepsilon \). Conditional on its size, the subtree rooted at \(w_k\) is a random binary search tree. Since its size grows linearly in n as \(n \rightarrow \infty \), it follows from (2) that, for all n sufficiently large, its height exceeds \((c^*-\varepsilon ) \log n\) with probability at least \(1-\varepsilon \). For these values of n, we have \(\sup _{x \in [0,1]} {{\mathscr {B}}}_n(x) \ge (c^* - 6 \varepsilon ) \log n\) with probability at least \(1-2 \varepsilon \). As \(\varepsilon \) was chosen arbitrarily, this shows (18).
For the joint convergence of \(B_n(x)\) and \({\mathscr {B}}_n(x)\) for fixed \(x \in [0,1]\), we abbreviate \(B_n := B_n(x), {\mathscr {B}}_n := {\mathscr {B}}_n(x)\), \(\varXi _k := \varXi _k(x), \varXi = \varXi (x)\) and \(\bar{B}_n = (B_n - \log n)/\sqrt{\log n}\). Note that \(\varXi \) and \(B_n\) are not independent which causes the proof to be more technical. Denote by \(N_k\) the time when the node associated with \(x_1 \ldots x_k\) is inserted in the binary search tree. For any \(\varepsilon > 0\), we can choose \(k, L \ge 1\) such that, for all n sufficiently large,
$$\begin{aligned} \mathbb {P} \left( |\varXi _k - \varXi | \ge \varepsilon \right) + \mathbb {P} \left( N_k \ge L \right) + \mathbb {P} \left( \left| \frac{{\mathscr {B}}_n}{\log n} - \varXi \right| \ge \varepsilon \right) \le \varepsilon . \end{aligned}$$
Further, there exists \(\delta > 0\) such that \(\mathbb {P} \left( |\varXi _k - \varXi _{k-1}| \le \delta \right) \le \varepsilon \). Then, for \(r , y \in \mathbb {R}\) with \(\mathbb {P} \left( \varXi = y \right) = 0\), and n large enough,
$$\begin{aligned} \mathbb {P} \left( \bar{B}_n \le r, \frac{{\mathscr {B}}_n}{ \log n} \le y \right) \le 2 \varepsilon + \mathbb {P} \left( \bar{B}_n \le r, \varXi _k \le y + 2 \varepsilon , |\varXi _k - \varXi _{k-1}| \ge \delta , N_k < L \right) . \end{aligned}$$
Let \(\bar{x} = x_{k+1} x_{k+2} \ldots \), \((V_1, V_2, \ldots )\) be an independent copy of \((U_1, U_2, \ldots )\) and
$$\begin{aligned} \text {Bin}(n,p) := \sum _{i=1}^n \mathbf {1}_{ \{V_i \le p\} }, \quad n \ge 0, p \in [0,1]. \end{aligned}$$
Given \(\varXi _k, |\varXi _k - \varXi _{k-1}|, N_k\), on \(N_k < n\), \(\bar{B}_n\) is distributed like \(\bar{B}^*_{\text {Bin}(n - N_k, |\varXi _k - \varXi _{k-1}|)}(\bar{x}) + k / \sqrt{\log n}\) where \((B^*_n(\bar{x}))\) is distributed like \((B_n(\bar{x}))\) and independent from the remaining quantities. We deduce
$$\begin{aligned}&\mathbb {P} \left( \bar{B}_n \le r, \frac{{\mathscr {B}}_n}{ \log n} \le y \right) \\&\quad \le 2 \varepsilon + \mathbb {P} \left( \frac{k}{\sqrt{ \log n}} + \bar{B}^*_{\text {Bin}(n - L, \delta )}(\bar{x}) \le r, \varXi _k \le y + 2 \varepsilon , |\varXi _k - \varXi _{k-1}| \ge \delta , N_k < L \right) \\&\quad \le 3 \varepsilon + \mathbb {P} \left( \frac{k}{\sqrt{ \log n}} + \bar{B}^*_{\text {Bin}(n - L, \delta )}(\bar{x}) \le r \right) \mathbb {P} \left( \varXi \le y + 2 \varepsilon \right) . \end{aligned}$$
Using the asymptotic normality of \((\bar{B}_n^*(\bar{x}))\) (after rescaling) in (3), taking the limit superior as \(n \rightarrow \infty \) and then letting \(\varepsilon \) tend to zero, we obtain
$$\begin{aligned} \limsup _{n \rightarrow \infty } \mathbb {P} \left( \bar{B}_n \le r, \frac{{\mathscr {B}}_n}{\log n} \le y \right) \le \mathbb {P} \left( \mathscr {N}\le r \right) \mathbb {P} \left( \varXi \le y \right) . \end{aligned}$$
The proof of the converse direction establishing (19) is easier. It runs along the same lines upon using the trivial bounds \(|\varXi _k - \varXi _{k-1}| \le 1\) and \(N_k \ge 0\).
Proof of Proposition 1
We start with the characterization of the distribution of the process. For a deterministic sequence of pairwise different numbers \(u_1, u_2, \ldots \) on the unit interval, we define \(\xi _k(x)\) analogously to \(\varXi _k(x)\) in the infinite binary search tree constructed from this sequence. Here, we set \(\xi _k(x) = 0\) if the node \(x_1 \ldots x_k\) is not in the tree. Let \(n_m^-, m \ge 1,\) be the subsequence defined by the elements \(u_{n^-_m} < u_1\) and \(n_m^+, m \ge 1\), be the subsequence defined by the elements \(u_{n^+_m} > u_1\). At least one of these sequences is infinite. For \(m \ge 1\), let \(y_m^{-} = u_{n^{-}_m} / u_1\) and \(y_m^+ = (u_{n^+_m} - u_1) / (1-u_1)\). Next, define \(\xi ^{-}_k\) (\(\xi ^+_k\), respectively) analogously to \(\xi _k\) based on the sequence \((y^-_m)\) (\((y^+_m)\), respectively). By construction, for \(k \ge 1\),
$$\begin{aligned} \xi _k(x) = \mathbf {1}_{ [0,1/2) }(x) u_1 \xi _{k-1}^-(2x) + \mathbf {1}_{ [1/2,1] }(x) ((1-u_1) \xi _{k-1}^+(2x - 1) + u_1). \end{aligned}$$
Applying the construction to the sequence \(U_1, U_2, \ldots \) yields
$$\begin{aligned} \varXi _k(x) = \mathbf {1}_{ [0,1/2) }(x) U_1 \varXi _{k-1}^-(2x) + \mathbf {1}_{ [1/2,1] }(x) ((1-U_1) \varXi _{k-1}^+(2x - 1) + U_1). \end{aligned}$$
Almost surely, the random sequences \(y_m^{-}\) and \(y_m^+\) are both infinite and \((\varXi ^-_k), (\varXi ^+_k)\) are independent copies of \((\varXi _k)\). Further, both sequences are independent of \(U_1\). Hence, letting \(k \rightarrow \infty \) in the last display, we obtain (22) on an almost sure level. The characterization of \(\mathscr {L}(\varXi )\) by (22) follows from a standard contraction argument, and the argument on page 267 in [12] applies to our setting without any modifications.\(\square \)
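The recursive description (22) also suggests a simple way to sample \(\varXi (x)\) approximately. The following sketch (ours) unrolls the recursion to a fixed depth, the truncation error being at most the product of the accumulated scale factors, and uses it to illustrate the identity \(\mathbf {E} \left[ \varXi (x) \right] = x\) from statement (i) discussed below.

```python
# Illustrative sketch (ours): approximate sampling of Xi(x) via the recursion (22),
# truncated at a fixed depth; the error is at most the product of the uniform scale
# factors picked up along the path and hence geometrically small in the depth.
import random

def sample_xi(x, depth, rng):
    if depth == 0:
        return 0.0                    # any starting value in [0, 1] would do here
    u = rng.random()
    if x < 0.5:
        return u * sample_xi(2 * x, depth - 1, rng)
    return (1 - u) * sample_xi(2 * x - 1, depth - 1, rng) + u

rng = random.Random(11)
for x in (0.25, 0.5, 0.8):
    vals = [sample_xi(x, 30, rng) for _ in range(50_000)]
    print(x, sum(vals) / len(vals))   # the empirical mean should be close to x
```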
We move on to the statements (i) – (vi) on the marginal distributions of the process. Here, we use notation that was introduced in the proof of Theorem 3. By continuity, it suffices to show (i) for \(x \in {\mathscr {D}}\). Let \(k \ge 1\). By symmetry, for \(1 \le i \le 2^k-1\), we have \(\mathbf {E} \left[ \varPhi (i 2^{-k}) \right] = i 2^{-k}\). Thus, the assertion follows for \(x \in {\mathscr {D}}\) since \(\varPhi (x) = \varXi (x)\). The symmetry statement (ii) is reminiscent of the fact that the uniform distribution on [0, 1] is symmetric around 1 / 2. More precisely, we apply the reflection argument from [1] which is at the core of the proof of the second assertion in (6). Let \(U_1^* = 1 - U_1, U_2^* = 1 - U_2, \ldots \) and define \(\varXi ^*\) analogously to \(\varXi \) in the binary search tree process relying on the sequence \(U_1^*, U_2^*, \ldots \) Then, \(\varXi ^*(t) +\varXi (1-t) = 1\) for all \(t \in [0,1]\) which proves (ii). With \(Y = \varXi (\xi )\), (22) yields
$$\begin{aligned} \mathscr {L}(Y) = \mathscr {L}(U Y + \mathbf {1}_{ A } (1-U)), \end{aligned}$$
where \(\mathbf {1}_{ A }, U, Y\) are independent and \(\mathbb {P} \left( A \right) = 1/2\). From [5], it follows that Y has the arcsine distribution, proving (iii). We move on to the statements about the distribution of \(\varXi (t)\). Let \(t \in (0,1/2)\). Since \(\varXi \) is strictly increasing, we have \(\varXi (2t) \in (0,1)\) almost surely. By (22), \(\mathscr {L}(\varXi (t)) = \mathscr {L}(U \varXi (2t))\) with conditions as in (22). Therefore, \(\mathscr {L}(\varXi (t))\) admits a density. By symmetry, the same is true for \(t \in (1/2,1)\). For \(t \in (0,1/2)\), by conditioning on the value of U, one finds the density
$$\begin{aligned} f_t(x) = \mathbf {E} \left[ \frac{\mathbf {1}_{ [x,1] }(\varXi (2t))}{\varXi (2t)} \right] , \quad x \in (0,1]. \end{aligned}$$
(27)
The function \(f_t\) is monotonically decreasing and continuous on (0, 1] with \(f_t(1)=0\). For \(t \in (1/2,1)\), \(f_t(x) = f_{1-t}(1-x), x \in (0,1)\) is a density of \(\mathscr {L}(\varXi (t))\) by (ii). By (27), for \(t \in (0,1/2), x \in (0,1)\),
$$\begin{aligned} f_t(x) = \int _x^1 \frac{f_{2t}(y)}{y} dy, \quad \text {or} \quad x f_t'(x) = -f_{2t}(x). \end{aligned}$$
(28)
Upon setting \(f_0 = f_1 = 0\), the last identity also holds for \(t =0\) and \(t = 1/2\) since \(f_{1/2} = \mathbf {1}_{ [0,1] }\) is a density of \(\mathscr {L}(\varXi (1/2))\). Thus, for any \(t \in (0,1)\), \(f_t\) is smooth on (0, 1). Since the uniform distribution takes values arbitrarily close to one, it follows that, for all \(\delta > 0, t \in (0,1)\), we have \(\mathbb {P} \left( \varXi (t)> 1 - \delta \right) > 0\). Hence, for all \(t \in (0,1)\), the density \(f_t\) is strictly positive on (0, 1). Thus, for \(t \in (0, 1/2)\), \(f_t\) is strictly monotonically decreasing. Summarizing, we have shown (iv) and (v). For \(t \in (0,1/4]\), the assertion \(\alpha ^{(0)}_t = \infty \) in (vi) follows immediately from (28) since \(\alpha ^{(0)}_{2t} > 0\). Let \(1/4< t < 1/2\). Assume \(\alpha ^{(0)}_{1-2(1-2t)} < \infty \). Then, \(f_{2(1-2t)}(1) < \infty \). By (28), it follows that \(f'_{1-2t}(1)\) is finite and hence \(f'_{2t}(0)\) is finite. Thus, \(f_{2t}(y) / y\) is bounded in a neighbourhood of zero and \(\alpha ^{(0)}_t < \infty \). For \(t > 3/8\), we have \(1-2(1-2t) > 1/2\); thus, \(\alpha ^{(0)}_t < \infty \). Iterating this argument leads to \(\alpha _t^{(0)} < \infty \) for all \(1/3< t < 1/2\). In order to proceed further, note that, for \(t > 1/4\), there exists \(k \in \mathbb {N}\) such that \(\varXi (t)\) stochastically dominates \(Z := U_1(U_2 + (1-U_2)\prod _{\ell = 1}^kU_{2+\ell }).\) Z admits a density \(f_Z\) given by
$$\begin{aligned} f_Z(x) = 1 + \int _x^1 r(y) dy - x r(x), \quad r(x) = \frac{1}{x^2} \int _0^x \mathbb {P} \left( \prod _{\ell = 1}^kU_{2+\ell } \le \frac{x-v}{1-v} \right) dv. \end{aligned}$$
Thus,
$$\begin{aligned} \lim _{x \downarrow 0} f_Z(x) = 1 + \int _0^1 r(y) dy < \infty . \end{aligned}$$
It follows that \(\alpha _t^{(0)} \le 1 + \int _0^1 r(x) dx < \infty \). Since \(\varXi \) is increasing, the function \(t \mapsto \alpha _t^{(0)}\) is decreasing. Thus, by monotonicity and continuity, it follows that \(\alpha ^{(0)}_t \uparrow \infty \) as \(t \downarrow 1/4\). For \(t \le 1/4\), \(\alpha ^{(0)}_t = \infty \) follows immediately from (28) since \(\alpha ^{(0)}_{2t} < \infty \). For \(1/4< t < 1/2\), the remaining statements about \(\alpha _t^{(1)}\) are direct corollaries of the results for \(\alpha _t^{(0)}\) since \(\alpha ^{(1)}_t = \alpha ^{(0)}_{1 -2(1-2t)}\). This finishes the proof of (vi).
The curvature. We make a concluding remark about the curvature of \(f_t, t \in (0,1/2)\). First, since \(x f^{''}_t(x) = - f_{2t}'(x) - f_t'(x)\), for \(0 < t \le 1/4\), the function \(f_t\) is convex. From (28) it is easy to deduce \(f_{1/3}(x) = 2(1-x)\). Since \(f_{1/3}'' = f_{1/2}'' = 0\), it is plausible to conjecture that \(f_t\) is convex for \(t \le 1/3\) and concave for \(1/3 \le t < 1/2\). Concavity at rational points with small denominator such as \(t = 3/8\) or \(t = 5/12\) can be verified by hand using (28).
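For completeness, here is the short verification (ours) that \(f_{1/3}(x) = 2(1-x)\) is consistent with (28): by the symmetry relation \(f_t(x) = f_{1-t}(1-x)\) for \(t \in (1/2,1)\), we have \(f_{2/3}(x) = f_{1/3}(1-x) = 2x\), and indeed
$$\begin{aligned} x f_{1/3}'(x) = -2x = - f_{2/3}(x), \qquad \int _0^1 2(1-x) \, dx = 1. \end{aligned}$$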
Weighted Path Length and Wiener Index
In order to obtain mean and variance for the weighted path length and the weighted Wiener index, we use the reflection argument from the proof of Proposition 1 (ii). To this end, let \(\mathscr {P}_n^*\) and \(\mathscr {W}_n^*\) denote the weighted path length and the weighted Wiener index in the binary search tree built from the sequence \(U_1^* = 1-U_1, U_2^* = 1-U_2, \ldots \). Then, \(\mathscr {P}_n + \mathscr {P}_n^* = P_n + n\) and \(\mathscr {W}_n + \mathscr {W}_n^* = W_n + \binom{n}{2}\), providing the claimed expansions for \(\mathbf {E} \left[ \mathscr {P}_n \right] \) and \(\mathbf {E} \left[ \mathscr {W}_n \right] \) upon recalling (7) and (8).
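Since the reflection identity \(\mathscr {P}_n + \mathscr {P}_n^* = P_n + n\) holds pathwise, it can be verified on a single simulated tree. The sketch below (ours) reads the weighted path length as the sum over all nodes of their weighted depths, that is, of the keys on the path from the root to the node, the node included; this is the interpretation used in the sketch and it is consistent with the identity above.

```python
# Illustrative sketch (ours): check the pathwise identity  WP + WP* = P + n, where P is
# the path length (sum of node depths), WP the weighted path length (sum over nodes of
# the keys on the root-to-node path, the node included), and WP* the weighted path
# length of the tree built from the reflected keys 1 - U_i.
import random

def path_lengths(keys):
    left, right = {}, {}
    root = keys[0]
    P, WP = 0, keys[0]                 # the root has depth 0 and weighted depth = its key
    for key in keys[1:]:
        node, depth, wsum = root, 1, root + key
        while True:
            child = left if key < node else right
            if node not in child:
                child[node] = key      # attach the new key below `node`
                break
            node = child[node]
            depth += 1
            wsum += node
        P += depth
        WP += wsum
    return P, WP

rng = random.Random(5)
us = [rng.random() for _ in range(1000)]
P, WP = path_lengths(us)
_, WP_star = path_lengths([1 - u for u in us])
print(abs(WP + WP_star - (P + len(us))) < 1e-6)    # True for every realization
```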
For a finite rooted labelled binary tree T, denote by p(T) its path length, by \(\mathbf {p}(T)\) its weighted path length, by w(T) its Wiener index and by \(\mathbf {w}(T)\) its weighted Wiener index. Let \(T_1, T_2\) be its left and right subtree and x the label of the root. Then, denoting by |T| the size of T, for \(|T| \ge 1\),
$$\begin{aligned} p(T)&= p(T_1) + p(T_2) + |T|-1, \end{aligned}$$
(29)
$$\begin{aligned} w(T)&= w(T_1) + w(T_2) + (|T_2| + 1) p(T_1) + (|T_1| + 1) p(T_2) + |T| + 2 |T_1| |T_2| -1. \end{aligned}$$
(30)
The first statement is obvious; the argument for the second can be found in [25]. For the weighted quantities, one obtains
$$\begin{aligned} \mathbf {p}(T)&= \mathbf {p}(T_1) + \mathbf {p}(T_2) + |T| x, \end{aligned}$$
(31)
$$\begin{aligned} \mathbf {w}(T)&= \mathbf {w}(T_1) + \mathbf {w}(T_2) + (|T_2| + 1) \mathbf {p}(T_1) + (|T_1| + 1) \mathbf {p}(T_2) + (|T| + |T_1| |T_2|)x. \end{aligned}$$
(32)
Again, the first assertion is easy to see and we only justify the second. The terms \(\mathbf {w}(T_1)\) and \(\mathbf {w}(T_2)\) account for weighted distances within the subtrees. The sum of all weighted distances between nodes in the left subtree and the root equals \(\mathbf {p}(T_1) + |T_1|x\). Replacing \(T_1\) by \(T_2\), we obtain the analogous sum in the right subtree. The sum of all weighted distances between nodes in different subtrees equals \(|T_1| \mathbf {p}(T_2) + |T_2| \mathbf {p}(T_1) + |T_1| |T_2| x\). Finally, we need to add x for the weighted distance of the root to itself. Adding up the terms and simplifying leads to (32). For \(\alpha , \beta > 0\) let \(\alpha T + \beta \) be the tree obtained from T where each label y is replaced by \(\alpha y + \beta \). Obviously, \(p(T) = p(\alpha T +\beta )\) with the analogous identity for the Wiener index. For the weighted quantities, we have
$$\begin{aligned} \mathbf {p}(\alpha T + \beta )&= \alpha \mathbf {p}(T) + (p(T) + |T|) \beta , \end{aligned}$$
(33)
$$\begin{aligned} \mathbf {w}(\alpha T + \beta )&= \alpha \mathbf {w}(T) + (w(T) + |T|(|T| + 1)/2) \beta . \end{aligned}$$
(34)
Let T be the binary search tree of size n in the i.i.d. model. Then, given \(I_n := \text {rank}(U_1), U := U_1\), in distribution, the trees \(\frac{1}{U} T_1\) and \(\frac{1}{1-U} T_2 - \frac{U}{1-U}\) are independent binary search trees of sizes \(I_n - 1\) and \(n - I_n\), constructed from independent sequences of uniformly distributed random variables on [0, 1]. Thus, combining (29)–(34), for the vector \(Y_n = (\mathscr {W}_n, W_n,\mathscr {P}_n,P_n)^T\), we have
$$\begin{aligned} Y_n&\mathop {=}\limits ^{d} \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} U &{}\quad 0&{}\quad (n+1-I_n)U&{}\quad 0 \\ 0 &{} 1&{} 0&{} n+1-I_n\\ 0 &{} 0&{} U&{} 0\\ 0 &{} 0&{} 0&{} 1 \end{array}\right] Y_{I_n-1}\nonumber \\&\quad \, \,+ \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} 1-U &{}\quad U&{}\quad I_n(1-U)&{}\quad I_nU \\ 0 &{} 1&{} 0&{} I_n\\ 0 &{} 0&{} 1-U&{} U\\ 0 &{} 0&{} 0&{} 1 \end{array}\right] Y'_{n-I_n}\\&\qquad +\left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} (2n + (n-I_n)(3 I_n + n - 2))U/2\\ n-1+2(I_n-1)(n-I_n)\\ (2n - I_n)U\\ n-1 \end{array}\right) , \end{aligned}$$
where \((Y_n'), (Y_n), (I_n, U)\) are independent and \((Y'_n)\) is distributed like \((Y_n)\). Here, \(\mathop {=}\limits ^{d}\) indicates that left- and right-hand side are identically distributed.
We consider the sequence \((Z_n)_{n\ge 0}\) defined by
$$\begin{aligned} Z_n:=\left( \frac{\mathscr {W}_n-{\mathbb {E}}[\mathscr {W}_n]}{n^2},\frac{W_n-{\mathbb {E}}[W_n]}{n^2},\frac{\mathscr {P}_n-{\mathbb {E}}[\mathscr {P}_n]}{n},\frac{P_n-{\mathbb {E}}[P_n]}{n}\right) ^{T}, \quad n \ge 1, \end{aligned}$$
and \(Z_0 = 0\). Let \(\alpha _n = \mathbf {E} \left[ \mathscr {W}_n \right] , \beta _n = \mathbf {E} \left[ W_n \right] , \gamma _n = \mathbf {E} \left[ \mathscr {P}_n \right] \) and \(\delta _n = \mathbf {E} \left[ P_n \right] \). Further, let
$$\begin{aligned} A_1^{(n)}&=\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} \left( \frac{I_n-1}{n}\right) ^2 U &{}\quad 0&{}\quad \left( 1-\frac{I_n-1}{n}\right) \frac{I_n-1}{n} U&{}\quad 0 \\ 0 &{} \left( \frac{I_n-1}{n}\right) ^2&{} 0&{} \left( 1-\frac{I_n-1}{n}\right) \frac{I_n-1}{n}\\ 0 &{} 0&{} \frac{I_n-1}{n} U &{} 0\\ 0 &{} 0&{} 0&{}\frac{I_n-1}{n} \end{array}\right] ,\\ A_2^{(n)}&=\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} \left( 1-\frac{I_n}{n}\right) ^2(1-U) &{}\quad \left( 1-\frac{I_n}{n}\right) ^2 U &{}\quad \frac{I_n}{n}\left( 1-\frac{I_n}{n}\right) (1-U)&{}\quad \frac{I_n}{n}\left( 1-\frac{I_n}{n}\right) U \\ 0 &{} \left( 1-\frac{I_n}{n}\right) ^2&{} 0&{} \frac{I_n}{n}\left( 1-\frac{I_n}{n}\right) \\ 0 &{} 0&{} \left( 1-\frac{I_n}{n}\right) (1-U)&{} \left( 1-\frac{I_n}{n}\right) U \\ 0 &{} 0&{} 0&{}1-\frac{I_n}{n} \end{array}\right] , \end{aligned}$$
and \(C^{(n)}=(C_1^{(n)}, C_2^{(n)}, C_3^{(n)}, C_4^{(n)})^T\) with
$$\begin{aligned} C_1^{(n)}&= \frac{U}{n^2}\alpha _{I_n-1}+\frac{1-U}{n^2}\alpha _{n-I_n}+\frac{U}{n^2}\beta _{n-I_n}+U\frac{(n+1-I_n)}{n^2}\gamma _{I_n-1}+(1-U)\frac{I_n}{n^2}\gamma _{n-I_n}\\&\quad \, \, +U\frac{I_n}{n^2}\delta _{n-I_n} + U \frac{2n + (n-I_n)(3 I_n + n - 2)}{2n^2}-\frac{1}{n^2}\alpha _n,\\ C_2^{(n)}&=\frac{1}{n^2}\beta _{I_n-1}+\frac{1}{n^2}\beta _{n-I_n}+\left( 1-\frac{I_n-1}{n}\right) \frac{1}{n}\delta _{I_n-1}+\frac{I_n}{n^2}\delta _{n-I_n}\\&\quad \, \, +\frac{n-1+2(I_n-1)(n-I_n)}{n^2}-\frac{1}{n^2}\beta _n,\\ C_3^{(n)}&=\frac{U}{n}\gamma _{I_n-1}+\frac{1-U}{n}\gamma _{n-I_n}+\frac{U}{n}\delta _{n-I_n}+\left( 2 - \frac{I_n}{n} \right) U-\frac{1}{n}\gamma _n,\\ C_4^{(n)}&=\frac{1}{n}\delta _{I_n-1}+\frac{1}{n}\delta _{n-I_n}+ 1 - \frac{1}{n}-\frac{1}{n}\delta _n. \end{aligned}$$
Then, from the recurrence for \((Y_n)\), it follows
$$\begin{aligned} Z_n\mathop {=}\limits ^{d}A_1^{(n)}Z_{I_n-1}+A_2^{(n)}Z'_{n-I_n}+C^{(n)}, \quad n \ge 1, \end{aligned}$$
where \((Z_n), (Z'_n), (I_n, U)\) are independent and \((Z'_n)\) is distributed like \((Z_n)\). We prove convergence of \(Z_n\) in distribution by an application of the contraction method. To this end, note that \(I_n/n \rightarrow U\) almost surely by the strong law of large numbers. Thus, with convergence in \(L_2\) and almost surely,
$$\begin{aligned} A_1^{(n)}&\rightarrow A_1:=\left[ \begin{array}{cccc} U^3 &{}\quad 0&{}\quad U^2(1-U)&{}\quad 0 \\ 0 &{} U^2&{} 0&{} U(1-U)\\ 0 &{} 0&{} U^2&{} 0\\ 0 &{} 0&{} 0&{}U \end{array}\right] ,\\ A_2^{(n)}&\rightarrow A_2:=\left[ \begin{array}{cccc} (1-U)^3 &{}\quad U(1-U)^2&{}\quad U(1-U)^2&{}\quad U^2(1-U)\\ 0 &{} (1-U)^2&{} 0&{} U(1-U)\\ 0 &{} 0&{} (1-U)^2&{} U(1-U)\\ 0 &{} 0&{} 0&{} 1-U \end{array}\right] ,\end{aligned}$$
and
$$\begin{aligned} C^{(n)} \rightarrow C :=\left( \begin{array}{cccc}U^2\log {U}+(1-U^2)\log {(1-U)}+U(-14U^2 + 9U + 5)/4 \\ 2U\log {U}+2(1-U)\log (1-U)+6U(1-U)\\ U^2\log {U}+(1-U^2)\log (1-U)+U \\ 2U\log {U}+2(1-U)\log {(1-U)} + 1\end{array}\right) . \end{aligned}$$
For a square matrix A, denote by \(\Vert A\Vert _{\text {op}}\) its spectral radius. By calculating the eigenvalues of \(A_1 A_1^T\) and \(A_2 A_2^T\), one checks that \(\Vert A_1\Vert _{\text {op}} = U\) and \(\Vert A_2\Vert _{\text {op}} = 1-U\). Thus,
$$\begin{aligned} \mathbf {E} \left[ \Vert A_1 A_1^T\Vert _{\text {op}} \right] +\mathbf {E} \left[ \Vert A_2 A_2^T\Vert _{\text {op}} \right] \le \mathbf {E} \left[ \Vert A_1\Vert ^2_{\text {op}} \right] +\mathbf {E} \left[ \Vert A_2\Vert ^2_{\text {op}} \right] <1. \end{aligned}$$
Moreover, we have \(\mathbb {P} \left( I_n \in \{1, \ldots , \ell \} \cup \{n\} \right) \rightarrow 0\) for all fixed \(\ell \). Thus, by Theorem 4.1 in [24], in distribution and with convergence of the first two moments, we have \(Z_n \rightarrow (\mathscr {W},W,\mathscr {P},P)\) where \({\mathscr {L}}(\mathscr {W},W,\mathscr {P},P)\) is the unique fixed-point of the map:
$$\begin{aligned} T : {\mathscr {M}}_2^4(0) \longrightarrow {\mathscr {M}}_2^4(0), \quad T(\mu ) = {\mathscr {L}} \left( A_1 Z +A_2 Z' + C \right) , \end{aligned}$$
(35)
with \(A_1, A_2, C\) defined above, where \(Z, Z', U\) are independent and \({\mathscr {L}}(Z)={\mathscr {L}}(Z')=\mu \). Here, \({\mathscr {M}}_2^4(0)\) denotes the set of probability measures on \(\mathbb {R}^4\) with finite absolute second moment and zero mean. Variances and covariances can be computed successively using the fixed-point equation, e.g. in the following order: \(\mathbf {E} \left[ P^2 \right] , \mathbf {E} \left[ P W \right] \), \(\mathbf {E} \left[ W^2 \right] , \mathbf {E} \left[ P \mathscr {P} \right] ,\) \(\mathbf {E} \left[ \mathscr {P}^2 \right] , \mathbf {E} \left[ \mathscr {P}W \right] ,\) \(\mathbf {E} \left[ P \mathscr {W} \right] , \mathbf {E} \left[ W \mathscr {W} \right] \), \(\mathbf {E} \left[ \mathscr {P}\mathscr {W} \right] , \mathbf {E} \left[ \mathscr {W}^2 \right] \). In addition to the variances given in the theorem, one obtains
$$\begin{aligned} \text {Cov}(P_n, \mathscr {P}_n)&\sim \frac{21 - 2 \pi ^2}{ 6} n^2, \quad \text {Cov}(P_n, W_n) \sim \frac{20 -2 \pi ^2}{3} n^3, \end{aligned}$$
(36)
$$\begin{aligned} \text {Cov}(\mathscr {P}_n, W_n)&\sim \frac{10 - \phantom {2} \pi ^2}{3} n^3, \quad \text {Cov}(P_n, \mathscr {W}_n) \sim \frac{10 - \phantom {2} \pi ^2}{3} n^3, \end{aligned}$$
(37)
$$\begin{aligned} \text {Cov}(W_n, \mathscr {W}_n)&\sim \frac{10 - \phantom {2} \pi ^2}{3} n^4, \quad \text {Cov}(\mathscr {P}_n, \mathscr {W}_n) \sim \frac{481 -48 \pi ^2}{288} n^3. \end{aligned}$$
(38)