Inference on a distribution function from ranked set samples

Abstract

Consider independent observations \((X_i,R_i)\) with random or fixed ranks \(R_i\) such that, conditional on \(R_i\), the random variable \(X_i\) has the same distribution as the \(R_i\)-th order statistic within a random sample of size k from an unknown distribution function F. Such observation schemes are well known from ranked set sampling and judgment post-stratification. Within a general, not necessarily balanced setting we derive and compare the asymptotic distributions of three different estimators of the distribution function F: a stratified estimator, a nonparametric maximum-likelihood estimator and a moment-based estimator. Our functional central limit theorems generalize and refine previous asymptotic analyses. In addition, we briefly discuss pointwise and simultaneous confidence intervals for the distribution function with guaranteed coverage probability for finite sample sizes. The methods are illustrated with a real data example, and the potential impact of imperfect rankings is investigated in a small simulation experiment.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

References

  1. Balakrishnan, N., Li, T. (2006). Confidence intervals for quantiles and tolerance intervals based on ordered ranked set samples. Annals of the Institute of Statistical Mathematics, 58, 757–777.

  2. Bhoj, D. S. (2001). Ranked set sampling with unequal samples. Biometrics, 57(3), 957–962.

  3. Chen, Z. (2001). Non-parametric inferences based on general unbalanced ranked-set samples. Journal of Nonparametric Statistics, 13(2), 291–310.

  4. Chen, Z., Bai, Z., Sinha, B. K. (2004). Ranked set sampling. Theory and applications. New York: Springer.

  5. Clopper, C. J., Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26(4), 404–413.

  6. Dastbaravarde, A., Arghami, N. R., Sarmad, M. (2016). Some theoretical results concerning non parametric estimation by using a judgment poststratification sample. Communications in Statistics, Theory and Methods, 45(8), 2181–2203.

  7. David, H. A., Nagaraja, H. N. (2003). Order statistics (3rd ed.). Hoboken, NJ: Wiley-Interscience.

  8. Dell, T. R., Clutter, J. L. (1972). Ranked set sampling theory with order statistics background. Biometrics, 28(2), 545–555.

  9. Frey, J., Ozturk, O. (2011). Constrained estimation using judgement post-stratification. Annals of the Institute of Statistical Mathematics, 63, 769–789.

  10. Ghosh, K., Tiwari, R. C. (2008). Estimating the distribution function using \(k\)-tuple ranked set samples. Journal of Statistical Planning and Inference, 138(4), 929–949.

  11. Huang, J. (1997). Properties of the NPMLE of a distribution function based on ranked set samples. Annals of Statistics, 25(3), 1036–1049.

  12. Kvam, P. H., Samaniego, F. J. (1994). Nonparametric maximum likelihood estimation based on ranked set samples. Journal of the American Statistical Association, 89(426), 526–537.

  13. MacEachern, S. N., Stasny, E. A., Wolfe, D. A. (2004). Judgement post-stratification with imprecise rankings. Biometrics, 60, 207–215.

  14. McIntyre, G. A. (1952). A method of unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research, 3, 385–390.

  15. Presnell, B., Bohn, L. L. (1999). U-Statistics and imperfect ranking in ranked set sampling. Journal of Nonparametric Statistics, 10(2), 111–126.

  16. R Core Team. (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/. Accessed July 2018.

  17. Shorack, G. R., Wellner, J. A. (1986). Empirical processes with applications to statistics. New York: Wiley.

  18. Stokes, S. L., Sager, T. W. (1988). Characterization of a ranked-set sample with application to estimating distribution functions. Journal of the American Statistical Association, 83(402), 374–381.

  19. Terpstra, J. T., Miller, Z. A. (2006). Exact inference for a population proportion based on a ranked set sample. Communications in Statistics, Simulation and Computation, 35(1), 19–26.

  20. Wang, X., Wang, K., Lim, J. (2012). Isotonized CDF estimation from judgement poststratification data with empty strata. Biometrics, 68(1), 194–202.

  21. Wolfe, D. A. (2004). Ranked set sampling: An approach to more efficient data collection. Statistical Science, 19(4), 636–643.

  22. Wolfe, D. A. (2012). Ranked set sampling: Its relevance and impact on statistical inference. ISRN Probability and Statistics, 2012, 568385. https://doi.org/10.5402/2012/568385.

Acknowledgements

Constructive comments by an associate editor and two referees are gratefully acknowledged.

Author information

Correspondence to Ehsan Zamanzade.

Electronic supplementary material

Supplementary material 1 (pdf 1016 KB)

Appendix

We first recall two well-known facts about uniform empirical processes, see Shorack and Wellner (1986).

Proposition 6

Let \(U_1, U_2, U_3, \ldots \) be independent random variables with uniform distribution on [0, 1]. For \(N \in \mathbb {N}\) and \(u \in [0,1]\) define

$$\begin{aligned} \mathbb {V}^{(N)}(u) \ := \ N^{-1/2} \sum _{i=1}^N \bigl ( 1 \{U_i \le u\} - u) . \end{aligned}$$

Then, as \(N \rightarrow \infty \), \(\mathbb {V}^{(N)}\) converges in distribution in \(\ell _\infty ([0,1])\) to a standard Brownian bridge \(\mathbb {V}\) on [0, 1]. Moreover, for any fixed \(\delta \in [0,1/2)\) and \(\epsilon > 0\),

$$\begin{aligned} \sup _{N \ge 1} \mathop {\mathrm {I\!P}}\nolimits \left( \sup _{u \in (0,1)} \frac{|\mathbb {V}^{(N)}(u)|}{u^\delta (1 - u)^\delta } \ge C \right)&\rightarrow \ 0 \quad \text {as} \ C \uparrow \infty , \\ \sup _{N \ge 1} \mathop {\mathrm {I\!P}}\nolimits \left( \sup _{u \in (0,c] \cup [1-c,1)} \frac{|\mathbb {V}^{(N)}(u)|}{u^\delta (1 - u)^\delta } \ge \epsilon \right)&\rightarrow \ 0 \quad \text {as} \ c \downarrow 0. \end{aligned}$$
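A small simulation can make Proposition 6 concrete. The sketch below is only an illustration under our own choices (sample size 50, 4000 replications, seed 1, and the hypothetical helper names `uniform_empirical_process` and `simulated_variance`): it evaluates \(\mathbb {V}^{(N)}(u)\) and compares its Monte Carlo variance at a fixed u with \(u(1-u)\), the variance of the Brownian bridge \(\mathbb {V}(u)\). For the uniform empirical process this variance is in fact exact for every N, because \(\sum _{i \le N} 1\{U_i \le u\}\) is binomial.

```python
import random
import statistics


def uniform_empirical_process(us, grid):
    """Evaluate V^(N)(u) = N^(-1/2) * sum_i (1{U_i <= u} - u) at each u in grid."""
    n = len(us)
    return [(sum(1 for x in us if x <= u) - n * u) / n ** 0.5 for u in grid]


def simulated_variance(u, n=50, reps=4000, seed=1):
    """Monte Carlo variance of V^(N)(u) over independent uniform samples of size n."""
    rng = random.Random(seed)
    values = [
        uniform_empirical_process([rng.random() for _ in range(n)], [u])[0]
        for _ in range(reps)
    ]
    return statistics.pvariance(values)


if __name__ == "__main__":
    u = 0.3
    print(f"simulated: {simulated_variance(u):.4f}, Brownian bridge: {u * (1 - u):.4f}")
```

The weighted suprema in the proposition could be explored the same way by evaluating the process on a grid, but the variance check already shows the Brownian bridge scaling.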

For the estimators \(\widehat{F}_n^\mathrm{M}\), \(\widehat{F}_n^\mathrm{L}\) we need some basic facts and inequalities for the auxiliary functions \(w_r\) and \(B_r\), \(1 \le r \le k\), which are proved in the supplement:

Lemma 7

(a):

For \(r = 1,2,\ldots ,k\), the function \(w_r\) on (0, 1) may be written as \(w_r(t) = \widetilde{w}_r(t) / (t(1-t))\) with \(\widetilde{w}_r : [0,1] \rightarrow (0,\infty )\) continuously differentiable. Moreover, for \(r = 1,2,\ldots ,k\) and \(t \in (0,1)\),

$$\begin{aligned} 1 \ \le \ \widetilde{w}_r(t) \ \le \ \max (r,k+1-r). \end{aligned}$$
(b):

For any constant \(c \in (0,1)\) there exists a number \(c' = c'(k,c) > 0\) with the following property: If \(t,p \in (0,1)\) such that

$$\begin{aligned} \frac{|p - t|}{t(1-t)} \ \le \ c , \end{aligned}$$

then for \(r = 1,2,\ldots ,k\),

$$\begin{aligned} \max \left\{ \left| \frac{w_r(p)}{w_r(t)} - 1 \right| , \left| \frac{B_r(p) - B_r(t)}{\beta _r(t) (p - t)} - 1 \right| \right\} \ \le \ c' \frac{|p - t|}{t(1-t)}. \end{aligned}$$
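To see the auxiliary functions at work, here is a minimal sketch that evaluates \(\beta _r\), \(B_r\) and \(\widetilde{w}_r(t) = t(1-t) w_r(t)\) and checks the bounds of Lemma 7(a) on a grid. It assumes, as in the proof of Theorem 5, that \(w_r = \beta _r / (B_r(1 - B_r))\), and that \(\beta _r\) and \(B_r\) are the density and distribution function of the r-th order statistic of k independent uniform variables, i.e. of the Beta(r, k+1-r) distribution. The helper names are hypothetical, and a grid check is no substitute for the proof in the supplement.

```python
from math import comb


def beta_density(r, k, t):
    """beta_r(t): density of the r-th order statistic of k i.i.d. U(0,1) variables."""
    return k * comb(k - 1, r - 1) * t ** (r - 1) * (1 - t) ** (k - r)


def beta_cdf(r, k, t):
    """B_r(t) = P(at least r of k uniforms are <= t)."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r, k + 1))


def beta_sf(r, k, t):
    """1 - B_r(t), summed directly to avoid cancellation near t = 1."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r))


def w_tilde(r, k, t):
    """t(1 - t) w_r(t), where w_r = beta_r / (B_r (1 - B_r))."""
    return t * (1 - t) * beta_density(r, k, t) / (beta_cdf(r, k, t) * beta_sf(r, k, t))


def lemma7a_holds(k, grid_size=199, tol=1e-9):
    """Check 1 <= w~_r(t) <= max(r, k+1-r) for all r on an interior grid of (0, 1)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return all(
        1 - tol <= w_tilde(r, k, t) <= max(r, k + 1 - r) + tol
        for r in range(1, k + 1)
        for t in grid
    )


if __name__ == "__main__":
    print([lemma7a_holds(k) for k in range(1, 8)])
```

Near t = 0 the function \(\widetilde{w}_r\) tends to r and near t = 1 to k+1-r, which is why the upper bound is \(\max (r, k+1-r)\).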

Proof of Theorem 2

We start with the weight functions \(\gamma _{nr}^\mathrm{Z}\): Note that by Lemma 7,

$$\begin{aligned} \gamma _{nr}^\mathrm{S}(t)&= \ \frac{1}{k \sqrt{\pi _{nr}}} , \\ \gamma _{nr}^\mathrm{M}(t)&= \ \sqrt{\pi _{nr}} \big / \sum _{s=1}^k \pi _{ns} \beta _s(t) , \\ \gamma _{nr}^\mathrm{L}(t)&= \ \sqrt{\pi _{nr}}\, \widetilde{w}_r(t) \Big / \sum _{s=1}^k \pi _{ns} \widetilde{w}_s(t) \beta _s(t) \end{aligned}$$

with the probability weights \(\pi _{nr} := N_{nr}/n\) and continuous functions \(\widetilde{w}_r : [0,1] \rightarrow [1,k]\). Since the beta densities \(\beta _r\) are also continuous with \(\beta _1(0) = \beta _k(1) = k\), this shows that \(\gamma _{nr}^\mathrm{Z}\) is well-defined and continuous, provided that its denominator is strictly positive, i.e.,

$$\begin{aligned} {\left\{ \begin{array}{ll} \pi _{n1}, \ldots , \pi _{nk}> 0 &{} \text {if} \ \mathrm{Z} = \mathrm{S} , \\ \pi _{n1}, \pi _{nk} > 0 &{} \text {if} \ \mathrm{Z} = \mathrm{M}, \mathrm{L}. \end{array}\right. } \end{aligned}$$

For sufficiently large n this is the case, because \(\lim _{n\rightarrow \infty } \pi _{nr} = \pi _r\) for all r. The functions \(\gamma _r^\mathrm{Z}\) in Corollary 4 are continuous, too, and elementary considerations reveal that

$$\begin{aligned} \max _{t \in [0,1], \, 1 \le r \le k} \bigl | \gamma _{nr}^\mathrm{Z}(t) - \gamma _r^\mathrm{Z}(t) \bigr | \ \rightarrow \ 0 \end{aligned}$$
(5)

as \(n \rightarrow \infty \). In particular, \(\max _{t \in [0,1], 1 \le r \le k} \gamma _{nr}^\mathrm{Z}(t) = O(1)\).

Note that for \(n \ge 1\) and \(1 \le r \le k\), the empirical process \(\mathbb {V}_{nr}\) is distributed as \(\mathbb {V}^{(N_{nr})}\) in Proposition 6. Note also that the distribution functions \(B_r\) satisfy \(B_1 \ge B_2 \ge \cdots \ge B_k\), because for \(1 \le r < k\) the density ratio \(\beta _{r+1}/\beta _r\) is a positive multiple of \(t/(1 - t)\) and thus strictly increasing. Consequently, for \(1 \le r \le k\),

$$\begin{aligned} B_r(t) \ \le \ B_1(t) \ \le \ kt \quad \text {and}\quad 1 - B_r(t) \ \le \ 1 - B_k(t) \ \le \ k(1-t) , \end{aligned}$$

so

$$\begin{aligned} \frac{B_r(t)(1 - B_r(t))}{t(1-t)} \ \le \ k. \end{aligned}$$
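The ordering \(B_1 \ge \cdots \ge B_k\) and the bound \(B_r(t)(1-B_r(t)) \le k\,t(1-t)\) can be checked numerically in the same setting. The sketch assumes again that \(B_r\) is the Beta(r, k+1-r) distribution function, evaluated as a binomial tail probability; the function names are hypothetical.

```python
from math import comb


def beta_cdf(r, k, t):
    """B_r(t) = P(at least r of k uniforms are <= t)."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r, k + 1))


def beta_sf(r, k, t):
    """1 - B_r(t), summed directly to avoid cancellation."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r))


def ordering_and_ratio_hold(k, grid_size=199, tol=1e-12):
    """Check B_1 >= ... >= B_k and B_r(1 - B_r) <= k t(1 - t) on an interior grid."""
    for i in range(grid_size):
        t = (i + 1) / (grid_size + 1)
        cdfs = [beta_cdf(r, k, t) for r in range(1, k + 1)]
        # stochastic ordering of the order statistics
        if any(cdfs[j] < cdfs[j + 1] - tol for j in range(k - 1)):
            return False
        # variance ratio bound used to compare V_nr(B_r(t)) with V_nr(u)
        for r in range(1, k + 1):
            if beta_cdf(r, k, t) * beta_sf(r, k, t) > k * t * (1 - t) * (1 + 1e-9):
                return False
    return True


if __name__ == "__main__":
    print([ordering_and_ratio_hold(k) for k in range(1, 9)])
```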

Consequently,

$$\begin{aligned} \sup _{t \in (0,1)} \frac{|\mathbb {V}_{nr}(B_r(t))|}{t^\delta (1 - t)^\delta }&\le \ k^\delta \sup _{u \in (0,1)} \frac{|\mathbb {V}_{nr}(u)|}{u^\delta (1 - u)^\delta } \ = \ O_p(1) \quad \text {and} \\ \sup _{t \in (0,c] \cup [1-c,1)} \frac{|\mathbb {V}_{nr}(B_r(t))|}{t^\delta (1 - t)^\delta }&\le \ k^\delta \sup _{u \in (0,kc] \cup [1 - kc,1)} \frac{|\mathbb {V}_{nr}(u)|}{u^\delta (1 - u)^\delta } \ \rightarrow _p \ 0 \end{aligned}$$

as \(n \rightarrow \infty \) and \(c \downarrow 0\). All in all, we may conclude that

$$\begin{aligned} \sup _{t \in (0,1)} \, \frac{|\mathbb {V}_n^\mathrm{Z}(t)|}{t^\delta (1 - t)^\delta }&= \ O_p(1) , \end{aligned}$$
(6)
$$\begin{aligned} \sup _{t \in (0,c] \cup [1-c,1)} \, \frac{|\mathbb {V}_n^\mathrm{Z}(t)|}{t^\delta (1 - t)^\delta }&\rightarrow _p \ 0 \quad \text {as} \ n \rightarrow \infty \ \text {and}\ c \downarrow 0. \end{aligned}$$
(7)

It remains to be shown that the process \(\sqrt{n} (\widehat{B}_n^\mathrm{Z} - B)\) may be approximated by \(\mathbb {V}_n^\mathrm{Z}\). In case of \(\mathrm{Z} = \mathrm{S}\) it follows from \(\sum _{r=1}^k \beta _r \equiv k\) that \(\sum _{r=1}^k B_r = k B\), and this implies that

$$\begin{aligned} \sqrt{n} (\widehat{B}_n^\mathrm{S}- B) \ = \ \sum _{r=1}^k \frac{\sqrt{n} (\widehat{B}_{nr} - B_r)}{k} \ = \ \sum _{r=1}^k \gamma _{nr}^\mathrm{S} \, \mathbb {V}_{nr} \circ B_r \ = \ \mathbb {V}_n^\mathrm{S}. \end{aligned}$$
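The identity \(\sum _{r=1}^k B_r(t) = kt\) is just the mean of a Binomial(k, t) count, and in the notation of the proof \(\widehat{B}_n^\mathrm{S} = k^{-1} \sum _r \widehat{B}_{nr}\). A minimal sketch of both facts follows; it assumes, by analogy, that on the original scale the stratified estimator averages the k stratum-wise empirical distribution functions. The input format (a list of (value, rank) pairs) and the function names are assumptions made for illustration, not the paper's notation.

```python
from math import comb


def beta_cdf(r, k, t):
    """B_r(t) = P(at least r of k uniforms are <= t)."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r, k + 1))


def stratified_cdf(data, k, x):
    """Average over r = 1..k of the empirical CDF at x of the observations with rank r.

    data: iterable of (value, rank) pairs with ranks in 1..k (hypothetical format).
    Every rank must occur at least once, mirroring pi_n1, ..., pi_nk > 0.
    """
    total = 0.0
    for r in range(1, k + 1):
        stratum = [value for value, rank in data if rank == r]
        if not stratum:
            raise ValueError(f"no observations with rank {r}")
        total += sum(1 for value in stratum if value <= x) / len(stratum)
    return total / k


if __name__ == "__main__":
    # sum_r B_r(t) = k t, the mean of a Binomial(k, t) count
    k, t = 4, 0.3
    print(sum(beta_cdf(r, k, t) for r in range(1, k + 1)), k * t)
    sample = [(1.0, 1), (2.0, 1), (3.0, 2), (5.0, 2)]
    print(stratified_cdf(sample, 2, 2.5))
```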

For \(\mathrm{Z} = \mathrm{M}, \mathrm{L}\) it suffices to show that for any fixed number \(b \ne 0\) and

$$\begin{aligned} p_n^\mathrm{Z}(t) \ := \ t + \frac{\mathbb {V}_n^\mathrm{Z}(t) + b t^\delta (1-t)^\delta }{\sqrt{n}} \end{aligned}$$

the following statements are true: If \(b < 0\), then with asymptotic probability one,

$$\begin{aligned} \left. \begin{array}{c} \displaystyle \inf _{t \in (0,1)} \left( n \widehat{B}_n(t) - \sum _{r=1}^k N_{nr} B_r(p_n^\mathrm{M}(t)) \right) \\ \displaystyle \inf _{t \in (0,1)} \, L_n'(t,p_n^\mathrm{L}(t)) \end{array}\right\} \ \ge \ 0. \end{aligned}$$
(8)

If \(b > 0\), then with asymptotic probability one,

$$\begin{aligned} \left. \begin{array}{c} \displaystyle \sup _{t \in (0,1)} \left( n \widehat{B}_n(t) - \sum _{r=1}^k N_{nr} B_r(p_n^\mathrm{M}(t)) \right) \\ \displaystyle \sup _{t \in (0,1)} \, L_n'(t,p_n^\mathrm{L}(t)) \end{array}\right\} \ \le \ 0. \end{aligned}$$
(9)

Here we use the conventions that \(L_n'(t,\cdot ) := \infty \) and \(B_r := 0\) on \((-\infty ,0]\) while \(L_n'(t,\cdot ) := -\infty \) and \(B_r := 1\) on \([1,\infty )\).
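Pointwise, the two implicit estimators can be computed by one-dimensional root finding. Reading off the displays in this proof, we assume here that \(\widehat{B}_n^\mathrm{M}(t)\) solves \(\sum _r N_{nr} B_r(p) = n \widehat{B}_n(t) = \sum _r N_{nr} \widehat{B}_{nr}(t)\), and that \(\widehat{B}_n^\mathrm{L}(t)\) is the zero of \(L_n'(t,p) = \sum _r N_{nr} w_r(p) (\widehat{B}_{nr}(t) - B_r(p))\), the derivative in p of the binomial log-likelihood \(\sum _r N_{nr} [\widehat{B}_{nr}(t) \log B_r(p) + (1 - \widehat{B}_{nr}(t)) \log (1 - B_r(p))]\). Both functions of p are monotone (the latter because \(B_r\) and \(1 - B_r\) are log-concave), so plain bisection suffices. This is a sketch, not the authors' implementation; all names are hypothetical.

```python
from math import comb, inf


def beta_density(r, k, t):
    """beta_r(t): Beta(r, k+1-r) density, i.e. of the r-th of k ordered uniforms."""
    return k * comb(k - 1, r - 1) * t ** (r - 1) * (1 - t) ** (k - r)


def beta_cdf(r, k, t):
    """B_r(t) = P(at least r of k uniforms are <= t)."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r, k + 1))


def beta_sf(r, k, t):
    """1 - B_r(t), summed directly to avoid cancellation."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r))


def root_of_decreasing(g, iters=60):
    """Bisection on (0, 1) for the sign change of a nonincreasing function g."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def moment_estimate(counts, ecdf, k):
    """Solve sum_r N_r B_r(p) = sum_r N_r ecdf_r for p.

    counts[r-1] = N_r (stratum sizes), ecdf[r-1] = empirical CDF of stratum r
    at the point of interest (both hypothetical input conventions).
    """
    target = sum(n * e for n, e in zip(counts, ecdf))
    if target <= 0:
        return 0.0
    if target >= sum(counts):
        return 1.0
    return root_of_decreasing(
        lambda p: target - sum(n * beta_cdf(r, k, p) for r, n in enumerate(counts, 1))
    )


def score(p, counts, ecdf, k):
    """sum_r N_r w_r(p) (ecdf_r - B_r(p)), written as e d/B - (1-e) d/(1-B)."""
    total = 0.0
    for r, (n, e) in enumerate(zip(counts, ecdf), 1):
        if n == 0:
            continue
        d, b, s = beta_density(r, k, p), beta_cdf(r, k, p), beta_sf(r, k, p)
        if e > 0:
            if b == 0.0:
                return inf  # score is +inf here, so the root lies to the right
            total += n * e * d / b
        if e < 1:
            if s == 0.0:
                return -inf  # score is -inf here, so the root lies to the left
            total -= n * (1 - e) * d / s
    return total


def ml_estimate(counts, ecdf, k):
    """Maximizer in p of sum_r N_r [e_r log B_r(p) + (1 - e_r) log(1 - B_r(p))]."""
    active = [e for n, e in zip(counts, ecdf) if n > 0]
    if all(e == 0 for e in active):
        return 0.0
    if all(e == 1 for e in active):
        return 1.0
    return root_of_decreasing(lambda p: score(p, counts, ecdf, k))


if __name__ == "__main__":
    k, p0, counts = 3, 0.4, [5, 3, 7]
    ecdf = [beta_cdf(r, k, p0) for r in range(1, k + 1)]
    print(moment_estimate(counts, ecdf, k), ml_estimate(counts, ecdf, k))
```

If every stratum-wise empirical value equals \(B_r(p_0)\), both estimators return \(p_0\); in the balanced case \(N_{n1} = \cdots = N_{nk}\) the moment equation reduces to the stratified estimator because \(\sum _r B_r(p) = kp\).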

To verify these claims, we split the interval (0, 1) into \((0,c_n]\), \([c_n,1-c_n]\) and \([1-c_n,1)\) with numbers \(c_n \in (0,1/2)\) to be specified later, where \(c_n \downarrow 0\).

On \([c_n,1-c_n]\) we utilize Lemma 7: For \(t \in [c_n, 1 - c_n]\) and \(p \in (0,1)\) such that \(|p - t| \le t(1-t)/2\) we may write

$$\begin{aligned}&n \widehat{B}_n(t) \ - \sum _{r=1}^k N_{nr} B_r(p) \\&\quad = \ \sum _{r=1}^k \sqrt{N_{nr}} \mathbb {V}_{nr}(B_r(t)) - \sum _{r=1}^k N_{nr}(B_r(p) - B_r(t)) \\&\quad = \ \sum _{r=1}^k \sqrt{N_{nr}} \mathbb {V}_{nr}(B_r(t)) - \sum _{r=1}^k N_{nr} \beta _r(t) (p - t) + \rho _n^\mathrm{M}(t,p) \\&\quad = \ \sum _{r=1}^k N_{nr} \beta _r(t) \left( \frac{\mathbb {V}_n^\mathrm{M}(t)}{\sqrt{n}} - (p - t) \right) + \rho _n^\mathrm{M}(t,p) \end{aligned}$$

and

$$\begin{aligned} L_n'(t,p)&= \ \sum _{r=1}^k \sqrt{N_{nr}} w_r(p) \mathbb {V}_{nr}(B_r(t)) - \sum _{r=1}^k N_{nr} w_r(p) (B_r(p) - B_r(t)) \\&= \ \sum _{r=1}^k \sqrt{N_{nr}} w_r(t) \mathbb {V}_{nr}(B_r(t)) - \sum _{r=1}^k N_{nr} w_r(t) \beta _r(t) (p - t) + \rho _n^\mathrm{L}(t,p) \\&= \ \sum _{r=1}^k N_{nr} w_r(t) \beta _r(t) \left( \frac{\mathbb {V}_n^\mathrm{L}(t)}{\sqrt{n}} - (p - t) \right) + \rho _n^\mathrm{L}(t,p) , \end{aligned}$$

where

$$\begin{aligned} |\rho _n^\mathrm{M}(t,p)|&\le \ \frac{O(n) |p - t|^2}{t(1-t)} , \\ |\rho _n^\mathrm{L}(t,p)|&\le \ \frac{O_p(\sqrt{n}) t^{\delta } (1 - t)^{\delta } |p - t|}{t(1-t)} + \frac{O(n) |p - t|^2}{t^2(1-t)^2}. \end{aligned}$$

Note that for \(t \in [c_n,1-c_n]\),

$$\begin{aligned} \frac{\bigl | p_n^\mathrm{Z}(t) - t \bigr |}{t(1-t)} \ \le \ \frac{O_p(1) t^\delta (1 - t)^{\delta }}{\sqrt{n} \, t(1-t)} \ \le \ \frac{O_p(1)}{\sqrt{n} \, c_n^{1-\delta }}. \end{aligned}$$

Hence we choose \(c_n\) such that \(c_n \downarrow 0\) but \(n c_n^{2(1-\delta )} \rightarrow \infty \). With this choice, we may conclude that uniformly in \(t \in [c_n,1-c_n]\),

$$\begin{aligned} \bigl | \rho _n^\mathrm{M}(t,p_n^\mathrm{M}(t)) \bigr |&\le \ O_p(c_n^{\delta -1}) t^\delta (1-t)^\delta , \\ \bigl | \rho _n^\mathrm{L}(t,p_n^\mathrm{L}(t)) \bigr |&\le \ O_p(c_n^{\delta -1}) t^{\delta -1}(1-t)^{\delta -1}. \end{aligned}$$

On the other hand, since \(\beta _1(t) + \beta _k(t) \ge \beta _1(1/2) + \beta _k(1/2) = k 2^{2-k}\) and \(w_r(t) \ge 1/(t(1-t))\) by Lemma 7,

$$\begin{aligned} \sum _{r=1}^k N_{nr} \beta _r(t)&\ge \ k 2^{2 - k} \min \{N_{n1},N_{nk}\} , \\ \sum _{r=1}^k N_{nr} w_r(t) \beta _r(t)&\ge \ \frac{k 2^{2-k}}{t(1-t)} \, \min \{N_{n1},N_{nk}\}. \end{aligned}$$
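The constant \(k 2^{2-k}\) comes from the symmetric convex function \(\beta _1(t) + \beta _k(t) = k((1-t)^{k-1} + t^{k-1})\), which attains its minimum at \(t = 1/2\). A quick grid check, with a hypothetical function name:

```python
def beta1_plus_betak(k, t):
    """beta_1(t) + beta_k(t) = k((1 - t)^(k-1) + t^(k-1))."""
    return k * ((1 - t) ** (k - 1) + t ** (k - 1))


def minimum_at_half(k, grid_size=200, tol=1e-12):
    """Check beta_1 + beta_k >= k 2^(2-k) on a grid of [0, 1], with equality at t = 1/2."""
    bound = k * 2.0 ** (2 - k)
    grid = [i / grid_size for i in range(grid_size + 1)]
    above = all(beta1_plus_betak(k, t) >= bound - tol for t in grid)
    return above and abs(beta1_plus_betak(k, 0.5) - bound) < tol


if __name__ == "__main__":
    print([minimum_at_half(k) for k in range(1, 12)])
```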

Consequently,

$$\begin{aligned}&n \widehat{B}_n(t) \ - \sum _{r=1}^k N_{nr} B_r(p_n^\mathrm{M}(t)) \\&\quad = \ \sum _{r=1}^k N_{nr} \beta _r(t) \frac{- b t^\delta (1-t)^\delta }{\sqrt{n}} + \rho _n^\mathrm{M}(t,p_n^\mathrm{M}(t)) \\&\quad = \ \sum _{r=1}^k N_{nr} \beta _r(t) \frac{t^\delta (1-t)^\delta }{\sqrt{n}} \left( - b + O_p(c_n^{\delta -1} n^{-1/2}) \kappa _n^\mathrm{M}(t) \right) \end{aligned}$$

and

$$\begin{aligned} L_n'(t,p_n^\mathrm{L}(t))&= \ \sum _{r=1}^k N_{nr} w_r(t) \beta _r(t) \frac{-b t^\delta (1-t)^\delta }{\sqrt{n}} + \rho _n^\mathrm{L}(t,p_n^\mathrm{L}(t)) \\&= \ \sum _{r=1}^k N_{nr} w_r(t) \beta _r(t) \frac{t^\delta (1-t)^\delta }{\sqrt{n}} \Bigg ( - b + O_p\left( c_n^{\delta -1} n^{-1/2}\right) \kappa _n^\mathrm{L}(t) \Bigg ) \end{aligned}$$

for some random functions \(\kappa _n^\mathrm{M}, \kappa _n^\mathrm{L} : [c_n,1-c_n] \rightarrow [-1,1]\). These considerations show that (8) and (9) are satisfied with \([c_n,1-c_n]\) in place of (0, 1).

It remains to verify (8) and (9) with \((0,c_n]\) in place of (0, 1); the interval \([1-c_n,1)\) may be treated analogously. Note first that for \(2 \le r \le k\),

$$\begin{aligned} B_r(t) \ \le \ B_2(t) \ \le \ k(k-1) t^2/2 \quad \text {and}\quad \beta _r(t) \ \le \ k 2^{k-1} t , \end{aligned}$$

so

$$\begin{aligned} \bigl | B_r(p) - B_r(t) \bigr | \ = \ \Bigl | \int _t^p \beta _r(u) \, \mathrm{d}u \Bigr | \ \le \ O(\max (p,t)) (p - t). \end{aligned}$$

Furthermore, since \(B_1(t) = 1 - (1 - t)^k\),

$$\begin{aligned} B_1(p) - B_1(t) \ = \ k (p - t) + O(\max (t,p)) (p - t). \end{aligned}$$
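The small-t bounds above can also be checked numerically. For the remainder term in the expansion of \(B_1\), the sketch uses one explicit constant of our own choosing: since \(k(1-u)^{k-1} \ge k - k(k-1)u\) by Bernoulli's inequality, \(0 \le k(p-t) - (B_1(p) - B_1(t)) \le k(k-1) \max (t,p) (p - t)\) for \(p \ge t\). The text only asserts an \(O(\max (t,p))\) term, so this constant is an assumption of the illustration, as are the function names.

```python
from math import comb


def beta_density(r, k, t):
    """beta_r(t): Beta(r, k+1-r) density."""
    return k * comb(k - 1, r - 1) * t ** (r - 1) * (1 - t) ** (k - r)


def beta_cdf(r, k, t):
    """B_r(t) = P(at least r of k uniforms are <= t)."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r, k + 1))


def small_t_bounds_hold(k, grid_size=100, tol=1e-12):
    """Grid check of the bounds used on (0, c_n] in the proof of Theorem 2."""
    grid = [(i + 1) / grid_size for i in range(grid_size)]
    for t in grid:
        for r in range(2, k + 1):
            if beta_cdf(r, k, t) > k * (k - 1) * t ** 2 / 2 + tol:
                return False
            if beta_density(r, k, t) > k * 2 ** (k - 1) * t + tol:
                return False
        for p in grid:
            lo, hi = min(t, p), max(t, p)
            # gap = integral over [lo, hi] of (k - beta_1(u)) du
            gap = k * (hi - lo) - (beta_cdf(1, k, hi) - beta_cdf(1, k, lo))
            if not -tol <= gap <= k * (k - 1) * hi * (hi - lo) + tol:
                return False
    return True


if __name__ == "__main__":
    print([small_t_bounds_hold(k) for k in range(1, 8)])
```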

Hence for \(t \in (0,c_n]\) and \(p \in (0,2c_n]\),

$$\begin{aligned}&n \widehat{B}_n(t) \ - \sum _{r=1}^k N_{nr} B_r(p) \\&\quad = \ \sum _{r=1}^k \sqrt{N_{nr}} \mathbb {V}_{nr}(B_r(t)) - \sum _{r=1}^k N_{nr}(B_r(p) - B_r(t)) \\&\quad = \ - N_{n1} k (p - t) + \rho _n^\mathrm{M}(t,p) \end{aligned}$$

and

$$\begin{aligned} L_n'(t,p)&= \ \sum _{r=1}^k \sqrt{N_{nr}} w_r(p) \mathbb {V}_{nr}(B_r(t)) - \sum _{r=1}^k N_{nr} w_r(p) (B_r(p) - B_r(t)) \\&= \ - N_{n1} w_1(p) k (p - t) + \rho _n^\mathrm{L}(t,p) , \end{aligned}$$

where

$$\begin{aligned} |\rho _n^\mathrm{M}(t,p)|&\le \ o_p(\sqrt{n}) t^\delta + O(n c_n) (p - t) , \\ |\rho _n^\mathrm{L}(t,p)|&\le \ o_p(\sqrt{n}) p^{-1} t^\delta + O(n c_n) p^{-1} (p - t). \end{aligned}$$

Note also that

$$\begin{aligned} \sup _{t \in (0,c_n]} \Bigl | \frac{\sqrt{n}(p_n^\mathrm{Z}(t) - t)}{t^\delta (1-t)^\delta } - b \Bigr | \ \rightarrow _p \ 0. \end{aligned}$$

In particular, \(\sup _{t \in (0,c_n]} p_n^\mathrm{Z}(t) = c_n + O_p(n^{-1/2} c_n^\delta ) = c_n (1 + o_p(1))\), because \(n^{-1/2} c_n^{\delta - 1} \rightarrow 0\) by the choice of \(c_n\), and in case of \(b > 0\), \(\mathop {\mathrm {I\!P}}\nolimits \bigl ( p_n^\mathrm{Z}(t) > 0 \ \text {for} \ 0 < t \le c_n \bigr ) \rightarrow 1\).

In case of \(b > 0\), these considerations show that for \(0 < t \le c_n\),

$$\begin{aligned}&n \widehat{B}_n(t) \ - \sum _{r=1}^k N_{nr} B_r(p_n^\mathrm{M}(t)) \\&\quad = \ - N_{n1} k (p_n^\mathrm{M}(t) - t) + \rho _n^\mathrm{M}(t,p_n^\mathrm{M}(t)) \\&\quad \le \ \frac{N_{n1} k t^\delta (1-t)^\delta }{\sqrt{n}} \bigl ( -b + o_p(1) \bigr ) + o_p(\sqrt{n}) t^\delta + O(\sqrt{n} c_n) t^\delta \\&\quad \le \ \frac{N_{n1} k t^\delta (1-t)^\delta }{\sqrt{n}} \bigl ( - b + o_p(1) \bigr ) \end{aligned}$$

and

$$\begin{aligned} L_n'(t,p_n^\mathrm{L}(t))&= \ - N_{n1} w_1(p) k (p_n^\mathrm{L}(t) - t) + \rho _n^\mathrm{L}(t,p_n^\mathrm{L}(t)) \\&\le \ \frac{N_{n1} w_1(p) k t^\delta (1-t)^\delta }{\sqrt{n}} \bigl ( - b + o_p(1) \bigr ) + o_p(\sqrt{n}) p^{-1} t^\delta \\&\quad + O(\sqrt{n} c_n) p^{-1} t^\delta \\&\le \ \frac{N_{n1} w_1(p) k t^\delta (1-t)^\delta }{\sqrt{n}} \bigl ( - b + o_p(1) \bigr ). \end{aligned}$$

Analogously, in case of \(b < 0\), for any \(t \in (0,c_n]\) we obtain the inequalities

$$\begin{aligned} n \widehat{B}_n(t) \!-\! \sum _{r=1}^k N_{nr} B_r(p_n^\mathrm{M}(t))&\ge {\left\{ \begin{array}{ll} \displaystyle \frac{N_{n1} k t^\delta (1-t)^\delta }{\sqrt{n}} \bigl ( - b + o_p(1) \bigr ) &{} \text {if} \ p_n^\mathrm{M}(t)> 0 , \\ 0 &{} \text {if} \ p_n^\mathrm{M}(t) \le 0 , \end{array}\right. } \\ L_n'(t,p_n^\mathrm{L}(t))&\ge {\left\{ \begin{array}{ll} \displaystyle \frac{N_{n1} w_1(p) k t^\delta (1-t)^\delta }{\sqrt{n}} \bigl ( - b + o_p(1) \bigr ) &{} \text {if} \ p_n^\mathrm{L}(t) > 0 , \\ \infty &{} \text {if} \ p_n^\mathrm{L}(t) \le 0. \end{array}\right. } \end{aligned}$$

Hence, (8) and (9) are satisfied with \((0,c_n]\) in place of (0, 1). \(\square \)

Proof of Theorem 3

For symmetry reasons it suffices to prove the first part about the left tails. Let \((c_n)_n\) be a sequence of numbers in (0, 1 / 2] converging to zero. Then for \(t \in (0,c_n]\) and \(\delta := \kappa /2 \in (0,1/2)\),

$$\begin{aligned} \bigl | \sqrt{n} \bigl ( \widehat{B}_n^\mathrm{S}(t) - t \bigr ) - \mathbb {V}_n^{(\ell )}(t) \bigr | \ = \ \left| \sum _{r=2}^k \frac{\mathbb {V}_{nr}(B_r(t))}{k \sqrt{N_{nr}/n}} \right| \ \le \ t^{2 \delta } o_p(1) \ = \ t^\kappa o_p(1). \end{aligned}$$

Concerning \(\widehat{B}_n^\mathrm{M}\) and \(\widehat{B}_n^\mathrm{L}\), for any \(t \in (0,c_n]\) and \(p \in (0,1)\),

$$\begin{aligned}&n \widehat{B}_n(t) \ - \sum _{r=1}^k N_{nr} B_r(p) \\&\quad = \ \sum _{r=1}^k \sqrt{N_{nr}} \mathbb {V}_{nr}(B_r(t)) - \sum _{r=1}^k N_{nr}(B_r(p) - B_r(t)) \\&\quad = \ \sqrt{N_{n1}} \mathbb {V}_{n1}(B_1(t)) - N_{n1} k(p - t) + \rho _n^\mathrm{M}(t,p) \\&\quad = \ N_{n1} k \Bigl ( \frac{\mathbb {V}_{n1}(B_1(t))}{k \sqrt{N_{n1}}} - (p - t) \Bigr ) + \rho _n^\mathrm{M}(t,p) \end{aligned}$$

and

$$\begin{aligned} L_n'(t,p)&= \ \sum _{r=1}^k \sqrt{N_{nr}} w_r(p) \mathbb {V}_{nr}(B_r(t)) - \sum _{r=1}^k N_{nr} w_r(p) (B_r(p) - B_r(t)) \\&= \ \sqrt{N_{n1}} w_1(p) \mathbb {V}_{n1}(B_1(t)) - N_{n1} w_1(p) k (p - t) + \rho _n^\mathrm{L}(t,p) \\&= \ N_{n1} k w_1(p) \left( \frac{\mathbb {V}_{n1}(B_1(t))}{k \sqrt{N_{n1}}} - (p - t) \right) + \rho _n^\mathrm{L}(t,p) , \end{aligned}$$

where

$$\begin{aligned} |\rho _n^\mathrm{M}(t,p)|&\le \ o_p(\sqrt{n}) t^{2\delta } + O(n) \max (t,p) (p - t) , \\ |\rho _n^\mathrm{L}(t,p)|&\le \ o_p(\sqrt{n}) p^{-1} t^{2\delta } + O(n) p^{-1} \max (t,p) (p - t). \end{aligned}$$

Now we proceed as in the proof of Theorem 2, defining

$$\begin{aligned} p_n(t) \ := \ t + \frac{\mathbb {V}_n^{(\ell )}(t) + b t^\kappa }{\sqrt{n}} \end{aligned}$$

for some fixed \(b \ne 0\). Note that for \(t \in (0,c_n]\),

$$\begin{aligned} |p_n(t) - t| \ \le \ o_p(n^{-1/2}) t^\delta + O(n^{-1/2}) t^\kappa \ = \ o_p(n^{-1/2}) t^\delta , \end{aligned}$$

because \(\kappa > \delta \). Note also that

$$\begin{aligned} t + \frac{\mathbb {V}_n^{(\ell )}(t)}{\sqrt{n}} \ = \ t + \frac{\mathbb {V}_{n1}(B_1(t))}{k \sqrt{N_{n1}}} \ = \ t - \frac{1 - (1-t)^k}{k} + \frac{\widehat{B}_{n1}(t)}{k} \ > \ 0 \quad \text {on} \ (0,1) , \end{aligned}$$

because \(\widehat{B}_{n1} \ge 0\) and \(t \mapsto t - (1 - (1-t)^k)/k\) is strictly convex on [0, 1] with derivative 0 at 0. Thus \(p_n(t) > 0\) for all \(t \in (0,c_n]\) in case of \(b > 0\).
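The positivity argument rests on \(f_k(t) := t - (1 - (1-t)^k)/k\), with \(f_k(0) = f_k'(0) = 0\) and \(f_k''(t) = (k-1)(1-t)^{k-2} > 0\) for \(k \ge 2\). A short numerical confirmation, with a hypothetical function name:

```python
def f_k(t, k):
    """f_k(t) = t - (1 - (1 - t)^k) / k, the deterministic part of t + V_n^(l)(t)/sqrt(n)."""
    return t - (1 - (1 - t) ** k) / k


def positive_on_grid(k, grid_size=400):
    """Check f_k(t) > 0 on the grid {1/grid_size, ..., 1}."""
    return all(f_k((i + 1) / grid_size, k) > 0 for i in range(grid_size))


if __name__ == "__main__":
    print([positive_on_grid(k) for k in range(2, 11)])
```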

In case of \(b > 0\), we may conclude that

$$\begin{aligned}&n \widehat{B}_n(t) \ - \sum _{r=1}^k N_{nr} B_r(p_n(t)) \\&\quad = \ N_{n1} k \frac{-b t^\kappa }{\sqrt{n}} + \rho _n^\mathrm{M}(t,p_n(t)) \\&\quad \le \ \frac{N_{n1} k}{\sqrt{n}} \bigl ( -b t^\kappa + o_p(1) t^{2\delta } + O(1) (t + o_p(n^{-1/2}) t^\delta ) t^\delta \bigr ) \\&\quad \le \ \frac{N_{n1} k t^\kappa }{\sqrt{n}} \bigl ( -b + o_p(1) \bigr ) , \end{aligned}$$

and

$$\begin{aligned} L_n'(t,p_n(t))&\le \ \frac{N_{n1} k w_1(p) t^\kappa }{\sqrt{n}} \bigl ( -b + o_p(1) \bigr ). \end{aligned}$$

Hence for any fixed \(b > 0\),

$$\begin{aligned} \mathop {\mathrm {I\!P}}\nolimits \left( \sqrt{n} (\widehat{B}_n^\mathrm{Z}(t) - t) \le \mathbb {V}_n^{(\ell )}(t) + b t^\kappa \ \text {for} \ t \in (0,c_n] \right) \ \rightarrow \ 1. \end{aligned}$$

Similarly we can show that for any fixed \(b < 0\), with asymptotic probability one, \(\sqrt{n} (\widehat{B}_n^\mathrm{Z}(t) - t) \ge \mathbb {V}_n^{(\ell )}(t) + b t^\kappa \) for all \(t \in (0,c_n]\). \(\square \)

Proof of Corollary 4

It follows from Proposition 6 that

$$\begin{aligned} \sup _{1 \le r \le k, \, u \in [0,1]} |\mathbb {V}_{nr}(u)| \ = \ O_p(1). \end{aligned}$$

Together with (5) this entails that \(\sup _{t \in [0,1]} \bigl | \mathbb {V}_n^\mathrm{Z}(t) - \widetilde{\mathbb {V}}_n^\mathrm{Z}(t) \bigr | \rightarrow _p 0\), where \(\widetilde{\mathbb {V}}_n^\mathrm{Z} := \sum _{r=1}^k \gamma _r^\mathrm{Z} \, \mathbb {V}_{nr} \circ B_r\). But \(\gamma _r^\mathrm{Z} \equiv 0\) whenever \(\pi _r = 0\). If \(\pi _r > 0\), then \(N_{nr} \rightarrow \infty \), and it follows from Proposition 6 that \(\mathbb {V}_{nr}\) converges in distribution to a standard Brownian bridge \(\mathbb {V}_r\). Consequently, \(\widetilde{\mathbb {V}}_n^\mathrm{Z}\) converges in distribution to the Gaussian process \(\mathbb {V}^\mathrm{Z} = \sum _{r=1}^k \gamma _r^\mathrm{Z} \, \mathbb {V}_r \circ B_r\) with independent Brownian bridges \(\mathbb {V}_1, \ldots , \mathbb {V}_k\). \(\square \)

Proof of Theorem 5

The asserted inequalities follow from Jensen’s inequality. On the one hand, it follows from \(w_r = \beta _r / (B_r(1 - B_r))\) and \(\sum _{r=1}^k \beta _r \equiv k\) that

$$\begin{aligned} K^\mathrm{S}(t)&= \ \frac{1}{k} \sum _{r=1}^k \frac{\beta _r(t)}{k} \cdot (\pi _r w_r(t))^{-1} \\&\ge \ \frac{1}{k} \left( \sum _{r=1}^k \frac{\beta _r(t)}{k} \cdot \pi _r w_r(t) \right) ^{-1} \\&= \ \left( \sum _{r=1}^k \pi _r \beta _r(t) w_r(t) \right) ^{-1} \ = \ K^\mathrm{L}(t). \end{aligned}$$

Equality holds if, and only if,

$$\begin{aligned} \pi _1 w_1(t) = \pi _2 w_2(t) = \cdots = \pi _k w_k(t). \end{aligned}$$

But

$$\begin{aligned} w_1(t) \ = \ \frac{k}{(1-t)( 1 - (1 - t)^k)} \quad \text {and}\quad w_k(t) \ = \ \frac{k}{t(1 - t^k)} , \end{aligned}$$

so

$$\begin{aligned} \frac{w_k(t)}{w_1(t)} \ = \ \frac{(1-t)(1 - (1-t)^k)}{t(1-t^k)} \ = \ \frac{\sum _{j=0}^{k-1} (1-t)^j}{\sum _{j=0}^{k-1} t^j} \end{aligned}$$

is strictly decreasing in t. Hence the equation \(\pi _1 w_1(t) = \pi _k w_k(t)\) has at most one solution \(t \in (0,1)\), so that \(K^\mathrm{S}(t) = K^\mathrm{L}(t)\) for at most one t.
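The closed form of \(w_k/w_1\) and its monotonicity can be confirmed numerically; the sketch implements the two displayed formulas for \(w_1\) and \(w_k\) directly (function names hypothetical).

```python
def w1(t, k):
    """w_1(t) = k / ((1 - t)(1 - (1 - t)^k))."""
    return k / ((1 - t) * (1 - (1 - t) ** k))


def wk(t, k):
    """w_k(t) = k / (t (1 - t^k))."""
    return k / (t * (1 - t ** k))


def ratio_closed_form(t, k):
    """sum_{j<k} (1 - t)^j / sum_{j<k} t^j."""
    return sum((1 - t) ** j for j in range(k)) / sum(t ** j for j in range(k))


def identity_and_monotonicity(k, grid_size=99):
    """Check w_k/w_1 against the closed form and its strict decrease on a grid."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    ratios = [wk(t, k) / w1(t, k) for t in grid]
    identity = all(abs(q - ratio_closed_form(t, k)) <= 1e-9 * q for q, t in zip(ratios, grid))
    decreasing = all(a > b for a, b in zip(ratios, ratios[1:]))
    return identity and decreasing


if __name__ == "__main__":
    print([identity_and_monotonicity(k) for k in range(2, 10)])
```

By symmetry the ratio equals 1 exactly at t = 1/2, which is the fact used again below.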

Similarly, with \(a_r(t) := \pi _r \beta _r(t) \big / \sum _{s=1}^k \pi _s \beta _s(t)\),

$$\begin{aligned} K^\mathrm{M}(t)&= \ \sum _{r=1}^k \pi _r \beta _r(t) \cdot w_r(t)^{-1} \Big / \left( \sum _{s=1}^k \pi _s \beta _s(t) \right) ^2 \\&= \ \sum _{r=1}^k a_r(t) \cdot w_r(t)^{-1} \Big / \sum _{s=1}^k \pi _s \beta _s(t) \\&\ge \ \left( \sum _{r=1}^k a_r(t) w_r(t) \right) ^{-1} \Big / \sum _{s=1}^k \pi _s \beta _s(t) \\&= \ \left( \sum _{r=1}^k \pi _r \beta _r(t) w_r(t) \right) ^{-1} \ = \ K^\mathrm{L}(t). \end{aligned}$$
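Both inequalities can be illustrated numerically. The sketch assumes, consistently with the proof of Corollary 4, that \(K^\mathrm{Z}(t) = \sum _r (\gamma _r^\mathrm{Z}(t))^2 B_r(t)(1-B_r(t))\) is the variance of \(\mathbb {V}^\mathrm{Z}(t)\), with the limiting weights obtained from the formulas for \(\gamma _{nr}^\mathrm{Z}\) by replacing \(\pi _{nr}\) with \(\pi _r\). This gives \(K^\mathrm{S}(t) = \sum _r B_r(t)(1-B_r(t))/(k^2\pi _r)\), \(K^\mathrm{M}(t) = \sum _r \pi _r B_r(t)(1-B_r(t)) / (\sum _s \pi _s\beta _s(t))^2\) and \(K^\mathrm{L}(t) = (\sum _r \pi _r \beta _r(t) w_r(t))^{-1}\). The random choice of \(\pi \) and t and the function names are only for illustration.

```python
import random
from math import comb


def beta_density(r, k, t):
    """beta_r(t): Beta(r, k+1-r) density."""
    return k * comb(k - 1, r - 1) * t ** (r - 1) * (1 - t) ** (k - r)


def beta_cdf(r, k, t):
    """B_r(t) = P(at least r of k uniforms are <= t)."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r, k + 1))


def beta_sf(r, k, t):
    """1 - B_r(t), summed directly to avoid cancellation."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r))


def asymptotic_variances(t, pi):
    """(K^S(t), K^M(t), K^L(t)) for stratum proportions pi = (pi_1, ..., pi_k), all > 0."""
    k = len(pi)
    beta = [beta_density(r, k, t) for r in range(1, k + 1)]
    bvar = [beta_cdf(r, k, t) * beta_sf(r, k, t) for r in range(1, k + 1)]  # B_r(1 - B_r)
    w = [b / v for b, v in zip(beta, bvar)]
    k_s = sum(v / (k * k * p) for v, p in zip(bvar, pi))
    k_m = sum(p * v for p, v in zip(pi, bvar)) / sum(p * b for p, b in zip(pi, beta)) ** 2
    k_l = 1 / sum(p * b * wr for p, b, wr in zip(pi, beta, w))
    return k_s, k_m, k_l


def inequalities_hold(k, trials=200, seed=0):
    """Check K^S >= K^L and K^M >= K^L for random pi and t."""
    rng = random.Random(seed)
    for _ in range(trials):
        raw = [rng.random() + 1e-3 for _ in range(k)]
        pi = [x / sum(raw) for x in raw]
        t = rng.uniform(0.01, 0.99)
        k_s, k_m, k_l = asymptotic_variances(t, pi)
        if k_s < k_l * (1 - 1e-12) or k_m < k_l * (1 - 1e-12):
            return False
    return True


if __name__ == "__main__":
    print(asymptotic_variances(0.3, [0.25, 0.25, 0.25, 0.25]))
```

In the balanced case \(\pi _r = 1/k\) one has \(\sum _s \pi _s \beta _s \equiv 1\), so \(K^\mathrm{S} = K^\mathrm{M}\) there.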

Here the inequality is strict unless

$$\begin{aligned} w_1(t) = w_2(t) = \cdots = w_k(t). \end{aligned}$$

But \(w_1(t) = w_k(t)\) implies that \(t = 1/2\). Moreover, \(w_1(1/2) = 2k/(1 - 2^{-k})\) and

$$\begin{aligned} w_{k-1}(1/2) \ = \ \frac{2k(k-1)}{(k+1)(1 - (k+1)2^{-k})} \end{aligned}$$

are identical if, and only if, \(k^2 + k + 2 = 2^{k+1}\). But \(2^{k+1} = 2 \sum _{j=0}^k \left( {\begin{array}{c}k\\ j\end{array}}\right) \) is strictly larger than \(2(1 + k + k(k-1)/2) = k^2 + k + 2\) if \(k \ge 3\).
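The two closed-form values at \(t = 1/2\) and the final counting argument are easy to confirm numerically; the sketch computes \(w_r = \beta _r/(B_r(1-B_r))\) from the Beta(r, k+1-r) distribution (helper names hypothetical). Among positive integers, \(k^2 + k + 2 = 2^{k+1}\) holds only for k = 1 and k = 2, in line with the claim for \(k \ge 3\).

```python
from math import comb


def beta_density(r, k, t):
    """beta_r(t): Beta(r, k+1-r) density."""
    return k * comb(k - 1, r - 1) * t ** (r - 1) * (1 - t) ** (k - r)


def beta_cdf(r, k, t):
    """B_r(t) = P(at least r of k uniforms are <= t)."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r, k + 1))


def beta_sf(r, k, t):
    """1 - B_r(t), summed directly to avoid cancellation."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r))


def w(r, k, t):
    """w_r(t) = beta_r(t) / (B_r(t)(1 - B_r(t)))."""
    return beta_density(r, k, t) / (beta_cdf(r, k, t) * beta_sf(r, k, t))


def closed_forms_match(k, rel_tol=1e-9):
    """Compare w_1(1/2) and w_{k-1}(1/2) with the displayed closed forms (k >= 2)."""
    w1_closed = 2 * k / (1 - 2.0 ** -k)
    wkm1_closed = 2 * k * (k - 1) / ((k + 1) * (1 - (k + 1) * 2.0 ** -k))
    return (abs(w(1, k, 0.5) - w1_closed) <= rel_tol * w1_closed
            and abs(w(k - 1, k, 0.5) - wkm1_closed) <= rel_tol * wkm1_closed)


if __name__ == "__main__":
    print([closed_forms_match(k) for k in range(2, 12)])
    print([k for k in range(1, 20) if k * k + k + 2 == 2 ** (k + 1)])
```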

As to the ratios \(E^\mathrm{Z}(t) := K^\mathrm{Z}(t)/K^\mathrm{L}(t)\), note first that

$$\begin{aligned} E^\mathrm{S}(t)&= \ \sum _{r=1}^k \frac{B_r(t)(1 - B_r(t))}{k^2 \pi _r} \sum _{s=1}^k \pi _s \beta _s(t) w_s(t) \\&\ge \ \min _{r,s=1,\ldots ,k} \frac{B_r(t)(1 - B_r(t)) \beta _s(t) w_s(t)}{k^2} \Big / \min _{r=1,\ldots ,k} \pi _r \\&\rightarrow \ \infty \quad \text {as} \ \min _{r=1,\ldots ,k} \pi _r \downarrow 0. \end{aligned}$$

On the other hand, with \(a_r(t)\) as above,

$$\begin{aligned} E^\mathrm{M}(t) \ = \ \sum _{r=1}^k a_r(t) w_r(t)^{-1} \sum _{s=1}^k a_s(t) w_s(t) \ = \ \mathop {\mathrm {I\!E}}\nolimits (W) \mathop {\mathrm {I\!E}}\nolimits (W^{-1}) \end{aligned}$$

with a random variable W with distribution \(\sum _{r=1}^k a_r(t) \delta _{w_r(t)}\). But with \(\ell (t) := \min _r w_r(t)\) and \(u(t) := \max _r w_r(t)\), convexity of \(w \mapsto w^{-1}\) on \([\ell (t),u(t)]\) implies that

$$\begin{aligned} W^{-1} \ \le \ \frac{W - \ell (t)}{u(t) - \ell (t)} u(t)^{-1} + \frac{u(t) - W}{u(t) - \ell (t)} \ell (t)^{-1} , \end{aligned}$$

so

$$\begin{aligned} \mathop {\mathrm {I\!E}}\nolimits (W) \mathop {\mathrm {I\!E}}\nolimits (W^{-1})&\le \ \mathop {\mathrm {I\!E}}\nolimits (W) \left( \frac{\mathop {\mathrm {I\!E}}\nolimits (W) - \ell (t)}{u(t) - \ell (t)} u(t)^{-1} + \frac{u(t) - \mathop {\mathrm {I\!E}}\nolimits (W)}{u(t) - \ell (t)} \ell (t)^{-1} \right) \\&= \ \frac{\mathop {\mathrm {I\!E}}\nolimits (W) (\ell (t) + u(t) - \mathop {\mathrm {I\!E}}\nolimits (W))}{\ell (t) u(t)} \\&\le \ \frac{(\ell (t) + u(t))^2}{4 \ell (t)u(t)} \ = \ \frac{\rho (t) + \rho (t)^{-1} + 2}{4}. \end{aligned}$$

Here \(\rho (t) = u(t)/\ell (t)\). This upper bound for \(E^\mathrm{M}(t)\) is attained approximately if the distribution of W approaches the uniform distribution on \(\{\ell (t),u(t)\}\). This is the case for the following choice of \((\pi _r)_{r=1}^k\): Let r(1), r(2) be two different numbers in \(\{1,\ldots ,k\}\) such that \(w_{r(1)}(t) = \ell (t)\) and \(w_{r(2)}(t) = u(t)\). Then let

$$\begin{aligned} \pi _r \ \approx \ {\left\{ \begin{array}{ll} \beta _{r}(t)^{-1}/(\beta _{r(1)}(t)^{-1} + \beta _{r(2)}(t)^{-1}) &{}\text {for} \ r \in \{r(1),r(2)\} , \\ 0 &{}\text {for} \ r \not \in \{r(1),r(2)\}. \end{array}\right. } \end{aligned}$$

The inequality \(\rho (t) \le k\) follows from Lemma 7 and the fact that \(\rho (t)\) remains unchanged if we replace \(w_r(t)\) with \(\widetilde{w}_r(t) = t(1-t) w_r(t) \in [1,k]\). \(\square \)
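The bound \(\mathop {\mathrm {I\!E}}\nolimits (W) \mathop {\mathrm {I\!E}}\nolimits (W^{-1}) \le (\rho (t) + \rho (t)^{-1} + 2)/4\), with \(\rho (t) = \max _r w_r(t) / \min _r w_r(t)\), and its attainment by the two-point choice of \((\pi _r)\) can be illustrated as follows. The sketch computes \(E^\mathrm{M}(t)\) via the weights \(a_r(t) \propto \pi _r \beta _r(t)\) and uses the extremal \(\pi \) with exact zeros, which is the limiting case of the approximate choice in the proof; the names and the values of t and k in the example are our own.

```python
import random
from math import comb


def beta_density(r, k, t):
    """beta_r(t): Beta(r, k+1-r) density."""
    return k * comb(k - 1, r - 1) * t ** (r - 1) * (1 - t) ** (k - r)


def beta_cdf(r, k, t):
    """B_r(t) = P(at least r of k uniforms are <= t)."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r, k + 1))


def beta_sf(r, k, t):
    """1 - B_r(t), summed directly to avoid cancellation."""
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(r))


def weights(t, k):
    """w_r(t) = beta_r(t) / (B_r(t)(1 - B_r(t))), r = 1..k."""
    return [beta_density(r, k, t) / (beta_cdf(r, k, t) * beta_sf(r, k, t))
            for r in range(1, k + 1)]


def efficiency_m(t, pi):
    """E^M(t) = E(W) E(1/W), where P(W = w_r(t)) = a_r(t) is proportional to pi_r beta_r(t)."""
    k = len(pi)
    w = weights(t, k)
    mass = [p * beta_density(r, k, t) for r, p in enumerate(pi, 1)]
    a = [m / sum(mass) for m in mass]
    return sum(ai * wi for ai, wi in zip(a, w)) * sum(ai / wi for ai, wi in zip(a, w))


def upper_bound(t, k):
    """(rho + 1/rho + 2) / 4 with rho = max_r w_r(t) / min_r w_r(t)."""
    w = weights(t, k)
    rho = max(w) / min(w)
    return (rho + 1 / rho + 2) / 4


def extremal_pi(t, k):
    """Mass only on the ranks minimizing / maximizing w_r(t), proportional to 1 / beta_r(t)."""
    w = weights(t, k)
    r1 = min(range(1, k + 1), key=lambda r: w[r - 1])
    r2 = max(range(1, k + 1), key=lambda r: w[r - 1])
    inv = {r: 1 / beta_density(r, k, t) for r in {r1, r2}}
    z = sum(inv.values())
    return [inv.get(r, 0.0) / z for r in range(1, k + 1)]


def bound_holds(t, k, trials=200, seed=0):
    """Check 1 <= E^M(t) <= upper bound for random positive pi."""
    rng = random.Random(seed)
    for _ in range(trials):
        raw = [rng.random() + 1e-3 for _ in range(k)]
        pi = [x / sum(raw) for x in raw]
        if not 1 - 1e-12 <= efficiency_m(t, pi) <= upper_bound(t, k) + 1e-12:
            return False
    return True


if __name__ == "__main__":
    t, k = 0.3, 4
    print(efficiency_m(t, extremal_pi(t, k)), upper_bound(t, k), bound_holds(t, k))
```

The lower bound 1 is just the Cauchy-Schwarz inequality \(\mathop {\mathrm {I\!E}}\nolimits (W)\mathop {\mathrm {I\!E}}\nolimits (W^{-1}) \ge 1\).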

About this article

Cite this article

Dümbgen, L., Zamanzade, E. Inference on a distribution function from ranked set samples. Ann Inst Stat Math 72, 157–185 (2020). https://doi.org/10.1007/s10463-018-0680-y

Keywords

  • Conditional inference
  • Confidence band
  • Empirical process
  • Functional limit theorem
  • Moment equations
  • Imperfect ranking
  • Relative asymptotic efficiency
  • Unbalanced samples