Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation

Lifetime Data Analysis

Abstract

Doubly truncated data consist of samples whose observed values fall between the left- and right-truncation limits. For such samples, the distribution function of interest is estimated by the nonparametric maximum likelihood estimator (NPMLE), which is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is a computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence intervals, goodness-of-fit tests, and confidence bands, to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using a childhood cancer dataset.


References

  • Austin D, Simon DK, Betensky RA (2013) Computationally simple estimation and improved efficiency for special cases of double truncation. Lifetime Data Anal. doi:10.1007/s10985-013-9287-z

  • Chen YH (2010) Semiparametric marginal regression analysis for dependent competing risks under an assumed copula. J R Stat Soc B 72:235–251

  • Commenges D (2002) Inference for multi-state models from interval-censored data. Stat Methods Med Res 11:167–182

  • Efron B, Petrosian V (1999) Nonparametric method for doubly truncated data. J Am Stat Assoc 94:824–834

  • Emura T, Wang W (2012) Nonparametric maximum likelihood estimation for dependent truncation data based on copulas. J Multivar Anal 110:171–188

  • Emura T, Konno Y (2012) Multivariate normal distribution approaches for dependently truncated data. Stat Papers 53:133–149

  • Klein JP, Moeschberger ML (2003) Survival analysis: techniques for censored and truncated data. Springer, New York

  • Moreira C, Uña-Álvarez J (2010) Bootstrapping the NPMLE for doubly truncated data. J Nonparametr Stat 22:567–583

  • Moreira C, Uña-Álvarez J (2012) Kernel density estimation with doubly-truncated data. Electron J Stat 6:501–521

  • Moreira C, Keilegom IV (2013) Bandwidth selection for kernel density estimation with doubly truncated data. Comput Stat Data Anal 61:107–123

  • Moreira C, Uña-Álvarez J, Meira-Machado L (2014) Nonparametric regression with doubly truncated data. Comput Stat Data Anal. doi:10.1016/j.csda.2014.03.017

  • Murphy SA (1995) Asymptotic theory for the frailty model. Ann Stat 23:182–198

  • Nair VN (1984) Confidence bands for survival functions with censored data: a comparative study. Technometrics 26:265–275

  • Shen PS (2010) Nonparametric analysis of doubly truncated data. Ann Inst Stat Math 62:835–853

  • Shen PS (2011) Testing quasi-independence for doubly truncated data. J Nonparametr Stat 23:1–9

  • Shen PS (2012) Empirical likelihood ratio with doubly truncated data. J Appl Stat 38:2345–2353

  • Stovring H, Wang MC (2007) A new approach of nonparametric estimation of incidence and lifetime risk based on birth rates and incidence events. BMC Med Res Methodol 7:53

  • van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer-Verlag, New York

  • Zeng D, Lin DY (2006) Efficient estimation of semiparametric transformation models for counting processes. Biometrika 93:627–640

  • Zhu H, Wang MC (2012) Analyzing bivariate survival data with interval sampling and application to cancer epidemiology. Biometrika 99:345–361

Download references

Acknowledgments

We would like to thank the editor, the associate editor, and the two reviewers for their helpful comments and corrections, which greatly improved the manuscript. This work was financially supported by the National Science Council of Taiwan (NSC101-2118-M008-002-MY2) to T. Emura, and by a Grant-in-Aid for a Research Fellow of the Japan Society for the Promotion of Science to H. Michimae (No. 23570036). The work of Y. Konno was partially supported by a Grant-in-Aid for Scientific Research (C) (Nos. 25330043 and 21500283).

Author information

Corresponding author

Correspondence to Takeshi Emura.

Appendices

Appendix A: Bootstrap and jackknife algorithms

Simple bootstrap algorithm (Moreira and Uña-Álvarez 2010):

  • Step 1: For each \(b=1,\;\ldots ,\;B\), draw bootstrap resamples \(\{\;(U_{jb}^*,\;T_{jb}^*,\;V_{jb}^*):j=1,\;\ldots ,\;n\;\}\) from \(\{\;(U_j ,\;T_j ,\;V_j ):j=1,\;\ldots ,\;n\;\}\), and then compute the NPMLE \(\hat{{F}}_b^*(t)\) from them.

  • Step 2: Compute the bootstrap variance estimator

    $$\begin{aligned} \hat{{V}}_{\mathrm{Boot}} \{\hat{{F}}(t)\}=\frac{1}{B-1}\sum _{b=1}^B {\{\hat{{F}}_b^*(t)-\bar{{F}}^{*}(t)\}^{2}} , \end{aligned}$$

    where \(\bar{{F}}^{*}(t)=\frac{1}{B} \sum _{b=1}^B {\hat{{F}}_b^*(t)}\), and take the \((\alpha /2)\times 100\)% and \((1-\alpha /2)\times 100\)% points of \(\{\;\hat{{F}}_b^*(t):\;b=1,\;\ldots ,\;B\;\}\) for the \((1-\alpha )\times 100\)% confidence interval.
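For illustration, the two steps above can be sketched in Python. The function names here are hypothetical, and the empirical CDF stands in for the NPMLE routine (the NPMLE reduces to the empirical CDF when there is no truncation):

```python
import numpy as np

def ecdf(t, x):
    """Stand-in for the NPMLE of F(t): the empirical CDF of the sample x."""
    return float(np.mean(x <= t))

def bootstrap_variance_ci(data, estimator, t, B=500, alpha=0.05, seed=0):
    """Step 1: resample with replacement and re-estimate F(t), B times.
    Step 2: sample variance with the 1/(B-1) factor, and the percentile CI."""
    rng = np.random.default_rng(seed)
    n = len(data)
    boot = np.array([estimator(t, data[rng.integers(0, n, size=n)])
                     for _ in range(B)])
    var = boot.var(ddof=1)  # (1/(B-1)) * sum_b {F_b*(t) - Fbar*(t)}^2
    lo, hi = np.quantile(boot, [alpha / 2.0, 1.0 - alpha / 2.0])
    return var, (lo, hi)
```

With the actual NPMLE under double truncation, `estimator` would take the full \((U_j, T_j, V_j)\) triples rather than a single sample.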

Jackknife algorithm:

  • Step 1: For each \(i=1,\;\ldots ,\;n\), delete the \(i\)th sample from \(\{\;(U_j ,\;T_j ,\;V_j ):j=1,\;\ldots ,\;n\;\}\), and then compute the NPMLE \(\hat{{F}}_{(-i)}(t)\) from the remaining \(n-1\) samples.

  • Step 2: Compute the jackknife variance estimator

    $$\begin{aligned} \hat{{V}}_{\mathrm{Jack}} \{\hat{{F}}(t)\}=\frac{n-1}{n}\sum _{i=1}^n {\{\hat{{F}}_{(-i)} (t)-\bar{{F}}_{(\cdot )} (t)\}^{2}} , \end{aligned}$$

    where \(\bar{{F}}_{(\cdot )} (t)=\frac{1}{n}\sum _{i=1}^n {\hat{{F}}_{(-i)} (t)} \), and compute the log-transformed \((1-\alpha )\times 100\)% confidence interval

    $$\begin{aligned} (\;\hat{{F}}(t)\exp [\;-z_{\alpha /2} \hat{{V}}_{\mathrm{Jack}}^{\mathrm{1/2}} \{\hat{{F}}(t)\}/\hat{{F}}(t)\;],\;\;\hat{{F}}(t)\exp [\;z_{\alpha /2} \hat{{V}}_{\mathrm{Jack}}^{\mathrm{1/2}} \{\hat{{F}}(t)\}/\hat{{F}}(t)\;]\;). \end{aligned}$$
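A corresponding Python sketch of the jackknife steps (again with hypothetical names and the empirical CDF standing in for the NPMLE; `z=1.96` approximates \(z_{\alpha /2}\) for \(\alpha =0.05\)):

```python
import numpy as np

def ecdf(t, x):
    """Stand-in for the NPMLE of F(t): the empirical CDF of the sample x."""
    return float(np.mean(x <= t))

def jackknife_variance_ci(data, estimator, t, z=1.96):
    """Step 1: leave-one-out estimates.  Step 2: jackknife variance and the
    log-transformed confidence interval around the full-sample estimate."""
    n = len(data)
    full = estimator(t, data)
    loo = np.array([estimator(t, np.delete(data, i)) for i in range(n)])
    var = (n - 1) / n * np.sum((loo - loo.mean()) ** 2)
    se = np.sqrt(var)
    # log-transformed interval; requires a strictly positive point estimate
    lo = full * np.exp(-z * se / full)
    hi = full * np.exp(z * se / full)
    return var, (lo, hi)
```

The log transformation keeps the lower limit positive, which is why it is preferred over the plain Wald interval near \(F(t)=0\).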

Appendix B: Asymptotic theory

1.1 Appendix B1. Weak convergence of \(\sqrt{n}(\;\hat{{F}}(t)-F(t)\;)\)

Although not stated explicitly, we assume that the identifiability conditions (Shen 2010, p. 836) are satisfied. Consider the log-likelihood function

$$\begin{aligned} \ell _n (F)/n=\sum _{j=1}^n {(\log f_j -\log F_j )} /n. \end{aligned}$$

For any \(h\in Q\), where \(Q\) is the set of all uniformly bounded functions, let \(H(t)=\int _0^t {h(s)dF(s)} \) and \(\hat{{H}}(t)=\int _0^t {h(s)d\hat{{F}}(s)} \), where \(h\) satisfies the constraint \(\hat{{H}}(\infty )=0\) so that \(\hat{{F}}+\varepsilon \hat{{H}}\) remains a distribution function. Suppose that \(\hat{{F}}\) is the maximizer of \(\ell _n (F)\). Then for any \(h\in Q\) and \(\varepsilon \ge 0\), we have \(\ell _n (\hat{{F}}+\varepsilon \hat{{H}})\le \ell _n (\hat{{F}})\). Hence, the score function \(\partial \ell _n (F+\varepsilon H)/\partial \varepsilon |_{\varepsilon =0} \) is equal to

$$\begin{aligned} \Psi _n (F)[h]\equiv \frac{1}{n}\sum _{i=1}^n {\left[ {h(T_i )-\frac{\int {\mathbf{I}(U_i \le s\le V_i )h(s)dF(s} )}{\int {\mathbf{I}(U_i \le s\le V_i )dF(s} )}} \right] } , \end{aligned}$$

for any \(h\in Q\). The expectation is defined as

$$\begin{aligned} \Psi (F)[h]\equiv E\left[ {h(T^{*})-\frac{\int {\mathbf{I}(U^{*}\le s\le V^{*})h(s)dF(s)} }{\int {\mathbf{I}(U^{*}\le s\le V^{*})dF(s)} }} \right] . \end{aligned}$$

Consider \(\Psi _n (F)[h]\) as a random function defined on \(Q\). Accordingly, consider a random map \(\Theta \rightarrow l^{\infty }(Q)\) defined by \(F\mapsto \Psi _n (F)[\cdot ]\). Then, the equation \(\Psi _n (F)[\cdot ]=0\) is regarded as an estimating equation taking values in \(l^{\infty }(Q)\). It follows that the NPMLE is the Z-estimator satisfying \(\Psi _n (\hat{{F}})[\cdot ]=0\) (van der Vaart and Wellner 1996, p. 309). In the following, we assume that the regularity conditions for the asymptotic theory of Z-estimators hold; these include the asymptotic approximation condition, the Fréchet differentiability of the map, and the invertibility of the derivative map.

Then, one can write

$$\begin{aligned} 0=n^{1/2}\Psi _n (\hat{{F}})[h]=n^{1/2}\Psi _n (F)[h]+n^{1/2}\dot{\Psi }_F (\hat{{F}}-F)[h]+o_P (1), \end{aligned}$$
(5)

where \(\dot{\Psi }_F (\hat{{F}}-F)[h]\) is the derivative of \(\Psi _n (F)[h]\) at \(F\) with direction \(\hat{{F}}-F\). It follows from the form of \(\Psi (F)[\cdot ]\) that

$$\begin{aligned} \dot{\Psi }_F (\hat{{F}}-F)[h]=\frac{d}{dt}\Psi \{\;F+t(\hat{{F}}-F)\;\}[h]|_{t=0} =-\int {\sigma _F (x)[h]d(\hat{{F}}-F)(x)} . \end{aligned}$$
(6)

It follows from Eqs. (5) and (6) that the NPMLE satisfies the asymptotic linear expression

$$\begin{aligned}&\sqrt{n}\int {\sigma _F (x)[h]d(\hat{{F}}-F)(x)} \nonumber \\&\quad =\frac{1}{\sqrt{n}}\sum _{i=1}^n {\left[ {h(T_i )-\frac{\int {I(U_i \le s\le V_i )h(s)dF(s} )}{\int {I(U_i \le s\le V_i )dF(s} )}} \right] } +o_P (1), \end{aligned}$$
(7)

where the right-hand side converges weakly to a mean-zero Gaussian process with the covariance structure

$$\begin{aligned}&E\left[ {h(T^{*})-\frac{\int {\mathbf{I}(U^{*}\le s\le V^{*})h(s)dF(s} )}{\int {\mathbf{I}(U^{*}\le s\le V^{*})dF(s} )}} \right] \left[ {{h}'(T^*)-\frac{\int {\mathbf{I}(U^{*}\le s\le V^{*}){h}'(s)dF(s} )}{\int {\mathbf{I}(U^{*}\le s\le V^{*})dF(s} )}} \right] \\&\quad =\int {\sigma _F (x)[h]{h}'(x)dF(x)} , \end{aligned}$$

for bounded functions \(h\) and \({h}'\). The desired weak convergence of \(\sqrt{n}(\;\hat{{F}}(t)-F(t)\;)\) is obtained by setting \(h=\sigma _F^{-1} (w_t )\) in Eq. (7).

1.2 Appendix B2: Proof of \(\sum _{j=1}^n {w_s (T_j )\hat{{\sigma }}_F^{-1} (w_t )(T_j )\hat{{f}}_j } =\mathbf{W}_s^\mathrm{T} \left\{ {\frac{i_n ({\hat{\mathbf{f}}})}{n}} \right\} ^{-1}\mathbf{W}_t \)

It follows that

$$\begin{aligned} \hat{{\sigma }}_F (T_j )[h]=\frac{1}{n}\sum _{i=1}^n {J_{ij} \left\{ {\frac{h_j }{\hat{{F}}_i }-\frac{1}{\hat{{F}}_i^2 }\sum _{k=1}^n {J_{ik} h_k \hat{{f}}_k } } \right\} }=\frac{1}{n}\left[ {\frac{h_j \hat{{f}}_j}{\hat{{f}}_j^2 }-\sum _{i=1}^n {\sum _{k=1}^n {\frac{J_{ij} J_{ik} }{\hat{{F}}_i^2 }h_k \hat{{f}}_k } } } \right] .\nonumber \\ \end{aligned}$$
(8)

Note that

$$\begin{aligned} J^{\mathrm{T}}\hbox {diag}\left( {\frac{1}{\mathbf{F}^{2}}} \right) J=\left[ {{\begin{array}{c@{\quad }c@{\quad }c} {\sum _{i=1}^n {\frac{J_{i1} J_{i1} }{F_i^2 }} }&{} \cdots &{} {\sum _{i=1}^n {\frac{J_{i1} J_{in} }{F_i^2 }} } \\ \vdots &{} \ddots &{} \vdots \\ {\sum _{i=1}^n {\frac{J_{in} J_{i1} }{F_i^2 }} }&{} \cdots &{} {\sum _{i=1}^n {\frac{J_{in} J_{in} }{F_i^2 }} } \\ \end{array} }} \right] . \end{aligned}$$

Hence, writing Eq. (8) with \(\hat{{\sigma }}_F (T_j )[h]=w_t (T_j )\), where \(w_t (x)=\mathbf{I}(x\le t)\), yields

$$\begin{aligned} \left[ {{\begin{array}{c} {w_t (T_1 )} \\ \vdots \\ {w_t (T_n )} \\ \end{array} }} \right]&= \frac{1}{n}\left[ {\left. {\left\{ {\hbox {diag}\left( {\frac{1}{{\hat{\mathbf{f}}}^{2}}} \right) -J^{\mathrm{T}}\hbox {diag}\left( {\frac{1}{{\hat{\mathbf{F}}}^{2}}} \right) J} \right\} } \right| _{\hat{{f}}_n =1-\mathbf{1}_{n-1}^\mathrm{T} {\hat{\mathbf{f}}}} } \right] \left[ {{\begin{array}{c} {h_1 \hat{{f}}_1 } \\ \vdots \\ {h_n \hat{{f}}_n } \\ \end{array} }} \right] \\&= \frac{1}{n}\left[ {\left. {\left\{ {\hbox {diag}\left( {\frac{1}{{\hat{\mathbf{f}}}^{2}}} \right) -J^{\mathrm{T}}\hbox {diag}\left( {\frac{1}{{\hat{\mathbf{F}}}^{2}}} \right) J} \right\} } \right| _{\hat{{f}}_n =1-\mathbf{1}_{n-1}^\mathrm{T} {\hat{\mathbf{f}}}} } \right] D^{\mathrm{T}}\left[ {{\begin{array}{c} {h_1 \hat{{f}}_1 } \\ \vdots \\ {h_{n-1} \hat{{f}}_{n-1} } \\ \end{array} }} \right] , \end{aligned}$$

where the last equality uses the constraint \(\sum _{j=1}^n {h_j \hat{{f}}_j } =0\). Multiplying both sides by \(D\), and taking the inverse of the information matrix,

$$\begin{aligned} \left[ {{\begin{array}{c} {\hat{{\sigma }}_F^{-1} (w_t )(T_1 )\hat{{f}}_1 } \\ \vdots \\ {\hat{{\sigma }}_F^{-1} (w_t )(T_{n-1} )\hat{{f}}_{n-1} } \\ \end{array} }} \right] =\left\{ {\frac{i_n ({\hat{\mathbf{f}}})}{n}} \right\} ^{-1}\left[ {{\begin{array}{c} {w_t (T_1 )-w_t (T_n )} \\ \vdots \\ {w_t (T_{n-1} )-w_t (T_n )} \\ \end{array} }} \right] . \end{aligned}$$

It follows that

$$\begin{aligned}&\sum _{j=1}^n {w_s (T_j )\hat{{\sigma }}_F^{-1} (w_t )(T_j )\hat{{f}}_j } =\sum _{j=1}^{n-1} {\{\;w_s (T_j )-w_s (T_n )\;\}\hat{{\sigma }}_F^{-1} (w_t )(T_j )\hat{{f}}_j } \\&\quad =\left[ {{\begin{array}{lll} {w_s (T_1 )-w_s (T_n )}&{} \cdots &{} {w_s (T_{n-1} )-w_s (T_n )\;} \\ \end{array} }} \right] \left\{ {\frac{i_n ({\hat{\mathbf{f}}})}{n}} \right\} ^{-1}\left[ {{\begin{array}{c} {w_t (T_1 )-w_t (T_n )} \\ \vdots \\ {w_t (T_{n-1} )-w_t (T_n )} \\ \end{array} }} \right] \\&\quad =\mathbf{W}_s^\mathrm{T} \left\{ {\frac{i_n ({\hat{\mathbf{f}}})}{n}} \right\} ^{-1}\mathbf{W}_t . \end{aligned}$$
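To make the closed form concrete, the pieces above (the matrix \(J\) with \(J_{ik}=\mathbf{I}(U_i\le T_k\le V_i)\), the reduced information matrix \(i_n(\hat{\mathbf{f}})\), and the vectors \(\mathbf{W}_s\)) can be assembled numerically. The following Python sketch is an illustration, not the authors' code: the function names are hypothetical, the NPMLE is computed by a plain fixed-point version of the self-consistency iteration, and a pseudo-inverse is used as a numerical safeguard.

```python
import numpy as np

def npmle_self_consistency(U, T, V, iters=1000):
    """Self-consistency fixed point for the NPMLE f_hat under double
    truncation: f_j  <-  1 / sum_i J_ij / F_i, then renormalize."""
    n = len(T)
    J = ((U[:, None] <= T[None, :]) & (T[None, :] <= V[:, None])).astype(float)
    f = np.full(n, 1.0 / n)
    for _ in range(iters):
        F = J @ f                                # F_i = sum_k J_ik f_k
        f = 1.0 / (J / F[:, None]).sum(axis=0)
        f /= f.sum()
    return f, J

def cov_sqrtn_F(U, T, V, s, t):
    """Closed-form estimate of the asymptotic covariance of
    sqrt(n){F_hat(s)-F(s)} and sqrt(n){F_hat(t)-F(t)}:
    W_s' {i_n(f_hat)/n}^{-1} W_t."""
    f, J = npmle_self_consistency(U, T, V)
    n = len(T)
    F = J @ f
    A = np.diag(1.0 / f**2) - J.T @ np.diag(1.0 / F**2) @ J
    D = np.hstack([np.eye(n - 1), -np.ones((n - 1, 1))])  # g -> (g_j - g_n)
    i_n = D @ A @ D.T            # information for (f_1, ..., f_{n-1})
    Ws = D @ (T <= s).astype(float)
    Wt = D @ (T <= t).astype(float)
    # pinv guards against numerical singularity of i_n/n
    return Ws @ np.linalg.pinv(i_n / n) @ Wt
```

Dividing the returned value by \(n\) gives a variance estimate for \(\hat F(t)\) itself, comparable to the bootstrap and jackknife estimators of Appendix A.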


Cite this article

Emura, T., Konno, Y. & Michimae, H. Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Anal 21, 397–418 (2015). https://doi.org/10.1007/s10985-014-9297-5
