Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation

Lifetime Data Analysis

Abstract

Doubly truncated data consist of samples whose observed values fall between the left- and right-truncation limits. For such samples, the distribution function of interest is estimated by the nonparametric maximum likelihood estimator (NPMLE), which is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is a computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence intervals, goodness-of-fit tests, and confidence bands, to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using a childhood cancer dataset.


References

  • Austin D, Simon DK, Betensky RA (2013) Computationally simple estimation and improved efficiency for special cases of double truncation. Lifetime Data Anal. doi:10.1007/s10985-013-9287-z

  • Chen YH (2010) Semiparametric marginal regression analysis for dependent competing risks under an assumed copula. J R Stat Soc B 72:235–251

  • Commenges D (2002) Inference for multi-state models from interval-censored data. Stat Methods Med Res 11:167–182

  • Efron B, Petrosian V (1999) Nonparametric method for doubly truncated data. J Am Stat Assoc 94:824–834

  • Emura T, Wang W (2012) Nonparametric maximum likelihood estimation for dependent truncation data based on copulas. J Multivar Anal 110:171–188

  • Emura T, Konno Y (2012) Multivariate normal distribution approaches for dependently truncated data. Stat Papers 53:133–149

  • Klein JP, Moeschberger ML (2003) Survival analysis: techniques for censored and truncated data. Springer, New York

  • Moreira C, Uña-Álvarez J (2010) Bootstrapping the NPMLE for doubly truncated data. J Nonparametr Stat 22:567–583

  • Moreira C, Uña-Álvarez J (2012) Kernel density estimation with doubly-truncated data. Electron J Stat 6:501–521

  • Moreira C, Keilegom IV (2013) Bandwidth selection for kernel density estimation with doubly truncated data. Comput Stat Data Anal 61:107–123

  • Moreira C, Uña-Álvarez J, Meira-Machado L (2014) Nonparametric regression with doubly truncated data. Comput Stat Data Anal. doi:10.1016/j.csda.2014.03.017

  • Murphy SA (1995) Asymptotic theory for the frailty model. Ann Stat 23:182–198

  • Nair VN (1984) Confidence bands for survival functions with censored data: a comparative study. Technometrics 26:265–275

  • Shen PS (2010) Nonparametric analysis of doubly truncated data. Ann Inst Stat Math 62:835–853

  • Shen PS (2011) Testing quasi-independence for doubly truncated data. J Nonparametr Stat 23:1–9

  • Shen PS (2012) Empirical likelihood ratio with doubly truncated data. J Appl Stat 38:2345–2353

  • Stovring H, Wang MC (2007) A new approach of nonparametric estimation of incidence and lifetime risk based on birth rates and incidence events. BMC Med Res Methodol 7:53

  • van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer-Verlag, New York

  • Zeng D, Lin DY (2006) Efficient estimation of semiparametric transformation models for counting processes. Biometrika 93:627–640

  • Zhu H, Wang MC (2012) Analyzing bivariate survival data with interval sampling and application to cancer epidemiology. Biometrika 99:345–361

Download references

Acknowledgments

We would like to thank the editor, the associate editor, and the two reviewers for their helpful comments and corrections, which greatly improved the manuscript. This work was financially supported by the National Science Council of Taiwan (NSC101-2118-M008-002-MY2) to T. Emura, and by a Grant-in-Aid for a Research Fellow of the Japan Society for the Promotion of Science to H. Michimae (No. 23570036). The work of Y. Konno was partially supported by a Grant-in-Aid for Scientific Research (C) (Nos. 25330043 and 21500283).

Author information

Corresponding author

Correspondence to Takeshi Emura.

Appendices

Appendix A: Bootstrap and jackknife algorithms

Simple bootstrap algorithm (Moreira and Uña-Álvarez 2010):

  • Step 1: For each \(b=1,\;\ldots ,\;B\), draw bootstrap resamples \(\{\;(U_{jb}^*,\;T_{jb}^*,\;V_{jb}^*):j=1,\;\ldots ,\;n\;\}\) from \(\{\;(U_j ,\;T_j ,\;V_j ):j=1,\;\ldots ,\;n\;\}\), and then compute the NPMLE \(\hat{{F}}_b^*(t)\) from them.

  • Step 2: Compute the bootstrap variance estimator

    $$\begin{aligned} \hat{{V}}_{\mathrm{Boot}} \{\hat{{F}}(t)\}=\frac{1}{B-1}\sum _{b=1}^B {\{\hat{{F}}_b^*(t)-\bar{{F}}^{*}(t)\}^{2}} , \end{aligned}$$

    where \(\bar{{F}}^{*}(t)=\frac{1}{B} \sum _{b=1}^B {\hat{{F}}_b^*(t)}\), and take the \((\alpha /2)\times 100\)% and \((1-\alpha /2)\times 100\)% points of \(\{\;\hat{{F}}_b^*(t):\;b=1,\;\ldots ,\;B\;\}\) for the \((1-\alpha )\times 100\)% confidence interval.
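For illustration, the two steps above can be sketched in Python. The function names here are hypothetical, and the empirical CDF stands in for the NPMLE routine (the NPMLE reduces to the empirical CDF when there is no truncation):

```python
import numpy as np

def ecdf(t, x):
    """Stand-in for the NPMLE of F(t): the empirical CDF of the sample x."""
    return float(np.mean(x <= t))

def bootstrap_variance_ci(data, estimator, t, B=500, alpha=0.05, seed=0):
    """Step 1: resample with replacement and re-estimate F(t), B times.
    Step 2: sample variance with the 1/(B-1) factor, and the percentile CI."""
    rng = np.random.default_rng(seed)
    n = len(data)
    boot = np.array([estimator(t, data[rng.integers(0, n, size=n)])
                     for _ in range(B)])
    var = boot.var(ddof=1)  # (1/(B-1)) * sum_b {F_b*(t) - Fbar*(t)}^2
    lo, hi = np.quantile(boot, [alpha / 2.0, 1.0 - alpha / 2.0])
    return var, (lo, hi)
```

With the actual NPMLE under double truncation, `estimator` would take the full \((U_j, T_j, V_j)\) triples rather than a single sample.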

Jackknife algorithm:

  • Step 1: For each \(i=1,\;\ldots ,\;n\), delete the \(i\)th sample from \(\{\;(U_j ,\;T_j ,\;V_j ):j=1,\;\ldots ,\;n\;\}\), and then compute the NPMLE \(\hat{{F}}_{(-i)}(t)\) from the remaining \(n-1\) samples.

  • Step 2: Compute the jackknife variance estimator

    $$\begin{aligned} \hat{{V}}_{\mathrm{Jack}} \{\hat{{F}}(t)\}=\frac{n-1}{n}\sum _{i=1}^n {\{\hat{{F}}_{(-i)} (t)-\bar{{F}}_{(\cdot )} (t)\}^{2}} , \end{aligned}$$

    where \(\bar{{F}}_{(\cdot )} (t)=\frac{1}{n}\sum _{i=1}^n {\hat{{F}}_{(-i)} (t)} \), and compute the log-transformed \((1-\alpha )\times 100\)% confidence interval

    $$\begin{aligned} (\;\hat{{F}}(t)\exp [\;-z_{\alpha /2} \hat{{V}}_{\mathrm{Jack}}^{\mathrm{1/2}} \{\hat{{F}}(t)\}/\hat{{F}}(t)\;],\;\;\hat{{F}}(t)\exp [\;z_{\alpha /2} \hat{{V}}_{\mathrm{Jack}}^{\mathrm{1/2}} \{\hat{{F}}(t)\}/\hat{{F}}(t)\;]\;). \end{aligned}$$
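A corresponding Python sketch of the jackknife steps (again with hypothetical names and the empirical CDF standing in for the NPMLE; `z=1.96` approximates \(z_{\alpha /2}\) for \(\alpha =0.05\)):

```python
import numpy as np

def ecdf(t, x):
    """Stand-in for the NPMLE of F(t): the empirical CDF of the sample x."""
    return float(np.mean(x <= t))

def jackknife_variance_ci(data, estimator, t, z=1.96):
    """Step 1: leave-one-out estimates.  Step 2: jackknife variance and the
    log-transformed confidence interval around the full-sample estimate."""
    n = len(data)
    full = estimator(t, data)
    loo = np.array([estimator(t, np.delete(data, i)) for i in range(n)])
    var = (n - 1) / n * np.sum((loo - loo.mean()) ** 2)
    se = np.sqrt(var)
    # log-transformed interval; requires a strictly positive point estimate
    lo = full * np.exp(-z * se / full)
    hi = full * np.exp(z * se / full)
    return var, (lo, hi)
```

The log transformation keeps the lower limit positive, which is why it is preferred over the plain Wald interval near \(F(t)=0\).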

Appendix B: Asymptotic theory

1.1 Appendix B1. Weak convergence of \(\sqrt{n}(\;\hat{{F}}(t)-F(t)\;)\)

Although not stated explicitly, we assume that the identifiability conditions (Shen 2010, p. 836) are satisfied. Consider the log-likelihood function

$$\begin{aligned} \ell _n (F)/n=\sum _{j=1}^n {(\log f_j -\log F_j )} /n. \end{aligned}$$

For any \(h\in Q\), where \(Q\) is the set of all uniformly bounded functions, let \(H(t)=\int _0^t {h(s)dF(s)} \) and \(\hat{{H}}(t)=\int _0^t {h(s)d\hat{{F}}(s)} \), where \(h\) satisfies the constraint \(\hat{{H}}(\infty )=0\) so that \(\hat{{F}}+\varepsilon \hat{{H}}\) remains a distribution function. Suppose that \(\hat{{F}}\) is the maximizer of \(\ell _n (F)\). Then for any \(h\in Q\) and \(\varepsilon \ge 0\), we have \(\ell _n (\hat{{F}}+\varepsilon \hat{{H}})\le \ell _n (\hat{{F}})\). Hence, the score function \(\partial \ell _n (F+\varepsilon H)/\partial \varepsilon |_{\varepsilon =0} \) is equal to

$$\begin{aligned} \Psi _n (F)[h]\equiv \frac{1}{n}\sum _{i=1}^n {\left[ {h(T_i )-\frac{\int {\mathbf{I}(U_i \le s\le V_i )h(s)dF(s} )}{\int {\mathbf{I}(U_i \le s\le V_i )dF(s} )}} \right] } , \end{aligned}$$

for any \(h\in Q\). The expectation is defined as

$$\begin{aligned} \Psi (F)[h]\equiv E\left[ {h(T^{*})-\frac{\int {\mathbf{I}(U^{*}\le s\le V^{*})h(s)dF(s)} }{\int {\mathbf{I}(U^{*}\le s\le V^{*})dF(s)} }} \right] . \end{aligned}$$

Consider \(\Psi _n (F)[h]\) as a random function defined on \(Q\). Accordingly, consider a random map \(\Theta \rightarrow l^{\infty }(Q)\) defined by \(F\mapsto \Psi _n (F)[\cdot ]\). Then, the equation \(\Psi _n (F)[\cdot ]=0\) is regarded as an estimating equation taking values in \(l^{\infty }(Q)\). It follows that the NPMLE is the Z-estimator satisfying \(\Psi _n (\hat{{F}})[\cdot ]=0\) (van der Vaart and Wellner 1996, p. 309). In the following, we assume that the regularity conditions for the asymptotic theory of Z-estimators hold; these include the asymptotic approximation condition, the Fréchet differentiability of the map, and the invertibility of the derivative map.

Then, one can write

$$\begin{aligned} 0=n^{1/2}\Psi _n (\hat{{F}})[h]=n^{1/2}\Psi _n (F)[h]+n^{1/2}\dot{\Psi }_F (\hat{{F}}-F)[h]+o_P (1), \end{aligned}$$
(5)

where \(\dot{\Psi }_F (\hat{{F}}-F)[h]\) is the derivative of \(\Psi _n (F)[h]\) at \(F\) with direction \(\hat{{F}}-F\). It follows from the form of \(\Psi (F)[\cdot ]\) that

$$\begin{aligned} \dot{\Psi }_F (\hat{{F}}-F)[h]=\frac{d}{dt}\Psi \{\;F+t(\hat{{F}}-F)\;\}[h]|_{t=0} =-\int {\sigma _F (x)[h]d(\hat{{F}}-F)(x)} . \end{aligned}$$
(6)

It follows from Eqs. (5) and (6) that the NPMLE satisfies the asymptotic linear expression

$$\begin{aligned}&\sqrt{n}\int {\sigma _F (x)[h]d(\hat{{F}}-F)(x)} \nonumber \\&\quad =\frac{1}{\sqrt{n}}\sum _{i=1}^n {\left[ {h(T_i )-\frac{\int {I(U_i \le s\le V_i )h(s)dF(s} )}{\int {I(U_i \le s\le V_i )dF(s} )}} \right] } +o_P (1), \end{aligned}$$
(7)

where the right-hand side converges weakly to a mean-zero Gaussian process with the covariance structure

$$\begin{aligned}&E\left[ {h(T^{*})-\frac{\int {\mathbf{I}(U^{*}\le s\le V^{*})h(s)dF(s} )}{\int {\mathbf{I}(U^{*}\le s\le V^{*})dF(s} )}} \right] \left[ {{h}'(T^*)-\frac{\int {\mathbf{I}(U^{*}\le s\le V^{*}){h}'(s)dF(s} )}{\int {\mathbf{I}(U^{*}\le s\le V^{*})dF(s} )}} \right] \\&\quad =\int {\sigma _F (x)[h]{h}'(x)dF(x)} , \end{aligned}$$

for bounded functions \(h\) and \({h}'\). The desired weak convergence of \(\sqrt{n}(\;\hat{{F}}(t)-F(t)\;)\) is obtained by setting \(h=\sigma _F^{-1} (w_t )\) in Eq. (7).

1.2 Appendix B2: Proof of \(\sum _{j=1}^n {w_s (T_j )\hat{{\sigma }}_F^{-1} (w_t )(T_j )\hat{{f}}_j } =\mathbf{W}_s^\mathrm{T} \left\{ {\frac{i_n ({\hat{\mathbf{f}}})}{n}} \right\} ^{-1}\mathbf{W}_t \)

It follows that

$$\begin{aligned} \hat{{\sigma }}_F (T_j )[h]=\frac{1}{n}\sum _{i=1}^n {J_{ij} \left\{ {\frac{h_j }{\hat{{F}}_i }-\frac{1}{\hat{{F}}_i^2 }\sum _{k=1}^n {J_{ik} h_k \hat{{f}}_k } } \right\} }=\frac{1}{n}\left[ {\frac{h_j \hat{{f}}_j}{\hat{{f}}_j^2 }-\sum _{i=1}^n {\sum _{k=1}^n {\frac{J_{ij} J_{ik} }{\hat{{F}}_i^2 }h_k \hat{{f}}_k } } } \right] .\nonumber \\ \end{aligned}$$
(8)

Note that

$$\begin{aligned} J^{\mathrm{T}}\hbox {diag}\left( {\frac{1}{\mathbf{F}^{2}}} \right) J=\left[ {{\begin{array}{c@{\quad }c@{\quad }c} {\sum _{i=1}^n {\frac{J_{i1} J_{i1} }{F_i^2 }} }&{} \cdots &{} {\sum _{i=1}^n {\frac{J_{i1} J_{in} }{F_i^2 }} } \\ \vdots &{} \ddots &{} \vdots \\ {\sum _{i=1}^n {\frac{J_{in} J_{i1} }{F_i^2 }} }&{} \cdots &{} {\sum _{i=1}^n {\frac{J_{in} J_{in} }{F_i^2 }} } \\ \end{array} }} \right] . \end{aligned}$$

Hence, writing Eq. (8) with \(\hat{{\sigma }}_F (T_j )[h]=w_t (T_j )\), where \(w_t (x)=\mathbf{I}(x\le t)\), yields

$$\begin{aligned} \left[ {{\begin{array}{c} {w_t (T_1 )} \\ \vdots \\ {w_t (T_n )} \\ \end{array} }} \right]&= \frac{1}{n}\left[ {\left. {\left\{ {\hbox {diag}\left( {\frac{1}{{\hat{\mathbf{f}}}^{2}}} \right) -J^{\mathrm{T}}\hbox {diag}\left( {\frac{1}{{\hat{\mathbf{F}}}^{2}}} \right) J} \right\} } \right| _{\hat{{f}}_n =1-\mathbf{1}_{n-1}^\mathrm{T} {\hat{\mathbf{f}}}} } \right] \left[ {{\begin{array}{c} {h_1 \hat{{f}}_1 } \\ \vdots \\ {h_n \hat{{f}}_n } \\ \end{array} }} \right] \\&= \frac{1}{n}\left[ {\left. {\left\{ {\hbox {diag}\left( {\frac{1}{{\hat{\mathbf{f}}}^{2}}} \right) -J^{\mathrm{T}}\hbox {diag}\left( {\frac{1}{{\hat{\mathbf{F}}}^{2}}} \right) J} \right\} } \right| _{\hat{{f}}_n =1-\mathbf{1}_{n-1}^\mathrm{T} {\hat{\mathbf{f}}}} } \right] D^{\mathrm{T}}\left[ {{\begin{array}{c} {h_1 \hat{{f}}_1 } \\ \vdots \\ {h_{n-1} \hat{{f}}_{n-1} } \\ \end{array} }} \right] , \end{aligned}$$

where the last equality uses the constraint \(\sum _{j=1}^n {h_j \hat{{f}}_j } =0\). Multiplying both sides by \(D\), and taking the inverse of the information matrix,

$$\begin{aligned} \left[ {{\begin{array}{c} {\hat{{\sigma }}_F^{-1} (w_t )(T_1 )\hat{{f}}_1 } \\ \vdots \\ {\hat{{\sigma }}_F^{-1} (w_t )(T_{n-1} )\hat{{f}}_{n-1} } \\ \end{array} }} \right] =\left\{ {\frac{i_n ({\hat{\mathbf{f}}})}{n}} \right\} ^{-1}\left[ {{\begin{array}{c} {w_t (T_1 )-w_t (T_n )} \\ \vdots \\ {w_t (T_{n-1} )-w_t (T_n )} \\ \end{array} }} \right] . \end{aligned}$$

It follows that

$$\begin{aligned}&\sum _{j=1}^n {w_s (T_j )\hat{{\sigma }}_F^{-1} (w_t )(T_j )\hat{{f}}_j } =\sum _{j=1}^{n-1} {\{\;w_s (T_j )-w_s (T_n )\;\}\hat{{\sigma }}_F^{-1} (w_t )(T_j )\hat{{f}}_j } \\&\quad =\left[ {{\begin{array}{lll} {w_s (T_1 )-w_s (T_n )}&{} \cdots &{} {w_s (T_{n-1} )-w_s (T_n )\;} \\ \end{array} }} \right] \left\{ {\frac{i_n ({\hat{\mathbf{f}}})}{n}} \right\} ^{-1}\left[ {{\begin{array}{c} {w_t (T_1 )-w_t (T_n )} \\ \vdots \\ {w_t (T_{n-1} )-w_t (T_n )} \\ \end{array} }} \right] \\&\quad =\mathbf{W}_s^\mathrm{T} \left\{ {\frac{i_n ({\hat{\mathbf{f}}})}{n}} \right\} ^{-1}\mathbf{W}_t . \end{aligned}$$
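To make the closed form concrete, the pieces above (the matrix \(J\) with \(J_{ik}=\mathbf{I}(U_i\le T_k\le V_i)\), the reduced information matrix \(i_n(\hat{\mathbf{f}})\), and the vectors \(\mathbf{W}_s\)) can be assembled numerically. The following Python sketch is an illustration, not the authors' code: the function names are hypothetical, the NPMLE is computed by a plain fixed-point version of the self-consistency iteration, and a pseudo-inverse is used as a numerical safeguard.

```python
import numpy as np

def npmle_self_consistency(U, T, V, iters=1000):
    """Self-consistency fixed point for the NPMLE f_hat under double
    truncation: f_j  <-  1 / sum_i J_ij / F_i, then renormalize."""
    n = len(T)
    J = ((U[:, None] <= T[None, :]) & (T[None, :] <= V[:, None])).astype(float)
    f = np.full(n, 1.0 / n)
    for _ in range(iters):
        F = J @ f                                # F_i = sum_k J_ik f_k
        f = 1.0 / (J / F[:, None]).sum(axis=0)
        f /= f.sum()
    return f, J

def cov_sqrtn_F(U, T, V, s, t):
    """Closed-form estimate of the asymptotic covariance of
    sqrt(n){F_hat(s)-F(s)} and sqrt(n){F_hat(t)-F(t)}:
    W_s' {i_n(f_hat)/n}^{-1} W_t."""
    f, J = npmle_self_consistency(U, T, V)
    n = len(T)
    F = J @ f
    A = np.diag(1.0 / f**2) - J.T @ np.diag(1.0 / F**2) @ J
    D = np.hstack([np.eye(n - 1), -np.ones((n - 1, 1))])  # g -> (g_j - g_n)
    i_n = D @ A @ D.T            # information for (f_1, ..., f_{n-1})
    Ws = D @ (T <= s).astype(float)
    Wt = D @ (T <= t).astype(float)
    # pinv guards against numerical singularity of i_n/n
    return Ws @ np.linalg.pinv(i_n / n) @ Wt
```

Dividing the returned value by \(n\) gives a variance estimate for \(\hat F(t)\) itself, comparable to the bootstrap and jackknife estimators of Appendix A.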


Cite this article

Emura, T., Konno, Y. & Michimae, H. Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Anal 21, 397–418 (2015). https://doi.org/10.1007/s10985-014-9297-5
