Nonparametric estimation of time-to-event distribution based on recall data in observational studies

Lifetime Data Analysis

Abstract

In a cross-sectional observational study, the time-to-event distribution can be estimated from data on current status or from recalled data on the time of occurrence. In either case, one can treat the data as interval censored and use the nonparametric maximum likelihood estimator proposed by Turnbull (J R Stat Soc Ser B 38:290–295, 1976). However, the chance of recall may depend on the time span between the occurrence of the event and the time of interview. In such a case, the underlying censoring would be informative, rendering the Turnbull estimator inappropriate. In this article, we provide a nonparametric maximum likelihood estimator of the distribution of interest, by using a model adapted to the special nature of the data at hand. We also provide a computationally simple approximation of this estimator, and establish the consistency of both the original and the approximate versions under mild conditions. Monte Carlo simulations indicate that the proposed estimators have smaller bias than the Turnbull estimator based on incomplete recall data, smaller variance than the Turnbull estimator based on current status data, and smaller mean squared error than both of them. The method is applied to menarcheal data from a recent anthropometric study of adolescent and young adult females in Kolkata, India.

References

  • Aksglaede L, Sorensen K, Petersen JH, Skakkebæk NE, Juul A (2009) Recent decline in age at breast development: the Copenhagen puberty study. Pediatrics 123(5):932–939

  • Allison PD (1982) Discrete-time methods for the analysis of event histories. Sociol Methodol 13:61–98

  • Ayatollahi SM, Dowlatabadi E, Ayatollahi SA (2002) Age at menarche in Iran. Ann Hum Biol 29(4):355–362

  • Beckett M, DaVanzo J, Sastry N, Panis C, Peterson C (2001) The quality of retrospective data: an examination of long-term recall in a developing country. J Hum Resour 36(3):593–625

  • Bergsten-Brucefors A (1976) A note on the accuracy of recalled age at menarche. Ann Hum Biol 3:71–73

  • Bickel PJ, Götze F, van Zwet WR (1997) Resampling fewer than \(n\) observations: gains, losses, and remedies for losses. Stat Sinica 7(1):1–31. Empirical Bayes, sequential analysis and related topics in statistics and probability (New Brunswick, NJ, 1995)

  • Bickel PJ, Sakov A (2008) On the choice of \(m\) in the \(m\) out of \(n\) bootstrap and confidence bounds for extrema. Stat Sinica 18(3):967–985

  • Billingsley P (1968) Convergence of probability measures. Wiley, New York-London-Sydney

  • Cameron N (2002) Human growth and development. Academic Press, San Diego

  • Chumlea WC, Schubert CM, Roche AF, Kulin HE, Lee PA, Himes JH, Sun SS (2003) Age at menarche and racial comparisons in US girls. Pediatrics 111(1):110–113

  • Demirjian A, Goldstein H, Tanner JM (1973) A new system of dental age assessment. Hum Biol 45:211–227

  • Efron B (1967) The two sample problem with censored data. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp 831–853

  • Eveleth PB, Tanner JM (1990) Worldwide variation in human growth, 2nd edn. Cambridge University Press, Cambridge

  • Gentleman R, Geyer CJ (1994) Maximum likelihood for interval censored data: consistency and computation. Biometrika 81(3):618–623

  • Hediger ML, Stine RA (1987) Age at menarche based on recall data. Ann Hum Biol 14:133–142

  • Hosmer DW, Lemeshow S (1999) Applied survival analysis: regression modeling of time to event data. Wiley, New York

  • ISI (2012) Annual report of the Indian Statistical Institute. http://library.isical.ac.in/jspui/handle/10263/5345?mode=full

  • Kalbfleisch JD, Lawless JF (1989) Inference based on retrospective ascertainment: an analysis of the data on transfusion-related AIDS. J Am Stat Assoc 84:360–372

  • Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53:457–481

  • Keiding N, Begtrup K, Scheike TH, Hasibeder G (1996) Estimation from current-status data in continuous time. Lifetime Data Anal 2(2):119–129

  • Korn EL, Graubard BI, Midthune D (1997) Time-to-event analysis of longitudinal follow-up of a survey: choice of the time-scale. Am J Epidemiol 145:72–80

  • LeClere MJ (2005) Preface: modeling time to event: applications of survival analysis in accounting, economics and finance. Rev Acc Financ 4:5–12

  • McKay HA, Bailey DB, Mirwald RL, Davison KS, Faulkner RA (1998) Peak bone mineral accrual and age at menarche in adolescent girls: a 6-year longitudinal study. J Pediatr 13:682–687

  • Mirzaei SS, Das R, Sengupta D (2015) Parametric estimation of menarcheal age distribution based on recall data. Scand J Stat. doi:10.1111/sjos.12107

  • Mirzaei SS, Sengupta D (2013) Nonparametric estimation of time-to-event distribution based on recall data in observational studies. Technical Report No. ASD/2013/7, Applied Statistical Unit, Indian Statistical Institute. http://www.isical.ac.in/~asu/TR/TechRepASU201307.pdf

  • Rabe-Hesketh S, Yang S, Pickles A (2001) Multilevel models for censored and latent responses. Stat Methods Med Res 10(6):409–427

  • Redner R (1981) Note on the consistency of the maximum likelihood estimate for nonidentifiable distributions. Ann Stat 9(1):225–228

  • Sen B, Banerjee M, Woodroofe M (2010) Inconsistency of bootstrap: the Grenander estimator. Ann Stat 38(4):1953–1977

  • Simon CP, Blume L (1994) Mathematics for economists. W W Norton, New York

  • Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York

  • Turnbull BW (1976) The empirical distribution function with arbitrarily grouped, censored and truncated data. J R Stat Soc Ser B 38:290–295

  • Wang JL (1985) Strong consistency of approximate maximum likelihood estimators with applications in nonparametrics. Ann Stat 13(3):932–946

Acknowledgments

This research is partially sponsored by the project ‘Physical growth, body composition and nutritional status of the Bengal school aged children, adolescents, and young adults of Calcutta, India: Effects of socioeconomic factors on secular trends’, funded by the Neys Van Hoogstraten Foundation of the Netherlands. The authors thank Professor Parasmani Dasgupta, leader of the project, for making the data available for this research. The interpretation given at the end of Section 8 emerged from a discussion with Professor John D. Kalbfleisch of the University of Michigan, whom the authors thank for helpful discussions. The authors thank two referees and an associate editor for extensive and useful comments that led to substantial improvement of the manuscript.

Author information

Corresponding author

Correspondence to Sedigheh Mirzaei Salehabadi.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Research involving Human Participants and/or Animals

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Appendix

1.1 Proof of Theorem 1

  (a)

    We have, from (6) (with \(v>0\) and \(\delta =1\)),

    $$\begin{aligned} h(s,v,1)=g(s)f(s-v)(1-\pi _\eta (v)), \end{aligned}$$

    that is,

    $$\begin{aligned} 1-\pi _\eta (v)=\frac{h(s,v,1)}{g(s)f(s-v)} \qquad \forall s,v \ \text{ s.t. } \ v<s. \end{aligned}$$
    (25)

    By substituting the above expression in (6) for \(v=0\) and \(\delta =1\) and simplifying the equation, we have

    $$\begin{aligned} F(s)=\frac{h(s,0,1)+\int _{0}^{s}h(s,s-u,1)du}{g(s)}. \end{aligned}$$
    (26)

    By substituting the above expression of F(s) in (6) for \(v=0\) and \(\delta =0\), we have g(s) as

    $$\begin{aligned} g(s)=h(s,0,0)+h(s,0,1)+\int _{0}^{s} h(s,s-u,1)du. \end{aligned}$$
    (27)

    The above identity holds over the support of G irrespective of whether G is a discrete, a continuous or a mixed distribution. The identifiability of G follows.

  (b)

    By substituting (27) in (26), we have

    $$\begin{aligned} F(s)=\frac{h(s,0,1)+\int _{0}^{s}h(s,s-u,1)du}{h(s,0,0)+h(s,0,1)+\int _{0}^{s} h(s,s-u,1)du}. \end{aligned}$$
    (28)

    If G has an absolutely continuous component over the support of F, then for every s and all real valued \(v<s\), we have from (25),

    $$\begin{aligned} \pi _\eta (v)=1-\frac{h(s,v,1)}{g(s)f(s-v)}. \end{aligned}$$
    (29)

    Thus, (29) together with (27) and (28) identify F and \(\pi _\eta \) completely.

  (c)

    For the sake of contradiction, let us assume there are two pairs of choices of f and \(\pi _\eta \), say \((f_1,\pi _1)\) and \((f_2,\pi _2)\), such that their substitution in the right hand side of (6) produces the same function. If we follow the steps leading to (25) for these two pairs of functions, then we have, for all integers s and all \(v<s\),

    $$\begin{aligned} f_1(s-v)(1-\pi _1(v))=f_2(s-v)(1-\pi _2(v)). \end{aligned}$$

    Hence,

    $$\begin{aligned} \frac{f_1(v)}{f_2(v)}=\frac{1-\pi _2(s-v)}{1-\pi _1(s-v)}\qquad \forall s,v \ \text{ s.t. } \ v<s. \end{aligned}$$
    (30)

    Since the above identity holds for all integers s, we can write

    $$\begin{aligned} \frac{1-\pi _2(s-v)}{1-\pi _1(s-v)}=\frac{1-\pi _2(1-v)}{1-\pi _1(1-v)}\quad \text{ for } \text{ all } \text{ integer } s\hbox { and all }v<s. \end{aligned}$$
    (31)

    The above equation implies that the function \((1-\pi _1)/(1-\pi _2)\) is periodic over the relevant domain with period 1, which contradicts the assumption. Therefore, the pair \((f,\pi _\eta )\) is uniquely defined for any given h.
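The inversion in parts (a) and (b) can be checked numerically. The sketch below is a hypothetical illustration, not part of the original analysis: it assumes exponential forms for f and g and an exponential forgetting curve for \(\pi _\eta \) (all function names and parameter values are invented), generates the three pieces of h in the form implied by (25)–(27), namely \(h(s,v,1)=g(s)f(s-v)(1-\pi _\eta (v))\), \(h(s,0,1)=g(s)\{F(s)-\int _0^s f(u)(1-\pi _\eta (s-u))du\}\) and \(h(s,0,0)=g(s)(1-F(s))\), and then recovers g(s), F(s) and \(\pi _\eta (v)\) from h alone through (27), (28) and (29).

```python
import math

def trapezoid(fn, a, b, n):
    """Composite trapezoid rule for the integral of fn over [a, b] using n panels."""
    step = (b - a) / n
    total = 0.5 * (fn(a) + fn(b))
    for k in range(1, n):
        total += fn(a + k * step)
    return total * step

# Hypothetical model ingredients, chosen only for illustration.
RATE_F, RATE_G, FORGET = 0.2, 0.1, 0.3

def f(t):       # density of the time to event (assumed exponential)
    return RATE_F * math.exp(-RATE_F * t) if t >= 0 else 0.0

def F(t):       # distribution function matching f
    return 1.0 - math.exp(-RATE_F * t) if t > 0 else 0.0

def g(s):       # density of age at interview (assumed exponential)
    return RATE_G * math.exp(-RATE_G * s) if s >= 0 else 0.0

def pi_eta(v):  # probability of failing to recall after a lag v (assumed form)
    return 1.0 - math.exp(-FORGET * v)

# Forward model: the three pieces of h, in the form implied by (25)-(27).
def h_recalled(s, v):          # h(s, v, 1) for 0 < v < s
    return g(s) * f(s - v) * (1.0 - pi_eta(v))

def h_not_recalled(s):         # h(s, 0, 1): event occurred, time not recalled
    inner = trapezoid(lambda u: f(u) * (1.0 - pi_eta(s - u)), 0.0, s, 20000)
    return g(s) * (F(s) - inner)

def h_no_event(s):             # h(s, 0, 0): event has not yet occurred
    return g(s) * (1.0 - F(s))

# Inversion using only h: g(s) from (27) and F(s) from (28).
def recover(s):
    integral = trapezoid(lambda u: h_recalled(s, s - u), 0.0, s, 2000)
    h01 = h_not_recalled(s)
    g_hat = h_no_event(s) + h01 + integral
    return g_hat, (h01 + integral) / g_hat

# pi_eta(v) from (29), with f obtained by differentiating the recovered F.
def recover_pi(s, v, d=1e-4):
    g_hat, _ = recover(s)
    f_hat = (recover(s - v + d)[1] - recover(s - v - d)[1]) / (2 * d)
    return 1.0 - h_recalled(s, v) / (g_hat * f_hat)

g_hat, F_hat = recover(15.0)
print(g_hat, g(15.0))
print(F_hat, F(15.0))
print(recover_pi(15.0, 5.0), pi_eta(5.0))
```

The forward and inverse steps use different quadrature grids, so the agreement reflects the identities (27)–(29) rather than shared rounding.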

1.2 Proof of Theorem 2

By the definitions of \({\mathcal {C}}\) and \({\mathcal {C}}_0\), we can rewrite the likelihood (12) as follows.

$$\begin{aligned} L=\prod _{i=1}^{n}&\left[ \left\{ \sum _{t=1}^{k} b_t \left( \sum \limits _{\begin{array}{c} r:l_{it} \in s_r \\ s_r\in {\mathcal {C}}_0 \end{array}} p_r+\sum \limits _{\begin{array}{c} r:l_{it} \in s_r \\ s_r\in {\mathcal {C}}\backslash {\mathcal {C}}_0 \end{array}} p_r\right) \right\} ^{1-\varepsilon _i} \left\{ p_{l_i} \left( 1-\sum _{t=1}^{k} b_t I\big (T_i \in A_{it})\right) \right\} ^{\varepsilon _i}\right] ^{\delta _i}\nonumber \\&\left[ \sum \limits _{\begin{array}{c} r:l_i\in s_r \\ s_r \in {\mathcal {C}}_0 \end{array}} p_r+\sum \limits _{\begin{array}{c} r:l_i\in s_r \\ s_r \in {\mathcal {C}}\backslash {\mathcal {C}}_0 \end{array}} p_r\right] ^{1-\delta _i}. \end{aligned}$$
(32)

For any \( s_r\in {\mathcal {C}}\backslash {\mathcal {C}}_0 \), let \({\mathcal {A}}_r=\{I_{r'}: s_{r'}\in {\mathcal {C}}_0, s_r\subset s_{r'} \}\). By construction of \({\mathcal {C}}_0\), \({\mathcal {A}}_r\) is a non-empty set. The elements of \({\mathcal {A}}_r\) are disjoint sets consisting of unions of intervals, which are subsets of \([t_{min},t_{max}]\). Let \(I_{r^*}\) be that member of \({\mathcal {A}}_r\) which satisfies the condition ‘there is \(\alpha \in I_{r^*}\) such that \(\alpha < \beta \) whenever \(\beta \in I_{r^\dag }\) for any \(I_{r^\dag } \in {\mathcal {A}}_r\)’. We shall show that by shifting mass from any \(I_{r}\) to \(I_{r^*}\), there will be no reduction in the contribution of any individual to the likelihood (32).

We now check the effect of this shift of mass on the contribution of each individual \((i=1,\ldots ,n)\) to the likelihood.

  Case (i).

    Let \(\delta _i=0\). If \(l_i\in s_r\) or \(l_i\notin s_{r^*}\), then there is no change in the likelihood. If \(l_i\in s_{r^*}\backslash s_r\), then the factor contributed by individual i to the likelihood increases by \(p_r\).

  Case (ii).

    Let \(\delta _i\varepsilon _i=1\). If \(l_i\notin s_{r^*}\), then there is no change in the likelihood. If \(l_i\in s_{r^*}\backslash s_r\), then the factor contributed by individual i to the likelihood increases by \(p_r\). The case \(l_i\in s_r\) cannot occur, because \(I_r\) and \(I_{r^*}\) are distinct and disjoint.

  Case (iii).

    Let \(\delta _i(1-\varepsilon _i)=1\). There exists at most one t such that \(l_{it} \in s_r\). If there is such a t, then there is no change in the likelihood. If there is no t such that \(l_{it}\in s_{r^*}\), then, again, there is no change in the likelihood. If there is a t such that \(l_{it}\in s_{r^*}\backslash s_r\), then the factor contributed by individual i to the likelihood increases by \(p_r\).

It follows that maximizing L can be restricted to \(\{p_r:s_r \in {\mathcal {C}}_0\}\).
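Theorem 2 plays the role, for the present likelihood, of Turnbull's (1976) result that the nonparametric MLE for interval-censored data places mass only on "innermost" intervals. As a simplified, hypothetical illustration (it assumes every observation contributes a single closed interval, whereas here the sets \(s_r\) may be unions of intervals, and it does not construct \({\mathcal {C}}_0\) itself), the sketch below finds those innermost intervals by sorting endpoints: an innermost interval runs from a left endpoint to the right endpoint that immediately follows it. The function name is invented for this example.

```python
from typing import List, Tuple

Interval = Tuple[float, float]

def innermost_intervals(intervals: List[Interval]) -> List[Interval]:
    """Turnbull-style innermost intervals for closed intervals [left, right].

    An innermost interval [x, y] has x equal to some left endpoint and y equal
    to some right endpoint, with no other endpoint lying between them in the
    sorted order of all endpoints.
    """
    LEFT, RIGHT = 0, 1  # at ties, left endpoints sort first (closed intervals)
    endpoints = sorted([(lo, LEFT) for lo, _ in intervals] +
                       [(hi, RIGHT) for _, hi in intervals])
    result = []
    for (x, kind), (y, next_kind) in zip(endpoints, endpoints[1:]):
        if kind == LEFT and next_kind == RIGHT:
            result.append((x, y))
    return result

# An exactly recalled time t enters as the zero-width interval (t, t).
print(innermost_intervals([(1, 3), (2, 5), (4, 6)]))
print(innermost_intervals([(0, 4), (1.5, 1.5)]))
```

The second call mirrors the situation in Theorem 3: a zero-width interval from an exact recall is itself innermost.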

1.3 Proof of Theorem 3

It is easy to see, from the construction of \({\mathcal {A}}_0\), that every singleton set consisting of a perfectly recalled time-to-event is a nominal interval with zero width, belonging to \({\mathcal {A}}_0\). Therefore \({\mathcal {A}}_1\subseteq {\mathcal {A}}_0\).

Define \({\mathcal {S}}_1\), \({\mathcal {S}}_2\), \({\mathcal {S}}_3\) as index sets of individuals in the three different cases of censoring. The interview times are discrete valued with finite domain; \(x_1,x_2,\ldots ,x_k\) are also finite. Therefore, even when n is large, there is at most a finite number (say N) of distinct sets of the form

$$\begin{aligned} A_s=\big \{\bigcap _{i\in s}B_i\big \}\bigcap \big \{\bigcap _{i\in ({\mathcal {S}}_1 \cup {\mathcal {S}}_3) \backslash s}B_i^c\big \}, \end{aligned}$$

where \(s\subseteq {\mathcal {S}}_1 \cup {\mathcal {S}}_3.\) Denote by \(s^{(1)}, s^{(2)},\ldots ,s^{(N)}\) the index sets corresponding to the N distinct sets described above.

Consider a member of \({\mathcal {A}}_0\), say \(I_s\), where s is a subset of \(\{1,2,\ldots ,n\}\). If \(s\subseteq {\mathcal {S}}_2\), then it is already a singleton. If not, it can be written as \(s^{(j)}\cup (s\backslash s^{(j)})\), with \(s^{(j)}\subseteq {\mathcal {S}}_1 \cup {\mathcal {S}}_3\) and \(s\backslash s^{(j)}\subseteq {\mathcal {S}}_2\) for some \(j \in \{1,2,\ldots ,N\}\). Let us consider three further special cases.

  Case (a).

    Let \(s=s^{(j)} \cup \{r\}\) for \(r \in {\mathcal {S}}_2\). In this case, \(I_s\) is either a singleton or a null set. If it is a null set, then it cannot be a member of \({\mathcal {A}}\), and hence of \({\mathcal {A}}_0\). Thus, Case (a) contributes only singletons to \({\mathcal {A}}_0\).

  Case (b).

    Let \(s=s^{(j)} \cup \{r_1,r_2,\ldots ,r_p\},\) with \(r_1,r_2,\ldots , r_p \in {\mathcal {S}}_2\) and \(p >1\). In this case, \(I_s\) is either a singleton or a null set. Since the absolute continuity of the time-to-event distribution almost surely precludes coincidence of two sample values (say, \(T_{r_1}\) and \(T_{r_2}\)), \(I_s\) is a null set with probability 1. In summary, Case (b) cannot contribute anything other than a singleton to \({\mathcal {A}}_0\).

  Case (c).

    Let \(s=s^{(j)}\). The probability that a specific individual (say, the i-th one) has the landmark event at an age contained in \(A_{s^{(j)}}\) and recalls its date exactly is

    $$\begin{aligned} P(T_i \in A_{s^{(j)}}, \delta _i \varepsilon _i=1). \end{aligned}$$

    Since this quantity is strictly positive, the probability that none of the n individuals has had the landmark event in \(A_{s^{(j)}}\) and recalled the date is

    $$\begin{aligned} \left( 1-P(T_i \in A_{s^{(j)}}, \delta _i \varepsilon _i=1)\right) ^n, \end{aligned}$$

    which goes to zero as \(n \rightarrow \infty \). Thus, the probability that there is \(i \in {\mathcal {S}}_2\) such that \(T_i \in A_{s^{(j)}}\) goes to one as \(n\rightarrow \infty \). On this event, \(I_{s^{(j)} \cup \{i\}}=I_{s^{(j)}} \cap \{T_i\}\) is non-null, that is, the index set \(s^{(j)}\) is properly contained in the index set \(s^{(j)}\cup \{i\}\) of a non-null intersection. It follows that \(P[I_s\notin {\mathcal {A}}_0]\) goes to one.

The statement of the theorem follows by combining the three cases.
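The limit in Case (c) rests on the elementary fact that \((1-p)^n\rightarrow 0\) for any fixed \(p>0\). The sketch below uses an arbitrary illustrative value of p in place of \(P(T_i \in A_{s^{(j)}}, \delta _i \varepsilon _i=1)\), evaluates this probability for a few sample sizes, and finds the smallest n that pushes it below a chosen level \(\alpha \); the function names are hypothetical.

```python
import math

def prob_none_recalled(p: float, n: int) -> float:
    """(1 - p)**n: chance that none of n independent individuals has a recalled event in the set."""
    return (1.0 - p) ** n

def smallest_n(p: float, alpha: float) -> int:
    """Smallest n with (1 - p)**n <= alpha, assuming 0 < p < 1 and 0 < alpha < 1."""
    return math.ceil(math.log(alpha) / math.log(1.0 - p))

p = 0.01  # hypothetical per-individual probability
for n in (10, 100, 1000):
    print(n, prob_none_recalled(p, n))
print(smallest_n(p, 0.01))
```

Even a set with small probability p is hit by some exactly recalled time once n is of order \(\log (1/\alpha )/p\).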

1.4 Proof of Theorem 4

From (14), the log-likelihood is given by

$$\begin{aligned} \ell ({{\mathbf {p}}},{{\mathbf {b}}})=\sum _{i=1}^{n}\left( \ln \bigg (\sum _{j=1}^{m}\beta _{ij}q_j\bigg )\right) \end{aligned}$$
(33)

Consider maximization of \(\ell ({{\mathbf {p}}},{{\mathbf {b}}})\) alternately with respect to \({{\mathbf {p}}}\) and \({{\mathbf {b}}}\). Given \(({{\mathbf {p}}}^{(n)},{{\mathbf {b}}}^{(n)})\), the iterate at the nth stage, define the next iterate \(({{\mathbf {p}}}^{(n+1)},{{\mathbf {b}}}^{(n+1)})\) by

$$\begin{aligned} {{\mathbf {b}}}^{(n+1)}= & {} \left\{ \begin{array}{ll} {{\mathbf {b}}}^{(n)} &{} \text {if }n\hbox { is even},\\ \mathop {\hbox {argmax}}\limits _{\displaystyle {{\mathbf {b}}}\in S_2}\ell ({{\mathbf {p}}}^{(n)},{{\mathbf {b}}}) &{} \text {if }n\hbox { is odd}, \end{array} \right. \nonumber \\ {{\mathbf {p}}}^{(n+1)}= & {} \left\{ \begin{array}{ll} {{\mathbf {p}}}^{(n)} &{} \text {if }n\hbox { is odd},\\ \mathop {\hbox {argmax}}\limits _{\displaystyle {{\mathbf {p}}}\in S_1}\ell ({{\mathbf {p}}},{{\mathbf {b}}}^{(n)}) &{} \text {if }n\hbox { is even}, \end{array} \right. \end{aligned}$$
(34)

where \(S_1=\{{{\mathbf {p}}}:\sum _{j=1}^{m}q_j=1,\ 0\le q_1,\ldots ,q_m\le 1\}\) and \(S_2=\{{{\mathbf {b}}}:0\le b_1\le \ldots \le b_k\le 1\}\). We shall show that the functions \(\ell (\cdot ,{{\mathbf {b}}})\) and \(\ell ({{\mathbf {p}}},\cdot )\) are concave over the convex sets \(S_1\) and \(S_2\), respectively, so that there exists a maximum at each iteration. Thus, the likelihood (14) does not decrease from one stage to the next; since it is bounded by \((km)^n\), the sequence of partially maximized likelihoods converges. Under the conditions stated in the theorem, we shall also show that the objective function is strictly concave, so that the maximum at each stage is unique, with probability tending to one as \(n_2\) goes to infinity. Finally, since \(S_1\times S_2\) is closed and bounded, the sequence of maxima obtained at successive stages converges to a unique limit, with probability tending to one.
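The text does not specify how each partial maximization in (34) is carried out. For the \({{\mathbf {p}}}\)-step with \({{\mathbf {b}}}\) held fixed, one standard choice, assumed here and not necessarily the authors' implementation, is the EM (self-consistency) update for mixture weights, \(q_j\leftarrow q_j\, n^{-1}\sum _{i}\beta _{ij}/(B_i^T{{\mathbf {p}}})\), which keeps the iterate in \(S_1\) and does not decrease (33). The \(\beta _{ij}\) values below are made-up examples; computing them from (15) is not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

def log_likelihood(B: Sequence[Sequence[float]], q: Sequence[float]) -> float:
    """Log-likelihood (33) for fixed b: sum_i log(sum_j beta_ij * q_j)."""
    return sum(math.log(sum(b_ij * q_j for b_ij, q_j in zip(row, q))) for row in B)

def p_step(B: Sequence[Sequence[float]], q: Sequence[float],
           max_iter: int = 1000, tol: float = 1e-12) -> List[float]:
    """EM (self-consistency) iterations for the argmax over the simplex S_1, b fixed."""
    n = len(B)
    q = list(q)
    for _ in range(max_iter):
        # B_i^T p for every individual i
        denom = [sum(b_ij * q_j for b_ij, q_j in zip(row, q)) for row in B]
        # q_j <- q_j * (1/n) * sum_i beta_ij / (B_i^T p); the new weights sum to one
        new_q = [q_j * sum(row[j] / d for row, d in zip(B, denom)) / n
                 for j, q_j in enumerate(q)]
        change = max(abs(x - y) for x, y in zip(new_q, q))
        q = new_q
        if change < tol:
            break
    return q

B = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical: three exact recalls
q_hat = p_step(B, [0.5, 0.5])
print(q_hat, log_likelihood(B, q_hat))
```

With three exactly recalled observations, two at the first support point and one at the second, the update reaches the empirical proportions (2/3, 1/3) after one step. Starting from a point in the interior of \(S_1\) keeps every \(B_i^T{{\mathbf {p}}}\) positive.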

Let \({{\mathbf {B}}}\) be an \(n\times m\) matrix with \(\beta _{ij}\) in the ijth position. For fixed \({{\mathbf {b}}}\), the partial derivative of (33) with respect to \({{\mathbf {p}}}\) is

$$\begin{aligned} \frac{\partial \ell }{\partial {{\mathbf {p}}}}=\sum \limits _{i=1}^n\frac{B_i}{{B_i}^T{{\mathbf {p}}}} \end{aligned}$$

where \(B_i\) is the ith row of \({{\mathbf {B}}}\) matrix. The second derivative or the Hessian is

$$\begin{aligned} \frac{\partial ^2 \ell }{\partial {{\mathbf {p}}}\,\partial {{{\mathbf {p}}}}^T}=-\sum \limits _{i=1}^n\frac{ B_iB_i^T}{({B_i}^T{{\mathbf {p}}})^2} \end{aligned}$$
(35)

which is a negative semidefinite matrix. Hence \(\ell (\cdot ,{{\mathbf {b}}})\) is a concave function over the convex and bounded set \(S_1\), which ensures the existence of a maximum (Simon and Blume 1994). Now, we need to show that the probability of the Hessian matrix being negative definite goes to one. It is enough to show that, for any vector \({{\mathbf {u}}}\ne 0\),

$$\begin{aligned} P\left( \sum \limits _{i=1}^n\frac{(B_i^T{{\mathbf {u}}})^2}{({B_i}^T{{\mathbf {p}}})^2}=0\right) \rightarrow 0. \end{aligned}$$

In other words, we need to show that for any arbitrary vector \({{\mathbf {u}}}\ne 0\),

$$\begin{aligned} P\left( B_i^T {{\mathbf {u}}}=0 \quad \forall i\right) =P\left( {{\mathbf {B}}}{{\mathbf {u}}}=0\right) \rightarrow 0. \end{aligned}$$
(36)

It is clear from (15) that, for an individual (say i) who recalls the age at the landmark event exactly, \(B_i\) has only one non-zero element. For such an individual, the equation \(B_i^T{{\mathbf {u}}}=0\) implies that the corresponding element of \({{\mathbf {u}}}\) is zero. Further, Theorem 3 shows that, with probability tending to one, the columns of \({{\mathbf {B}}}\) correspond only to singleton members of \({\mathcal {A}}_0\) associated with individuals recalling age at event exactly. Therefore, with probability tending to one, the event \({{\mathbf {B}}}{{\mathbf {u}}}=0\) coincides with the event \({{\mathbf {u}}}=0\).
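The quadratic form behind (35) and (36) can be evaluated directly. In the hypothetical sketch below, rows of \({{\mathbf {B}}}\) with a single non-zero entry play the role of exactly recalled observations; with one such row per column, \({{\mathbf {u}}}^T(\partial ^2\ell /\partial {{\mathbf {p}}}\,\partial {{\mathbf {p}}}^T){{\mathbf {u}}}\) is strictly negative for the direction \({{\mathbf {u}}}=(1,-1)\), whereas a single row spread over both columns gives zero in that direction, so the Hessian is then only semidefinite. The function name and matrices are invented for illustration.

```python
from typing import Sequence

def hessian_quadratic_form(B: Sequence[Sequence[float]],
                           p: Sequence[float],
                           u: Sequence[float]) -> float:
    """Return u^T H u for the Hessian in (35): -sum_i (B_i^T u)^2 / (B_i^T p)^2."""
    total = 0.0
    for row in B:
        bu = sum(b_ij * u_j for b_ij, u_j in zip(row, u))
        bp = sum(b_ij * p_j for b_ij, p_j in zip(row, p))
        total -= (bu / bp) ** 2
    return total

p = [0.5, 0.5]
u = [1.0, -1.0]
exact_rows = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical exact recalls, one per column
coarse_rows = [[1.0, 1.0]]             # hypothetical row spread over both columns
print(hessian_quadratic_form(exact_rows, p, u))
print(hessian_quadratic_form(coarse_rows, p, u))
```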

For fixed \({{\mathbf {p}}}\), the first derivative of (33) with respect to \({{\mathbf {b}}}\) is

$$\begin{aligned} \frac{\partial \ell }{\partial {{\mathbf {b}}}}=\sum \limits _{i=1}^n \frac{\mathbf{A}_i{{\mathbf {p}}}}{{B_i}^T{{\mathbf {p}}}} \end{aligned}$$

where \(\mathbf{A}_i\) is the \(k\times m\) matrix with the \((l,j)^\mathrm{th}\) element given by \(\frac{\partial \beta _{ij}}{\partial b_l}\).

The Hessian with respect to \({{\mathbf {b}}}\) is

$$\begin{aligned} \frac{\partial ^2 \ell }{\partial {{\mathbf {b}}}\,\partial {{{\mathbf {b}}}^T}}=-\sum _{i=1}^n\left( B_i^T{{\mathbf {p}}}\right) ^{-2} \mathbf{A}_i{{\mathbf {p}}}{{\mathbf {p}}}^T\mathbf{A}_i^T \end{aligned}$$
(37)

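which is a negative semidefinite matrix; here we use the fact that each \(\beta _{ij}\) is linear in \({{\mathbf {b}}}\) (cf. (32)), so that its second derivatives with respect to \({{\mathbf {b}}}\) vanish. Hence \(\ell ({{\mathbf {p}}},\cdot )\) is a concave function over the convex and bounded set \(S_2\), which ensures the existence of a maximum (Simon and Blume 1994).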

In order to prove the negative definiteness of the Hessian with probability tending to one, we need to show that for any arbitrary vector \({{\mathbf {v}}}\ne 0\),

$$\begin{aligned} P\left( {{\mathbf {v}}}^T \mathbf{A}_i{{\mathbf {p}}}=0 \quad \forall i\right) \rightarrow 0. \end{aligned}$$
(38)

From (15), it follows that for \(i\in {\mathcal {I}}_2\),

$$\begin{aligned} \mathbf{A}_i {{\mathbf {p}}}=-\left( \sum _{j=1}^m q_j\cdot I(J_j \subset A_i)\right) \big (I(T_i \in A_{i1}),\ldots ,I(T_i \in A_{ik})\big )^T, \end{aligned}$$
(39)

which is a vector with exactly one non-zero element. The condition \({{\mathbf {v}}}^T \mathbf{A}_i{{\mathbf {p}}}=0\) is equivalent to the requirement that the element of \({{\mathbf {v}}}\) corresponding to the non-zero element of \(\mathbf{A}_i{{\mathbf {p}}}\) is zero. On the other hand, as \(n_2 \rightarrow \infty \),

$$\begin{aligned}&P\Big (\sum _{i\in {\mathcal {I}}_2} I \big ((S_i-T_i) \in [x_l,x_{l+1}]\big )=0\Big )\\&\quad =\Big [1-P\Big ((S_i-T_i)\in [x_l,x_{l+1}]\,\big |\,\delta _i\varepsilon _i=1\Big )\Big ]^{n_2} \rightarrow 0 \ \ \ \forall l. \end{aligned}$$

Thus, for all \(l=1,\ldots ,k\), there is at least one \(i\in {\mathcal {I}}_2\) such that \(T_i\in A_{il}\), with probability tending to one. Therefore, the condition \({{\mathbf {v}}}^T \mathbf{A}_i{{\mathbf {p}}}=0 \quad \forall i\in {\mathcal {I}}_2\) reduces, with probability tending to one, to the requirement that all the elements of \({{\mathbf {v}}}\) are zero. Therefore, for \({{\mathbf {v}}}\ne 0\), we have \(P\left( {{\mathbf {v}}}^T \mathbf{A}_i{{\mathbf {p}}}=0, \quad \forall i\right) \le P\left( {{\mathbf {v}}}^T \mathbf{A}_i{{\mathbf {p}}}=0, \quad \forall i\in {\mathcal {I}}_2\right) \rightarrow 0\). Thus, the probability that the Hessian matrix defined in (37) is negative definite goes to one. This completes the proof.

1.5 Proof of Theorem 6

The proof relies on an application of Theorem 3.1 of Wang (1985), in the manner it was used by Gentleman and Geyer (1994). That theorem rests on five assumptions.

The first assumption requires a separable compactification of the parameter space \(\varTheta \). In the present case, the set \(\overline{\varTheta }\) serves this purpose. The Lévy distance can be used as the metric, and compactness follows from the Helly selection theorem. A homeomorphic mapping of \([t_{min},t_{max}]\) to [0, 1] can be used to establish separability (Billingsley 1968, p. 239). The equivalence class \({\mathcal {E}}\) defined by (24) is regarded as a single point in \(\varTheta \). This takes care of the issue of non-identifiability, as in Redner (1981).

Let, for \(r=1,2,\ldots ,\) \(V_r(F)\) be the Lévy neighborhood of \(F\in \varTheta \) with radius 1/r. For such a sequence of decreasing open neighborhoods, the second assumption of Wang (1985) requires that, for any \(F_0\) in \(\varTheta \), there is a function \(F_r:\,\overline{\varTheta }\rightarrow V_r(F_0)\) such that (a) \(\ell (F)-\ell (F_r(F))\) is locally dominated on \(\overline{\varTheta }\) and (b) \(F_r(F)\) is in \(\varTheta \) if \(F\in \varTheta \). We define \(F_r(F)=\frac{1}{r+1}F+\frac{r}{r+1}F_0\). Writing \(\Vert \cdot \Vert \) for the Kolmogorov-Smirnov (supremum) distance, we have \(\Vert F_r(F)-F_0\Vert =\frac{1}{r+1}\Vert F-F_0\Vert \le \frac{1}{r+1}<\frac{1}{r}\). Since the Lévy distance is dominated by the Kolmogorov-Smirnov distance, it follows that \(F_r(F)\in V_r(F_0)\). Condition (b) is obviously satisfied. As for condition (a), note that

$$\begin{aligned}&\sup _{F\in \overline{\varTheta }}\big [\ell (F)-\ell (F_r(F))\big ]\\&\quad =\sup _{F\in \overline{\varTheta }} \ln \frac{\sum _{j=1}^p \alpha _{ij} \left( F(t_j)-F(t_{j^-})\right) }{\frac{1}{r+1}\big [\sum _{j=1}^p \alpha _{ij} \left( F(t_j)-F(t_{j^-})\right) \big ]+\frac{r}{r+1}\big [\sum _{j=1}^p \alpha _{ij} \left( F_0(t_j)-F_0(t_{j^-})\right) \big ]} \\&\quad \le \ln (r+1), \end{aligned}$$

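where the bound holds because the denominator is at least \(\frac{1}{r+1}\) times the numerator, the term involving \(F_0\) being non-negative. The bound \(\ln (r+1)\) is a constant and therefore has finite expectation. Thus, \(\ell (F)-\ell (F_r(F))\) is globally dominated on \(\overline{\varTheta }\), and hence locally dominated, as condition (a) requires.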
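The claim \(F_r(F)\in V_r(F_0)\) can be illustrated numerically. The sketch below is a hypothetical check, not part of the proof: it takes two exponential distribution functions as F and \(F_0\) (arbitrary choices), approximates the Kolmogorov-Smirnov and Lévy distances on a finite grid (the Lévy distance by bisection on its defining inequalities \(F(x-\epsilon )-\epsilon \le G(x)\le F(x+\epsilon )+\epsilon \)), and confirms that the Kolmogorov-Smirnov distance shrinks by the factor \(1/(r+1)\) while the Lévy distance stays below 1/r. All function names are invented.

```python
import math

def expo_cdf(rate):
    """Exponential distribution function (an illustrative choice)."""
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def ks_distance(F, G, grid):
    """Kolmogorov-Smirnov distance sup |F - G|, evaluated on a grid."""
    return max(abs(F(x) - G(x)) for x in grid)

def levy_distance(F, G, grid, tol=1e-7):
    """Grid approximation of the Levy distance, by bisection on eps."""
    def within(eps):
        return all(F(x - eps) - eps <= G(x) <= F(x + eps) + eps for x in grid)
    lo, hi = 0.0, 1.0  # the Levy distance never exceeds 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if within(mid):
            hi = mid
        else:
            lo = mid
    return hi

def shrink_towards(F, F0, r):
    """F_r(F) = F/(r+1) + r*F0/(r+1), the map used for the second assumption."""
    return lambda x: F(x) / (r + 1) + r * F0(x) / (r + 1)

grid = [0.01 * i for i in range(-100, 2001)]
F0, F = expo_cdf(1.0), expo_cdf(0.5)
r = 3
Fr = shrink_towards(F, F0, r)
print(ks_distance(F, F0, grid), ks_distance(Fr, F0, grid))
print(levy_distance(Fr, F0, grid), 1 / r)
```

On a grid, the Lévy condition holds at \(\epsilon \) equal to the grid Kolmogorov-Smirnov distance, which is why the bisection result cannot exceed it by more than the tolerance.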

The third assumption requires that \(E[\ell (F)-\ell (F_r(F))]<0\) for \(F_0\in \varTheta \), \(F\in \overline{\varTheta }\), \(F\ne F_0\). Here, \(F_0\) needs to be interpreted as \({\mathcal {E}}\), and the result follows along the lines of the proof of Lemma 4.4 of Wang (1985).

The fourth and fifth assumptions require that \(\ell (F)-\ell (F_r(F))\) is lower and upper semicontinuous for \(F \in \overline{\varTheta }\) except for a null set of points (which may depend on F only in the case of upper semicontinuity). Both the conditions follow from the portmanteau theorem (Billingsley 1968, p. 11), as argued by Gentleman and Geyer (1994). No null set needs to be invoked.

Since all the assumptions hold, the stated result follows from Theorem 3.1 of Wang (1985).

1.6 Proof of Theorem 7

Theorem 6 says that the Lévy distance of \(\{\tilde{F_n}\}\) from the equivalence class \({\mathcal {E}}\) goes to zero almost surely as n goes to infinity, that is,

$$\begin{aligned} \mathop {\inf }\limits _{F\in {\mathcal {E}}} d_L(\tilde{F_n},F)\rightarrow 0 \qquad \text{ as } n\rightarrow \infty \qquad \text{ with } \text{ probability } \text{1. } \end{aligned}$$

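It follows that, for every \(\epsilon >0\), \(P(\inf _{F\in {\mathcal {E}}} d_L(\tilde{F_n},F) > \epsilon )\rightarrow 0.\)

Since \(P\big (\inf _{F\in {\mathcal {E}}} d_L(\hat{F_n},F) > \epsilon \big )\le P\big (\inf _{F\in {\mathcal {E}}} d_L(\tilde{F_n},F) > \epsilon \big )+P\big (\tilde{F_n}\ne \hat{F_n}\big )\) and \(P(\omega : \tilde{F_n}(\omega )=\hat{F_n}(\omega ))\rightarrow 1\), we conclude

$$\begin{aligned} P\left( \mathop {\inf }\limits _{F\in {\mathcal {E}}} d_L(\hat{F_n},F) > \epsilon \right) \rightarrow 0, \end{aligned}$$

which proves the statement.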

1.7 Proof of Theorem 8

Note that the equivalence class defined in (24) is the class of all distribution functions that have Kullback-Leibler ‘distance’ zero from the true unknown distribution. Let H be the probability measure corresponding to the density h (which is determined by g, \(\pi _\eta \) and F through (6)). Let \(H_0\) be the ‘true’ value of H. The Kullback-Leibler ‘distance’ between H and \(H_0\) is defined as \(D(H\Vert H_0)=\mu (h\log (\frac{h}{h_0}))\). By Jensen’s inequality, \(D(H\Vert H_0)\ge 0\). Equality in Jensen’s inequality holds if and only if the argument of the \(\log \) function is constant (\(\mu \)-almost everywhere); since h and \(h_0\) are both probability densities, this constant must be 1, i.e.,

$$\begin{aligned} D(H\Vert H_0)=0 \quad \text{ iff } \quad H=H_0. \end{aligned}$$
(40)

Under the conditions given in part (b) or (c) of Theorem 1, H completely identifies F. Hence, \(H=H_0\) implies \(F=F_0\). It follows that the true distribution of the time-to-event, \(F_0\), is the only member of the equivalence class \({\mathcal {E}}\).
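A discrete analogue of (40), with counting measure in place of \(\mu \) and two made-up probability vectors, shows the divergence vanishing only when the two distributions coincide. This is an illustration under those assumptions, not part of the proof; the function name is hypothetical.

```python
import math
from typing import Sequence

def kl_divergence(h: Sequence[float], h0: Sequence[float]) -> float:
    """Discrete D(H || H0) = sum_x h(x) * log(h(x) / h0(x)).

    Terms with h(x) = 0 contribute zero; h0(x) must be positive wherever h(x) > 0.
    """
    return sum(p * math.log(p / q) for p, q in zip(h, h0) if p > 0)

h0 = [0.2, 0.5, 0.3]  # hypothetical 'true' distribution
h = [0.3, 0.4, 0.3]   # hypothetical alternative
print(kl_divergence(h0, h0))
print(kl_divergence(h, h0))
```

Note that the divergence is not symmetric, which is why the text refers to it as a ‘distance’ in quotation marks.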

Cite this article

Mirzaei Salehabadi, S., Sengupta, D. Nonparametric estimation of time-to-event distribution based on recall data in observational studies. Lifetime Data Anal 22, 473–503 (2016). https://doi.org/10.1007/s10985-015-9345-9
