Semiparametric inference for a two-stage outcome-dependent sampling design with interval-censored failure time data

  • Qingning Zhou
  • Jianwen Cai
  • Haibo Zhou


We propose a two-stage outcome-dependent sampling design and inference procedure for studies with interval-censored failure time outcomes. The design improves study efficiency by allowing the selection probabilities of the second-stage sample, for which the expensive exposure variable is ascertained, to depend on the interval-censored failure time outcomes observed at the first stage. In particular, the second-stage sample is enriched by selectively including subjects who are known or observed to experience the failure at an early or late time. We develop a sieve semiparametric maximum pseudo likelihood procedure that makes use of all available data from the proposed two-stage design. The resulting regression parameter estimator is shown to be consistent and asymptotically normal, and a consistent estimator of its asymptotic variance is derived. Simulation results demonstrate that the proposed design and inference procedure perform well in practical situations and are more efficient than existing designs and methods. An application to a phase 3 HIV vaccine trial is provided.
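The second-stage selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `select_second_stage`, the cutpoints `cut_early` and `cut_late`, the sample sizes, and the choice of a simple random sample plus two supplemental samples are all assumptions made for illustration. Each subject's first-stage outcome is taken to be an interval `(L, R)` known to contain the failure time, with `R = math.inf` for right-censored subjects.

```python
import math
import random


def select_second_stage(intervals, n_srs, n_early, n_late,
                        cut_early, cut_late, seed=0):
    """Pick second-stage subjects from first-stage intervals (L, R).

    Hypothetical scheme: a simple random sample (SRS) of the cohort,
    enriched with subjects known to fail early (R <= cut_early) or
    known to fail late (L >= cut_late). Returns sorted subject indices.
    """
    if cut_early >= cut_late:
        raise ValueError("cut_early must be smaller than cut_late")
    rng = random.Random(seed)
    ids = list(range(len(intervals)))

    # Stage 2a: simple random sample from the whole cohort.
    srs = set(rng.sample(ids, n_srs))

    # Stage 2b: supplemental pools, drawn only from subjects not yet selected.
    early = [i for i in ids if i not in srs and intervals[i][1] <= cut_early]
    late = [i for i in ids if i not in srs and intervals[i][0] >= cut_late]

    # Take at most the requested number from each pool.
    extra_early = rng.sample(early, min(n_early, len(early)))
    extra_late = rng.sample(late, min(n_late, len(late)))

    return sorted(srs | set(extra_early) | set(extra_late))


# Toy cohort of six subjects; the fourth one is right-censored at time 6.
cohort = [(0.0, 1.0), (0.5, 1.5), (2.0, 3.0),
          (6.0, math.inf), (7.0, 8.0), (3.0, 4.0)]
selected = select_second_stage(cohort, n_srs=2, n_early=1, n_late=1,
                               cut_early=1.5, cut_late=6.0, seed=1)
print(selected)
```

Because selection depends on the outcome, the resulting sample is biased by design. Any analysis of it has to account for the selection probabilities, which is the role of the pseudo likelihood procedure in the paper.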


Bernstein polynomial; Biased sampling; Missing data; Proportional hazards model; Sieve estimation



The authors thank the Editor, Associate Editor and reviewers for their helpful comments and suggestions that have improved the paper. The authors also thank the Global Solutions in Infectious Diseases (GSID) and Dr. Peter Gilbert for providing data from the phase 3 HIV vaccine trial VAX004. This research was partially supported by grants from the National Institutes of Health (R01ES021900, P01CA142538 and P30ES010126). Qingning Zhou’s work was supported, in part, by funds provided by the University of North Carolina at Charlotte.

Supplementary material

10985_2019_9461_MOESM1_ESM.pdf (171 kb)
Supplementary Materials: The supplementary materials include the two lemmas and their proofs, as well as additional simulation results for a smaller cohort size, N = 1000.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, USA
  2. Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
  3. Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
