# Semiparametric inference for a two-stage outcome-dependent sampling design with interval-censored failure time data

## Abstract

We propose a two-stage outcome-dependent sampling design and inference procedure for studies with interval-censored failure time outcomes. The design improves study efficiency by allowing the selection probabilities of the second-stage sample, for which the expensive exposure variable is ascertained, to depend on the interval-censored failure time outcomes observed at the first stage. In particular, the second-stage sample is enriched by selectively including subjects who are known or observed to experience the failure at an early or a late time. We develop a sieve semiparametric maximum pseudo likelihood procedure that makes use of all available data from the proposed two-stage design. The resulting regression parameter estimator is shown to be consistent and asymptotically normal, and a consistent estimator of its asymptotic variance is derived. Simulation results demonstrate that the proposed design and inference procedure perform well in practical situations and are more efficient than existing designs and methods. An application to a phase 3 HIV vaccine trial is provided.
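The two ingredients named in the abstract, outcome-dependent selection of the second-stage sample and a sieve likelihood for interval-censored data, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the second-stage sample is a simple random sample (SRS) of the cohort plus supplemental draws from an "early failure" stratum (R ≤ a) and a "late failure" stratum (L ≥ b), where the cutpoints `a < b` and all sample sizes are hypothetical. It approximates the cumulative baseline hazard of the proportional hazards model by a Bernstein polynomial with nondecreasing coefficients (the keywords name Bernstein polynomials, but this particular parameterization is an assumption). It evaluates only the ordinary interval-censored Cox log-likelihood for subjects whose exposure is measured. The paper's pseudo likelihood, which also uses first-stage subjects whose exposure is missing, is not reproduced here, and all function names are hypothetical.

```python
import numpy as np
from math import comb


def stratify(L, R, a, b):
    """Label first-stage outcomes (L, R]: 1 = known early failure (R <= a),
    3 = known late failure (L >= b), 2 = all others. Cutpoints a < b are
    hypothetical design constants; R = inf encodes right censoring."""
    L = np.asarray(L, dtype=float)
    R = np.asarray(R, dtype=float)
    labels = np.full(L.shape, 2)
    labels[R <= a] = 1
    labels[L >= b] = 3
    return labels


def two_stage_sample(L, R, a, b, n_srs, n_early, n_late, rng):
    """Pick subjects whose expensive exposure is measured: an SRS of the
    cohort plus supplemental draws from the early and late strata among
    subjects not already in the SRS (sizes capped by stratum availability)."""
    N = len(L)
    srs = rng.choice(N, size=n_srs, replace=False)
    labels = stratify(L, R, a, b)
    remaining = np.setdiff1d(np.arange(N), srs)
    early_pool = remaining[labels[remaining] == 1]
    late_pool = remaining[labels[remaining] == 3]
    early = rng.choice(early_pool, size=min(n_early, early_pool.size), replace=False)
    late = rng.choice(late_pool, size=min(n_late, late_pool.size), replace=False)
    return srs, early, late


def bernstein_cumhaz(t, gamma, c, u):
    """Sieve cumulative baseline hazard on [c, u]:
    Lambda(t) = sum_{k=0}^m phi_k * C(m, k) x^k (1 - x)^(m - k), x = (t - c) / (u - c),
    with phi_0 = 0 and phi_k = sum_{j<k} exp(gamma_j), so the coefficients are
    nondecreasing, Lambda is nondecreasing, and Lambda(c) = 0."""
    t = np.asarray(t, dtype=float)
    m = len(gamma)
    phi = np.concatenate(([0.0], np.cumsum(np.exp(gamma))))
    x = np.clip((t - c) / (u - c), 0.0, 1.0)
    basis = np.stack(
        [comb(m, k) * x**k * (1.0 - x) ** (m - k) for k in range(m + 1)], axis=-1
    )
    return basis @ phi


def ic_cox_loglik(beta, gamma, Z, L, R, c, u):
    """Interval-censored Cox log-likelihood for subjects with observed Z (n x p):
    sum_i log[S(L_i | Z_i) - S(R_i | Z_i)], S(t | Z) = exp(-Lambda(t) exp(Z beta)).
    Right-censored subjects (R_i = inf) have S(R_i | Z_i) = 0."""
    Z = np.asarray(Z, dtype=float)
    L = np.asarray(L, dtype=float)
    R = np.asarray(R, dtype=float)
    risk = np.exp(Z @ np.asarray(beta, dtype=float))
    right_cens = np.isinf(R)
    S_L = np.exp(-bernstein_cumhaz(L, gamma, c, u) * risk)
    # Evaluate Lambda at u as a placeholder for right-censored rows, then zero them.
    S_R = np.exp(-bernstein_cumhaz(np.where(right_cens, u, R), gamma, c, u) * risk)
    S_R = np.where(right_cens, 0.0, S_R)
    return float(np.sum(np.log(S_L - S_R)))
```

Fixing φ_0 = 0 forces Λ(c) = 0, so with c = 0 a left-censored subject (L = 0) contributes log{1 − S(R | Z)}. Choosing u at or above the largest finite endpoint keeps every contribution S(L | Z) − S(R | Z) strictly positive. As a check, with `gamma = np.zeros(3)` on [c, u] = [0, 2] the coefficients are φ = (0, 1, 2, 3) and the sieve reduces to Λ(t) = 1.5t, because Σ_k k·C(m, k)x^k(1 − x)^(m−k) = m·x.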

## Keywords

Bernstein polynomial; Biased sampling; Missing data; Proportional hazards model; Sieve estimation

## Notes

### Acknowledgements

The authors thank the Editor, Associate Editor and reviewers for their helpful comments and suggestions that have improved the paper. The authors also thank the Global Solutions in Infectious Diseases (GSID) and Dr. Peter Gilbert for providing data from the phase 3 HIV vaccine trial VAX004. This research was partially supported by grants from the National Institutes of Health (R01ES021900, P01CA142538 and P30ES010126). Qingning Zhou’s work was supported, in part, by funds provided by the University of North Carolina at Charlotte.
