Cumulative risk regression in case–cohort studies using pseudo-observations


Case–cohort studies are useful when information on certain risk factors is difficult or costly to ascertain. In particular, a case–cohort study is well suited to situations where several case series are of interest, e.g. in studies with competing risks, because the same sub-cohort can serve as a comparison group for all case series. Previous analyses of this kind of sampled cohort data have most often estimated rate ratios from a Cox regression model. With competing risks, however, this method does not provide parameters that directly describe the association between covariates and cumulative risks. In this paper, we study regression analysis of cause-specific cumulative risks in case–cohort studies using pseudo-observations. We focus mainly on the situation with competing risks but, as a by-product, we also develop a method by which absolute mortality risks may be analyzed directly from case–cohort survival data. We adjust for the case–cohort sampling by weighting a generalized estimating equation with inverse sampling probabilities. The large-sample properties of the proposed estimator are derived, and its small-sample properties are evaluated in a simulation study. We apply the methodology to study the effect of a specific diet component and a specific gene on the absolute risk of atrial fibrillation.
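To make the approach summarized above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes jackknife pseudo-observations of the Aalen-Johansen cumulative incidence at a single time point, an identity link, and known inverse sampling probabilities. The function names (`cumulative_incidence`, `pseudo_observations`, `weighted_identity_fit`) and the toy data are hypothetical, and variance estimation is omitted.

```python
import numpy as np


def cumulative_incidence(time, cause, t0, k=1):
    """Aalen-Johansen estimate of P(T <= t0, cause = k); cause 0 means censored."""
    time = np.asarray(time, dtype=float)
    cause = np.asarray(cause)
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current event time
    cif = 0.0
    for t in np.unique(time[(cause != 0) & (time <= t0)]):
        at_risk = np.sum(time >= t)
        d_k = np.sum((time == t) & (cause == k))
        d_all = np.sum((time == t) & (cause != 0))
        cif += surv * d_k / at_risk
        surv *= 1.0 - d_all / at_risk
    return cif


def pseudo_observations(time, cause, t0, k=1):
    """Jackknife pseudo-observations: n * F_hat - (n - 1) * F_hat without subject i."""
    time = np.asarray(time, dtype=float)
    cause = np.asarray(cause)
    n = len(time)
    full = cumulative_incidence(time, cause, t0, k)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False
        loo = cumulative_incidence(time[keep], cause[keep], t0, k)
        pseudo[i] = n * full - (n - 1) * loo
        keep[i] = True
    return pseudo


def weighted_identity_fit(pseudo, X, weights):
    """Solve sum_i w_i * x_i * (pseudo_i - x_i' beta) = 0 (identity link, assumed here)."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    XtW = X.T * w  # scales the contribution of subject i by its weight w_i
    return np.linalg.solve(XtW @ X, XtW @ np.asarray(pseudo, dtype=float))


if __name__ == "__main__":
    # Toy cohort: follow-up time and cause (0 = censored) are known for everyone.
    time = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    cause = np.array([1, 2, 1, 0, 2, 1])
    po = pseudo_observations(time, cause, t0=3.5, k=1)

    # Hypothetical sampling: covariates are measured only for these subjects.
    sampled = np.array([True, False, True, True, False, True])
    z = np.array([1.0, 0.0, 1.0, 0.0])  # covariate values of the sampled subjects
    w = np.array([1.0, 1.0, 2.0, 1.0])  # assumed inverse sampling probabilities
    X = np.column_stack([np.ones(z.size), z])
    print(weighted_identity_fit(po[sampled], X, w))
```

In this sketch the pseudo-observations are computed from the outcome data of the whole cohort, and only the regression step is restricted to, and weighted over, the subjects with measured covariates. This is one plausible reading of the abstract rather than a confirmed detail of the paper. Without censoring, each pseudo-observation reduces to the indicator that the subject had a cause-`k` event by `t0`.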


We are grateful for constructive comments from the referees and an associate editor. The data from the Danish Diet, Cancer, and Health cohort were kindly made available by Lotte Maxild Mortensen. The work presented in this article is supported by Novo Nordisk Foundation grant NNF17OC0028276.

Author information

Correspondence to Erik T. Parner.

Cite this article

Parner, E.T., Andersen, P.K. & Overgaard, M. Cumulative risk regression in case–cohort studies using pseudo-observations. Lifetime Data Anal (2020) doi:10.1007/s10985-020-09492-3

Keywords

  • Case–cohort study
  • Competing risks
  • Cumulative incidence
  • Cumulative risk
  • Pseudo-observations