Abstract
The case-cohort design, one of many two-phase sampling designs, substantially reduces the cost of an epidemiological study by selecting the more informative participants within the full cohort for expensive variable measurements. Despite their benefits, additive hazards models, which estimate hazard differences, have rarely been used to analyze case-cohort studies because of a lack of software and application examples. In this paper, we describe a newly developed estimation method that fits additive hazards models to general two-phase sampling studies, along with the R package addhazard that implements it. The method allows for missing covariates among cases, cohort stratification, robust variances, and the incorporation of auxiliary information from the full cohort to enhance inference precision. We demonstrate the use of this tool by estimating the association of the risk of coronary heart disease (CHD) with the biomarkers high-sensitivity C-reactive protein (hs-CRP) and lipoprotein-associated phospholipase A2 (Lp-PLA2) in the Atherosclerosis Risk in Communities Study, which adopted a two-phase sampling design for studying these two biomarkers. We show that using auxiliary variables from the full cohort through calibration techniques improves the precision of the estimated hazard difference. We observe a synergistic effect of the two biomarkers among participants with lower LDL cholesterol (LDL-C): the CHD hazard rate attributable to the combined action of high hs-CRP and high Lp-PLA2 exceeded the sum of the CHD hazard rates attributable to each one independently by 11.58 (95% CI 2.16–21.01) cases per 1000 person-years. Among participants with higher LDL-C, the CHD hazard rate attributable to the combined action of high hs-CRP and medium Lp-PLA2 was less than the sum of their individual effects by 13.42 (95% CI 2.44–24.40) cases per 1000 person-years.
This demonstration serves the dual purposes of illustrating analysis techniques and providing insight into the utility of hs-CRP and Lp-PLA2 for identifying populations at high risk of CHD that traditional risk factors such as LDL-C may miss. Epidemiologists are encouraged to use this new tool to analyze other case-cohort studies and to incorporate auxiliary variables available in the full cohort into their analyses.
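The synergy and sub-additivity estimates reported in the abstract can be read as interaction contrasts on the hazard-difference scale. The following is a minimal sketch, assuming a Lin and Ying type additive hazards model with two binary exposure indicators. The symbols $Z_1$, $Z_2$, $\beta_1$, $\beta_2$, $\beta_3$ and $\lambda_0$ are notation introduced here for illustration only; the model fitted in the paper includes further covariates and more than two Lp-PLA2 levels.

```latex
% Additive hazards model with an interaction term (illustrative notation)
\lambda(t \mid Z_1, Z_2) = \lambda_0(t) + \beta_1 Z_1 + \beta_2 Z_2 + \beta_3 Z_1 Z_2

% Hazard rates attributable to each exposure alone and to both together
\lambda(t \mid 1, 0) - \lambda(t \mid 0, 0) = \beta_1, \qquad
\lambda(t \mid 0, 1) - \lambda(t \mid 0, 0) = \beta_2, \qquad
\lambda(t \mid 1, 1) - \lambda(t \mid 0, 0) = \beta_1 + \beta_2 + \beta_3

% Interaction contrast: joint effect minus the sum of the individual effects
\bigl[\lambda(t \mid 1, 1) - \lambda(t \mid 0, 0)\bigr]
  - \bigl[\lambda(t \mid 1, 0) - \lambda(t \mid 0, 0)\bigr]
  - \bigl[\lambda(t \mid 0, 1) - \lambda(t \mid 0, 0)\bigr] = \beta_3
```

Under this parametrization, $\beta_3 > 0$ corresponds to a joint effect that exceeds the sum of the individual effects (as in the lower LDL-C result), and $\beta_3 < 0$ to a joint effect that falls short of that sum (as in the higher LDL-C result). Because the baseline hazard $\lambda_0(t)$ cancels in every difference, the contrast does not depend on $t$.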
Availability of data and materials
The ARIC dataset can be requested through the ARIC website https://biolincc.nhlbi.nih.gov/studies/aric/. The dataset identifier is available upon request from the corresponding author. The NWTSG dataset is available in the R package addhazard, and its data dictionary is provided in the package reference manual.
Code availability
The analysis code is provided in the supplementary material. The software can be downloaded from CRAN at https://cran.r-project.org/web/packages/addhazard/index.html or from GitHub at https://github.com/cran/addhazard.
Acknowledgements
The Atherosclerosis Risk in Communities study has been funded in whole or in part with Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services, under Contract nos. (HHSN268201700001I, HHSN268201700003I, HHSN268201700005I, HHSN268201700004I, HHSN2682017000021). The authors thank the staff and participants of the ARIC study for their important contributions.
Funding
The Atherosclerosis Risk in Communities study has been funded in whole or in part with Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services, under Contract nos. (HHSN268201700001I, HHSN268201700003I, HHSN268201700005I, HHSN268201700004I, HHSN2682017000021). The second author is partially funded by US National Institutes of Health Grant R01HL122212 and the US National Science Foundation Grant DMS1711952.
Author information
Author notes
Norman E. Breslow is deceased. The first author thanks him for his guidance on her Ph.D. dissertation, which led to this work.
Contributions
JH, GC, and NB conceptualized the paper. JH performed the analysis and led the drafting of the manuscript. GC and NB provided technical guidance, and DC prepared the dataset. All authors contributed significantly to editing the manuscript.
Ethics declarations
Conflict of interest
There are no known conflicts of interest.
Ethical approval
The Human Subjects Division of the University of Washington determined that the research activity did not require IRB review and approval.
Cite this article
Hu, J.K., Chan, K.C.G., Couper, D.J. et al. Estimating the hazard rate difference from case-cohort studies. Eur J Epidemiol 36, 1129–1142 (2021). https://doi.org/10.1007/s10654-021-00739-3
DOI: https://doi.org/10.1007/s10654-021-00739-3