Nested exposure case-control sampling: a sampling scheme to analyze rare time-dependent exposures

Lifetime Data Analysis

Abstract

For large cohort studies with rare outcomes, the nested case-control design requires data collection for only small subsets of the individuals at risk. These are typically sampled at random at the observed event times, and a weighted, stratified analysis takes over the role of the full cohort analysis. Motivated by observational studies on the impact of hospital-acquired infection on hospital stay outcome, we are interested in situations where it is not necessarily the outcome that is rare, but a time-dependent exposure such as the occurrence of an adverse event or disease progression. Using the counting process formulation of general nested case-control designs, we propose three sampling schemes in which not all commonly observed outcomes need to be included in the analysis. Rather, inclusion probabilities may be time-dependent and may even depend on the past sampling and exposure history. A bootstrap analysis of a full cohort data set from hospital epidemiology allows us to investigate the practical utility of the proposed sampling schemes in comparison to a full cohort analysis and to an overly simple application of the nested case-control design when the outcome is not rare.

References

  • Aalen OO, Borgan Ø, Gjessing HK (2008) Survival and event history analysis. Springer, New York

  • Andersen PK, Keiding N (2012) Interpretability and importance of functionals in competing risks and multistate models. Stat Med 31(11–12):1074–1088

  • Andersen PK, Borgan Ø, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer, New York

  • Bang CN, Gislason GH, Greve AM, Bang CA, Lilja A, Torp-Pedersen C, Andersen PK, Køber L, Devereux RB, Wachtell K (2014) New-onset atrial fibrillation is associated with cardiovascular events leading to death in a first time myocardial infarction population of 89 703 patients with long-term follow-up: a nationwide study. J Am Heart Assoc 3(1):e000382

  • Beyersmann J, Gastmeier P, Grundmann H, Bärwolff S, Geffers C, Behnke M, Rüden H, Schumacher M (2006) Use of multistate models to assess prolongation of intensive care unit stay due to nosocomial infection. Infect Control Hosp Epidemiol 27(5):493–499

  • Beyersmann J, Wolkewitz M, Schumacher M (2008) The impact of time-dependent bias in proportional hazards modelling. Stat Med 27(30):6439–6454

  • Beyersmann J, Allignol A, Schumacher M (2012) Competing risk and multistate models with R. Springer, New York

  • Borgan Ø, Keogh RH (2015) Nested case–control studies: should one break the matching? Lifetime Data Anal 21(4):517–541

  • Borgan Ø, Samuelsen SO (2013) Nested case-control and case-cohort studies. In: Klein JP, van Houwelingen HC, Ibrahim JG, Scheike TH (eds) Handbook of survival analysis. Chapman & Hall/CRC, Boca Raton, pp 343–367

  • Borgan Ø, Goldstein L, Langholz B (1995) Methods for the analysis of sampled cohort data in the Cox proportional hazards model. Ann Stat 23(5):1749–1778

  • Borgan Ø, Langholz B, Samuelsen SO, Goldstein L, Pogoda J (2000) Exposure stratified case-cohort designs. Lifetime Data Anal 6(1):39–58

  • Breslow NE (2014) Lessons in biostatistics. In: Lin X, Genest C, Banks DL, Molenberghs G, Scott DW, Wang JL (eds) Past, present and future of statistical science. Chapman and Hall/CRC, Boca Raton, pp 335–347

  • Breslow NE, Wellner JA (2007) Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scand J Stat 34(1):86–102

  • Cox DR (1972) Regression models and life-tables. J R Stat Soc Ser B 34(2):187–220

  • Essebag V, Platt RW, Abrahamowicz M, Pilote L (2005) Comparison of nested case–control and survival analysis methodologies for analysis of time-dependent exposure. BMC Med Res Methodol 5(1):5

  • García Rodríguez LA, Soriano-Gabarró M, Bromley S, Lanas A, Cea Soriano L (2017) New use of low-dose aspirin and risk of colorectal cancer by stage at diagnosis: a nested case–control study in UK general practice. BMC Cancer 17(1):637

  • Goldstein L, Langholz B (1992) Asymptotic theory for nested case–control sampling in the Cox regression model. Ann Stat 20(4):1903–1928

  • Grundmann H, Glasner C, Albiger B, Aanensen DM, Tomlinson CT, Andrasević AT, Cantón R, Carmeli Y, Friedrich AW, Giske CG, Glupczynski Y, Gniadkowski M, Livermore DM, Nordmann P, Poirel L, Rossolini GM, Seifert H, Vatopoulos A, Walsh T, Woodford N, Monnet DL (2017) Occurrence of carbapenemase-producing Klebsiella pneumoniae and Escherichia coli in the European survey of carbapenemase-producing Enterobacteriaceae (EuSCAPE): a prospective, multinational study. Lancet Infect Dis 17(2):153–163

  • Gutiérrez-Gutiérrez B, Sojo-Dorado J, Bravo-Ferrer J, Cuperus N, de Kraker M, Kostyanev T, Raka L, Daikos G, Feifel J, Folgori L, Pascual A, Goossens H, O’Brien S, Bonten MJM, Rodríguez-Baño J (2017) European prospective cohort study on Enterobacteriaceae showing REsistance to Carbapenems (EURECA): a protocol of a European multicentre observational study. BMJ Open 7(4):e015365

  • Keogh RH, Cox DR (2014) Case–control studies. Institute of Mathematical Statistics Monographs. Cambridge University Press, Cambridge

  • Keogh RH, Mangtani P, Rodrigues L, Nguipdop Djomo P (2016) Estimating time-varying exposure-outcome associations using case–control data: logistic and case-cohort analyses. BMC Med Res Methodol 16(1):2

  • Kessing LV, Gerds TA, Knudsen NN, Jørgensen LF, Kristiansen SM, Voutchkova D, Ernstsen V, Schullehner J, Hansen B, Andersen PK, Ersbøll AK (2017) Association of lithium in drinking water with the incidence of dementia. JAMA Psychiatry 74(10):1005–1010

  • Langholz B, Borgan Ø (1995) Counter-matching: a stratified nested case–control sampling method. Biometrika 82(1):69–79

  • Langholz B, Clayton D (1994) Sampling strategies in nested case–control studies. Environ Health Perspect 102:47–51

  • Leffondre K, Wynant W, Cao Z, Abrahamowicz M, Heinze G, Siemiatycki J (2010) A weighted Cox model for modelling time-dependent exposures in the analysis of case–control studies. Stat Med 29(7–8):839–850

  • Lin D (2000) On fitting Cox’s proportional hazards models to survey data. Biometrika 87(1):37–47

  • Lumley T (2011) Complex surveys: a guide to analysis using R. Wiley Series in Survey Methodology. Wiley, New York

  • Oakes D (1981) Survival times: aspects of partial likelihood. Int Stat Rev 49(3):235–252

  • Ohneberg K, Wolkewitz M, Beyersmann J, Palomar-Martinez M, Olaechea-Astigarraga P, Alvarez-Lerma F, Schumacher M (2015) Analysis of clinical cohort data using nested case–control and case-cohort sampling designs. Methods Inf Med 54(6):505–514

  • Paixão ES, da Conceição N, Costa M, Teixeira MG, Harron K, de Almeida ME, Barreto ML, Rodrigues LC (2017) Symptomatic dengue infection during pregnancy and the risk of stillbirth in Brazil, 2006–12: a matched case–control study. Lancet Infect Dis 17(9):957–964

  • Pang D (1999) A relative power table for nested matched case–control studies. Occup Environ Med 56(1):67–69

  • Prentice RL (1986) A case–cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73(1):1–11

  • R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  • Thomas DC (1977) Addendum to ‘Methods of cohort analysis: appraisal by application to asbestos mining’ by Liddell FDK, McDonald JC, Thomas DC, Cunliffe SV. J R Stat Soc 140:483–485

  • Wolkewitz M, Beyersmann J, Gastmeier P, Martin S (2009) Efficient risk set sampling when a time-dependent exposure is present. Methods Inf Med 48(5):438–443

  • World Health Organization (WHO) (2014) Antimicrobial resistance: global report on surveillance. http://www.who.int/drugresistance/documents/surveillancereport/en/. Accessed 14 Dec 2017

Acknowledgements

This work was supported by Grant BE-4500/1-2 of the German Research Foundation (DFG).

Author information

Corresponding author

Correspondence to Jan Feifel.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 241 KB)

Appendices

A Theoretical background

The following presentation is based on Borgan et al. (1995).

Fix a time interval \([0, \tau ]\), \(\tau \in (0,\infty ]\), and consider a cohort \(\mathscr {C}=\{1,\ldots ,n\}\) and a probability space \((\varOmega , \mathscr {F}, \mathbb {P})\). On this space, the marked point process \(\{(t_j, i_j), j\ge 1\}\) consists of the failure times \(t_j\in [0,\tau ]\) and the marks \(i_j\in \mathscr {C}\), where \(i_j\) identifies the individual who experiences the outcome at time \(t_j\). The information available at the time origin and generated by this process over time is contained in the filtration \((\mathscr {F}_{t})_{t\ge 0}\). We assume the counting process of the observed failure times to be

$$\begin{aligned} N_i(t)=\sum _{j\ge 1}\mathbb {1} \left\{ t_j\le t,\, i_j=i \right\} , \qquad i\in \mathscr {C}, \end{aligned}$$

and its intensity is \(\lambda _i(t)=Y_i(t)\alpha _0(t)\exp ({\varvec{\beta }}^T\mathbf {z}_i(t))\). The at-risk indicator \(Y_i\) and the covariates \(\mathbf {z}_i(t)\) are assumed to be left-continuous and adapted to \((\mathscr {F}_{t})_{t\ge 0}\). Let \({\widetilde{\mathscr {R}}}_j\) denote the sampled risk set at \(t_j\), consisting of the sampled controls together with the matched outcome-positive individual \(i_j\). The resulting marked point process is \(\{(t_j, (i_j, {\widetilde{\mathscr {R}}}_j)), j\ge 1\}\) with the finite mark space \(E=\{(i,{R}){:}\,{R} \in \mathscr {P}, i \in {R} \}=\{(i,{R}){:}\,{R} \subset \mathscr {C}, {R} \in \mathscr {P}_i \}\), where \(\mathscr {P}\) is the power set of \(\mathscr {C}\) and \(\mathscr {P}_i=\{{R} \in \mathscr {P}{:}\,i\in {R} \}\) collects the subsets containing \(i\). Using this construction, we extend \((\mathscr {F}_{t})_{t\ge 0}\) by the sampling process to \((\mathscr {H}_{t})_{t\ge 0}=(\mathscr {F}_{t}\vee \sigma ({\widetilde{\mathscr {R}}}_j; t_j\le t))_{t\ge 0}\). For each tuple \((i,{R})\in E\), there exists a corresponding counting process

$$\begin{aligned} N_{(i,{R})}(t)=\sum _{j\ge 1}\mathbb {1} \left\{ t_j\le t,\, i_j=i,\; {\widetilde{\mathscr {R}}}_j={R} \right\} \end{aligned}$$
(7)

that counts all observed failure times of individual \(i\) in \([0, t]\) with sampled risk set \({R} \). We assume that the sampling procedure is independent in the sense that the \(\mathscr {F}_{t}\)-intensity processes of \(N_{i}(t):=\sum _{{R} \in \mathscr {P}_i}N_{(i,R)}(t)\) and their counterparts with respect to \(\mathscr {H}_{t}\) coincide (Andersen et al. 1993, Sec. III.2). Using \(\pi \left( {R} |t,i\right) :=\mathbb {P}\left( {R} \text { sampled at } t|dN_i(t)=1, \mathscr {H}_{t-}\right) \), with \(\pi \left( {R} |t,i\right) =0\) if \(Y_i(t)=0\) or \(i\notin {R} \), we obtain

$$\begin{aligned} \lambda _{(i,R)}(t)=\lambda _i(t)\pi \left( {R} |t,i\right) =Y_i(t)w_i(t,{R})\pi \left( {R} |t\right) \alpha \left( t|\mathbf {z}_i(t)\right) \end{aligned}$$
(8)

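as the intensity process for the counting process (7), where \(\alpha \left( t|\mathbf {z}_i(t)\right) =\alpha _0(t)\exp ({\varvec{\beta }}^T\mathbf {z}_i(t))\), \(\pi \left( {R} |t\right) =n_\bullet ^{-1}(t)\sum _{i=1}^{n}\pi \left( {R} |t,i\right) \) with \(n_\bullet (t)\) the number of individuals at risk just before \(t\), and \(w_i(t,{R})={\pi \left( {R} |t,i\right) }/{\pi \left( {R} |t\right) }\) is the weight. Inference is based on the partial likelihood given by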

$$\begin{aligned} \mathscr {L}({\varvec{\beta }})=\prod _{t_i}\frac{\exp ({\varvec{\beta }}^T\mathbf {z}_i(t_i))\cdot w_i(t_i,{\widetilde{\mathscr {R}}}_i)}{\sum _{\ell \in {\widetilde{\mathscr {R}}}_i}\exp ({\varvec{\beta }}^T\mathbf {z}_\ell (t_i))\cdot w_\ell (t_i,{\widetilde{\mathscr {R}}}_i)}. \end{aligned}$$
(9)

In conclusion, only outcome-positive individuals with their respective failure times \(t_i\) and the corresponding sampled risk sets \({\widetilde{\mathscr {R}}}_i\) contribute to the inference based on this model. The estimator \({\widehat{{\varvec{\beta }}}}\) is obtained by maximizing the partial likelihood (9). Theoretical properties and asymptotic results are given in Borgan et al. (1995).
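To make the structure of (9) concrete, the following minimal sketch evaluates the weighted log partial likelihood for a single scalar covariate. The function name, the data layout and the toy values are assumptions chosen for illustration only; they are not taken from the paper or from any software used by the authors.

```python
import math


def log_partial_likelihood(beta, sampled_risk_sets):
    """Weighted log partial likelihood in the form of Eq. (9), scalar covariate.

    sampled_risk_sets: list of (case_id, members) pairs, one per failure time.
    members maps every individual in the sampled risk set (case included)
    to a tuple (z, w): covariate value and weight at that failure time.
    """
    total = 0.0
    for case_id, members in sampled_risk_sets:
        z_case, w_case = members[case_id]
        numerator = math.exp(beta * z_case) * w_case
        denominator = sum(math.exp(beta * z) * w for z, w in members.values())
        total += math.log(numerator / denominator)
    return total


# Hypothetical toy data: two failure times, weights made up for illustration.
toy_sets = [
    ("a", {"a": (1.0, 2.0), "b": (0.0, 1.0), "c": (0.0, 1.0)}),
    ("d", {"d": (0.0, 1.0)}),  # case without controls: ratio is 1, log is 0
]
print(log_partial_likelihood(0.5, toy_sets))
```

A sampled risk set that contains only the case adds nothing to the log-likelihood, which is the property the NECC design exploits.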

The question of how to specify \(\pi \left( \cdot |t,i\right) \) for every t where \(Y_i(t)=1\) arises immediately. The probability may be based on any information available up to but not including time t, i.e. \(\pi \left( {R} |t, i\right) \) is left-continuous and adapted. Using this, we develop the new sampling procedure for investigating the association between a time-dependent exposure and the outcome by simultaneously sampling with respect to this exposure.

For the NECC, we consider a random variable \(B{:}\,\left( \varOmega ,[0,\tau ]\right) \rightarrow \{0,1\}\) indicating whether an individual should be considered as a case within the partial likelihood, i.e. whether controls should be assigned to that observed outcome event. Further, we write \(B(t):=B(\omega , t)\), \(\mathscr {R}_\bullet (t)=\{i{:}\,Y_i(t\text {-})=1\}\) and \(\mathbb {P}_{t\text {-}}({\widetilde{\mathscr {R}}}(t)={R}):=\mathbb {P}({\widetilde{\mathscr {R}}}(t)={R} |dN_i(t)=1, \mathscr {H}_{t\text {-}})\). Consider a failure time \(t_j\) and assume that the sampled risk set of the respective individual contains only \(i_j\), i.e. no controls are sampled. The contribution to the likelihood is then given by

$$\begin{aligned} \frac{\exp ({\varvec{\beta }}^T\mathbf {z}_j(t_j))\cdot \pi \left( {\widetilde{\mathscr {R}}}_j |t_j,i_j\right) }{\sum _{\ell \in {\widetilde{\mathscr {R}}}_j}\exp ({\varvec{\beta }}^T\mathbf {z}_\ell (t_j))\cdot \pi \left( {\widetilde{\mathscr {R}}}_j |t_j,\ell \right) }=\frac{\exp ({\varvec{\beta }}^T\mathbf {z}_j(t_j))\cdot \pi \left( {\widetilde{\mathscr {R}}}_j |t_j,i_j\right) }{\exp ({\varvec{\beta }}^T\mathbf {z}_j(t_j))\cdot \pi \left( {\widetilde{\mathscr {R}}}_j |t_j,i_j\right) }=1. \end{aligned}$$

Formalizing the NECC sampling design discussed in Sect. 2, we obtain the sampled risk sets

$$\begin{aligned} {\widetilde{\mathscr {R}}}_j={\left\{ \begin{array}{ll} \{i_j, k_1,\ldots ,k_{m-1}\}, k_\ell \in \mathscr {R}_{\bullet }(t_j) &{}\quad \text {if}\,\,x_{j}(t_j)=1\\ \{i_j, k_1,\ldots ,k_{m-1}\}, k_\ell \in \mathscr {R}_{\bullet }(t_j) &{}\quad \text {if}\,\,x_{j}(t_j)=0 \text { and } B(t_j)=1\\ \{i_j\} &{}\quad \text {if}\,\,x_{j}(t_j)=0 \text { and } B(t_j)=0. \end{array}\right. } \end{aligned}$$
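The three cases above can be sketched as a small sampling routine. This is an illustration under stated assumptions, not the authors' implementation: controls are assumed to be drawn uniformly without replacement from the individuals at risk other than the case, \(B(t_j)\) is drawn as a Bernoulli variable with a given probability `q`, and all names (`sample_necc_risk_set`, `at_risk`, `rng`) are hypothetical.

```python
import random


def sample_necc_risk_set(case, exposed, at_risk, m, q, rng):
    """Return the sampled risk set for one failure time.

    case:    id of the individual with the outcome event
    exposed: True if the case is exposed at the failure time (x = 1)
    at_risk: ids of all individuals at risk, case included
    m:       size of a full sampled risk set (case plus m - 1 controls)
    q:       probability that an unexposed case receives controls (B = 1)
    rng:     a random.Random instance
    """
    # Exposed cases always receive controls; unexposed ones only if B = 1.
    include = exposed or rng.random() < q
    if not include:
        return {case}
    candidates = [k for k in at_risk if k != case]
    controls = rng.sample(candidates, m - 1)  # without replacement
    return {case, *controls}


rng = random.Random(2020)
cohort = list(range(1, 21))
print(sample_necc_risk_set(3, True, cohort, m=4, q=0.25, rng=rng))
print(sample_necc_risk_set(7, False, cohort, m=4, q=0.25, rng=rng))
```

With `q=1` every unexposed case also receives controls, so the routine reduces to simple random sampling of \(m-1\) controls at every failure time.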

Using \({\widetilde{\mathscr {R}}}_j\) and defining \(\mathscr {R}_0(t)=\{i:Y_i(t\text {-})=1, x_i(t)=0\}\) and \(n_\bullet (t)={\text {card}}\left( \mathscr {R}_\bullet (t)\right) \), we derive

$$\begin{aligned} \pi \left( {R} |t,i\right)&=\mathbb {P}_{t\text {-}}({\widetilde{\mathscr {R}}}(t)={R})\nonumber \\&=\mathbb {P}_{t\text {-}}({\widetilde{\mathscr {R}}}(t)={R}, x_i(t\text {-})=1)+ \mathbb {P}_{t\text {-}}({\widetilde{\mathscr {R}}}(t)={R}, x_i(t\text {-})=0, B(t)=1) \nonumber \\&\quad +\,\mathbb {P}_{t\text {-}}({\widetilde{\mathscr {R}}}(t)={R}, x_i(t\text {-})=0, B(t)=0)\nonumber \\&=\mathbb {P}_{t\text {-}}({\widetilde{\mathscr {R}}}(t)={R} |x_i(t\text {-})=1)\mathbb {P}_{t\text {-}}(x_i(t\text {-})=1)\nonumber \\&\quad +\,\mathbb {P}_{t\text {-}}({\widetilde{\mathscr {R}}}(t)={R} |x_i(t\text {-})=0, B(t)=1)\mathbb {P}_{t\text {-}}(x_i(t\text {-})=0)\mathbb {P}_{t\text {-}}(B(t)=1| x_i(t\text {-})=0)\nonumber \\&\quad +\,\mathbb {P}_{t\text {-}}({\widetilde{\mathscr {R}}}(t)={R} |x_i(t\text {-})=0, B(t)=0)\mathbb {P}_{t\text {-}}(x_i(t\text {-})=0)\mathbb {P}_{t\text {-}}(B(t)=0| x_i(t\text {-})=0)\nonumber \\&=\left( {\begin{array}{c}n_\bullet (t)-1\\ m-1\end{array}}\right) ^{-1}\mathbb {1}_{\left( i\in {R}, {\text {card}}\left( {R} \right) =m, {R} \subset \mathscr {R}_\bullet (t)\right) }\mathbb {1} \left\{ x_i(t\text {-})=1 \right\} \nonumber \\&\quad +\,\left( {\begin{array}{c}n_\bullet (t)-1\\ m-1\end{array}}\right) ^{-1}\mathbb {1}_{\left( i\in {R}, {\text {card}}\left( {R} \right) =m, {R} \subset \mathscr {R}_\bullet (t)\right) }\mathbb {1} \left\{ x_i(t\text {-})=0 \right\} \mathbb {P}_{t\text {-}}(B(t)=1|x_i(t\text {-})=0) \nonumber \\&\quad +\,\mathbb {1}_{\left( \{i\}={R}, {R} \subset \mathscr {R}_0(t)\right) }\mathbb {1} \left\{ x_i(t\text {-})=0 \right\} \mathbb {P}_{t\text {-}}(B(t)=0|x_i(t\text {-})=0), \end{aligned}$$
(10)

where \(\mathbb {P}_{t\text {-}}(x_i(t\text {-})=a)=\mathbb {1} \left\{ x_i(t\text {-})=a \right\} \) for \(a\in \{0,1\}\). Equation (10) can be used for the calculation of the denominator of the weight. The structure of the random variable B allows for several sampling procedures within the NCC.

We choose \(B(t)\sim \text {Ber}(q(t))\), i.e. the \(B(t)\) are independently Bernoulli distributed with success probability \(q(t)\in (0,1]\). In the simplest setting, \(q(t)\) is a deterministic function fixed in advance. With \(m_0(t)={\text {card}}\left( \mathscr {R}_0(t)\cap {\widetilde{\mathscr {R}}}(t)\right) \), the number of unexposed individuals in the sampled risk set, the sampling probabilities and weights are given by

$$\begin{aligned} \pi \left( {R} |t,i\right)&=\left( {\begin{array}{c}n_\bullet (t)-1\\ m-1\end{array}}\right) ^{-1}\mathbb {1}_{\left( i\in {R}, {\text {card}}\left( {R} \right) =m, {R} \subset \mathscr {R}_\bullet (t)\right) }\left( \mathbb {1} \left\{ x_i(t\text {-})=1 \right\} +\mathbb {1} \left\{ x_i(t\text {-})=0 \right\} q(t)\right) \\&\quad +\,\mathbb {1} \left\{ \{i\}={R}, {R} \subset \mathscr {R}_0(t) \right\} \mathbb {1} \left\{ x_i(t\text {-})=0 \right\} (1-q(t))\\ \pi \left( {R} |t\right)&=\frac{1}{n_\bullet (t)}\sum _{i=1}^{n}\pi \left( {R} |t,i\right) \\&\quad = \frac{1}{n_\bullet (t)}\left( {\begin{array}{c}n_\bullet (t)-1\\ m-1\end{array}}\right) ^{-1}\mathbb {1}_{\left( {\text {card}}\left( {R} \right) =m, {R} \subset \mathscr {R}_\bullet (t)\right) }\left[ m-m_0(t)+ q(t)m_0(t)\right] \nonumber \\&\quad +\,\mathbb {1} \left\{ {\text {card}}\left( {R} \right) =1, {R} \subset \mathscr {R}_{0} \right\} (1-q(t))\\ w_i(t,{R})&=\frac{q(t)^{1-x_i(t)}n_\bullet (t)}{m-m_0(t)+q(t)\cdot m_0(t)}\mathbb {1} \left\{ i\in {R}, {\text {card}}\left( {R} \right) =m, {R} \subset {\widetilde{\mathscr {R}}}_\bullet (t) \right\} . \end{aligned}$$
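As a numerical illustration of the weight formula above, the following sketch evaluates \(w_i(t,{R})\) for a member of a sampled risk set of size \(m\). The function name and the example numbers are hypothetical and serve only to show how the exposure status and \(q(t)\) enter the weight.

```python
def necc_weight(x_i, n_at_risk, m, m0, q):
    """Weight q^(1 - x_i) * n / (m - m0 + q * m0) for a member of a sampled
    risk set of size m.

    x_i:       exposure status of the individual (0 or 1)
    n_at_risk: number of individuals at risk, n_bullet(t)
    m0:        number of unexposed individuals in the sampled risk set
    q:         inclusion probability q(t), required to lie in (0, 1]
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return q ** (1 - x_i) * n_at_risk / (m - m0 + q * m0)


# Hypothetical example: 100 at risk, risk set of size 5 with 3 unexposed.
print(necc_weight(1, 100, 5, 3, 0.25))  # exposed member
print(necc_weight(0, 100, 5, 3, 0.25))  # unexposed member, smaller by factor q
```

For \(q(t)=1\) every member receives the same weight \(n_\bullet (t)/m\), which cancels in the partial likelihood (9).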

In Sect. 2.4 we set \(q=1\) to state the history-dependent sampling scheme. This contradicts the requirement above, since \(q(t)=0\) if the inequality in (6b) is fulfilled. A motivation for excluding zero from the interval of the inclusion probability is as follows: Let \(q(t)=0\) for some t. Then weights take the form

[Equations (11a) and (11b): weights for \(q(t)=0\); displayed as an image in the source]

meaning that whenever all individuals in a risk set share the same exposure value, the weight in (11a) will be infinite. If a risk set contains different exposure levels, it is either uninformative [the ratio in Eq. (9) is one] or destructive for the partial likelihood, since the ratio equals zero [see Eq. (11b)]. In either case, the estimation of the log hazard ratio by the partial likelihood is disrupted.

Sampling non-exposed individuals as cases is mandatory for the NECC to meaningfully estimate the log hazard ratio \({\varvec{\beta }}\) within the Cox proportional hazards model. Assume we consider only exposed individuals as cases. Then for every numerator in Eq. (9) we obtain

$$\begin{aligned} \exp ({\varvec{\beta }}^T \mathbf {z}_i(t_i))&= \exp \left( [\beta _{1}, \beta _{2},\ldots , \beta _{p}] \times [x_i(t_i)=1, z_{i1}(t_i), \ldots , z_{i,p-1}(t_i)]^T \right) \\&= \exp (\beta _1)\times \exp \left( [\beta _{2},\ldots , \beta _{p}] \times [z_{i1}(t_i), \ldots , z_{i,p-1}(t_i)]^T\right) , \end{aligned}$$

where \(\mathbf {y}=[y_1, \ldots , y_n]^T \in \mathbb {R}^n\) denotes a column vector. For ease of presentation, we consider the projection onto the first component of \({\varvec{\beta }}\), i.e. the estimation of the regression parameter \(\beta _1\) associated with the main exposure. We stratify the sampled risk set into \({\widetilde{\mathscr {R}}}_i={\widetilde{\mathscr {R}}}_i^0{\dot{\cup }} {\widetilde{\mathscr {R}}}_i^1\) according to the exposure values \(x(t_i)\). Within one fixed risk set, the weights \(w_i(t_i, {R})\) depend only on the exposure status, so that the contribution of an exposed case is

$$\begin{aligned} \mathscr {L}_i(\beta _1)&=\frac{\exp (\beta _1x_i(t_i))w_i(t_i,{R})}{\sum _{\ell \in {\widetilde{\mathscr {R}}}_i}\exp (\beta _1x_\ell (t_i))w_\ell (t_i,{R})}\\&=\frac{w_i(t_i,{R})}{e^{-\beta _1}\sum _{\ell \in {\widetilde{\mathscr {R}}}_i}\exp (\beta _1x_\ell (t_i))w_\ell (t_i,{R})}\\&=\frac{w_i(t_i,{R})}{e^{-\beta _1}(\sum _{\ell \in {\widetilde{\mathscr {R}}}_i^0}\exp (\beta _1(x_\ell (t_i)=0))w_\ell (t_i,{R}) + \sum _{\ell \in {\widetilde{\mathscr {R}}}_i^1}\exp (\beta _1(x_\ell (t_i)=1))w_\ell (t_i,{R}))}\\&=\frac{1}{e^{-\beta _1}{\text {card}}\left( {\widetilde{\mathscr {R}}}_i^0\right) q(t_i) + {\text {card}}\left( {\widetilde{\mathscr {R}}}_i^1\right) }, \end{aligned}$$

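which is non-decreasing in \(\beta _1\) (strictly increasing whenever the risk set contains an unexposed individual) and is therefore maximized by \(\beta _1=\infty \). This leads to \(\mathop {\mathrm {arg\,max}}_{\beta _1} \mathscr {L}(\beta _1) =\infty \) and thus to a degenerate estimate if we consider only exposed individuals (\(x_i(t_i)=1\)) as cases.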
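The monotonicity of this contribution can be checked numerically. The sketch below evaluates the final expression of the derivation, \(1/(e^{-\beta _1}{\text {card}}({\widetilde{\mathscr {R}}}_i^0)\,q(t_i)+{\text {card}}({\widetilde{\mathscr {R}}}_i^1))\), on a grid of \(\beta _1\) values; the function name and the chosen risk set composition are hypothetical.

```python
import math


def exposed_case_contribution(beta1, n_unexposed, n_exposed, q):
    """Likelihood contribution of an exposed case in a sampled risk set with
    n_unexposed unexposed and n_exposed exposed members (case included)."""
    return 1.0 / (math.exp(-beta1) * n_unexposed * q + n_exposed)


# Hypothetical risk set: 3 unexposed and 2 exposed members, q = 0.5.
for beta1 in (0.0, 1.0, 2.0, 5.0, 10.0):
    print(beta1, exposed_case_contribution(beta1, 3, 2, 0.5))
```

The printed values increase with \(\beta _1\) and approach \(1/{\text {card}}({\widetilde{\mathscr {R}}}_i^1)\) without attaining it, so no finite maximizer exists.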

B Results for the traditional nested case-control design

Table 5 follows the structure of Table 1 in the main document and gives the results for a bootstrap analysis of the SIR 3 data using the traditional NCC.

Table 5 Results from 10,000 bootstrap simulations of the full cohort and an NCC design with one up to four controls

Cite this article

Feifel, J., Gebauer, M., Schumacher, M. et al. Nested exposure case-control sampling: a sampling scheme to analyze rare time-dependent exposures. Lifetime Data Anal 26, 21–44 (2020). https://doi.org/10.1007/s10985-018-9453-4

  • DOI: https://doi.org/10.1007/s10985-018-9453-4
