Part of the book series: Lecture Notes in Statistics (LNSP, volume 211)

Abstract

Prospective cohort studies typically involve repeated assessment of individuals to determine whether they have a particular health condition. The usual goal in such studies is to relate the presence of the condition to disease markers or exposure variables. Disease markers are often too difficult or costly to measure for all individuals in a sample. In such settings, two- and multi-phase sampling designs are routinely adopted to enable researchers to select individuals on whom these expensive markers are to be assessed. In this article we review the rationale and format of two-phase sampling designs in retrospective and cross-sectional studies. We then develop frameworks for multi-phase designs in the context of studies with clustered or longitudinal responses. Model-based and semi-parametric methods are discussed for estimation and inference.
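
As a rough illustration of the two-phase idea summarized above, the following sketch simulates a cohort in which a binary response is known for everyone, selects a phase-two subsample with response-dependent probabilities, and compares an unweighted summary of the expensive marker with an inverse-probability-weighted one. Everything here is an assumption made for illustration: the function names, the marker prevalence of 0.3, the response probabilities, and the selection probabilities are hypothetical, and inverse probability weighting is only one simple semi-parametric device, not necessarily the estimator developed in the chapter.

```python
import random


def simulate_two_phase(n=20000, seed=1):
    """Simulate a cohort and a response-dependent phase-two subsample.

    All numeric settings are hypothetical illustration values.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        # Expensive binary marker x (assumed prevalence 0.3).
        x = 1 if rng.random() < 0.3 else 0
        # Binary response y, more likely when the marker is present.
        p_y = 0.4 if x == 1 else 0.1
        y = 1 if rng.random() < p_y else 0
        cohort.append((y, x))

    # Phase two: keep every case (y = 1) and about 20% of non-cases (y = 0).
    select_prob = {1: 1.0, 0: 0.2}
    phase_two = [
        (y, x, select_prob[y])
        for (y, x) in cohort
        if rng.random() < select_prob[y]
    ]
    return cohort, phase_two


def naive_mean(phase_two):
    """Unweighted marker mean in the phase-two sample (ignores the design)."""
    return sum(x for _, x, _ in phase_two) / len(phase_two)


def ipw_mean(phase_two):
    """Marker mean weighted by the inverse of each selection probability."""
    numerator = sum(x / pi for _, x, pi in phase_two)
    denominator = sum(1 / pi for _, _, pi in phase_two)
    return numerator / denominator


if __name__ == "__main__":
    cohort, phase_two = simulate_two_phase()
    full_mean = sum(x for _, x in cohort) / len(cohort)
    print("full-cohort marker mean:", round(full_mean, 3))
    print("naive phase-two mean:   ", round(naive_mean(phase_two), 3))
    print("IPW phase-two mean:     ", round(ipw_mean(phase_two), 3))
```

Under these assumed settings the unweighted phase-two mean drifts toward roughly 0.44 in expectation, because cases are over-represented, whereas the weighted mean targets the cohort prevalence of 0.30. The point is only that the selection probabilities are known by design and can be used to undo the oversampling.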

Acknowledgements

Michael McIsaac’s research was supported by an Alexander Graham Bell Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada (NSERC) and Discovery Grants to Richard Cook from NSERC (RGPIN 155849) and the Canadian Institutes of Health Research (FRN 13887). Richard Cook is a Canada Research Chair in Statistical Methods for Health Research. The authors thank Dr. Dafna Gladman and Dr. Vinod Chandran for collaboration and helpful discussions regarding the research at the Centre for Prognosis Studies in Rheumatic Disease at the University of Toronto. The authors gratefully acknowledge the careful review and comments from a referee and Dr. Brajendra Sutradhar.

Author information

Corresponding author

Correspondence to Michael A. McIsaac.

Copyright information

© 2013 Springer Science+Business Media New York

About this paper

Cite this paper

McIsaac, M.A., Cook, R.J. (2013). Response-Dependent Sampling with Clustered and Longitudinal Data. In: Sutradhar, B. (eds) ISS-2012 Proceedings Volume On Longitudinal Data Analysis Subject to Measurement Errors, Missing Values, and/or Outliers. Lecture Notes in Statistics, vol 211. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6871-4_8
