Abstract
Prospective cohort studies typically involve repeated assessment of individuals to determine whether they have a particular health condition. The usual goal in such studies is to relate the presence of the condition to disease markers or exposure variables. Disease markers are often too difficult or costly to measure for all individuals in a sample. In such settings, two- and multi-phase sampling designs are routinely adopted to enable researchers to select individuals on whom these expensive markers are to be assessed. In this article we review the rationale and format of two-phase sampling designs in retrospective and cross-sectional studies. We then develop frameworks for multi-phase designs in the context of studies with clustered or longitudinal responses. Model-based and semi-parametric methods are discussed for estimation and inference.
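As a rough illustration of the two-phase idea described above, the following sketch simulates a phase-one cohort in which a binary response is known for everyone, selects a phase-two subsample with response-dependent probabilities for measurement of the expensive marker, and then fits a logistic regression by inverse probability weighting. Inverse probability weighting is used here only as one familiar semi-parametric approach; it is not claimed to be the estimator developed in the chapter. The function names, the selection probabilities (0.9 for cases, 0.2 for non-cases), and the parameter values are all assumptions made for the example.

```python
import numpy as np


def simulate_phase_one(n, beta, rng):
    """Simulate a cohort: x is the expensive marker, y the binary response."""
    x = rng.normal(size=n)
    eta = beta[0] + beta[1] * x
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return x, y


def select_phase_two(y, p_case, p_control, rng):
    """Response-dependent selection: cases are sampled more often than non-cases."""
    pi = np.where(y == 1, p_case, p_control)  # known selection probabilities
    selected = rng.random(y.shape[0]) < pi
    return selected, pi


def weighted_logistic(x, y, w, n_iter=25):
    """Solve the weighted logistic score equations by Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - mu))
        info = (X * (w * mu * (1.0 - mu))[:, None]).T @ X
        beta = beta + np.linalg.solve(info, score)
    return beta


rng = np.random.default_rng(2013)
true_beta = np.array([-2.0, 1.0])  # assumed intercept and marker effect

# Phase one: response observed for all individuals.
x, y = simulate_phase_one(20000, true_beta, rng)

# Phase two: the marker is "measured" only for the selected individuals.
selected, pi = select_phase_two(y, p_case=0.9, p_control=0.2, rng=rng)

# Weighting each selected individual by 1/pi corrects for the biased selection.
beta_ipw = weighted_logistic(x[selected], y[selected], 1.0 / pi[selected])

# Ignoring the design distorts the intercept when selection depends on y.
beta_naive = weighted_logistic(x[selected], y[selected], np.ones(selected.sum()))

print("IPW estimate:  ", beta_ipw)
print("Naive estimate:", beta_naive)
```

In this setup the unweighted fit recovers the marker effect but shifts the intercept by roughly log(0.9/0.2), whereas the weighted fit targets both parameters. Model-based (likelihood) methods can be more efficient than simple weighting, at the cost of additional modelling assumptions.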
Acknowledgements
Michael McIsaac’s research was supported by an Alexander Graham Bell Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada (NSERC) and Discovery Grants to Richard Cook from NSERC (RGPIN 155849) and the Canadian Institutes of Health Research (FRN 13887). Richard Cook is a Canada Research Chair in Statistical Methods for Health Research. The authors thank Dr. Dafna Gladman and Dr. Vinod Chandran for collaboration and helpful discussions regarding the research at the Centre for Prognosis Studies in Rheumatic Disease at the University of Toronto. The authors gratefully acknowledge the careful review and comments from a referee and Dr. Brajendra Sutradhar.
Copyright information
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
McIsaac, M.A., Cook, R.J. (2013). Response-Dependent Sampling with Clustered and Longitudinal Data. In: Sutradhar, B. (ed.) ISS-2012 Proceedings Volume On Longitudinal Data Analysis Subject to Measurement Errors, Missing Values, and/or Outliers. Lecture Notes in Statistics, vol 211. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6871-4_8
DOI: https://doi.org/10.1007/978-1-4614-6871-4_8
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6870-7
Online ISBN: 978-1-4614-6871-4
eBook Packages: Mathematics and Statistics (R0)