Response-Dependent Sampling with Clustered and Longitudinal Data

McIsaac, Michael A.; Cook, Richard J.

doi:10.1007/978-1-4614-6871-4_8

Part of the book series: Lecture Notes in Statistics (LNSP, volume 211)

Abstract

Prospective cohort studies typically involve repeated assessment of individuals to determine whether they have a particular health condition. The usual goal in such studies is to relate the presence of the condition to disease markers or exposure variables. Disease markers are often too difficult or costly to measure for all individuals in a sample. In such settings, two- and multi-phase sampling designs are routinely adopted to enable researchers to select individuals on whom these expensive markers are to be assessed. In this article we review the rationale and format of two-phase sampling designs in retrospective and cross-sectional studies. We then develop frameworks for multi-phase designs in the context of studies with clustered or longitudinal responses. Model-based and semi-parametric methods are discussed for estimation and inference.

Acknowledgements

Michael McIsaac’s research was supported by an Alexander Graham Bell Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada (NSERC) and Discovery Grants to Richard Cook from NSERC (RGPIN 155849) and the Canadian Institutes for Health Research (FRN 13887). Richard Cook is a Canada Research Chair in Statistical Methods for Health Research. The authors thank Dr. Dafna Gladman and Dr. Vinod Chandran for collaboration and helpful discussions regarding the research at the Centre for Prognosis Studies in Rheumatic Disease at the University of Toronto. The authors gratefully acknowledge the careful review and comments from a referee and Dr. Brajendra Sutradhar.

Author information

Authors and Affiliations

Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1
Michael A. McIsaac & Richard J. Cook

Corresponding author

Correspondence to Michael A. McIsaac.

Editor information

Editors and Affiliations

Dept. Mathematics & Statistics, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
Brajendra C. Sutradhar

About this paper

Cite this paper

McIsaac, M.A., Cook, R.J. (2013). Response-Dependent Sampling with Clustered and Longitudinal Data. In: Sutradhar, B. (ed.) ISS-2012 Proceedings Volume On Longitudinal Data Analysis Subject to Measurement Errors, Missing Values, and/or Outliers. Lecture Notes in Statistics, vol 211. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6871-4_8

DOI: https://doi.org/10.1007/978-1-4614-6871-4_8
Published: 06 June 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6870-7
Online ISBN: 978-1-4614-6871-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
