Statistical Modelling of Partially Observed Data Using Multiple Imputation: Principles and Practice

Carpenter, James R.; Goldstein, Harvey; Kenward, Michael G.

doi:10.1007/978-94-007-3024-3_2

Abstract

Missing data are ubiquitous in experimental and observational epidemiological research. Despite a steady flow of theoretical work in this area from the mid-1970s onwards, recent studies have shown that the way partially observed data are reported and analysed in experimental research falls far short of best practice (Wood et al. 2004; Chan and Altman 2005; Sterne et al. 2009). The aim of this chapter is thus to present an accessible review of the issues raised by missing data, together with the advantages and disadvantages of different approaches to the analysis.

References

Blatchford, P., Goldstein, H., Martin, C., & Browne, W. (2002). A study of class size effects in English school reception year classes. British Educational Research Journal, 28, 169–185.
Cao, W., Tsiatis, A. A., & Davidian, M. (2009). Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika, 96, 723–734.
Carpenter, J. R., & Kenward, M. G. (2008). Missing data in clinical trials: A practical guide. Birmingham: National Health Service Co-ordinating Centre for Research Methodology. Freely downloadable from www.missingdata.org.uk. Accessed 25 Jan 2012.
Carpenter, J. R., & Plewis, I. (2011). Coming to terms with non-response in longitudinal studies. In M. Williams & P. Vogt (Eds.), SAGE handbook of methodological innovation. London: Sage.
Carpenter, J. R., Kenward, M. G., & Vansteelandt, S. (2006). A comparison of multiple imputation and inverse probability weighting for analyses with missing data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169, 571–584.
Carpenter, J. R., Kenward, M. G., & White, I. R. (2007). Sensitivity analysis after multiple imputation under missing at random — A weighting approach. Statistical Methods in Medical Research, 16, 259–275.
Carpenter, J. R., Roger, J. H., & Kenward, M. G. (2012). Analysis of longitudinal trials with missing data: A framework for relevant, accessible assumptions, and inference via multiple imputation (Submitted).
Chan, A., & Altman, D. G. (2005). Epidemiology and reporting of randomised trials published in PubMed journals. The Lancet, 365, 1159–1162.
Clayton, D., Spiegelhalter, D., Dunn, G., & Pickles, A. (1998). Analysis of longitudinal binary data from multi-phase sampling (with discussion). Journal of the Royal Statistical Society, Series B (Statistical Methodology), 60, 71–87.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B (Statistical Methodology), 39, 1–38.
Goldstein, H., Carpenter, J. R., Kenward, M. G., & Levin, K. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling, 9, 173–197.
Kang, J. D. Y., & Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data (with discussion). Statistical Science, 22, 523–539.
Kenward, M. G., & Carpenter, J. R. (2007). Multiple imputation: Current perspectives. Statistical Methods in Medical Research, 16, 199–218.
Klebanoff, M. A., & Cole, S. R. (2008). Use of multiple imputation in the epidemiologic literature. American Journal of Epidemiology, 168, 355–357.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Chichester: Wiley.
Louis, T. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44, 226–233.
Orchard, T., & Woodbury, M. (1972). A missing information principle: Theory and applications. In L. M. Le Cam, J. Neyman, & E. L. Scott (Eds.), Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability: Vol. 1 (pp. 697–715). Berkeley: University of California Press.
Royston, P. (2007). Multiple imputation of missing values: Further update of ice with emphasis on interval censoring. The Stata Journal, 7, 445–464.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Rubin, D. B. (1996). Multiple imputation after 18 years. Journal of the American Statistical Association, 91, 473–490.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman and Hall.
Spratt, M., Sterne, J. A. C., Tilling, K., Carpenter, J. R., & Carlin, J. B. (2010). Strategies for multiple imputation in longitudinal studies. American Journal of Epidemiology, 172, 478–487.
Sterne, J. A. C., White, I. R., Carlin, J. B., Spratt, M., Royston, P., Kenward, M. G., Wood, A. M., & Carpenter, J. R. (2009). Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. British Medical Journal, 339, 157–160.
van Buuren, S., Boshuizen, H. C., & Knook, D. L. (1999). Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18, 681–694.
van Buuren, S., Brand, J. P. L., Groothuis-Oudshoorn, C. G. M., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76, 1049–1064.
Vansteelandt, S., Carpenter, J. R., & Kenward, M. G. (2010). Analysis of incomplete data using inverse probability weighting and doubly robust estimators. Methodology, 6, 37–48.
White, I. R., & Royston, P. (2009). Imputing missing covariate values for the Cox model. Statistics in Medicine, 28, 1982–1998.
Wood, A. M., White, I. R., & Thompson, S. G. (2004). Are missing outcome data adequately handled? A review of published randomized controlled trials in major medical journals. Clinical Trials, 1, 368–376.

Acknowledgements

James Carpenter is funded by ESRC research fellowship RES-063-27-0257. We are grateful to Peter Blatchford for permission to use the class size data.

Author information

Authors and Affiliations

Department of Medical Statistics, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
James R. Carpenter & Michael G. Kenward
Graduate School of Education, University of Bristol, 35 Berkeley Square, BS8 1JA, Bristol, UK
Harvey Goldstein

Corresponding author

Correspondence to James R. Carpenter.

Editor information

Editors and Affiliations

Leeds Institute of Genetics, University of Leeds, Division of Biostatistics, Clarendon Way, Worsley Building, Room 8.01, Level 8, Leeds, LS2 9JT, United Kingdom
Yu-Kang Tu & Darren C. Greenwood

About this chapter

Cite this chapter

Carpenter, J.R., Goldstein, H., Kenward, M.G. (2012). Statistical Modelling of Partially Observed Data Using Multiple Imputation: Principles and Practice. In: Tu, YK., Greenwood, D. (eds) Modern Methods for Epidemiology. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-3024-3_2

DOI: https://doi.org/10.1007/978-94-007-3024-3_2
Published: 17 February 2012
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-3023-6
Online ISBN: 978-94-007-3024-3
eBook Packages: Biomedical and Life Sciences (R0)
