Pediatric Nephrology, Volume 34, Issue 2, pp 223–231

A guide to missing data for the pediatric nephrologist

  • Nicholas G. Larkins
  • Jonathan C. Craig
  • Armando Teixeira-Pinto
Educational Review


Missing data are an important and common source of bias in clinical research. Readers should be alert to missing data and consider their impact when reading studies. Beyond preventing missing data in the first place through good study design and conduct, different strategies are available to handle data containing missing observations. Complete case analysis is often biased unless data are missing completely at random. Better methods of handling missing data include multiple imputation and models using likelihood-based estimation. With advancing computing power and modern statistical software, these methods are within the reach of clinician-researchers under the guidance of a biostatistician. As clinicians reading papers, we need to continue to update our understanding of statistical methods, so that we understand the limitations of these techniques and can critically interpret the literature.
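To make the contrast between complete case analysis and imputation concrete, the following is a minimal simulation sketch and not the authors' method. Everything in it is an assumption chosen for illustration: the simulated outcome `y` with true mean 2, the fully observed covariate `x`, the missingness probabilities, and all function names. The imputation step is a simplified bootstrap-based stand-in for multiple imputation; real analyses would normally use established software under the guidance of a biostatistician.

```python
import math
import random
import statistics


def simulate(n=10000, seed=1):
    """Hypothetical data: y = 2 + x + noise, so the true mean of y is 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        rows.append((x, 2 + x + rng.gauss(0, 1)))
    return rows


def make_missing(rows, mechanism, seed=2):
    """Set y to None under an assumed missingness mechanism.

    MCAR: every y has the same 50% chance of being missing.
    MAR:  missingness depends only on the observed covariate x.
    """
    rng = random.Random(seed)
    out = []
    for x, y in rows:
        if mechanism == "MCAR":
            p_missing = 0.5
        else:
            p_missing = 0.8 if x > 0 else 0.2
        out.append((x, None if rng.random() < p_missing else y))
    return out


def complete_case_mean(rows):
    """Mean of y using only the rows where y was observed."""
    return statistics.fmean(y for _, y in rows if y is not None)


def fit_line(pairs):
    """Least-squares fit of y on x; returns intercept, slope, residual SD."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in pairs)
    return intercept, slope, math.sqrt(rss / (len(pairs) - 2))


def multiple_imputation_mean(rows, m=10, seed=3):
    """Simplified multiple imputation of y from x.

    Each of the m rounds refits the regression on a bootstrap resample of
    the observed rows (to reflect parameter uncertainty), fills in missing
    y values with a prediction plus random noise, and records the mean.
    The m means are then averaged.
    """
    rng = random.Random(seed)
    observed = [(x, y) for x, y in rows if y is not None]
    estimates = []
    for _ in range(m):
        boot = rng.choices(observed, k=len(observed))
        a, b, sd = fit_line(boot)
        completed = [
            y if y is not None else a + b * x + rng.gauss(0, sd)
            for x, y in rows
        ]
        estimates.append(statistics.fmean(completed))
    return statistics.fmean(estimates)


if __name__ == "__main__":
    data = simulate()
    mcar = make_missing(data, "MCAR")
    mar = make_missing(data, "MAR")
    print("Complete case mean, MCAR:", round(complete_case_mean(mcar), 2))
    print("Complete case mean, MAR: ", round(complete_case_mean(mar), 2))
    print("Imputed mean, MAR:       ", round(multiple_imputation_mean(mar), 2))
```

In this toy setting the complete case mean should stay close to the true value under MCAR but drift away from it when missingness depends on `x`, whereas imputing from `x` should recover it. The sketch pools only the point estimate; it omits the variance pooling that a full multiple imputation analysis would report.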


Keywords: Multiple imputation; Statistics; Epidemiology; Nephrology


Funding source

This review is supported by the National Health and Medical Research Council (APP1092957 program grant including ATP; GNT1114218 to NL).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© IPNA 2018

Authors and Affiliations

  1. Department of Nephrology, Princess Margaret Hospital, Subiaco, Australia
  2. Sydney School of Public Health, University of Sydney, Sydney, Australia
  3. Centre for Kidney Research, The Children’s Hospital at Westmead, Westmead, Australia
