Missing data are data that were not collected but that are meaningful for the statistical analysis because of their clinical relevance to properly specified estimands in clinical trials. Although efforts to prevent or minimize missing data are common in clinical trials, missing data still occur in practice. Choosing an imputation method that handles missing data in a way that targets the specified estimand provides more reliable estimates of treatment effects.
We considered longitudinal clinical settings with different degrees of missing data and different treatment effects, and simulated different missing-data mechanisms using data from randomized, double-blind, placebo-controlled phase 3 confirmatory clinical trials of approved drugs. We compared four statistical methods commonly used to handle missing data in clinical trials.
We find that, when the data are missing not at random (MNAR) and missing rates are higher, the mixed model for repeated measurements (MMRM) overestimates the treatment difference. In our studies, pattern-mixture model estimates were more conservative than MMRM estimates under MNAR assumptions, which are more realistic for missing data in clinical trials.
We emphasize the importance of preventing missing data and of specifying the estimand in advance, based on the trial objectives. A properly specified estimand and an appropriate statistical method may be key to interpreting clinical trial results despite missing data.
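The missing-data mechanisms mentioned above can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the outcome model, the dropout probabilities, the median cutoffs, and the function name `simulate_missing` are all hypothetical choices made for illustration. It generates a baseline and a final score for each subject, then removes final scores completely at random (MCAR), depending on the observed baseline (missing at random, MAR), or depending on the unobserved final score itself (MNAR).

```python
import random
import statistics


def simulate_missing(baseline, final, mechanism, rng):
    """Return a copy of `final` with None where the value is missing.

    Dropout probabilities (0.4, or 0.6 / 0.2 around a median cutoff)
    are hypothetical values chosen only for illustration.
    """
    base_cut = statistics.median(baseline)
    final_cut = statistics.median(final)
    observed = []
    for y0, y1 in zip(baseline, final):
        if mechanism == "MCAR":
            p = 0.4  # same chance of dropout for everyone
        elif mechanism == "MAR":
            p = 0.6 if y0 > base_cut else 0.2  # depends on observed baseline
        elif mechanism == "MNAR":
            p = 0.6 if y1 > final_cut else 0.2  # depends on the missing value
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        observed.append(None if rng.random() < p else y1)
    return observed


rng = random.Random(2020)
n = 20000

# Hypothetical outcome model: final score correlated with baseline score.
baseline = [rng.gauss(6.0, 1.5) for _ in range(n)]
final = [0.5 * y0 + rng.gauss(1.0, 1.0) for y0 in baseline]
full_mean = statistics.fmean(final)

results = {}
for mechanism in ("MCAR", "MAR", "MNAR"):
    with_gaps = simulate_missing(baseline, final, mechanism, rng)
    completers = [y for y in with_gaps if y is not None]
    missing_rate = 1 - len(completers) / n
    bias = statistics.fmean(completers) - full_mean
    results[mechanism] = (missing_rate, bias)
    print(f"{mechanism}: missing rate {missing_rate:.2f}, "
          f"completers-only bias {bias:+.3f}")
```

In this toy setting the completers-only mean is close to the full-data mean under MCAR and is shifted under both MAR and MNAR. The practical difference is that the MAR shift is driven by the observed baseline and can in principle be corrected by a model that conditions on it, whereas the MNAR shift depends on values that were never observed, so it can only be addressed through explicit assumptions such as those made by pattern-mixture models.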
This work was supported in part by the Oak Ridge Institute for Science and Education (ORISE) summer fellowship program. This paper reflects the views of the authors and should not be construed to represent FDA’s views or policies. We would like to thank the anonymous reviewers for the careful reading of our manuscript and for providing us with critical and insightful comments.
Conflict of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Gnang, J., Kim, Y., Ren, Y. et al. An Empirical Comparison of Statistical Methods for Missing Data in Randomized, Double-Blind, Placebo-Controlled, Phase 3 Clinical Trials for Chronic Pain and Lipid-Lowering Products. Ther Innov Regul Sci (2020). https://doi.org/10.1007/s43441-020-00168-6
- Pattern-mixture model
- Multiple imputation