Missing Data

Cleophas, Ton J.; Zwinderman, Aeilko H.

doi:10.1007/978-94-007-2863-9_22

Abstract

The imputation of missing data using mean values or the values of the “closest neighbor observed” has been routinely carried out on demographic data files since 1960 (Anonymous). The apportionment of congressional seats and other political decisions have been partly based on it (Anonymous 2001), and President Obama is having the White House use it again in its 2010 census (Anonymous). Missing data are also common in clinical research, but clinical research generally produces smaller files than demographics does, so a few missing values are more of a problem. As an example, a 35-patient data file of 3 variables consists of 3 × 35 = 105 values if the data are complete. With only 5 values missing (1 value missing per patient), 5 patients will not have complete data and are rather useless for the analysis. This is not 5% (5 of 105 values) but about 14% (5 of 35 patients) of this small study population. An analysis of the remaining 86% of patients is unlikely to be powerful enough to demonstrate the effects we wished to assess. This illustrates the necessity of data imputation. Apart from the above two methods for data imputation, regression substitution has been employed in clinical research. In principle, the blanks are replaced with the best predicted values from a multiple linear regression equation obtained from the available data.
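The complete-case arithmetic and the three single-imputation methods named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `age` and `score` values are hypothetical toy data, not the chapter's data file, all function names are invented for this sketch, and the regression step uses a single predictor rather than the multiple linear regression the chapter describes.

```python
from statistics import mean

# Hypothetical toy data (not from the chapter); None marks a missing value.
age = [25, 31, 35, 40, 45, 51, 55, 60]
score = [10.0, 12.4, None, 16.0, 18.0, None, 22.0, 24.0]


def complete_case_fraction(n_patients, n_incomplete):
    """Fraction of patients left when incomplete records are dropped."""
    return (n_patients - n_incomplete) / n_patients


def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def hot_deck_impute(values, neighbor_key):
    """Replace each missing value with the value of the closest observed
    neighbor, where closeness is measured on neighbor_key (here: age)."""
    donors = [(k, v) for k, v in zip(neighbor_key, values) if v is not None]
    filled = []
    for k, v in zip(neighbor_key, values):
        if v is None:
            _, v = min(donors, key=lambda d: abs(d[0] - k))
        filled.append(v)
    return filled


def regression_impute(values, predictor):
    """Replace each missing value with the prediction of a least-squares
    line fitted to the complete pairs (one predictor, for simplicity)."""
    pairs = [(x, y) for x, y in zip(predictor, values) if y is not None]
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x, _ in pairs))
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x if y is None else y
            for x, y in zip(predictor, values)]


if __name__ == "__main__":
    # 35 patients, 5 of them with one missing value each.
    print(round(complete_case_fraction(35, 5), 3))
    print(mean_impute(score))
    print(hot_deck_impute(score, age))
    print(regression_impute(score, age))
```

All three functions return a single filled-in column that is then analyzed as if it had been observed, so none of them carries the uncertainty of the imputed values into the analysis.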

References

Anonymous. Hot deck imputation. http://www.convervapedia.com/Hot-Deck_Imputation. Accessed 15 Dec 2011
Anonymous (2001) Utah v Evans, 182 F. Supp. 2d 1165
Chi GY, Jin K, Chen G (2003) Some statistical issues of relevance to confirmatory trials; statistical bias. In: Lu Y, Fang J (eds) Advanced medical statistics. World Scientific, Hackensack, pp 523–579
Feingold M (1982) Missing data in linear models with correlated errors. Comm Stat 11:2831–2833
Haitovsky Y (1968) Missing data in regression analysis. J R Stat Soc 3:2–3
IBM SPSS Missing values. www.spss.com/software/statistics/missing-values/. Accessed 14 Dec 2011
IBM SPSS Statistics 17.0. www.spss.com/software/statistics/chnages.htm. Accessed 14 Dec 2011
Kshirsagar AM, Deo S (1989) Distribution of the biased hypothesis sum of squares in linear models with missing observations. Commun Stat 18:2747–2754
Little RJ, Rubin DB (1987) Statistical analysis with missing data. Wiley, New York
Scheuren F (2005) Multiple imputation: how it began and continues. Am Stat 59:315–319

Author information

Authors and Affiliations

Applied to Clinical Trials, European Interuniversity College of Pharmaceutical Medicine, Lyon, France
Ton J. Cleophas (Past-President American College of Angiology Co-Chair Module Statistics) & Aeilko H. Zwinderman (President-Elect International Society of Biostatistics Co-Chair Module Statistics)
Department of Medicine, Albert Schweitzer Hospital, Dordrecht, Netherlands
Ton J. Cleophas (Past-President American College of Angiology Co-Chair Module Statistics)
Department of Biostatistics and Epidemiology, Academic Medical Center, Amsterdam, Netherlands
Aeilko H. Zwinderman (President-Elect International Society of Biostatistics Co-Chair Module Statistics)

Appendix

The SPSS add-on module “Missing Values” is suitable for performing the multiple imputation method. The example of the constipation study is used once more for explanation. First, the pattern of the missing data must be checked using the command “analyze pattern”. If the missing data are evenly distributed and no “islands” of missing data exist, the model will be appropriate.
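Outside SPSS, a rough analogue of this pattern check can be sketched in Python. The sketch below is not the SPSS “analyze pattern” procedure; it only counts which combinations of variables are missing together, and the function names and the small `rows` data set are hypothetical.

```python
from collections import Counter


def missing_patterns(rows):
    """Count how often each missing-data pattern occurs.

    A pattern is a tuple of booleans, one per variable,
    that is True where the value is missing (None)."""
    return Counter(tuple(v is None for v in row) for row in rows)


def missing_per_variable(rows):
    """Number of missing values in each column."""
    n_vars = len(rows[0])
    return [sum(row[j] is None for row in rows) for j in range(n_vars)]


if __name__ == "__main__":
    # Hypothetical records with 3 variables each; None marks a missing value.
    rows = [
        (24.0, 8.0, 25),
        (30.0, None, 30),
        (None, 12.0, 35),
        (35.0, 11.0, None),
        (40.0, 15.0, 45),
    ]
    for pattern, count in missing_patterns(rows).items():
        print(pattern, count)
    print(missing_per_variable(rows))
```

Many distinct patterns with small, similar counts suggest that the missing values are scattered; one pattern or one variable with most of the missing values would correspond to the “islands” mentioned above.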

The following commands are needed:

Transform…random number generators…

Analyze…multiple imputations…impute missing data… (the imputed data file must be given a new name e.g. “study name imputed”).

The software program produces five or more files in which the missing values are replaced with simulated versions using the Monte Carlo method (Table 22.7, see also Chap. 57). In our example the variables are continuous and thus need no transformation. If you run a usual linear regression on the summary of your “imputed” data files, the software will automatically produce pooled regression coefficients instead of the usual regression coefficients. In our example the multiple imputation method produced a much larger p-value for the predictor age than regression imputation did, and the result was thus less overstated than with regression imputation. In fact, the result was rather similar to that of mean and hot deck imputation, and statistical significance at p < 0.05 was not obtained (Table 22.8). Why then do it anyway? The argument is that, with the multiple imputation method, the imputed values are not used as constructed real values but rather as a device for representing missing data uncertainty. This approach is a safe and, scientifically, probably better alternative to the standard methods. In the given example, unlike regression imputation, it did not seem to overstate the sensitivity of testing (Table 22.8, p-values for regression imputation versus multiple imputation: 0.005 versus 0.097).
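How a pooled regression coefficient can be obtained from the separate imputed files is sketched below using the pooling rules commonly attributed to Rubin. That the software's pooled output follows exactly these rules is an assumption here, the function name `pool_rubin` is invented for the sketch, and the input numbers are hypothetical, not the values behind Table 22.8.

```python
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed data files.

    estimates  : the coefficient estimated in each imputed file
    std_errors : its standard error in each imputed file
    Returns (pooled estimate, pooled standard error, degrees of freedom)."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and standard errors, m >= 2")

    pooled = mean(estimates)
    within = mean(se ** 2 for se in std_errors)  # average sampling variance
    between = variance(estimates)                # spread between imputations
    total = within + (1 + 1 / m) * between       # total variance

    if between > 0:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    else:
        df = float("inf")  # identical estimates: no imputation uncertainty
    return pooled, total ** 0.5, df


if __name__ == "__main__":
    # Hypothetical coefficients and standard errors from 5 imputed files.
    coefs = [0.21, 0.25, 0.18, 0.23, 0.20]
    ses = [0.11, 0.12, 0.11, 0.12, 0.11]
    estimate, se, df = pool_rubin(coefs, ses)
    print(estimate, se, df)
```

The between-imputation term is what distinguishes this from single imputation: it widens the standard error when the imputed files disagree, which is consistent with the larger p-value reported for multiple imputation.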

Table 22.7 Missing data file and 5 imputed data files (35 patients) produced by the SPSS add-on module “Missing Values” using the command multiple imputations

Table 22.8 The regression coefficients and their p-values obtained using different methods of data imputation

About this chapter

Cite this chapter

Cleophas, T.J., Zwinderman, A.H. (2012). Missing Data. In: Statistics Applied to Clinical Studies. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2863-9_22

DOI: https://doi.org/10.1007/978-94-007-2863-9_22
Published: 14 December 2011
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-2862-2
Online ISBN: 978-94-007-2863-9
eBook Packages: Biomedical and Life Sciences (R0)