Missing Data

Statistics Applied to Clinical Studies

Abstract

The imputation of missing data using mean values or values of the “closest neighbor observed” has been routinely carried out on demographic data files since 1960 (Anonymous). The apportionment of congressional seats and other political decisions have been partly based on it (Anonymous 2001), and President Obama is having the White House use it again in its 2010 census (Anonymous). Missing data are also common in clinical research, but clinical research generally produces smaller files than demographics does, so that a few missing data are more of a problem. As an example, a 35 patient data file of 3 variables consists of 3 × 35 = 105 values if the data are complete. With only 5 values missing (1 value missing per patient), 5 patients will not have complete data and are rather useless for the analysis. This is not 5% (5 of 105 values) but about 14% (5 of 35 patients) of this small study population. An analysis of the remaining 86% of patients may lack the power to demonstrate the effects we wished to assess. This illustrates the necessity of data imputation. Apart from the above two methods of data imputation, regression substitution has been employed in clinical research. In principle, the blanks are replaced with the best predicted values from a multiple linear regression equation obtained from the available data.
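The arithmetic of the 35 patient example, and two of the single imputation methods named above, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the data are made up, the function names are hypothetical, and the regression variant uses one predictor only, whereas the text describes a multiple linear regression.

```python
from statistics import mean

def missing_summary(rows):
    """Return (share of values missing, share of rows with any missing value)."""
    n_values = sum(len(r) for r in rows)
    n_missing = sum(v is None for r in rows for v in r)
    n_incomplete = sum(any(v is None for v in r) for r in rows)
    return n_missing / n_values, n_incomplete / len(rows)

def mean_impute(column):
    """Replace None with the mean of the observed values in the column."""
    m = mean(v for v in column if v is not None)
    return [m if v is None else v for v in column]

def regression_impute(x, y):
    """Replace None in y with predictions from a least-squares line y = a + b*x
    fitted on the complete pairs (single predictor, a simplifying assumption)."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean(xi for xi, _ in pairs)
    my = mean(yi for _, yi in pairs)
    b = (sum((xi - mx) * (yi - my) for xi, yi in pairs)
         / sum((xi - mx) ** 2 for xi, _ in pairs))
    a = my - b * mx
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]

# Made-up data: 35 patients, 3 variables, one missing value in each of 5 patients.
rows = [[float(i), 2.0 * i + 1.0, 50.0 + i] for i in range(35)]
for i in (3, 10, 17, 24, 31):
    rows[i][1] = None

share_values, share_patients = missing_summary(rows)
print(f"{share_values:.1%} of values missing, "
      f"{share_patients:.1%} of patients incomplete")

x = [r[0] for r in rows]
y = [r[1] for r in rows]
y_mean = mean_impute(y)            # every blank gets the same value
y_reg = regression_impute(x, y)    # every blank gets its predicted value
```

With these made-up data, 5 of 105 values (4.8%) are missing, yet 5 of 35 patients (14.3%) would be lost to a complete-case analysis.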

Appendix

The multiple imputation method can be performed with the SPSS add-on module “Missing Values”. The constipation study is used once more as the example. First, the pattern of the missing data must be checked using the command “analyze pattern”. If the missing data are evenly distributed and no “islands” of missing data exist, the model will be appropriate.
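Outside SPSS, the same kind of pattern check can be approximated by tabulating which variables are missing together. The following Python sketch is an illustration only: it does not reproduce the output of the “analyze pattern” command, and the data and function names are hypothetical.

```python
from collections import Counter

def missing_patterns(rows):
    """Count each missingness pattern; 'X' = missing, '.' = observed."""
    return Counter(
        "".join("X" if v is None else "." for v in r) for r in rows
    )

def missing_per_variable(rows):
    """Number of missing values in each column."""
    return [sum(r[j] is None for r in rows) for j in range(len(rows[0]))]

# Made-up data file: 6 patients, 3 variables.
data = [
    [1.0, 24.0, 8.0],
    [2.0, None, 6.0],
    [3.0, 30.0, None],
    [4.0, 25.0, 9.0],
    [None, 35.0, 7.0],
    [6.0, 31.0, 5.0],
]

patterns = missing_patterns(data)
per_variable = missing_per_variable(data)
for pattern, count in sorted(patterns.items()):
    print(pattern, count)
print("missing per variable:", per_variable)
```

Here each variable has one missing value and no patient misses more than one, which corresponds to evenly distributed missing data without “islands”. A pattern such as `XXX`, or one column carrying nearly all the blanks, would be a warning sign.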

The following commands are needed:

Transform…random number generators…

Analyze…multiple imputations…impute missing data… (the imputed data file must be given a new name e.g. “study name imputed”).

The software program produces five or more files in which the missing values are replaced with simulated versions using the Monte Carlo method (Table 22.7, see also Chap. 57). In our example the variables are continuous and, thus, need no transformation. If you run a usual linear regression of the summary of your “imputed” data files, the software will automatically produce pooled regression coefficients instead of the usual regression coefficients. In our example the multiple imputation method produced a much larger p-value for the predictor age than regression imputation did, and the result was, thus, less overstated. Actually, the result was rather similar to that of mean and hot deck imputation, and statistical significance at p < 0.05 was not obtained (Table 22.8). Why then do it anyway? The argument is that, with the multiple imputation method, the imputed values are not used as constructed real values, but rather as a device for representing the uncertainty caused by the missing data. This approach is a safe and, scientifically, probably better alternative to the standard methods. In the given example, unlike regression imputation, it did not seem to overstate the sensitivity of testing (Table 22.8: p = 0.005 with regression imputation versus p = 0.097 with multiple imputation).

Table 22.7 Missing data file and 5 imputed data files (35 patients) produced by the SPSS add-on module “Missing Values” using the command multiple imputations
Table 22.8 The regression coefficients and their p-values obtained using different methods of data imputation
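How a pooled regression coefficient can arise from several imputed files may be sketched with Rubin's rules, the pooling scheme commonly used for multiple imputation. That the software pools in exactly this way is an assumption here, and the coefficients and standard errors below are invented for illustration; they are not the values of Table 22.8.

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one coefficient over m imputed data files with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)                     # average of the m estimates
    within = mean(se ** 2 for se in std_errors)  # average sampling variance
    between = variance(estimates)                # variance between files (m - 1)
    total = within + (1 + 1 / m) * between       # total variance
    return pooled, sqrt(total)

# Hypothetical coefficients of one predictor in 5 imputed files.
b = [0.10, 0.14, 0.12, 0.08, 0.16]
se = [0.05, 0.05, 0.05, 0.05, 0.05]

b_pooled, se_pooled = pool_rubin(b, se)
print(f"pooled b = {b_pooled:.3f}, pooled SE = {se_pooled:.3f}")
```

The pooled standard error exceeds the standard error from any single file, because the variation between the imputed files is added to it. This is the sense in which the imputed values represent missing data uncertainty, and it is consistent with the larger p-value reported for multiple imputation.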


© 2012 Springer Science+Business Media B.V.


Cleophas, T.J., Zwinderman, A.H. (2012). Missing Data. In: Statistics Applied to Clinical Studies. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2863-9_22
