Abstract
Imputation of missing data, using either mean values or the values of the “closest neighbor observed”, has been carried out routinely on demographic data files since 1960 (Anonymous). The apportionment of congressional seats and other political decisions have been partly based on it (Anonymous 2001), and President Obama is having the White House use it again in the 2010 census (Anonymous). Missing data are also common in clinical research, but clinical studies generally produce smaller files than demographic surveys, so that a few missing values are more of a problem. As an example, a 35-patient data file of 3 variables consists of 3 × 35 = 105 values if the data are complete. With only 5 values missing (1 value missing per patient), 5 patients will not have complete data and are rather useless for the analysis. This is not 5% (5 of 105 values) but about 14% (5 of 35 patients) of this small study population. An analysis of the remaining 86% of patients is likely to lack the power to demonstrate the effects we wished to assess. This illustrates the necessity of data imputation. Apart from the above two methods, regression substitution has been employed in clinical research. In principle, the blanks are replaced with the best predicted values from a multiple linear regression equation obtained from the available data.
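The arithmetic of the 35-patient example, together with mean imputation and regression substitution, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated, the variable names (x1, x2, y) are hypothetical, all 5 missing values are placed in y, and the regression uses a single predictor rather than the multiple regression described above.

```python
import random

random.seed(1)

# Hypothetical data: 35 patients, 3 variables (x1, x2, y); y depends on x1.
n_patients = 35
rows = []
for _ in range(n_patients):
    x1 = random.gauss(50, 10)
    x2 = random.gauss(20, 5)
    y = 2.0 + 0.5 * x1 + random.gauss(0, 3)
    rows.append([x1, x2, y])

# Assumed missingness pattern: y is missing in the first 5 patients.
for i in range(5):
    rows[i][2] = None

n_values = n_patients * 3
n_missing = sum(v is None for r in rows for v in r)
n_incomplete = sum(any(v is None for v in r) for r in rows)
share_values = n_missing / n_values          # 5 of 105 values
share_patients = n_incomplete / n_patients   # 5 of 35 patients

# Complete-case analysis would keep only these patients.
complete = [r for r in rows if None not in r]

# Mean imputation: replace each blank with the mean of the observed y values.
mean_y = sum(r[2] for r in complete) / len(complete)
mean_imputed = [r[:2] + [mean_y if r[2] is None else r[2]] for r in rows]

# Regression substitution (simplified to one predictor, x1):
# fit y = intercept + slope * x1 on the complete cases, then predict the blanks.
mean_x = sum(r[0] for r in complete) / len(complete)
sxy = sum((r[0] - mean_x) * (r[2] - mean_y) for r in complete)
sxx = sum((r[0] - mean_x) ** 2 for r in complete)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
reg_imputed = [
    r[:2] + [intercept + slope * r[0] if r[2] is None else r[2]] for r in rows
]

print(f"values missing:   {share_values:.1%}")
print(f"patients lost:    {share_patients:.1%}")
print(f"complete cases:   {len(complete)}")
```

Both imputed files contain all 35 patients again, but the imputed values are treated as if they had been observed, which is why single imputation can overstate precision.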
References
Anonymous. Hot deck imputation. http://www.convervapedia.com/Hot-Deck_Imputation. Accessed 15 Dec 2011
Anonymous (2001) Utah v Evans, 182 F. Supp. 2d 1165
Chi GY, Jin K, Chen G (2003) Some statistical issues of relevance to confirmatory trials; statistical bias. In: Lu Y, Fang J (eds) Advanced medical statistics. World Scientific, Hackensack, pp 523–579
Feingold M (1982) Missing data in linear models with correlated errors. Comm Stat 11:2831–2833
Haitovsky Y (1968) Missing data in regression analysis. J R Stat Soc 3:2–3
IBM SPSS Missing values. www.spss.com/software/statistics/missing-values/. Accessed 14 Dec 2011
IBM SPSS Statistics 17.0. www.spss.com/software/statistics/chnages.htm. Accessed 14 Dec 2011
Kshirsagar AM, Deo S (1989) Distribution of the biased hypothesis sum of squares in linear models with missing observations. Commun Stat 18:2747–2754
Little RJ, Rubin DB (1987) Statistical analysis with missing data. Wiley, New York
Scheuren F (2005) Multiple imputation: how it began and continues. Am Stat 59:315–319
Appendix
The SPSS add-on module “Missing Values” is suitable for performing multiple imputation. The example of the constipation study is used once more for illustration. First, the pattern of the missing data must be checked using the command “analyze pattern”. If the missing data are evenly distributed and no “islands” of missing data exist, the model will be appropriate.
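Outside SPSS, a comparable pattern check can be sketched in Python. The snippet below is a minimal illustration, not a reproduction of the SPSS “analyze pattern” output: the function names and the small example file are hypothetical, and missing values are assumed to be coded as None.

```python
from collections import Counter


def missing_patterns(rows):
    """Count how often each missingness pattern occurs (True = value missing)."""
    return Counter(tuple(v is None for v in row) for row in rows)


def missing_share_per_variable(rows):
    """Fraction of missing values in each column."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    return [sum(row[j] is None for row in rows) / n_rows for j in range(n_cols)]


# Hypothetical data file: 6 patients, 3 variables.
example = [
    [1.0, 2.0, 3.0],
    [None, 2.5, 3.1],
    [1.2, None, 2.9],
    [1.1, 2.2, None],
    [0.9, 2.1, 3.3],
    [None, 2.4, 3.0],
]

patterns = missing_patterns(example)
shares = missing_share_per_variable(example)

for pattern, count in patterns.most_common():
    print(pattern, count)
print(shares)
```

A pattern in which one variable, or one block of patients, carries most of the blanks would correspond to the “islands” mentioned above.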
The following commands are needed:
Transform…random number generators…
Analyze…multiple imputations…impute missing data… (the imputed data file must be given a new name e.g. “study name imputed”).
The software produces five or more versions of the data file, in which the missing values are replaced with simulated values using the Monte Carlo method (Table 22.7, see also Chap. 57). In our example the variables are continuous and thus need no transformation. If you run a usual linear regression on the summary of your “imputed” data files, the software will automatically produce pooled regression coefficients instead of the usual regression coefficients. In our example the multiple imputation method produced a much larger p-value for the predictor age than regression imputation did, and the result was thus less overstated. In fact, the result was rather similar to that of mean and hot deck imputation, and statistical significance at p < 0.05 was not obtained (Table 22.8). Why then do it anyway? The argument is that, with multiple imputation, the imputed values are not used as constructed real values, but rather as a device for representing the uncertainty due to missing data. This approach is a safe and, scientifically, probably better alternative to the standard methods. In the given example, unlike regression imputation, it did not seem to overstate the sensitivity of testing (Table 22.8: p-values of 0.005 for regression imputation versus 0.097 for multiple imputation).
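Pooling of regression coefficients across imputed files is commonly done with Rubin's rules (Little and Rubin 1987). The sketch below assumes that this is the kind of pooling meant here. The function name pool_rubin and the five coefficient and variance values are hypothetical and are not taken from the constipation study.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one coefficient over m imputed data sets with Rubin's rules.

    estimates: the coefficient estimated in each imputed file
    variances: the squared standard error of that coefficient in each file
    Returns (pooled estimate, pooled standard error, within, between).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total), within, between


# Hypothetical coefficients for one predictor from 5 imputed files.
coefs = [0.31, 0.27, 0.35, 0.22, 0.30]
ses = [0.15, 0.16, 0.15, 0.17, 0.16]

pooled, pooled_se, within, between = pool_rubin(coefs, [s ** 2 for s in ses])
print(f"pooled coefficient: {pooled:.3f}")
print(f"pooled standard error: {pooled_se:.3f}")
```

The between-imputation term widens the pooled standard error beyond the average of the five single-file standard errors, which is how the uncertainty due to the missing data enters the test.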
Copyright information
© 2012 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Cleophas, T.J., Zwinderman, A.H. (2012). Missing Data. In: Statistics Applied to Clinical Studies. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2863-9_22
DOI: https://doi.org/10.1007/978-94-007-2863-9_22
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-2862-2
Online ISBN: 978-94-007-2863-9
eBook Packages: Biomedical and Life Sciences (R0)