Abstract
A new nonparametric technique for imputing missing data is proposed. It produces a completed data matrix together with a degree of reliability for the imputations. Without relying on strong assumptions, we introduce multiple imputation based on the bootstrap and nonparametric predictors. We show that this approach yields better imputations than other known methods and therefore a more reliable completed data matrix. Using two simulations, we show that the proposed technique can be generalized to non-monotone patterns of missing data, with interesting results.
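The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea of combining bootstrap resampling with a nonparametric predictor to obtain both an imputation and a reliability indicator. It is not the author's method. It assumes a single incompletely observed variable `y` predicted from a fully observed `x`. A k-nearest-neighbour regressor stands in for the nonparametric predictor, and the standard deviation of the bootstrap predictions serves as the reliability indicator. The names `knn_predict` and `bootstrap_impute` are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def knn_predict(train: Sequence[Tuple[float, float]], x0: float, k: int) -> float:
    """k-nearest-neighbour regression (assumed nonparametric predictor):
    average the y values of the k training points whose x is closest to x0."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(yi for _, yi in nearest) / len(nearest)


def bootstrap_impute(
    x: Sequence[float],
    y: Sequence[Optional[float]],
    n_boot: int = 50,
    k: int = 3,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Return one (value, spread) pair per entry of y.

    Observed entries are returned unchanged with spread 0.0. Each missing
    entry (None) is predicted n_boot times. Each prediction comes from a k-NN
    model fitted on a bootstrap resample of the complete cases. The mean of
    these predictions is the imputation, and their standard deviation is a
    rough reliability indicator (hypothetical choice, for illustration only).
    """
    if n_boot < 2:
        raise ValueError("n_boot must be at least 2 to estimate a spread")
    complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if not complete:
        raise ValueError("at least one complete case is required")
    rng = random.Random(seed)
    missing = [i for i, yi in enumerate(y) if yi is None]
    draws: Dict[int, List[float]] = {i: [] for i in missing}
    for _ in range(n_boot):
        # Nonparametric bootstrap: resample complete cases with replacement.
        sample = [rng.choice(complete) for _ in complete]
        for i in missing:
            draws[i].append(knn_predict(sample, x[i], k))
    result: List[Tuple[float, float]] = []
    for i, yi in enumerate(y):
        if yi is not None:
            result.append((yi, 0.0))
        else:
            result.append((statistics.mean(draws[i]), statistics.stdev(draws[i])))
    return result


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys: List[Optional[float]] = [2.0 * xi + 1.0 for xi in xs]
    ys[4] = None
    ys[7] = None
    for value, spread in bootstrap_impute(xs, ys, n_boot=100, k=3):
        print(round(value, 2), round(spread, 2))
```

Each prediction is an average of observed values, so the imputed values always stay within the observed range of `y`. A larger spread flags an imputation that is less stable across resamples. Extending this sketch to several variables with non-monotone missingness, for example by cycling over variables, would be a further assumption and is not shown here.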
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Di Ciaccio, A. (2011). Bootstrap and Nonparametric Predictors to Impute Missing Data. In: Fichet, B., Piccolo, D., Verde, R., Vichi, M. (eds) Classification and Multivariate Analysis for Complex Data Structures. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13312-1_20
DOI: https://doi.org/10.1007/978-3-642-13312-1_20
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13311-4
Online ISBN: 978-3-642-13312-1
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)