Summary
Missing data are often handled by imputation. Traditional single imputation methods, such as ratio imputation, multiple regression imputation, nearest neighbor imputation, respondent mean imputation and hot deck imputation, have been widely used to compensate for non-response. Nonparametric regression methods have recently been applied to estimate the regression function in a wide range of settings and research areas. This work focuses on replacing missing observations of a variable of interest with imputed values obtained from a new algorithm based on Multivariate Adaptive Regression Splines. Some imputation methods can lead to serious underestimation of measures of population distributions. This bias can be reduced by adding to each imputed value a small random disturbance drawn from a given distribution. Two different methods of adding the random disturbance are described. Numerical examples illustrate the theoretical results and assess the precision of the proposed method.
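The general idea of regression imputation with an added disturbance can be sketched as follows. This is only an illustration under stated assumptions and not the algorithm proposed in the chapter: the spline fit uses hinge functions at fixed quantile knots instead of the adaptive knot selection of Multivariate Adaptive Regression Splines, and the two disturbance options (resampling respondent residuals, or drawing from a normal distribution with the residual standard deviation) are assumed for illustration, since the summary does not specify the two methods. The function names `hinge_basis` and `impute_with_hinges` are hypothetical.

```python
import numpy as np


def hinge_basis(x, knots):
    """Design matrix: intercept plus a reflected pair of hinges per knot."""
    cols = [np.ones_like(x)]
    for t in knots:
        cols.append(np.maximum(0.0, x - t))
        cols.append(np.maximum(0.0, t - x))
    return np.column_stack(cols)


def impute_with_hinges(x, y, n_knots=3, disturbance=None, rng=None):
    """Impute NaN entries of y from a piecewise-linear fit on x.

    disturbance: None (deterministic), "residual" or "normal".
    Both disturbance options are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)

    # Fixed knots at interior quantiles of the respondents' covariate.
    # (MARS would choose knots adaptively; this sketch does not.)
    probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    knots = np.quantile(x[~missing], probs)

    basis = hinge_basis(x, knots)
    # The hinge pairs are collinear across knots; lstsq returns the
    # minimum-norm solution, and the fitted values are still unique.
    coef, *_ = np.linalg.lstsq(basis[~missing], y[~missing], rcond=None)
    fitted = basis @ coef
    resid = y[~missing] - fitted[~missing]

    imputed = y.copy()
    imputed[missing] = fitted[missing]
    n_missing = int(missing.sum())

    if disturbance == "residual":
        # Resample observed residuals with replacement.
        imputed[missing] += rng.choice(resid, size=n_missing, replace=True)
    elif disturbance == "normal":
        # Draw from a normal with the respondents' residual spread.
        imputed[missing] += rng.normal(0.0, resid.std(ddof=1), size=n_missing)
    elif disturbance is not None:
        raise ValueError("disturbance must be None, 'residual' or 'normal'")
    return imputed
```

Calling `impute_with_hinges(x, y, disturbance="residual")` on data where `y` contains `NaN` for non-respondents returns a completed copy of `y`. Without a disturbance, every imputed value lies exactly on the fitted curve, so the imputed values are less spread out than real observations would be; adding the disturbance restores some of that variability.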
The authors would like to thank the editors for the opportunity to contribute to this volume in honour of María Luisa Menéndez.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sánchez-Borrego, I., del Mar Rueda, M., Muñoz, J.F. (2011). Imputation and Inference with Multivariate Adaptive Regression Splines. In: Pardo, L., Balakrishnan, N., Gil, M.Á. (eds) Modern Mathematical Tools and Techniques in Capturing Complexity. Understanding Complex Systems. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20853-9_24
DOI: https://doi.org/10.1007/978-3-642-20853-9_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20852-2
Online ISBN: 978-3-642-20853-9
eBook Packages: Engineering, Engineering (R0)