Missing Data Analysis: A Kernel-Based Multi-Imputation Approach

  • Shichao Zhang
  • Zhi Jin
  • Xiaofeng Zhu
  • Jilian Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5300)


Many missing-data analysis techniques rely on single imputation. However, single imputation cannot provide valid standard errors and confidence intervals, because it ignores the uncertainty implicit in the fact that the imputed values are not the actual values. Filling in each missing value with a set of plausible values is called multi-imputation. In this paper we propose a kernel-based stochastic non-parametric multi-imputation method for nonparametric regression settings under the MAR (Missing at Random) and MCAR (Missing Completely at Random) missingness mechanisms. Furthermore, we present a kernel-based stochastic semi-parametric multi-imputation method for cases where some prior knowledge about the incomplete dataset is available. Our algorithms are designed specifically to optimize the confidence interval and the relative efficiency. The proposed technique is evaluated experimentally on simulated and real data, and the results demonstrate that our method performs much better than the NORM method and is a promising approach.
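The general idea of kernel-based stochastic multi-imputation can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single covariate, a Gaussian kernel with a fixed bandwidth `h`, a Nadaraya-Watson regression estimate, and random draws from the observed residuals as the stochastic component. The function names `nw_estimate`, `multi_impute`, and `pool_mean` are hypothetical, and the pooling step uses the standard Rubin combining rules for the mean of the response.

```python
import math
import random
import statistics


def nw_estimate(x0, xs, ys, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        # Every observed point is too far from x0; fall back to the mean.
        return statistics.fmean(ys)
    return sum(w * y for w, y in zip(weights, ys)) / total


def multi_impute(xs, ys, h=0.5, m=5, seed=0):
    """Return m completed copies of ys; None marks a missing response."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    ox = [x for x, _ in observed]
    oy = [y for _, y in observed]
    # Residuals of the kernel fit on the complete cases (assumed noise model).
    residuals = [y - nw_estimate(x, ox, oy, h) for x, y in observed]
    completed = []
    for _ in range(m):
        filled = []
        for x, y in zip(xs, ys):
            if y is None:
                # Kernel prediction plus a randomly drawn observed residual.
                y = nw_estimate(x, ox, oy, h) + rng.choice(residuals)
            filled.append(y)
        completed.append(filled)
    return completed


def pool_mean(completed):
    """Rubin's rules for the mean of Y: (pooled estimate, total variance)."""
    m = len(completed)
    n = len(completed[0])
    estimates = [statistics.fmean(c) for c in completed]
    within = statistics.fmean([statistics.variance(c) / n for c in completed])
    between = statistics.variance(estimates) if m > 1 else 0.0
    return statistics.fmean(estimates), within + (1 + 1 / m) * between


if __name__ == "__main__":
    xs = [i / 10 for i in range(20)]
    ys = [2 * x + 1 if i % 4 else None for i, x in enumerate(xs)]  # toy data
    estimate, total_var = pool_mean(multi_impute(xs, ys, h=0.3, m=5))
    print(estimate, total_var)
```

Because each of the `m` completed datasets receives different residual draws, the between-imputation variance reflects the uncertainty about the missing values, which a single imputation would hide.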


Keywords: Multiple Imputation, Relative Efficiency, Coverage Probability, Imputation Method, Multiple Imputation Method





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Shichao Zhang (1, 2, 3)
  • Zhi Jin (4)
  • Xiaofeng Zhu (1)
  • Jilian Zhang (1)

  1. College of CS and IT, Guangxi Normal University, China
  2. Faculty of EIT, UTS, Australia
  3. State Key Lab for Novel Software Technology, Nanjing University, PR China
  4. School of EE and CS, Peking University, PR China
