Missing Data Methods

  • Kristian Kleinke
  • Jost Reinecke
  • Daniel Salfrán
  • Martin Spiess
Part of the Statistics for Social and Behavioral Sciences book series (SSBS)


This chapter reviews and discusses procedures and techniques for handling missing data, ranging from ad hoc methods to more sophisticated techniques such as maximum likelihood estimation, weighting, and imputation. We discuss the pros and cons of the different approaches and give practical advice on which procedure may be best suited to a given scenario, because valid inferences in applied research can only be expected when decisions are informed. One conclusion of this chapter is that no single method or technique works best in every possible scenario.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Kristian Kleinke (1)
  • Jost Reinecke (2)
  • Daniel Salfrán (3)
  • Martin Spiess (3)
  1. Department of Education Studies and Psychology, University of Siegen, Siegen, Germany
  2. Faculty of Sociology, University of Bielefeld, Bielefeld, Germany
  3. University of Hamburg, Institute of Psychology, Hamburg, Germany
