Hot Deck Methods for Imputing Missing Data

The Effects of Limiting Donor Usage

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7376)

Abstract

In the data mining context, methods for handling missing data are limited in computational complexity because of the large amounts of data involved. Among the computationally simple yet effective imputation methods are the hot deck procedures. Hot deck methods impute missing values within a data matrix by using available values from the same matrix. The object from which these available values are taken to impute another object's missing values is called the donor. Because values are replicated, a single donor might be selected to accommodate multiple recipients. The inherent risk is that too many, or even all, missing values may be imputed with the values of a single donor. To mitigate this risk, some hot deck variants limit the number of times any one donor may be selected to donate its values. This raises the question of the conditions under which such a limit is sensible. This study aims to answer this question through an extensive simulation. The results show rather clear differences between hot deck imputations in which the donor limit was varied. In addition to these differences, influencing factors are identified that determine whether or not a donor limit is sensible.
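The donor mechanism and the optional donor limit described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code. It assumes a nearest-neighbour hot deck in which only complete rows act as donors. Distance is the Manhattan distance over the variables the recipient has observed, recipients are processed in row order, and `None` marks a missing value. The function name `hot_deck_impute` and the parameter `donor_limit` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = List[Optional[float]]  # None marks a missing value (assumption)


def hot_deck_impute(
    data: List[Row], donor_limit: Optional[int] = None
) -> Tuple[List[Row], Dict[int, int]]:
    """Hypothetical nearest-neighbour hot deck with an optional donor limit.

    Each incomplete row (recipient) copies its missing values from the closest
    complete row (donor). donor_limit caps how often a single donor may be used;
    None means unlimited. Returns the imputed copy and per-donor usage counts.
    """
    donors = [i for i, row in enumerate(data) if all(v is not None for v in row)]
    usage = {i: 0 for i in donors}
    imputed = [list(row) for row in data]  # copy so the input is not modified

    for r, row in enumerate(data):  # recipients are handled in row order
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        # Only donors that have not yet reached the limit remain eligible.
        eligible = [d for d in donors if donor_limit is None or usage[d] < donor_limit]
        if not eligible:
            raise ValueError("donor limit too strict: no eligible donor left")

        # Manhattan distance on the recipient's observed variables;
        # ties are broken in favour of the lower row index.
        best = min(
            eligible,
            key=lambda d: (sum(abs(data[d][j] - row[j]) for j in observed), d),
        )
        for j in missing:
            imputed[r][j] = data[best][j]
        usage[best] += 1

    return imputed, usage


if __name__ == "__main__":
    data = [
        [1.0, 10.0],  # donor 0
        [5.0, 50.0],  # donor 1
        [1.1, None],  # recipient
        [1.2, None],  # recipient
        [0.9, None],  # recipient
    ]
    for limit in (None, 2):
        imputed, usage = hot_deck_impute(data, donor_limit=limit)
        print(limit, [row[1] for row in imputed], usage)
    # None [10.0, 50.0, 10.0, 10.0, 10.0] {0: 3, 1: 0}
    # 2 [10.0, 50.0, 10.0, 10.0, 50.0] {0: 2, 1: 1}
```

Without a limit, all three recipients copy donor 0, which is the concentration risk the abstract describes. With `donor_limit=2`, the last recipient is pushed onto the more distant donor 1. This shows the trade-off a donor limit introduces: less reliance on a single donor at the cost of a poorer match. Because this sketch assigns donors greedily in row order, the result under a limit also depends on the order of the recipients. A limit of 1 leaves no eligible donor for the last recipient, so the sketch raises an error.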

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Joenssen, D.W., Bankhofer, U. (2012). Hot Deck Methods for Imputing Missing Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_6

  • DOI: https://doi.org/10.1007/978-3-642-31537-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31536-7

  • Online ISBN: 978-3-642-31537-4

  • eBook Packages: Computer Science (R0)