Hot Deck Methods for Imputing Missing Data

Joenssen, Dieter William; Bankhofer, Udo

doi:10.1007/978-3-642-31537-4_6

The Effects of Limiting Donor Usage

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7376)

Abstract

Within the data mining context, missing data methods must have low computational complexity because of the large amounts of data involved. Hot deck procedures are among the computationally simple yet effective imputation methods. Hot deck methods impute missing values within a data matrix by using available values from the same matrix. The object from which these available values are taken to impute another object is called the donor. This replication of values leads to the problem that a single donor might be selected to accommodate multiple recipients. The inherent risk is that too many, or even all, missing values may be imputed with the values from a single donor. To mitigate this risk, some hot deck variants limit the number of times any one donor may be selected to donate its values. This inevitably raises the question of the conditions under which such a limitation is sensible. This study aims to answer this question through an extensive simulation. The results show rather clear differences between imputations by hot deck methods in which the donor limit was varied. In addition to these differences, influencing factors are identified that determine whether or not a donor limit is sensible.
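
To make the donor-limit idea concrete, the following is a minimal sketch of a random hot deck with an optional cap on donor usage. It is an illustration under stated assumptions, not the procedure evaluated in the paper: the function name `random_hot_deck`, the `donor_limit` parameter, the restriction of donors to fully observed rows, and the uniform random donor choice are all hypothetical choices made for this example.

```python
import random


def random_hot_deck(rows, donor_limit=None, seed=0):
    """Impute None entries from randomly chosen complete rows (donors).

    Assumptions of this sketch (not taken from the paper):
    - only rows without any missing value may act as donors,
    - donors are drawn uniformly at random,
    - donor_limit caps how often one donor may be used (None = no cap).
    """
    rng = random.Random(seed)
    donors = [i for i, row in enumerate(rows) if None not in row]
    usage = {d: 0 for d in donors}          # times each donor was selected
    imputed = [list(row) for row in rows]   # copy; observed values are kept

    for i, row in enumerate(rows):
        if None not in row:
            continue  # nothing to impute for this object
        eligible = [d for d in donors
                    if donor_limit is None or usage[d] < donor_limit]
        if not eligible:
            raise ValueError("donor limit too strict: no eligible donor left")
        d = rng.choice(eligible)
        usage[d] += 1
        for j, value in enumerate(row):
            if value is None:
                imputed[i][j] = rows[d][j]  # copy the donor's value
    return imputed, usage


# Two complete rows (donors) and four rows with missing values (recipients).
data = [
    [1.0, 10.0],
    [2.0, 20.0],
    [None, 30.0],
    [4.0, None],
    [None, None],
    [5.0, None],
]
completed, usage = random_hot_deck(data, donor_limit=2)
print(usage)  # {0: 2, 1: 2}
```

With four recipients, two donors, and a limit of two, each donor is necessarily used exactly twice. Without the limit (`donor_limit=None`), nothing prevents a single donor from supplying every imputed value, which is the risk the abstract describes.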

Author information

Authors and Affiliations

Fachgebiet für Quantitative Methoden, Technische Universität Ilmenau, Ilmenau, Germany
Dieter William Joenssen & Udo Bankhofer

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Kohlenstraße 2, 04107, Leipzig, Germany
Petra Perner

Cite this paper

Joenssen, D.W., Bankhofer, U. (2012). Hot Deck Methods for Imputing Missing Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_6

DOI: https://doi.org/10.1007/978-3-642-31537-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31536-7
Online ISBN: 978-3-642-31537-4
eBook Packages: Computer Science, Computer Science (R0)
