Abstract
In the data mining context, missing data methods are constrained in their computational complexity by the large amounts of data involved. Among the computationally simple yet effective imputation methods are the hot deck procedures. Hot deck methods impute missing values within a data matrix by using available values from the same matrix. The object whose values are used to impute the missing values of another object (the recipient) is called the donor. Because values are replicated, a single donor may be selected to serve multiple recipients. The inherent risk is that too many, or even all, missing values are imputed with the values of a single donor. To mitigate this risk, some hot deck variants limit the number of times any one donor may be selected to donate its values. This raises the question of the conditions under which such a limitation is sensible. This study aims to answer this question through an extensive simulation. The results show fairly clear differences between imputations by hot deck methods in which the donor limit was varied. The study also identifies the factors that determine whether or not a donor limit is sensible.
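The abstract describes the donor and the donor limit only in words, so a small example may help make the mechanism concrete. The Python sketch below is a hypothetical illustration, not the authors' implementation. It assumes a nearest-neighbour hot deck, which is one of several hot deck variants. Only complete rows act as donors. Distance is the Manhattan distance over the variables the recipient has observed, and recipients are processed in row order. The names `hot_deck_impute` and `donor_limit` are introduced here for illustration only.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = List[Optional[float]]


def hot_deck_impute(
    data: Sequence[Sequence[Optional[float]]],
    donor_limit: Optional[int] = None,
) -> Tuple[List[Row], Dict[int, int]]:
    """Hypothetical nearest-neighbour hot deck with an optional donor limit.

    Assumptions: missing values are None; donors are the complete rows;
    distance is the Manhattan distance over the recipient's observed
    variables; donor_limit=None allows unlimited reuse of a donor.
    Returns an imputed copy of the data and each donor's usage count.
    """
    rows: List[Row] = [list(r) for r in data]  # copy so the input stays intact
    donors = [i for i, r in enumerate(rows) if all(v is not None for v in r)]
    recipients = [i for i, r in enumerate(rows) if any(v is None for v in r)]
    usage: Dict[int, int] = {d: 0 for d in donors}

    # Recipients are processed in row order; with a limit, the order matters.
    for rec in recipients:
        observed = [j for j, v in enumerate(rows[rec]) if v is not None]
        # Only donors that have not yet reached the limit are eligible.
        eligible = [d for d in donors
                    if donor_limit is None or usage[d] < donor_limit]
        if not eligible:
            raise ValueError("no eligible donor left for this recipient")

        def distance(d: int) -> float:
            return sum(abs(rows[rec][j] - rows[d][j]) for j in observed)

        # Nearest eligible donor; ties go to the lower row index.
        best = min(eligible, key=lambda d: (distance(d), d))
        for j, v in enumerate(rows[rec]):
            if v is None:
                rows[rec][j] = rows[best][j]  # copy the donor's value
        usage[best] += 1
    return rows, usage


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0],   # donor (row 0)
        [1.1, 2.1, None],  # recipient, nearest to row 0
        [1.2, 1.9, None],  # recipient, also nearest to row 0
        [5.0, 6.0, 7.0],   # donor (row 3)
    ]
    unlimited, _ = hot_deck_impute(data)
    limited, _ = hot_deck_impute(data, donor_limit=1)
    print([r[2] for r in unlimited])  # [3.0, 3.0, 3.0, 7.0]
    print([r[2] for r in limited])    # [3.0, 3.0, 7.0, 7.0]
```

In this toy data set, both recipients are closest to donor row 0. Without a limit, row 0 supplies both missing values. With `donor_limit=1`, the second recipient must instead take its value from the much more distant row 3. This is the trade-off the study examines: a limit spreads donor usage, but some recipients then receive values from less similar donors. Eligibility depends on earlier assignments, so under this sketch's assumptions a limited hot deck also becomes sensitive to the order in which recipients are processed.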
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Joenssen, D.W., Bankhofer, U. (2012). Hot Deck Methods for Imputing Missing Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_6
DOI: https://doi.org/10.1007/978-3-642-31537-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31536-7
Online ISBN: 978-3-642-31537-4
eBook Packages: Computer Science, Computer Science (R0)