Journal of Statistical Theory and Practice, Volume 5, Issue 2, pp 335–347

A Comparison of Posterior Simulation and Inference by Combining Rules for Multiple Imputation

  • Yajuan Si
  • Jerome P. Reiter

Abstract

Multiple imputation is a common approach for handling missing data. Government agencies also use it to protect confidential information in public use data files. One reason for the popularity of multiple imputation is its ease of use: analysts make inferences by combining point and variance estimates with simple rules. These combining rules are based on method of moments approximations to full Bayesian inference. With modern computing, however, performing the full Bayesian inference is as easy as combining point and variance estimates. This raises the question: is there any advantage to using full Bayesian inference over the multiple imputation combining rules? We use simulation studies to investigate this question. We find that, in general, full Bayesian inference is not preferable to the combining rules when multiple imputation is used for missing data. Full Bayesian inference can have advantages over the combining rules when multiple imputation is used to protect confidential information.
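The two routes compared in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes a scalar estimand and implements the standard combining rules for missing data and for partially synthetic data from the multiple imputation literature. For the posterior-simulation alternative, it assumes, hypothetically, that each completed-data posterior is approximated by a normal distribution N(q_l, u_l). Draws from those posteriors are then pooled and the interval is read off their quantiles. The paper's simulation models may differ. All function names are hypothetical.

```python
import random
import statistics


def rubin_combine(q, u):
    """Combining rules for multiple imputation of missing data (scalar estimand).

    q: point estimates q_l from the m completed datasets
    u: variance estimates u_l from the same datasets
    Returns (q_bar, total variance T_m, t reference degrees of freedom).
    """
    m = len(q)
    q_bar = statistics.fmean(q)          # average point estimate
    b_m = statistics.variance(q)         # between-imputation variance (divisor m - 1)
    u_bar = statistics.fmean(u)          # average within-imputation variance
    t_m = (1 + 1 / m) * b_m + u_bar      # total variance
    if b_m == 0:
        return q_bar, t_m, float("inf")  # no between-imputation variability
    r = u_bar / ((1 + 1 / m) * b_m)
    df = (m - 1) * (1 + r) ** 2
    return q_bar, t_m, df


def partial_synthesis_combine(q, u):
    """Combining rules for partially synthetic data (confidentiality setting)."""
    m = len(q)
    q_bar = statistics.fmean(q)
    b_m = statistics.variance(q)
    u_bar = statistics.fmean(u)
    t_p = b_m / m + u_bar                # total variance for partial synthesis
    if b_m == 0:
        return q_bar, t_p, float("inf")
    df = (m - 1) * (1 + u_bar / (b_m / m)) ** 2
    return q_bar, t_p, df


def pooled_posterior_interval(q, u, draws_per_dataset=2000, seed=0):
    """95% interval from pooled posterior draws across completed datasets.

    ASSUMPTION (for illustration only): each completed-data posterior is
    approximated by a normal distribution with mean q_l and variance u_l.
    """
    rng = random.Random(seed)
    draws = []
    for q_l, u_l in zip(q, u):
        sd = u_l ** 0.5
        draws.extend(rng.gauss(q_l, sd) for _ in range(draws_per_dataset))
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    u = [0.5, 0.5, 0.5]
    print(rubin_combine(q, u))
    print(partial_synthesis_combine(q, u))
    print(pooled_posterior_interval(q, u))
```

Take q = (1, 2, 3) and u = (0.5, 0.5, 0.5). The missing-data rules give q̄_m = 2, b_m = 1 and ū_m = 0.5, so T_m = (4/3)(1) + 0.5 ≈ 1.83 with ν ≈ 3.78. The partially synthetic rules give T_p = 1/3 + 0.5 ≈ 0.83 with ν_p = 12.5. The rule-based interval uses a t reference distribution with ν degrees of freedom. The pooled-draws interval is instead read directly from the simulated quantiles.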

Key-words

Bayesian; Confidentiality; Missing; Synthetic

AMS Subject Classification

62D99; 62F15

Copyright information

© Grace Scientific Publishing 2011

Authors and Affiliations

  1. Department of Statistical Science, Duke University, Durham, USA

Personalised recommendations