Journal of Statistical Theory and Practice, Volume 5, Issue 2, pp 335–347

# A Comparison of Posterior Simulation and Inference by Combining Rules for Multiple Imputation

• Yajuan Si
• Jerome P. Reiter

## Abstract

Multiple imputation is a common approach for handling missing data. It is also used by government agencies to protect confidential information in public use data files. One reason for the popularity of multiple imputation approaches is ease of use: analysts make inferences by combining point and variance estimates with simple rules. These combining rules are based on method of moments approximations to full Bayesian inference. With modern computing, however, it is as easy to perform the full Bayesian inference as it is to combine point and variance estimates. This raises the question: is there any advantage to using full Bayesian inference over the multiple imputation combining rules? We use simulation studies to investigate this question. We find that, in general, the full Bayesian inference is not preferable to using the combining rules in multiple imputation for missing data. The full Bayesian inference can, however, have advantages over the combining rules when multiple imputation is used to protect confidential information.
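The abstract does not spell out the combining rules it refers to. As a rough illustration, the sketch below assumes the standard rules of Rubin (1987) for multiple imputation of missing data: with m completed datasets, the point estimate is the average of the m point estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `combine_rubin` and its return layout are hypothetical choices made for this example, not something taken from the article.

```python
from math import inf
from statistics import mean, variance


def combine_rubin(point_estimates, variance_estimates):
    """Combine m completed-data estimates using Rubin's (1987) rules.

    Assumption: this is the missing-data setting. The rules for synthetic
    data use a different variance formula and are not covered here.
    """
    m = len(point_estimates)
    if m < 2 or len(variance_estimates) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")

    q_bar = mean(point_estimates)      # combined point estimate
    u_bar = mean(variance_estimates)   # average within-imputation variance
    b = variance(point_estimates)      # between-imputation variance (divisor m - 1)

    total_var = u_bar + (1 + 1 / m) * b

    # Degrees of freedom for the t reference distribution;
    # infinite when the imputations agree exactly (b == 0).
    if b == 0:
        df = inf
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2

    return q_bar, total_var, df
```

An analyst would then typically form an interval as the combined point estimate plus or minus a t quantile (on `df` degrees of freedom) times the square root of `total_var`. The full Bayesian alternative discussed in the abstract replaces this moment-based approximation with posterior simulation.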

## Keywords

Bayesian, Confidentiality, Missing, Synthetic

AMS Subject Classification: 62D99, 62F15

## Authors and Affiliations

1. Department of Statistical Science, Duke University, Durham, USA