Abstract
There has been little discussion in the literature on how many multiply imputed datasets an agency should release. From the perspective of the secondary data analyst, a large number of datasets is desirable, since the additional variance introduced by the imputation decreases with the number of released datasets. For example, Reiter (2003) finds nearly a 100% increase in the variance of regression coefficients when going from 50 to two partially synthetic datasets. From the perspective of the agency, a small number of datasets is desirable, since the information available to ill-intentioned users seeking to identify individuals in the released datasets increases with the number of released datasets. Thus, agencies considering the release of partially synthetic data generally are confronted with a trade-off between disclosure risk and data utility.
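The inverse relation between the number of released datasets and the imputation variance can be sketched with the combining rule for partially synthetic data from Reiter (2003), where the total variance is the average within-dataset variance plus the between-dataset variance divided by the number of datasets m. The numeric values below are hypothetical, chosen only to illustrate the order of magnitude of the effect described in the abstract:

```python
def total_variance(v_bar, b, m):
    """Combining-rule variance for a point estimate from m partially
    synthetic datasets (Reiter 2003): T_m = v_bar + b / m, where v_bar is
    the average within-dataset variance and b the between-dataset variance."""
    return v_bar + b / m

# Hypothetical variance components, for illustration only.
v_bar, b = 1.0, 2.0

t_2 = total_variance(v_bar, b, 2)    # m = 2 released datasets
t_50 = total_variance(v_bar, b, 50)  # m = 50 released datasets

print(t_2, t_50, round(t_2 / t_50, 2))  # 2.0 1.04 1.92
```

With these assumed components, releasing only two datasets instead of 50 roughly doubles the total variance, consistent with the "nearly a 100% increase" reported in the abstract; the agency's countervailing concern is that each additional released dataset also gives an intruder more information.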
Most of this chapter is taken from Drechsler and Reiter (2009) and Reiter and Drechsler (2010).
© 2011 Springer Science+Business Media, LLC
Cite this chapter
Drechsler, J. (2011). A Two-Stage Imputation Procedure to Balance the Risk–Utility Trade-Off. In: Synthetic Datasets for Statistical Disclosure Control. Lecture Notes in Statistics(), vol 201. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0326-5_9
Print ISBN: 978-1-4614-0325-8
Online ISBN: 978-1-4614-0326-5
eBook Packages: Mathematics and Statistics (R0)