Abstract
In 1993, the Journal of Official Statistics published a special issue on data confidentiality. Two articles in this volume laid the foundation for the development of multiply imputed synthetic datasets (MISDs). In his discussion “Statistical Disclosure Limitation,” Rubin (1993) for the first time suggested generating synthetic datasets based on his ideas of multiple imputation for missing values (Rubin, 1987). He proposed to treat all the observations from the sampling frame that are not part of the sample as missing data and to impute them according to the multiple imputation framework. Afterwards, simple random samples from these fully imputed datasets should be released to the public. Because the released dataset does not contain any real data, disclosure of sensitive information is very difficult. On the other hand, if the imputation models are selected carefully and the predictive power of the models is high, most of the information contained in the original data will be preserved. This approach is now called generating fully synthetic datasets in the literature.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Drechsler, J. (2011). Background on Multiply Imputed Synthetic Datasets. In: Synthetic Datasets for Statistical Disclosure Control. Lecture Notes in Statistics(), vol 201. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0326-5_2
Download citation
DOI: https://doi.org/10.1007/978-1-4614-0326-5_2
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-0325-8
Online ISBN: 978-1-4614-0326-5
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)