Background on Multiply Imputed Synthetic Datasets

Synthetic Datasets for Statistical Disclosure Control

Part of the book series: Lecture Notes in Statistics (LNS, volume 201)

Abstract

In 1993, the Journal of Official Statistics published a special issue on data confidentiality. Two articles in this volume laid the foundation for the development of multiply imputed synthetic datasets (MISDs). In his discussion “Statistical Disclosure Limitation,” Rubin (1993) was the first to suggest generating synthetic datasets based on his ideas of multiple imputation for missing values (Rubin, 1987). He proposed treating all the observations from the sampling frame that are not part of the sample as missing data and imputing them according to the multiple imputation framework. Simple random samples from these fully imputed datasets would then be released to the public. Because the released dataset does not contain any real data, disclosure of sensitive information is very difficult. On the other hand, if the imputation models are selected carefully and the predictive power of the models is high, most of the information contained in the original data will be preserved. In the literature, this approach is now called generating fully synthetic datasets.
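The procedure described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the function name `fully_synthetic_datasets`, its parameters, and the toy data are hypothetical, and the imputation model is assumed to be a simple normal linear regression of one outcome on one frame covariate. As a further simplification, each released sample is drawn only from the imputed (non-sampled) units, so that no observed value is released.

```python
import numpy as np

def fully_synthetic_datasets(x_frame, sampled_idx, y_sampled, m=5, n_release=100, seed=0):
    """Return m synthetic samples (x, y) from a frame whose covariate x is known
    for every unit, while y was observed only for the sampled units.
    Assumes a normal linear regression of y on x (illustrative choice)."""
    rng = np.random.default_rng(seed)
    N = x_frame.shape[0]

    # Fit the imputation model on the observed sample (intercept + slope).
    X_obs = np.column_stack([np.ones(len(sampled_idx)), x_frame[sampled_idx]])
    n, p = X_obs.shape
    XtX_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = XtX_inv @ X_obs.T @ y_sampled
    resid = y_sampled - X_obs @ beta_hat
    s2 = resid @ resid / (n - p)

    # Units in the frame that were not sampled are treated as missing.
    unsampled = np.setdiff1d(np.arange(N), sampled_idx)
    X_mis = np.column_stack([np.ones(len(unsampled)), x_frame[unsampled]])

    releases = []
    for _ in range(m):
        # Draw model parameters from their approximate posterior, so that
        # each of the m datasets reflects parameter uncertainty.
        sigma2 = (n - p) * s2 / rng.chisquare(n - p)
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
        # Impute y for every non-sampled unit.
        y_syn = X_mis @ beta + rng.normal(0.0, np.sqrt(sigma2), size=len(unsampled))
        # Release a simple random sample of the imputed units only.
        pick = rng.choice(len(unsampled), size=n_release, replace=False)
        releases.append((x_frame[unsampled][pick], y_syn[pick]))
    return releases

# Toy usage with simulated data (all values are made up for illustration).
rng = np.random.default_rng(42)
x_frame = rng.normal(size=1000)
sampled_idx = rng.choice(1000, size=200, replace=False)
y_sampled = 2.0 + 3.0 * x_frame[sampled_idx] + rng.normal(size=200)
releases = fully_synthetic_datasets(x_frame, sampled_idx, y_sampled)
```

Each element of `releases` is one synthetic dataset. Releasing several of them (here `m=5`) is what allows an analyst to account for the uncertainty introduced by imputation.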


Author information

Corresponding author

Correspondence to Jörg Drechsler.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Drechsler, J. (2011). Background on Multiply Imputed Synthetic Datasets. In: Synthetic Datasets for Statistical Disclosure Control. Lecture Notes in Statistics, vol 201. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0326-5_2
