Generating Useful Test Data for Complex Linked Employer-Employee Datasets

  • Conference paper
Privacy in Statistical Databases (PSD 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7556)

Abstract

When data access for external researchers is difficult or time-consuming, it can be beneficial to disseminate, in advance, test datasets that mimic the structure of the original data. With these test data, researchers can develop their analysis code or decide whether the data are suitable for their planned research before they go through the lengthy process of getting access at the research data center. The aim of these data is not to provide any meaningful results. Instead, it is important to maintain the structure of the data as closely as possible, including skip patterns, logical constraints between the variables, and longitudinal relationships, so that any code developed using these test data will also run on the original data without further modification. Achieving this goal can be challenging for complex datasets such as linked employer-employee datasets (LEED), where the links between the establishments and the employees also need to be maintained. Using the LEED of the Institute for Employment Research, we illustrate how useful test data can be developed for such complex datasets. Our approach relies mainly on traditional statistical disclosure control (SDC) techniques, such as data swapping and noise addition, for data protection. Since statistical inferences need not be preserved, high swapping rates can be applied to protect the data sufficiently. At the same time, it is straightforward to maintain the structure of the data by adding some constraints to the swapping procedure.
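To make the idea of constrained swapping concrete, the following is a minimal Python sketch. It is not the authors' procedure, and none of its details come from the paper. The record layout (person_id, estab_id, status, wage), the helper names swap_within_groups and add_noise, the choice of multiplicative Gaussian noise, and all parameter values are assumptions made purely for illustration. The sketch shuffles the values of one variable only among records in the same group, leaves missing values where they are so that skip patterns survive, and never touches the identifiers, so the employer-employee links stay intact.

```python
import random
from collections import defaultdict


def swap_within_groups(records, column, group_key, rate, rng):
    """Return a copy of `records` with `column` values shuffled within groups.

    Only records sharing the same `group_key` value exchange values, and
    missing values (None) never move, so skip patterns are preserved.
    A random shuffle may leave some values in their original position.
    """
    out = [dict(r) for r in records]          # do not modify the input
    groups = defaultdict(list)
    for i, r in enumerate(out):
        if r[column] is not None:             # missing values stay in place
            groups[r[group_key]].append(i)
    for idx in groups.values():
        k = int(round(rate * len(idx)))       # number of records to swap
        if k < 2:
            continue                          # nothing to exchange
        chosen = rng.sample(idx, k)
        values = [out[i][column] for i in chosen]
        rng.shuffle(values)
        for i, v in zip(chosen, values):
            out[i][column] = v
    return out


def add_noise(records, column, rel_sd, rng):
    """Return a copy with multiplicative Gaussian noise on non-missing values."""
    out = [dict(r) for r in records]
    for r in out:
        if r[column] is not None:
            r[column] = r[column] * (1.0 + rng.gauss(0.0, rel_sd))
    return out


# Hypothetical toy data: wage is skipped (None) for one marginal employee.
employees = [
    {"person_id": 1, "estab_id": "A", "status": "full", "wage": 100.0},
    {"person_id": 2, "estab_id": "A", "status": "full", "wage": 120.0},
    {"person_id": 3, "estab_id": "B", "status": "full", "wage": 90.0},
    {"person_id": 4, "estab_id": "B", "status": "marginal", "wage": None},
    {"person_id": 5, "estab_id": "B", "status": "marginal", "wage": 15.0},
    {"person_id": 6, "estab_id": "A", "status": "marginal", "wage": 12.0},
]

rng = random.Random(0)                        # fixed seed for reproducibility
swapped = swap_within_groups(employees, "wage", "status", 1.0, rng)
test_data = add_noise(swapped, "wage", 0.05, rng)
```

Here the swapping rate is set to 1.0 because, as the abstract notes, statistical inferences need not be preserved. Grouping by status stands in for a logical constraint between variables; a real application would choose the grouping variables to match the constraints of the dataset at hand.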

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dorner, M., Drechsler, J., Jacobebbinghaus, P. (2012). Generating Useful Test Data for Complex Linked Employer-Employee Datasets. In: Domingo-Ferrer, J., Tinnirello, I. (eds) Privacy in Statistical Databases. PSD 2012. Lecture Notes in Computer Science, vol 7556. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33627-0_13

  • DOI: https://doi.org/10.1007/978-3-642-33627-0_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33626-3

  • Online ISBN: 978-3-642-33627-0

  • eBook Packages: Computer Science, Computer Science (R0)