Controlled Shuffling, Statistical Confidentiality and Microdata Utility: A Successful Experiment with a 10% Household Sample of the 2011 Population Census of Ireland for the IPUMS-International Database

McCaa, Robert; Muralidhar, Krishnamurty; Sarathy, Rathindra; Comerford, Michael; Esteve-Palos, Albert

doi:10.1007/978-3-319-11257-2_25

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8744)

Included in the following conference series:

International Conference on Privacy in Statistical Databases

Abstract

IPUMS-International disseminates more than two hundred fifty integrated, confidentialized census microdata samples to thousands of researchers worldwide at no cost. The number of samples is increasing at the rate of several dozen per year, as quickly as the task of integrating metadata and microdata is completed. Protecting the statistical confidentiality and privacy of individuals represented in the microdata is a sine qua non of the IPUMS project. For the 2010 round of censuses, even greater protections are required, while researchers are demanding ever higher precision and utility. This paper describes a tripartite collaborative experiment using a ten percent household sample of the 2011 census of Ireland to estimate risk, mask the microdata using controlled shuffling, and assess analytical utility by comparing the masked data against the unprotected source microdata. Controlled shuffling exploits hierarchically ordered coding schemes to protect privacy and enhance utility. With controlled shuffling, the lesson seems to be that more detail means less risk and greater utility. Overall, despite substantial perturbation of the masked dataset (30% of adults on one or more characteristics), we find that data utility is very high and information loss is slight, even for fairly complex analytical problems.
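
As a rough illustration of the two ideas named in the abstract (shuffling that respects a hierarchical coding scheme, and the share of records perturbed on one or more characteristics), the following Python sketch may help. It is a toy stand-in and not the controlled shuffling procedure used in the paper: it simply permutes detailed codes among records that share the same parent code. The function names, the sample codes, and the assumption that the parent category is the leading digit of each code are all hypothetical.

```python
import random
from collections import defaultdict


def shuffle_within_parent(codes, parent_digits=1, seed=0):
    """Permute detailed codes among records sharing the same parent code.

    Toy stand-in, NOT the paper's controlled shuffling. Assumption: the
    parent category is the first `parent_digits` characters of each code.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, code in enumerate(codes):
        groups[code[:parent_digits]].append(idx)

    masked = list(codes)
    for indices in groups.values():
        values = [codes[i] for i in indices]
        rng.shuffle(values)  # permute values inside one parent category
        for i, value in zip(indices, values):
            masked[i] = value
    return masked


def perturbation_rate(original, masked):
    """Share of records whose masked value differs from the original."""
    changed = sum(1 for o, m in zip(original, masked) if o != m)
    return changed / len(original)


# Hypothetical three-digit codes; the leading digit plays the parent role.
codes = ["211", "212", "213", "311", "312", "521"]
masked = shuffle_within_parent(codes)
rate = perturbation_rate(codes, masked)
```

In this sketch the overall distribution of detailed codes and every record's parent category are unchanged by construction, while individual records may receive a different detailed code. A record that is alone in its parent category (here `"521"`) cannot be changed by this simplified scheme.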

Author information

Authors and Affiliations

Robert McCaa, Minnesota Population Center, 50 Willey Hall, Minneapolis, MN, 55455, USA
Robert McCaa, Krishnamurty Muralidhar, Rathindra Sarathy, Michael Comerford & Albert Esteve-Palos

Editor information

Editors and Affiliations

Department of Computer Engineering and Mathematics, UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, Av. Països Catalans 26, 43007, Tarragona, Catalonia
Josep Domingo-Ferrer

Cite this paper

McCaa, R., Muralidhar, K., Sarathy, R., Comerford, M., Esteve-Palos, A. (2014). Controlled Shuffling, Statistical Confidentiality and Microdata Utility: A Successful Experiment with a 10% Household Sample of the 2011 Population Census of Ireland for the IPUMS-International Database. In: Domingo-Ferrer, J. (eds) Privacy in Statistical Databases. PSD 2014. Lecture Notes in Computer Science, vol 8744. Springer, Cham. https://doi.org/10.1007/978-3-319-11257-2_25

DOI: https://doi.org/10.1007/978-3-319-11257-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11256-5
Online ISBN: 978-3-319-11257-2
eBook Packages: Computer Science, Computer Science (R0)
