Abstract
In general, merely suppressing identifiers from released microdata is insufficient for privacy protection: the risk of re-identification has been shown to increase with the dimensionality of the released records. Hence, sound anonymization procedures are needed for high-dimensional records. Unfortunately, most privacy models yield very poor utility when enforced on data sets with many attributes. In this paper, we propose a method based on principal component analysis (PCA) to mitigate the curse of dimensionality in anonymization. Our aim is to reduce dimensionality without incurring large utility losses. We instantiate our approach with anonymization based on differential privacy. Empirical work shows that applying differential privacy to the PCA-transformed and dimensionality-reduced data set yields less information loss than applying it directly to the original data set.
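The pipeline the abstract describes can be illustrated with a minimal sketch: project the records onto their top principal components, perturb in the reduced space, and map back. This is only an illustration of the general idea, not the authors' algorithm; in particular, taking each component's observed range as its sensitivity (as done below for brevity) is itself data-dependent and would not satisfy differential privacy rigorously, which would require data-independent bounds or a dedicated DP-PCA mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_reduce(X, k):
    """Project X onto its top-k principal components (via SVD)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                      # d x k projection matrix
    return Xc @ W, W, mu

def laplace_perturb(Z, epsilon):
    """Add Laplace noise per component; sensitivity is (naively)
    approximated here by each component's observed range."""
    sens = Z.max(axis=0) - Z.min(axis=0)
    return Z + rng.laplace(scale=sens / epsilon, size=Z.shape)

# Toy data: 100 records, 10 correlated attributes (intrinsic rank 2)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 10))

Z, W, mu = pca_reduce(X, k=2)              # reduce 10 -> 2 dimensions
Z_noisy = laplace_perturb(Z, epsilon=1.0)  # perturb in the reduced space
X_anon = Z_noisy @ W.T + mu                # map back to the original space
```

Because the noise is injected into only k components instead of all d attributes, less total perturbation is needed, which is the intuition behind the reported utility gains.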
Acknowledgment and Disclaimer
We thank Fadi Hassan for help with the empirical work. Partial support to this work has been received from the European Commission (project H2020-700540 “CANVAS”), the Government of Catalonia (ICREA Acadèmia Prize to J. Domingo-Ferrer and grant 2017 SGR 705), and from the Spanish Government (project RTI2018-095094-B-C21 “Consent”). The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are their own and are not necessarily shared by UNESCO.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Soria-Comas, J., Domingo-Ferrer, J. (2019). Mitigating the Curse of Dimensionality in Data Anonymization. In: Torra, V., Narukawa, Y., Pasi, G., Viviani, M. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2019. Lecture Notes in Computer Science(), vol 11676. Springer, Cham. https://doi.org/10.1007/978-3-030-26773-5_30
Print ISBN: 978-3-030-26772-8
Online ISBN: 978-3-030-26773-5