Abstract
In general, merely suppressing identifiers from released microdata is insufficient for privacy protection: the risk of re-identification has been shown to increase with the dimensionality of the released records. Hence, sound anonymization procedures are needed for high-dimensional records. Unfortunately, most privacy models yield very poor utility when enforced on data sets with many attributes. In this paper, we propose a method based on principal component analysis (PCA) to mitigate the curse of dimensionality in anonymization. Our aim is to reduce dimensionality without incurring large utility losses. We instantiate our approach with anonymization based on differential privacy. Empirical work shows that applying differential privacy to the PCA-transformed and dimensionality-reduced data set yields less information loss than applying it directly to the original data set.
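The pipeline the abstract describes can be illustrated with a minimal sketch: project the records onto their top principal components, perturb in the reduced space, and map back. This is only an illustration of the general idea, not the authors' algorithm; in particular, taking each component's observed range as its sensitivity (as done below for brevity) is itself data-dependent and would not satisfy differential privacy rigorously, which would require data-independent bounds or a dedicated DP-PCA mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_reduce(X, k):
    """Project X onto its top-k principal components (via SVD)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                      # d x k projection matrix
    return Xc @ W, W, mu

def laplace_perturb(Z, epsilon):
    """Add Laplace noise per component; sensitivity is (naively)
    approximated here by each component's observed range."""
    sens = Z.max(axis=0) - Z.min(axis=0)
    return Z + rng.laplace(scale=sens / epsilon, size=Z.shape)

# Toy data: 100 records, 10 correlated attributes (intrinsic rank 2)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 10))

Z, W, mu = pca_reduce(X, k=2)              # reduce 10 -> 2 dimensions
Z_noisy = laplace_perturb(Z, epsilon=1.0)  # perturb in the reduced space
X_anon = Z_noisy @ W.T + mu                # map back to the original space
```

Because the noise is injected into only k components instead of all d attributes, less total perturbation is needed, which is the intuition behind the reported utility gains.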
Acknowledgment and Disclaimer
We thank Fadi Hassan for help with the empirical work. Partial support to this work has been received from the European Commission (project H2020-700540 “CANVAS”), the Government of Catalonia (ICREA Acadèmia Prize to J. Domingo-Ferrer and grant 2017 SGR 705), and from the Spanish Government (project RTI2018-095094-B-C21 “Consent”). The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are their own and are not necessarily shared by UNESCO.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Soria-Comas, J., Domingo-Ferrer, J. (2019). Mitigating the Curse of Dimensionality in Data Anonymization. In: Torra, V., Narukawa, Y., Pasi, G., Viviani, M. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2019. Lecture Notes in Computer Science(), vol 11676. Springer, Cham. https://doi.org/10.1007/978-3-030-26773-5_30
Print ISBN: 978-3-030-26772-8
Online ISBN: 978-3-030-26773-5