Mitigating the Curse of Dimensionality in Data Anonymization

  • Conference paper
Modeling Decisions for Artificial Intelligence (MDAI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11676)

Abstract

In general, just suppressing identifiers from released microdata is insufficient for privacy protection. It has been shown that the risk of re-identification increases with the dimensionality of the released records. Hence, sound anonymization procedures are needed to anonymize high-dimensional records. Unfortunately, most privacy models yield very poor utility if enforced on data sets with many attributes. In this paper, we propose a method based on principal component analysis (PCA) to mitigate the curse of dimensionality in anonymization. Our aim is to reduce dimensionality without incurring large utility losses. We instantiate our approach with anonymization based on differential privacy. Empirical work shows that using differential privacy on the PCA-transformed and dimensionality-reduced data set yields less information loss than directly using differential privacy on the original data set.
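The approach described in the abstract can be illustrated with a small sketch: project the data onto its top-k principal components, add Laplace noise in the reduced space, and map the result back to the original attribute space. This is only an illustration of the general idea, not the authors' exact mechanism; in particular, the per-component sensitivity bound used here is a crude placeholder (a real differentially private release would calibrate noise to rigorously established data bounds, and computing the PCA itself on private data would also consume privacy budget).

```python
import numpy as np

def pca_then_dp(X, k, epsilon, rng=None):
    """Sketch: PCA-reduce X to k dimensions, perturb in the reduced
    space with Laplace noise, then reconstruct in the original space.
    The sensitivity estimate below is a simplification for illustration."""
    rng = np.random.default_rng() if rng is None else rng
    mu = X.mean(axis=0)
    Xc = X - mu
    # Principal components via SVD of the centered data matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                      # d x k projection matrix
    Z = Xc @ W                        # reduced representation (n x k)
    # Placeholder per-component sensitivity: twice the observed range bound.
    # A rigorous DP guarantee requires data-independent bounds instead.
    sensitivity = 2.0 * np.abs(Z).max(axis=0)
    noise = rng.laplace(0.0, sensitivity / epsilon, size=Z.shape)
    return (Z + noise) @ W.T + mu     # back to the original attribute space
```

The intended benefit, as the abstract argues, is that noise is injected into k dimensions rather than d, so for k < d the same privacy budget distorts the data less than perturbing all original attributes directly.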



Acknowledgment and Disclaimer

We thank Fadi Hassan for help with the empirical work. Partial support to this work has been received from the European Commission (project H2020-700540 “CANVAS”), the Government of Catalonia (ICREA Acadèmia Prize to J. Domingo-Ferrer and grant 2017 SGR 705), and from the Spanish Government (project RTI2018-095094-B-C21 “Consent”). The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are their own and are not necessarily shared by UNESCO.

Corresponding author

Correspondence to Josep Domingo-Ferrer.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Soria-Comas, J., Domingo-Ferrer, J. (2019). Mitigating the Curse of Dimensionality in Data Anonymization. In: Torra, V., Narukawa, Y., Pasi, G., Viviani, M. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2019. Lecture Notes in Computer Science, vol 11676. Springer, Cham. https://doi.org/10.1007/978-3-030-26773-5_30

  • DOI: https://doi.org/10.1007/978-3-030-26773-5_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26772-8

  • Online ISBN: 978-3-030-26773-5

  • eBook Packages: Computer Science, Computer Science (R0)
