On the Privacy Guarantees of Synthetic Data: A Reassessment from the Maximum-Knowledge Attacker Perspective

  • Nicolas Ruiz
  • Krishnamurty Muralidhar
  • Josep Domingo-Ferrer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11126)


Generating synthetic data to disseminate individual-level information in a privacy-preserving way is often presented as superior to other statistical disclosure control techniques. The reason for this claim is straightforward at first glance: since all disseminated records are synthetic rather than actually observed values, no individual can reasonably claim to face a privacy threat. Thus, provided the synthesizer is good enough, synthetic data would seemingly always offer a high level of information at a low disclosure risk. Building on recent advances in the literature regarding the conceptualization of an intruder, this paper challenges this claim by reassessing the privacy guarantees of synthetic data. Using the concept of a maximum-knowledge intruder, we demonstrate that synthetic data can in fact always be expressed as a re-arrangement of the original data and that, as a result, they may lead to configurations in which disclosure risk is higher than for non-synthetic disclosure control approaches. We illustrate the application of these results with an empirical example.
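The re-arrangement claim can be made concrete with a small sketch. It is an illustration only, not the paper's procedure or its empirical example. It assumes a single numerical attribute, a toy "synthesizer" that draws fresh values from a normal distribution fitted to the original data, and a rank-based reverse mapping in which each synthetic value is replaced by the original value holding the same rank. The function names (ranks, reverse_map, make_demo) and the demo parameters are hypothetical.

```python
import random


def ranks(values):
    """Return the 0-based rank of each value (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def reverse_map(original, synthetic):
    """Replace each synthetic value by the original value of the same rank.

    The output contains exactly the original values, re-ordered so that
    its rank structure matches that of the synthetic data.
    """
    sorted_original = sorted(original)
    return [sorted_original[r] for r in ranks(synthetic)]


def make_demo(n=8, seed=0):
    """Toy data (assumption): normal draws stand in for a real synthesizer."""
    rng = random.Random(seed)
    original = [round(rng.gauss(50, 10), 1) for _ in range(n)]
    mean = sum(original) / n
    sd = (sum((x - mean) ** 2 for x in original) / (n - 1)) ** 0.5
    synthetic = [round(rng.gauss(mean, sd), 1) for _ in range(n)]
    return original, synthetic


original, synthetic = make_demo()
rearranged = reverse_map(original, synthetic)

# Rank displacement of each record: how far its value moved in rank terms.
# An intruder who holds both datasets can compute this permutation.
displacement = [abs(a - b) for a, b in zip(ranks(original), ranks(synthetic))]

print("original  :", original)
print("synthetic :", synthetic)
print("rearranged:", rearranged)
print("rank displacement per record:", displacement)
```

Under these assumptions, the rearranged list is a permutation of the original values that carries the rank structure of the synthetic release. Records with small rank displacement are the ones for which a synthetic release behaves much like a lightly permuted original. With several attributes, the same mapping would be applied to each attribute separately.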


Keywords: Statistical disclosure control; Synthetic data; Maximum-knowledge attacker


Acknowledgments and Disclaimer

The following funding sources are gratefully acknowledged by the third author: European Commission (project H2020-700540 “CANVAS”), Government of Catalonia (ICREA Acadèmia Prize) and Spanish Government (projects TIN2014-57364-C2-1-R “SmartGlacis” and TIN2015-70054-REDC). The views in this paper are the authors’ own and do not necessarily reflect the views of UNESCO or any of the funders.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Nicolas Ruiz (1)
  • Krishnamurty Muralidhar (2)
  • Josep Domingo-Ferrer (1)
  1. UNESCO Chair in Data Privacy, Department of Computer Science and Mathematics, CYBERCAT-Center for Cybersecurity Research of Catalonia, Universitat Rovira i Virgili, Tarragona, Spain
  2. Department of Marketing and Supply Chain Management, Price College of Business, University of Oklahoma, Norman, USA
