Abstract
Anonymization, or de-identification, techniques are methods for protecting the privacy of human subjects in sensitive data sets while preserving the utility of those data sets. In the case of health data, anonymization techniques may be used to remove or mask patient identities while allowing the health data content to be used by the medical and pharmaceutical research community. The efficacy of anonymization methods has come under repeated attack: several researchers have shown that anonymized data can be re-identified to reveal the identity of the data subjects via approaches such as “linking.” Nevertheless, despite these deficiencies, many government privacy policies depend on anonymization techniques as the primary approach to preserving privacy. In this report, we survey the anonymization landscape and consider the range of anonymization approaches that can be used to de-identify data containing personally identifiable information. We then review several notable government privacy policies that leverage anonymization. In particular, we review the European Union’s General Data Protection Regulation (GDPR) and show that it takes a more goal-oriented approach to data privacy: it defines data privacy in terms of the desired outcome (i.e., as a defense against the risk of personal data disclosure) and is agnostic to the actual method of privacy preservation. The GDPR goes further, framing its privacy-preservation regulations relative to the state of the art, the cost of implementation, the incurred risks, and the context of data processing. This has implications for the GDPR’s robustness to future technological innovation, in marked contrast to privacy regulations that depend explicitly on more definite technical specifications.
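The “linking” attack mentioned above can be sketched in a few lines: a de-identified record set is joined to a public data set (such as a voter roll) on shared quasi-identifiers. This is a minimal hypothetical illustration; every name, record, and field value below is invented, and real attacks (e.g., Sweeney’s) use the quasi-identifier triple of ZIP code, birth date, and sex against far larger data sets.

```python
# Hypothetical sketch of a "linking" re-identification attack.
# A "de-identified" health data set (names removed) is joined to a
# public roster on the quasi-identifiers (zip, dob, sex). All data
# below is fabricated for illustration.

deidentified_health = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1962-01-15", "sex": "M", "diagnosis": "diabetes"},
]

public_voter_roll = [
    {"name": "A. Smith", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"name": "B. Jones", "zip": "02141", "dob": "1980-03-02", "sex": "M"},
]

def link(records, roll, keys=("zip", "dob", "sex")):
    """Return (name, sensitive attribute) pairs for records whose
    quasi-identifiers match exactly one entry in the public roll."""
    matches = []
    for rec in records:
        hits = [p for p in roll if all(p[k] == rec[k] for k in keys)]
        if len(hits) == 1:  # a unique match re-identifies the record
            matches.append((hits[0]["name"], rec["diagnosis"]))
    return matches

print(link(deidentified_health, public_voter_roll))
# → [('A. Smith', 'hypertension')]
```

The point of the sketch is that no explicit identifier ever appears in the health data; the combination of ordinary demographic attributes is sufficient to single out an individual.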
Notes
Or “natural persons”, the term that the GDPR uses for individuals.
The definition of what is sensitive depends on personal opinions and tastes, though many would agree that certain attributes would universally be considered sensitive.
The statement of this policy and links to archived PIAs can be found at http://www.census.gov/about/policies/privacy/pia.html. The PIAs also serve to record information-sharing partners (usually other federal agencies) and consent-collection practices.
The Census data releases on languages spoken at home and English-speaking ability demonstrate this approach (https://www.census.gov/data/tables/2013/demo/2009-2013-lang-tables.html). These tables withhold state-level statistics on low-use languages like Welsh or Papia Mentae.
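The withholding approach this note describes amounts to threshold-based cell suppression: table cells whose counts fall below a cutoff are not published, since small counts risk singling out individuals. The following is a minimal sketch; the counts, language labels, and threshold value are invented for illustration and do not reflect actual Census Bureau rules.

```python
# Minimal sketch of threshold-based cell suppression, the
# disclosure-avoidance approach illustrated by the Census language
# tables. Counts and the threshold are fabricated for illustration.

SUPPRESSION_THRESHOLD = 10  # assumed cutoff; agencies set their own

state_language_counts = {
    "Spanish": 1_200_456,
    "Welsh": 7,          # low-use language: publishing would risk singling out speakers
    "Tagalog": 45_210,
}

def suppress(counts, threshold=SUPPRESSION_THRESHOLD):
    """Replace cells below the threshold with a withheld marker."""
    return {lang: (n if n >= threshold else "(withheld)")
            for lang, n in counts.items()}

print(suppress(state_language_counts))
# → {'Spanish': 1200456, 'Welsh': '(withheld)', 'Tagalog': 45210}
```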
See, for example, the Census Bureau’s online data visualization map: http://onthemap.ces.census.gov/
Zarsky, T.Z., 2016. Incompatible: The GDPR in the Age of Big Data. Seton Hall L. Rev., 47, p.995.
Goodman B, Flaxman S. European Union regulations on algorithmic decision-making and a “right to explanation”. arXiv preprint arXiv:1606.08813; 2016.
Article 29 Working Party, Opinion 05/2014 on Anonymization Techniques, WP216, http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
edX is a provider of university-level massive open online courses, http://www.edx.org
References
Tanner, Adam, Our Bodies, Our Data: How Companies Make Billions Selling Our Medical Records, Beacon Press. 2017.
G Cormode, D Srivastava. Anonymized Data: Generation, Models, Usage. SIGMOD, Providence, Rhode Island. 2009.
Dalenius T. Finding a needle in a haystack: identifying anonymous census records. J Off Stat. 1986;2(3):329–36.
Sweeney L. Uniqueness of simple demographics in the U.S. population. LIDAP-WP4. Pittsburgh, PA: Carnegie Mellon University, Laboratory for International Data Privacy; 2000. Forthcoming book entitled The Identifiability of Data.
de Montjoye Y-A, Radaelli L, Singh VK. Unique in the shopping mall: on the reidentifiability of credit card metadata. Science. 2015;347(6221):536–9.
Tanner A. “Harvard professor re-identifies anonymous volunteers in DNA study.” Forbes, April 25, 2013. http://www.forbes.com/sites/adamtanner/2013/04/25/harvard-professor-re-identifies-anonymous-volunteers-in-dna-study/print/
Michael Barbaro and Tom Zeller. A Face Is Exposed for AOL Searcher No. 4417749, New York Times. 2006.
Narayanan A, Shmatikov V. Robust de-anonymization of large sparse datasets. In: Proceedings of the 2008 IEEE Symposium on Security and Privacy (SP '08). Washington, DC: IEEE Computer Society; 2008. p. 111–25.
Clifton C, Tassa T. On syntactic anonymity and differential privacy. Trans Data Privacy. 2013;6(2):161–83.
Sweeney L. K-anonymity: a model for protecting privacy. Int J Uncertainty, Fuzziness Knowledge-Based Syst. 2002;10(05):557–70.
Dondi R, Mauri G, Zoppis I. The l-diversity problem: tractability and approximability. Theor Comput Sci. 2013;511:159–71.
Truta TM, Campan A, Meyer P. Generating microdata with p-sensitive k-anonymity property. Berlin: Springer; 2007. p. 124–41.
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M. L-diversity: privacy beyond k-anonymity. ACM Trans Knowled Discov Data (TKDD). 2007;1(1):3.
Li, Ninghui, Tiancheng Li, and Suresh Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on. IEEE, 2007.
Domingo-Ferrer, Josep, and Vicenç Torra. "A critique of k-anonymity and some of its enhancements." Availability, Reliability and Security, 2008. ARES 08. Third International Conference on. IEEE, 2008.
Bonizzoni, P, Gianluca Della Vedova, and Riccardo Dondi. "The k-anonymity problem is hard." Fundamentals of Computation Theory. Springer Berlin Heidelberg, 2009.
LeFevre, Kristen, David J. DeWitt, and Raghu Ramakrishnan. "Incognito: Efficient full-domain k-anonymity." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. ACM, 2005.
LeFevre, Kristen, David J. DeWitt, and Raghu Ramakrishnan. "Mondrian multidimensional k-anonymity." Data Engineering, 2006. ICDE'06. Proceedings of the 22nd International Conference on. IEEE, 2006.
Liang, H, H Yuan. "On the complexity of t-closeness anonymization and related problems." In Database Systems for Advanced Applications, pp. 331-345. Springer Berlin Heidelberg, 2013.
Cao J, et al. SABRE: a sensitive attribute Bucketization and REdistribution framework for t-closeness. VLDB J. 2011;20(1):59–81.
Cynthia Dwork. Differential privacy: a survey of results. In Proceedings of the 5th international conference on Theory and applications of models of computation (TAMC'08), Manindra Agrawal, Dingzhu Du, Zhenhua Duan, and Angsheng Li (Eds.). Springer-Verlag, Berlin, Heidelberg, 2008, 1-19.
Dwork C. An ad omnia approach to defining and achieving private data analysis. In: Bonchi F, Ferrari E, Malin B, Saygin Y, editors. Proceedings of the 1st ACM SIGKDD international conference on privacy, security, and trust in KDD (PinKDD'07). Berlin, Heidelberg: Springer-Verlag; 2007. p. 1–13.
Frank McSherry and Kunal Talwar. 2007. Mechanism Design via Differential Privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS '07). IEEE Computer Society, Washington, DC, USA, 94-103.
Soria-Comas, Jordi, and Josep Domingo-Ferrer. Differential privacy via t-closeness in data publishing. Privacy, Security and Trust (PST), 2013 Eleventh Annual International Conference on. IEEE, 2013.
Sarathy R, Muralidhar K. Evaluating Laplace noise addition to satisfy differential privacy for numeric data. Trans Data Privacy. 2011;4(1):1–17.
Leoni D. Non-interactive differential privacy: a survey. In: Raschia G, Theobald M, editors. Proceedings of the Workshop on Open Data (WOD). ACM; 2012. p. 40–52.
G. Cormode, M. Procopiuc, D. Srivastava, and T. Tran. Differentially private publication of sparse data. In International Conference on Database Theory (ICDT), 2012.
Ohm P. Broken promises of privacy: responding to the surprising failure of anonymization. UCLA Law Rev. 2010;57:1701.
Zayatz L. Disclosure avoidance practices and research at the US Census Bureau: an update. J Off Stat. 2007;23(2):253.
Klarreich, Erica. "Privacy by the Numbers: A New Approach to Safeguarding Data." Quanta Magazine. Quanta Magazine, 2012. Web. <https://www.quantamagazine.org/20121210-privacy-by-the-numbers-a-new-approach-to-safeguarding-data/>.
Chawla S, et al. Toward privacy in public databases. Berlin: Theory of Cryptography Springer; 2005. p. 363–85.
Roscorla, Tanya. “3 Student Data Privacy Bills That Congress Could Act On.” Center for Digital Education March 24, 2016, http://www.centerdigitaled.com/k-12/3-Student-Data-Privacy-Bills-That-Congress-Could-Act-On.html
Daries JP, Reich J, Waldo J, Young EM, Whittinghill J, Seaton DT, et al. Privacy, anonymity, and big data in the social sciences. Queue. 2014;12(7):30.
Access to Classified Information, Executive Order #12968, August 4, 1995, http://www.fas.org/sgp/clinton/eo12968.html
Dana Priest and William M. Arkin, “A hidden world, growing beyond control,” Washington Post – Top Secret America, http://projects.washingtonpost.com/top-secret-america/
“White House orders review of 5 million security clearances,” Nov 22, 2013, https://www.rt.com/usa/clapper-demands-security-clearance-review-173/
Gentry C, Halevi S. Implementing Gentry's fully-homomorphic encryption scheme. In: Paterson KG, editor. Proceedings of the 30th annual international conference on theory and applications of cryptographic techniques: advances in cryptology (EUROCRYPT'11). Berlin: Springer-Verlag; 2011. p. 129–48.
Lindell Y, Pinkas B. Privacy preserving data mining. In: Bellare M, editor. Proceedings of the 20th annual international cryptology conference on advances in cryptology (CRYPTO '00). London: Springer-Verlag; 2000. p. 36–54.
Acknowledgements
We would like to thank Marjory Blumenthal and Rebecca Balebako for their detailed and thoughtful review of early drafts of this document. We are immensely grateful for their comments and feedback. Any errors contained herein are our own and should not be attributed to them.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal of the European Union (OJ). 2016;59:1–88.
Cite this article
Davis, J.S., Osoba, O. Improving privacy preservation policy in the modern information age. Health Technol. 9, 65–75 (2019). https://doi.org/10.1007/s12553-018-0250-6