A Framework for Privacy Quantification: Measuring the Impact of Privacy Techniques Through Mutual Information, Distance Mapping, and Machine Learning
In this paper, we investigate how the effects of privacy techniques can be practically assessed in the specific context of data anonymization, and present some possible tools for measuring those effects. We develop an approach that uses mutual information to measure the information content of any dataset, including data over non-Euclidean spaces, by mapping non-Euclidean distances to a Euclidean space. We evaluate the proposed approach on toy datasets composed of timestamped GPS traces, and attempt to quantify the information content loss caused by three state-of-the-art anonymization approaches. The results allow for an objective quantification of the effects of the k-anonymity and differential privacy algorithms and illustrate, on the toy data used, that such privacy techniques have highly non-linear effects on the information content of the data.
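As a rough illustration of the idea of quantifying information loss with mutual information, the sketch below computes a plug-in (frequency-count) estimate of mutual information between two discrete variables, before and after a crude generalization step. It is not the estimator developed in the paper, which works over non-Euclidean spaces via distance mapping. The function names `mutual_information` and `generalize`, the toy `hours` and `zones` data, and the bin width are all assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def generalize(values, width):
    """Coarsen integer values into bins of the given width.

    Hypothetical stand-in for an anonymization step; real k-anonymity or
    differential privacy mechanisms are considerably more involved.
    """
    return [v // width for v in values]


# Toy data (assumed): hour of day, and a zone id fully determined by the hour.
hours = list(range(24)) * 4
zones = [h // 6 for h in hours]

mi_raw = mutual_information(hours, zones)
mi_coarse = mutual_information(generalize(hours, 8), zones)

print(f"MI before generalization: {mi_raw:.3f} bits")
print(f"MI after generalization:  {mi_coarse:.3f} bits")
```

On this toy data the coarsened hours no longer determine the zone, so the estimated mutual information drops. Comparing the two values gives one simple way to express how much information a generalization step removes.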
Keywords: Distance mapping, Non-Euclidean data, Data privacy, Privacy quantification, Mutual information
This work was supported by the SCOTT project. SCOTT (http://www.scott-project.eu) has received funding from the Electronic Component Systems for European Leadership Joint Undertaking under grant agreement no. 737422. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and from Austria, Spain, Finland, Ireland, Sweden, Germany, Poland, Portugal, Netherlands, Belgium, and Norway.
Compliance with Ethical Standards
Conflict of Interest
The authors declare that they have no conflicts of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.