Abstract
Masking methods modify databases to avoid disclosure. This modification causes information loss, which can be quantified. In this chapter we discuss alternative ways to evaluate to what extent relevant information is lost, and we give an overview of generic and specific information loss measures.
Farfar, får får får?
Nej, får får inte får,
får får vattenäpple.
(Grandpa, do sheep get sheep? No, sheep don't get sheep, sheep get water apple.)
Privacy-preserving Swedish proverb.
Notes
- 1.
- 2. Commonality is the percentage of each variable that is explained by a principal component.
- 3. The factor scores stand for the factors that should multiply each variable in X to obtain its projection on each principal component.
- 4. Reference [34] has a similar use of the Hellinger distance for comparing tables, but for tabular data protection.
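As a rough illustration of the Hellinger distance mentioned in note 4, the sketch below compares two tables of counts defined over the same cells. It assumes one common convention, in which each table is first normalized to relative frequencies and the distance is scaled to lie in [0, 1]; the cited works may use a different normalization. The function name `hellinger` and the flat-list representation of a table are assumptions made for this example only.

```python
import math


def hellinger(p_counts, q_counts):
    """Hellinger distance between two tables of non-negative counts.

    Both tables are given as flat sequences over the same cells.
    Assumed convention: normalize each table to relative frequencies,
    then return sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))**2)), which is
    0 for identical distributions and 1 for disjoint ones.
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("tables must have the same number of cells")
    p_total = sum(p_counts)
    q_total = sum(q_counts)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("each table must have a positive total")
    squared_diffs = sum(
        (math.sqrt(p / p_total) - math.sqrt(q / q_total)) ** 2
        for p, q in zip(p_counts, q_counts)
    )
    return math.sqrt(0.5 * squared_diffs)


# Hypothetical original and masked frequency tables over four cells.
original = [10, 20, 30, 40]
masked = [12, 18, 33, 37]
loss = hellinger(original, masked)
```

Under this convention, a value near 0 suggests that the masked table stays close to the original one, and a value near 1 suggests that the two distributions barely overlap.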
References
Domingo-Ferrer, J., Torra, V.: Disclosure control methods and information loss for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, North-Holland, pp. 91–110 (2001)
Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, North-Holland, pp. 111–134 (2001)
Rebollo-Monedero, D., Forné, J., Soriano, M.: An algorithm for \(k\)-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers. Data Knowl. Eng. 70(10), 892–921 (2011)
Rebollo-Monedero, D., Forné, J., Pallarés, E., Parra-Arnau, J.: A modification of the Lloyd algorithm for \(k\)-anonymous quantization. Inf. Sci. 222, 185–202 (2013)
Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55, 3232–3243 (2011)
Liu, L., Wang, J., Zhang, J.: Wavelet-based data perturbation for simultaneous privacy-preserving and statistics-preserving. In: IEEE ICDM Workshops (2008)
Muralidhar, K., Sarathy, R.: An enhanced data perturbation approach for small data sets. Decis. Sci. 36(3), 513–529 (2005)
Kim, J., Winkler, W.: Multiplicative noise for masking continuous data, U.S. Bureau of the Census, RR2003/01 (2003)
Carlson, M., Salabasis, M.: A data swapping technique using ranks: a method for disclosure control. Res. Off. Stat. 5(2), 35–64 (2002)
Raghunathan, T.E., Reiter, J.P., Rubin, D.B.: Multiple imputation for statistical disclosure limitation. J. Off. Stat. 19(1), 1–16 (2003)
Reiter, J.P., Drechsler, J.: Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality. Stat. Sinica 20, 405–421 (2010)
Drechsler, J., Bender, S., Rässler, S.: Comparing fully and partially synthetic datasets for statistical disclosure control in the German IAB Establishment Panel. Trans. Data Priv. 1, 105–130 (2008)
Reiss, S.P.: Practical data-swapping: the first steps. ACM Trans. Database Syst. 9(1), 20–37 (1984)
Liu, K., Kargupta, H., Ryan, J.: Random projection based multiplicative data perturbation for privacy preserving data mining. IEEE Trans. Knowl. Data Eng. 18(1), 92–106 (2006)
Hajian, S., Azgomi, M.A.: A privacy preserving clustering technique using Haar wavelet transform and scaling data perturbation. IEEE (2008)
Bapna, S., Gangopadhyay, A.: A wavelet-based approach to preserve privacy for classification mining. Decis. Sci. 37(4), 623–642 (2006)
Mukherjee, S., Chen, Z., Gangopadhyay, A.: A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms. VLDB J. 15, 293–315 (2006)
Agrawal, R., Srikant, R.: Privacy preserving data mining. In: Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 439–450 (2000)
Domingo-Ferrer, J., Mateo-Sanz, J.M., Torra, V.: Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In: Pre-proceedings of ETK-NTTS 2001, vol. 2, pp. 807–826. Eurostat (2001)
Domingo-Ferrer, J., González-Nicolás, U.: Hybrid microdata using microaggregation. Inf. Sci. 180, 2834–2844 (2010)
Yancey, W.E., Winkler, W.E., Creecy, R.H.: Disclosure risk assessment in perturbative microdata protection. In: Domingo-Ferrer, J. (ed.) Inference Control in Statistical Databases. LNCS, vol. 2316, pp. 135–152 (2002)
Trottini, M.: Decision models for data disclosure limitation, Ph.D. Dissertation, Carnegie Mellon University (2003)
Mateo-Sanz, J.M., Domingo-Ferrer, J., Sebé, F.: Probabilistic information loss measures in confidentiality protection of continuous microdata. Data Min. Knowl. Discov. 11(2), 181–193 (2005)
Agrawal, D., Aggarwal, C.C.: On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the PODS 2001, pp. 247–255 (2001)
Torra, V., Carlson, M.: On the Hellinger distance for measuring information loss in microdata, UNECE/Eurostat Work Session on Statistical Confidentiality, 8th Work Session 2013, Ottawa, Canada (2013)
Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14(1), 189–201 (2002)
Laszlo, M., Mukherjee, S.: Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans. Knowl. Data Eng. 17(7), 902–911 (2005)
Chang, C.-C., Li, Y.-C., Huang, W.-H.: TFRP: an efficient microaggregation algorithm for statistical disclosure control. J. Syst. Softw. 80, 1866–1878 (2007)
Panagiotakis, C., Tziritas, G.: Successive group selection for microaggregation. IEEE Trans. Knowl. Data Eng. 25(5), 1191–1195 (2013)
Laszlo, M., Mukherjee, S.: Iterated local search for microaggregation. J. Syst. Softw. 100, 15–26 (2015)
Cheng, L., Cheng, S., Jiang, F.: ADKAM: A-diversity k-anonymity model via microaggregation. In: Proceedings of the ISPEC 2015. LNCS, vol. 9065, pp. 533–547 (2015)
Salari, M., Jalili, S., Mortazavi, R.: TBM, a transformation based method for microaggregation of large volume mixed data. Data Min. Knowl. Discov. (2016, in press). doi:10.1007/s10618-016-0457-y
Gomatam, S., Karr, A.F., Sanil, A.P.: Data swapping as a decision problem. J. Off. Stat. 21(4), 635–655 (2005)
Shlomo, N., Antal, L., Elliot, M.: Measuring disclosure risk and data utility for flexible table generators. J. Off. Stat. 31(2), 305–324 (2015)
Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. Lecture Notes in Statistics. Springer, New York (2001)
Torra, V.: Progress report on record linkage for risk assessment. DwB project, Deliverable 11.3 (2014)
Torra, V.: On information loss measures for categorical data, Report 3, Ottilie Project (2000)
Aggarwal, C.C., Yu, P.S.: A condensation approach to privacy preserving data mining. In: Proceedings of the EDBT, pp. 183–199 (2004)
Herranz, J., Matwin, S., Nin, J., Torra, V.: Classifying data from protected statistical datasets. Comput. Secur. 29, 875–890 (2010)
Sakuma, J.: Recommendation based on k-anonymized ratings. arXiv (2017)
Torra, V., Navarro-Arribas, G.: Integral privacy. In: Proceedings of the CANS 2016. LNCS, vol. 10052, pp. 661–669 (2016)
Ladra, S., Torra, V.: On the comparison of generic information loss measures and cluster-specific ones. Int. J. Unc. Fuzz. Knowl. Based Syst. 16(1), 107–120 (2008)
Batet, M., Erola, A., Sánchez, D., Castellà-Roca, J.: Utility preserving query log anonymization via semantic microaggregation. Inf. Sci. 242, 49–63 (2013)
Torra, V.: On the definition of cluster-specific information loss measures. In: Solanas, A., Martínez-Ballesté, A. (eds.) Advances in Artificial Intelligence for Privacy Protection and Security, pp. 145–163. World Scientific (2009)
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Torra, V. (2017). Information Loss: Evaluation and Measures. In: Data Privacy: Foundations, New Developments and the Big Data Challenge. Studies in Big Data, vol 28. Springer, Cham. https://doi.org/10.1007/978-3-319-57358-8_7
DOI: https://doi.org/10.1007/978-3-319-57358-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57356-4
Online ISBN: 978-3-319-57358-8
eBook Packages: Engineering, Engineering (R0)