Information Loss: Evaluation and Measures

Torra, Vicenç

doi:10.1007/978-3-319-57358-8_7

Part of the book series: Studies in Big Data (SBD, volume 28)

Abstract

Masking methods modify databases in order to avoid disclosure. This modification causes some information loss, which can be quantified. In this chapter we discuss different alternatives for evaluating to what extent relevant information is lost, and we give an overview of generic and specific information loss measures.

Farfar, får får får?

Nej, får får inte får,

får får vattenäpple.

(Grandfather, do sheep have sheep? No, sheep do not have sheep, sheep have water apples.)

Privacy-preserving Swedish proverb.

Notes

1. Recall that we discussed distances and metrics, as well as their properties, in Sects. 5.4.7 and 5.6.1.
2. Communality is the percentage of each variable's variance that is explained by a principal component.
3. The factor scores stand for the factors that should multiply each variable in X to obtain its projection on each principal component.
4. Reference [34] makes a similar use of the Hellinger distance for comparing tables, but in the context of tabular data protection.

References

1. Domingo-Ferrer, J., Torra, V.: Disclosure control methods and information loss for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, North-Holland, pp. 91–110 (2001)
2. Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, North-Holland, pp. 111–134 (2001)
3. Rebollo-Monedero, D., Forné, J., Soriano, M.: An algorithm for \(k\)-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers. Data Knowl. Eng. 70(10), 892–921 (2011)
4. Rebollo-Monedero, D., Forné, J., Pallarés, E., Parra-Arnau, J.: A modification of the Lloyd algorithm for \(k\)-anonymous quantization. Inf. Sci. 222, 185–202 (2013)
5. Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55, 3232–3243 (2011)
6. Liu, L., Wang, J., Zhang, J.: Wavelet-based data perturbation for simultaneous privacy-preserving and statistics-preserving. In: IEEE ICDM Workshops (2008)
7. Muralidhar, K., Sarathy, R.: An enhanced data perturbation approach for small data sets. Decis. Sci. 36(3), 513–529 (2005)
8. Kim, J., Winkler, W.: Multiplicative noise for masking continuous data, U.S. Bureau of the Census, RR2003/01 (2003)
9. Carlson, M., Salabasis, M.: A data swapping technique using ranks: a method for disclosure control. Res. Off. Stat. 5(2), 35–64 (2002)
10. Raghunathan, T.E., Reiter, J.P., Rubin, D.B.: Multiple imputation for statistical disclosure limitation. J. Off. Stat. 19(1), 1–16 (2003)
11. Reiter, J.P., Drechsler, J.: Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality. Stat. Sinica 20, 405–421 (2010)
12. Drechsler, J., Bender, S., Rässler, S.: Comparing fully and partially synthetic datasets for statistical disclosure control in the German IAB Establishment Panel. Trans. Data Priv. 1, 105–130 (2008)
13. Reiss, S.P.: Practical data-swapping: the first steps. ACM Trans. Database Syst. 9(1), 20–37 (1984)
14. Liu, K., Kargupta, H., Ryan, J.: Random projection based multiplicative data perturbation for privacy preserving data mining. IEEE Trans. Knowl. Data Eng. 18(1), 92–106 (2006)
15. Hajian, S., Azgomi, M.A.: A privacy preserving clustering technique using Haar wavelet transform and scaling data perturbation. IEEE (2008)
16. Bapna, S., Gangopadhyay, A.: A wavelet-based approach to preserve privacy for classification mining. Decis. Sci. 37(4), 623–642 (2006)
17. Mukherjee, S., Chen, Z., Gangopadhyay, A.: A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms. VLDB J. 15, 293–315 (2006)
18. Agrawal, R., Srikant, R.: Privacy preserving data mining. In: Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 439–450 (2000)
19. Domingo-Ferrer, J., Mateo-Sanz, J.M., Torra, V.: Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In: Pre-proceedings of ETK-NTTS 2001, vol. 2, pp. 807–826. Eurostat (2001)
20. Domingo-Ferrer, J., González-Nicolás, U.: Hybrid microdata using microaggregation. Inf. Sci. 180, 2834–2844 (2010)
21. Yancey, W.E., Winkler, W.E., Creecy, R.H.: Disclosure risk assessment in perturbative microdata protection. In: Domingo-Ferrer, J. (ed.) Inference Control in Statistical Databases. LNCS, vol. 2316, pp. 135–152 (2002)
22. Trottini, M.: Decision models for data disclosure limitation, Ph.D. Dissertation, Carnegie Mellon University (2003)
23. Mateo-Sanz, J.M., Domingo-Ferrer, J., Sebé, F.: Probabilistic information loss measures in confidentiality protection of continuous microdata. Data Min. Knowl. Disc. 11(2), 181–193 (2005)
24. Agrawal, D., Aggarwal, C.C.: On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the PODS 2001, pp. 247–255 (2001)
25. Torra, V., Carlson, M.: On the Hellinger distance for measuring information loss in microdata, UNECE/Eurostat Work Session on Statistical Confidentiality, 8th Work Session 2013, Ottawa, Canada (2013)
26. Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14(1), 189–201 (2002)
27. Laszlo, M., Mukherjee, S.: Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans. Knowl. Data Eng. 17(7), 902–911 (2005)
28. Chang, C.-C., Li, Y.-C., Huang, W.-H.: TFRP: an efficient microaggregation algorithm for statistical disclosure control. J. Syst. Softw. 80, 1866–1878 (2007)
29. Panagiotakis, C., Tziritas, G.: Successive group selection for microaggregation. IEEE Trans. Knowl. Data Eng. 25(5), 1191–1195 (2013)
30. Laszlo, M., Mukherjee, S.: Iterated local search for microaggregation. J. Syst. Softw. 100, 15–26 (2015)
31. Cheng, L., Cheng, S., Jiang, F.: ADKAM: A-diversity k-anonymity model via microaggregation. In: Proceedings of the ISPEC 2015. LNCS, vol. 9065, pp. 533–547 (2015)
32. Salari, M., Jalili, S., Mortazavi, R.: TBM, a transformation based method for microaggregation of large volume mixed data. Data Min. Knowl. Discov. (2016, in press). doi:10.1007/s10618-016-0457-y
33. Gomatam, S., Karr, A.F., Sanil, A.P.: Data swapping as a decision problem. J. Off. Stat. 21(4), 635–655 (2005)
34. Shlomo, N., Antal, L., Elliot, M.: Measuring disclosure risk and data utility for flexible table generators. J. Off. Stat. 31(2), 305–324 (2015)
35. Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. Lecture Notes in Statistics. Springer, New York (2001)
36. Torra, V.: Progress report on record linkage for risk assessment. DwB project, Deliverable 11.3 (2014)
37. Torra, V.: On information loss measures for categorical data, Report 3, Ottilie Project (2000)
38. Aggarwal, C.C., Yu, P.S.: A condensation approach to privacy preserving data mining. In: Proceedings of the EDBT, pp. 183–199 (2004)
39. Herranz, J., Matwin, S., Nin, J., Torra, V.: Classifying data from protected statistical datasets. Comput. Secur. 29, 875–890 (2010)
40. Sakuma, J.: Recommendation based on k-anonymized ratings. arXiv (2017)
41. Torra, V., Navarro-Arribas, G.: Integral privacy. In: Proceedings of the CANS 2016. LNCS, vol. 10052, pp. 661–669 (2016)
42. Ladra, S., Torra, V.: On the comparison of generic information loss measures and cluster-specific ones. Int. J. Unc. Fuzz. Knowl. Based Syst. 16(1), 107–120 (2008)
43. Batet, M., Erola, A., Sánchez, D., Castellà-Roca, J.: Utility preserving query log anonymization via semantic microaggregation. Inf. Sci. 242, 49–63 (2013)
44. Torra, V.: On the definition of cluster-specific information loss measures. In: Solanas, A., Martínez-Ballesté, A. (eds.) Advances in Artificial Intelligence for Privacy Protection and Security, pp. 145–163. World Scientific (2009)

Author information

Authors and Affiliations

School of Informatics, University of Skövde, Skövde, Sweden
Vicenç Torra

Corresponding author

Correspondence to Vicenç Torra.

About this chapter

Cite this chapter

Torra, V. (2017). Information Loss: Evaluation and Measures. In: Data Privacy: Foundations, New Developments and the Big Data Challenge. Studies in Big Data, vol 28. Springer, Cham. https://doi.org/10.1007/978-3-319-57358-8_7

DOI: https://doi.org/10.1007/978-3-319-57358-8_7
Published: 18 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57356-4
Online ISBN: 978-3-319-57358-8
eBook Packages: Engineering, Engineering (R0)
