Part of the book series: Studies in Big Data (SBD, volume 28)

Abstract

Masking methods modify databases in order to avoid disclosure. This causes some information loss that can be quantified. In this chapter we discuss different alternatives for evaluating to what extent relevant information is lost. We give an overview of generic and specific information loss measures.

Farfar, får får får?

Nej, får får inte får,

får får vattenäpple.

(Grandfather, do sheep get sheep? No, sheep do not get sheep, sheep get water apples.)

Privacy-preserving Swedish proverb.

Notes

  1. Recall that we have discussed distances and metrics, as well as their properties, in Sects. 5.4.7 and 5.6.1.

  2. Commonality is the percentage of each variable that is explained by a principal component.

  3. The factor scores stand for the factors that should multiply each variable in X to obtain its projection on each principal component.

  4. Reference [34] has a similar use of the Hellinger distance for comparing tables, but for tabular data protection.

References

  1. Domingo-Ferrer, J., Torra, V.: Disclosure control methods and information loss for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, North-Holland, pp. 91–110 (2001)

  2. Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, North-Holland, pp. 111–134 (2001)

  3. Rebollo-Monedero, D., Forné, J., Soriano, M.: An algorithm for \(k\)-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers. Data Knowl. Eng. 70(10), 892–921 (2011)

  4. Rebollo-Monedero, D., Forné, J., Pallarés, E., Parra-Arnau, J.: A modification of the Lloyd algorithm for \(k\)-anonymous quantization. Inf. Sci. 222, 185–202 (2013)

  5. Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55, 3232–3243 (2011)

  6. Liu, L., Wang, J., Zhang, J.: Wavelet-based data perturbation for simultaneous privacy-preserving and statistics-preserving. In: IEEE ICDM Workshops (2008)

  7. Muralidhar, K., Sarathy, R.: An enhanced data perturbation approach for small data sets. Decis. Sci. 36(3), 513–529 (2005)

  8. Kim, J., Winkler, W.: Multiplicative noise for masking continuous data, U.S. Bureau of the Census, RR2003/01 (2003)

  9. Carlson, M., Salabasis, M.: A data swapping technique using ranks: a method for disclosure control. Res. Off. Stat. 5(2), 35–64 (2002)

  10. Raghunathan, T.E., Reiter, J.P., Rubin, D.B.: Multiple imputation for statistical disclosure limitation. J. Off. Stat. 19(1), 1–16 (2003)

  11. Reiter, J.P., Drechsler, J.: Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality. Stat. Sinica 20, 405–421 (2010)

  12. Drechsler, J., Bender, S., Rässler, S.: Comparing fully and partially synthetic datasets for statistical disclosure control in the German IAB Establishment Panel. Trans. Data Priv. 1, 105–130 (2008)

  13. Reiss, S.P.: Practical data-swapping: the first steps. ACM Trans. Database Syst. 9(1), 20–37 (1984)

  14. Liu, K., Kargupta, H., Ryan, J.: Random projection based multiplicative data perturbation for privacy preserving data mining. IEEE Trans. Knowl. Data Eng. 18(1), 92–106 (2006)

  15. Hajian, S., Azgomi, M.A.: A privacy preserving clustering technique using Haar wavelet transform and scaling data perturbation. IEEE (2008)

  16. Bapna, S., Gangopadhyay, A.: A wavelet-based approach to preserve privacy for classification mining. Decis. Sci. 37(4), 623–642 (2006)

  17. Mukherjee, S., Chen, Z., Gangopadhyay, A.: A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms. VLDB J. 15, 293–315 (2006)

  18. Agrawal, R., Srikant, R.: Privacy preserving data mining. In: Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 439–450 (2000)

  19. Domingo-Ferrer, J., Mateo-Sanz, J. M., Torra, V.: Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In: Pre-proceedings of ETK-NTTS 2001, vol. 2, pp. 807–826. Eurostat (2001)

  20. Domingo-Ferrer, J., González-Nicolás, U.: Hybrid microdata using microaggregation. Inf. Sci. 180, 2834–2844 (2010)

  21. Yancey, W.E., Winkler, W.E., Creecy, R.H.: Disclosure risk assessment in perturbative microdata protection. In: Domingo-Ferrer, J. (ed.) Inference Control in Statistical Databases. LNCS, vol. 2316, pp. 135–152 (2002)

  22. Trottini, M.: Decision models for data disclosure limitation, Ph.D. Dissertation, Carnegie Mellon University (2003)

  23. Mateo-Sanz, J.M., Domingo-Ferrer, J., Sebé, F.: Probabilistic information loss measures in confidentiality protection of continuous microdata. Data Min. Knowl. Discov. 11(2), 181–193 (2005)

  24. Agrawal, D., Aggarwal, C.C.: On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the PODS 2001, pp. 247–255 (2001)

  25. Torra, V., Carlson, M.: On the Hellinger distance for measuring information loss in microdata, UNECE/Eurostat Work Session on Statistical Confidentiality, 8th Work Session 2013, Ottawa, Canada (2013)

  26. Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14(1), 189–201 (2002)

  27. Laszlo, M., Mukherjee, S.: Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans. Knowl. Data Eng. 17(7), 902–911 (2005)

  28. Chang, C.-C., Li, Y.-C., Huang, W.-H.: TFRP: an efficient microaggregation algorithm for statistical disclosure control. J. Syst. Softw. 80, 1866–1878 (2007)

  29. Panagiotakis, C., Tziritas, G.: Successive group selection for microaggregation. IEEE Trans. Knowl. Data Eng. 25(5), 1191–1195 (2013)

  30. Laszlo, M., Mukherjee, S.: Iterated local search for microaggregation. J. Syst. Softw. 100, 15–26 (2015)

  31. Cheng, L., Cheng, S., Jiang, F.: ADKAM: A-diversity k-anonymity model via microaggregation. In: Proceedings of the ISPEC 2015. LNCS, vol. 9065, pp. 533–547 (2015)

  32. Salari, M., Jalili, S., Mortazavi, R.: TBM, a transformation based method for microaggregation of large volume mixed data. Data Min. Knowl. Discov. (2016, in press). doi:10.1007/s10618-016-0457-y.

  33. Gomatam, S., Karr, A.F., Sanil, A.P.: Data swapping as a decision problem. J. Off. Stat. 21(4), 635–655 (2005)

  34. Shlomo, N., Antal, L., Elliot, M.: Measuring disclosure risk and data utility for flexible table generators. J. Off. Stat. 31(2), 305–324 (2015)

  35. Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. Lecture Notes in Statistics. Springer, New York (2001)

  36. Torra, V.: Progress report on record linkage for risk assessment. DwB project, Deliverable 11.3 (2014)

  37. Torra, V.: On information loss measures for categorical data, Report 3, Ottilie Project (2000)

  38. Aggarwal, C.C., Yu, P.S.: A condensation approach to privacy preserving data mining. In: Proceedings of the EDBT, pp. 183–199 (2004)

  39. Herranz, J., Matwin, S., Nin, J., Torra, V.: Classifying data from protected statistical datasets. Comput. Secur. 29, 875–890 (2010)

  40. Sakuma, J.: Recommendation based on k-anonymized ratings. arXiv (2017)

  41. Torra, V., Navarro-Arribas, G.: Integral privacy. In: Proceedings of the CANS 2016. LNCS, vol. 10052, pp. 661–669 (2016)

  42. Ladra, S., Torra, V.: On the comparison of generic information loss measures and cluster-specific ones. Int. J. Unc. Fuzz. Knowl. Based Syst. 16(1), 107–120 (2008)

  43. Batet, M., Erola, A., Sánchez, D., Castellà-Roca, J.: Utility preserving query log anonymization via semantic microaggregation. Inf. Sci. 242, 49–63 (2013)

  44. Torra, V.: On the definition of cluster-specific information loss measures. In: Solanas, A., Martínez-Ballesté, A. (eds.) Advances in Artificial Intelligence for Privacy Protection and Security, pp. 145–163. World Scientific (2009)

Author information

Corresponding author

Correspondence to Vicenç Torra.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Torra, V. (2017). Information Loss: Evaluation and Measures. In: Data Privacy: Foundations, New Developments and the Big Data Challenge. Studies in Big Data, vol 28. Springer, Cham. https://doi.org/10.1007/978-3-319-57358-8_7

  • DOI: https://doi.org/10.1007/978-3-319-57358-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57356-4

  • Online ISBN: 978-3-319-57358-8

  • eBook Packages: Engineering, Engineering (R0)
