A Semantic Information Loss Metric for Privacy Preserving Publication

  • Yu Liu
  • Ting Wang
  • Jianhua Feng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5982)


Data distortion is inevitable in privacy-preserving data publication, and many quality metrics have been proposed to measure the quality of anonymized data, among which information loss metrics are widely used. Most existing information loss metrics, however, are non-semantic and are therefore limited in how well they reflect data distortion. As a result, the utility of anonymized data produced under these metrics is constrained. In this paper, we propose a novel semantic information loss metric, SILM, which takes into account the correlation among attributes. The new metric captures distortion more precisely than state-of-the-art information loss metrics, especially when strong correlations exist among attributes. We evaluate the effect of SILM on data quality in terms of the accuracy of aggregate query answering and classification. Comprehensive experiments demonstrate that SILM helps improve the quality of anonymized data, especially when integrated with suitable anonymization algorithms.


k-anonymity, information loss metric, data distortion, data utility





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Yu Liu (1)
  • Ting Wang (1)
  • Jianhua Feng (1)
  1. Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China
