Maximizing Privacy under Data Distortion Constraints in Noise Perturbation Methods

  • Yaron Rachlin
  • Katharina Probst
  • Rayid Ghani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5456)


This paper introduces ‘guessing anonymity,’ a definition of privacy for noise perturbation methods. The definition captures how difficult it is to link an identity to a sanitized record using publicly available information. Importantly, it leads to analytical expressions that bound data privacy as a function of the noise perturbation parameters. Using these bounds, we can formulate optimization problems that describe the feasible tradeoffs between data distortion and privacy without exhaustively searching the noise parameter space. This work addresses an important shortcoming of noise perturbation methods by providing them with an intuitive definition of privacy, analogous to the one used in k-anonymity, and with an analytical means of selecting parameters to achieve a desired level of privacy. At the same time, our work maintains the appealing aspects of noise perturbation methods that have made them popular both in practice and as a subject of academic research.
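To make the idea of a guessing-based privacy measure concrete, the following is a minimal sketch and not the paper's formal definition. It assumes a uniform prior over the original records, independent Gaussian noise with a single standard deviation `sigma`, and an attacker who guesses candidate records in order of decreasing posterior probability. The function names and the toy records are hypothetical and chosen only for illustration.

```python
import math


def posterior_over_records(sanitized, originals, sigma):
    """Posterior probability that each original record produced `sanitized`.

    Assumes a uniform prior over records and independent Gaussian noise
    with standard deviation `sigma` on every attribute (an assumption of
    this sketch, not something stated in the abstract).
    """
    log_w = [
        -sum((s - x) ** 2 for s, x in zip(sanitized, rec)) / (2.0 * sigma ** 2)
        for rec in originals
    ]
    m = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def expected_guesses(posterior):
    """Expected number of guesses when candidates are tried in order of
    decreasing posterior probability."""
    ranked = sorted(posterior, reverse=True)
    return sum((i + 1) * p for i, p in enumerate(ranked))


# Toy, made-up records with two numeric attributes each.
originals = [(30.0, 50.0), (31.0, 52.0), (45.0, 80.0), (60.0, 20.0)]
sanitized = (30.2, 50.1)  # a perturbed copy of the first record

for sigma in (0.5, 5.0, 50.0):
    post = posterior_over_records(sanitized, originals, sigma)
    print(sigma, expected_guesses(post))
```

In this toy setting, a larger `sigma` flattens the posterior, so the expected number of guesses moves from close to 1 toward its maximum of (n + 1) / 2 for n records, which illustrates how noise parameters could be traded off against distortion.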


Keywords: noise perturbation, privacy, anonymity, statistical disclosure control




Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Yaron Rachlin (1)
  • Katharina Probst (2)
  • Rayid Ghani (1)
  1. Accenture Technology Labs, Chicago, USA
  2. Google Inc., Atlanta, USA